Hypermedia and forms | Glenn Block
Updated (with a lot of new content)
One challenge when building REST-based systems is how the client determines what it can do next. There can be any number of clients, each of which needs to interact with a system. How do they know HOW to interact? The WSDL approach is to offer a static snapshot of method calls on an API. That approach couples the client heavily to knowing everything about the server, including how things get processed. It inhibits evolvability. A change to the server usually results in breaking all clients. In a world where many 3rd party apps across devices consume your server application, this can be detrimental, making it extremely difficult to move forward.
Hypermedia (also referred to as Hypertext) is an answer. The word may sound scary but it basically means use links. When Tim Berners-Lee, Roy Fielding, and others (Al Gore) were envisioning the Web, linking was a key component of that design. We’re used to seeing links in UI contexts like a browser, e.g. in a web-based ordering system you can click on an “Add Item” link to add the item. But what about in a non-UI context? Well guess what, you can use links there as well.
With a hypermedia approach your server doesn’t only return data. It returns data + links. Those links provide a means for the client to discover the available options that make sense based on where the client is in the application. A link has 2 standard components: a url for the link and a rel. Rel stands for relation and describes to the client how the link relates to the current resource. For example, let’s take a catalog / shopping cart experience exposed in a RESTful way. When browsing the catalog, you get back a list of items. Each item can have links for adding the item to a cart. Below is one such item.
Above the server returned a link for adding an item. Unlike an operation in a WSDL, that link is not static and hardcoded as part of the contract. The server is offering the links that make sense at a specific point in time based on the application and resource state. Assuming for example that “An Item” is out of stock, the client will not get a link for adding an item. Instead it may get a link to place the item on backorder. The client however doesn’t have any knowledge of the rules on the server. It doesn’t know that because the item is out of stock, it can’t add it. All it knows is that rel=”rc:additem” is not present. The server logic might be more sophisticated; maybe there are limits set on how much I can order of a specific item. From the client perspective though, all it knows is the link is not present. That’s decoupling.
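To make the decoupling concrete, here is a minimal Python sketch of a client that decides what it can do purely from the rels it finds. The item XML shape, the urls, and the "rc:backorder" rel are my assumptions for illustration; only the link element with its rel and url attributes and the "rc:additem" rel come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical item representations; element names and urls are assumed.
IN_STOCK = """
<item>
  <name>An Item</name>
  <link rel="rc:additem" url="/cart/items/1"/>
</item>
"""

OUT_OF_STOCK = """
<item>
  <name>An Item</name>
  <link rel="rc:backorder" url="/backorders/items/1"/>
</item>
"""


def links_by_rel(xml_text):
    """Return a dict mapping each link's rel to its url."""
    root = ET.fromstring(xml_text.strip())
    return {link.get("rel"): link.get("url") for link in root.iter("link")}


def can_add_to_cart(xml_text):
    """The client knows nothing about stock rules; it only looks for the rel."""
    return "rc:additem" in links_by_rel(xml_text)
```

Calling can_add_to_cart on the out-of-stock representation returns False without the client ever learning why. The stock rule lives entirely on the server.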
What about evolvability? The beauty of linking is that clients only care about the links they know, they don’t care about the links they don’t. That means as the server evolves it can offer additional links. For example we may decide to offer clients the ability to put an item in a wish-list rather than in the shopping cart. That means an additional link is returned such as the following:
<link rel="rc:addtowishlist" url="/wishlist/gblock"/>
Older clients won’t know about the wish list. They will however continue to function. Newer clients will easily consume the new functionality. If you imagine a world with many different clients consuming your service (which is the real world), that is huge! There’s a further benefit for the older clients. The server has advertised the fact that there are new capabilities. Clients can be designed to track links they don’t know so that they can log that new capabilities are available.
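As a rough sketch of that idea, an older client could split the links it receives into the ones it understands and the ones it doesn't, and log the latter. The item XML shape, the "/cart/items/1" url, and the logger name are assumptions; the wish-list link is the one shown earlier.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("restcart.client")  # hypothetical logger name

# The only rel this (older) client was written to understand.
KNOWN_RELS = {"rc:additem"}

# Hypothetical response from a newer server that also offers a wish list.
ITEM_WITH_WISHLIST = """
<item>
  <name>An Item</name>
  <link rel="rc:additem" url="/cart/items/1"/>
  <link rel="rc:addtowishlist" url="/wishlist/gblock"/>
</item>
"""


def partition_links(xml_text, known_rels=KNOWN_RELS):
    """Split links into (known, unknown) dicts and log the unknown ones."""
    root = ET.fromstring(xml_text.strip())
    known, unknown = {}, {}
    for link in root.iter("link"):
        target = known if link.get("rel") in known_rels else unknown
        target[link.get("rel")] = link.get("url")
    for rel, url in unknown.items():
        logger.info("Server advertised unrecognized link rel=%s url=%s", rel, url)
    return known, unknown
```

The old client keeps working off the known dict, while the log gives its maintainers a hint that the server has grown new capabilities.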
An additional benefit of the link approach has to do with the urls themselves. Clients don't build urls themselves, so the server can completely change the format of the urls, or even where they point to, without the client being impacted at all.
So now we have links. But there’s one problem: how does the client know what to do with them? Which HTTP method should be used? Is it an HTTP GET, a POST, or something else? If it is a POST, what type of content should be sent and in what form? Another challenge is around versioning: once a service has been updated, how do client authors know how to upgrade their clients to take advantage of the new functionality?
One solution is simple: read the spec. When you build hypermedia-based systems you aren’t simply returning a media type (Content-Type header value) of “application/xml” or “application/json”. Rather you are returning something more specific such as “application/atom+xml”, or something even more domain-specific, as in the new book “REST in Practice”, where the system uses the media type “application/vnd.restbucks+xml”. Each media type has an RFC/spec associated with it. Thus if a client author reads the spec, it should document what to expect from a links perspective and which method / media type to use.
That is one approach and it absolutely works. The downside of that approach is the client has to be encoded with specific knowledge of how to work with each link.
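Here is roughly what that encoded knowledge tends to look like in client code. This is only a sketch: the rule table stands in for whatever a media type spec would say, and the method and content type values in it are assumptions, not taken from any real spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRule:
    method: str
    content_type: str


# Knowledge the client author copied out of a (hypothetical) media type spec.
SPEC_RULES = {
    "rc:additem": LinkRule("POST", "application/xml"),
}


def plan_request(rel, url):
    """Describe the request to make for a link, or None if the rel is unknown."""
    rule = SPEC_RULES.get(rel)
    if rule is None:
        return None
    return {
        "method": rule.method,
        "url": url,
        "headers": {"Content-Type": rule.content_type},
    }
```

Every new rel, or any change to how an existing one is used, means editing SPEC_RULES and shipping a new client.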
That is not the only approach, however, and one really interesting alternative is forms.
If we look to HTML we can see some hints on how to address these issues. In a web page, if you click a link such as “Add Item”, the common paradigm is for the server to return an HTML form (the client may also display one through some client script). That form provides information to the human who is reading the site to help them move forward. The form contains an action that points to a URI, a method, and other fields which may be either prefilled by the server or require input from the user. The model works in a basically seamless fashion. The site can freely evolve, as it can offer different forms to different users based on a number of factors including what “version” of the app they are using, their user profile, etc.
Now imagine there is no human. There’s a machine client that is consuming some resources over HTTP. That machine may be jQuery code running in a browser, or code executing in a rich client app such as WPF, Silverlight, Flash, or even a Java applet. Or it may be headless server code that is accessing that information. Ultimately the information will likely be surfaced to some human, but that is after some initial processing has taken place. For example there might be an agent that automatically processes orders using a 3rd party fulfillment service.
We can design electronic forms which our servers offer to clients to guide them on how to use the link to transition to the next state.
With this approach, each link is a GET that returns an electronic form. That form specifies the url and method, as well as pre-filled information from the server such as item details. The form may also specify required fields such as a Quantity field.
The advantage of forms is they allow the client to be less coupled to knowledge of how to work with links, which yields greater evolvability.
For example for adding item ‘1’ to a shopping cart, the server may return the client a link such as the following.
<link rel="rc:AddItem" url="/restcart/forms/additem/1" />
The client then automatically does a GET on the link and retrieves an AddItem form that looks like the following.
This form offers the client everything it needs to move forward. It does not have to build up the item information to POST as it has already been sent by the server. The server also specifies the media type for the client to use. This form additionally specifies that the client must provide a quantity.
To move forward, the client will POST to the specified url using the included item as the body. As required fields are specified, the client must supply those as well.
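Those steps can be sketched in Python. The form XML below is my own assumed shape (attribute names like mediaType, the required/field elements, and the target url are all hypothetical), and appending the required fields as child elements of the item is just one way the body could be assembled.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical AddItem form; every element and attribute name is assumed.
ADD_ITEM_FORM = """
<form rel="rc:AddItem" url="/restcart/cart" method="POST" mediaType="application/xml">
  <item id="1">
    <name>An Item</name>
    <price>9.99</price>
  </item>
  <required>
    <field name="quantity"/>
  </required>
</form>
"""


def build_request(form_xml, values):
    """Turn a form plus client-supplied values into a request description."""
    form = ET.fromstring(form_xml.strip())
    required = [field.get("name") for field in form.iter("field")]
    missing = [name for name in required if name not in values]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Start from the item the server pre-filled, then add the required fields.
    body = copy.deepcopy(form.find("item"))
    body.tail = None  # drop whitespace that followed <item> inside the form
    for name in required:
        ET.SubElement(body, name).text = str(values[name])

    return {
        "method": form.get("method"),
        "url": form.get("url"),
        "headers": {"Content-Type": form.get("mediaType")},
        "body": ET.tostring(body, encoding="unicode"),
    }
```

Notice the client hardcodes neither the url, the method, nor the media type. If the server changes any of them in the form, build_request picks up the change without a code update.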
The server can also embed a token in the response in order to ensure the price of the item is valid for a period of time (say 2 hours). If there is no such guarantee or the time has expired, the server can return a status code of 409 (Conflict) to the client indicating that it needs to refresh the form.
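One possible way for the server to do this is to sign the item, price, and expiry time into the token, so it can check the guarantee without storing anything. This is a sketch under my own assumptions: the token format, the secret, and the 201 success status are hypothetical, and for simplicity anything other than a valid, unexpired token gets the 409 described above.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"          # hypothetical signing key
PRICE_TTL_SECONDS = 2 * 60 * 60  # the "say 2 hours" price guarantee


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(item_id, price, now=None):
    """Create the token the server embeds in the form."""
    now = time.time() if now is None else now
    expires = int(now) + PRICE_TTL_SECONDS
    payload = f"{item_id}:{price}:{expires}"
    return f"{payload}:{_sign(payload)}"


def status_for_post(token, now=None):
    """Return the HTTP status the server would answer the POST with."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition(":")
    if not hmac.compare_digest(signature, _sign(payload)):
        return 409  # no valid price guarantee; client should refresh the form
    expires = int(payload.rsplit(":", 1)[1])
    return 201 if now <= expires else 409
```

On a 409 the client simply does the GET on the form link again and retries with the fresh form.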
*Note: There are alternative approaches to exposing the cart as a resource, for example the CART state could be maintained on the client and sent with the POST. In either case it would not rely on session state.
Hypermedia allows clients and servers to evolve independently. In this approach the server offers clients well-known links at every stage of the client/server interaction in order to guide the client as to what it can do next.
There are various approaches to implementing a hypermedia based system, including using electronic forms.
Your mileage may vary as to whether you need this or not.
Interested in your thoughts.