May 21, 2012

A Series: Practical API Development

 

When you first set out to build your API, the number of factors to consider may be overwhelming. Which functions do you need to expose to your clients? Which should remain hidden? What if you change your mind in the future? Can your servers handle all of that additional load? The list of questions goes on and on, and some of the trickiest issues may not be apparent until you are stuck in the middle of them. Fortunately, there are several existing patterns you can follow to address some of the most common concerns.

In this series, we outline the practical considerations when building an API.

In the previous installment we discussed some of the concerns you need to be aware of when thinking about how to version your API for backwards-compatibility and planning for future changes.  Now let’s take a look at some specific strategies you can use to put these ideas into practice.

Versioning Strategies

There are two distinct strategies for versioning your API, and the one you choose will have a major impact on how you develop and deploy it.

1) URI Versioning
This strategy involves using the URI to specify the version of the resource you are using. In practice, the version is more likely to represent the API as a whole than to be specific to each resource. Twitter's API is a good example of this style of versioning:

http://api.twitter.com/1/statuses/home_timeline.json

In this case, we're using version 1 of the Twitter API to get statuses in JSON format. We can assume that someday Twitter will release a second version of their API and it will be available at http://api.twitter.com/2. Displaying the version prominently in the URL is a reassuring reminder to your developers that apps they build today will continue to work in the future.

Another benefit of using the URI is that you can use a load balancer or proxy to handle requests for particular versions of your API differently. Older, deprecated versions of the API might be moved onto constrained hardware while the latest versions are scaled to provide high availability. There are many tools for managing and manipulating URL paths at the server level (analytics tools, proxies, etc.), which means you don't have to write your own code for things like traffic analysis and reporting.

REST purists will tell you this strategy mixes representation and resource and violates the principles of REST, but real software can't always afford to be so dogmatic, and in some cases you'll need to trade one rule for another.
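
To make the idea concrete, here is a minimal sketch of dispatching on the version prefix of a path, written in Python. The handler names and the `HANDLERS` table are hypothetical and only illustrate the pattern; in a real deployment this routing would more likely live in your load balancer, proxy, or web framework.

```python
import re

# Hypothetical handlers, one per API version (illustrative names only).
def timeline_v1(resource: str) -> str:
    return f"v1 handler for {resource}"

def timeline_v2(resource: str) -> str:
    return f"v2 handler for {resource}"

HANDLERS = {"1": timeline_v1, "2": timeline_v2}

# Matches paths such as /1/statuses/home_timeline.json
VERSIONED_PATH = re.compile(r"^/(?P<version>\d+)/(?P<resource>.+)$")

def route(path: str) -> str:
    """Dispatch a request path to the handler for its version prefix."""
    match = VERSIONED_PATH.match(path)
    if match is None:
        raise ValueError(f"no version prefix in path: {path}")
    handler = HANDLERS.get(match.group("version"))
    if handler is None:
        raise LookupError(f"unsupported API version: {match.group('version')}")
    return handler(match.group("resource"))
```

Calling `route("/1/statuses/home_timeline.json")` reaches the v1 handler only, so a change to v2 never touches the code path that v1 clients depend on.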

2) HTTP Request Headers

This strategy assumes that the location of your resource will never change and that it is better to specify the representation and version of the resource being requested separately from the URI. For example:
GET /statuses/home_timeline HTTP/1.1
Accept: application/vnd.twitter.statuses.home_timeline-v1+json

We're requesting the same information as in the prior example, but this time the format and version requirements are supplied in the request headers. On the server side, we'd then need to map this content type and version information to the appropriate representation of the resource. Over time, the code to handle many versions and representations could lead to a codebase that is difficult to change.
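
As a rough illustration of that server-side mapping, the Python sketch below pulls the version and format out of a vendor media type shaped like the one above. The layout `application/vnd.<vendor>.<resource>-v<N>+<format>` is an assumption based on this single example, and the function handles one media type only; a real Accept header can list several types with quality values.

```python
import re

# Assumed layout: application/vnd.<vendor>.<resource>-v<N>+<format>
VENDOR_TYPE = re.compile(
    r"^application/vnd\.(?P<vendor>[a-z0-9]+)\.(?P<resource>[a-z0-9_.]+)"
    r"-v(?P<version>\d+)\+(?P<format>[a-z]+)$"
)

def parse_accept(accept: str) -> dict:
    """Split a single vendor media type into vendor, resource, version, format."""
    match = VENDOR_TYPE.match(accept.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized media type: {accept}")
    parts = match.groupdict()
    parts["version"] = int(parts["version"])  # "1" -> 1
    return parts
```

For the header shown above, `parse_accept` yields the resource `statuses.home_timeline`, version `1`, and format `json`, which the server would then use to pick a representation.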

In general, I prefer the first strategy for the simple reason that it is easier to maintain a resource shared by one use case than it is to maintain a resource shared by many use cases. I prefer to think of API versions as branches that eventually die off, not as features that have to be maintained forever. If we're on v4 of our API and an important v1 client needs a change, it is safer and faster to simply update the v1 API and nothing else.

Communicating Changes

When your API does change, you will need an easy system for communicating those changes to your customers. It might be a blog or a mailing list, and real-time change notifications might even be a requirement of your API. If you're providing a public, free, "use at your own risk" API, then you can probably just publish changes and let customers read about them if they are motivated. However, if you are providing a paid, five-nines API that has millions of dollars of transactions flowing through it, you should communicate changes in a way that cannot be missed or ignored.

Legacy Support

If your API is going to exist for any amount of time, you'll end up with legacy customers who don't want to update their code to the latest and greatest. This will force you either to maintain old or deprecated versions of your API or to test feature additions carefully to ensure that your customers' existing apps continue to work.

A Simple Example

To illustrate the points above, let’s look at a simple example. The first API call shows the state of our API before we make any changes.

HTTP GET /v1.0/user/123.xml
This returns an XML document with information about user ID 123:
<person>
    <id type="integer">183734</id>
    <name>Shane Holland</name>
    <email>shane@abovelabs.com</email>
    <city>San Francisco</city>
    <state>CA</state>
    <created-at type="datetime">2012-01-31T02:28:48Z</created-at>
</person>

After releasing this version of the API, we decide to switch to a plural naming convention on all endpoints, moving /user to /users. We could simply stop supporting the old endpoint, but then our customers would be forced to update their integration with our API. Consequently, we need to provide a mapping on our end to maintain the old endpoint. We’ll proxy the old requests to the new ones to provide a seamless experience for our customers.

HTTP GET /v1.0/user/123.xml
HTTP GET /v1.0/users/123.xml
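
The mapping between these two requests can be sketched as a small path rewrite in Python. The function name and regular expression are illustrative assumptions; in practice a rule like this is often configured in the proxy or web server rather than written in application code.

```python
import re

# Legacy singular endpoint, e.g. /v1.0/user/123.xml
LEGACY_USER_PATH = re.compile(r"^/v1\.0/user/(?P<rest>.+)$")

def rewrite_legacy_path(path: str) -> str:
    """Map the old singular endpoint onto the new plural one.

    Paths that do not match the legacy pattern are returned unchanged.
    """
    match = LEGACY_USER_PATH.match(path)
    if match is None:
        return path
    return f"/v1.0/users/{match.group('rest')}"
```

Because unmatched paths pass through untouched, the rewrite can sit in front of every request without affecting clients that already use the plural form.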

Even later, we decide that <name> is ambiguous. Since it is bad form to mix formatting with data, we decide to add <first-name> and <last-name> and leave it to our clients to decide how they want this information displayed. It is also bad form to have duplicate representations of data, but we can’t safely do anything about this now. Our response is then:
<person>
    <id type="integer">183734</id>
    <name>Shane Holland</name>
    <first-name>Shane</first-name>
    <last-name>Holland</last-name>
    <email>shane@abovelabs.com</email>
    <address>598 Bosworth Street, Suite 4, San Francisco, CA 94131</address>
    <created-at type="datetime">2012-01-31T02:28:48Z</created-at>
</person>

Clients who already make use of the <name> field can decide whether they want to switch to using the new first and last name fields instead, but their existing code won’t break by continuing to use the name field.
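
Here is a minimal sketch of that additive change, using Python's standard `xml.etree.ElementTree` module. The `person_xml` function and its parameters are hypothetical, and only a few fields are included to keep the example short.

```python
import xml.etree.ElementTree as ET

def person_xml(first_name: str, last_name: str, email: str) -> str:
    """Serialize a person with both the legacy and the new name fields."""
    person = ET.Element("person")
    # Legacy field, kept so that existing clients do not break.
    ET.SubElement(person, "name").text = f"{first_name} {last_name}"
    # New fields, added alongside the old one rather than replacing it.
    ET.SubElement(person, "first-name").text = first_name
    ET.SubElement(person, "last-name").text = last_name
    ET.SubElement(person, "email").text = email
    return ET.tostring(person, encoding="unicode")
```

The key design choice is that `<name>` is derived from the new fields, so the two representations cannot drift apart even though both are published.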

Presumably these simple changes are happening throughout your API, and you can see how many small changes over time would eventually get messy:
<person>
    <id type="integer">183734</id>
    <name>Shane Holland</name>
    <first-name>Shane</first-name>
    <last-name>Holland</last-name>
    <email>shane@abovelabs.com</email>
    <address>598 Bosworth Street, Suite 4, San Francisco, CA 94131</address>  
    <address1>598 Bosworth Street</address1>
    <address2>Suite 4</address2>
    <city>San Francisco</city>
    <statecode>CA</statecode>
    <postalcode>94131</postalcode>
    <created-at type="datetime">2012-01-31T02:28:48Z</created-at>
</person>
Eventually, we decide to give ourselves a way to deprecate this mess and release a new version of our API:
GET /v2.0/users/123.xml

We no longer have the legacy data fields or data with formatting, so we can drop support for the singular “user” endpoint. Additionally, our response has become cleaner:
<person>
    <id type="integer">183734</id>
    <first-name>Shane</first-name>
    <last-name>Holland</last-name>
    <email>shane@abovelabs.com</email>
    <created-at type="datetime">2012-01-31T02:28:48Z</created-at>
    <address>
        <number>598</number>
        <street>Bosworth Street</street>
        <suite>Suite 4</suite>
        <city>San Francisco</city>
        <statecode>CA</statecode>
        <postalcode>94131</postalcode>
    </address>
</person>
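
One way to keep the two versions from tangling is to give each its own serializer and choose between them by version. The Python sketch below does this for the address only; the `ADDRESS` record, function names, and `SERIALIZERS` table are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative record; field names mirror the v2.0 response above.
ADDRESS = {
    "number": "598",
    "street": "Bosworth Street",
    "suite": "Suite 4",
    "city": "San Francisco",
    "statecode": "CA",
    "postalcode": "94131",
}

def address_v1(addr: dict) -> ET.Element:
    """v1.0 branch: a single pre-formatted string, as legacy clients expect."""
    element = ET.Element("address")
    element.text = (
        f"{addr['number']} {addr['street']}, {addr['suite']}, "
        f"{addr['city']}, {addr['statecode']} {addr['postalcode']}"
    )
    return element

def address_v2(addr: dict) -> ET.Element:
    """v2.0 branch: structured fields, with formatting left to the client."""
    element = ET.Element("address")
    for field in ("number", "street", "suite", "city", "statecode", "postalcode"):
        ET.SubElement(element, field).text = addr[field]
    return element

SERIALIZERS = {"v1.0": address_v1, "v2.0": address_v2}

def render_address(version: str, addr: dict) -> str:
    """Serialize the address with the serializer for the requested version."""
    return ET.tostring(SERIALIZERS[version](addr), encoding="unicode")
```

When v1.0 is finally retired, removing its entry from the table and deleting `address_v1` is the whole cleanup, which is the "branch that dies off" idea in practice.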

In future posts, we’ll further expand upon this example as we continue to explore best practices for practical API development.

Shane Holland is a Managing Partner at Above Labs, where he and his team help startups and enterprises develop great products and platforms.