July 17, 2017

REST: The Good Parts

What is REST? That is one of my favorite questions when conducting a technical interview. You can learn so much about one's practical experience with web services and applications by simply chatting casually about REST. I've heard so many definitions over the years. Even people who have built many web services do not always agree on what is and isn't REST.

Where do you draw the line when complying with REST? What best practices do we all follow when designing our APIs? What's practical and what's not worth the effort?

What is REST?

Is it a protocol?
Is it a standard?
Is it a technology?
Is it a pattern?
Is it a constraint?
Is it just an idea?

Representational state transfer

REST is an architectural pattern we follow to build RESTful web services and applications. It defines a set of constraints that keep us consistent when designing and building our backends. If "constraints" sounds too much like "limitations", think of REST instead as a set of design principles for making network communication more scalable and flexible.

We talk about REST mostly when we are building web APIs. But REST is bigger than that. When building a web application we still follow the same architectural styles that guide us when building an API. We follow REST to craft loosely coupled systems. Is your web service RESTful? According to Roy Fielding: No! Should you nonetheless put yourself in RESTful constraints? Yes, definitely!

RESTful constraints

I won't focus on every little detail here but instead will try to cover the most basic hard rules that I follow in all cases no matter what. I will focus on what is practical and makes sense. I will try to explain where I draw the line when complying with REST principles, which RESTful constraints I readily embrace and which cause me more pain than gain. I am intentionally going to skip Caching, Layered System and Code on Demand, otherwise this blog post could become even longer than Roy Fielding's original thesis. Let's focus on the first three: client-server, stateless and uniform interface.

Client-server

The most obvious constraint of them all. You have a client and you have a server. The client sends a request to the server. The server processes that request and responds. A standard request-response cycle. The goal is separation of concerns - clear boundaries between a client and a server. The client and the server can evolve independently of one another, be implemented using any technology and live on any platform. For web applications the browser is the client.

Stateless

RESTful APIs are stateless. The client and the server do not keep knowledge of each other in between requests. Once the server responds, it forgets everything about the client that made the request. The server usually has persistent storage, a database, where it saves data about the resources it manages. But each request is processed independently of the previous one. That means that each time the client has to send everything the server needs to process the request.

A request is self-descriptive and has enough context for the server to process that message. Statelessness may seem like overhead at first compared to a stateful design, but it is the most fundamental concept used in building high-availability systems where you need to distribute the load among many application servers. This RESTful constraint makes it possible to scale the server without breaking clients.

If you have, say, ten application servers behind a load balancer, the first request from a client could go to one server and the second request from the same client could go to a completely different server and still be processed correctly. That allows your application to easily scale out horizontally when facing heavy load.
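To make that concrete, here is a minimal Python sketch, not a real server. The names AppServer and SHARED_DB, the token value and the request shape are all made up for illustration. The point is only that the request carries its own credentials and target, so any server instance can answer it.

```python
# Hypothetical shared persistent storage (stands in for a database).
SHARED_DB = {
    "tokens": {"secret-token-1": 1},          # token -> user id
    "orders": {1: ["order-10", "order-11"]},  # user id -> orders
}


class AppServer:
    """One application server behind a load balancer. It keeps no per-client state."""

    def __init__(self, name):
        self.name = name

    def handle(self, request):
        # Everything needed comes with the request: credentials and target resource.
        token = request["headers"].get("Authorization")
        user_id = SHARED_DB["tokens"].get(token)
        if user_id is None:
            return {"status": 401, "body": None, "served_by": self.name}
        if request["method"] == "GET" and request["path"] == f"/users/{user_id}/orders":
            return {"status": 200, "body": SHARED_DB["orders"][user_id], "served_by": self.name}
        return {"status": 404, "body": None, "served_by": self.name}


request = {
    "method": "GET",
    "path": "/users/1/orders",
    "headers": {"Authorization": "secret-token-1"},
}

# The same request gets the same answer no matter which server receives it.
first = AppServer("app-1").handle(request)
second = AppServer("app-2").handle(request)
```

Neither server remembers the client between calls. The only shared piece is the storage behind them.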

Uniform interface

Everything in REST is represented by a resource. That is fundamental to REST. This constraint simplifies and decouples RESTful architectures the most. There are a lot of aspects to touch on here, but I will focus only on the key points.

Identification of resources

A resource should exist at a uniform consistent URL. For example, if you want to get all orders made by a user you hit an endpoint such as:

GET /users/1/orders

Nesting resources with an ID in between is not a RESTful constraint as defined by Roy Fielding, but it is a good practice that most people follow to achieve consistent and understandable designs for their web services and applications. Compare that with remote procedure calls (RPC), where URLs are modelled around methods and you end up with an endpoint like:

GET /get_orders_by_user_id

You don't identify a resource but an operation. You are coupling the resource with its operations, which may trick you into exposing endpoints that do too much. Not following established conventions makes your URIs ambiguous and inconsistent. Building your endpoints around operations instead of resources can tempt developers into inventing all sorts of unpredictable URLs by getting really "creative" with the naming. Is it "add-password", "create-password", or "set-password"?

Manipulation of resources through representations

Nouns and verbs describe resources and actions. Resources are your domain objects. Actions are the things you can do with them. Those are the basics of REST: we manage resources with HTTP verbs. For example, we use GET to fetch a representation of a resource, POST to create one, PUT to update (replace) it, PATCH to partially update it, and DELETE to destroy it.

We use the verbs that come from the HTTP protocol to manage our nouns. A simple rule of thumb is to avoid verbs in your endpoints. We aim for consistency across our applications and services. Endpoints should follow well-established patterns that everyone understands. Avoid RPC-like custom URIs such as /users/1/update_password or you'll end up with custom actions and complex controllers.
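Here is a rough Python sketch of that idea, assuming an in-memory dict instead of a real datastore. The handle function and its (status, body) return shape are my own illustrative choices. The path only names the resource, and the HTTP verb decides what happens to it.

```python
import itertools

# In-memory stand-in for a collection of user resources (illustrative only).
users = {}
_ids = itertools.count(1)


def handle(method, path, body=None):
    """Dispatch on the HTTP verb; the path only names the resource."""
    parts = path.strip("/").split("/")           # "/users/1" -> ["users", "1"]
    if parts[0] != "users" or len(parts) > 2:
        return 404, None
    if len(parts) == 1:
        if method == "POST":                      # create a new resource
            user_id = next(_ids)
            users[user_id] = dict(body or {})
            return 201, {"id": user_id, **users[user_id]}
        return 405, None
    if not parts[1].isdigit() or int(parts[1]) not in users:
        return 404, None
    user_id = int(parts[1])
    if method == "GET":                           # fetch a representation
        return 200, {"id": user_id, **users[user_id]}
    if method == "PUT":                           # replace the whole resource
        users[user_id] = dict(body or {})
        return 200, {"id": user_id, **users[user_id]}
    if method == "PATCH":                         # update only the given fields
        users[user_id].update(body or {})
        return 200, {"id": user_id, **users[user_id]}
    if method == "DELETE":                        # destroy the resource
        del users[user_id]
        return 204, None
    return 405, None


created = handle("POST", "/users", {"name": "Ann", "password": "old"})
# Changing a password is a partial update of the user, not a custom endpoint.
patched = handle("PATCH", "/users/1", {"password": "new"})
```

With this shape, something like /users/1/update_password never needs to exist: it is a PATCH on /users/1.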

Self-descriptive messages

Each message contains enough information to describe how to process the message. I won't go into much detail here. It is enough to mention that everything needed for the server to process a request should come with the request. The server should not need to ping back the client and ask for any further information in order to be able to complete the job and send a response.

Hypermedia

Sometimes written as HATEOAS (Hypermedia As The Engine Of Application State). The big idea is to provide additional URLs for clients to get more details about a resource. Clients shouldn't be building their URLs (or constructing identifiers of resources). The REST API should provide those for them. The clients should follow the links included in the resource representation.
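As a small illustration, a representation with embedded links could look like the Python sketch below. The _links key is borrowed from the HAL convention, the method field is my own addition, and the host name, the order fields and both function names are hypothetical.

```python
def order_representation(order, base_url="https://api.example.com"):
    """Build a representation that carries links to related resources."""
    order_url = f"{base_url}/orders/{order['id']}"
    links = {
        "self": {"href": order_url},
        "user": {"href": f"{base_url}/users/{order['user_id']}"},
    }
    # Only advertise the transitions that are valid in the current state.
    if order["status"] == "pending":
        links["cancel"] = {"href": order_url, "method": "DELETE"}
    return {**order, "_links": links}


def follow(representation, rel):
    """Client side: look up a link by its relation name instead of building a URL."""
    return representation["_links"][rel]["href"]


doc = order_representation({"id": 7, "user_id": 1, "status": "pending"})
user_url = follow(doc, "user")
```

The client only knows relation names such as "user" or "cancel". Where those links point is up to the server.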

The whole Hypermedia concept has caused more pain than gain for me. It is much more practical to give client developers API documentation they can understand and let them hardcode the URIs for the different actions inside the client, rather than have the client follow some form of service discovery and work out on the fly which operations are available for a given resource.

Ephemeral resources

"But we need a database model." – I hear that a lot. The fact that you do not have a resource directly mapped to a database model is no excuse to start inventing custom longhand endpoints. A resource is a domain object representation, not a database table representation.

Despite what many web frameworks try to enforce, there is no 1-to-1 correlation between database schema and resources. Controllers don't need to map to models. Your controller can map to a single column of your table or a plain old Java/Python/Ruby object that isn't persisted anywhere. You don't need to create a database record somewhere to use POST.

That realization was a light bulb moment for me. I primarily use Rails as a web framework, and internalizing that concept helped improve my Rails code a lot. It helped me design smaller domain objects and not expose the specifics of my database schema at the controller layer.

Let's look at a simple example – subscribing a user to a topic. The headfirst approach could be:

POST /subscribe
POST /unsubscribe

These endpoints are not RESTful as they designate operations rather than resources. They use verbs to specify actions rather than relying on the HTTP verbs. The REST way to achieve the same would be:

POST /topics/1/subscriptions
DELETE /subscriptions/1

The first call creates a subscription, i.e. subscribes the current user to a topic. The second deletes a subscription, i.e. unsubscribes the user from a topic. We do not really do any CRUD operations on a persisted database model here. Still that's no excuse for not being RESTful.
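A minimal Python sketch of those two calls, assuming the subscriptions are stored as a plain set of user ids per topic. The function names and the derived "topic-user" id format are made up for this example. There is no subscriptions table here, yet the endpoints still treat a subscription as a resource.

```python
# Storage is just a set of user ids per topic; there is no "subscriptions" table.
topic_subscribers = {1: set(), 2: set()}


def create_subscription(current_user_id, topic_id):
    """Sketch of POST /topics/<topic_id>/subscriptions."""
    if topic_id not in topic_subscribers:
        return 404, None
    topic_subscribers[topic_id].add(current_user_id)
    # The resource id is derived; it is not the primary key of any record.
    return 201, {"id": f"{topic_id}-{current_user_id}", "topic_id": topic_id}


def delete_subscription(current_user_id, subscription_id):
    """Sketch of DELETE /subscriptions/<subscription_id>."""
    topic_part, _, user_part = subscription_id.partition("-")
    if not (topic_part.isdigit() and user_part == str(current_user_id)):
        return 404, None
    subscribers = topic_subscribers.get(int(topic_part), set())
    if current_user_id not in subscribers:
        return 404, None
    subscribers.remove(current_user_id)
    return 204, None


status, subscription = create_subscription(current_user_id=42, topic_id=1)
```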

A resource may be ephemeral, like approving a form or authenticating to a server. Whether permanent or ephemeral, a resource is a noun, a high-level description of the thing you are trying to manage when you submit a request. We can still use POST when we are not creating a record in a table. We can still use DELETE when we are not deleting a record in a table. That's fine.

POST /login     <>   POST /sessions
POST /logout    <>   DELETE /sessions/1
POST /follow    <>   POST /followings
POST /unfollow  <>   DELETE /followings/1

Of course, if you need fancy URLs to show to your users, going with login and logout is perfectly fine as long as they are simply aliases to RESTful routes underneath.
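One way to picture such aliases is a tiny rewrite step in front of the router, sketched here in Python. The resolve function and the way the current session id is passed in are assumptions for illustration, not how any particular framework does it.

```python
def resolve(method, path, current_session_id=None):
    """Translate a friendly alias into the RESTful route it stands for."""
    if (method, path) == ("POST", "/login"):
        return "POST", "/sessions"
    if (method, path) == ("POST", "/logout") and current_session_id is not None:
        return "DELETE", f"/sessions/{current_session_id}"
    # Anything else is already a regular route and passes through unchanged.
    return method, path
```

The controllers underneath only ever see the sessions resource. The friendly URLs are purely cosmetic.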

Why REST?

If you work on a single small app (or an API) with a few clients under your control, then it doesn't matter what principles you follow. One way or another, you'll produce something that does the job at the end of the day. You don't need to waste your time on academic bikeshedding. What matters is consistency and convention.

REST imposes a particular style, order, and logic on your services. I don't follow REST because Roy Fielding says that I should. I follow REST because it makes my code better and my job easier. My code is easier to read, maintain and extend. My services are easier to scale once the load grows or a small application becomes a giant distributed system.

Consistent, discoverable, easy-to-understand code

When discussing why we should follow an architectural style, my position is always the same - if you work alone on a small application you may get away with any approach, but when you have teams working together at large scale, then unless you all put yourselves in the same set of constraints, chaos will reign and your landscape will be a mess.

Imagine you have 4-5 teams and everyone does things differently. How would you even agree on the API contracts among yourselves, let alone with the clients? I really enjoy the predictability of API endpoints when only the seven standard Rails actions (index, show, new, create, edit, update, destroy) are used. REST brings in convenience and best practices out of the box. It makes a developer's life easier.

RESTful API can easily grow

Since everything is stateless, if you need to scale out horizontally you may add new application servers behind your load balancer and they will immediately start processing new client requests. The clients remain completely ignorant of that process. You may scale out when facing heavy load and then scale down to reduce infrastructure costs when you have less load.

Simple continuous deployments

Stateless services greatly simplify the deployment of highly available systems. You may tear down any application server behind your load balancer and the others will continue to process client requests. Contrast that with stateful services, where the server and the client share a persistent connection: if the server needs to go down, it breaks the connection with the client and the state is lost.

Simple functional workflows

The client-server model together with stateless communication allows for very simple functional workflows. The client sends a request, the server processes it and responds, and we are done. No state is being managed on either side. If you are familiar with the concepts around functional programming, a pure function is very similar to a RESTful interaction between the client and the server. You receive some input, you do some work and return some output. No state preserved, no side effects.

Alternatives to REST

REST was the status quo for a long time and at the time of writing this blog post, it still is one of the most popular choices to connect client and server applications. However, in the constantly evolving world of modern web development, new challenges arise every day that are better solved by new tools. Backends nowadays respond to many clients with specific needs.

In more complex systems, it is easy to slide into creating a specific REST API for each client's needs in order to have more control over what data is sent. A RESTful architecture defines the data available for each resource, while a mobile application may need only a fraction of it. This problem is called overfetching.

Another use case where REST falls short is when you have to build a complex dashboard presenting numerous resources at once. In that case, a client application has to read multiple resources through multiple network calls to fetch all the information that is needed. If you are in the shoes of Facebook or Netflix, and REST poses more pains than gains for your complex system, then you should look for alternatives to REST.

GraphQL

GraphQL was introduced by Facebook. It is an entirely different approach. Instead of having a separate endpoint for each resource, it exposes a whole graph of resources behind a single endpoint and you can query any part of that graph. But unless you are under the same constraints as Facebook, you may not want to go down that road, as it will bring tons of complexity to your system and it may not be worth paying that price in the end.
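To give a feel for the "query any part of the graph" idea, here is a toy Python sketch. It is not GraphQL and does not use GraphQL syntax: the query_user function, the nested-dict selection format and the sample data are all invented for illustration. The client states which fields it wants and gets exactly those back in one call.

```python
# A tiny in-memory "graph" of resources (made-up data).
USERS = {1: {"id": 1, "name": "Ann", "email": "ann@example.com", "order_ids": [10, 11]}}
ORDERS = {
    10: {"id": 10, "total": 25, "status": "paid"},
    11: {"id": 11, "total": 40, "status": "pending"},
}


def query_user(user_id, selection):
    """Return only the requested fields; a nested dict selects fields of related orders."""
    user = USERS[user_id]
    result = {}
    for field, sub_selection in selection.items():
        if field == "orders":
            result["orders"] = [
                {name: ORDERS[order_id][name] for name in sub_selection}
                for order_id in user["order_ids"]
            ]
        else:
            result[field] = user[field]
    return result


# One call, and the client gets exactly the fields it asked for.
mobile_view = query_user(1, {"name": True, "orders": {"total": True}})
```

Compare that with REST, where the same screen would need GET /users/1 plus GET /users/1/orders, each returning every field of the resource.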

I find GraphQL suitable when exposing an API to mobile applications, as you can fetch all you need with one call. Still, if you have a relational database to store your data, it may be a challenge to answer all these fancy GraphQL queries. I've provided a well-rounded pros and cons list for GraphQL in my blog post GraphQL: Pros and Cons. You may check that out if you are trying to choose whether REST or GraphQL is the better choice for your specific project.

Falcor

Falcor was introduced by Netflix and, in general, it solves the same problem that Facebook is solving with GraphQL. Like Facebook, Netflix operates under very specific constraints: serving huge volumes of static content that almost never changes. But if you are building a business application, let's say in the financial sector, you are under a completely different set of constraints. So before jumping on any new hype, you may want to sit down and think about what problem you are solving and what the simplest possible way to solve it is.

Om Next

Om Next is a uniform, extensible approach to building networked interactive applications. The bigger picture here is pretty much the same as with the two above: REST makes sense for certain things, but we are going to try something else for our use cases.

How do we choose?

Should we follow the modern trend and go all-in with GraphQL? Or should we be more conservative and stick to the good old REST? The answer is always: "It depends". Who are the clients of your API? Are you serving data to a mobile application that needs everything delivered with one call or are you building a payment provider where clients mostly create transactions? Is your use case solved better (or simpler) with many fine-grained endpoints or one coarse-grained endpoint?

Having many small endpoints, each with a single responsibility, obeys the interface segregation principle and makes the API easier to understand and maintain, but it could present a challenge to a mobile application which has to perform multiple calls to gather what it needs. That mobile app would be better served by one God endpoint that delivers all data in a single call.

Database schema

Having a GraphQL endpoint that is backed by a relational database poses some challenges as well. You should consider those before dumping REST for GraphQL, as you may have to pay a significant price for developing and maintaining a GraphQL API. Mapping a graph-based schema to a normalised relational database is not always a straightforward job. If you do not have a good abstraction between the queries and the underlying data models, the two become tightly coupled, which makes changing the database schema a real pain. N+1 problems also become tougher to detect and avoid.
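For readers who haven't run into the N+1 problem, here is a small self-contained Python sketch using the standard sqlite3 module. The schema, the data and the function names are made up. It resolves "users with their orders" first with one query per user, then with a single batched query, and counts the queries issued in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25), (11, 1, 40), (12, 2, 15);
""")

queries = []  # every SQL statement issued, so the two strategies can be compared


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def users_with_orders_naive():
    """Resolve each user's orders separately: 1 query for users + N for orders."""
    users = run("SELECT id, name FROM users ORDER BY id")
    return {
        name: [total for (total,) in run(
            "SELECT total FROM orders WHERE user_id = ? ORDER BY id", (user_id,))]
        for user_id, name in users
    }


def users_with_orders_batched():
    """Load the orders of all selected users with one extra query."""
    users = run("SELECT id, name FROM users ORDER BY id")
    ids = [user_id for user_id, _ in users]
    placeholders = ",".join("?" for _ in ids)
    rows = run(
        f"SELECT user_id, total FROM orders WHERE user_id IN ({placeholders}) ORDER BY id",
        ids,
    )
    totals = {user_id: [] for user_id in ids}
    for user_id, total in rows:
        totals[user_id].append(total)
    return {name: totals[user_id] for user_id, name in users}


naive = users_with_orders_naive()
naive_queries = len(queries)      # 1 + one per user
queries.clear()
batched = users_with_orders_batched()
batched_queries = len(queries)    # always 2
```

Both functions return the same data, but the naive one issues a query per user. With field-by-field resolvers it is easy to end up with the naive shape without noticing.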

Caching

Another aspect to consider is caching. By having one God endpoint you lose caching. It is difficult to leverage caching when you have one endpoint and a crazy number of arbitrary requests. So you have to choose between the performance gains from making one single request and those from caching.
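The effect can be sketched with a toy cache in Python. Everything here is illustrative: CachingProxy is a made-up class, the query strings are an invented notation, and the request mix is deliberately chosen to show the contrast rather than measured from a real system.

```python
class CachingProxy:
    """Caches responses by key and counts how often the backend is actually hit."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
        self.backend_calls = 0

    def fetch(self, key):
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]


def backend(key):
    return f"response for {key}"


# REST: clients keep asking for the same few URLs, so the URL is a good cache key.
rest = CachingProxy(backend)
for url in ["/users/1", "/users/1/orders", "/users/1", "/users/1/orders", "/users/1"]:
    rest.fetch(url)

# Single endpoint: the key has to be the query itself, and ad hoc queries rarely repeat.
single = CachingProxy(backend)
for query in [
    "user(1){name}",
    "user(1){name,email}",
    "user(1){name,orders{total}}",
    "user(1){orders{total,status}}",
    "user(1){email}",
]:
    single.fetch(query)
```

In this contrived run the REST proxy reaches the backend twice for five requests, while the single-endpoint proxy reaches it for every one of them.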

Context is king

Always remember that REST and GraphQL are just tools. Which tool is better always depends on the context. Never blindly follow the hype but always carefully consider your context - the problem you are solving, your tooling, your budget, your consumers, your team experience. GraphQL brings a completely different developer experience. GraphQL poses new challenges to everyone who would consume your API.

Resources

Architectural Styles and the Design of Network-based Software Architectures by Roy Fielding
What RESTful actually means by Lauren Long
Richardson Maturity Model by Martin Fowler
GraphQL Deep Dive: The Cost of Flexibility by Samer Buna