December 21, 2021

GraphQL: Pros and Cons

Following the "great success" of my blog post REST: The Good Parts – it had over four hits last month – I decided to sit down and write an article about its most popular alternative – GraphQL. I ain't going for yet another tutorial on GraphQL. I am going to make you think before you jump. I want you to take a walk with me and see through my joys and miseries leveraging that new technology to create client and server applications. Why should we care about GraphQL? What problem does it solve? When to use it? When not to?

I won't bias you toward or against GraphQL. My goal is to provide a well-rounded pros and cons list, stay objective, and focus on things that I consider important, things that I love, things that help me build better software, things that have caused me pain. We'll go through vivid examples and see use cases where GraphQL is a perfect fit and use cases where REST is the pragmatic choice. I want to help you make an informed decision about whether you need GraphQL or whether REST will do just fine.

What is GraphQL?

GraphQL is an open-source query language created by Facebook. It specifies how you can exchange information between a client and a server. A GraphQL client can request specific data – only what it needs – thus the client is in control. A client can execute a query (read), a mutation (write), or a subscription (continuous read).

A single graph represents all available application data (or resources). A query that hits the server is interpreted against the entire GraphQL schema. The client receives only the pieces of the graph defined by the query. The network layer is usually HTTP and the payload format is typically JSON. But GraphQL is not opinionated about these transport-level details, your application stack, or your system architecture. It is simply a query language.
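To make the "usually HTTP, typically JSON" part concrete, here is a minimal Ruby sketch of what a GraphQL request tends to look like on the wire: a POST whose JSON body carries the query text and its variables. The endpoint URL and the posts field are my own assumptions for illustration, and the request is only built, never sent.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) the HTTP request that carries a GraphQL query.
def build_graphql_request(endpoint, query, variables = {})
  request = Net::HTTP::Post.new(endpoint)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ query: query, variables: variables })
  request
end

# Hypothetical endpoint and schema, used only for illustration.
endpoint = URI("https://api.example.com/graphql")
query = <<~GRAPHQL
  query ($first: Int) {
    posts(first: $first) {
      title
    }
  }
GRAPHQL

request = build_graphql_request(endpoint, query, { first: 3 })
```

Whatever the query asks for, the request goes to the same single endpoint. Only the body changes.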

GraphQL Advantages

If the only tool you have is a hammer, you tend to see every problem as a nail – Abraham Maslow.

For the past decade REST was the most popular tool for building APIs. REST was our hammer. Every client/server communication looked like a nail. But developers have struggled to create sensible backends that fit the motley nature of modern mobile and web frontends. The big tech companies felt the pain first and looked for new tools to help them cope with the new challenges.

Thus, GraphQL was born. It was showcased by Facebook as part of the JavaScript ecosystem interacting with React frontends. But GraphQL is a programming language-agnostic interface between any two parties. Other companies, facing similar issues with their mobile clients, quickly recognized the new technology as the right tool for their problems. Today GraphQL is used in production by GitHub, Shopify, Twitter, WordPress, and many others.

Declarative Fetching

GraphQL follows the declarative programming paradigm. The client selects objects with their fields from the whole graph in a single query. A thick boundary is drawn between the client and the server to keep those two separated – the client knows what it needs, the server knows how to get it. They share only knowledge of the structure of the data – the GraphQL schema. A clear separation of concerns.

Take a Facebook user's feed as an example. The feed contains posts, comments, and reactions. A Facebook client creates one query that retrieves all the data – posts with their comments and reactions – needed to render the current view. You may think of the process as UI-driven data fetching. The client is in control. It declares what it needs and the server delivers.

Let's draw an analogy to RESTful architectures. In REST your API is organized around resources. You have a separate endpoint for each resource. The client is forced to follow an imperative approach to fetch all the necessary pieces and bundle them together. It must make a separate network request to each URI and then aggregate the collected information itself. The server is in control.

No Overfetching

A mobile client consuming a REST API has to make multiple network requests to fetch all the data from separate endpoints modeled around resources – posts, comments, and reactions. The client receives the whole resource entity even if only a fraction of it is enough. The problem is called overfetching.

Apart from being inefficient, the waterfall network requests bring unnecessary complexity to the client-side. A mobile application has to wait until all the requests have finished, then it has to massage the data to obtain the structure that fits the current page.

Overfetching poses problems not only on the client side but also on the server side. Imagine millions of users tapping on their phones, and each tap, instead of creating just one network request, makes three or four or even more related requests. I've had my fair share of troubleshooting nasty concurrency issues at the backend. Analyzing performance bottlenecks is non-trivial as well.

A GraphQL client makes a single network request to fetch multiple resources at once. It defines a particular set of nested objects with their specific fields – only the author's first name, the title for posts, and the date for the comments.
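Here is a rough Ruby sketch of that idea, with no GraphQL library involved. The post data, the selection format (a nested hash), and the select_fields helper are all hypothetical names of mine. The point is only to show that the response mirrors the shape of the request and carries nothing the client did not ask for.

```ruby
# A selection maps each requested field to `true` (a leaf field)
# or to a nested selection (an object or a list of objects).
def select_fields(data, selection)
  return data.map { |item| select_fields(item, selection) } if data.is_a?(Array)

  selection.each_with_object({}) do |(field, sub), result|
    value = data.fetch(field)
    result[field] = sub == true ? value : select_fields(value, sub)
  end
end

# In-memory stand-in for everything the server knows about one post.
post = {
  id: 1,
  title: "GraphQL: Pros and Cons",
  body: "A long article body...",
  author: { first_name: "Ada", last_name: "Lovelace", email: "ada@example.com" },
  comments: [
    { date: "2021-12-21", text: "Nice read", ip: "10.0.0.1" },
    { date: "2021-12-22", text: "Thanks", ip: "10.0.0.2" }
  ]
}

# The client asks only for the title, the author's first name,
# and the date of each comment.
selection = {
  title: true,
  author: { first_name: true },
  comments: { date: true }
}

response = select_fields(post, selection)
```

The body, the author's email, and the commenters' IP addresses never leave the server, because nobody asked for them.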

I can hear REST advocates shout out claiming that you can achieve the same UX with a RESTful endpoint. A client-specific coarse-grained API endpoint – or an aggregate if you are into DDD – can return multiple partial resources in one call. This approach, however, could bring snowballing complexity as the variety of your clients grows. It'll slow down new feature delivery as even for the smallest change in the API the frontend team has to wait for the backend team to amend the payload and release the new API version.

API Documentation

API documentation is a pain. It's a pain to create. It's a pain to maintain. It's a pain to share. I cannot remember the last time I worked with well-written, up-to-date API documentation. Docs either don't exist – especially for internal services – or are outdated or don't explain any semantics around the request/response payloads.

I've documented APIs with RAML, Blueprint, and Swagger. These are all beautiful tools that get the job done. The main issue I've witnessed over the years is that the person who changes the code forgets to update the docs. It sounds so simple – you change the code and then change the docs. If it is that easy, why do we deal with tons of outdated API docs every day?

GraphQL reduces the time you spend on documenting your API. Even if you write zero documentation, with GraphQL you still get plenty of it out of the box – endpoints, request/response payloads, type information, null values, etc. You can add text descriptions directly into the code, where they are easier to maintain and much less likely to be forgotten.

Many handy tools take advantage of the GraphQL schema and make development a breeze. You can use GraphiQL, an interactive playground, as the primary form of documentation. With GraphiQL you can explore the schema, compose queries, and execute them against the GraphQL server.

GraphQL Introspection makes it possible to retrieve the GraphQL schema from a GraphQL server. It is perfect for autogenerating API documentation, mocking the GraphQL schema client-side, testing, and retrieving schemas from multiple microservices during schema stitching.

API Versioning

I enjoy building RESTful backends. I have done so for more than a decade and continue to create new ones and support old ones. My affliction has always been – how to version my API. What approach to versioning allows me to evolve my API without breaking the whole world depending on it? How to avoid the suffering of maintaining multiple versions, while still doing no harm to all my clients?

There is no right answer to this question. Whatever approach you choose, it’s always a pain to move your consumers from one API version to the next. Nowadays, when people ask me “How do you version your API?”, my answer is simple – “I don’t”. It’s not worth it. API versioning brings unnecessary complexity. Whether you version your API or not, you still have to support multiple versions of your code.

GraphQL folks feel more or less the same way. In GraphQL there are no API versions. You can deprecate your API on a field level with the @deprecated directive. The deprecation becomes part of the schema, so schema-aware tools such as GraphiQL and editor plugins can warn your clients when they go for a deprecated field. It’s their job to migrate before the deprecated field is dropped. That way your API evolves over time without the need for API versioning.

Let’s go through a simple example to see how all that actually works. Imagine you want to replace the field “company_name” – of type string holding the name of the company – with a “company” object which will hold a bit more information about a company.

You change your database schema, creating the new companies table, and so on. The API endpoint continues to work the old way – returning the company name, only now the code gets it from the newly created table. The next step is to have the endpoint return both old and new data, “company_name” alongside the “company” entity. When adding the “company” to the API payload, you mark the “company_name” as deprecated but continue to return it. New clients ask for “company” and get it. Old clients ask for “company_name” and get only that piece, while their tooling flags the field as deprecated. Older clients are completely unaware of the new “company” entity. With GraphQL you only get what you’ve asked for.

Now you can drop the old “company_name” field from the payload. But how do you know that no one uses it? It’s simple when you have the GraphQL schema in your hands. You can use Apollo tools – or even quickly sketch your own script – to track requests for deprecated fields from your server logs.
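A quickly sketched script of that kind could look like the Ruby below. Everything in it is an assumption of mine for illustration: the user_fields hash stands in for deprecation metadata you would read from your schema, and logged_selections stands in for the field names you would extract from your server logs.

```ruby
# Hypothetical schema metadata: field name => deprecation reason (nil if current).
user_fields = {
  "name" => nil,
  "company_name" => "Use `company { name }` instead.",
  "company" => nil
}

# Hypothetical request log: the fields each logged query selected on a user.
logged_selections = [
  %w[name company_name],
  %w[name company],
  %w[company_name]
]

# Counts how often each deprecated field was still requested.
def deprecated_usage(fields, selections)
  deprecated = fields.select { |_name, reason| reason }.keys
  selections.flatten.tally.slice(*deprecated)
end

usage = deprecated_usage(user_fields, logged_selections)
```

Once the count for “company_name” stays at zero for long enough, dropping the field is a much less scary decision.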

Yet again I can hear REST advocates saying – “Well, you can do the same with REST. Just add the new stuff to your payload and after a while drop the old stuff”. I agree that changing a field in REST follows a similar procedure to the one in GraphQL. But it is not as smooth and effortless. You can add a new field to your JSON payload and hope for the best, but there are clients (especially the mobile ones) that will break when the JSON schema changes even though they don’t actually use that new field. With REST you always return the entire resource entity. It makes no difference whether or not your clients need the new stuff, they will get it and will have to either ignore it or adapt to it.

Another problem is communicating that change to your users. In RESTful architectures, there are no practical methods to communicate API changes effectively. You write the changes to the documentation and put it in a static location, where your API docs live, expecting developers who care to just stumble upon it. Or you can write changelogs, use social media, send automated emails – all of which have proven ineffective. I am emphasizing "practical methods" as there is a way following REST principles to communicate API changes. It's called Hypermedia. But let's be honest – when was the last time you worked with a RESTful API having a proper Hypermedia setup? Never is correct.

Strongly-typed Single Source of Truth

GraphQL is a strongly typed query language. The type system helps the tooling: editors, code and docs generators, stand-alone applications, plugins, even small scripts that you can write on your own. The GraphQL schema defines types that describe your application data. The GraphQL tooling can validate a query against the schema and make sure it is syntactically correct and asks only for fields that actually exist.

The type information supports IDE/editor integrations. Developers enjoy smooth auto-completion and query validation at compile time. Consuming GraphQL from a statically typed language, like TypeScript, brings even more benefits. Apollo Codegen can generate TypeScript types for your queries. Maven plugins can generate Java classes from your .graphqls files.

The GraphQL schema describes all the available application data in one place. The schema is defined on the server. You may have multiple clients (web, mobile, smartwatch) but only one backend to serve them all. And that backend has no favorites – each client sees the exposed API as a perfect fit for its use cases. All the data is out there ready to be fetched and a GraphQL client can fetch only what's needed – nothing more, nothing less.

Having a single schema doesn't mean you are forced to design your system as a monolith. You can have one GraphQL gateway schema fueled by multiple services using schema stitching. GraphQL schema stitching allows you to stitch multiple schemas and create a global schema. GraphQL is not coupled to a data source. You can have ten databases, external APIs, anything – literally you can pull your data from anywhere – but still expose everything behind a single endpoint having a single schema.

Less Bikeshedding

A common situation today is for product companies to have a mobile application served by an internal API. Normally your organizational structure follows your system architecture – you have a team of mobile developers building the frontend and a team of backend developers building the API. Since your API is not end-user-facing, your teams are easygoing when it comes to following conventions and best practices – you get away with anything as long as working features are released quickly.

Now you have two opposing sides, each defending its ground relentlessly. The API consumers want the API to be exposed in the way best suited to the client-side use cases. The API providers want the API to be exposed in the way best suited to the backend architecture and database schema. Defending a position in that battleground involves a good deal of bikeshedding. Let's follow REST and invent ephemeral resources. No, no, we perform operations, let's go for RPC-styled endpoints. Agreeing on a small change can waste many hours.

In GraphQL mobile devs have the whole graph in their hands, all the information about the application is available. They can make any query to fetch exactly what they need. If they want to display a new field, it's there, ready, waiting to be retrieved. No need to wait for the backend team to add it to the payload. No need for cross-team collaboration to expose and consume that new field. No need to reach an agreement on the API contract. GraphQL is not a set of nice-to-follow (and sometimes a bit vague) principles like REST. It's a strict protocol. You abide by it or your system stops working.

React and GraphQL

If you happen to use mostly React at the client side, you'll find that React and GraphQL form a beautiful friendship – which came about naturally as they grew up together at Facebook. They were born to bring simplicity and correctness to the chaos in front-end development and build a brand new future.

Our React components need data. Props (and PropTypes) express what data I need as a component, but don't say anything about how to get that data. GraphQL comes to fill that missing piece and complete the picture. When building a page, you need to fetch and display different resources. GraphQL fetches all of the resources at once providing React with the data to power the page components.

The GraphQL query structure aligns perfectly with your React component tree. They form a natural fit. The React component tree extends along the nested structure of the GraphQL query. And even if you don't leverage the power of a library like Apollo Client or Relay, you can still build a simplified client using React and consuming a GraphQL API by making plain HTTP calls.

GraphQL Disadvantages

Let's reframe the title – when not to use GraphQL and what pitfalls it brings to the table. Things that may slow down product delivery. Things that could turn good ideas into a fiasco. Things that could cause affliction after the go-live once you have amassed a critical user base.

Complexity

GraphQL is complex. Underestimate its complexity and you'll have many sleepless nights responding to incidents that have brought your production system down. GraphQL poses a serious challenge on the client side, on the server side, and to the engineering team. But if you are curious, if you love learning about new technologies, if you find joy and fulfillment when your mind wanders solving difficult problems – take the leap of faith, GraphQL is your thing.

Working with raw GraphQL using only an HTTP client is hard. That's why good people have created good abstractions on top of it. Abstractions like Apollo and Relay make using GraphQL much easier. But those tools come with a steep learning curve. You need to know how they work underneath to use them effectively. If you are not aware of the potential pitfalls, if concepts like maximum query depth, query complexity weighting, avoiding recursion, and persisted queries don't ring any bells with you, then you should be prepared to face performance problems that will keep you awake at night.

Performance

When a complex query – asking for authors, posts, comments, reactions – reaches your server, it needs to be resolved against your data source. In most cases that would be a relational database – that seems to be the default choice today. How do you answer a graph query using a relational database? No matter what approach you choose that's not a trivial task to accomplish. You either hit the database multiple times or end up doing multiple complex joins on related tables. Your database is now your performance bottleneck. You cannot autoscale your database as easily as you can spawn a bunch of web workers to handle the peak loads. Serious performance issues could arise when many clients request many nested fields at once.

Translating GraphQL queries to SQL queries efficiently is no easy task, especially for less experienced developers. Developers who rely primarily on an ORM framework and don't fully understand how the database works are most prone to creating performance issues. But I guess that's true no matter what architectural pattern you choose – REST or GraphQL. The thing is that with GraphQL it's much easier to create an N+1 problem as it happens so naturally – you won't even see it coming.

The naive approach to resolving GraphQL queries is to write per-field resolvers (in Apollo Server, for example) that fetch objects independently and let the GraphQL runtime join everything in memory. This approach is the most apparent but has two serious downsides: (1) it leads to N+1 problems; (2) joins are performed in the application runtime instead of in the database. If you take a look at the SQL queries generated by those resolvers, you'll see 1 query for the posts and N more queries, one for each post's comments.

Rate Limiting

Rate limiting is yet another complexity you'll have to deal with. It is simple to rate-limit requests in REST. You have a resource, you allow only so many requests to that resource – done. It is difficult to accomplish the same in GraphQL. Every query can be anything between cheap and expensive. The rate-limiting calculations can easily become more complex than resolving the queries themselves. You have to take into account things like maximum query depth and query complexity weighting.

Rate limiting ensures your API is robust. It's a protection mechanism that keeps your servers alive. Even if you are building an internal API, you still need that safety net. A bug in your frontend could result in unintentionally sending too many requests in bursts or in an infinite loop. You have to design a rate-limiting model that best reflects the load each request causes to your servers. Your server has to calculate the cost of each GraphQL query before executing it.
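To give a feel for what such a cost calculation involves, here is a simplified Ruby sketch. It models a query as nested hashes instead of parsing real GraphQL, and the field weights and limits are numbers I made up. Real implementations usually also account for list sizes and pagination arguments, which this sketch ignores.

```ruby
# A query is modeled as nested hashes: field name => sub-selection
# ({} marks a leaf field). This mirrors the shape of a GraphQL selection set.
query = {
  "posts" => {
    "title" => {},
    "comments" => {
      "date" => {},
      "author" => { "first_name" => {} }
    }
  }
}

# Number of levels down to the deepest field in the selection.
def query_depth(selection)
  return 0 if selection.empty?

  1 + selection.values.map { |sub| query_depth(sub) }.max
end

# Total cost: each field costs its weight (default 1) plus its children.
def query_cost(selection, weights)
  selection.sum do |field, sub|
    weights.fetch(field, 1) + query_cost(sub, weights)
  end
end

# Hypothetical weights and limits: fields that hit the database cost more.
weights = { "posts" => 10, "comments" => 5 }
max_depth = 5
max_cost = 100

depth = query_depth(query)
cost = query_cost(query, weights)
allowed = depth <= max_depth && cost <= max_cost
```

For this query the depth is 4 (posts, comments, author, first_name) and the cost is 19, so it passes both limits. A rate limiter could then subtract the cost from the client's budget instead of counting requests.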

Caching

In REST you have the URI to a resource and it's simple to implement a cache. It is complex in GraphQL as each query is very different from the next. You can ask for the same entity but each time request different fields. Even the most fine-grained cache on a field level could fail to be effective and take a huge effort to be implemented. GraphQL doesn’t rely on the HTTP caching methods. The caching problem could be partially solved by using persisted GraphQL queries, but usually those bring more pains than gains.
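The idea behind persisted queries can be sketched in a few lines of Ruby: derive a stable identifier from the query text, register the query under it, and let clients send the short identifier, which can then double as a cache key. The whitespace normalization below is my own simplification (it would mangle string literals that contain runs of whitespace), and the registry is just an in-memory hash.

```ruby
require "digest"

# Collapses whitespace so formatting differences do not change the ID.
def query_id(query_text)
  normalized = query_text.split.join(" ")
  Digest::SHA256.hexdigest(normalized)
end

# In-memory stand-in for a persisted query store on the server.
registry = {}
query = "query { users { name address { city } } }"
registry[query_id(query)] = query

# The same query, formatted differently, maps to the same entry.
reformatted = "query {\n  users {\n    name\n    address { city }\n  }\n}"
same_entry = registry.key?(query_id(reformatted))
```

Note what this does not solve: two queries that ask for the same entity with different fields still get two different identifiers, which is exactly why caching in GraphQL stays hard.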

Apollo libraries offer caching out of the box. But, as with everything else labelled "bleeding edge" or "under construction", you may face certain challenges having those as dependencies. When you hit a problem, you search for a solution, but the information is either already outdated or insufficient. Newer versions are quite different from the previous, making upgrades quite painful. GitHub remains your best friend in all this newness. But there are plenty of unresolved and unanswered GitHub issues.

The N+1 problem

Falling into the N+1 problem while building a GraphQL API is easy. A problem with a fancy name that drives in both lanes – complexity and performance. Ask your dev team to raise their hands – how many know what the N+1 problem is? Few hands would mean GraphQL will be an even bigger pain to adopt than you initially estimated.

In REST, the situation is more manageable. You usually have one resource per endpoint. Even with nested resources, the request hits one controller method that has all the context about the query it needs to perform. Well, that is not always the case: in large monoliths, where you have a hierarchy of high-level abstractions calling low-level abstractions, N+1 problems are hard to track down through all the layers.

While you are not immune to N+1 in REST, in GraphQL, the N+1 problem comes naturally. It joins the party uninvited and unnoticed. You blame the GraphQL architecture for the performance problems in production. Still, the actual cause is – developers relying too much on ORM or domain abstractions, having no idea what's going on underneath the surface.

Let's use an elementary example to illustrate how easy it is to have an N+1 problem, even in the simplest of use cases.

query {
  users {
    name
    address {
      city
    }
  }
}

Let's also assume the following naive implementation in Ruby (all irrelevant details are stripped). I'll use Ruby due to the brevity of its expressive DSL, as the problem is programming language agnostic and will pay you a visit, completely ignoring your fancy tech stack.

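class UserType
  field :name, String
  field :address, AddressType

  # Runs once for every user in the result, so each call
  # issues its own SELECT against the addresses table.
  def address
    Address.find(object.address_id)
  end
end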

The resolvers run like so:

QueryType.users -> SELECT * FROM users
  User 1
    UserType.name
    UserType.address -> SELECT * FROM addresses WHERE id = user1.address_id
  User 2
    UserType.name
    UserType.address -> SELECT * FROM addresses WHERE id = user2.address_id
  User 3
    UserType.name
    UserType.address -> SELECT * FROM addresses WHERE id = user3.address_id

You have 1 query to fetch all the users and N queries to get the address for each user. The problem happens easily because we have one resolver function per field, making a roundtrip to the database to fetch the nested data. But don't get intimidated by a problem with a fancy name. A careful team of seasoned developers aware of the N+1 problem could guard the code and resolve queries so that this problem would never occur.

You need to perform a single query to fetch all the addresses at once:

SELECT * FROM addresses WHERE id IN (...)

That would account for 2 queries – 1 for the users and 1 for the addresses. Compare that to the N+1 queries executed in our naive implementation. To solve this problem, you can use batch-loading, lazy evaluation, or preloading. Everything you and your team need to tame the GraphQL complexity is knowledge and experience.
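Here is a small, self-contained Ruby sketch of the batching idea, using in-memory hashes instead of a real database or a real batch-loading library. The sample data and the find_addresses lambda (which stands in for the SELECT ... WHERE id IN (...) query and counts round trips) are assumptions of mine for illustration.

```ruby
# In-memory stand-ins for the two tables (hypothetical data).
users = [
  { id: 1, name: "Ann", address_id: 10 },
  { id: 2, name: "Bob", address_id: 20 },
  { id: 3, name: "Cy", address_id: 10 }
]
addresses = {
  10 => { id: 10, city: "Sofia" },
  20 => { id: 20, city: "Berlin" }
}

# Counts simulated round trips to the addresses table.
query_count = 0

# Stand-in for: SELECT * FROM addresses WHERE id IN (...)
find_addresses = lambda do |ids|
  query_count += 1
  addresses.slice(*ids)
end

# Naive: one lookup per user (the N in N+1).
naive = users.map do |user|
  address = find_addresses.call([user[:address_id]]).values.first
  { name: user[:name], city: address[:city] }
end
naive_queries = query_count

# Batched: collect the ids first, then load all addresses at once.
query_count = 0
ids = users.map { |user| user[:address_id] }.uniq
loaded = find_addresses.call(ids)
batched = users.map do |user|
  { name: user[:name], city: loaded.fetch(user[:address_id])[:city] }
end
batched_queries = query_count
```

Both versions produce the same result, but the naive one makes three address lookups for three users while the batched one makes a single lookup. Batch-loading libraries automate this "collect first, load once" step so that your per-field resolvers can stay simple.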

A Steep Learning Curve

I've put the steep learning curve last despite it being the most significant point. I've put it outside the cons list because if you have a passion for learning, a steep learning curve is a rousing challenge – something that excites and motivates you. I've put it outside the pros list because a steep learning curve is something you have to face as a team – yet another peak to conquer. Learning new technology will cause certain pain, will make you flip tables, and keep you awake at night.

Every disadvantage on the GraphQL cons list is just a complexity you can tackle once you have the knowledge and the skills. Every downside could be resolved by using the right tool. But tools that solve complex problems are not always easy to use. First you need to understand how things work underneath, without an Apollo client for example, fall into pits, get to the bottom of them, and only then will you become a master of your craft – you'll have the right tools and know how to use them effectively.

Learning everything you need to know about GraphQL takes time. And I'm not talking about the query language per se. I am talking about being comfortable using the Apollo platform. Understanding the problem that each abstraction is created to solve. Being aware of all the potential performance issues you are about to face in production if you are sloppy. Protecting your servers. Ensuring your API is solid.

Accept the fact that learning is a lifelong process, develop a passion for it, constantly go out of your comfort zone, accept new challenges, and you will never cease to grow. Once you've climbed over that mountain, you will see GraphQL as a very exciting new technology that is here to help you with all the problems you've faced in the past while building APIs.

What should I do?

It depends. It depends on your existing tech stack, your team experience, your service architecture (monolithic or microservices), the problem you are trying to solve, the variety of clients you need to support and their specific requirements, etc… the list goes on and on.

There are use cases where REST is the better approach, especially for simple resource-driven applications, when you don’t need all the flexibility of GraphQL queries. I spent the past couple of years building a data extraction API, where we mostly uploaded files. GraphQL is no good when it comes to working with files. We didn’t even consider adopting GraphQL for that project.

But if you are in the shoes of Facebook or Airbnb or Shopify, if you have complex views like the Facebook feed or the Airbnb search results, a RESTful architecture would cause persistent pain. Facebook felt that pain and came up with GraphQL. Netflix felt that pain and came up with Falcor. When REST is not the right tool – just pick a different tool. Not every problem is a nail and sometimes you have to put down your hammer. I recommend you give GraphQL a try for your next client and server application and see if it fits your needs.