Working with a GraphQL Dataloader

Dataloader is a utility that improves the performance of GraphQL queries. It does so through two capabilities: batching and caching.

When you create a Dataloader, Integration Server generates a loader service and a document type for keys. A key uniquely identifies a record in a data source by one or more fields, which you specify in the key document. The loader service takes a list of keys, loads the data for them, and returns a list of values.
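The loader contract described above (a list of keys in, a list of values out) can be sketched in Python. This is a minimal illustration, not the service that Integration Server generates: PersonKey, PERSONS, and person_loader are hypothetical names, and returning the values in the same order as the keys, with None for a missing key, is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PersonKey:
    """Hypothetical key document with a single identifying field."""
    person_id: int


# Hypothetical stand-in for the data source.
PERSONS = {1: "Ada", 2: "Grace", 3: "Linus"}


def person_loader(keys: List[PersonKey]) -> List[Optional[str]]:
    """Return one value per key, in key order (None when a key is unknown)."""
    return [PERSONS.get(key.person_id) for key in keys]


values = person_loader([PersonKey(3), PersonKey(1), PersonKey(99)])
```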

When resolving the data for a field, Integration Server invokes the corresponding data resolver service. If you use a Dataloader, the resolver service does not resolve the data itself. Instead, it invokes the pub.graphql:load or pub.graphql:loadMany service. Integration Server then collects a batch of keys and invokes the loader service, which loads the data for the whole batch. The resolver service collects this data and returns it to the user. All loaded data is cached. If data for the same keys must be resolved again later, Integration Server returns the values from the cache, which avoids repeated access to the data source.
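The batching and caching flow can be approximated with a small, generic loader. The sketch below is not the implementation behind pub.graphql:load or pub.graphql:loadMany. SimpleDataLoader, load_names, and the explicit dispatch step are hypothetical names chosen only to show how queued keys become one batch call and how cached keys skip the data source.

```python
from typing import Callable, Dict, Hashable, List


class SimpleDataLoader:
    """Hypothetical loader: queues keys, loads them in one batch, caches values."""

    def __init__(self, batch_fn: Callable[[List[Hashable]], List[object]]):
        self._batch_fn = batch_fn
        self._cache: Dict[Hashable, object] = {}
        self._pending: List[Hashable] = []

    def load(self, key: Hashable) -> None:
        """Queue a key unless its value is cached or the key is already queued."""
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)

    def dispatch(self) -> None:
        """Invoke the batch function once for all queued keys."""
        if not self._pending:
            return
        keys, self._pending = self._pending, []
        values = self._batch_fn(keys)
        self._cache.update(zip(keys, values))

    def get(self, key: Hashable) -> object:
        """Return the cached value for a key that has been dispatched."""
        return self._cache[key]


batch_calls: List[List[Hashable]] = []


def load_names(keys: List[Hashable]) -> List[object]:
    """Hypothetical loader service that records every batch it receives."""
    batch_calls.append(list(keys))
    return [f"name-{key}" for key in keys]


loader = SimpleDataLoader(load_names)
for key in (1, 2, 2, 3):
    loader.load(key)
loader.dispatch()   # one batch call for keys 1, 2, 3
loader.load(2)      # already cached, so nothing is queued
loader.dispatch()   # no pending keys, so no second batch call
```

In this sketch the duplicate key and the later request for key 2 never reach the batch function, which mirrors the caching behavior described above.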

In summary, batching reduces the number of requests to the data source, and caching eliminates repeated loading of the same data within a single GraphQL request. Dataloader thus makes GraphQL queries more efficient and resolves the N+1 problem. The following example illustrates the N+1 problem.

Consider the following GraphQL schema:

Suppose you are using the following query:

For the above sample query, suppose the listPersons resolver (the parent resolver) returns five persons. Integration Server then invokes the bestFriend resolver (the child resolver) five times, once to fetch the best friend of each person. Integration Server therefore queries the data source 5+1 times: once to get the list of persons and five times to get the best friends. In general, if the listPersons resolver returns N persons, Integration Server queries the data source N+1 times to get the data for the persons and their best friends. This is the N+1 problem.
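The call count can be reproduced with a toy data source that counts its queries. CountingSource and its methods are hypothetical stand-ins for the listPersons and bestFriend resolvers, and the five-person data set is invented for the illustration.

```python
class CountingSource:
    """Hypothetical data source that counts how often it is queried."""

    def __init__(self) -> None:
        self.queries = 0
        # person id -> best friend id (invented sample data)
        self._best_friend = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}

    def list_person_ids(self) -> list:
        """Parent resolver: one query for the whole list."""
        self.queries += 1
        return list(self._best_friend)

    def best_friend_of(self, person_id: int) -> int:
        """Child resolver: one query per person."""
        self.queries += 1
        return self._best_friend[person_id]


source = CountingSource()
person_ids = source.list_person_ids()
best_friends = [source.best_friend_of(pid) for pid in person_ids]
# source.queries is now 6: one list query plus five best-friend queries
```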

As the number of values returned from the parent resolver increases, the number of calls to retrieve the data also increases. Moreover, if the child query has multiple complex fields, the number of calls to the data source increases further.

With Dataloader, the listPersons resolver makes one call to fetch the person list, and the bestFriend resolver makes one call to fetch the data for all the best friends from the data source. In total, the resolvers make two calls instead of six (5+1). Dataloader batches the keys of the persons whose best friends are needed and thus avoids multiple calls to the data source.
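The same count with batching can be sketched as follows. BatchedSource and best_friends_of are hypothetical names, and the data set is invented. The point is only that all best-friend keys travel in a single call, so five persons cost two queries.

```python
from typing import List


class BatchedSource:
    """Hypothetical data source whose child lookup accepts a batch of keys."""

    def __init__(self) -> None:
        self.queries = 0
        # person id -> best friend id (invented sample data)
        self._best_friend = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}

    def list_person_ids(self) -> List[int]:
        """Parent resolver: one query for the whole list."""
        self.queries += 1
        return list(self._best_friend)

    def best_friends_of(self, person_ids: List[int]) -> List[int]:
        """Loader: one query for the whole batch of keys."""
        self.queries += 1
        return [self._best_friend[pid] for pid in person_ids]


source = BatchedSource()
person_ids = source.list_person_ids()
best_friends = source.best_friends_of(person_ids)
# source.queries is now 2, regardless of how many persons were returned
```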