
Hidden Anatomy of Backend Applications: Data Dependencies

This article focuses on yet another ubiquitous processing pattern, one that is present in virtually every backend application.



Sergiy Yevtushenko

2 years ago | 6 min read

In previous articles (here and here) we looked at backend applications from a non-traditional angle. That view let us see processing patterns that are inherent to backend applications and do not depend on the tools, languages, and frameworks used.


It should come as no surprise that almost every backend entry point needs some data to generate a response. That data might be as simple as a constant string or as complex as the result of invoking several external services and retrieving different pieces of data from the database. Even if the response is ignored (for example, when the entry point is used to send a notification), the entry point still has to access some resource, just as if it had called another service or a DB.

We can also abstract away the differences between calling an external service and retrieving data from a DB. In both cases we access some data (even if all we need is an “OK” response from some external component), so we can assume that all these calls, invocations, and retrievals serve a single purpose: getting data. With these abstractions in mind, we can analyze how backend apps access data.

Single Data Access Pattern

This is the most common case: to generate the response, the application needs only one piece of data. We can treat it as a special case of either of the patterns described below.

AND (ALL) Data Access Pattern

Often we need to retrieve two or more pieces of data to prepare the response. If any of these pieces is unavailable for any reason, the backend application produces an error response. A simple example:

public UserProfileResponse getUserProfileHandler(final User.Id userId) {
    // First dependency: the user itself.
    final User user = userService.findById(userId);
    if (user == null) {
        return UserProfileResponse.error(USER_NOT_FOUND);
    }

    // Second dependency: profile details, looked up by the same id.
    final UserProfileDetails details = userProfileService.findById(userId);
    if (details == null) {
        return UserProfileResponse.error(USER_PROFILE_NOT_FOUND);
    }

    // Both dependencies are available: compose the response.
    return UserProfileResponse.of(user, details);
}

First, the backend application retrieves the user information. If that fails, an error is reported. Then the application retrieves the user profile, and if that fails, an error is reported to the caller. Finally, when both pieces of data are available, the result is composed and returned to the caller.

If we omit the details and extract only the data access pattern, we can describe the code above as follows:

UserProfileResponse = All(User, UserProfileDetails).

where All represents the logic responsible for retrieving both pieces of data, and User and UserProfileDetails represent the data we want to obtain.

With this notation we can say that the response depends on User and UserProfileDetails. Notice that the two pieces of data are independent of each other.
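As a rough illustration, All can be sketched as a small generic helper. This is only one possible shape, assuming that a missing dependency is modelled as an empty Optional (which loses the specific error code used in the handler above); the names AllExample and all, and the String stand-ins for User and UserProfileDetails, are hypothetical.

```java
import java.util.Optional;
import java.util.function.BiFunction;

final class AllExample {
    /**
     * All: yields a combined value only when both independent
     * dependencies are present; otherwise the result is empty.
     */
    public static <A, B, R> Optional<R> all(Optional<A> first,
                                            Optional<B> second,
                                            BiFunction<A, B, R> combine) {
        return first.flatMap(a -> second.map(b -> combine.apply(a, b)));
    }

    public static void main(String[] args) {
        // Strings stand in for User and UserProfileDetails (assumption).
        Optional<String> both = all(
                Optional.of("alice"), Optional.of("dark theme"),
                (user, details) -> user + " / " + details);
        Optional<String> missing = all(
                Optional.of("alice"), Optional.<String>empty(),
                (user, details) -> user + " / " + details);

        System.out.println(both.isPresent());    // true
        System.out.println(missing.isPresent()); // false
    }
}
```

Neither argument depends on the other, which mirrors the independence of the two data dependencies noted above.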

OR (ANY) Data Access Pattern

Much less common is the case where we can get the same (or equivalent) information from several sources. We try to retrieve the data from one source, and if there is an error or the data is not found, we try the other source(s), returning an error response only if all attempts fail. A simple example:

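public WeatherDetails getWeather(Location location) {
    // Try the primary provider first.
    WeatherDetails details = openWeatherMapProvider.retrieve(location);

    // Fall back to the second provider only if the first returned nothing.
    if (details == null) {
        details = accuWeatherProvider.retrieve(location);
    }

    // Still null here if both providers failed.
    return details;
}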

First, the backend application tries to retrieve the necessary information from the first provider. If that fails, the application calls another service that provides the same information.

Again, we can omit the details and extract the data access pattern:

WeatherDetails = Any(OpenWeatherProvider::WeatherDetails, AccuWeatherProvider::WeatherDetails)

where Any represents the logic that performs data retrieval, and OpenWeatherProvider::WeatherDetails and AccuWeatherProvider::WeatherDetails are the actual data we need to retrieve. Since the type of the data is the same, each is prefixed with its source name to distinguish the two data dependencies. Again, the two data dependencies are independent of each other.
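A minimal sketch of Any, under the assumption that each source is wrapped in a Supplier so that later sources are contacted only when earlier ones return nothing. The names AnyExample and any are hypothetical, and a source that throws is not handled here; a fuller version would treat exceptions as failures too.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class AnyExample {
    /**
     * Any: queries the sources in order and returns the first value found.
     * Sources after the first successful one are never invoked.
     */
    @SafeVarargs
    public static <T> Optional<T> any(Supplier<Optional<T>>... sources) {
        for (Supplier<Optional<T>> source : sources) {
            Optional<T> value = source.get();
            if (value.isPresent()) {
                return value;
            }
        }
        return Optional.empty(); // every source came back empty
    }

    public static void main(String[] args) {
        // Strings stand in for WeatherDetails (assumption).
        Optional<String> weather = any(
                () -> Optional.<String>empty(), // first provider has no data
                () -> Optional.of("sunny"));    // second provider answers
        System.out.println(weather.orElse("unavailable")); // sunny
    }
}
```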

Digging Deeper

The first thing we may notice is that both All and Any represent points in the code where synchronization happens. Regardless of the processing model (synchronous or asynchronous), at these points we need to wait until the dependency (or dependencies) is satisfied before processing can continue.

Another observation: in real applications these data access patterns are combined into more complex ones. The services or repositories we call to obtain data have their own internal data access patterns, and so on. For example, let’s imagine that the user service from the first example actually uses two external services to store user information, Firebase and AWS Cognito:
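In an asynchronous model, the same synchronization points can be expressed with the standard CompletableFuture API. The sketch below is an illustration rather than a prescribed implementation: the helper names all and any are hypothetical, and Any is assumed to mean "fall back to the second source only when the first fails".

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Supplier;

final class SyncPoints {
    /** All: completes when both futures complete; fails if either fails. */
    public static <A, B, R> CompletableFuture<R> all(CompletableFuture<A> first,
                                                     CompletableFuture<B> second,
                                                     BiFunction<A, B, R> combine) {
        return first.thenCombine(second, combine);
    }

    /** Any: uses the primary result, starting the fallback only on failure. */
    public static <T> CompletableFuture<T> any(CompletableFuture<T> primary,
                                               Supplier<CompletableFuture<T>> fallback) {
        return primary
                .<CompletableFuture<T>>handle((value, error) ->
                        error == null ? CompletableFuture.completedFuture(value)
                                      : fallback.get())
                .thenCompose(next -> next);
    }

    public static void main(String[] args) {
        CompletableFuture<String> failing = new CompletableFuture<>();
        failing.completeExceptionally(new IllegalStateException("primary down"));

        CompletableFuture<String> user =
                any(failing, () -> CompletableFuture.completedFuture("alice"));
        CompletableFuture<String> details =
                CompletableFuture.completedFuture("dark theme");

        // join() is the point where the caller actually waits.
        System.out.println(all(user, details, (u, d) -> u + " / " + d).join());
    }
}
```

The waiting is still there in the asynchronous version; it is simply moved into thenCombine and thenCompose instead of blocking the calling thread at each step.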

public User findById(final User.Id userId) {
    User user = firebaseProvider.findById(userId);

    if (user == null) {
        user = cognitoProvider.findById(userId);
    }

    return user;
}

If we take this structure into account, then:

UserProfileResponse = All(User, UserProfileDetails).

can be rewritten as:

UserProfileResponse = All(Any(Firebase::User, Cognito::User), UserProfileDetails).

The process of detailing data dependencies for each entry point can continue until all internal dependencies are described this way. In the end, only external dependencies remain in the formulas describing data dependencies, and together they form the Data Dependency Graph (DDG) for the entry point.
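The detailing step can be mimicked with a tiny tree structure. This is a sketch under assumptions: DdgNode, expand, and sources are hypothetical names, and a DDG is modelled simply as a tree of All/Any nodes with named leaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class DdgNode {
    private final String label;           // "All", "Any", or a source name
    private final boolean leaf;
    private final List<DdgNode> children; // empty for leaves

    private DdgNode(String label, boolean leaf, List<DdgNode> children) {
        this.label = label;
        this.leaf = leaf;
        this.children = children;
    }

    public static DdgNode source(String name) {
        return new DdgNode(name, true, Collections.<DdgNode>emptyList());
    }

    public static DdgNode all(DdgNode... deps) {
        return new DdgNode("All", false, Arrays.asList(deps));
    }

    public static DdgNode any(DdgNode... deps) {
        return new DdgNode("Any", false, Arrays.asList(deps));
    }

    /** Detailing step: replaces every leaf called {@code name} with a subtree. */
    public DdgNode expand(String name, DdgNode replacement) {
        if (leaf) {
            return label.equals(name) ? replacement : this;
        }
        List<DdgNode> expanded = new ArrayList<>();
        for (DdgNode child : children) {
            expanded.add(child.expand(name, replacement));
        }
        return new DdgNode(label, false, expanded);
    }

    /** Leaves of the tree, left to right: the remaining dependencies. */
    public List<String> sources() {
        if (leaf) {
            return Collections.singletonList(label);
        }
        List<String> result = new ArrayList<>();
        for (DdgNode child : children) {
            result.addAll(child.sources());
        }
        return result;
    }

    @Override
    public String toString() {
        if (leaf) {
            return label;
        }
        return label + children.stream()
                .map(DdgNode::toString)
                .collect(Collectors.joining(", ", "(", ")"));
    }

    public static void main(String[] args) {
        DdgNode coarse = all(source("User"), source("UserProfileDetails"));
        DdgNode detailed = coarse.expand("User",
                any(source("Firebase::User"), source("Cognito::User")));
        // Prints: All(Any(Firebase::User, Cognito::User), UserProfileDetails)
        System.out.println(detailed);
    }
}
```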

Omitted Parts

While exploring data dependencies we made some simplifications and abstracted away some information. These omitted parts are interesting in their own right, so let’s take a look at them.

The first omitted part is the transformation of the data at each step. Once data dependencies are retrieved, they are usually transformed into some new form before the response is sent to the client.

Let’s take a look at the following DDG:

UserProfileResponse = All(Any(Firebase::User, Cognito::User), UserProfileDetails).

While all data dependencies are present, there is no information about how they are transformed into the output. Describing transformations in a language-neutral way is an interesting exercise, but it is a topic for separate research. For now, let’s assume that we have named functions that perform the necessary transformations, and rewrite the DDG using them. For convenience, we may assume that All returns a tuple of the retrieved dependencies and that this tuple is then passed to the transformation function:

UserProfileResponse = TransformToUserDetails(All(Any(Firebase::User, Cognito::User), UserProfileDetails))

Any, of course, returns only one data dependency.
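The "tuple in, response out" convention might look as follows in Java. Everything here is illustrative: Tuple2 is a hypothetical helper class, absence is modelled with Optional, and strings stand in for the real User, UserProfileDetails, and UserProfileResponse types.

```java
import java.util.Optional;

final class TransformExample {
    /** Hypothetical two-element tuple returned by All. */
    public static final class Tuple2<A, B> {
        public final A first;
        public final B second;

        public Tuple2(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    /** All: yields a tuple only when both dependencies are present. */
    public static <A, B> Optional<Tuple2<A, B>> all(Optional<A> a, Optional<B> b) {
        return a.flatMap(x -> b.map(y -> new Tuple2<A, B>(x, y)));
    }

    /** Stand-in for TransformToUserDetails: a pure function of the tuple. */
    public static String transformToUserDetails(Tuple2<String, String> deps) {
        return deps.first + " <" + deps.second + ">";
    }

    public static void main(String[] args) {
        Optional<String> response =
                all(Optional.of("alice"), Optional.of("alice@example.com"))
                        .map(TransformExample::transformToUserDetails);
        System.out.println(response.orElse("error")); // alice <alice@example.com>
    }
}
```

The transformation function never checks whether its inputs are present; that decision is made entirely inside all.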

The second omitted part is how parameters are passed down to the calls that retrieve data dependencies. This information can be really useful, for example when analyzing the application API. The parameters needed to retrieve data are passed down from the entry point to the data dependencies, and at the same time they shape the API our application exposes to its clients.

Observations And Considerations

Data dependencies are very exciting patterns. They provide a completely new view of backend applications. Perhaps most important is that the DDG abstracts away processing details and leaves only the following information:

  • Two types of synchronization points, All and Any. A single data dependency can be represented as either All or Any with just one parameter.
  • Transformation functions. These transformations represent pure business logic.
  • Data (in the form of dependencies).

In some sense the DDG holds purified application business logic. Of course, this logic is tightly coupled with the data layout (i.e. how and where pieces of data are stored across the whole system). If we rearrange the data, the implementation of the business logic will most likely change as well.

The DDG can be very helpful here, as it enables high-level analysis of the business logic and data layout. This analysis, in turn, enables optimization of both. For example, a DDG can be used to check whether the system has been split into (micro-)services correctly.

As mentioned above, every All or Any operator represents a synchronization point. This can be used to estimate application response times within a given processing model (synchronous or asynchronous). The article mentioned above contains an example of such an estimation.
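A back-of-the-envelope version of such an estimate is sketched below. It is not taken from the earlier article; it rests on simplifying assumptions: each dependency has a fixed latency, a synchronous All fetches its dependencies one after another, an asynchronous All fetches them concurrently, and Any falls back sequentially. The class name and the millisecond figures are made up for illustration.

```java
import java.util.stream.LongStream;

final class LatencyEstimate {
    /** Synchronous All: dependencies are fetched in sequence, so latencies add up. */
    public static long allSequential(long... latenciesMs) {
        return LongStream.of(latenciesMs).sum();
    }

    /** Asynchronous All: dependencies are fetched concurrently, so the slowest dominates. */
    public static long allParallel(long... latenciesMs) {
        return LongStream.of(latenciesMs).max().orElse(0L);
    }

    /** Any with sequential fallback, worst case: every source is tried in turn. */
    public static long anyWorstCase(long... latenciesMs) {
        return LongStream.of(latenciesMs).sum();
    }

    public static void main(String[] args) {
        // All(Any(Firebase::User, Cognito::User), UserProfileDetails)
        long user = anyWorstCase(30, 40); // hypothetical 30 ms and 40 ms
        long profile = 50;                // hypothetical 50 ms

        System.out.println(allSequential(user, profile)); // 120
        System.out.println(allParallel(user, profile));   // 70
    }
}
```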

Another interesting application of the DDG is using it as a model for writing backend applications. This enables, for example, writing business logic as pure transformation functions without side effects and without the need to handle errors.

This, in turn, makes testing and reasoning about the code much simpler. Both All and Any operators could be part of the underlying framework, which hides all irrelevant details, including error processing and handling.
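To make this concrete, here is a minimal sketch of what such framework-provided operators could look like. Result, all, and any are hypothetical and do not come from any particular library; error codes are plain strings for brevity.

```java
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.function.Supplier;

final class ResultExample {
    /** Either a value or an error code (hypothetical helper type). */
    public static final class Result<T> {
        private final T value;
        private final String error;

        private Result(T value, String error) {
            this.value = value;
            this.error = error;
        }

        public static <T> Result<T> ok(T value) {
            return new Result<>(value, null);
        }

        public static <T> Result<T> failure(String error) {
            return new Result<>(null, Objects.requireNonNull(error));
        }

        public boolean isOk() { return error == null; }
        public T value() { return value; }
        public String error() { return error; }
    }

    /** All: the first failure wins; the transformation runs only on success. */
    public static <A, B, R> Result<R> all(Result<A> a, Result<B> b,
                                          BiFunction<A, B, R> transform) {
        if (!a.isOk()) return Result.failure(a.error());
        if (!b.isOk()) return Result.failure(b.error());
        return Result.ok(transform.apply(a.value(), b.value()));
    }

    /** Any: the first success wins; the fallback is evaluated only on failure. */
    public static <T> Result<T> any(Result<T> primary, Supplier<Result<T>> fallback) {
        return primary.isOk() ? primary : fallback.get();
    }

    /** Business logic: a pure function with no error handling at all. */
    public static String userProfile(String user, String details) {
        return user + " / " + details;
    }

    public static void main(String[] args) {
        Result<String> user = any(Result.<String>failure("FIREBASE_DOWN"),
                                  () -> Result.ok("alice"));
        Result<String> ok = all(user, Result.ok("dark theme"),
                                ResultExample::userProfile);
        Result<String> failed = all(user,
                                    Result.<String>failure("USER_PROFILE_NOT_FOUND"),
                                    ResultExample::userProfile);

        System.out.println(ok.value());     // alice / dark theme
        System.out.println(failed.error()); // USER_PROFILE_NOT_FOUND
    }
}
```

Because userProfile never sees a failure, it can be unit-tested with plain values, while the propagation rules live in one place.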
