cft

Sorting in Scala — a cat shop example

While Java and Scala both inhabit the same JVM grounds


user

Anton Semenov

3 years ago | 6 min read

While Java and Scala both inhabit the same JVM grounds, the latter has become known as a far more concise and expressive language due to its substantial adoption of functional programming concepts and rich standard library. In this article I present an example of this expressiveness by imagining how a small piece of a codebase could evolve with requirements increasing along the way.

Initial problem

Let’s say we’re in charge of a cat shop, cats being the most popular animal in the Scala’s ecosystem. However, due to the ways the shop gets information about which cats are for sale, in some cases the data doesn’t come from the database and has to be sorted manually before returning it as an HTTP response or processing it in some other way.

The domain object in this scenario is, of course, a Cat, which at the beginning has only 3 fields of primitive types. The objective is to design a concise API with which a collection of Cat objects can be easily sorted given the following requirements:

  • Sort order can be defined for each of the fields
  • Sort order may not be defined for any of the fields
  • Ordering should be stable for fields with undefined sort order
  • Each field has a predefined static priority (e.g. sorting by the age field is always prioritized over sorting by the name field)

First iteration

Cat — iteration 1

As the problem at hand is relatively easy and we have no perspective on the other parts of the project, it’s better to go with a simple solution first. Luckily, scala.Ordering has a convenient method Tuple3 for creating Ordering instances for tuples of arity 3, to which any Cat object can easily be mapped.

However, interacting with Tuple3 directly is quite cumbersome as it requires manual creation of Ordering if we need to change the sort order. Moreover, there is no concise way to construct an identity Ordering for some of the fields — one would probably need to manually switch between Tuple1, Tuple2 and Tuple3 and/or provide an identity Ordering for the ignored fields. In total, one would need to handle 9 (3 sort orders for 3 fields) possible cases due to the sorting priority being predetermined.

To bring the API closer to the problem domain and also make it simpler a “sort order” entity could be introduced. According to the requirements, there are 3 possible orders: ascending (natural), descending (reversed natural) and none (keep the existing order). This natively maps to the following algebraic data type (ADT):

Sort order — iteration 1

A way to apply a sort order to an Ordering is also required. Since SortOrder looks like a highly reusable class, this feature is better introduced as a syntax extension so that it doesn’t interfere with the class definition:

Order syntax — iteration 1

OrderingUtil.identity is a helper function that provides an identity ordering for any type A and is defined as Ordering.by(_ => 0).

Now that the core domain is set up, the only thing left is to construct the actual Ordering[Cat]. For this purpose the CatOrdering object is created:

Cat ordering — iteration 1

Now any desired Ordering[Cat] can be created using the CatOrdering.of function:

CatOrdering.of(SortOrder.Asc, SortOrder.Keep, SortOrder.Desc)

As the number of cases to test is relatively huge, to check that this implementation is correct it’s beneficial to use ScalaTest + ScalaCheck combination for property-based testing. It allows to automatically generate List[Cat] for which the correctness of an Ordering can be verified. The test suite for iteration 1 is available here.

Iteration 2

The shop has grown since the initial implementation and the Cat has received a new optional field:

Cat — iteration 2

The main difference with the primitive types is that now it has a special case — an empty value (has nothing to do with null!) which, depending on the preference, may be considered the highest or the lowest value. That means that there are 4 total possible sort orders by this field:

  • Ascending, empty values first (natural ordering)
  • Ascending, empty values last
  • Descending, empty values first
  • Descending, empty vales last (reversed natural ordering)

While 2 of them are implicitly provided by scala.Ordering.Option, the other 2 would have to be handled manually. As SortOrder is now responsible for handling optional values, it now explicitly specifies their priority:

Sort order — iteration 2

Respective adjustments are made to the SortOrder syntax. optional method provides an Ordering[Option[A]] for any A while applying the rule for empty values specified in the SortOrder object. Note that while it’s possible to create an Ordering[Option[A]] with the apply method, one would need to explicitly provide an Ordering[Option[A]] for that, which prevents doing so by accident. This minor inconsistency can be overcome by providing proof that A is not a descendant of Option, see <:< class documentation and this StackOverflow question for more details (note that Not is available in Dotty as a part of the standard library).

Order syntax — iteration 2

Cat ordering — iteration 2

An example definition of Ordering[Cat] is provided below. Note that it is necessary to provide empty values rules even for non-optional fields.

CatOrdering.toOrdering(
SortOrder.Asc.emptyFirst,
SortOrder.Asc.emptyFirst,
SortOrder.Asc.emptyFirst,
SortOrder.Asc.emptyFirst
)

Given the adjustments for Option ordering the tests now need to cover the aforementioned 4 sort orders for Option fields. The test suite for iteration 2 is available here.

Iteration 3

Now the project has a setup for even the most complex Cat orderings. Nevertheless, it still has some limitations:

  1. Field sorting priority is predefined
  2. If some fields are not required to be sorted by, they still have to be explicitly provided with SortOrder.Keep
  3. Cat class can grow only up to 9 fields. After that, there is no convenient Tuple10 defined under Ordering and the implementation has to drastically change

New business requirements this time relate to the last point. The shop has started to provide so much information about cats, the Cat class now has 10 fields! Given that there are so many fields, it is evident that only a few of them will be used for sorting. Moreover, it is much harder to predefine sensible ordering priorities and explicitly providing a SortOrder for all of them is too verbose. The changes will be more drastic this time.

The main idea now is not only to provide the order of how to sort a certain field but to provide the fields themselves as well — a collection of fields and respective sort orders is then transformed to a Cat ordering. Having an ordered collection means that field sorting priority can be redefined (limitation #1), fields that are not in the collection are ignored (limitation #2) and adding a new field requires only a few lines of code and will never change the implementation (limitation #3). The implementation is as follows (SortOrder and respective syntax are omitted for brevity due to practically no changes):

Cat — iteration 3

Cat field — iteration 3

Cat ordering — iteration 3

The implementation mainly makes use of the orElse method which tries to apply another ordering if comparison on the current one results in equality. It’s very similar to Java’s thenComparing method of Comparator class. This method is available for Ordering as well since the latter extends Comparator for compatibility purposes. orElse, however, works much better with Scala classes and also has a nice orElseBy alternative for convenience.

Note that as the byFields function is applied to an arbitrary collection with no rules on uniqueness of its elements, duplicate entries are handled manually. Moreover, to avoid unnecessary comparisons with OrderingUtil.identity, the case of a non-empty collection is handled separately. These adjustments are for optimization purposes only, a straightforward implementation with fields.foldLeft would work as well since duplicates won’t affect the result.

An additional benefit is that SortOrder.Keep is no longer needed since the collection used to create an ordering only contains the required fields (the rest are ignored) — for no ordering at all one just has to provide an empty collection. This also happens to be quite convenient for mapping HTTP requests or user’s commands, which will only provide required orderings, to this domain model.

The downside of this approach is, of course, a bit more verbose model. Arguably, some sort of enumeration will be required anyway for the aforementioned request mapping, but field objects definition looks more like boilerplate rather than a necessary measure. Perhaps, the 4th iteration could be defining a compiler macro with annotation support for custom orderings, but it’s doubtful that any real-life projects would come to that point. The current setup, however, allows for easy field addition/removal since the sorting code is not affected at all.

The test suite for iteration 3 can be found here.

Conclusion

While the resulting implementation may look similar in some other languages, I feel that Scala offers a good compromise between type safety and code conciseness/readability. Hopefully, this article has given you enough insight into how one can sort a collection in Scala or provided you with ideas that you can adapt in codebases not directly related to Ordering. All the code samples are available in this GitHub repository.

This article was originally published by Anton semenov on medium.

Upvote


user
Created by

Anton Semenov


people
Post

Upvote

Downvote

Comment

Bookmark

Share


Related Articles