Sorting in Scala — a cat shop example
While Java and Scala both inhabit the same JVM grounds
While Java and Scala both inhabit the same JVM grounds, the latter has become known as a far more concise and expressive language due to its substantial adoption of functional programming concepts and rich standard library. In this article I present an example of this expressiveness by imagining how a small piece of a codebase could evolve with requirements increasing along the way.
Let’s say we’re in charge of a cat shop, cats being the most popular animal in the Scala’s ecosystem. However, due to the ways the shop gets information about which cats are for sale, in some cases the data doesn’t come from the database and has to be sorted manually before returning it as an HTTP response or processing it in some other way.
The domain object in this scenario is, of course, a
Cat, which at the beginning has only 3 fields of primitive types. The objective is to design a concise API with which a collection of
Cat objects can be easily sorted given the following requirements:
- Sort order can be defined for each of the fields
- Sort order may not be defined for any of the fields
- Ordering should be stable for fields with undefined sort order
- Each field has a predefined static priority (e.g. sorting by the
agefield is always prioritized over sorting by the
As the problem at hand is relatively easy and we have no perspective on the other parts of the project, it’s better to go with a simple solution first. Luckily,
scala.Ordering has a convenient method
Tuple3 for creating
Ordering instances for tuples of arity 3, to which any
Cat object can easily be mapped.
However, interacting with
Tuple3 directly is quite cumbersome as it requires manual creation of
Ordering if we need to change the sort order. Moreover, there is no concise way to construct an identity
Ordering for some of the fields — one would probably need to manually switch between
Tuple3 and/or provide an identity
Ordering for the ignored fields. In total, one would need to handle 9 (3 sort orders for 3 fields) possible cases due to the sorting priority being predetermined.
To bring the API closer to the problem domain and also make it simpler a “sort order” entity could be introduced. According to the requirements, there are 3 possible orders: ascending (natural), descending (reversed natural) and none (keep the existing order). This natively maps to the following algebraic data type (ADT):
A way to apply a sort order to an
Ordering is also required. Since
SortOrder looks like a highly reusable class, this feature is better introduced as a syntax extension so that it doesn’t interfere with the class definition:
OrderingUtil.identity is a helper function that provides an identity ordering for any type
A and is defined as
Ordering.by(_ => 0).
Now that the core domain is set up, the only thing left is to construct the actual
Ordering[Cat]. For this purpose the
CatOrdering object is created:
Now any desired
Ordering[Cat] can be created using the
CatOrdering.of(SortOrder.Asc, SortOrder.Keep, SortOrder.Desc)
As the number of cases to test is relatively huge, to check that this implementation is correct it’s beneficial to use ScalaTest + ScalaCheck combination for property-based testing. It allows to automatically generate
List[Cat] for which the correctness of an
Ordering can be verified. The test suite for iteration 1 is available here.
The shop has grown since the initial implementation and the
Cat has received a new optional field:
The main difference with the primitive types is that now it has a special case — an empty value (has nothing to do with
null!) which, depending on the preference, may be considered the highest or the lowest value. That means that there are 4 total possible sort orders by this field:
- Ascending, empty values first (natural ordering)
- Ascending, empty values last
- Descending, empty values first
- Descending, empty vales last (reversed natural ordering)
While 2 of them are implicitly provided by
scala.Ordering.Option, the other 2 would have to be handled manually. As
SortOrder is now responsible for handling optional values, it now explicitly specifies their priority:
Respective adjustments are made to the
optional method provides an
Ordering[Option[A]] for any
A while applying the rule for empty values specified in the
SortOrder object. Note that while it’s possible to create an
Ordering[Option[A]] with the
apply method, one would need to explicitly provide an
Ordering[Option[A]] for that, which prevents doing so by accident. This minor inconsistency can be overcome by providing proof that
A is not a descendant of
<:< class documentation and this StackOverflow question for more details (note that
Not is available in Dotty as a part of the standard library).
An example definition of
Ordering[Cat] is provided below. Note that it is necessary to provide empty values rules even for non-optional fields.
Given the adjustments for
Option ordering the tests now need to cover the aforementioned 4 sort orders for
Option fields. The test suite for iteration 2 is available here.
Now the project has a setup for even the most complex
Cat orderings. Nevertheless, it still has some limitations:
- Field sorting priority is predefined
- If some fields are not required to be sorted by, they still have to be explicitly provided with
Catclass can grow only up to 9 fields. After that, there is no convenient
Orderingand the implementation has to drastically change
New business requirements this time relate to the last point. The shop has started to provide so much information about cats, the
Cat class now has 10 fields! Given that there are so many fields, it is evident that only a few of them will be used for sorting. Moreover, it is much harder to predefine sensible ordering priorities and explicitly providing a
SortOrder for all of them is too verbose. The changes will be more drastic this time.
The main idea now is not only to provide the order of how to sort a certain field but to provide the fields themselves as well — a collection of fields and respective sort orders is then transformed to a
Cat ordering. Having an ordered collection means that field sorting priority can be redefined (limitation #1), fields that are not in the collection are ignored (limitation #2) and adding a new field requires only a few lines of code and will never change the implementation (limitation #3). The implementation is as follows (
SortOrder and respective syntax are omitted for brevity due to practically no changes):
The implementation mainly makes use of the
orElse method which tries to apply another ordering if comparison on the current one results in equality. It’s very similar to Java’s
thenComparing method of
Comparator class. This method is available for
Ordering as well since the latter extends
Comparator for compatibility purposes.
orElse, however, works much better with Scala classes and also has a nice
orElseBy alternative for convenience.
Note that as the
byFields function is applied to an arbitrary collection with no rules on uniqueness of its elements, duplicate entries are handled manually. Moreover, to avoid unnecessary comparisons with
OrderingUtil.identity, the case of a non-empty collection is handled separately. These adjustments are for optimization purposes only, a straightforward implementation with
fields.foldLeft would work as well since duplicates won’t affect the result.
An additional benefit is that
SortOrder.Keep is no longer needed since the collection used to create an ordering only contains the required fields (the rest are ignored) — for no ordering at all one just has to provide an empty collection. This also happens to be quite convenient for mapping HTTP requests or user’s commands, which will only provide required orderings, to this domain model.
The downside of this approach is, of course, a bit more verbose model. Arguably, some sort of enumeration will be required anyway for the aforementioned request mapping, but field objects definition looks more like boilerplate rather than a necessary measure. Perhaps, the 4th iteration could be defining a compiler macro with annotation support for custom orderings, but it’s doubtful that any real-life projects would come to that point. The current setup, however, allows for easy field addition/removal since the sorting code is not affected at all.
The test suite for iteration 3 can be found here.
While the resulting implementation may look similar in some other languages, I feel that Scala offers a good compromise between type safety and code conciseness/readability. Hopefully, this article has given you enough insight into how one can sort a collection in Scala or provided you with ideas that you can adapt in codebases not directly related to
Ordering. All the code samples are available in this GitHub repository.
This article was originally published by Anton semenov on medium.