Introduction

So, we had a Tech Day at Inaka's offices, but one of the talks could not be recorded. In that talk, Hernán presented 7 Heuristics for Object-Oriented Design. And since we couldn't have him on record to share all that wonderful stuff with everybody else in the world, I thought I would write a blog post about it. That way, at least some of it will not be lost.

But, while I was thinking about what to write in this blog post I realized that:

- I'm not exactly an OOP programmer anymore.
- The 7 heuristics, and especially the ideas behind them, can easily be applied to other paradigms, in particular to functional programming.

So, I decided to write this blog post with those 7 heuristics in mind, but instead of applying them on OOP, I'll show you how to apply them in Erlang. Let's see how that goes…

Abstract Data Types

Before we move on, I'll introduce a concept that is nothing new for Haskell developers, but that may seem somewhat strange to several Erlangers: Abstract Data Types.

In computer science, an abstract data type (ADT) is a mathematical model for data types where a data type is defined by its behavior

The idea here is to define the entities in your system by describing how they are used and not how they are implemented underneath. There is a lot to say about ADTs, but for the purpose of this post, I'll mainly focus on a good practice we encouraged at Inaka: keep each of your models in one module without exposing its internal structure to the world. Let me put it another way by using one of the most loved/hated structures in the Erlang world: records.

Why do we love[d] them?
- Because they perfectly and easily describe and allow us to use complex structures.
Why do we hate[d] them?
- Because they cause what I call nightmares.hrl. When a record needs to be used in multiple modules (as is usually the case) you have to put its definition in a shared header file and that's when you lose control of it.

So, records are fine, if you use them in just one module. But you want to use them to represent your system entities, therefore you want to use them in multiple modules. The key misconception here is that what you really want is not to use the same record in multiple modules, but the same entity. That's when you can create an ADT, put it in just one module, and export functions that manage it while not exporting the record with which you want to implement your entity.

For example, let's say your system deals with invoices, and you just created a fantastic #invoice record to represent them:


-record( invoice
       , { id       :: binary()
         , date     :: calendar:datetime()
         , customer :: binary()
         , amount   :: number()
         }
       ).

Where would you write that record definition? If you immediately think of an hrl file, think again. This #invoice record represents an entity in your system and what I'm proposing here is that it deserves a module of its own, and also an opaque type, like this:


-module(invoices).
-record( invoice
       , { id       :: binary()
         , date     :: calendar:datetime()
         , customer :: binary()
         , amount   :: number()
         }
       ).
-opaque invoice() :: #invoice{}.
-export_type([invoice/0]).

But then, how do we create an invoice? That's easy, we add a function to our invoices module for that. We can even assign some default values to it, look:


-export([new/2]).
-spec new(binary(), number()) -> invoice().
new(Customer, Amount) ->
    #invoice{ id       = uuid:new()
            , date     = calendar:universal_time()
            , customer = Customer
            , amount   = Amount
            }.

And then, of course, you'll eventually want to use some particular field for an invoice. Let's say that, given an invoice, you want to obtain the amount. This is the function you need to implement and export:


-export([amount/1]).
-spec amount(invoice()) -> number().
amount(#invoice{amount = Amount}) -> Amount.

This way, outside of the invoices module, everybody can create invoices, retrieve data from them and (if you allow them to) update them as well. In other words, every module can work with invoices:invoice(), but no one knows anything about #invoice{}.

This concept may sound new to you, but remember we've been doing the same thing for a long time. Just check the dict or sets modules (and a couple of others) in Erlang/OTP.

What's the benefit of this approach? Well, besides preventing hrl hell, data type abstraction also allows you to change your underlying implementation more easily. Let's say you suddenly think that a map is a better structure to represent your invoices than a record. The only module that you need to change is invoices. The other ones will never know what happened.
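
To make that concrete, here is a minimal sketch of what invoices could look like after moving from a record to a map. The exported functions and the opaque type keep the same names and shapes, so callers would not notice the change. Note that new_id/0 is a hypothetical stand-in for the uuid:new() call used above, introduced only so the sketch has no external dependencies.

```erlang
-module(invoices).
-export([new/2, amount/1]).
-export_type([invoice/0]).

%% Same opaque type name as before; only the representation changed.
-opaque invoice() :: #{ id       := binary()
                      , date     := calendar:datetime()
                      , customer := binary()
                      , amount   := number()
                      }.

-spec new(binary(), number()) -> invoice().
new(Customer, Amount) ->
    #{ id       => new_id()
     , date     => calendar:universal_time()
     , customer => Customer
     , amount   => Amount
     }.

-spec amount(invoice()) -> number().
amount(#{amount := Amount}) -> Amount.

%% Hypothetical replacement for uuid:new/0 (an assumption of this sketch):
%% it only needs to return some binary that is unique within this node.
-spec new_id() -> binary().
new_id() ->
    integer_to_binary(erlang:unique_integer([positive])).
```

Any module that only ever called invoices:new/2 and invoices:amount/1 keeps working without a single change.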

The Heuristics

With ADTs in mind, let's start analyzing the 7 heuristics proposed by Hernán. Before we start, let's remember two things Hernán said:

- These are heuristics, not rules. Because, as it's been proven multiple times in software history, there is no silver bullet, nothing can really be a rule here, nothing applies to all scenarios. Every piece of advice must be contextualized. So, these heuristics are just that: pieces of advice. You have to check if they apply to what you're working on, or not, on your own.
- All these heuristics are based on a much more general piece of advice, namely that:

better code is the one that better models the problem at hand and not the one that just performs better

In other words, Hernán (and I wholeheartedly agree) sees software development as a modeling process where what we create are computable representations of what we see in the real world, as opposed to writing instructions to tell the machine what to do.

Hernán's way of representing the entities in the world is by defining objects. I'll show how to do it by defining ADTs. Now, with that in mind as well, let's finally delve into the 7 heuristics:

1. Reality-Model Equivalence

This is, to me, the most important of the 7 heuristics and it's all about how you model your system and not about the code you actually end up writing. What Hernán recommends here is for you to have exactly one model in your system for each entity in your problem's domain. The easiest way to show you what that means is by presenting real-life examples of what should not happen if you want to build better systems. The following 4 scenarios show 4 problems you should try to avoid as much as you can:

Entities that can't be represented as Models in your system

For instance: in our example above, our invoices have no other data than a customer and a total amount. But invoices in real life generally have lists of purchased items. If those items are important in your problem's domain, they have to be represented in your system. And they should be represented where they belong (i.e. inside your invoice ADT and not in a separate place that invoices know nothing about). Even if they are stored separately (in another table or bucket or whatever), that should not condition the way you represent them. You should always aim at representing your entities accurately with your ADTs. Persistence is a problem you should be able to deal with later.
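
As a rough sketch of that idea, this is one way the items could live inside the invoice ADT. Everything here is an assumption made for illustration: the module name itemized_invoices, the shape of an item (a description and a price), the rule that an invoice needs at least one item, and the decision to derive the amount from the items. The id and date fields are left out for brevity.

```erlang
-module(itemized_invoices).
-export([new/2, items/1, amount/1]).
-export_type([invoice/0, item/0]).

%% Hypothetical item shape: a description and a price.
-type item() :: {Description :: binary(), Price :: number()}.

-record(invoice, { customer :: binary()
                 , items    :: [item(), ...]
                 }).
-opaque invoice() :: #invoice{}.

%% Items are handed over when the invoice is created (at least one),
%% so they only ever exist as part of an invoice.
-spec new(binary(), [item(), ...]) -> invoice().
new(Customer, [_ | _] = Items) ->
    #invoice{customer = Customer, items = Items}.

-spec items(invoice()) -> [item(), ...].
items(#invoice{items = Items}) -> Items.

%% The total is computed from the items instead of being stored separately.
-spec amount(invoice()) -> number().
amount(#invoice{items = Items}) ->
    lists:sum([Price || {_Description, Price} <- Items]).
```

How those items are persisted (same table, another bucket, whatever) stays a private concern of whoever stores invoices.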

Models that represent Entities that don't exist in reality

It's funny how we Erlangers tend to disregard Java as an ugly/bad language, but for this example, Hernán exhibits a Java class for which we do have the equivalent module in OTP: Calendar. The question here is: what does Calendar represent? (i.e. What entities are represented by Calendar?). The short answer is none. Calendar in Java doesn't represent a calendar, nor does it represent a date. calendar in Erlang has the same problem. On the other hand, calendar is (I think) not designed to be an ADT. But, should we have an ADT for dates? I think it would be a very good idea. Why? Because one of the main problems with Java's Calendar is that it lets you create an object representing the following date: 2016-02-31. Then, and only if you ask really nicely, it tells you that the date is invalid.

In our Erlang world, we have no proper definition of what a date is, but most of us will read {2016,2,12} as a date and {{2016,2,12},{10,0,12}} as a datetime. As a matter of fact, we do have type definitions for those things in the calendar module:


%%----------------------------------------------------------------------
%% Types
%%----------------------------------------------------------------------
-export_type([date/0, time/0, datetime/0, datetime1970/0]).
-type year()     :: non_neg_integer().
-type year1970() :: 1970..10000.
-type month()    :: 1..12.
-type day()      :: 1..31.
-type hour()     :: 0..23.
-type minute()   :: 0..59.
-type second()   :: 0..59.
-type daynum()   :: 1..7.
-type ldom()     :: 28 | 29 | 30 | 31.
-type weeknum()  :: 1..53.
-type date() :: {year(),month(),day()}.
-type time() :: {hour(),minute(),second()}.
-type datetime() :: {date(),time()}.
…

But, again, according to those definitions this is a perfectly valid DateTime: {{2016,2,31},{0,0,0}}. And if this doesn't seem that bad to you, check heuristics 3 and 4 below.
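
Just to illustrate what an ADT for dates might look like, here is a minimal sketch. The module name dates and its functions are hypothetical (nothing like this ships with OTP as far as I know); the only library call it relies on is calendar:valid_date/3, which does exist. The point is that the check happens once, at construction time.

```erlang
-module(dates).
-export([new/3, to_tuple/1]).
-export_type([date/0]).

-record(date, {ymd :: calendar:date()}).
-opaque date() :: #date{}.

%% The only way to obtain a dates:date() is through new/3,
%% so every value of that type is a date that actually exists.
-spec new(integer(), integer(), integer()) -> date().
new(Year, Month, Day) ->
    case calendar:valid_date(Year, Month, Day) of
        true  -> #date{ymd = {Year, Month, Day}};
        false -> error({invalid_date, {Year, Month, Day}})
    end.

%% Conversion back to the tuple that the calendar functions expect.
-spec to_tuple(date()) -> calendar:date().
to_tuple(#date{ymd = YMD}) -> YMD.
```

With something like this, dates:new(2016, 2, 31) fails right away instead of producing a value that looks like a date but isn't one.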

Entities that are represented by multiple Models

Back again to our previous example with the invoice items. If your system lets you represent those items in the air (i.e. not actually tied to the invoice to which they belong), what happens if you first create them and then you forget/fail to create the invoice itself? What do those items represent then? What real-life entity will they be modeling?

Models that represent multiple Entities in the real world

Hernán in his talk used the number 0 as an example for this. In the sense that 0 can be 0 meters, 0 items, 0 invoices, false, etc. And, since 0 by itself doesn't carry any other information about what it is being used for, when you're trying to debug the system you need to check the context in which that value is used/created/etc. to see what that 0 actually means. That, most of the time, is actually pretty complex, and many times it is just impossible.

In Erlang, I've seen that happen countless times. Even dialyzer is often confused about what your [] actually is. If you have a function that receives lists of invoices ([invoices:invoice()]) and you're incorrectly calling that function with lists of users ([users:user()]), dialyzer won't complain since an empty list of users is also a totally valid list of invoices.

Many of those empty-list scenarios are simply unavoidable, but other, similar problems are not. In real life, an invoice only becomes an invoice once it's fully written; before that, it's just a piece of paper, maybe an invoice template. Your system should properly model that reality: you should not use a single ADT for both invoices and invoice_templates. If you keep those two things in independent modules, you won't need to check the status of each invoice everywhere to see whether you can use its amount or it's still not ready.

2. Immutability

This one is nothing new to my fellow Erlangers, but it's always good to see that people from OOP World also praise Immutability as we do. As a matter of fact, in support of immutability, Hernán exhibited the same benefits and reasonings we're constantly expressing to the world. By keeping objects immutable, we never need to know in which context this object produced that error. Hernán expressed this in a very nice way:

By using immutable objects you don't need to consider the passage of time when you're building or debugging your system.

The remaining guidelines in this list will help you achieve this goal as well.

3. Complete Models

Remember: our models represent entities in real-life. And many things in our world cannot be created with missing parts. To continue with our invoice example, real-life invoices are required to have both a customer and a total. Therefore, there can be no invoices with those fields missing.

Imagine that we change our ADT to allow you to build up an invoice field by field, like this:


…
AnInvoice = invoices:new(),
AnInvoiceWithCustomer =
  invoices:customer(AnInvoice, Customer),
AFullInvoice =  
  invoices:amount(
    AnInvoiceWithCustomer, Amount),
…

That might seem silly in this example; but imagine if those 3 steps are executed in different processes or, at least, different functions. Believe me, this example is not as far-fetched as it looks.

The problem here is that, between the call to invoices:new/0 and the call to invoices:amount/2, what do we really have? What real-life entity are AnInvoice and AnInvoiceWithCustomer representing? The answer is certainly none. And that's bad.

The proper implementation is the one we have above: the invoices constructor takes all the arguments it needs and generates a 100% complete invoice instance that can already be used as an invoice anywhere.

4. Valid Models

Even when you have complete entities, another way to fail in representing the real-world entities in your system is to allow the user to create invalid instances of your models.

Let's say that invoices in real life can't have amounts lower than or equal to 0. Our current implementation of the invoices model will allow invalid invoices to be created. In that case, two things might happen: either the rest of the system is aware of that behavior and the devs have to add unneeded checks everywhere to verify that what we have is actually a proper invoice (by checking that amount > 0), or the rest of the system assumes that invoices are all valid and it fails at runtime if that's not true.

The much better way to implement our model is to do it this way:


-spec new(binary(), number()) -> invoice().
new(Customer, Amount) when Amount > 0 ->
    #invoice{ id       = uuid:new()
    …

Here, our users will simply not be able to create an invalid invoice and we'll never need to deal with those things anywhere. Erlang allows you to filter those invalid parameters right there in the function head, which is really nice. And sometimes (like with integers) you can even limit the type specification using types like pos_integer() or non_neg_integer(), thereby letting dialyzer help you identify misuses of your functions.
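
Here is a small sketch that combines both ideas: the guard in the function head and a narrower type in the spec. It assumes (and this is not how the invoices above work) that amounts are stored as a positive integer number of cents, which is what makes pos_integer() applicable. The module name invoices_in_cents is made up for this example and the other fields are omitted.

```erlang
-module(invoices_in_cents).
-export([new/2, amount/1]).
-export_type([invoice/0]).

-record(invoice, { customer :: binary()
                 , amount   :: pos_integer()  % assumption: amount in cents
                 }).
-opaque invoice() :: #invoice{}.

%% The spec tells dialyzer what a valid call looks like;
%% the guard rejects invalid input at runtime with a function_clause error.
-spec new(binary(), pos_integer()) -> invoice().
new(Customer, Amount)
  when is_binary(Customer), is_integer(Amount), Amount > 0 ->
    #invoice{customer = Customer, amount = Amount}.

-spec amount(invoice()) -> pos_integer().
amount(#invoice{amount = Amount}) -> Amount.
```

A call like invoices_in_cents:new(<<"ACME">>, 0) never returns an invoice, so no other module has to wonder whether the amount is greater than zero.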

EDITOR NOTE
Heuristic 5 in the original article had a few misconceptions and years later I figured out I was wrong. Besides that, Erlang/OTP was improved and the explanations there no longer made sense. That's why I didn't include it in this re-edition.

6. No Setters

Continuing with our support for immutability and all that we've learned from items 1 to 4, it's easy to see why setters should be almost nonexistent in our models, right? If instances of our models are immutable, complete, and valid, why would you need to change any of their properties?

But, there are some cases where the entities you are modeling actually change over time. Let's say you need your system to let you add a mark to the invoice once it's stored in your archives. To keep that info, you might add this field to your record definition:


…
    , amount :: number()
    , archived :: undefined|calendar:datetime()
    }
…

I usually use dates instead of booleans for those kinds of marks, so you can have a little bit more information there which might prove valuable later.

Now it's time to allow your users to mark the invoices. You can add a setter for that:


-spec archived(invoice(), calendar:datetime())
      -> invoice().
archived(Invoice, Date) ->
  Invoice#invoice{archived = Date}.

But that has a couple of problems:

- It exposes your internal representation of that mark. After that, you won't be able to change it.
- It allows users to generate invalid invoices, thus violating heuristic #4. Users can provide invalid dates, maybe dates that are in the future or dates that are older than the invoice's date. You can add validations for that, but is it really the way?

A better way to implement the same functionality is to give meaning to your function. Instead of creating a setter, just create a function that reflects the behavior of your ADT. A function that implements the action of archiving the invoice:


-spec archive(invoice()) -> invoice().
archive(Invoice) ->
  Invoice#invoice{
    archived = calendar:universal_time()}.
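
The same reasoning can be applied to reading the mark. The following sketch pairs archive/1 with a hypothetical is_archived/1 accessor (not part of the original module) that answers the question callers actually care about without revealing that the mark is stored as a datetime. The module name archivable_invoices is also made up, and the id and date fields are left out to keep the example short.

```erlang
-module(archivable_invoices).
-export([new/2, archive/1, is_archived/1]).
-export_type([invoice/0]).

-record(invoice, { customer :: binary()
                 , amount   :: number()
                 , archived :: undefined | calendar:datetime()
                 }).
-opaque invoice() :: #invoice{}.

%% New invoices start with archived = undefined (the record default).
-spec new(binary(), number()) -> invoice().
new(Customer, Amount) when Amount > 0 ->
    #invoice{customer = Customer, amount = Amount}.

%% The action of archiving: the module itself decides which date to store.
-spec archive(invoice()) -> invoice().
archive(Invoice) ->
    Invoice#invoice{archived = calendar:universal_time()}.

%% Hypothetical accessor: callers learn whether the invoice is archived,
%% not how that fact is represented internally.
-spec is_archived(invoice()) -> boolean().
is_archived(#invoice{archived = undefined}) -> false;
is_archived(#invoice{}) -> true.
```

If the mark ever needs to become something other than a datetime, only this module has to change.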

7. Update Objects by using other Objects

And this final one is a problem that mostly applies only to OOP languages. The issue arises when you want to replace an immutable object with another one. Let's say you have multiple references to theLastInvoice, which is an invoice, and you want to replace that one with another. Let's say that, for some reason, you need to keep that exact same object but with a whole new set of values. What Hernán recommended, instead of having multiple setters and calling them one by one to update all the properties (therefore having to add proper validation in each of the setters), is to have a single method, called syncWith:anInvoice, that takes an already complete and valid invoice and syncs the current one with it.

I've never faced such a problem in Erlang or Haskell, so I don't really know if it even makes sense to try to map this heuristic to the functional paradigm. In any case, if there is a reader out there with experience on this front who wants to give us some insight, please do it in the comments below.

Final Words

As I stated above, these heuristics are just that: heuristics, pieces of advice. They are valuable not as rules, but in the sense that they may open your mind and get you thinking about how you architect and develop your systems. They may also lead you (as they led me a couple of years ago) to review your approach to programming in general.

Seeing programming as something much closer to an art or science than engineering has helped me in many ways and it's something that made me a better developer, I'm sure of that. Hernán played a huge role in that and I'll always be grateful for it. I hope this article helps open the doors for you to walk that same path. And, if you're already on that path, I hope you've found something to move you a step forward :)

Appendix A

This is the final version of invoices.erl:


-module(invoices).
-record(invoice,
    { id       :: binary()
    , date     :: calendar:datetime()
    , customer :: binary()
    , address  :: undefined|binary()
    , amount   :: number()
    , archived :: undefined|calendar:datetime()
    }).
-opaque invoice() :: #invoice{}.
-export_type([invoice/0]).
-export(
  [new/2, amount/1, address/1, archive/1]).
-spec new(binary(), number()) -> invoice().
new(Customer, Amount) when Amount > 0 ->
  #invoice{ id       = uuid:new()
          , date     = calendar:universal_time()
          , customer = Customer
          , amount   = Amount
          }.
-spec amount(invoice()) -> number().
amount(#invoice{amount = Amount}) -> Amount.
-spec address(invoice()) -> undefined|binary().
address(#invoice{address = Address}) -> Address.
%% As per heuristic 6: an action on the ADT instead of a setter.
-spec archive(invoice()) -> invoice().
archive(Invoice) ->
  Invoice#invoice{
    archived = calendar:universal_time()}.