cft

Introduction to TreeSet and TreeMap

Implementation and complexity


user

Paul Milian

3 years ago | 3 min read

Do you usually use a HashSet or a HashMap implementation whenever you need to use a Set or a Map? Do you know about TreeMap or TreeSet? If not, you’re in luck as we will have a look at what they are, why, and when they might be useful.

Photo by Brandi Redd on Unsplash

Implementation and complexity

Just like HashSet is implemented using a HashMap, TreeSet is implemented using a TreeMap. The TreeMap itself is implemented using a red-black tree which is a self-balancing binary search tree. Since it uses a binary tree, the put(), contains() and remove() operations have a time complexity of O(log n). Furthermore, since the tree is balanced, the worst-case time complexity is also O(log n).

On the other hand, a HashMap has an average time complexity of O(1) for put(), contains() and remove() operations. The worst-case time complexity for those operations is O(log n) since Java 8, and O(n) before that.

Space-complexity wise, both have a complexity of O(n). However, TreeMap is more space-efficient than a HashMap because, by default, a HashMap is at most 75% full to avoid having too many collisions. This results in unused space. This can be tweaked through the load factor.

O(1) beats O(log n), so why would we ever use something slower?

Navigable & Sorted

TreeMap and TreeSet are both Navigable and Sorted, which is not the case for HashMap and HashSet. By default, the order is the natural order, however, this can be changed by providing a Comparator in the constructor.

Photo by Aaron Burden on Unsplash

NavigableSet

The NavigableSet interface offers a lot of very convenient methods:

  • lower(e): returns the greatest element strictly less than e, or null if there is none
  • floor(e): same as lower(e) but includes elements equal to e
  • higer(e): returns the least element strictly greater than e, or null if there is none
  • ceiling(e): same as higher(e) but includes elements equal to e
  • pollFirst() and pollLast(): returns the smallest and biggest elements respectively, or null if the set is empty

Let’s have a look at some examples:

You can run the code by clicking the green arrow

NavigableMap

The NavigableMap offers the same methods, but suffixed by Entry considers keys: lowerEntry(e), higherEntry(e), etc.

SortedSet

Thanks to this interface, we have access to the following functionality:

  • subSet(a, b): returns the subset of elements that are greater or equal to a and strictly lesser than b.
  • headSet(e): returns a set of elements that are strictly lesser than e
  • tailSet(e): returns a set of elements that are greater or equal to e
  • first() and last() : returns the smallest and biggest elements respectively

Again, an example will probably make a lot more sense:

SortedMap

As for the NavigableMap, the methods are the same as SortedSet but suffixed with Map. The behavior is the same as well and uses the map’s keys for ordering.

Iteration

The Sorted interface gives one main benefit to TreeMap and TreeSet: iteration will follow the sorted order. When it comes to a HashSet or HashMap the order will be that of the hash codes, not the elements. This difference can be the criteria for choosing one implementation over the other. It’s possible to create a sorted container from a HashSet or HashMap but sorting usually has a complexity of 0(n log n).

Concurrency

Both TreeMap and TreeSet are not thread-safe. In order to be thread-safe, you can either use Collections.synchronizedNavigableMap(treeMap), Collections.synchronizedSortedMap(treeMap) or you can use ConcurrentSkipListMap() (replace Map with Set for Sets). It’s implemented using a skip list instead of a tree since skip lists are more efficient in concurrent environments, but the time complexities and behavior are the same.

Conclusion

  • TreeMap and TreeSet are more memory efficient compared to their Hash counterparts. If memory is more important than speed, then this data structure can be a good choice.
  • TreeMap and TreeSet are sorted and can be a perfect choice if you need to fetch ranges or iterate over a specific order very often. Actually, databases use b-trees as indexes for ranged queries for the same reasons.

Hopefully, you learned something new, thanks for reading!

This article was originally published by Paul millan on medium.

Upvote


user
Created by

Paul Milian


people
Post

Upvote

Downvote

Comment

Bookmark

Share


Related Articles