
Monthly UX Doodles: Shazam Edition

How Shazam Works



Michel Abdel Nour


I take on a new tech product every month: explaining how it works, what I think makes it a good or bad product, and how I think it can be improved.

Disclaimer: I am not affiliated in any way with Shazam and this article is solely a passion piece.

How Shazam Works

To understand how Shazam operates in the backend and reveal the magic behind that single button press, it helps to first understand how phones and other devices capture sound waves and convert them into data points that Shazam can use to generate search results.

When you hear a song and think about "Shazaming" it, what you are hearing is a continuous mechanical wave: vibrations from an audio source displacing air molecules at a particular wavelength, amplitude, and frequency.

The recording functions on our personal devices convert the pressure from those continuous mechanical sound waves into discrete electrical signals through a process called sampling, which is straightforward today because virtually every device ships with an analog-to-digital converter.
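To make sampling concrete, here is a minimal Python sketch of how an analog-to-digital converter reduces a continuous wave to discrete values. The 440 Hz tone, its amplitude, and the 44.1 kHz rate are illustrative choices of mine, not Shazam's actual parameters:

```python
import numpy as np

# Simulate sampling: an ADC measures the continuous pressure wave at fixed
# intervals. Here, a 440 Hz tone (concert A) sampled at 44.1 kHz, a common
# audio sampling rate.
SAMPLE_RATE = 44_100  # samples per second
DURATION = 1.0        # seconds

t = np.arange(0, DURATION, 1 / SAMPLE_RATE)   # discrete time points
signal = 0.5 * np.sin(2 * np.pi * 440 * t)    # amplitude 0.5, frequency 440 Hz

print(f"{len(signal)} discrete samples represent 1 second of continuous sound")
```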

By setting a sampling rate and other parameters, a device records samples of a signal while quantifying properties such as its frequency content and its amplitude over time. This three-dimensional representation of a signal (time, frequency, and amplitude) is known as a spectrogram.

Example of a spectrogram

A spectrogram is a good visual representation of a signal: it makes it easy to identify the highest-amplitude frequencies in a sound.
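As an illustration, a spectrogram like the one above can be computed in a few lines of Python. The toy two-tone signal here is my own stand-in for a real recording:

```python
import numpy as np
from scipy import signal as sig

SAMPLE_RATE = 44_100
t = np.arange(0, 1.0, 1 / SAMPLE_RATE)
# A toy "song": two tones, 440 Hz and 880 Hz, mixed together.
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Split the recording into short frames and analyze each one:
# frequencies = the frequency bins, times = the frame timestamps,
# amplitudes = the magnitude of each frequency in each frame.
frequencies, times, amplitudes = sig.spectrogram(audio, fs=SAMPLE_RATE)

print(amplitudes.shape)  # (n_frequency_bins, n_frames): the 3D surface as a grid
```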

However, processing every data point in the 3D space of a full spectrogram to identify a song would take far too long compared to how fast Shazam actually is.

Instead, Shazam creates what are called fingerprints for sounds it has already sampled in its backend. Fingerprinting reduces the number of data points the app needs to process by projecting the spectrogram into a 2D space, keeping only the highest-amplitude frequency in each sound frame.

Selecting these dominant frequencies frame by frame shifts the representation of the sound from the time domain toward the frequency domain.
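Following the simplified description above (one peak per frame; real fingerprinting systems typically keep several peaks per frame across frequency bands), a minimal sketch of that reduction could look like this:

```python
import numpy as np

def frame_peaks(amplitudes: np.ndarray, frequencies: np.ndarray) -> np.ndarray:
    """For each time frame, keep only the frequency with the highest amplitude."""
    peak_bins = amplitudes.argmax(axis=0)   # index of the loudest bin per frame
    return frequencies[peak_bins]           # one dominant frequency per frame

# Using the spectrogram computed earlier:
#   fingerprint = frame_peaks(amplitudes, frequencies)
# 'fingerprint' is now a 1D sequence of dominant frequencies over time,
# a far smaller representation than the full spectrogram grid.
```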

Keeping this explanation at the conceptual level, the key point is that this processing method significantly reduces the amount of computation the app needs to identify sounds.

Shazam then compares the generated fingerprints with the fingerprint records stored in its backend (I won't go into the exact process right now), finds the matching sounds, and generates a result based on the sequence of sampled fingerprints.

How Shazam builds and uses its sound fingerprints to find search matches
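To give a flavor of the matching step I'm glossing over, here is a deliberately simplified voting sketch. The names, and the idea of treating each fingerprint as a plain integer lookup key, are my own illustration; Shazam's real matcher works with hashes derived from peak pairs and also verifies that matches line up consistently in time:

```python
from collections import Counter, defaultdict

# Hypothetical in-memory "backend": maps a fingerprint key to the songs
# (and frame positions) where it occurs. In practice this is a large
# indexed database, not a Python dict.
fingerprint_index: dict[int, list[tuple[str, int]]] = defaultdict(list)

def index_song(song_id: str, fingerprints: list[int]) -> None:
    """Store a song's fingerprint keys along with their frame positions."""
    for position, fp in enumerate(fingerprints):
        fingerprint_index[fp].append((song_id, position))

def best_match(sample_fingerprints: list[int]) -> str | None:
    """Count how many of the sample's fingerprints each song shares; most wins."""
    votes: Counter[str] = Counter()
    for fp in sample_fingerprints:
        for song_id, _position in fingerprint_index.get(fp, []):
            votes[song_id] += 1
    return votes.most_common(1)[0][0] if votes else None
```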

Since the purpose of this article is mainly to propose an improvement to Shazam, I will not go into more detail on how it works. However, I do recommend this article, which does a great job of explaining the process and was of great help to me when writing this first section.

Why Is Shazam a Great App?

For the most part, I would say Shazam is a great app simply because, well, it's really simple. But to expand on that thought, here are a few arguments for why Shazam is such a good product.

1. It hides the complexity of its backend very well and doesn't let it spill over into the user experience. I believe this is what every product should aim to do, and the fact is, Shazam does it really well.

2. Shazam knows how to KISS (no, not literally): KISS is a design principle that stands for Keep It Simple, Stupid. Don't get me wrong, "stupid" has a positive connotation here, and this is why I say that:
Shazam as a product doesn't need to be more complicated than it is, and Shazam has largely kept it that way. The base feature is still a simple button press, something anyone can do very easily, which meets users' needs in various conditions (even when there is surrounding noise or speech during the recording).

3. It is very well integrated with other apps. Users can easily connect their Spotify or Apple Music accounts through the app.

4. Well, it gives great song recommendations!

Time for my UX Doodle!

Think of a time when you heard a song in a YouTube video. If the song has lyrics, you can sometimes search them on Google to find it; if it doesn't, there's not much you can do other than ask in the comments what the song is.

Today, I am showing you guys a very simple UX design, a proof of concept, for a feature I believe could allow Shazam to:

  • Scale further and increase the number of touch-points with their users
  • Fill an existing gap/pain point in the product

While I kept the design extremely simple, the concept is there:
it is mainly a way for the app to be ubiquitous on the device (a built-in button), letting the user activate Shazam from whichever platform they are in, whether that's Instagram, YouTube, or even the photo gallery.

Implementing this feature would require a dynamic (movable) on-screen button that passes a sample of the device's audio output to Shazam.

The format of that captured input would need to match the input Shazam normally receives, so that the same comparison can run as when the Shazam button is pressed inside the app.
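As a rough sketch of that format-matching step, the captured audio would need to be downmixed and resampled to whatever the recognizer expects. Everything here (the 16 kHz target, the function name) is a hypothetical assumption for illustration, not Shazam's documented input format:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# Hypothetical glue code for the proposed feature: whatever PCM buffer the OS
# hands us from the device's audio output, downmix it to mono and resample it
# to the rate the recognizer expects.
TARGET_RATE = 16_000  # assumed target rate

def normalize_capture(pcm: np.ndarray, source_rate: int) -> np.ndarray:
    if pcm.ndim == 2:              # stereo buffer shaped (n_samples, n_channels)
        pcm = pcm.mean(axis=1)     # downmix to mono
    g = gcd(TARGET_RATE, source_rate)
    # Resample by the rational factor TARGET_RATE / source_rate.
    return resample_poly(pcm, TARGET_RATE // g, source_rate // g)
```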

The notification is simply there to alert the user that the process has completed and to give them the option to view the result inside the app.

Stay tuned for another piece next month! If you would like to give me feedback or suggestions for future pieces, don’t hesitate to get in touch.
