How Shazam Works

If you’re anything like me, you’ve always been fascinated by Shazam. How is it possible to have a program that listens to a random portion of a random song through various microphones in various environments and come up with a song? For a while I chalked it up to black magic and intangible analysis, but I was led down the path of some fancy audio analysis that helped me make the jump to 10,000 foot understanding.

Fourier Transform

The first thing to get an understanding of is the Fourier Transform. Essentially this will take a signal (in the case of Shazam, a microphone signal) and determine the frequencies (pitches) that make up that signal. A bass guitar will have much lower frequencies than a piccolo. The Fourier Transform can yield important information about all of the pieces that make up what we actually hear. (Hint: Save for electronic sounds, all tones are made up of many different frequencies, leading to what is called timbre.)

The Fast Fourier Transform is what is typically used in the digital world. Your cell phone lives in the digital world. I’ll keep out how fascinating the FFT matrix actually is, and just leave the fact that in order to run an FFT on data, enough time has to be recorded. (If you want to find a 20 Hz signal, it helps to have more than 1/20 seconds.)

Windowed FFT

Now that there is some information about FFTs, I can get into more detail about Shazam. Sure, one can take a Fourier Transform of “In The Garden Of Eden” by I. Ron Butterfly, but that doesn’t really make sense unless you’re looking for underlying patterns. Where the powerful information comes out is when you look much closer. Since the human ear can only hear sounds as low as about 20 Hz, it makes sense to start around there – maybe take samples that can handle half that frequency. This would be 10 Hz, or 100ms of data.

Taking the frequency data of the first 100ms also doesn’t yield very interesting results either. You might be able to get an idea what the first few notes are. But what can happen is you could take another window from, say 1ms-101ms. You could keep doing this “windowed” FFT across the entire 17 minutes of In-a-gadda-da-vida, and you have all the data you need to analyze and catalog the song in a Shazam database!

This technique has been done in several facets of Arts and Engineering: Aphex Twin did it in 1999, Music Visualization Software has been doing it for decades, and there is a pit of Wikipedia pages one can dig through for weeks, if so inclined.

Mel-Frequency Cepstrum

I should mention this, though I feel it isn’t necessary for understanding. Back to how humans hear, the Mel-Frequency Cepstrum Coefficients are used to essentially make the frequencies from an FFT more relative to human hearing sensitivity. It helps to use, but isn’t necessary to understand the fundamentals.

Artifacts

At this point, I should cite the post that sparked my interest in this topic. https://blog.francoismaillet.com/epic-celebration/. He discusses (with images) the idea of using the MFCC to pull out specific sounds. Here he shows the relationship between a sound and the MFCC graph.

With a song, different things will show up in different ways. A singer’s voice, a cymbal crash, keyboard notes… Each of these sequences are unique to a song. Well, almost unique. But they essentially make up a fingerprint of the song.

I’ll leave out optimization (peak finding algorithms), but suffice it to say specific “points” can be found which have the “Maximum Intensity for a Frequency at a Time.” let’s just assume we can determine those with ease.

So now we have a bunch of points of frequency vs time. Using further ingenuity, these points can be cataloged in a database. That’ll possibly be worth another post in it’s own right. But essentially, all that is needed are a few points (differences in time and frequencies of artifacts) and that’s all that is needed for a Shazam match.

Conclusion

Since I didn’t have any sufficient graphs, this might have been a dry read. I’ll possibly fix this in the future. There is an article that I likely read several months / years ago that goes into more detail that can be found here. Shazam has been around for many years, and it wasn’t until I started down the path of some audio analysis that I was able to unveil some of the mysteries of a powerful tool that touches upon many points of Engineering / Computer Science.

Refactoring a MySQL database

First nerdy post here. If you’re anything like me, you’ve found yourself having to refactor your MySQL database more than once. In my case, every time I’ve re-IP’d my local network I have had to do this. This is because I’m using a MySQL database to maintain a synchronized list of media files across multiple instances of Kodi. Maybe that’ll be another post, but since I’ve had to do this more than once I figured I’d try to share this somewhere, if only for my own benefit.

Way back when I had used Python to tie into a MySQL database. I was “comfortable” with this at that time, so I chose Python as my language of choice.

Problem

The main problem that had to be solved was that the database points to a hard-coded local IP. This means that if I want to change my server’s IP from 192.168.1.10 to 192.168.2.10, I have to modify the database manually if I don’t want to lose anything.

Now, Kodi has an interesting database structure. The only thing to really take from that is that everything seems to point back to the main table “path”. Specifically the idPath / strPath pair. Notice that the strPath, below, has the hard-coded IP that I was mentioning. The goal is to change every entry in that table from “192.168.1.10” to “192.168.2.10”

Solution

Enter Python: There is a mysql.connector class that allows one to interact with MySQL from Python.

import mysql.connector

From here, you can connect to the database using your known database address, user name and password:

cnx = mysql.connector.connect(user='username', password='password', host='127.0.0.1', database='MyVideos107')

Now, the thing that I’m most concerned about is the strPath and the idPath. Since I know that idPath will be unique (… it is a database key) I can just select the two columns, refactor the string, and update each entry. First for the selection:

query='SELECT idPath, strPath FROM path'
cursor = cnx.cursor(buffered=True)
cursor.execute(query)

At this point, cursor has every entry of the ‘path’ table. The next thing to do is loop over every entry. Python’s loop syntax can be burdensome, but useful once figured out:

for (idPath, strPath) in cursor:

The next steps of refactoring and replacing are done in a single line of code – well two lines of code but still… Currently this is all in the for loop, but there’s no reason UpdateQuery2 couldn’t be placed in front of everything.

 UpdateQuery2="UPDATE path SET strPath=%s WHERE idPath=%s"
 cursor2.execute(UpdateQuery2, (strPath.replace('192.168.1.10','192.168.2.10'), idPath))

At this point, everything is buffered in the cursor2 object. I left the instantiation out earlier for simplicity, but will show it later. Once the loop is complete, the last thing to do is commit the transaction of cursor2:

cnx.commit()

After running this within Python I went back and looked at the path database and to my delight, everything was updated to the new IP address. I went back to Kodi and almost everything worked. (See improvement suggestions below)

Code

import mysql.connector

cnx = mysql.connector.connect(user='username', password='password', host='127.0.0.1', database='MyVideos107')

query='SELECT idPath, strPath FROM path'

cursor = cnx.cursor(buffered=True)
cursor2 = cnx.cursor()

cursor.execute(query)

for (idPath, strPath) in cursor:
 UpdateQuery2="UPDATE path SET strPath=%s WHERE idPath=%s"
 cursor2.execute(UpdateQuery2, (strPath.replace('192.168.1.10','192.168.2.10'), idPath))

cnx.commit()

Improvements

It’d be nice to be able to further automate this. For example, most versions of Kodi come out with an updated set of tables. It’d be nice to make this script smart enough to sweep through all of those.

Also, there are actually a number of tables where the static IP gets used – specifically the “art” table and “MyMusic” databases. The first step would be to make the above code allow input arguments of database name, key name, and string field name.  Then a list of these values could be looped over to update everything.

Fortunately for me, this is somewhat of a rare occurrence. I re-IP’d when I decided to make a more logical allocation of DHCP reservations. I then had to re-IP in order to allow a graceful VPN into 192.168.1.X networks. So, hopefully I’ll never have to run this again. If I do, I’ll probably make those improvements.