San Francisco, California
As a passionate guitarist, I love the idea of Tabable: an app that can convert any form of audio into guitar tablature, live. I started developing this app in September 2021 and have been actively working on it ever since. It is composed of around 80% Python and 20% Kivy. In this paper, I will explain how I built Tabable 1.1 and the physics that affect its accuracy. While I mention several languages and modules along the way, this paper was not written to explain them, but to explain the process and science behind Tabable. That said, if you have any questions at all, message me; I'd be happy to help.
Building the App
Recording Audio
I used the Python sounddevice module for recording and saving audio clips. This module let me choose how long to record for, and my recording was automatically saved to a .wav file. While working on this project, I used both a Mac and a Windows PC and had to switch the number of channels that sounddevice recorded with, because the two computers' microphones used different numbers of channels: my XLR microphone connected to my PC used two channels, while the onboard MacBook microphone used only one. I also used sounddevice's playback function to, well, play back my recording. In 1.2, I want to implement a volume threshold to determine when to stop recording or, at the very least, 'stop' and 'start' recording buttons, so I don't have to set a rigid recording length.
Audio to Tablature
During the building process, I took much inspiration from Ian Vonseggern’s Note Recognition in Python.
Isolating Individual Notes
I used changes in volume between notes to isolate individual notes. I first wanted to use changes in frequency, because a note's pitch is determined by its frequency; however, I ended up using volume because of the simple problem of playing the same note twice. I followed Vonseggern's suggestion and used Pydub, a Python library for working with .wav files, along with its AudioSegment class. Pydub is very powerful and can manipulate audio in various ways; I mainly used it to find the volume of my recording over time and to do some filtering to increase accuracy. I also used matplotlib to plot volume, in decibels, over time. Matplotlib is an extremely comprehensive library for creating visualizations and plots in Python. You don't need it if you want to recreate Tabable, but I used it to compare my actual note starts with my code's predicted note starts, as Vonseggern did.
When testing the note-start predictions, I noticed that background noise in my recording was causing inaccuracies. This noise often skewed the detected note starts, so I followed Vonseggern's lead and used Pydub's filtering ability to remove all noise below 80 Hz, which slightly increased the accuracy. I also added a minimum volume threshold for a sound to be considered a note, and a minimum increase in volume between notes, so I didn't accidentally count one note as two. To further guard against double counting, I added a rule that prevents two notes from being detected within a 100 ms window. Finally, I had an accurate note-start detection system.
Determining the Notes in Each Segment
The next major step was converting my audio into frequency information. Again, I was inspired by Vonseggern's code: I used SciPy's fast Fourier transform (FFT) function to compute the frequency content of each segment. The highest peak in each segment (remember, each segment should contain only one note because of the isolation done earlier) corresponds to the note played. I also included some of Vonseggern's extra note-classification methods to improve accuracy. And instead of mapping frequencies to note names like 'A', 'B', and 'C', I mapped them to segments of tablature. I spent a lot of time figuring out how to print the tablature sections next to one another so they appear as one full tablature sheet, and I mostly succeeded, with help from the public Python Discord. However, there are still some errors that I will address in 1.2.
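The core idea can be sketched as follows: take one segment's spectrum, find the tallest peak, then map that frequency to a (string, fret) position in standard tuning. The `to_tab_position` helper and its lowest-fret rule are my own illustration, not Tabable's actual code.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

# Standard-tuning open-string frequencies in Hz, high e first (assumed tuning).
OPEN_STRINGS = [("e", 329.63), ("B", 246.94), ("G", 196.00),
                ("D", 146.83), ("A", 110.00), ("E", 82.41)]

def dominant_frequency(samples, sample_rate):
    """Return the frequency of the tallest peak in one segment's spectrum."""
    spectrum = np.abs(rfft(samples))
    freqs = rfftfreq(len(samples), 1 / sample_rate)
    return freqs[np.argmax(spectrum)]

def to_tab_position(freq, max_fret=20):
    """Hypothetical helper: map a frequency to a (string, fret) pair,
    preferring the lowest fret. 12 * log2(freq / open) counts semitones
    above the open string."""
    best = None
    for name, open_freq in OPEN_STRINGS:
        fret = round(12 * np.log2(freq / open_freq))
        if 0 <= fret <= max_fret and (best is None or fret < best[1]):
            best = (name, fret)
    return best
```

For example, 440 Hz (A4) lands on the 5th fret of the high e string, the lowest-fret way to play that note in standard tuning.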
Visuals
For the GUI, I used Kivy for its Python compatibility and its deployability: with the Buildozer tool, a Kivy app can be packaged for iOS or the Google Play Store, and it can even be turned into an .exe file. If I did this again, I would still use Kivy because of its extremely helpful users and public Discord, which I sometimes turned to for help during development.
I created four pages for my app plus a popup screen. The first page is the 'home screen,' and the second has all the prep tools, like recording and replaying your audio clip. Currently, the recording time is set to five seconds; however, as stated earlier, I will introduce a volume threshold or 'start' and 'stop' recording buttons in 1.2. I also added 'back' and 'next' buttons where necessary. These first two pages of Tabable contain graphics that I created in Adobe Illustrator:
The third page has one button that says 'generate tablature.' When it is pressed, the user is immediately brought to a new window, where a scrollable popup displays the tablature. I've also added a horizontal limit to the popup, so the tablature doesn't continue forever on one line. Finally, when the popup is closed, a button brings the user back to the home screen to start recording and tabbing again. I love the ease of use of this app, and I think Tabable has the potential to be a very helpful app in somebody's library if I keep improving it. At the time of writing, all the code is on GitHub, and I will soon publish version 1.1 of the app on the Google Play Store. I'm excited to keep working on this and am motivated to keep pushing it further.
Why Are There Inaccuracies?
The Physics of Sound
A frequency graph of a “perfect” note looks like this:
As you can see, the wave is not complex. If a sound file containing only "perfect" notes were run through Tabable, I'm sure it would produce perfect tablature 99% of the time. The problem is that most sounds in this world are complex. Why does a guitar sound different from a piano, even when both play the same note? It's because a guitar's sound waves differ from a piano's. If every sound were "perfect," meaning its frequency graph was perfectly sinusoidal, then every instrument, person, or anything else would sound the same when producing the same note.
However, overtones exist. Notice that the graph above is "perfect" because it contains only one overtone: the 0th overtone, or fundamental frequency. Here's a quick visual of what happens when a perfect note is combined with one of its overtones.
Almost all of the time, the 0th overtone is the frequency humans hear most prominently: if you hear a C4 (a C note in the 4th octave), that C4 is likely the 0th overtone of the note being played. A note's overtones are whole-number multiples of the 0th overtone's frequency. Assuming a note's 0th overtone is 80 Hz, its 1st overtone will be around 160 Hz, its 2nd around 240 Hz, and so on. Different instruments contain different amounts of each overtone in the notes they play, and the combination of these overtones is what creates the sound humans hear. Usually, the strength of each overtone decreases the further it is from the 0th overtone.
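The overtone arithmetic above can be made concrete with a few lines of NumPy: sum a fundamental with weaker components at integer multiples of its frequency. The specific amplitudes here (1.0, 0.5, 0.25) are illustrative only; real instruments have their own overtone recipes.

```python
import numpy as np

SAMPLE_RATE = 8000

def synthesize(fundamental_hz, overtone_amps, duration_s=1.0):
    """Sum the 0th overtone (the fundamental) with its integer multiples.
    overtone_amps[k] is the amplitude of the k-th overtone, so the k-th
    component sits at (k + 1) * fundamental_hz."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return sum(amp * np.sin(2 * np.pi * fundamental_hz * (k + 1) * t)
               for k, amp in enumerate(overtone_amps))

# A "perfect" 80 Hz note has only the fundamental; a richer, instrument-like
# note adds weaker components at 160 Hz, 240 Hz, and so on.
pure = synthesize(80, [1.0])
rich = synthesize(80, [1.0, 0.5, 0.25])
```

Plotting `pure` and `rich` over a few cycles shows exactly the difference described here: the pure note is a clean sinusoid, while the rich note has the bumpier shape that makes an instrument sound like itself.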
How does this affect Tabable?
Tabable would be most accurate if it could analyze uniform, sinusoidal frequency graphs. However, because most sounds are complex, the various peaks and troughs that make up a complex frequency graph confuse the program. On top of the already complex sound of some instruments, background noise above 80 Hz adds to the complexity: Tabable analyzes the entire audio clip, not just the instrument, so any background noise the clip captures gets added to the frequency graph. So, when using Tabable, eliminate as much background noise as possible for maximum accuracy.
Future fixes
I plan on running some tests for 1.2 with a possible volume threshold; hopefully it helps accuracy. Another helpful fix would be a way of isolating the fundamental frequency of each note from its other overtones. I would then remove the other overtones from the frequency analysis, so the program would only find notes from a uniform, sinusoidal wave. As I mentioned before, I'm certain Tabable's accuracy would become close to perfect if I could implement this. However, I'm treading in uncharted territory here; it may take some time to figure out, or it may not work at all. Nonetheless, I'm determined to see if it's possible.
Thanks for reading! It's been a pleasure creating Tabable and writing this report. Be sure to check out Tabable's GitHub page and download it from the Google Play Store. Also, check out its website! Lastly, here's a video of Tabable in action: https://vimeo.com/727893778