dnb.ck

Use AI to turn your voice into Drum and Bass

February 7, 2023

The story of music and AI is new and rapidly evolving. But let’s start at the very beginning.

One of the earliest uses of AI in music was genre classification. The first approach involved a simple feature extractor that would convert a song or sound clip into an n-dimensional vector, where each number represents some measurable quality of the audio. For instance, one dimension might represent the spectral centroid, while others might represent various MFCCs (Mel Frequency Cepstral Coefficients).
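The feature-extraction idea can be sketched in a few lines of Python (a hypothetical illustration with made-up names; the actual project uses ChucK's built-in analyzers). Here, `spectral_centroid` computes the "center of mass" of a magnitude spectrum, one of the quantities a dimension of the feature vector might hold:

```python
def spectral_centroid(magnitudes, sample_rate):
    """'Center of mass' of a magnitude spectrum, in Hz.

    `magnitudes` is assumed to be the positive half of an FFT magnitude
    spectrum, so bin i corresponds to frequency i * sample_rate / (2 * n).
    """
    n = len(magnitudes)
    freqs = [i * sample_rate / (2.0 * n) for i in range(n)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silence: no meaningful centroid
    # magnitude-weighted average frequency
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total
```

A full extractor would append this value, MFCCs, and other features into one vector per sound.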

My first experiment involved training simple KNN (k-nearest neighbors) models on 1000 audio samples spanning 10 genres, using feature vectors of increasing complexity, to see how well they perform at genre classification.

Model 0: I started with a 1-dimensional feature vector containing just the spectral centroid. This model had an accuracy of about 17%.

Model 1: Then I added spectral flux and RMS for a total of 3 dimensions, and it performed much better, with an accuracy of about 30%.

Model 2: Model 2 was 8 dimensions, 5 of which were MFCCs. This model had an accuracy of about 34%.

Model 3: Model 3 now includes 20 MFCCs for a total of 23 dimensions. This model had an accuracy of about 43%.

Model 4: Model 4 was the same as Model 3, with 2 more dimensions added for the 25% and 75% spectral roll-offs. The accuracy was still around 43%.
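The classification step is the same in all five models; only the feature vector grows. A minimal k-nearest-neighbors classifier (sketched here in Python rather than ChucK, with hypothetical names) looks like this:

```python
import math
from collections import Counter

def knn_classify(query, examples, k=5):
    """Predict a genre for `query` by majority vote among the k
    labeled feature vectors closest to it in Euclidean distance.

    `examples` is a list of (feature_vector, genre_label) pairs.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Adding dimensions (flux, RMS, MFCCs, roll-offs) only changes the length of the vectors, which is why the same model could be grown from 1 to 25 dimensions without touching the classifier.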

The second experiment was to find a way to use this classification system to generate new music. I wanted to turn my voice into a drum and bass track.

So far, my project kinda does that. Essentially, the program takes input from the microphone, finds the dnb samples (1 drum and 1 bass) whose feature vectors are most similar to the input, and then plays them. Here it is kinda working:
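The retrieval step can be sketched like this (a Python illustration of the idea; the sample names and feature values are made up, and the real program does this in ChucK):

```python
import math

def nearest_sample(mic_features, sample_bank):
    """Return the name of the sample whose precomputed feature vector
    is closest (Euclidean) to the live mic's feature vector."""
    return min(sample_bank,
               key=lambda name: math.dist(mic_features, sample_bank[name]))

# hypothetical 2-D feature vectors for a tiny sample bank
drums = {"amen.wav": [0.8, 0.3], "break2.wav": [0.2, 0.9]}
basses = {"sub.wav": [0.1, 0.1], "reese.wav": [0.9, 0.7]}

mic = [0.75, 0.35]  # features extracted from the microphone input
drum_choice = nearest_sample(mic, drums)    # 1 drum ...
bass_choice = nearest_sample(mic, basses)   # ... and 1 bass to play
```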

Meet dnb-synthesis-mic.ck:

I was impressed by the program's robustness and how well it mimicked what my voice sounded like. Next, I had to think about improvements that I could make to the model that I had already created.

One missing feature was the ability to tweak parameters while the program was running. I wanted to be able to turn certain bass and drum tracks on and off so I could better create shifts in dynamics throughout a performance. For this, I added the ability to increase or decrease the number of bass and drum tracks played simultaneously. There could be, for example, 1 bass track running alongside 4 drum tracks.

My original program was also limited by its fixed, relatively long synth window. So, in part two, I added the ability to change the synth window length for the bass and drum tracks individually. Now, you can synthesize the bass and drums at different rates while keeping them in sync.

Controls (on keyboard):

number of bass tracks: w increases, and s decreases

number of drum tracks: e increases, and d decreases

bass synth window size: r doubles it, and f halves it

drum synth window size: t doubles it, and g halves it
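The control scheme above amounts to a small piece of mutable state plus a key dispatcher. Sketched in Python (the key bindings are from the list above; the initial values and the floor of 0 tracks are assumptions):

```python
def make_controls():
    """Return a key handler closed over the performance state."""
    state = {"bass_tracks": 1, "drum_tracks": 1,
             "bass_window": 1.0, "drum_window": 1.0}

    def handle_key(key):
        if key == "w":
            state["bass_tracks"] += 1
        elif key == "s":
            state["bass_tracks"] = max(0, state["bass_tracks"] - 1)
        elif key == "e":
            state["drum_tracks"] += 1
        elif key == "d":
            state["drum_tracks"] = max(0, state["drum_tracks"] - 1)
        elif key == "r":
            state["bass_window"] *= 2  # double bass synth window
        elif key == "f":
            state["bass_window"] /= 2  # halve bass synth window
        elif key == "t":
            state["drum_window"] *= 2  # double drum synth window
        elif key == "g":
            state["drum_window"] /= 2  # halve drum synth window
        return state

    return handle_key
```

One nice property of only doubling and halving: the bass and drum window lengths always stay in a power-of-two ratio, which makes it easy to keep the two loops in sync at different rates.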

The video below demonstrates the sonic results of the final program. The video in the background is mostly irrelevant; it just shows my friend messing around with the program. The text flying across the screen shows how often each drum and bass window is synthesized.

Many things could be improved about this program; however, I do not have infinite time. If I did, I would add envelopes to the various sounds to remove pops during playback, and I would add a low-pass filter controlled by your voice (I think that would sound pretty cool, since it would seem to be coming out of your mouth).

But the most interesting part of this project was how it felt to create music with it. It didn’t feel like playing a smart instrument, but it also didn’t feel like playing a dumb instrument. There were essentially 9 inputs to the instrument: 8 keys and my voice. I had to play around with it for a while before I figured out interesting ways to transition between sections and add dynamics to arrangements. But there was also an aspect of the instrument that was inherently mysterious. Maybe I just don’t know how to play it well enough, but part of me thinks the underlying math it uses is inherently unintuitive. We don’t listen to sounds and hear their spectral centroid or their MFCCs, so in some regards, the sounds that came out of the program were surprising.

When using dnb-synthesis-mic.ck, it felt like I was performing while also experiencing a performance. It was, in some ways, a duet between man and machine. It was a partner dance where I took the lead, but she took me places I didn’t even consider.

It’s not a perfect tool by any means, but I think the experience of using dnb-synthesis-mic.ck perfectly walks the line between incorporating enough AI elements and maintaining control through various knobs and inputs.

Here is the code in all of its glory. Huge thank you to Ge Wang for writing most of it.