Awesome work @DimensionsTomorrow I’m inspired. Seems like one of your motivations aligns somewhat with mine (“rolling the dice and find all sorts of interesting stuff you didn’t know you had”)
Nice! In everything I’ve been building I’ve been putting in random generators (usually with various kinds of weightings to make them more useful). They are great to use as idea starters! I’ve found them super helpful for everything from graphic design (color palettes, font sizes, geometric patterns, etc) to musical phrasing (keep the notes, rearrange the rhythmic phrasing, etc) to sample selection. Maybe I’ll share some screenshots of some of my other builds later.
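For anyone curious what a weighted random generator can look like, here is a minimal Python sketch. The option names and weights are made up for illustration, and `weighted_pick` is a hypothetical helper, not code from any of the builds mentioned here.

```python
import random

# Hypothetical options and weights, purely for illustration.
# A higher weight means the option comes up more often.
PALETTE_WEIGHTS = {
    "warm": 5,
    "cool": 3,
    "monochrome": 1,
    "neon": 1,
}

def weighted_pick(weights, k=1, seed=None):
    """Return k options drawn with probability proportional to their weights."""
    rng = random.Random(seed)  # pass a seed to make a roll repeatable
    options = list(weights)
    return rng.choices(options, weights=[weights[o] for o in options], k=k)
```

Calling `weighted_pick(PALETTE_WEIGHTS, k=3)` rolls three ideas at once. Nudging the weights is how you would steer the dice toward results you tend to find more useful.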
Luddite warning…
I would like to challenge the notion that AI, in the process of solving our problems, will create an outcome in which we are less dependent on AI. Assuming that AI produces output more sophisticated than we can write freehand, or even understand, and given software’s inevitable bugs, users will end up reaching for AI again to fix the imperfections in their existing code. All the while, human ignorance of the AI-written code will grow orders of magnitude faster than whatever knowledge is gained. The general term for this is “technical debt”. Thus progresses the death spiral of dependence. Go ahead, disagree, but the industry is betting on this very outcome.
I’m not getting AI to teach me music. I’m making something that has all of the exercises I need for a lifetime of practice built in.
For instance, in the Weimar Jazz Database researchers have painstakingly encoded and tempo-mapped a massive number of jazz performances by the greats (Miles Davis, Coltrane, Charlie Parker, etc). While it’s only one aspect of what I’m creating, imagine being able to pull up actual phrases in the actual timing of legendary jazz figures, with the chords they were played over, transposed to any key and shown simultaneously on guitar tab and piano keys. This kind of phrase practice and vocabulary building was a big part of my jazz program. While all the heavy lifting is done on the research side, I’m using AI to make all of that data usable for a musician like me. The database is OK to use for non-commercial applications and I don’t plan to distribute my app.
https://jazzomat.hfm-weimar.de/dbformat/dboverview.html
As I said that’s only one aspect of it. I know exactly what I need to learn and the issues I have always faced and big computing power has made it possible to solve those problems. For instance, you want to learn some jazz chords but how are those applied? You can pull up dozens of progressions across the Real Books to learn the application, not just the chords, and then drill the chords across the neck of the guitar or learn songs featuring those progressions. Advanced metronome and backing band functions.
Also, some other things I’ve always wanted. Pull up chord progressions and show voicings that are good for the Rhodes. Shift the guitar chord voicings to the type of voicings Kenny Burrell favored, etc.
In the past that information was just too vast for me to process and spread among physical books. This makes it easy to quickly dial in exactly what I want to do and not sift through books.
Also, generative stuff as well. Take Coltrane’s exact rhythm phrasing and swap around the notes or take the notes and swap around the rhythm.
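The swap idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the phrase is a made-up list of (MIDI pitch, duration in beats) pairs, not data from the Weimar Jazz Database, and the function names are hypothetical.

```python
import random

# A phrase as (MIDI pitch, duration in beats) pairs. These values are made up
# for illustration and are not taken from the Weimar Jazz Database.
PHRASE = [(62, 0.5), (65, 0.5), (69, 1.0), (72, 0.25), (70, 0.25), (67, 1.5)]

def shuffle_pitches(phrase, seed=None):
    """Keep every duration in place but permute the pitches."""
    rng = random.Random(seed)
    pitches = [p for p, _ in phrase]
    rng.shuffle(pitches)
    return [(p, d) for p, (_, d) in zip(pitches, phrase)]

def shuffle_rhythm(phrase, seed=None):
    """Keep the pitch order but permute the durations."""
    rng = random.Random(seed)
    durations = [d for _, d in phrase]
    rng.shuffle(durations)
    return [(p, d) for (p, _), d in zip(phrase, durations)]

def transpose(phrase, semitones):
    """Shift every pitch by a number of semitones, leaving the rhythm alone."""
    return [(p + semitones, d) for p, d in phrase]
```

`shuffle_pitches(PHRASE)` gives new notes over the original rhythm, `shuffle_rhythm(PHRASE)` does the opposite, and `transpose(PHRASE, 2)` moves the whole phrase up a whole step. A real version would probably want to respect bar lines and chord tones, which this sketch ignores.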
All of this is offline as well. So once I finish the app it runs on my machine and does not require constant LLM analysis. It’s just old school computer processing. Like a big database. Plus the UI is just like I want it.
The dependency you talked about is how the whole world has evolved. Spotify, paid subscriptions to apps, etc. If anything, having the programs I’m building makes me less dependent on that model (assuming I cancel my AI coding subscription when I’m done). I’ve actually funded the monthly fee by canceling other monthly subscriptions while I’m doing this.
While I don’t think this fully replaces the value of in-person lessons, I’ve done a lot of in-person lessons and just finished a four-year music program. This is just a way for me to keep learning in a more flexible way since I don’t have time to continue those in-person classes. (I do several other in-person lessons for other things because I always like to keep learning. Time is finite though.)
That’s pretty dope dude. How was the learning curve and what kind of time investment did this require?
Also, you could totally sell this
Thanks, man. I built several smaller things before I attempted this and those really informed this build.
The first thing I built was a kit builder for my MC-101. Loading samples onto that thing is a chore, so I built something where you could drag and drop big chunks of samples and the program would analyze and auto-sort them into kits that match the stock drums. Kicks on a certain pad, snare on another, closed hi-hats, etc. This helped me learn how the analysis and sorting work, and helped me develop workarounds for when the analysis fails. It taught me about what’s possible with UI, etc.
Then I got more ambitious and built one for the SP-404, with my first attempt at chopping (which ended up being really good) and did my first (and only) plunge into emulation and DSP. From there I went off and did some graphic design and video apps. All of those informed that Dead Wax sample manager.
Basically each build I learned more and more, what works, what doesn’t. So yeah, there’s definitely a learning curve. I’ve only been at it a few months, so it’s not like it required years of experience or anything. I think the key is starting small and gradually working your way up. There’s a bit of ego involved when you do projects like this where you start to want to think “I must be really talented” but I think the reality is that most people could do it. Haha.
That said, my coworker who has always been a computer geek gave it a try and got really fed up and quit. I think he expected it to be more of a thing where you just say “make me an x” and it’s done. The other thing I would say is I tried once using Gemini because I ran out of Claude tokens and it built something that looked like an elementary school kid’s first program. I would never attempt this outside of Claude Cowork.
So I went ahead and got Claude desktop to start making a sample organizer app. I expected it to be error prone like the other AI I’ve tried but it’s actually much better. Managed to have it build and debug an app that does exactly what I need in only 3 hours.
Basically you point it to a folder of unorganized sounds (in my case a bunch of foley and sfx sounds with no labels or metadata) and it uses a cloud based AI to index and find similar sounds, listing them in a separate pane on the right based on % similarity match. It does this significantly better than Sononym’s similarity search. I can then batch select those similar files and move or copy them to a folder of my choice on Windows. So it makes organizing files into categories much faster. Gonna keep working on it to add tagging suggestions and turn it into a full-fledged sample manager for sample and sfx library management. The core functionality already does what every other paid app I tried couldn’t do
EDIT: I was mistaken, the AI is not cloud based but runs locally
Little video of it in action (the screen recorder didn’t capture audio for some reason, but it auto previews the sounds)
Needed this so I could categorize a bunch of audio files I extracted from a game to put onto Tonverk
Will be happy to share it once it’s a bit more well rounded
EDIT2: I tested it more and it’s not quite perfect. Despite adding a second sound analysis model, it still mistakes short vocal sounds for wood percussion, among other such quirks. It also misses some files despite them being blatantly similar. I think I need to get it to add additional layers of similarity sorting and verification, maybe make it trainable based on human feedback.
I want to know more. Did you give the AI directions about what ‘similar’ is (hi/lo, long/short, tonal/percussive, etc) or did it figure that out itself ?
Things are changing very fast. What Claude does well now is miles ahead of 6 months ago.
No I didn’t, it just used an already available AI that matches sounds on similarity. The sounds it found are very similar, however the game audio files do have a lot of very similar sound files that are variations of each other. I need to see how it handles other types of sample files like stuff used for music, but so far it seems to get the general spectral brightness and pitch of a sound very close to the reference file.
I think I’ll add a tagging system in the coming days that include hi and low, and a searching system for finding sounds based on verbal description.
It used CLAP for the audio analysis (the laion/clap-htsat-unfused model from HuggingFace, which is LAION’s CLAP rather than Microsoft’s). It took a few prompts to debug issues but in the end it got it right. While the whole process took a few hours it went very smoothly compared to other AI.
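For reference, a "% similarity" list like the one described can be built by comparing embedding vectors. The sketch below assumes cosine similarity, which is a common choice for CLAP-style embeddings, but the thread does not say what the app actually uses. The file names and the tiny 3-number vectors are invented; real embeddings would come from the model and be much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_similar(reference, library, min_percent=0.0):
    """Return (name, percent) pairs sorted from most to least similar."""
    results = []
    for name, vector in library.items():
        percent = 100.0 * cosine_similarity(reference, vector)
        if percent >= min_percent:
            results.append((name, round(percent, 1)))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Toy "embeddings" (hypothetical values standing in for model output).
LIBRARY = {
    "door_creak_01.wav": [0.9, 0.1, 0.0],
    "door_creak_02.wav": [0.8, 0.2, 0.1],
    "footstep_gravel.wav": [0.1, 0.9, 0.3],
}
```

`rank_similar([1.0, 0.0, 0.0], LIBRARY, min_percent=50.0)` would list the two door creaks and drop the footstep. The `min_percent` threshold is the obvious knob for trading false positives against missed matches.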
Right now it’s building a modular synth prototyping environment for me. I hope it works. Gonna try to make CLAP plugins to load into Ableton Move
Oh man, this thing is gonna become a huge time sink for me
YMMV, but I stay away from things that require an LLM to work, as you will have to keep paying for tokens to use your program. There is a free Google API that you can use, but I’ve been told they have updated the terms and cut the limits. My sample manager uses librosa and is fully offline.
It downloaded the model and runs it locally.
But apparently it’s primarily good for longer audio files and struggles to differentiate shorter percussive sounds from each other
Nice! Just figured I’d throw that out there just in case.
Thanks :). And that’s my bad, I said it was cloud based because that’s what I thought it was doing but turns out it actually downloaded it locally
This is absolutely fucking genius; I doubt many professional software engineers could do something this perfect.
Thanks, I appreciate the kind words!
So I’ve worked some more on the similar sounds finder; the first version had quite a few false positives. I managed to get the AI to use several sorting methods (each with a different importance weight) and defined some sorting parameters for it, such as where the attack time cutoff between percussive and non-percussive sounds should be, and now it has gotten decently better. It’s not perfect by any means, but it seems to be in quite a usable state
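Roughly, combining several methods with importance weights plus an attack time cutoff could look like the Python sketch below. Everything specific in it is an assumption for illustration: the cutoff value, the method names, the weights, and the choice to drop candidates from the other attack class outright rather than just penalize them.

```python
from dataclasses import dataclass

# Hypothetical cutoff and weights; the actual values are not given here.
ATTACK_CUTOFF_SECONDS = 0.02
WEIGHTS = {"embedding": 0.5, "spectral": 0.3, "envelope": 0.2}

@dataclass
class Candidate:
    name: str
    attack_seconds: float
    scores: dict  # per-method similarity in [0, 1], keyed like WEIGHTS

def is_percussive(attack_seconds, cutoff=ATTACK_CUTOFF_SECONDS):
    """Treat a sound as percussive if its attack is at or below the cutoff."""
    return attack_seconds <= cutoff

def combined_score(scores, weights=WEIGHTS):
    """Weighted average of the per-method scores; missing methods count as 0."""
    total = sum(weights.values())
    return sum(w * scores.get(method, 0.0) for method, w in weights.items()) / total

def rank(reference_attack, candidates):
    """Keep candidates in the reference's attack class, best combined score first."""
    ref_class = is_percussive(reference_attack)
    same_class = [c for c in candidates if is_percussive(c.attack_seconds) == ref_class]
    return sorted(same_class, key=lambda c: combined_score(c.scores), reverse=True)
```

With a fast-attack reference, a short vocal with a slow attack gets filtered out even if its embedding score is high, which is the kind of false positive the cutoff is meant to catch.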
It was mainly tested with an SFX library. I later threw in a rather small psytrance sample pack to see how well it handles it; it seems to work, but the sample pack is just too small to draw any meaningful conclusion about how good it is for non-SFX sounds.
Turn your volume down first, it gets noisy