Music Synthesizer Technologies made using AI Methods

I was just talking about this the other day with some people at a Max/MSP meetup. I wondered what it could do for accessibility if you had a synth that doesn’t expose “traditional” synthesis parameters, or even use traditional forms of synthesis, but instead lets you describe the sound, or set up more descriptive parameters. The issue we discussed comes from everyone’s different interpretation of those descriptions, i.e. what sounds more “buzzy” or “crunchy” to one person might not carry that description for someone else, so it would have to be trained on your own interpretations.
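
A crude sketch of what that personal training could look like, in Python with NumPy: learn a linear ( ridge-regression ) mapping from a listener’s own descriptor ratings to synth parameters. The descriptor words, parameter names, and all the numbers here are invented purely for illustration.

```python
import numpy as np

# Hypothetical sketch: learn a *personal* mapping from descriptive words to
# synth parameters, trained on one user's own ratings of example patches.
DESCRIPTORS = ["buzzy", "crunchy", "bright"]           # user's vocabulary
PARAMS      = ["osc_detune", "drive", "filter_cutoff"] # target synth params

# Training data: each row pairs the user's 0..1 descriptor ratings of a patch
# with the parameter settings that produced that patch.
ratings = np.array([[0.9, 0.1, 0.7],
                    [0.2, 0.8, 0.3],
                    [0.5, 0.5, 0.9],
                    [0.1, 0.2, 0.2]])
params  = np.array([[0.8, 0.2, 0.9],
                    [0.1, 0.9, 0.4],
                    [0.4, 0.5, 1.0],
                    [0.1, 0.1, 0.3]])

# Ridge regression: W maps a descriptor vector to parameter values.
lam = 1e-2
W = np.linalg.solve(ratings.T @ ratings + lam * np.eye(3), ratings.T @ params)

def describe_to_params(desc_vec):
    """Turn 'how buzzy / crunchy / bright do I want it' into patch settings."""
    return np.clip(np.asarray(desc_vec) @ W, 0.0, 1.0)

print(describe_to_params([1.0, 0.0, 0.5]))  # a very buzzy, somewhat bright patch
```

A different listener would supply different ratings for the same patches, and end up with a different W — which is exactly the “trained on your own interpretations” point.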


That sort of description is used so frequently on Elektronauts that it must mean something to someone, but most often I haven’t a clue.

But using a specific reference to a song, album, band, or musical genre would convey a lot.

An AI system should be able to give you copyright-free tools that provide a starting point from that sort of description.


But in all seriousness…cool.


There’s the Google NSynth Super that came out a few years ago, but it was DIY only and I heard the sound quality was low. As far as I remember it can morph between sound samples with AI, like 50% thunder noise, 50% trumpet sound. Exciting on paper; I hope we will see more of these in the future.


I still make music manually.

It’s still manual ( or can be ), just with a different set of tools. Tools made to enhance human creativity.

For a lot of useful information in this field look into the work done with the IRCAM-ACIDS ( Artificial Creative Intelligence and Data Science ) projects. This is research done by a group of people in France, mostly open source – all really excellent stuff.

For a place to look, start clicking off this Google search.

I have a hunch that some of the internet back-end of the WoFI project may be based on some of the research done for the IRCAM-ACIDS project. In particular I am thinking the Flow Synthesizer may be involved. This is only a hunch though; perhaps I am only wishing that it is so.

I think you have put your finger on a central question.

With the IRCAM-ACIDS project Neurorack, they generate sounds from seven separate descriptors that can be actively adjusted:

  • Loudness
  • Percussivity
  • Noisiness
  • Tone-like
  • Richness
  • Brightness
  • Pitch

The Neurorack hardware is a small Eurorack module based on the Nvidia Jetson Nano processor, with a 128-core GPU alongside four CPU cores. I believe the software is set up with some basic sounds connected to those seven descriptors, and the musician then varies those descriptors in some manner over time to produce sound and music.
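
As a toy illustration of descriptor-driven synthesis ( not the actual Neurorack model, which uses a deep generative network ), here is a sketch that renders audio directly from three of those descriptor curves — loudness scales amplitude, noisiness sets the noise/tone mix, and pitch drives an oscillator. The other four descriptors are omitted for brevity.

```python
import numpy as np

SR = 16000  # sample rate, arbitrary choice for the sketch

def render(loudness, noisiness, pitch_hz):
    """Render audio from per-sample control curves of equal length.

    loudness, noisiness in 0..1; pitch_hz in Hz. A stand-in for a
    descriptor-conditioned neural synth, not the real thing.
    """
    n = len(loudness)
    rng = np.random.default_rng(0)
    phase = 2 * np.pi * np.cumsum(pitch_hz) / SR   # integrate pitch -> phase
    tone = np.sin(phase)
    noise = rng.uniform(-1, 1, n)
    return loudness * ((1 - noisiness) * tone + noisiness * noise)

n = SR  # one second of audio
t = np.linspace(0, 1, n)
audio = render(loudness=0.5 * np.ones(n),
               noisiness=t,                # morph from pure tone to pure noise
               pitch_hz=np.full(n, 220.0))
```

The point is only the interface: the musician touches descriptor curves, never oscillators or filters directly.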


A summary of some other ACIDS stuff :

  • DDSP - Differentiable Digital Signal Processing, a PyTorch ( a Python machine learning framework ) module with a PureData wrapper and a couple of pretrained instrument models, for real-time sound creation.

  • RAVE VST - Realtime Audio Variational autoEncoder, a fast and high-quality audio waveform synthesis system using deep learning models.

  • VSCHAOS - A Python library for variational neural audio synthesis, using PyTorch.

  • Flow Synthesizer - A voice controlled synthesizer, using variational auto-encoders and normalizing flows on a set of learned sounds. ( Simplified description. )

  • Generative Timbre Spaces - A descriptor based synthesis method that maintains timbre structure while moving across timbre space. Uses variational auto-encoders.

  • Orchestral Piano - A method that allows the real-time projection of a piece played on a keyboard to a full orchestral sound, using analysis of historic examples of music transcribed by composers from piano to orchestra. I think this is still under improvement.

  • Orchids - A set of algorithms and features to reconstruct any evolving target sound with a combination of acoustic instruments.
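
For a flavour of the signal model behind DDSP, here is a minimal harmonic-plus-noise synthesizer in NumPy. In the real library the harmonic amplitudes come from a neural network and are learned by gradient descent; here they are fixed values, chosen purely for illustration.

```python
import numpy as np

SR = 16000  # sample rate for the sketch

def harmonic_plus_noise(f0, harm_amps, noise_amp, n_samples, seed=0):
    """Harmonic-plus-noise model: a bank of sinusoids at integer multiples
    of f0, plus a bed of noise. This is the (simplified) signal model that
    DDSP makes differentiable so its parameters can be learned."""
    t = np.arange(n_samples) / SR
    harmonics = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
                    for k, a in enumerate(harm_amps))
    noise = noise_amp * np.random.default_rng(seed).uniform(-1, 1, n_samples)
    return harmonics + noise

# A vaguely clarinet-like spectrum: strong odd harmonics, weak even ones.
amps = [0.5, 0.0, 0.25, 0.0, 0.12]
audio = harmonic_plus_noise(f0=220.0, harm_amps=amps, noise_amp=0.02,
                            n_samples=SR)
```

Because every operation above is differentiable, the same model can be driven by a network and trained end-to-end on recordings — that is the DDSP trick in one sentence.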

In addition to IRCAM, it appears that Sony CSL ( Computer Science Laboratories Inc. ) may also be assisting with some of this research.


I want to see them come out with an AI that can emulate the Ohio Players, if they can do that I’ll eat my hat!


It’s going to be hard to resist this kind of Eurorack!


You’re talking about the Tonex Amp Modeler. That’s a very competitive price, if it competes well with the Kemper Profiler.

Resynthesis has been around for a while, too. Maybe with ML, it might be easier to apply it to more complex synthesis engines. Of course, there is also a cost trade-off here: why not simply sample the sound you want or combine sampling with synthesis? So how many companies would be willing to invest in this area?

Didn’t the devs of the Hydrasynth use machine learning (ML) internally to emulate the filters of various synths? I remember hearing something like that in the early YT videos. So this already seems to be reality, and would not particularly need any “AI” but “only” some training data and a decent model that, after training, translates into relevant DSP settings.

Generating music seems easier with deep learning today than generating sound, though: the training data is already there thanks to music notation and, more importantly, MIDI data. All you have to provide as input is music in MIDI format, and then define the problem as follows: “predict the next note based on the previous note or sequence of notes”, similar to predicting text. That way you avoid expensive manual labelling of training data. The same approach is already used by NLP models these days, including ChatGPT and BERT.
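
The simplest possible version of that “predict the next note” framing is a first-order Markov model over MIDI note numbers. The deep-learning models mentioned above do the same job with neural networks and much longer contexts, but the problem setup is identical:

```python
from collections import Counter, defaultdict

# Train on a melody given as MIDI note numbers (60 = middle C).
training_melody = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60]

# Count which note follows which: transitions[prev][next] = count.
transitions = defaultdict(Counter)
for prev, nxt in zip(training_melody, training_melody[1:]):
    transitions[prev][nxt] += 1

def predict_next(note):
    """Return the most likely next note after `note` in the training data."""
    return transitions[note].most_common(1)[0][0]

print(predict_next(60))  # → 62 : every 60 in the melody is followed by 62
```

Swap the Counter for a Transformer and the eleven notes for a large MIDI corpus, and you have the approach the post describes.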

They did, and it’s a clever use of technology to solve their development problem of emulating so many different multivariate filters. I’ve always wondered how this was incorporated into the final design, as I doubt their custom hardware actively uses this sort of method live. The engineer who did that is very capable and advanced, and was responsible for many other parts of the HS’s advanced design as well.

Good post g3o2, I’m still processing the rest of what you wrote, and may respond to it later.

Here is the interview where it is mentioned:

The filters were indeed not implemented using DSP building components or blocks; Chen says it was done using “machine learning”. Unfortunately, that can mean a lot of things.

According to ASM: 144 recordings of waveforms (sawtooth, noise, …) through 11 different filter cut-off and resonance settings per filter (handpicked by the team) are used as reference material.

The idea was then to reproduce each of the 144 recordings with the least error possible. Instead of building their own filter signal chain by hand and calibrating it by ear, Chen let the computer automatically find the DSP component chain and parameters whose output would come closest to each of the recordings.
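
A drastically simplified sketch of that idea: instead of hand-tuning a filter, let the computer search for the parameter whose output best matches a reference recording. Here the “recording” is the impulse response of a one-pole lowpass with a hidden coefficient, recovered by brute-force search; the real process presumably involved a much richer DSP chain and a smarter optimizer than this.

```python
import numpy as np

def one_pole(a, n=64):
    """Impulse response of the one-pole lowpass y[t] = (1-a)*x[t] + a*y[t-1]."""
    y, state = [], 0.0
    for t in range(n):
        state = (1 - a) * (1.0 if t == 0 else 0.0) + a * state
        y.append(state)
    return np.array(y)

# The "reference recording" we want to reproduce (hidden coefficient 0.73).
target = one_pole(0.73)

# Brute-force parameter search, standing in for the automated optimization:
# pick the candidate whose output has the least squared error vs. the target.
candidates = np.linspace(0.0, 0.99, 1000)
errors = [np.sum((one_pole(a) - target) ** 2) for a in candidates]
best = candidates[int(np.argmin(errors))]
print(round(best, 2))  # → 0.73
```

Scale the search space up from one coefficient to a whole chain of components and settings, and you have the shape of what ASM describes.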

As this thread has developed, the terrain involved has become clearer. As a result I have expanded and generalized the thread title.

The former title was : AI Generated HW Synth Engines.


You can train an AI in your voice, or whatever tonal sound, and then perform with that machine learned voice.

Holly Herndon did a short TED talk introducing this technology, and demonstrates her spawned ( not sampled ) voice in this video.

She calls that AI voice Holly+. Part of this process is called timbre transfer, which can transfer a live performance in a performer’s voice into a synthetic, though very real-sounding, AI machine-learned voice.

You teach an AI a sound in order to generate an entirely new sound.

Yes, there have been similar things done in the past, in particular the Yamaha Vocaloid technology. But be aware of the improvements in sound quality, capabilities, and process involved with this newer technology.

My understanding is that, as part of this, Herndon is using technology from Never Before Heard Sounds. They have a quick demo in this video. ( Not as good a watch as the TED Talk video above. )

Also not linked is a Holly+ performance of Jolene, but go ahead and Google it.

ADDED : Herndon is also using AI vocal software from Voctro Labs. They were recently acquired by another AI vocal company, VoiceMod. This second company has long-term experience with this sort of thing, having worked on Vocaloid in the past.


Would love to have a sampler, or an update for an existing sampler ( DT, MPC ), where it’s possible to create audio samples on the fly with a command prompt, the same way we create images with Playground AI, Midjourney, etc.

like “one shot low kick mix between cannon gun and 909 kick”

or “12 seconds long luxury noise transition from heavy metal to cats purring”

also more complicated: “sound of a silver coin falling on a ceramic plate, followed by rain drops and stretched in time 200x”

This would increase creativity incredibly and make the process of working with a sampler great again :)


There was a group of companies that got together at NAMM 2023 to make presentations and show their products dealing with AI uses in audio. The article in Sonic State gives an overview, and includes a video with a representative from GPU Audio.

I fixed the links at the end of that article, which were mostly garbage:

There is also this video with Mntra.

They run through the details and suggest specific products to use to do your own.

I think you could do the same with your own voice, or with someone you know too.

You know the movie Wishmaster? Better to choose your words more carefully :)

Back to the topic: yes, I think what you’re asking is already possible with today’s technology. Isn’t it much more fun to create those sounds yourself though, in particular the luxury noise on metal and with cats purring?

Maybe you should be able to ask the “AI” for less real stuff: “one shot of adrenaline running through a 909 into a kick drum”.

An AI drum sample generator:

Emergent Drums by Audialab - Generate infinite, royalty-free drum samples…
