Music Synthesizer Technologies Made Using AI Methods

What if there were a generic digital hardware synth for which you could describe, in a short paragraph, how you want it to sound and how you want it to react, and presto, you’ve got a new synth with that engine?

We’ve seen recently how simple descriptions fed into AI systems can generate things that go places you had not imagined.

There is a recent article in SynthAnatomy that wonders about this, referencing a newly released product that uses AI to generate guitar amplifier models. See in particular the second half of that two-part article, which takes up the general question.

Quoting from that article:
Will we soon see AI-powered synthesizers with built-in modeling technology that allows us to easily model/recreate any vintage or modern Synthesizer with a few clicks?

In addition, there is a new synth coming with a built-in Wi-Fi connection, made by a company with a lot of experience in high-powered internet system design. ( thread )

The details on this new synth are rather sketchy at the moment, but one article describes it as having synthesis models, which are referred to as “machines”. ( Frankly I think this article is exaggerating, but no matter; there will be synths with this sort of capability soon enough. )

So it is possible that this synth uses AI to generate synthesis models. If not this synth, then others soon will.

I’m actually a little surprised that synths aren’t already internet-active, given that so much else is ( refrigerators, thermostats, etc. ), and AI-enhanced, especially considering the benefits available to a customizable data-processing device, which is really what a hardware synth is at its core.

Even an AI patch randomizer for particular synths would be good. And there actually is experimental software that will generate patch settings to make an older conventional synth sound as close as possible to a submitted sample.

So what do you think of this idea? Would you like to use a synthesizer created with AI software? Would you be interested in having a synth with customizable, AI-generated machines? What would you give as a prompt to this system for your own custom synth? Do you have other ideas that go along with this approach?

EDIT: Language was sharpened slightly in a couple of places to clarify this post.

The thread’s former title was: AI Generated HW Synth Engines.

4 Likes

I’d be all over it. One use case I’d love to see is reverse-engineering synth sounds: give it a sound sample, and it programs a sound-alike synth patch, complete with all the basic parameters you’d find on a hardware synth.

2 Likes

There’s something like this. I am trying to recall the exact details. I’ll post it if I recall.

1 Like

I’d be at a loss for words. Literally. It’d be like trying to describe a techno song to a friend. “You know that one song that goes boom tis blap bappa doo slap!”

No, I have no idea what the hell you’re talking about, as you just described all the songs.

4 Likes

I found it – it was a research project, and they created settings for u-he Diva from a sample. I posted about it here:

Audio data as video data representation - #38 by Jukka

The method is neural net based and they train the system to recognize spectral images.
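For anyone curious what that looks like in practice, here is a minimal, hypothetical sketch of the general idea in PyTorch ( my own illustration, not the research project’s actual code ): a small convolutional network takes a mel spectrogram of the target sample and regresses a vector of normalized synth parameters. The file name, layer sizes, and parameter count are all made up.

```python
import torch
import torch.nn as nn
import torchaudio

# Hypothetical sketch: regress normalized synth parameters from a mel spectrogram.
# Names and sizes are illustrative, not from the actual Diva research project.

N_PARAMS = 32  # number of synth parameters to predict (osc, filter, envelope settings, etc.)

class PatchEstimator(nn.Module):
    def __init__(self, n_params=N_PARAMS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, n_params), nn.Sigmoid(),  # parameters normalized to [0, 1]
        )

    def forward(self, spec):
        return self.head(self.features(spec))

# Turn a one-shot sample into the "spectral image" the network sees.
waveform, sr = torchaudio.load("target_sample.wav")
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=128)(waveform)
mel = mel.mean(dim=0, keepdim=True).unsqueeze(0)  # mono, add batch dim -> (1, 1, mels, frames)

model = PatchEstimator()
predicted_params = model(mel)
print(predicted_params.shape)  # torch.Size([1, 32])
```

Training would then minimize the distance between the predicted parameters and the known settings of patches rendered on the synth itself.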

There are other versions that basically do something very similar.

This one comes with source code, but I think you need to train the network yourself.

https://jakespracher.medium.com/generating-musical-synthesizer-patches-with-machine-learning-c52f66dfe751

And here’s a research paper that may well be an original source for these other two. It’s a PDF and very technical.

It seems to me a hardware maker could do something similar: you upload a short sample, and they return a patch for your synth, or an image to work from if there is no patch storage on the synth.

4 Likes

Thank you! Yes, I think synth makers need to get on this!

1 Like

An AI system could find that song for you. The WoFI system I linked to in the first post has a mic and is hooked into their internet software, so maybe it could find the song.

What if you could enter rhythmic parts using your voice or a short sample? And if it had something like Melodyne or Vochlea Dubler available over the net, you might work from a sample or your voice as well.

I think that might be the reason there is such a small keyboard on that WoFI. A product like that might not be made for playing, only for note entry and simple melodies. The market is for “newbies and semi-professionals”, so if they were to add the AI smarts online, it would allow that market to be more musical and productive without needing to be trained musicians.

1 Like

I was just talking about this the other day with some people at a Max/MSP meetup. I wondered what it could do for accessibility if you could have a synth that doesn’t have “traditional” synthesis parameters, or even use traditional forms of synthesis, but instead lets you describe the sound, or set up more descriptive parameters. The issue we discussed comes from everyone’s different interpretation of those descriptions, i.e. what sounds “buzzy” or “crunchy” to one person might not fit that description for someone else, so it would have to be trained on your own interpretations.

3 Likes

That sort of description is used so frequently on Elektronauts that it must mean something to someone, but most often I haven’t a clue.

But using a specific reference to a song, album, band, or musical genre would convey a lot.

An AI system should be able to give you back copyright-free tools that would give you a starting point from that sort of description.


But in all seriousness…cool.

5 Likes

There’s the Google NSynth Super that came out a few years ago, but it was DIY only, and I heard the sound quality was low. As far as I remember, it can morph between sound samples with AI, like 50% thunder noise, 50% trumpet sound. Exciting on paper; I hope we will see more of these in the future.

2 Likes

I still make music manually.

It’s still manual ( or can be ), just with a different set of tools. Tools made to enhance human creativity.

For a lot of useful information in this field, look into the work done in the IRCAM-ACIDS ( Artificial Creative Intelligence and Data Science ) projects. This is research done by a group of people in France, mostly open source, and all really excellent stuff.

For a place to start, click through this Google search.

I have a hunch that some of the internet back-end for the WoFI project may be based on research done for the IRCAM-ACIDS project. In particular I am thinking the Flow Synthesizer may be involved. This is only a hunch though; perhaps I am just wishing it were so.

I think you have put your finger on a central question.

With the IRCAM-ACIDS project Neurorack, they generate sounds from seven separate descriptors that can be actively adjusted:

  • Loudness
  • Percussivity
  • Noisiness
  • Tone-like
  • Richness
  • Brightness
  • Pitch

The Neurorack hardware is a small Eurorack module based on the Nvidia Jetson Nano processor, which pairs a 128-core GPU with four CPU cores. I believe the software is then set up with some basic sounds tied to those seven descriptors, and the musician varies those descriptors in some manner over time to produce sound and music.
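To make that a bit more concrete, here is a rough, hypothetical sketch ( my own illustration, not the actual Neurorack code ) of what “audio generated from a handful of descriptors” can look like: a small decoder network is conditioned on a seven-value descriptor vector and emits a block of audio samples, and the performer sweeps those values over time.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of descriptor-conditioned generation,
# not the actual Neurorack implementation.

DESCRIPTORS = ["loudness", "percussivity", "noisiness", "tone-like",
               "richness", "brightness", "pitch"]

class DescriptorDecoder(nn.Module):
    """Maps a seven-value descriptor vector to one block of audio samples."""
    def __init__(self, block_size=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(len(DESCRIPTORS), 256), nn.ReLU(),
            nn.Linear(256, 1024), nn.ReLU(),
            nn.Linear(1024, block_size), nn.Tanh(),  # audio in [-1, 1]
        )

    def forward(self, descriptors):
        return self.net(descriptors)

decoder = DescriptorDecoder()

# The performer sweeps descriptor values over time; each setting yields a block of audio.
settings = torch.tensor([[0.8, 0.9, 0.2, 0.6, 0.4, 0.7, 0.5]])  # one point in descriptor space
audio_block = decoder(settings)
print(audio_block.shape)  # torch.Size([1, 2048])
```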

=-=-=-=-=-=-=-=

A summary of some other ACIDS stuff:

  • DDSP - Differentiable Digital Signal Processing, a PyTorch ( a Python machine learning framework ) module with a PureData wrapper and a couple of pretrained instrument models, for real-time sound creation. ( A rough sketch of the harmonic-synthesis idea behind it appears after this list. )

  • RAVE VST - Realtime Audio Variational autoEncoder, a fast and high-quality audio waveform synthesis system using deep learning models.

  • VSCHAOS - A Python library for variational neural audio synthesis, using PyTorch.

  • Flow Synthesizer - A voice-controlled synthesizer, using variational auto-encoders and normalizing flows on a set of learned sounds. ( Simplified description. )

  • Generative Timbre Spaces - A descriptor based synthesis method that maintains timbre structure while moving across timbre space. Uses variational auto-encoders.

  • Orchestral Piano - A method that allows the real-time projection of a keyboard-played piece onto a full orchestral sound, using analysis of historical examples of music that composers moved from piano to orchestra. I think this is still being improved.

  • Orchids - A set of algorithms and features to reconstruct any evolving target sound with a combination of acoustic instruments.
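As promised above under the DDSP entry, here is a tiny self-contained sketch of the harmonic ( additive ) part of that idea, written directly in PyTorch. It is my own simplification, not code from the ACIDS repositories; the point is that every operation is differentiable, so a network that predicts the pitch and harmonic amplitudes can be trained end-to-end against recorded audio.

```python
import torch

# Simplified, self-contained illustration of the DDSP idea:
# an additive (harmonic) synth built from differentiable tensor ops.
# Not code from the ACIDS repositories.

def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Render a sum of harmonics; f0_hz is (n_frames,), harmonic_amps is (n_frames, n_harmonics)."""
    n_frames, n_harmonics = harmonic_amps.shape
    hop = 64                                             # audio samples per control frame
    # Upsample frame-rate controls to audio rate (simple repeat, for clarity).
    f0 = f0_hz.repeat_interleave(hop)                    # (n_samples,)
    amps = harmonic_amps.repeat_interleave(hop, dim=0)   # (n_samples, n_harmonics)
    # Instantaneous phase of the fundamental; harmonic k uses k * phase.
    phase = 2 * torch.pi * torch.cumsum(f0 / sample_rate, dim=0)
    k = torch.arange(1, n_harmonics + 1)
    audio = (amps * torch.sin(phase[:, None] * k[None, :])).sum(dim=-1)
    return audio

# A steady 220 Hz tone whose harmonics fade in over time.
n_frames, n_harmonics = 200, 8
f0 = torch.full((n_frames,), 220.0)
amps = torch.linspace(0, 1, n_frames)[:, None] * torch.ones(n_harmonics) / n_harmonics
signal = harmonic_synth(f0, amps)
print(signal.shape)  # torch.Size([12800])
```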

In addition to IRCAM, it appears that Sony CSL ( Computer Science Laboratories Inc. ) may also be assisting with some of this research.

3 Likes

I want to see them come out with an AI that can emulate the Ohio Players; if they can do that, I’ll eat my hat!

1 Like

It’s going to be hard to resist this kind of Eurorack!

1 Like

You’re talking about the Tonex Amp Modeler. That’s a very competitive price, if it competes well with the Kemper Profiler.

Resynthesis has been around for a while, too. With ML, it might be easier to apply it to more complex synthesis engines. Of course, there is also a cost trade-off here: why not simply sample the sound you want, or combine sampling with synthesis? So how many companies would be willing to invest in this area?

Didn’t the devs of the Hydrasynth use machine learning (ML) internally to emulate the filters of various synths? I remember hearing something like that in the early YT videos. So this already seems to be reality, and it would not particularly need any “AI”, but “only” some training data and a decent model that, after training, translates into relevant DSP settings.

Generating music seems easier with Deep Learning today than generating sound, though: the training data is already there thanks to music notation and, more importantly, MIDI data. All you have to provide as input is music in MIDI format and then define the problem as follows: “predict the next note based on the previous note or sequence of notes”, similar to predicting text. That way you avoid expensive manual labelling of training data. The same approach is already used by NLP models these days, including ChatGPT and BERT.
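To illustrate that framing, here is a minimal sketch ( my own illustration, with made-up names ) of next-note prediction over MIDI pitches in PyTorch, treating the 128 possible note numbers like a text vocabulary.

```python
import torch
import torch.nn as nn

# Minimal next-note prediction sketch: treat the 128 MIDI note numbers as a vocabulary
# and train a model to predict the next note from the previous ones, like next-word prediction.
# Illustrative only; real systems also model timing, velocity, chords, etc.

N_NOTES = 128

class NextNoteModel(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(N_NOTES, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, N_NOTES)

    def forward(self, notes):              # notes: (batch, seq_len) of MIDI note numbers
        h, _ = self.rnn(self.embed(notes))
        return self.out(h)                 # logits for the next note at every position

model = NextNoteModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A toy "dataset": an ascending phrase starting at middle C.
sequence = torch.tensor([[60, 62, 64, 65, 67, 69, 71, 72]])
inputs, targets = sequence[:, :-1], sequence[:, 1:]

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, N_NOTES), targets.reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))  # cross-entropy of predicting each next note
```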

They did, and it’s a clever use of technology to solve their development problem of emulating so many different multivariate filters. I’ve always wondered how this was incorporated into the final design, as I doubt their custom hardware actively uses this sort of method live. The engineer who did that is very capable, and was responsible for many other parts of the HS’s advanced design as well.

Good post, g3o2. I’m still processing the rest of what you wrote and may respond to it later.

Here is the interview where it is mentioned: https://youtu.be/VwzNWvQF2ks?t=155

The filters were indeed not implemented using off-the-shelf DSP building blocks; instead, Chen says, using “machine learning”. Unfortunately, that can mean a lot of things.

According to ASM: 144 recordings of waveforms (sawtooth, noise, …) run through 11 different filter cut-off and resonance settings per filter (handpicked by the team) are used as reference material.

The idea was then to reproduce each of the 144 recordings with the least error possible. Instead of building their own filter signal chain by hand and calibrating it by ear, Chen let the computer automatically find the DSP component chain and parameters whose output would get closest to each of the recordings.
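A very stripped-down version of that “let the computer find the parameters” idea might look like the sketch below ( my own illustration, not ASM’s actual tooling or filter topology ): a differentiable one-pole low-pass filter is fitted by gradient descent so that its output matches a reference recording with the least error possible.

```python
import torch

# Stripped-down illustration of fitting filter parameters to a reference recording
# by minimizing output error. Not ASM's actual tooling or filter topology.

def one_pole_lowpass(x, coeff):
    """Differentiable one-pole low-pass: y[n] = coeff * x[n] + (1 - coeff) * y[n-1]."""
    prev = torch.zeros(())
    out = []
    for sample in x:
        prev = coeff * sample + (1 - coeff) * prev
        out.append(prev)
    return torch.stack(out)

# Pretend this is one of the reference recordings: noise run through a "mystery" filter.
torch.manual_seed(0)
source = torch.randn(256)
with torch.no_grad():
    target = one_pole_lowpass(source, torch.tensor(0.2))

# Start from a wrong guess and let the optimizer find the coefficient automatically.
coeff = torch.tensor(0.9, requires_grad=True)
optimizer = torch.optim.Adam([coeff], lr=0.01)

for step in range(300):
    optimizer.zero_grad()
    estimate = one_pole_lowpass(source, coeff)
    loss = torch.mean((estimate - target) ** 2)  # "reproduce the recording with the least error"
    loss.backward()
    optimizer.step()

print(float(coeff))  # should end up close to the hidden value of 0.2
```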

As this thread has developed, the terrain involved has become clearer. As a result, I have expanded and generalized the thread title.

The former title was: AI Generated HW Synth Engines.

1 Like