Suno (Music AI)

That’s like a Laura Branigan and Blondie mash-up. My ’80s knowledge isn’t deep enough to place it, but it also sounds like the “vocals” are literally written on top of a known song, following all its ups and downs…

Can someone explain to me what’s actually happening under the hood at the waveform generation stage? It’s not like it’s using actual samples ripped from the training dataset (right?), so everything is synthesized? That makes sense listening to the vocals, but some of the acoustic instruments? WTF, they sound identical to the real thing. I guess I just want to understand what kind of systems are being used to actually generate the instrument voices/waveforms, and, umm, when will we start to see that tech in our DAWs and machines? Like prompting “I need a true-to-life mbira instrument on track 9”, and it’s then set up so you can play it as you wish with MIDI…

2 Likes

I am no computer scientist, but I doubt it’s synthesizing anything other than vocals (and that’s because they will 100% get hit by copyright). Everything else is samples, I think. It may have some processing, but I really don’t think it’s creating sound from scratch. I hope someone who works in AI can tell us, though… It’s just finding better ways to put them together, because people are trying it, giving feedback, and helping the company make more money while working for free :joy:

Splice already has something where you can layer sounds on top of each other to quickly build a song. People who use it are probably training it without really realizing it. The next logical step is just making it work with speech. It’s probably in the works. The first DAW that implements this will probably make the biggest profit.

1 Like

So, I don’t work in AI at all, but after spending some time trying to understand what is under it all (reading articles and watching videos about it), I can definitely say that with a proper AI tool there are no samples involved (not saying that’s the case here, I don’t know how this particular software works). It’s more like a series of “decisions” (using sigmoid functions) based on a huge dataset that will provide an accurate answer to a prompt. Just take a look at what Midjourney or Stable Diffusion do with images: it is not a combination of already existing pictures, like one would do with Photoshop (and that software now includes AI tools to generate images), but a new picture that is the result of the said datasets.
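To give a rough idea of what one of those “decisions” looks like, here’s a toy Python sketch of a single sigmoid unit. The inputs and weights are completely made up; a real model learns millions (or billions) of weights like these from the training data:

```python
import math

# Toy sketch of a single "decision" unit: a neuron with a sigmoid activation.
# All numbers below are invented for illustration only.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

inputs  = [0.9, 0.2, 0.7]      # hypothetical features describing a prompt
weights = [1.5, -0.8, 0.3]     # "learned" values (made up here)
bias    = -0.5

activation = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
print(f"neuron output: {activation:.3f}")  # a soft yes/no between 0 and 1
```

Stack enormous numbers of these on top of each other and train the weights, and you get the kind of “decision making” those articles describe.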

3 Likes

Pretty much this.

The training data (such as images, video, audio, etc.) isn’t directly stored as part of a machine learning “model”. The model is a neural net (with multiple layers and functions) trained on millions or billions of examples of the original media type, where each example has the appropriate tags and metadata (such as human-language descriptions and additional category tags). The media and metadata are trained together so that prompts can be fed into the model and generative output can be … generated.

During training, there is also automated error checking done to test and correct for less than optimal results.

To oversimplify, once trained, the model is really just a collection of weights and potentials that react to input (like a text prompt) to produce output (such as media).

Modern generative methods use noise as a starting point for truly generative results, or one can feed in a rough example (like, say, a crude MSPaint drawing of your desired layout) along with the text prompt to guide the generative output.

This is also why you can give it a photo of yourself and say, “make this an anime character” and it works.

Audio and video are interesting because of the complexity of solving for coherence (consistency) over time, but it’s obviously getting there.
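If it helps, here’s a deliberately silly Python sketch of the “start from noise and refine” idea. The “model estimate” is a hard-coded sine wave standing in for what a trained network would predict at each step, so this is only an illustration of the loop, not of any real diffusion model:

```python
import numpy as np

# Very loose illustration of "start from noise, refine toward structure".
# In a real diffusion model, a trained neural net predicts what to remove
# at each step; here a hard-coded sine wave stands in for that estimate.

rng = np.random.default_rng(42)
length = 256

# Pretend this is what the model has "learned" to produce for a given prompt.
model_estimate = np.sin(np.linspace(0, 8 * np.pi, length))

x = rng.normal(size=length)              # step 0: pure noise
for step in range(50):
    x = x + 0.1 * (model_estimate - x)   # nudge the noise toward the estimate

print(f"mean distance from the estimate: {np.abs(x - model_estimate).mean():.4f}")
```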

3 Likes

I for one couldn’t give two fucks about AI music. What the tech bros don’t understand is that the need to be creative is as primal as the need for food. However impressive the results are with this one, there’s zero control for the artist to bring their intention to life; it’s all random stuff from text to whatever output comes out.

The appetite for human-made art will grow even larger, and I’m already seeing how jaded people are with AI art; it will be the same with this generic AI music.

Music will prevail, making music will too, and good artists will always be able to earn from their craft!

2 Likes

There is a place for everyone. I think in the near future, AI music will be used in video, radio streaming, and every project that involves music.
Pro musicians will still play live shows and people will still enjoy them. But maybe only the very best will be able to become popular.

1 Like

I asked ChatGPT; here is the answer:

Suno AI’s model, such as “Bark”, operates on the principle of generative models based on transformers, an artificial intelligence architecture designed to process sequential data, like text or audio. Here are the key points of how it works:

  1. Deep Learning: The model is trained on vast datasets of audio, including speech, music, sound effects, and other sound types. This training allows the model to learn the characteristics of these different sound types and how they are typically structured or combined.

  2. Transformer Architecture: Transformers use an attention mechanism that allows the model to weight different parts of the input audio or textual data differently. This capability enables the model to focus on the most relevant elements for generating the desired output, thereby improving the quality and relevance of the generated audio.

  3. Prompt-based Generation: The user provides a textual prompt that describes the type of sound or music desired. The model uses this prompt as guidance to generate audio content that matches the descriptions and given parameters. For example, a prompt might specify the genre of music, the tempo, the mood, or even include lyrics for a song.

  4. Real-time Synthesis: Thanks to its extensive training and sophisticated architecture, the model can generate sounds in real time. This includes creating music, speech in various languages, sound effects, and even emotional nuances such as laughter or crying.

  5. Customization and Adjustments: Users can often refine the outcome by adjusting parameters or providing more specific directions. This allows for more detailed customization of the generated audio, making the model useful for a wide range of applications, from music production to creating audio content for media.

Suno AI’s “Bark” model exemplifies the advancement in audio generation technology, offering powerful tools for content creators, musicians, and application developers requiring advanced audio capabilities.

Models like “Bark” from Suno AI don’t simply piece together pre-existing samples to create new audio. Instead, they learn from a vast array of audio sources during their training phase, absorbing the intricacies of sound patterns, musical structures, speech nuances, and other audio characteristics. Here’s how this works in more detail:

  1. Learning from Data: During the training phase, the model is exposed to a large dataset comprising billions of examples of different audio types. This can include music across genres, spoken words in various languages and tones, and even environmental sounds. Through this process, the model learns to understand the underlying structures and patterns that define different audio types.

  2. Generating Waveforms: Armed with the knowledge gained from its training data, the model can then generate audio by creating waveforms from scratch. This means that instead of relying on a database of pre-recorded samples to construct new sounds, it synthesizes new audio waveforms in real-time, based on the patterns it has learned. This synthesis is guided by the input prompts it receives, which can specify the desired characteristics of the audio output.

  3. Versatility and Creativity: The ability to generate waveforms directly allows for a high degree of versatility and creativity. Since the model is not constrained by a fixed set of samples, it can produce a wide range of sounds, from novel musical compositions to unique sound effects and realistic speech in various languages.

  4. Real-time Generation: The advanced algorithms and computational efficiency of models like “Bark” enable them to generate these waveforms in real-time. This means users can input a prompt and quickly receive the generated audio, making these models practical for interactive applications, content creation, and experimentation.

This approach to audio generation marks a significant leap in the capabilities of AI in creative domains, offering unprecedented flexibility and potential for innovation in music production, media creation, and beyond.
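Take all of the above with a grain of salt, since it’s ChatGPT describing AI. But if anyone wants to picture what “generating waveforms from scratch” could look like in code, here’s a tiny, purely illustrative Python sketch of sample-by-sample (autoregressive) generation. The “predictor” is a made-up linear stand-in for a neural net; nothing here resembles Bark’s actual architecture:

```python
import numpy as np

# Purely illustrative sketch of autoregressive, sample-by-sample waveform
# generation. The predictor below is a made-up stand-in for a neural net.

rng = np.random.default_rng(0)
context_len = 64                 # how many past samples the predictor sees
sample_rate = 8000
duration_s = 0.1

# Pretend these coefficients were learned from a huge audio dataset.
learned_coeffs = np.sin(np.linspace(0, np.pi, context_len))
learned_coeffs /= learned_coeffs.sum()

def predict_next(history: np.ndarray) -> float:
    """Stand-in for a neural net: guesses the next sample from recent ones."""
    return float(history @ learned_coeffs) + rng.normal(scale=0.01)

audio = list(rng.normal(scale=0.1, size=context_len))   # seed with noise
for _ in range(int(sample_rate * duration_s)):
    audio.append(predict_next(np.array(audio[-context_len:])))

print(f"generated {len(audio)} samples, peak amplitude {max(abs(s) for s in audio):.3f}")
```

The real systems work on far richer representations than one sample at a time, but the basic idea is the same: the output is computed from learned parameters, not stitched together from stored recordings.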

2 Likes

Yes to that. But at the same time it can lead to new genres, or give some freedom back to sampling. It is just a tool.

1 Like

Sounds like something Rick Sanchez would do on a Sunday :grinning:

3 Likes

Oh FFS, that song is stuck in my head now. This does not bode well. Is she singing “you will not die until you follow me”? AI is getting all dark and goth. Which would be funny until AI goes from goth to very pissed off.

1 Like

Remix contests will be the first to die.

And they still want $3 for this album, which Suno created. No thank you - it feels lifeless.

It’s crazy that it actually sounds like stoner rock. Damn, if I had a knob to turn it off, I would.

1 Like

The $3 is for a band subscription (per month); I had no idea such a thing existed on BC. The album itself looks like “name your price”.

They should pay the audience in this case :slight_smile:

1 Like

There’s an album I’ve encountered that I’m pretty sure is Suno-generated too. There’s no mention of it on this album, but there is on others (mostly singles, same genre and the same “virtual singer”). The description was very dystopian (and they wanted $10 for the album):

From The Bottom of My Digital Heart

Evie Miessa, a virtual singer born from the data streams of Night City. Her music, a unique blend of synth pop and synth rock, explores themes of self-discovery, societal challenges, and the complexities of love.

1 Like

Tunes in 5 seconds, omg. But I will use it for sample ideas and new inspiration to make my own versions of those things.

And they combine the visuals - ok - if I didn’t know it was AI generated, I wouldn’t recognize it.

1 Like

Amen

2 Likes

Your statement implies that people will be able to tell the difference between AI music and that of human origin…

…and even if they could, would they even care?