The Evolution of Yamaha's Vocaloid

A video posted by @GovernorSilver the other day got me wondering about the evolution of Yamaha’s Vocaloid system. The video he posted appeared to be an MMD creation for the most part, but what really surprised me about it was that no Vocaloid was present.

Well I was there right from the start. I was one of those Yamaha fanboys drooling like a lunatic when they first announced Vocaloid, absolutely ecstatic at the thought of having an on-demand singer sitting in my computer. But was my excitement founded?

Yes, I think it was. When they first announced it, there was one particular track they put out to the public, one which really demonstrated just how well the voice could be articulated if you actually knew how to use the software correctly. I believe the track was programmed by one of Yamaha’s own engineers.

Well here is that track (from nearly 15 years ago now):

Unless you were there right from the start, you will likely never have heard that track, and that’s a shame. Almost a decade and a half later, I’m not convinced the latest Vocaloid Editor or Vocaloid Singers show that much time’s worth of progress, and it’s well-programmed tracks like this that demonstrate just how painfully slow the evolution has been.

And here’s what’s ironic about the whole evolution thing. You see that video I mentioned featuring MMD but not Vocaloid? Well, MMD is something of a cult that rocketed to stardom through use of Vocaloid. Pretty much every MMD upload to YouTube would almost certainly feature Vocaloid as its soundtrack and of course it still does. But while MMD’s development goes from strength to strength and its popularity just grows and grows, what about Vocaloid’s development and evolution?

When are Yamaha going to give it the boost in technical agility that it needs? When I listen to that early pre-release demonstration and compare it to what we have now, I feel like I was right to get excited back in the day. But sadly I also think I’m right to be disappointed at its evolution of a decade and a half!

So play that video again, put on your best cans and be amazed, remembering that what you’re hearing there is a pre-release job from a decade and a half ago. Virtual vocalist software has come from various vendors over the years, yet none of them have managed to nail the virtual vocalist. I mean, it’s still not quite there, is it?

So what I’d love to know is, why not?

Surely any company that nails this stuff is going to sell a lot of software. In an age where computers have such incredible computing power, why are we not seeing virtual vocalists modeled and sold in VSTi form instead of what we have today?

So hats off to the engineers at Yamaha for Vocaloid, and to the engineer who programmed that pre-release track because frankly, for a pre-release I think it was pretty damn good and I was right to get excited. But the evolution of Vocaloid appears to be stagnating and crying out for vocal modeling to be fused into the technology in order to evolve it to the point where we can no longer tell that we’re listening to a virtual vocalist.

I long for the day when we can just twist knobs as we would do on a synthesizer, and dial-in any sort of vocal we wish for. So there you go, a story about how a virtual vocalist managed to rocket animation software to stardom while leaving itself behind.

Personally I think the only way this stuff will evolve for the benefit of the user is through an open source project headed by wizards in sound modeling who are willing to contribute.




the population demands immediacy.


I really enjoyed this nod to / use of Vocaloid in character design-


In case you are like me let me save you some time…

Step 1. Wait there’s a Miku Mix?!
Step 2. Daydreams of Jahtari eating cup o’ noodles
Step 3. Why do I not already own this?
Step 4. Google shit
Step 5. Wait there is no Miku in there! What a wank!
Step 6. Cold world…


Most surprising thing is there isn’t a Volca Vocaloid by now.


miku mix? sounds like jelly belly is on the right track. wonder if it’s ramune and taro flavor… :thinking:

For doing your own voices and a product that follows on from Vocaloid check out Voctro Labs and VoiceMod. I posted about them over here :

Holly+ is open for use.

This has gotten close to having a singing voice that sounds real. The timbre transfer, where you sing in your voice and the software performs in the modelled voice, is amazing. (See the end of the video in my linked post.)


Both vocalists on that track were human - Raychell and Nana Mizuki.

Raychell is a professional singer-bassist who has done a lot of work for Bushiroad-produced franchises. Nana Mizuki has had a long singing career in addition to anime voice acting, starting with enka singing in childhood.

Raychell singing the same song live in the studio with some guy replacing Nana Mizuki


Careful what you demand, you never know what else hides out there in some corporate’s mind!

I actually watched it while diggin’ in to a pizza. It was like tea-time kids TV all over again and had an oddly mellow, relaxing and slightly hypnotic feel to it due to the lightly muffled sound that I felt had just the right amount of muffleness to it!

First there was Wavestate, then Opsix, then Modwave, so it goes without saying that Vocawave is next, making it a nice round foursome! Personally though I was hoping the fourth would be a creative sampler, like the Volca Sample but with actual sampling and more knobs!

Cheers mate, took a good look at that! I think the timbre transfer technology is really cool but as pointed out in the video, it does raise IP issues so it’s not something I’d be into personally. It’s also no good unless you provide it with singing to transfer the tone to anyway. Another problem with this stuff is the fetish modern software developers have for taking charge of everything, making it browser based and charging subscriptions.

Nah, fuck that.

So as awesome as it is, I think a better solution is actual vocal modeling (preferably open source so that it can evolve quicker and escape the browser and subscription bullshit). By that I mean modeling the human voice as well as brass, strings and pianos can be modeled these days. That way, it’s an actual synthesizer that you can tweak to your liking and just save your vocal designs as a VSTi preset, just like we do with every other VSTi synth. Such things can also be sequenced with text and a piano roll in much the same way as Vocaloid can.

I reckon the problem with Vocaloid is that they need to remove the sample aspect of it and instead, make it a purely modeled vocal synthesis system. It would also be significantly smaller in size than a Vocaloid library. The closest thing to vocal modeling I ever came across was a very old product called ‘Cantor’ made by ‘VirSyn’. Unfortunately it appears to have been temporarily pulled due to Steinberg dropping production of dongles, so they’re transitioning away from dongle protection.

Soon all singing humanoids could be replaced by Vocaloids!

it’s too late! they had my phone tapped all along!!



By the way, here is Cantor and, like everything else with potential it seems, it appears to have been shelved. I really don’t understand the lack of interest in vocal synthesis, because surely every single electronic musician out there would love a synth that could sing for them using a vocal that the musician is able to synthesize themselves!

But anyway, Cantor is old now but it showed great promise and you have to remember that this is pure synthesis, it’s not using samples at all. You don’t load a sample bank, you just dial-in the vocal characteristics and automate it all. So what I’d love to see is basically this kinda thing but with proper, highly editable physical modeling of the human vocal tract!
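
For anyone wondering what "no samples, just dial it in" can look like in code, here is a minimal sketch of classic source-filter formant synthesis in plain Python. To be clear, this is not how Cantor or Vocaloid work internally (I have no idea what is under their hoods) and it is nowhere near real physical modeling of the vocal tract. The function names `resonator` and `sing_vowel` are made up for illustration, and the formant values for "ah" are rough ballpark figures that I am assuming, not measured data.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator standing in for one formant of the vocal tract.

    Difference equation: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1 so the filter is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def sing_vowel(pitch_hz, formants, seconds=0.5, sample_rate=16000):
    """Impulse-train 'glottis' fed through a cascade of formant resonators."""
    n = int(seconds * sample_rate)
    period = int(round(sample_rate / pitch_hz))  # samples between glottal pulses
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    voiced = source
    for freq_hz, bandwidth_hz in formants:
        voiced = resonator(voiced, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in voiced)
    return [s / peak for s in voiced]  # normalize to the range -1..1

# ASSUMPTION: rough (frequency, bandwidth) pairs in Hz for an "ah"-like vowel.
AH = [(700, 80), (1200, 90), (2600, 120)]

samples = sing_vowel(220.0, AH)
```

The "knobs" here are just the pitch and the list of formant frequencies and bandwidths: change the numbers and the vowel changes, with no sample bank anywhere. The result can be written out with Python's standard `wave` module if you want to hear it, and it will sound robotic, which is kind of the point about how far this is from a convincing singer.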


I don’t know why, but Vocaloid terrifies me more than any horror film.


Ooh, you should check out Plogue’s Chipspeech then!

@Beatmode I do love that they used Klaus Nomi! RIP you beautiful queer being of light.


Perhaps Casio could do the pro level version of the CT-S1000V.

( thread )

Fun to look at the features they put into that keyboard singing synth though.

Also worth examining the technology they used in that product, parts of it are open source, iirc. I’d have to look back at my posts in that thread.

that casio looks more fun than a lot of boutique garbage that pops up these days. he even mentions specifically that it’s frustrating trying to get proper results out of vocaloid.


This one really gave me the creeps the first time I saw it. The dude hides a secret beneath his scarf, but he’s pretty much a living, breathing Vocaloid and I’m talking about actual pitch reachability here, not just creepiness (prepare yourself for this one, no, really) :smile:

Interesting character for sure, weird and creepy in a cool sort of way and I intend to look more into his music. And I’m not sure if you’re aware of it but I believe PSB based ‘Twenty Something’ around the dramatic chords of Klaus Nomi’s ‘Cold Song’.

Let the listener be the judge as they say!

Nice to come across others with an interest in Vocal Synthesis, and you clearly have!

I remember the launch of that Casio but was disappointed because it involved an app. Not happy about that at all and sadly couldn’t be bothered with it for that very reason.

Thumbs-up to them though, I do like Casio :+1:


It can use an app BUT you don’t have to use one, you can do everything without it. They just included it for some people’s convenience or preference.

Thanks! Always fun to learn a thing, he was iconic and it sounds like a loving nod.


he’s an anime villain if I ever saw one.