I’ve built a virtual video synth in recent months, and by the look and sound of this, it seems to use a similar principle. Basically, if you convert video data to arrays of RGB values, you can make a wavetable out of it and interpret it in different ways too (like using HSL instead of RGB, or even converting to CMYK). I’ve done some tricks to get more kinds of sounds out of this, not just harmonic noise.
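Roughly, the core idea looks something like this (a simplified Python sketch with Pillow and NumPy, not my actual synth code; the filename, scanline choice, and note frequency are just placeholders):

```python
import wave
import numpy as np
from PIL import Image

# Load a video frame (a still image stands in for one frame here).
frame = Image.open("frame.png").convert("RGB")
pixels = np.asarray(frame, dtype=np.float32) / 255.0  # shape: H x W x 3

# Take one scanline and read its red channel as a single-cycle wavetable.
# (Swapping in hue or another channel gives a different timbre.)
row = pixels[pixels.shape[0] // 2]     # W x 3
wavetable = row[:, 0] * 2.0 - 1.0      # map 0..1 -> -1..1

# Render a note by stepping through the table at the desired pitch.
sr = 44100
freq = 110.0                           # A2, arbitrary example pitch
n = sr * 2                             # two seconds of audio
phase = (np.arange(n) * freq * len(wavetable) / sr) % len(wavetable)
audio = wavetable[phase.astype(int)]   # nearest-neighbor table lookup

# Write the result out as 16-bit mono WAV.
pcm = (audio * 32767).astype(np.int16)
with wave.open("out.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    w.writeframes(pcm.tobytes())
```

Scanning across rows (or through frames) over time is what turns a static table into something that actually sounds like the video moving.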
So it sounds like a typical wavetable generated from video, but the way it looks suggests they might have used a depth camera (probably a Kinect) instead, or they just manipulated 2D footage to look like this (like extreme bump mapping with indirect color projection onto a distorted 3D plane). But that’s just a guess.
Here are a few examples of how my prototype worked: