Aaron Koblin rocks…
Building Radiohead’s House of Cards
By Toby Segaran, Jeff Hammerbacher
This is the story of how the Grammy-nominated music video for Radiohead’s “House of Cards” was created entirely with data. Before you read this, you should watch the video. The definitive source for the video is the project’s Google Code page. On that site, you’ll also find several other resources, including samples of the data we used to build the video, a Flash application that lets you view the data in 3-D, some code you can use to create your own visualizations, and a making-of video. Definitely check it out.
How It All Started
The other possibility we discussed was visualizing laser sensor data. I first encountered this technology while working on a project for the Center for Embedded Network Sensing (CENS) at UCLA. CENS was using lasers to detect how light shines through forest canopies, and I was struck by the inherent beauty in the rendered images. James agreed after seeing some examples, and he was impressed by the concept of using lasers to create a piece of film. He said: “You mean you’re shooting video without cameras? You’re shooting video without video?” He immediately saw an opportunity to do something that hadn’t been done before. Not too long afterward, he approached Radiohead with the concept.
Hopefully, you’ll find the story of how this video was made to be an inspiration for your own work. I’ll talk first about the equipment we used to capture the data. After that, I’ll talk about the data itself, the video shoot, and the post-processing of the data. Finally, we’ll take a look at the visualization code I provided for the Google Code site and discuss how you can play with it yourself.
The Data Capture Equipment
Velodyne Lidar
Velodyne is a company located just south of San Jose, California, that’s run by two guys who compete in robot combat events like Battle Bots and Robot Wars in their spare time. The company produces loudspeakers, stereo equipment, and (naturally) powerful laser scanning devices, including the HDL-64E Lidar we used to capture the landscape and party scenes in “House of Cards.” The HDL-64E’s real claim to fame is that it was used successfully by several of the 2007 DARPA Urban Challenge vehicles, including the winning team, to achieve environment and terrain vision. In some cases, it was these vehicles’ only vision system.
Velodyne’s HDL-64E Lidar is a scanner with 64 laser emitters and 64 laser detectors.
It spins in a circle, gathering data 360 degrees horizontally and 26.8 degrees vertically at a rate of over one million data points per second, which approximates to about 5 megabytes of raw data per second. By default, the Lidar rotates at 600 RPM (10 Hz), though this can be adjusted between 300 and 900 RPM by sending a text command through the system’s computer serial port interface. We used the highest setting, 900 RPM, for maximum resolution when scanning the static landscapes. The range of the Lidar varies based on the reflectivity of the environment. Pavement, for example, has a 50-meter range, while cars and foliage (which are more reflective) have a 120-meter range. The minimum range is 3 feet; anything closer, and the light reflects back into the detector too quickly for the device to measure it. The emitter-detector pairs are divided into two 32-laser banks, as you can see in the diagram below. The upper bank is directed at the higher half of the elevation angles; in other words, it scans the top half of the Lidar’s vertical field of view. The lower bank, conversely, scans the lower half of the elevation angles.
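To put the rotation figures above in perspective, here is a rough back-of-the-envelope sketch in plain Java (the language Processing is built on). It assumes the scanner fires at a steady one million points per second whatever its spin rate, which is a simplification of the "over one million" figure quoted above; the class and method names are made up for illustration.

```java
// Rough scan-density arithmetic for a spinning 64-laser Lidar.
// Assumption: the firing rate stays near one million points per second
// at every spin rate. All names here are hypothetical.
class LidarRates {
    static final double POINTS_PER_SECOND = 1_000_000.0; // "over one million" in the text
    static final int LASERS = 64;

    // Revolutions per second for a given spin rate in RPM.
    static double revolutionsPerSecond(double rpm) {
        return rpm / 60.0;
    }

    // Points each laser contributes during one full revolution.
    static double pointsPerLaserPerRevolution(double rpm) {
        return POINTS_PER_SECOND / revolutionsPerSecond(rpm) / LASERS;
    }

    // Angular step, in degrees, between consecutive points from one laser.
    static double degreesBetweenPoints(double rpm) {
        return 360.0 / pointsPerLaserPerRevolution(rpm);
    }

    public static void main(String[] args) {
        for (double rpm : new double[] {300, 600, 900}) {
            System.out.printf("%4.0f RPM: %5.1f rev/s, %7.1f points per laser per rev, %.4f deg apart%n",
                rpm, revolutionsPerSecond(rpm),
                pointsPerLaserPerRevolution(rpm), degreesBetweenPoints(rpm));
        }
    }
}
```

Under this assumption, spinning faster trades points per revolution for revolutions per second: each sweep is sparser, but sweeps arrive more often.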
After the return light has gone through the sunlight filter, the receiving lens focuses the return light on a photodetector called an Avalanche Photodiode (APD), which generates an output signal relative to the strength of the received light. The output signal from the APD is amplified and then converted from analog to digital. This data is then sent to a digital signal processor, which determines the time of the signal return. Together, the strength and return time of the pulse create one unit of data. As I said earlier, the HDL-64E model creates over one million data points every second, which is over 5 megabytes of raw data per second. After the data is created, the sensor outputs it to the user through a standard 100BaseT Ethernet port. Data is continuously streamed out of this port at a frame rate equal to the rotation rate (600 RPM would produce a 10 Hz frame rate). Included in these Ethernet data packets are the distance, intensity, and angle data for each emitter-detector pair. The data is then captured using an Ethernet packet capture program and, in our case, saved to a hard drive.
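The step from return time to distance, and from distance and angles to a point in space, can be sketched in a few lines of plain Java. This is only an illustration of the general time-of-flight idea, not Velodyne's firmware: the axis convention (z up, azimuth 0 pointing along +y, sensor at the origin) and all names are assumptions, and a real unit also needs per-laser calibration that is ignored here.

```java
// Time-of-flight ranging and a simple polar-to-Cartesian conversion.
// Assumptions: sensor at the origin, z is up, azimuth 0 points along +y,
// and no per-laser calibration offsets. All names are hypothetical.
class LidarPoint {
    static final double SPEED_OF_LIGHT = 299_792_458.0; // meters per second

    // Distance to the target from the round-trip time of a pulse.
    // The light travels out and back, so the one-way distance is half.
    static double rangeMeters(double roundTripSeconds) {
        return SPEED_OF_LIGHT * roundTripSeconds / 2.0;
    }

    // Convert a range plus azimuth and elevation angles (degrees) to {x, y, z}.
    static double[] toCartesian(double range, double azimuthDeg, double elevationDeg) {
        double az = Math.toRadians(azimuthDeg);
        double el = Math.toRadians(elevationDeg);
        double horizontal = range * Math.cos(el); // projection onto the ground plane
        return new double[] {
            horizontal * Math.sin(az),
            horizontal * Math.cos(az),
            range * Math.sin(el)
        };
    }

    public static void main(String[] args) {
        double range = rangeMeters(100e-9); // a pulse that returns after 100 nanoseconds
        double[] p = toCartesian(range, 90.0, 0.0);
        System.out.printf("range %.2f m -> x=%.2f y=%.2f z=%.2f%n", range, p[0], p[1], p[2]);
    }
}
```

By the same formula, a target at the 3-foot minimum range sends its pulse back in roughly 6 nanoseconds, which gives a feel for how fast the detector electronics have to be.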
Geometric Informatics
The Lidar is an incredible tool for visualizing outdoor scenery. It detects a point about the size of a nickel, depending on distance, which works fine for large spaces, but for the contours and details of a person’s face, it’s not good enough. For the close-ups of Thom Yorke singing, we needed something else. We needed something with finer vision.
While thinking about how to do the close-up shots, I happened to think back to an outfit called Geometric Informatics that I discovered at the 2005 SIGGRAPH conference. (Aside: if you have a cool data visualization technology, please go to every trade show possible on the chance that I may be attending… thank you.) It had a booth at the conference and a demo of its system, which it calls GeoVideo. GeoVideo is a real-time motion capture system that is particularly suited for capturing the geometry of a person’s face. It is significantly better than the Lidar system at close-ups, capable of discerning data points at 0.2 millimeters as opposed to 2 centimeters. With it we were able to capture the fine details of Thom Yorke singing. The point cloud data you see at the opening of the video was captured with GeoVideo.
If you think the drawing below looks a bit simple, that’s because the device is not much to look at. The system looks like a beige box roughly a foot on either side with two lenses on it. One lens projects a field of light onto the subject in front of the box, while the other lens captures the data. The light field consists of a grid of 600,000 triangles, which, in effect, forms an instant contour map projected onto the subject in front of the sensor. The sensor then reads each triangle point as a point of data, which is then output raw to a computer at 54 megabytes per second. The sensor can capture 180 frames per second.
The advantage of the GeoVideo’s method of projecting a light field onto the subject is that whatever is in front of the sensor isn’t required to have a grid physically drawn onto it, wear a motion capture suit, or sit in front of a green screen with reference marks. The light projection creates an instant, portable reference map. It’s incredibly easy.
The GeoVideo system is also capable of texture mapping, meaning that it can capture not only the data points, but also the textures between those data points. Combined, this results in an eerily accurate 3-D representation of an object or a person’s face. For the “House of Cards” video, we decided to forgo the textures and use only the data points. And even these we heavily downsampled. The result was the digital point cloud of Thom Yorke you see in the opening scene of the video. Rather than an exact likeness of him, he appears to be a digital avatar or soul (at least, that’s how I see it). Having seen both versions of the data, with textures and without, I can say that the version we used without textures is much more interesting. With textures, he looked a bit like a character from a video game. Sometimes taking away data makes the visualization more beautiful.
The Advantages of Two Data Capture Systems
There are also a couple of lessons you can draw from my experience of finding equipment. The first is that when it comes to finding equipment for data visualization, look everywhere. There are exciting sensing technologies being developed constantly that have never been used artistically. If you’re about to embark on a visualization project, do some research online, at trade shows, or at your local university. Find out if there are new ways to capture your data that you hadn’t considered before. A different piece of equipment might add a theme to your work or reveal data you didn’t see previously. Always be looking for visualization techniques that will surprise people.
The second lesson is a warning: if you use only one piece of equipment, your work may be seen as just a demo of that piece of equipment. If we had only used the Lidar to create the “House of Cards” video, I have a slight suspicion that the video might have become “the Lidar video.” By using both the GeoVideo system and the Lidar, the final product couldn’t be slapped with a product label. No single tool defined the work. The mixture of two data capture systems made the story behind the video more interesting.
The Data
They’re in the file 2067.csv, which is the 1,067th frame in the video. Because each second is 30 frames and the first frame is 1001.csv, these data points can be seen at around 0:36 in the video. You can find this datafile on the Google Code site, along with the other frames:
The data are in the format x, y, z, intensity. All of the data we captured was eventually translated into this format. The x, y, z values are relative distance measurements. The GeoVideo system, like the Lidar, has a 0, 0, 0 point upon which it bases all other points. What 70.46 means, therefore, is that the point is 70.46 units along the x-axis away from the 0 point. You can scale these numbers however you want. The intensity range is from 0 (0% white) to 256 (100% white).
You’ll find 2,000 frames’ worth of Thom singing on the Google Code site, comprising just over a minute from the video. The audio is available as well. We also included two static landscapes’ worth of data: the city and the cul-de-sac. They are in the HoC_DataApplications_v1.0.zip archive that includes the viewer program. The data you see on the site is in the same format as the data we delivered to the postprocessing studio, with one minor difference. The studio wanted RGB values for each point, so we repeated the last value twice, in effect using the intensity field as the color channel.
Capturing the Data, aka “The Shoot”
The Outdoor Lidar Shoot
The first thing we did on arrival in Florida was set the Lidar up on the back of an old van the production crew had rented. We used the van to capture the static landscape data you see in the video, such as the city and the cul-de-sac. Unlike the DARPA Urban Challenge vehicles, we did not put the Lidar on top of the vehicle. Instead, we tilted it 90 degrees and mounted it to the back of the van. This meant that the lasers would sweep the environment vertically. If picturing this is confusing, think of a lighthouse tipped on its side and sticking off the back of the vehicle, like a tail pipe. This meant that the lasers rotated from the street to the sky and back again. This happened 900 times per minute. We did it this way because it gave us a very high-resolution scan of the area.
And in fact, during post-processing, we isolated only one laser out of the 64, because all of them were effectively scanning the same thing. As the van was moved forward, then, the laser scanned a unique part of the environment with each revolution. The landscape below was captured with this technique. Our van drove right down the middle of the street. Do you see how the lines on the street are perpendicular to the street itself? That’s because the Lidar was hanging off the back and facing downward. You may notice as well that there are curved lines on the side of the apartment towers. This was caused by the movement of the van coupled with the rotation and angle of the Lidar.
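The interplay between the van's motion and the scanner's rotation can be sketched with a little arithmetic in plain Java. This is a simplified model, not the processing pipeline used for the video: it assumes a single laser sweeping in a plane across the street, a perfectly steady van, and a speed of around 10 mph (the scanning speed given in the account of the shoot). The axis layout and all names are hypothetical.

```java
// Simplified model of one laser sweeping vertically while the van drives forward.
// Assumed axes: x across the street, y along the direction of travel, z up.
// All names are hypothetical; real data also includes bumps and speed changes.
class SweepSpacing {
    static final double METERS_PER_MILE = 1609.344;

    // Van speed converted from miles per hour to meters per second.
    static double metersPerSecond(double speedMph) {
        return speedMph * METERS_PER_MILE / 3600.0;
    }

    // Distance the van travels during one revolution of the scanner.
    static double metersBetweenSweeps(double speedMph, double rpm) {
        return metersPerSecond(speedMph) / (rpm / 60.0);
    }

    // Position {x, y, z} of a return at a fixed range, at a given time.
    static double[] pointAt(double timeSeconds, double range, double speedMph, double rpm) {
        double angle = 2.0 * Math.PI * (rpm / 60.0) * timeSeconds; // sweep angle in radians
        double along = metersPerSecond(speedMph) * timeSeconds;    // distance driven so far
        return new double[] { range * Math.cos(angle), along, range * Math.sin(angle) };
    }

    public static void main(String[] args) {
        System.out.printf("Sweep spacing at 10 mph and 900 RPM: %.3f m%n",
            metersBetweenSweeps(10.0, 900.0));
        double[] p = pointAt(1.0 / 30.0, 5.0, 10.0, 900.0); // halfway through one revolution
        System.out.printf("Half a revolution later: x=%.2f y=%.3f z=%.2f%n", p[0], p[1], p[2]);
    }
}
```

In this model each revolution lands about 30 centimeters further down the street than the last, and because the van keeps moving during a revolution, every sweep is slightly skewed along the direction of travel rather than being a flat slice. That skew is consistent with the slanted and curved scan lines described above.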
The shoot went very smoothly. The production team had scouted the locations, so we simply drove to each scene and scanned them in order. When we reached an area we wanted to scan, we would slow the van down to around 10 mph and the driver would try to achieve as steady a speed as possible. Then we’d start recording. Unlike a camera, the Lidar doesn’t start and stop. Instead, when it’s on, it’s always rotating and always outputting data. So, we didn’t have to turn it on and off, we just had to know when to start recording the data. When the moment came, our assistant director Larry Zience would shout “roll computer” as a signal to Rick Yoder, the Velodyne field engineer, to start collecting points. (This was mildly funny to some of the crew, because normally a director says “roll camera.”) Rick would then hit a key on his laptop and the Lidar data would begin outputting to his hard drive. When Larry said “cut,” we stopped recording. Rick later sent me a note about what it was like to work with a film crew:
The illustration below shows another landscape. Notice how the power lines appear jagged? There’s a simple reason for that: it’s because the van was bumping up and down due to the uneven road and the natural bounce of the vehicle. Typically, these “errors” would be compensated for with gyroscopes, accelerometers, and other fancy pieces of equipment. In our case, we wanted the errors. Not only was it cheaper and easier to process, but (in my eyes, at least) it made the data more interesting. Perfection is an admirable goal, but not always the most creative.
The Indoor Lidar Shoot
We also used the Lidar indoors on a film set. It was used to capture the party scenes at 3:30 and 3:55 in the video. Unlike the landscape scenes, we used all 64 of the Lidar lasers’ data for this part of the shoot rather than just one. That’s because the party scenes are dynamic: the points change with every rotation of the Lidar, which means they change with each frame of the video. Therefore, you see the people in the scene moving. For this part of the shoot, we used the normal horizontal orientation for the Lidar, which is the reason the data appears in horizontal lines.
To create the party scene, we recruited some film students from a nearby school. Some of the students got very done up, thinking they would be in a Radiohead video and this was their time to shine; little did they realize, though, that all we really wanted was the form of their bodies. Sorry about that, guys!
If you count the horizontal lines in the figure below, you’ll find that there are 64. And notice also how the top half of the image appears brighter? That’s because the 32 lasers at the top of the Lidar trigger faster than the bottom. As I noted earlier, the Lidar is built this way because it normally scans large terrain spaces and requires a higher resolution for elevations approaching the horizon.
Unfortunately, the resolution for the Lidar is very low, about 2 centimeters per point. That’s why the figures look so hazy. To me, this added to the meaning of the video. Parties are often populated by people you don’t know very well, and the visualization reflects this sense of alienation. However, the low resolution of the Lidar wasn’t going to suffice for the close-ups on Thom Yorke. For this, we used the GeoVideo system.
The Indoor GeoVideo Shoot
The point clouds of Thom Yorke, his “lover” (played by actress Lauren Maher, whom you first see at 1:05), and a couple of other scenes (such as the hand at 3:50) were all captured with Geometric Informatics’ GeoVideo system. The GeoVideo system is capable of an astonishing level of realism. If you watch the demo videos on Geometric Informatics’ website, you’ll notice that it achieves a much higher quality image than the point clouds in our video. Its visualizations also don’t suffer from the interference and errors that appear in our video. Our video is lower quality because we made it that way deliberately. James Frost, the director, didn’t want a perfect visual avatar of Thom Yorke; he wanted a fragile, evanescent vision of him (see below). To me, the opening scene implies that this is not Thom Yorke the man, but something closer to the singer’s soul. We are seeing the ghost in the machine.
The low quality of the data and the frequent errors in the visualization also make it appear as if acquiring the data was difficult. This apparent difficulty enhances the story. A clearer image would not have conveyed the meaning we wanted. The interference in the data was not done in post-processing; rather, it was created on set. The production company brought a number of props with them to break up the data, including little bits of mirror glued to a sheet of plexiglass, feathers that were dropped in front of the scanner, and running water that was poured on a piece of plexiglass in front of Thom. The mirrors ultimately worked the best to disrupt the data in a nonorganic way; the feathers didn’t do a very good job of interrupting the data, and water absorbed the light, creating only empty points in the data set. Including both the GeoVideo and Lidar portions, the interior shoot took about 10 hours. For all of Thom’s scenes, we were careful to back up the data on multiple hard drives for fear that if we lost it, we might not be able to shoot it again.
Processing the Data
After all the data was captured, the processing work started. The first thing we did was send the raw Lidar data to 510 Systems, an engineering company in Berkeley, California that has a lot of experience processing this type of data. The company assigned the project to its in-house Lidar data guru, Pierre-Yves Droz. He’s an expert at turning raw Lidar data into usable formats. Pierre did two things for us after receiving the data, which we mailed to him on DVDs. First, for the landscape scenes, he isolated a single laser out of the 64 and created a data set of just that laser’s points. Second, he converted all of the raw Lidar data, including the dynamic party scene data, into individual data points consisting of x, y, z, and intensity.
To convert the raw data, Pierre needed to know the precise position and orientation of each of the Lidar’s laser emitter and detector pairs. This calibration information is provided by Velodyne, and the parameters are unique to each Lidar unit. Pierre also used the speed of the van to help calculate how far the Lidar moved in the real world as it rotated. All told, we gave 510 approximately 4 gigabytes of raw data, which turned into almost 50 gigabytes of processed data in text .obj format.
Post-Processing the Data
Brandon Davis, The Syndicate’s particle specialist, worked on the project. He sent me an email describing why the project was unusual:
From the start, the Radiohead project had very unusual possibilities from a visualization standpoint. With an animated data set, you get a strange paradox: view-dependent data that can be viewed independently. It really is a “second sight,” being able to take what one sees and view it from different perspectives, revealing the gaps in that original sight. This opened the doors for some truly unique imagery.
He goes on to describe how he tackled the vaporization effect that you’ll notice throughout the video:
The dynamic point cloud of Thom singing was another matter, however. Brandon continues:
Eventually, The Syndicate figured out a simple way to add the decay effect to the dynamic point cloud using a 2-D mask: it added the mask with a layer of particles blowing away on top of the 3-D point cloud. This meant that there wasn’t a perfect one-for-one particle decay like in the static landscapes, but I think the differences are imperceptible. When I asked James, the director, why he added the particle decay effect in the first place, he said:
When all of the post-processing was finished, the clips were edited together by Nicholas Wayman Harris at Union Editorial. At last, the video was complete.
Launching the Video
The “House of Cards” video was the first music video to be premiered by Google. It launched on July 11, 2008. The Google site includes some of the video’s data, so that you may create your own visualizations, as well as a 3-D data visualization tool. Google’s Creative Lab developed the site. The visualization tool was written in Flash by my friend Aaron Meyers and me. It allows the viewer to rotate the point cloud in real time while the video is playing.
To me, this is where the data becomes truly beautiful. The Flash application allows you to look at parts of the video from any angle you want in real time, something traditional video recording will never allow. You may even turn Thom Yorke’s face so that it faces away from you, effectively holding his face as a mask up to yours and allowing you to look through his eyes. This effect is very powerful, in my opinion. It makes the music video tangible in a way I doubt many people have experienced before.
We also released some of the data itself, making it open source, along with a video creation tool written in the Processing programming language. We then encouraged people to download the data and create their own videos. I want to share the source code for the video creation tool to show you how easy it is to create your own version of the video in Processing. This is the code that outputs frames of Thom Yorke singing:
import processing.opengl.*; //Processing 1.x needs this import for the OPENGL renderer

int frameCounter = 1001; //Assumed start: the text names 1001.csv as the first frame; change this to match your files

void setup(){ //Here we state the things we're going to do once, at startup
  size(1024, 768, OPENGL); //This is the render size. We'll use OpenGL to draw as fast as possible
  //frameRate(30); //Uncomment to watch the animation at 30 frames per second.
  strokeWeight(1); //Draw lines at a width of 1, for now.
}

void draw(){ //Here we state the things we're going to do every frame
  background(0); //We'll use a black background
  translate(width/2, height/2); //The data has 0,0,0 at the center and we want to draw it in the middle of the screen
  translate(-150, -150); //Let's adjust our center slightly
  scale(2); //Let's draw things bigger
  //rotateY(frameCounter/50.0f); //If uncommented, this makes the data rotate over time
  //rotateY(mouseX/150.0); //If uncommented, this uses the mouse's horizontal position to rotate the data
  String[] raw = loadStrings(frameCounter+".csv"); //Here we load the current frame
  for(int i = 0; i < raw.length; i++){ //Run this loop for every line in the raw data
    String[] thisLine = split(raw[i], ','); //For each line we're going to separate the values at the commas
    float x = float(thisLine[0]); //Now we make a decimal variable for each parameter
    float y = float(thisLine[1]);
    float z = float(thisLine[2]);
    float intensity = float(thisLine[3]);
    stroke(intensity*1.1, intensity*1.6, 200, 255); //We set the color of each point to follow its intensity
    line(x, y, z, x+1, y+1, z+1); //Here we draw a little line for each point
  }
  frameCounter++; //Add one to the frame variable to keep track of what frame we're on
  if(frameCounter > 2101){ //If we get to the end of the data we'll exit the program
    exit();
    println("done");
  }
  //saveFrame("renderedFrames/"+frameCounter+".tga"); //This would be a way to save out each rendered frame
}
It’s not as beautiful as the data (this isn’t Beautiful Code, after all), but it works great. As written, the code allows you to watch Thom Yorke sing the song head on, but with a couple of modifications, you can customize the experience. Here are two examples of modifications that are commented out in the previous code. The first is:
//rotateY(frameCounter/50.0f);
Uncommenting this line in the draw function causes Thom’s face to turn around the y-axis as the frames increase. The second modification is:
//rotateY(mouseX/150.0);
Uncommenting this line in the draw function allows you to make the rotation a function of the mouse. You may now move Thom’s face as the frames are output. I’m sure you can think of other things to modify; many people have done things I hadn’t even considered, which is exactly what I hoped for.
Once you’ve rendered out all the frames (by uncommenting the last line), you can put the frames together into a video with a program like QuickTime Pro, Final Cut, or After Effects. Some of the videos created by other people are really impressive. Check them out at the “House of Cards” YouTube group. It really is quite easy. All it takes is some beautiful data with which to get started.
Conclusion