

Believe it or not, A$AP Rocky is a huge fan of radiance fields.
Yesterday, when A$AP Rocky released the music video for Helicopter, many viewers focused on the chaos, the motion, and the unmistakable early MTV energy of the piece. What’s easier to miss, unless you know what you’re looking at, is that nearly every human performance in the video was captured volumetrically and rendered as dynamic splats.
I spoke with Evercoast, the team responsible for capturing the performances, as well as Chris Rutledge, the project’s CG Supervisor at Grin Machine, and Wilfred Driscoll of WildCapture and Fitsū.ai, to understand how Helicopter came together and why this project represents one of the most ambitious real world deployments of dynamic gaussian splatting in a major music release to date.
The decision to shoot Helicopter volumetrically wasn’t driven by technology for technology’s sake. According to the team, director Dan Streit approached the project in July with a clear creative goal: capture human performance in a way that would allow radical freedom in post-production, something that would have been either impractical or prohibitively expensive using conventional filming and VFX pipelines.
Chris told me he’d been tracking volumetric performance capture for years, fascinated by emerging techniques that could enable visuals that simply weren’t possible before. Two years ago, he began pitching the idea to directors in his circle, including Dan, as a “someday” workflow. When Dan came back this summer and said he wanted to use volumetric capture for the entire video, the proliferation of gaussian splatting enabled them to take it on.

The aesthetic leans heavily into kinetic motion. Dancers colliding, bodies suspended in midair, chaotic fight scenes, and performers interacting with props that later dissolve into something else entirely. Every punch, slam, pull-up, and fall you see was physically performed and captured in 3D.
Almost every human figure in the video, including Rocky himself, was recorded volumetrically using Evercoast’s system. It’s all real performance, preserved spatially.
This is not the first time that A$AP Rocky has featured a radiance field in one of his music videos. The 2023 music video for Shittin’ Me featured several NeRFs and even the GUI for Instant-NGP, which you can spot throughout the piece.

The primary shoot for Helicopter took place in August in Los Angeles. Evercoast deployed a 56-camera RGB-D array, synchronized across two Dell workstations. Performers were suspended from wires, hanging upside down, doing pull-ups on ceiling-mounted bars, swinging props, and performing stunts, all inside the capture volume.
Scenes that appear surreal in the final video were, in reality, grounded in very physical setups, such as wooden planks standing in for helicopter blades, real wire rigs, and real props. The volumetric data allowed those elements to be removed, recomposed, or entirely recontextualized later without losing the authenticity of the human motion.
Over the course of the shoot, Evercoast recorded more than 10 terabytes of raw data, ultimately rendering roughly 30 minutes of final splatted footage, exported as PLY sequences totaling around one terabyte.
That data was then brought into Houdini, where the post-production team used CG Nomads’ GSOPs for manipulation and sequencing, and OTOY’s OctaneRender for final rendering. Thanks to this combination, the production team was also able to relight the splats.
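For readers curious what those PLY sequences actually contain, here is a minimal sketch (not Evercoast's or Grin Machine's actual tooling) of inspecting a single splat frame in Python. It assumes the common INRIA-style 3DGS attribute layout, where scales and opacities are stored pre-activation; the filename and attribute names are illustrative and real exports may differ.

    import numpy as np
    from plyfile import PlyData  # pip install plyfile

    def load_splat_frame(path):
        # Each vertex in a 3DGS-style PLY is one Gaussian: center, scale, rotation, opacity, color.
        verts = PlyData.read(path)["vertex"]
        xyz = np.stack([verts["x"], verts["y"], verts["z"]], axis=-1)
        opacity = 1.0 / (1.0 + np.exp(-np.asarray(verts["opacity"])))   # stored as a logit
        scales = np.exp(np.stack([verts[f"scale_{i}"] for i in range(3)], axis=-1))  # stored as log-scales
        return xyz, opacity, scales

    # Hypothetical filename for one frame of a splat sequence.
    xyz, opacity, scales = load_splat_frame("splat_frame_0001.ply")
    print(f"{len(xyz):,} Gaussians, mean opacity {opacity.mean():.2f}")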
One of the more powerful aspects of the workflow was Evercoast’s ability to preview volumetric captures at multiple stages. The director could see live spatial feedback on set, generate quick mesh based previews seconds after a take, and later review fully rendered splats through Evercoast’s web player before downloading massive PLY sequences for Houdini.
In practice, this meant creative decisions could be made rapidly and cheaply, without committing to heavy downstream processing until the team knew exactly what they wanted. It’s a workflow that more closely resembles simulation than traditional filming.
Chris also discovered that Octane’s Houdini integration had matured, and that Octane’s early splat support was far enough along to enable relighting. According to the team, the ability to relight splats, introduce shadowing, and achieve a more dimensional “3D video” look was a major reason the final aesthetic lands the way it does.
The team also used Blender heavily for layout and previs, converting splat sequences into lightweight proxy caches for scene planning. Wilfred described how WildCapture’s internal tooling was used selectively to introduce temporal consistency. In his words, the team derived primitive pose estimation skeletons that could be used to transfer motion, support collision setups, and allow Houdini’s simulation toolset to handle rigid body, soft body, and more physically grounded interactions.
One recurring reaction to the video has been confusion. Viewers assume the imagery is AI-generated. According to Evercoast, that couldn’t be further from the truth. Every stunt, every swing, every fall was physically performed and captured in real space. What makes it feel synthetic is the freedom volumetric capture affords. You aren’t limited by the camera’s composition. You have free rein to explore, reposition cameras after the fact, break spatial continuity, and recombine performances in ways that 2D simply can’t.
In other words, radiance field technology isn’t replacing reality. It’s preserving everything.
Hi,
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).
I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3d clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.
There's a ton of fun tricks you can perform with Gaussian splatting!
You're right that you can intentionally under-construct your scenes. This can create a dream-like effect.
It's also possible to stylize your Gaussian splats to produce NPR effects. Check out David Lisser's amazing work: https://davidlisser.co.uk/Surface-Tension.
Additionally, you can intentionally introduce view-dependent ghosting artifacts. In other words, if you take images from a certain angle that contain an object, and remove that object for other views, it can produce a lenticular/holographic effect.
Y'all did such a good job with this. It captivated HN and was the top post for the entire day, and will probably last for much of tomorrow.
If you don't know already, you need to leverage this. HN is one of the biggest channels of engineers and venture capitalists on the internet. It's almost pure signal (minus some grumpy engineer grumblings - we're a grouchy lot sometimes).
Post your contact info here. You might get business inquiries. If you've got any special software or process in what you do, there might be "venture scale" business opportunities that come your way. Certainly clients, but potentially much more.
(I'd certainly like to get in touch!)
--
edit: Since I'm commenting here, I'll expand on my thoughts. I've been rate limited all day long, and I don't know if I can post another response.
I believe volumetric is going to be huge for creative work in the coming years.
Gaussian splats are a huge improvement over point clouds and NeRFs in terms of accessibility and rendering, but the field has so many potential ways to evolve.
I was always in love with Intel's "volume", but it was impractical [1, 2] and got shut down. Their demos are still impressive, especially from an equipment POV, but A$AP Rocky's music video is technically superior.
During the pandemic, to get over my lack of in-person filmmaking, I wrote Unreal Engine shaders to combine the output of several Kinect point clouds [3] to build my own lightweight version inspired by what Intel was doing. The VGA resolution of consumer volumetric hardware was a pain, and I was faced with FPGA solutions for higher real-time resolution, or going 100% offline.
World Labs and Apple are doing exciting work with image-to-Gaussian models [4, 5], and World Labs created the fantastic Spark library [6] for viewing them.
I've been leveraging splats to do controllable image gen and video generation [7], where they're extremely useful for consistent sets and props between shots.
I think the next steps for Gaussian splats are good editing tools, segmenting, physics, etc. The generative models are showing a lot of promise too. The Hunyuan team is supposedly working on a generative Gaussian model.
[1] https://www.youtube.com/watch?v=24Y4zby6tmo (film)
[2] https://www.youtube.com/watch?v=4NJUiBZVx5c (hardware)
[3] https://www.twitch.tv/videos/969978954?collection=02RSMb5adR...
[4] https://www.worldlabs.ai/blog/marble-world-model
[5] https://machinelearning.apple.com/research/sharp-monocular-v...
[7] https://github.com/storytold/artcraft (in action: https://www.youtube.com/watch?v=iD999naQq9A or https://www.youtube.com/watch?v=f8L4_ot1bQA )
First, all credit for execution and vision of Helicopter go to A$AP, Dan Streit, and Grin Machine (https://www.linkedin.com/company/grin-machine/about/). Evercoast and Wild Capture were also involved.
Second, it's very motivating to read this! My background is in video game development (only recently transitioning to VFX). My dream is to make a Gaussian splatting content creation and game development platform with social elements. One of the most exciting aspects of Gaussian splatting is that it democratizes high quality content acquisition. Let's make casual and micro games based on the world around us and share those with our friends and communities.
Thanks darhodester! It was definitely a broad team effort that started with Rocky and Streit's creative genius, which was then made possible by Evercoast's software to capture and generate all the 4D splat data (www.evercoast.com), and which then flowed to the incredible people at Grin Machine and Wild Capture, who used GSOPs and OctaneRender.
What do you think about the sparse voxel approach, shouldn't it be more compute efficient than computing zillions of ellipsoids? My understanding of CGI prolly is too shallow, but I wonder why it hasn't caught on much...
I believe most of the "voxel" approaches also require some type of inference (MLP). This limits the use case and ability to finely control edits. Gaussian splatting is amazing because each Gaussian is just a point in space with a rotation and non-uniform scale.
The most expensive part of Gaussian splatting is depth sorting.
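To make that concrete, here is a rough CPU-side sketch of the sort a splat renderer has to perform for every new viewpoint; real renderers do this per tile on the GPU with a radix sort, so treat the function below (and its names) as illustrative only.

    import numpy as np

    def back_to_front_order(means_world, world_to_cam):
        # means_world: (N, 3) Gaussian centers in world space.
        # world_to_cam: (4, 4) view matrix; camera looks down -Z (OpenGL-style convention assumed).
        n = means_world.shape[0]
        homogeneous = np.hstack([means_world, np.ones((n, 1))])
        cam_space = homogeneous @ world_to_cam.T
        depth = -cam_space[:, 2]            # distance in front of the camera
        return np.argsort(-depth)           # farthest first, so nearer splats composite over them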
The ghost effect is pretty cool, too! https://www.youtube.com/watch?v=DQGtimwfpIo
https://youtu.be/eyAVWH61R8E?t=3m53s
Superman is what comes to mind for this
I remember splatting being introduced as a way to capture real life scenes, but one of the links you have provided in this discussion seems to have used a traditional polygon mesh scene as training input for the splat model. How common is this and why would one do it that way over e.g. vertex shader effects that give the mesh a splatty aesthetic?
Yes, it's quite trivial to convert traditional CG to Gaussian splats. We can render our scenes/objects just as we would capture physical spaces. The additional benefit of using synthetic data is 100% accurate camera poses (alignment), which means the structure-from-motion (SfM) step can be bypassed.
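As a concrete illustration of skipping SfM with synthetic renders, here is a small hedged sketch that writes an Instant-NGP/Nerfstudio-style transforms.json directly from camera-to-world matrices exported by the CG scene. The function name and exact field conventions (axis orientation, FOV handling) vary between trainers, so treat this as a sketch rather than a drop-in tool.

    import json
    import numpy as np

    def write_transforms(cameras, fov_x_radians, out_path="transforms.json"):
        # cameras: iterable of (image_path, 4x4 camera-to-world matrix) pairs taken
        # straight from the CG scene, so no SfM/COLMAP alignment pass is needed.
        frames = [
            {"file_path": path, "transform_matrix": np.asarray(c2w).tolist()}
            for path, c2w in cameras
        ]
        with open(out_path, "w") as f:
            json.dump({"camera_angle_x": fov_x_radians, "frames": frames}, f, indent=2)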
It's also possible to splat from textured meshes directly, see: https://github.com/electronicarts/mesh2splat. This approach yields high quality, PBR compatible splats, but is not quite as efficient as a traditional training workflow. This approach will likely become mainstream in third party render engines, moving forward.
Why do this?
1. Consistent, streamlined visuals across a massive ecosystem, including content creation tools, the web, and XR headsets.
2. High fidelity, compressed visuals. With SOGs compression, splats are going to become the dominant 3D representation on the web (see https://superspl.at).
3. E-commerce (product visualizations, tours, real estate, etc.).
4. Virtual production (replace green screens with giant LED walls).
5. View-dependent effects without (traditional) shaders or lighting.
It's not just about the aesthetic, it's also about interoperability, ease of use, and the entire ecosystem.
From the article:
>Evercoast deployed a 56-camera RGB-D array
Do you know which depth cameras they used?
We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs. But for this, RealSense did the job. The higher end the camera, the more expensive and time consuming everything is. We have a cloud platform to scale rendering, but it’s still overall more costly (time and money) to use high res. We’ve worked hard to make even low res data look awesome. And if you look at the aesthetic of the video (90s MTV), we didn’t need 4K/6K/8K renders.
You may have explained this elsewhere, but if not: what kind of post-processing did you do to upscale or refine the RealSense video?
Can you add any interesting details on the benchmarking done against the RED camera rig?
This is a great question, would love some feedback on this.
I assume they stuck with RealSense for proper depth maps. However, those are both limited to about a 6-meter range, and their depth imaging isn't able to resolve features smaller than their native resolution allows (it gets worse past 3 m too, as there is less and less parallax, among other issues). I wonder how they approached that as well.
Aha: https://www.red.com/stories/evercoast-komodo-rig
So likely RealSense D455.
I was not involved in the capture process with Evercoast, but I may have heard somewhere they used RealSense cameras.
I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.
Kinect Azure
Couldn’t you just use iPhone Pros for this? I developed an app specifically for photogrammetry capture using AR and the depth sensor, as it seemed like a cheap alternative.
EDIT: I realize a phone is not on the same level as a RED camera, but I just saw iPhones as a massively cheaper option than the alternatives in the field I worked in.
ASAP Rocky has a fervent fanbase who's been anticipating this album. So I'm assuming that whatever record label he's signed to gave him the budget.
And when I think back to another iconic hip hop video (iconic for that genre) where they used practical effects and military helicopters chasing speedboats in the waters off of Santa Monica... I bet they had change to spare.
Is there any reason to think https://thebaffler.com/salvos/the-problem-with-music doesn't apply here?
A single camera only captures the side of the object facing it. Knowing how far away the camera-facing side of a Rubik's Cube is helps if you're making educated guesses (novel view synthesis), but it won't solve the problem of actually photographing the backside.
A cube usually has six sides, which means you need a minimum of six iPhones around an object to capture all sides of it and be able to then freely move around it. You might as well seek open-source alternatives rather than relying on Apple surprise boxes for that.
In cases where your subject is static, such as a building, you can of course wave a single iPhone around for a result comparable to more expensive rigs.
I think it's because they already had proven capture hardware and proven harvesting and processing workflows.
But yes, you can easily use iPhones for this now.
Looks great, by the way. I was wondering: is there a file format for volumetric video captures?
Some companies have a proprietary file format for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.
Check this project, for example: https://zju3dv.github.io/freetimegs/
Unfortunately, these formats are currently closed behind cloud processing, so adoption is rather low.
Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).
https://developer.apple.com/av-foundation/
https://developer.apple.com/documentation/spatial/
Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like Cinematic mode use a monocular depth map, and the LiDAR outputs raw point cloud data.
A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data for its whole distance range.
You just can't see the back of a thing by knowing the shape of the front side with current technologies.
Right! My terminology may be imprecise here, but I believe there is still an important distinction:
The depth map stored for image processing is image metadata, meaning it calculates one depth per pixel from a single position in space. Note that it doesn't have the ability to measure that many depth values, so it measures what it can using LIDAR and focus information and estimates the rest.
On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position; in theory the device could be moved around to capture additional angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.
I also note that these distinctions start to vanish when dealing with video or using more than one capture device.
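For what it's worth, the relationship between the two is just a back-projection: each depth pixel becomes one point, which is why a single capture only ever gives you the surface facing that camera. A rough sketch under pinhole-camera assumptions (the intrinsics fx, fy, cx, cy are placeholders):

    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        # depth: (H, W) metric depths in meters; fx, fy, cx, cy: pinhole intrinsics.
        # Only surfaces facing this single camera are recovered -- the "skin", never the back.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]     # drop pixels with no valid depth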
No, LIDAR data are necessarily taken from a single position. They are 3D, but literally single-eyed. You can't tell from LIDAR data whether you're looking at a half-cut apple or an intact one. This becomes obvious the moment you try to rotate a LIDAR capture: it's just the skin. You need depth maps from all angles to reconstruct the complete skin.
So you have to have a minimum of two, for the front and back of a dancer. Actually, the seams are kind of dubious, so let's say three, 120 degrees apart. Well, we need ones looking down as well as up for baggy clothing, so more like nine: 30 degrees apart vertically and 120 degrees horizontally, ...
and ^ this goes far enough that installing a few dozen identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for a video.
Recording point clouds over time is what I mean, I guess. I'm not going to pretend to understand video compression, but could the movement-following aspect be done in 3D the same way as in 2D?
Why would they go for the cheapest option?
It was more the point that the technology is much cheaper. The company I worked for had completely missed that while trying to develop in-house solutions.
Could such a plugin be possible for DaVinci Resolve, to merge scenes captured from two iPhones with spatial data into a single 3D scene? With an M4, that shouldn't be a problem?
Yes: https://irrealix.com/plugin/gaussian-splatting-davinci-resol...
(I'm not the author.)
You can train your own splats using Brush or OpenSplat
I do believe a BTS (behind-the-scenes) video is being developed.
Stay tuned
I've been mesmerized by the visuals of Gaussian splatting for a while now, congratulations for your great work!
Do you have any benchmarks on the geometric precision of these reproductions?
Thank you!
Geometric analysis for Gaussian splatting is a bit like comparing apples and oranges. Gaussian splats are not really discrete geometry, and their power lies in overlapping semi-transparent blobs. In other words, their benefit is as a radiance field and not as a surface representation.
However, assuming good camera alignment and real world scale enforced at the capture and alignment steps, the splats should match real world units quite closely (mm to cm accuracy). See: https://www.xgrids.com/intl?page=geomatics.
Nice work.
I can see that relighting is still a work in progress, as the virtual spotlights tend to look flat and fake. I understand that you are just making the splats that fall inside the spotlight cone brighter, and darkening the ones behind lots of splats.
Do you know if there are plans for Gaussian splats to capture unlit albedo, roughness, and metalness, so we can relight in a more realistic manner?
Also, environment radiosity doesn't seem to translate to the splats, am I right?
Thanks
Thank you!
There are many ways to relight Gaussian splats. However, the highest quality results are currently coming from raytracing/path tracing render engines (such as Octane and VRay), with 2D diffusion models in second place. Relighting with GSOPs nodes does not yield as high quality, but can be baked into the model and exported elsewhere. This is the only approach that stores the relit information in the original splat scene.
That said, you are correct that in order to relight more accurately, we need material properties encoded in the splats as well. I believe this will come sooner than later with inverse rendering and material decomposition, or technology like Beeble Switchlight (https://beeble.ai). This data can ultimately be predicted from multiple views and trained into the splats.
"Also, environment radiosity doesnt seem to translate to the splats, am I right?"
Splats do not have their own radiosity in that sense, but if you have a virtual environment, its radiosity can be translated to the splats.
This may interest you: https://www.linkedin.com/posts/radiancefields_in-case-you-we...
Back in 2001 I was the math consultant for "A Beautiful Mind". One spends a lot of time waiting on a film set. Eventually one wonders why.
The majority of wait time was the cinematographer lighting each scene. I imagined a workflow where secondary digital cameras captured 3D information, and all lighting took place in post production. Film productions hemorrhage money by the second; this would be a massive cost saving.
I described this idea to a venture capitalist friend, who concluded one already needed to be a player to pull this off. I mentioned this to an acquaintance at Pixar (a logical player) and they went silent.
Still, we don't shoot movies this way. Not there yet...
Hi David, have you looked into alternatives to 3DGS like https://meshsplatting.github.io/ that promise better results and faster training?
I have. Personally, I'm a big fan of hybrid representations like this. An underlying mesh helps with relighting, deformation, and effective editing operations (a mesh is a sparse node graph for an otherwise unstructured set of data).
However, surface-based constraints can prevent thin surfaces (hair/fur) from reconstructing as well as vanilla 3DGS. It might also inhibit certain reflections and transparency from being reconstructed as accurately.
Really cool work!
Random question, since I see your username is green.
How did you find out this was posted here?
Also, great work!
My friend and colleague shared a link with me. Pretty cool to see this trending here. I'm very passionate about Gaussian splatting and developing tools for creatives.
And thank you!
[flagged]
Is it possible you didn’t comprehend which parts were 3D?
Or if you did, perhaps a critique is better rather than just a low effort diss.
I viewed on a flat monitor, so perhaps I missed some 4D and 5D too.
/i
That's hurtful.
Take the money and never admit to selling this shit. Why would you ever willingly associate your name with this?
Read the room. Plenty of people are interested in the aesthetics and the technology.
Just because people want to give you money doesn't mean you toss your dignity out the window.
I want to shout out Nial Ashley (aka Llainwire) for doing this in 2023 as a solo act and doing the visuals himself as well - https://www.youtube.com/watch?v=M1ZXg5wVoUU
A shame that kid was slept on. Allegedly (according to discord) he abandoned this because so many artists reached out to have him do this style of mv, instead of wanting to collaborate on music.
> so many artists reached out to have him do this style of mv, instead of wanting to collaborate on music
Well yes, the visuals are awesome, while the music… isn’t.
I love HN because everyone is so different outside of the core purpose of the site. Sometimes people reference art, or a book or something, that I'd never would think to exist.
Llainwire was my top artist listens throughout 2023, so it’s always funny to bump into reactions that feel totally different from my world/my peers.
You're saying Nial used Gaussian splatting for his video? Or the style of camerawork, staging, and costuming is similar?
Put another way, is this a scientific comparison or an artistic comparison?
It sounds to me like he [the artist] was disappointed that more people were interested in his video editing than in his musical efforts.
Never did I think I would ever see anything close to related to A$AP on HN. I love this place.
Hah, for the past day, I've been trying to somehow submit the Helicopter music video / album as a whole to HN. Glad someone figured out the angle was Gaussian.
I run a programming company and one of my salespeople was surprised to see I liked SoundCloud rap. I was like:
What did you expect?
>Classical music?
Nah I like hype, helps when things are slow.
Prokofiev's Alexander Nevsky goes hard if you do want something in the classical world though.
Doctor Octagon’s “Moosebumps”, iirc.
I know, right? I'm into both tech and hip hop; didn't expect them to collide.
Is he wearing... hair curlers?
That's what one does when they want some fiyah curls.
And nearly a Carti post at the top of HN
I'm taking the opportunity to FWAEH in here
Bro the day I see Carti on HN is the day I'm leaving this site, some things shouldn't mix
One day we’ll see an Osamason or Xaviersobased post on HN
No fucking way have I just seen Osamason and Xav mentioned on HN
Helicopter had a Carti feature that was pulled but leaked, and a promo photoshoot with the two of them for it.
r/playboicarti is one of my favorite places to go to just turn my brain off and see shitposts that have a certain reminiscence to me, almost a "high school class when the teacher didn't show up" vibe.
95% of its posters are in high school or lower, and are in class during daytime hours, so that's a part of why it makes you feel like that
Indeed. It's a good vibe when you want to turn your brain off sometimes though, same reason why Beavis and Butthead succeeds and is re-aired.
yeah that had me do a double take lol
Why is that “cool” or desirable?
Because expertise, love, and care cut across all human endeavor, and noticing those things across domains can be a life affirming kind of shared experience.
Perfect comment, but it’s very funny to me that you even needed to say it. Some folks on here talk like moon people who have never met humans before.
Favorited. This will be a timeless comment for me; it will remind me to keep some perspective and to appreciate things I might not otherwise be familiar with, and thereby care about.
Desirable because it’s a rare culture + tooling combo. I’m into both and HN is one of the few places I would see them come together. So yeah, “cool”
[flagged]
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
Right, that's how culture works. There's no universal definition.
I'm not, I promise
[flagged]
Not everything has to be. Sometimes, an artist's style or a particular track just hits a particular vibe one may be after or need in a particular moment.
I'm not a fan of this music either but I could imagine hearing it while I'm studying or coding.
Don't trash something just because it's not your vibe. Not everything has to be Mozart.
I mean, it's not like I trashed it or compared it to Mozart—I even made sure to include "interesting, stimulating, or tonally remarkable" in an attempt to preempt that latter pushback.
But even if I did, why can't I? It's fine to call some music shit. Just like you can call my opinion shit.
Policing dissenting opinions and saying everything is equally worthy of praise are two sides of the same coin sliding in the vending machine that sells us the sad state of affairs we live in today.
You can say whatever you want, but pretentious sneering is annoying, don't be surprised if people push back.
You absolutely trashed it in your first sneering, shitty swipe about “culture”. You don’t get to make comments like that and then whine about “policing” like a four year-old caught in the cookie jar.
Why isn't it?