As immersive video formats grow in popularity, NPR audio engineers have been experimenting with techniques for recording high-quality spatial audio, an audio format that allows a listener to experience sound in all directions.
NPR’s initial foray into 360-degree video began as part of the Journalism 360 Challenge (J360), which focused on exploring some of the simplest ways to work in an immersive medium with a compact and portable equipment setup.
During a recent trip to Puerto Rico, I helped engineer a rich, immersive audio-video project on the aftermath of Hurricane Maria. Our audio engineering team also explored new territory — combining sound-rich and immersive audio with in-depth reporting and narrative.
As an audio engineer, I was asked to join the project to focus explicitly on high-end immersive spatial audio. This meant two things: first, capturing audio from the camera position with a higher-quality microphone and recorder setup; and second, using more advanced techniques and tools in post-production to enhance the final product, including adding audio recorded separately from the main spatial rig.
In this post, we provide a detailed look at recording spatial audio from an engineer’s perspective. It will be most useful for those with a basic understanding of audio engineering fundamentals. We’ll use our recent trip to Puerto Rico to demonstrate one example of an immersive spatial audio workflow.
Read our last post on immersive media: A beginner’s guide to spatial audio in 360-degree video.
Spatial audio, defined
We offered some definitions in a previous post. Here are some additional terms we’ll use:
Binaural audio delivers a full 360-degree soundscape through a specially encoded stereo file that has to be experienced through headphones. It models the way sound reflects around the head and within the folds of the ear. In fact, it is often recorded with a microphone that mimics the size and shape of a human head! You can hear in every direction, but the audio is not responsive to user input. So, if you move your head, the audio doesn’t change accordingly. The industry refers to this as “head-locked” audio. This is a successful and popular demo of binaural audio:
Ambisonics or 3D audio delivers a full 360-degree soundscape that is responsive to a visual field. When you move your head in one direction or another, the audio changes to reflect that movement. This is the type of spatial audio we were most interested in experimenting with as part of the J360 grant.
An ambisonic microphone is designed to capture raw ambisonic audio. It does this with multiple capsules, usually four (or more) arranged in a tetrahedron, each pointing in a different direction. This provides audio signals from various directions that can then be encoded into binaural or ambisonic/3D audio for listening on headphones or other setups.
A-Format is the raw audio from an ambisonic microphone; one channel of audio for each capsule.
B-Format is a standardized, multi-channel audio format for ambisonic audio. Different models of ambisonic microphones must have their raw A-Format recordings converted to B-Format to be compatible in post-production and for final delivery to content platforms like Facebook or YouTube.
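As a rough illustration of the A-Format to B-Format step, here is a minimal Python sketch of the textbook sum-and-difference conversion for an idealized tetrahedral microphone. The capsule order (front-left-up, front-right-down, back-left-down, back-right-up) and the function name are assumptions made for illustration. Real converters, such as a manufacturer's plugin, also apply capsule-specific filtering and gain correction that this sketch leaves out.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample frame of idealized A-Format to B-Format (W, X, Y, Z).

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. No filtering or gain
    correction is applied, so the output is unscaled.
    """
    w = flu + frd + bld + bru  # omnidirectional sum of all capsules
    x = flu + frd - bld - bru  # front capsules minus back capsules
    y = flu - frd + bld - bru  # left capsules minus right capsules
    z = flu - frd - bld + bru  # upward capsules minus downward capsules
    return w, x, y, z


# Equal signal on every capsule carries no directional information.
print(a_to_b_format(1.0, 1.0, 1.0, 1.0))  # (4.0, 0.0, 0.0, 0.0)
```

In practice this would run over every sample of a four-channel file. The point of the sketch is only that B-Format channels are fixed combinations of the capsule signals.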
AmbiX and FuMa are two conventions for B-Format ambisonics used to determine channel ordering and weighting. FuMa is an older convention; ambiX is a newer convention used by Facebook and YouTube. It is most common to work in ambiX, but there are tools available to convert FuMa to ambiX.
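To show what "channel ordering and weighting" means in practice, here is a small Python sketch of a first-order FuMa to ambiX conversion. It assumes first-order material only (four channels), and the function name is hypothetical. Higher-order conversions need additional per-channel gains that are not shown here, and dedicated conversion tools should be preferred for real projects.

```python
import math


def fuma_to_ambix_first_order(w, x, y, z):
    """Convert one first-order FuMa frame (W, X, Y, Z) to ambiX (W, Y, Z, X)."""
    # FuMa stores W attenuated by 1/sqrt(2) (about -3 dB);
    # ambiX (SN3D weighting) stores it at full scale.
    w_ambix = w * math.sqrt(2.0)
    # ambiX uses ACN channel order, which is W, Y, Z, X at first order.
    # The X, Y and Z gains are the same in both conventions at first order.
    return w_ambix, y, z, x
```

Feeding a FuMa file into a tool that expects ambiX without this step would swap the directional channels, so sounds would appear in the wrong place.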
Point source is a term used to distinguish standard, single-source audio from ambisonic audio. It is often used to refer to auxiliary audio gathered separately from the main ambisonic microphone during a shoot (e.g. audio recorded with a lavalier, shotgun, stereo mics, etc.).
Headlocked audio is audio intended to be stationary regardless of perspective. While spatialized audio will track with the video and change perspective as the video is panned and tilted by the end user, headlocked audio will sound the same throughout. This is used for narration, voice overs, music beds, etc.
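Roll, pitch and yaw are the three axes of head movement: roll is tilting the head toward either shoulder, pitch is nodding up and down, and yaw is turning left and right.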
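To make the idea of audio that responds to head movement more concrete, here is a minimal Python sketch of a yaw rotation applied to one first-order ambiX frame. The function names are hypothetical, and the sketch assumes the common convention that positive azimuth and positive Y point to the listener's left. Players and plugins handle this step internally; the sketch only illustrates the arithmetic.

```python
import math


def rotate_yaw(w, y, z, x, yaw_degrees):
    """Rotate one first-order ambiX frame (W, Y, Z, X) about the vertical axis.

    Positive yaw_degrees moves the sound field counterclockwise as seen
    from above, i.e. toward the listener's left.
    """
    theta = math.radians(yaw_degrees)
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    # W (omnidirectional) and Z (height) are unaffected by yaw.
    return w, y_rot, z, x_rot


def compensate_head_yaw(w, y, z, x, head_yaw_degrees):
    """Counter-rotate the sound field so sources stay fixed in the scene."""
    # Turning the head left makes sources appear to move to the right.
    return rotate_yaw(w, y, z, x, -head_yaw_degrees)
```

Pitch and roll work the same way, as rotations about the other two axes. Binaural and headlocked audio skip this step, which is why they do not respond to head movement.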
Tips for recording spatial audio
All of the basic tips for recording in 360-degree video that we outlined in our last post remain relevant when moving into more complex productions and gear setups.
Spend time on the rig placement
Be aware of how a scene sounds from the camera/microphone position. We tried to find the best placement for the ambisonic mic to capture the scene, then placed other mics around the space to capture isolated sounds up close, in mono or stereo. It is difficult to achieve an immersive audio scene synced with the video in real time without this coincident recording. Having both gave us the option in post-production of mixing the two, rebalancing those isolated sounds to create a more realistic experience.
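For readers curious about what mixing a point source into the ambisonic recording involves, here is a minimal Python sketch of the standard first-order panning equations in ambiX (W, Y, Z, X, SN3D weighting). The function names and the gain parameter are hypothetical, and this is not a description of the specific tools we used; spatial audio plugins do this for you, usually with distance and room modeling on top.

```python
import math


def encode_point_source(sample, azimuth_degrees, elevation_degrees=0.0):
    """Encode one mono sample into a first-order ambiX frame (W, Y, Z, X).

    Azimuth is measured counterclockwise from the front (positive = left);
    elevation is positive upward.
    """
    az = math.radians(azimuth_degrees)
    el = math.radians(elevation_degrees)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return w, y, z, x


def mix_frames(bed_frame, source_frame, source_gain=1.0):
    """Add an encoded point source to the ambisonic bed, channel by channel."""
    return tuple(b + source_gain * s for b, s in zip(bed_frame, source_frame))
```

A lavalier recording of someone standing to the left of the camera, for example, would be encoded at an azimuth near 90 degrees and added to the bed at whatever gain sounds natural.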
Keep the ‘room’ small
Again, even with a higher quality and more sensitive microphone, there is a limit to the physics of what it can capture. Make sure there is enough interesting sound in the immediate area for the audio recording. As with most audio recordings, the closer the microphone to the source, the better the signal-to-noise ratio. Don’t count on being able to clearly capture far-away sounds with the ambisonic microphone; consider adding these distant sounds as point sources in post.
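As a back-of-the-envelope check on why distance matters, here is a small Python sketch of the inverse-distance rule for a point source. It assumes free-field conditions (no reflections) and a steady background noise level, neither of which holds exactly in a real location, and the function name is made up for illustration.

```python
import math


def level_change_db(near_distance_m, far_distance_m):
    """Change in direct-sound level, in dB, when the mic moves from near to far.

    Assumes a point source in a free field, where sound pressure falls
    off in proportion to 1/distance.
    """
    return -20.0 * math.log10(far_distance_m / near_distance_m)


# Doubling the distance costs roughly 6 dB of direct sound.
print(round(level_change_db(1.0, 2.0), 1))  # -6.0
```

If the ambient noise stays the same, the signal-to-noise ratio drops by the same amount, which is why a sound 16 meters away (about 24 dB down from 1 meter under these assumptions) is better captured up close as a point source.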
Plan your disappearing act
While your audience will realize there will be some equipment visible in a 360-degree video, you can try innovative solutions to minimize these intrusions. For example, we found that turning the microphone upside down made the fuzzy windscreen more subtle and easier to stitch over and hide in the final video:
Mark your zeroes
Be sure to align the camera and microphone orientations. Pay particular attention when inverting the microphone as above; always notate microphone orientation and continue to keep the front of the microphone aligned with the camera zero.
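To see why noting the orientation matters, here is a minimal Python sketch of what an inverted microphone does to the channels. It assumes the microphone was flipped by rolling it 180 degrees about its front-pointing axis (so the front still faces the camera zero) and that the A-Format to B-Format conversion was done as if the mic were upright. Many converters offer an orientation setting that handles this for you; the function name here is hypothetical.

```python
def correct_inverted_mic(w, y, z, x):
    """Correct one first-order ambiX frame (W, Y, Z, X) from an upside-down mic.

    A 180-degree roll about the front axis swaps left with right and
    up with down, while front/back and the omni channel are unchanged.
    """
    return w, -y, -z, x
```

Without the correction, a sound on the left of the scene would play back on the right, which is easy to miss in the field and confusing to diagnose later without notes.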
Clap in two or three directions around the rig to make audio/video syncing easy on the backend. It’s also helpful to announce the orientation and direction of these claps, e.g., “Front!” “Left!” “Right!” “Back!”
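Claps help because a sharp transient is easy to line up, by eye or by software. As an illustration of the software approach, here is a minimal pure-Python sketch that finds the offset between two recordings of the same clap with a brute-force cross-correlation. The function name and the max_lag parameter are hypothetical, and it assumes both recordings share a sample rate. For full-length files, an FFT-based routine such as scipy.signal.correlate would be far faster.

```python
def find_offset(reference, other, max_lag):
    """Return the lag, in samples, at which `other` best matches `reference`.

    A positive result means the shared event (the clap) occurs that many
    samples later in `other` than in `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the "clap" is at sample 3 in one file and sample 5 in the other.
camera = [0.0] * 10
recorder = [0.0] * 10
camera[3] = 1.0
recorder[5] = 1.0
print(find_offset(camera, recorder, max_lag=4))  # 2
```

Dividing the returned lag by the sample rate gives the offset in seconds to apply when aligning the two files.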
Tag your audio
In addition to slating the main rig recording, keeping track of any additional recordings becomes critical when planning supplementary point sources. Keep notes, audibly slate, and/or rename files. Each will be immensely helpful in post-production.
Record for at least three minutes
Extra audio ensures smooth transitions, creates options for editing, and provides an additional ambisonic bed if needed. This guidance applies to both the main ambisonic rig and additional point source recordings.
With these technical details in mind, let’s explore our production from our trip to Puerto Rico, where we captured the story of restoring electrical service to remote locations that had been without power for months after Hurricane Maria. Each production presented unique needs and challenges, and provided lessons about recording in different scenarios.
We only had a few weeks for this production. Less time to prepare and less confidence in knowing what we would need meant we ended up with a large and diverse pack.
Once we got to the island, the environment and our subject matter dictated our equipment choice and setup.
The primary equipment we used in this project:
- Sennheiser Ambeo ambisonic microphone
- Sennheiser MKH418 mid-side shotgun microphone
- DPA 4060 omnidirectional lavalier microphones (several on Zaxcom RF beltpacks, plus a stereo pair)
- Sound Devices MixPre-6 multichannel recorder
- Sound Devices 788T multichannel recorder
- Roland R-05 stereo recorder (for 4060 stereo pair)
- Zoom H2n 2/4 channel recorder
- Zaxcom TRXLA3 wireless beltpack/recorder
The main rig was an evolution of our previous setup. We separated the bridge plates with a longer threaded rod to accommodate a larger microphone and rigged the multichannel recorder to hang either between the tripod legs or strapped to the base of the monopod.
I carried additional recording equipment to capture point sources: a back-up recorder, the Sound Devices 788T, the Zaxcom wireless receivers and lavalier beltpacks and a Sennheiser MKH418 shotgun on a fishpole.
I gave myself further options by recording almost constantly with a pair of omni lavaliers clipped to the brim of either a baseball cap or a hardhat. This may seem like a rudimentary, pseudo-binaural recording method, but to be truly binaural, the setup would’ve required an arrangement that more closely mimics the ears’ positioning and the acoustic shading provided by the human head. Dummy heads tend to be expensive, but the method we used — a spaced omni stereo pair — served our purpose for general ambience recording.
We shot several scenes involving helicopters lifting workers and equipment to remote locations. For these, I captured spatial audio with the Zoom H2n (not full 360; it can capture front and back but not vertical information). I also fitted the workers with lavaliers on beltpacks that could record locally; however, they quickly went out of range of my wireless equipment as the helicopter flew away to a remote location.
The video produced from this trip shows the evolution of our team’s approach to 360 video and the value of additional audio production work. Using a combination of ambisonic and point source recordings allows for versatility in post-production and adds depth and richness for viewers.
If you’re interested in looking at more examples of 360 video, check out this video we produced using sound effects in lieu of field recordings to create an immersive listening experience.
Andy Huether is an audio engineer at NPR.
The following NPR team worked on this project: Nick Michael (NPR Visuals), Rob Byers (MPR/APM), Maia Stern, Morgan Smith, and Keith Jenkins (Visuals), Chris Nelson, Kevin Wait (Audio Engineering), and Bill McQuay of Eco Location Sound.