A beginner’s guide to spatial audio in 360-degree video

A foot of water, contaminated by sewage, blocks passage in Florida, Puerto Rico, over a week after Hurricane Maria. NPR tested a ‘run-and-gun’ 360-video and audio rig there in September. (Nick Michael/NPR)

Immersive recordings, whether in the form of virtual reality or 360-degree video, have an unparalleled ability to transport an audience and give them a sense of presence and dimension in a new place. Since winning a grant through the Journalism 360 Challenge last summer, we’ve experimented with bringing 360-degree video and spatial audio together.

This guide captures some of our early findings and includes a sample rig as well as tips on recording, editing and publishing.

Spatial audio, defined

Broadly, the term “spatial audio” refers to any audio that is not mono (where, in headphones, you hear the exact same thing on both sides).

There’s a range of complexity for spatial audio:

Stereo audio is the most basic spatial audio. It’s recorded in discrete left and right channels. In headphones, you’d be able to easily place sounds along a single axis, from left to right.

Surround sound audio, in most cases, relies on engineers to mix multiple audio channels (e.g. 5.1, 7.1) for playback on numerous speakers that literally surround an audience. You’ve probably heard surround sound in movie theaters, where it’s presented by companies like DTS, THX and Dolby.

Binaural audio delivers a fully 360-degree soundscape through a specially encoded stereo file that has to be experienced through headphones. It models the way sound reflects around the head and within the folds of the ear. In fact, it is often recorded with a microphone that mimics the size and shape of a human head! As demonstrated in this video, you can hear in every direction, but the audio is not responsive to user input: if you move your head, the audio doesn’t change accordingly. The industry refers to this as “head-locked” audio.

Ambisonics or 3D audio delivers a fully 360-degree soundscape that is responsive to a visual field. When you move your head in one direction or another, the audio changes to reflect that movement. This is the type of spatial audio we’re most interested in experimenting with as part of the J360 grant.

This guide provides a slightly deeper dive into these audio types.

Previous NPR experiments with immersive audio and video

NPR audio engineers have been experimenting with custom surround and spatial audio recording rigs for years, even though those immersive sounds in stories would be lost on listeners using household or car radios. One experiment with a technology called “Neural Surround Sound” in the mid-2000s enabled listeners with a surround sound setup to hear the mixes.

NPR audio engineer Josh Rogosin shows off his custom surround sound rig before recording Voodoo rituals in Cove, Benin in 2004. The four multi-tracked omnidirectional mics allowed him to encode for surround sound (not audible on most household radio devices). (Photo courtesy of Josh Rogosin)

In more recent years, NPR’s Visuals and Music teams have done a few experiments with 360-degree video and audio, including:

Some NPR Member stations and affiliates have also experimented with 360-degree audio and video, including: KCRW, Nebraska Educational Television, Classical MPR, The Current, WBUR, Curious City/WBEZ and NHPR.

Now, NPR Visuals and NPR Audio Engineering are interested in combining forces and collaborating on stories that benefit from both 360-degree audio and video. While we don’t believe immersive audio and video will replace their less-spatialized counterparts anytime soon (if ever), we want to have these 360 recording approaches in our toolbox.

Building a 360-degree video rig for the ‘run-and-gun’ producer

There are many different production scales for 360-degree content, from cinematic multi-camera rigs to handheld recorders. As part of the first phase of our J360 grant, we are establishing a baseline for what a solo 360-degree video producer can manage in the field. In the next phase, we’ll explore what it takes for video and audio teams to produce fully in tandem.

We wanted to build a run-and-gun rig that was:

Affordable: It had to cost less than $2,000.

Portable: We needed to combine video and audio components in the same self-contained rig and — as much as possible — hide the audio equipment from camera view. We also needed the rig to mount to a free-standing monopod or easily attach to fixtures in the environment.

Manageable solo: We needed a kit that could record ambisonic audio and be managed by a solo video producer. Technically speaking, that meant the ability to shoot with few video stitch lines (the seams between different camera angles) to speed post-production and the option to capture discrete files from each camera angle so we could finesse those stitch lines ourselves. We also wanted a camera with a companion smartphone app for remote monitoring.

Ultimately we designed a rig that we road tested in Puerto Rico after Hurricane Maria in September. We shot two videos while there: Maria’s Destruction in Puerto Rico and “Dusting Off Old Traditions” in Maria’s Aftermath.

NPR’s “run-and-gun” 360-video and audio rig. (Nick Michael/NPR)

The key components of the rig are:

  • Kodak Orbit360 4k camera
  • Zoom H2N portable recorder (with windscreen and updated firmware)
  • Sirui SUP204SR monopod
  • 2 x SanDisk 64 GB microSD cards
  • 2 x SmallRig Bridge Plate for RRS B2-LR-II Clamp
  • 3 x threaded steel rods: 1/4″-20 thread size, 6-inch length
  • LowePro Adventura 170 carrying case (not pictured)

In the field, the rig proved relatively rugged and unobtrusive: we planted it in 18 inches of standing flood water and fastened it to the back of a pickup truck. People quickly forgot about it while they siphoned water from roadside mountain springs into household containers.

But what about quality? The audio recorder’s preamp has difficulty capturing quieter sounds (which are often further away). There are also places where the audio image sounds like it collapses: As you turn your head, you can hear dips and transitions between audio poles. The fix here would be more and better mics, which would mean more backend mixing, which would mean longer post-production.

The video’s quality is loosely equivalent to the audio’s. The video resolution is 4k, but when stretched across 360 degrees, that 4k resolution is lower quality than most “flat” 1080p videos. Upping the resolution to 8k would increase the cost of the rig and, in almost every case, require more camera angles. That means backend stitching, which again means longer post-production. So while the 4k resolution is not ideal, it’s forgivable considering we published the videos on social platforms like YouTube.
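A back-of-the-envelope calculation illustrates the gap. The sketch below assumes a 4k equirectangular frame 3,840 pixels wide, an 8k frame 7,680 pixels wide and a viewer that shows roughly 90 degrees of the scene at a time; those figures are illustrative assumptions, not measurements from this rig.

```python
# Rough estimate of how many horizontal pixels a viewer sees at once.
# Assumed values (not measured from the rig): frame widths of 3840 px (4k)
# and 7680 px (8k), and a viewing window about 90 degrees wide.

def visible_pixels(frame_width_px, field_of_view_deg):
    """Horizontal pixels of a 360-degree frame that fall inside the view."""
    return frame_width_px * field_of_view_deg / 360.0

FLAT_1080P_WIDTH = 1920  # a flat 1080p video shows all 1920 pixels at once

seen_4k = visible_pixels(3840, 90)
seen_8k = visible_pixels(7680, 90)

print(f"4k 360-degree video: about {seen_4k:.0f} px across the view")
print(f"8k 360-degree video: about {seen_8k:.0f} px across the view")
print(f"Flat 1080p video:    {FLAT_1080P_WIDTH} px across the view")
```

Under those assumptions, a 4k 360-degree video puts about half as many pixels across the visible frame as a flat 1080p video does, and it would take roughly 8k to match it.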

So yes, the rig is easy for a solo reporter to use, and it provides 360-degree audio and video. But the quality of both the video and audio is many steps below “immersive.”

Tips for recording 360-degree video and audio

Spend time on the rig placement

Placing the rig requires a producer to balance a scene’s visual and audio interests. You want to choose a recording location that’s worth both seeing and hearing in 360 degrees. And since you’re going to plant the camera and walk away, make sure it’s in a safe spot and on stable ground. In natural settings, you can cover and bolster the rig’s feet with stones or earth.

Keep the ‘room’ small

With just two lenses on the camera, you’re going to get some fisheye effect; elements in the distance will quickly fade into a pixel-y mess (especially when shooting in only 4k). We’ve found it useful to think of the main action of any scene happening inside an area the size of a spacious living room. Anything happening beyond that distance will most likely read as visual ambience, not discernible action.

Plan your disappearing act

When deciding where to place the camera, consider how you’ll disappear from camera view. You have two basic options: blend in with a crowd (this approach has the added benefit of keeping an eye on your gear) or find a hiding spot (which could include trees, hills, fences, buildings, etc).

Anticipate action and audio levels

You have to anticipate both likely action and audio levels for your “room.” After you plant the rig and hide, you can monitor the video on your smartphone — but obviously, you can’t move the camera in response to the action in a scene without resetting the shot.

As with flat video, there’s some luck involved, but thoughtfully anticipating the action dramatically improves your odds of capturing a good scene.

Beware the seams

Consider where you’re placing the “seam,” or the boundary between two angles. In post-production, your seams will become stitch lines. On tighter editing timelines, it’s difficult to perfectly stitch elements on this seam, especially elements that move across the seam. Plan the orientation of your camera accordingly.

Also, if you have elements in the extreme foreground, you’re likely going to have to choose between perfectly stitching the foreground and perfectly stitching the background.

Mark your zeroes

Because you’re recording video and audio separately, you’re going to have to double-check alignment of the visual and audio fields on the backend. The idea is for the “zeroes” (the 0-degree point of your 360-degree field) of both video and audio to align. If they don’t, try to make the difference some multiple of 90 degrees to keep manual alignment simpler on the backend (you can figure out field rotation once and copy/paste effects).

Do not arbitrarily change the direction of either video or audio during a shoot without documenting it. Undocumented changes force unnecessary, time-consuming work on the backend, aligning audio and video fields by ear.
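To see why a 90-degree offset is easier to correct than an arbitrary one, here is a minimal sketch of rotating a first-order ambisonic signal around the vertical axis. It assumes four B-format components named W, X, Y and Z, with X pointing front, Y pointing left and Z pointing up; channel order and naming vary between recorders and formats, so treat those as assumptions. In practice your editing software's rotation tool does this work.

```python
import math

def rotate_yaw(w, x, y, z, degrees):
    """Rotate one first-order sample counterclockwise, seen from above.

    Assumed convention (hypothetical, check your own format):
    X = front, Y = left, Z = up, W = omnidirectional.
    """
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    new_x = x * cos_t - y * sin_t
    new_y = x * sin_t + y * cos_t
    # W and Z are unaffected by a rotation around the vertical axis.
    return w, new_x, new_y, z

# A sound directly in front of the rig (all energy on X, none on Y).
w, x, y, z = rotate_yaw(1.0, 1.0, 0.0, 0.0, 90)
print(f"after 90 degrees: X={x:.3f}, Y={y:.3f}")  # the sound now sits to the left
```

At exactly 90 degrees the cosine term vanishes, so the new X is simply the old Y with its sign flipped and the new Y is the old X. A multiple of 90 degrees therefore amounts to swapping and inverting channels, while any other angle blends them.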

Clap it!

Clap in two or three directions around the rig to make audio/video syncing easy on the backend.
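A clap helps because it is a short, sharp spike that is easy to find in both recordings. Editing software normally handles the sync, but the idea can be sketched in a few lines. The example assumes the camera also records its own scratch audio track and uses tiny made-up signals; the function name and the numbers are hypothetical.

```python
def best_offset(reference, other, max_lag):
    """Return the lag, in samples, at which `other` best lines up with `reference`.

    A positive lag means the clap occurs later in `other` than in `reference`.
    This is a brute-force cross-correlation, fine for short illustrative signals.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Made-up signals: silence with one clap-like spike in each recording.
clap = [1.0, -0.8, 0.5]
camera_audio = [0.0] * 100 + clap + [0.0] * 97     # clap at sample 100
recorder_audio = [0.0] * 130 + clap + [0.0] * 67   # clap at sample 130

offset = best_offset(camera_audio, recorder_audio, max_lag=50)
print(f"recorder audio lags the camera by {offset} samples")
```

Dividing the offset by the sample rate converts it to seconds. Clapping from more than one direction also gives you a quick way to confirm that the audio field is oriented the same way as the picture.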

Tag your audio

If you’re worried that you might lose track of which video and audio clips belong together, speak a scene description tag into the top or bottom of your recording (e.g. “This is ambi from the bridge”).

Record for at least three minutes

Five minutes is great. Ten minutes is even better. In the final product, you’ll want to allow for at least 15-20 seconds in each scene to give the audience time to look around. Recording at least three minutes gives you time to run, hide, record, return and have plenty of flexibility in the edit.

The evolving 360-degree video workflow

One of the challenges of making 360-degree video is that new software and software updates are continuously changing workflows. For particulars, it’s better to rely on the most recent rundown (like this one from Adobe).

However, some general workflow stages and principles seem to remain constant.

  1. Ingest. Move files from cards to hard drives. Organize your media so you can easily locate multiple video angles and audio files from the same scene. Renaming files with custom prefixes is one way to keep file groups together.
  2. Stitch. Stitch your angles. Depending on the timeline and complexity of your shoot, you might only do a rough stitch for review here. Then, after you’ve picture-locked, you can carefully re-stitch only your final shots.
  3. Set project and sequence settings. Adobe has more specifics here.
  4. Sync audio and video. Lay out a sequence in which to sync all your audio and video clips. Draw your edit selections from here.
  5. Edit. Shots need to sit on screen long enough for people to look around. A good minimum seems to be 15-20 seconds.
  6. Graphics and titling. Design your graphics presentation. We’ve placed titles in every quadrant, but your project may require something different. Prevent graphics bubbling with a plugin like SkyBox (included in Adobe Premiere Pro CC 2018).
  7. Final stitch + audio mix + color correction. Finesse it all!
  8. Export. Consult Adobe for the latest specifics.
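The renaming mentioned in the ingest step can be scripted. This is a minimal sketch, assuming each scene's video and audio files have already been copied into their own folder; the function name, folder path and prefix are hypothetical.

```python
from pathlib import Path

def add_scene_prefix(folder, prefix):
    """Rename every file in `folder` to '<prefix>_<original name>'.

    Files that already carry the prefix are skipped, so the function is
    safe to run twice. Returns the list of new file names.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and not path.name.startswith(prefix + "_"):
            target = path.with_name(f"{prefix}_{path.name}")
            path.rename(target)
            renamed.append(target.name)
    return renamed

# Example call (hypothetical folder and prefix):
# add_scene_prefix("media/bridge_scene", "bridge")
```

With a shared prefix, the camera angles and the recorder's audio files from one scene sort together in any file browser or bin.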

When it comes to editing, there are varying philosophies and techniques for covering up production elements, particularly from the bottom of the scene. For these experiments, we’ve left things as they are in the interest of transparency and speeding up post-production.

One of the more amusing results of this approach is this shot of the “original grip,” a video producer’s hand, holding the rig on the roof of a car (pictured below).

It can be hard to remove yourself from the scene in a 360-degree video. In this YouTube screenshot, you can see the NPR producer’s hand holding the camera rig to the top of the car. (Nick Michael/NPR)

Publishing 360-degree video

There are plenty of publication platforms for 360-degree content. At this point, we imagine most of the audience will encounter our 360-degree videos on Facebook and YouTube via smartphones and computers. In future experiments and bigger stories, we’ll probably produce with other platforms in mind, like headsets and specialized 360-degree web platforms.

When you’re sharing the videos, remember that you’ll need to guide users through the experience. Remind them which browsers work and ask them to wear headphones.

One important note: so far we’ve published all of our 360-degree experiments on Facebook without spatial audio because we didn’t have enough time to troubleshoot the platform’s Spatial Audio Workstation. It also seemed like less of a loss considering how few users watched these Facebook videos with the sound on (fewer than 30 percent on each video). We’re interested in seeing whether that number would increase with explicit requests for audiences to wear headphones. Facebook supplies these prompts for pieces with spatial audio, but we could also include the request in our video descriptions.

What’s next

Next, we want to explore higher-fidelity 360-degree audio and video approaches. We expect to send a video producer and audio producer into the field to work in tandem, and we expect to produce a custom mix with much more attention to audio quality. This will require more planning and longer post-production timelines with the goal of producing a signature 360-degree audio and video package.

The NPR team working on this project also includes Rob Byers (Training), Maia Stern, Morgan Smith, and Keith Jenkins (Visuals), Chris Nelson, Kevin Wait, and Andy Huether (Audio Engineering) and Bill McQuay of Eco Location Sound.

Nick Michael is the news and projects video editor for NPR Visuals.