This guide is for audio producers and journalists who work with news or documentary-style storytelling. It will help you make judgment calls about whether a piece of audio is usable.
There are many ways audio can go wrong: a press conference recording with a buzz, hard-to-understand phone tape, lots of “p-pops.” The list goes on. Sometimes those technical problems raise the question of whether bad tape should be used as is or fixed (for example, with audio tools such as equalization, production techniques that hide the problem or level adjustments in a mix).
Three fundamental characteristics of good audio can help guide your decision-making when it comes to problematic tape: it is not distracting, it is intelligible and it sounds natural. When you find yourself stuck trying to decide what to do, run your audio through the DIN* test (Distracting, Intelligible, Natural), an informal diagnostic I developed with input from a number of public radio engineers and producers.
*“DIN” in this context has nothing to do with the German Institute for Standardization. It also sounds better than NID.
Let’s examine these three concepts:
We want to present our listener with audio that tells a story clearly, without yanking her attention toward some random anomaly. Perhaps most obviously, this idea applies to technical problems — we want to avoid hums, hisses, distortion, pops and other technical issues.
But it also applies to the sounds in the scene itself, as we don’t want sirens passing by in the middle of an outdoor interview, lawn mowers outside a window during a tape sync or loud music during an interview in a coffee shop.
Use the DIN test in the field
Ask yourself if sound recorded in this office/state capitol building/city park will be intelligible. Are there any distracting background noises?
Audio also needs to be understandable. If it’s a voice our listener hears, she needs to be able to understand it without straining. If it’s active or scene-setting sound, she needs to be able to make out what’s going on and interpret the various sounds in the scene.
Our audience listens in all sorts of environments — doing the dishes, driving in the car, on earbuds in the subway, etc. The audio we provide needs to be intelligible in those less-than-ideal scenarios. If it is hard to understand on a set of nice headphones or speakers, it probably won’t hold up in imperfect circumstances.
For more on intelligibility, see this related post: Audio truth killers: an approach to collecting better sound.
Everything in our audio needs to sound realistic. This characteristic is helpful when determining the success of an edit — or whether technical problems (like ticks and hums) need attention. If it couldn’t happen in the recorded scene, it shouldn’t happen in your audio.
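For readers who like to see a checklist written down, the three questions above can be sketched as a few lines of Python. This is only an illustration of the logic, not a tool that can judge audio: the answers still come from a person listening. The names `DinAssessment`, `din_failures` and `passes_din` are hypothetical and were made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class DinAssessment:
    """Hypothetical record of a listener's answers to the three DIN questions."""
    distracting: bool   # does anything pull attention away from the story?
    intelligible: bool  # can the listener follow it without straining?
    natural: bool       # could everything heard have happened in the scene?


def din_failures(assessment: DinAssessment) -> list[str]:
    """Return the criteria the tape fails, in D-I-N order."""
    failures = []
    if assessment.distracting:
        failures.append("distracting")
    if not assessment.intelligible:
        failures.append("not intelligible")
    if not assessment.natural:
        failures.append("not natural")
    return failures


def passes_din(assessment: DinAssessment) -> bool:
    """Tape passes only if it fails none of the three criteria."""
    return not din_failures(assessment)
```

Tape that a listener judged understandable but marred by an out-of-place noise would be recorded as `DinAssessment(distracting=True, intelligible=True, natural=False)`, and `din_failures` would name the two criteria that need attention before the tape is used.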
A note about consistency
Consistency plays a big role in all three of these metrics, though it’s not a characteristic itself. For instance, if a background sound suddenly changes in level, or if a hum stops and starts throughout a clip, both would distract and sound unnatural.
In the same vein, if the quality of a VoIP or phone connection constantly changes, the shifts in the sound of the guest’s voice would distract and sound unnatural.
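If you want a rough, mechanical way to spot where a level changes abruptly before you listen closely, the idea can be sketched as below. This is a minimal sketch under stated assumptions: the audio is already loaded as a list of floating-point samples between -1.0 and 1.0, the function names are hypothetical, and the 6 dB default threshold is an arbitrary starting point rather than a standard. It only flags places worth checking by ear; it cannot tell you whether a change is distracting.

```python
import math


def window_levels_db(samples, window_size):
    """RMS level of each full window, in dB relative to full scale (1.0)."""
    levels = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms = math.sqrt(sum(s * s for s in window) / window_size)
        # Floor the value so silent windows do not produce log10(0).
        levels.append(20 * math.log10(max(rms, 1e-9)))
    return levels


def sudden_level_changes(samples, window_size, threshold_db=6.0):
    """Indices of windows whose level differs from the previous
    window by more than threshold_db (an assumed, adjustable value)."""
    levels = window_levels_db(samples, window_size)
    return [
        i for i in range(1, len(levels))
        if abs(levels[i] - levels[i - 1]) > threshold_db
    ]
```

Shorter windows catch briefer jumps but also react to normal variation in speech, so the window size and threshold would need tuning against tape you have already judged by ear.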
DIN tests on real-world audio examples
Example 1: FAIL
Here we have phone tape that fails for the simple reason that it is unintelligible. It’s hard to make out a single word.
It’s also distracting. Because you are straining to understand the audio, you’d likely get sucked into the act of deciphering and would miss parts of whatever came next.
There is no way to improve the intelligibility of this audio, so it should not be used.
Example 2: FAIL
This tape is intelligible but it still fails the DIN test because the buzz on the tape is so loud it’s distracting.
The listener would likely need a moment to work out what the new noise is and, in doing so, might miss the first couple of words of the clip. Additionally, the buzz is not natural: it’s some sort of electrical issue that has no place in the scene or context of the tape. If the buzz were removed, this clip could be usable.
Example 3: PASS
In this clip, the speaker is intelligible and there is nothing to distract from his voice.
The background sound is a little noisy, yes, but it’s part of the scene (in this case it happens to be a tattoo parlor), and is not distracting. So this example passes the DIN test!
Example 4: IT DEPENDS
This clip is similar to the previous one, except now we can hear low-level music in the background.
As it stands, the clip is fine: the music isn’t so loud that it’s distracting. But if the clip needed internal edits, the rhythm of the music would make them difficult to hide. Even a few small edits would break the beat, which would be distracting and make the clip sound unnatural.
Example 5: FAIL
This example comes from a piece about an outdoor fish market.
What is the knocking, banging noise we are hearing? Given the context, we know it’s probably crabs … but is it the sound of crabs being poured out of a bucket? Crabs on a table? Is the man who says “My hands are chew toys …” holding them?
As presented, the tape fails the DIN test because it’s not intelligible. It’s not easy for us to understand what we’re hearing, and the only context provided is that the sound involves crabs.
However, with a little more detail in the script, this tape could be usable. For example: “… and like any prima donna, they can be tough to handle. Stan Kaiser tries to wrangle the crabs one-by-one out of a bucket.”
‘But I don’t want it to sound natural!’
There is certainly audio in this world that is explicitly meant to be fantastical and not realistic. In some productions, sounds are created, morphed, changed and “effected” in order to artificially create a listening experience. If your work squarely fits in a more produced genre, that’s OK — but for audio reportage and documentary work, your audio should sound natural.
Thanks to Flawn Williams, Craig Thorson (MPR), Chris Nelson (NPR), Kevin Wait (NPR), Michael Raphael (WNYC) and Andy Huether (NPR) for their assistance with this guide.