The producer's handbook to mixing audio stories

Great mixing is the first step toward creating an immersive listening experience for your audience. (Deborah Lee/NPR)

Whether you are mixing radio news stories or podcasts, this guide provides tips to help you improve the quality of your mixes. You will find a list of essential mixing tools, tips to make adjusting levels less tedious and guidance on how to use equalization and compression. You’ll also find suggestions for solving common audio problems, like bad phone tape.

We recommend that you first check out the Ear training guide for audio producers — it will give you good insight into many of the problems you’ll encounter in the mix process.

You can scroll through the entire guide, or jump to a specific section below:

What is mixing?
Definitions
The tools
Adjusting levels
Plug-ins
Equalization
Compression
Fixing common mixing problems
Assess your mix

For a detailed look at an efficient and organized approach to mixing, check out the companion piece to this post: How to mix: 8 steps to master the art of mixing audio stories.

What is mixing?

Mixing is the process of creating balance, consistency and clarity with differing audio sources. It happens only after the elements of the story have been edited and arranged.

In mixing, voices become clearer, transitions are smoothed, loudness is made consistent and ambience or music beds are balanced so they do not compete with dialogue. Technical issues, like plosives or tinny audio, are also addressed in mixing.

Most audio stories and podcasts are made from clips of audio that originate from different sources. A news story might consist of recordings of a reporter speaking, a few actualities, some scene-setting sound, and multiple recordings of ambience. Those things are all edited, arranged, and layered in a way that tells a story — but it won’t sound like a cohesive whole until it is mixed. In fact, most anything with audio is mixed in some way, whether it’s a blockbuster movie or a hit pop song. 

I’m thinking from the listener’s point of view… I’m trying to make the production process disappear. I want the listener to focus 100% on the story. I’m trying to eliminate any distraction that pulls the audience out of the zone. That’s what guides me. 

Johnny Vince Evans,
Technical Director, American Public Media

Definitions

These terms are used throughout this post and the step-by-step guide. Find more common audio storytelling terms in our glossary of production terms.

Audio editor: Software that allows you to edit, arrange, and mix audio. Another name for an audio editor is digital audio workstation, or DAW.

Plug-in: Software that performs a specific audio function, like a meter, equalizer, or compressor. Audio editors typically offer a suite of stock plug-ins but you can also use third-party plug-ins. Depending on the audio editor, you can use plug-ins on all audio on a track, on single audio clips, or both.

Loudness: Describes how intense the audio sounds to your ears. Loudness is a reference to perception.

Level: Describes the intensity of an audio signal. The term should be accompanied by a descriptor, such as listening level. When talking about mixing in audio editors, "levels" is used informally to refer both to the overall level of the mix and to the balance of each track in relation to the others.

Volume: Used to describe the listening level of a speaker or headphone.


The tools

(Deborah Lee, NPR)

The tools you need to mix audio storytelling are headphones, audio editing software, and a loudness meter. Equalization and compression plug-ins can be helpful, and a noise reduction plug-in can help polish your work.

In this section:

Headphones
Audio editing software
Loudness meter
EQ plug-in
Compression plug-in
Noise reduction

Headphones
A quality pair of headphones is essential for mixing. You want a pair that fits comfortably around your ears to block out the competing noises in your working environment (but stay away from noise cancelling headphones). Most importantly, they need to be high quality to allow you to hear problems — wind noise, plosives, bad edits, tonal issues, etc. The earbuds that came with your phone probably won’t cut it, nor will consumer headphones popular for enjoying music.

Speakers are very useful, but they are finicky tools. A pair of speakers will sound different in every room. Some rooms will make the speakers sound bassy, some will make them sound bright (too much high frequency). That’s why you see all sorts of acoustic treatment in recording studios — to cut down on inconsistent frequencies and reflections.

Laptop speakers are useful for assessing a mix (since people will listen to your work with them). However, they should not be used as your primary monitoring device, since they cannot come close to reproducing the full range of frequencies.

Regardless of which you use, pick a working environment that is free of noise, office chatter, and other distractions.

Audio editing software
The work of mixing for audio storytelling is achieved with the tools that come with most any audio editor: Level controls, equalization, compressors, and meters. There are a few things to consider when picking an audio editor, and most have nothing to do with audio!

Audio production is often a collaborative experience. It’s common for a story to go through multiple people on a production team — and they will all use the same audio editor. Unfortunately, it’s relatively difficult to move projects between different audio editors. For that reason, pick the audio editor that your frequent collaborators use.

“OMF” and “AAF” Projects
You may have heard the terms “OMF” or “AAF” discussed when talking about moving projects between audio editors. These are tools to translate a session from one application to another. OMF and AAF transfer a stripped-down version of the project, with clips and levels intact, but won’t transfer advanced features like plug-ins. This option can work but can also be problematic — especially in large sessions. Your best bet is to keep projects in one format.

Loudness meter
Meters show a visual representation of audio level. They are necessary to avoid distortion and can help you mix in a consistent manner.

Peak meters and loudness meters are the most common. Peak meters display the electrical level of audio, and are helpful to ensure that your levels don’t distort. They can also be useful to see if audio is panned to one side or the other of the stereo field. If you use a portable recorder, it likely has a peak meter to help you capture clean audio. However, peak meters don’t do a good job of representing how loud the audio actually sounds. That’s where loudness meters come in.

Loudness meters measure audio in a way that is similar to the way we hear. They know our ears are more sensitive to high frequencies and less sensitive to low frequencies. They also know we are more sensitive to sounds that have a long duration versus a short one. They tend to be easier to use than peak meters and will help you achieve more consistent results.

If you can, choose an audio editor that includes a loudness meter, or find a third-party loudness meter plug-in. For an in-depth look at loudness meters in production, including step-by-step guides for using the meters in various audio editors, check out The audio producer’s guide to loudness at Transom.
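If you like to tinker, you can also measure loudness outside of your audio editor. Here’s a minimal Python sketch using the open-source pyloudnorm library (the file name is a placeholder):

```python
# Measure the integrated loudness of a file, following the ITU-R BS.1770
# method that loudness meters use. Requires: pip install soundfile pyloudnorm
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("my_mix.wav")  # placeholder file name
meter = pyln.Meter(rate)            # creates a BS.1770 loudness meter
print(f"Integrated loudness: {meter.integrated_loudness(data):.1f} LUFS")
```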

EQ plug-in
Equalization (EQ) plug-ins are used to adjust tone — the balance of frequencies. For example: bright, tinny sounds have too much high frequency, and using an EQ plug-in to reduce the loudness of high frequencies can help make the audio sound more natural and balanced.

Audio editors typically come with an EQ appropriate for audio storytelling production. A parametric EQ plug-in will be most useful to your work, as it allows for natural-sounding changes and includes helpful functions like filters and shelves. Look for an EQ plug-in that offers a few bands of adjustment along with high- and low-pass filters.

To learn how to use an EQ plug-in to fix problems, jump to the section, “Equalization can bring clarity.”

Compression plug-in
Compressors allow for control of dynamics — the range of soft and loud sounds. When set appropriately, a compressor can smooth out loudness variations in a voice and can add strength and power.

Like EQ, most audio editors come with a compressor of some kind. Look for one that offers an adjustable threshold, ratio and make-up gain. Simplified “one knob” compressors exist but generally won’t result in the most natural sound when used on spoken-word material.

To learn how to use a compression plug-in, jump to the section, “Compression can bring control.”

Optional: Noise reduction
This guide won’t cover noise reduction, but it’s a good tool to be familiar with. Noise-reduction software can reduce hisses, hums, and background sounds present in dialogue recordings. But beware: it’s easy to push the reduction to a point where it sounds processed and results in a garbled or “watery” sound.

Most noise-reduction software requires a short sample of the noise (in the clear, without conversation) in order to learn the noise that you want to remove. However, some products are getting so advanced that they no longer require a noise sample. Through machine learning, the software knows the difference between a human voice and noise and is able to separate the two.


Adjusting levels

(Deborah Lee, NPR)

When the levels of a mix are well-balanced, it will be easy to listen to, and no one will feel the need to adjust the listening volume mid-story. Mastering your audio editor’s level controls is essential to creating a great-sounding mix.

Every audio editor comes with a way to adjust levels. You can adjust faders in the audio editor’s mixer, but the most common way is to adjust a line (representing the fader) drawn over the waveform. You create points (sometimes called handles, breakpoints, keyframes, or automation points) on the line and move them to change the level over time. This is often called level automation.

Level automation makes adjusting levels easy — but creating a smooth, consistent mix is challenging. Here are some tips for improving this essential mixing step.

In this section:

Listening to your mix
Adjust sentences and phrases
Track-based vs. clip-based levels
Master tracks
Mask edits with ambience
Fades
Normalization

Listen to your ears, not your eyes
The ultimate decider is how your mix sounds, no matter how the level automation looks or what the meter says. If your ears tell you something’s not quite right, it’s not right. In the words of audio engineer Flawn Williams: “The eyes can help … but the ears should still reign!”

To focus your listening experience, take the visual information away. Audio editing software typically provides an avalanche of distracting visual information, so close your eyes (or turn your display off) and just … listen.

Adjust sentences and phrases, not words
For most of us, mixing with level automation can feel tedious (to say the least). In most audio editors it involves making changes by adjusting lines superimposed on a waveform. That can result in making many minute changes for long periods of time.

To make the process easier, adjust levels in sentences and phrases. You’ll find that this approach results in a more natural sound. There are certainly times when adjusting a syllable or a word is appropriate (laughter or a yelled word, for instance). But if you find yourself constantly making level adjustments to syllables and words, you might be making the process more difficult than you need to.

Be mindful of the zoom!
It’s easy to start making tiny changes to words and syllables if the view is zoomed in too far. Keep yourself zoomed out so you see sentences and phrases, not words and syllables.

Create a more natural sound by adjusting sentences and phrases.

Know the difference between track-based levels and clip-based levels
Some audio editors give you two places to create level automation: on the track or in individual audio clips. Track-based levels allow you to adjust levels across an entire track, while clip-based levels allow you to adjust levels to individual clips only.

An example of track automation.

Track automation allows level changes anywhere on a track.

The common interface for clip automation.

Clip automation, however, allows level changes only on the clips themselves.

But there’s a bigger difference between the two options, and that’s when the level adjustment occurs. In most audio editors, track-based level automation happens after any adjustments made with plug-ins like EQ or compression. Depending on the audio editor, clip-based level automation happens before the plug-ins, and using it may change the sound of the processed audio.

…many spoken-word voices have a tendency to start each new sentence very loudly and then taper off. So I actually do a lot of clip gain adjustment of the first word or two of a sentence relative to the rest of it.

Flawn Williams
Former Audio Engineer and Producer, NPR

Clip gain may also cause distortion when used to boost level. Since clip automation usually comes before track automation, it’s possible to raise the level so high on the clip that it distorts, then compensate for the boost by bringing down the track automation. Now the meter will look fine — but you’ll hear distortion!

Both are useful tools, but be sure you understand how they work inside of your audio editor. If you are in doubt, stick with track-based level controls.
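If numbers help, here’s a quick sketch of the trap described above, assuming an editor that clips the signal at full scale between the clip stage and the track stage:

```python
# Boosting clip gain and "fixing it" with the track fader doesn't undo clipping:
# the audio distorts at the clip stage, before the track fader ever sees it.
import numpy as np

signal = np.array([0.5, 0.9, 0.7])       # healthy peaks, below full scale (1.0)
clip_gain = 10 ** (6 / 20)               # +6 dB of clip gain
track_gain = 10 ** (-6 / 20)             # -6 dB on the track fader

boosted = np.clip(signal * clip_gain, -1.0, 1.0)  # 0.9 and 0.7 hit the ceiling
print(boosted * track_gain)  # ~[0.5, 0.5, 0.5]: the meter looks fine again,
                             # but the peaks are flattened and stay distorted
```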

Use a master track!
In order to create a mix that is consistent and maintains proper levels, you need to keep an eye on the cumulative level of the mix. To do that, use a master track. All of the tracks in the session (the mix) will be routed to the master track. To monitor levels you can use the meter built in to the track — or you can insert a loudness meter, as mentioned above.

The master track is crucial. Too often we find folks NOT watching their cumulative levels. That’s one of the Top 5 things I show new (and rusty) producers.

Corey Schreppel
Technical Director, American Public Media

How various audio tracks are routed into a master.

All of the tracks in the session route to the master track, allowing you to monitor the cumulative levels.

In some audio editors, you have to add the master track manually, but in others, it’s automatically placed in the session.

Mask edits with ambience
Every environment you record in has its own ambience (or room tone). We record ambience to take our listener to a particular place, often using active sound. But it is especially helpful for creating a smooth mix. You can use ambience to mask the entrances and exits of audio clips.

Here’s an example of an actuality that was cut very tightly against the person’s words at the top of the clip. Notice that the entrance of the clip is rather abrupt:

Now, here’s that same edit, using ambience to mask the transition:

On the left, an edit without masking; on the right, using ambience to mask an edit.

Note that the ambience is raised in level just before the transition. In this particular case, that sounded best — but a small bump in level won’t always be necessary. Use your ear and decide for yourself what sounds the most natural.

Also, be intentional about the placement of ambience. Ambience has the power to take the listener to a place — it can also take them out of that place:

You’re taking the listener somewhere. Once you take them there, keep them there! Don’t drop into and out of that reality too abruptly or too frequently. Ambience … can help with maintaining that sense of location, or easing in or out of it more gracefully.

Flawn Williams
Former Audio Engineer and Producer, NPR

Fades
Fades are a nuanced part of mixing that can make or break the immersive listening experience. Like good mixes, good fades don’t call attention to themselves. We’ve already talked about using fades to mask entrances to actualities, but fades can be used for transitions, to create a music bed, or to post scene sound.

Consider these points with every fade:

  • Content: Changes in level are less noticeable with sparse content (the ambience of an office or acoustic music) than dense content (the ambience of a factory or rock music). Sparse, light content can usually fade more quickly and subtly than dense, heavy content.
  • Length: Longer fade-ins and fade-outs are usually less noticeable. Long fades are especially useful when establishing ambience or a music bed. Short fades are useful at the top and tail of actualities, but be careful: without masking ambience they can be distracting. Quick, sudden fades will call attention to themselves, but can be a useful device for transitions — when used sparingly.
  • Into a post: If the fade is going into a post (where a bed of content will be faded up into the clear after a clip of dialogue), you may find that trying to do the fade all at once is jarring. Instead, cut down on the amount of level the fade has to cover. Begin the fade a few seconds before the end of the clip by slowly raising the level 2-3 dB. As the clip finishes, on the last word, quickly fade the content up into the clear, stopping 2 dB below where you want it to sit. Use the next couple of seconds to raise the level the remaining 2 dB. This method uses more time to make the fade happen in a way that does not call attention to itself.
  • Out of a post: The same idea applies to coming out of a post to a bed underneath a dialogue clip. Bring the level of the content down 1-2 dB during the last few seconds of the post. Quickly fade down underneath the first word of the clip — but don’t go all the way to the bed level. Instead, take the next few seconds to fade down the remaining 3-4 dB.
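Under the hood, a fade is just a gain ramp multiplied against the samples. As a rough illustration, here’s a Python sketch of a linear fade-in and fade-out (the file name and fade lengths are placeholders):

```python
# Apply a linear fade-in and a longer fade-out to a mono clip.
import numpy as np
import soundfile as sf

data, rate = sf.read("ambience.wav")   # placeholder; assumes a mono file
fade_in = int(1.0 * rate)              # short fade up: 1 second
fade_out = int(3.0 * rate)             # longer, gentler fade down: 3 seconds

data[:fade_in] *= np.linspace(0.0, 1.0, fade_in)     # ramp up from silence
data[-fade_out:] *= np.linspace(1.0, 0.0, fade_out)  # ramp back to silence
sf.write("ambience_faded.wav", data, rate)
```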

More about fades
Our friends at Transom.org have more to say about fades here: Stupid fade tricks.

Here is an example of poor music fades. Notice that the fade under the voice is very early and the fade out is late. You notice the fade; it’s telegraphed.

These fades are better; they’re tucked under the voice and are less noticeable:

On the left, a fade that’s too early and too late; on the right, the fade is nicely tucked.

As you evaluate a fade or transition, remember: listen with your ears, not your eyes.

Normalization
Normalization is a function that is often talked about like it is a “magic button” for mixing. But it’s not a one-size-fits-all solution, and it comes with many traps! It’s important to understand how normalization works before implementing it, because it may not be the right tool for the job.

There are two kinds of normalization: loudness and peak normalization.

Loudness normalization is based on a perceptual measurement — how the audio actually sounds. It measures the average loudness for the file and then adjusts the level of the file to achieve the target loudness. It doesn’t react to individual peaks, but most loudness normalization tools will still keep peaks from distorting.

Here we see two voices on different tracks. The green clip is much lower in level than the red clips:

Here, two tracks have vastly different levels.

Listen to what this sounds like:

The green clip is loudness normalized to match the red clip. Note that the entire clip has increased in level.

Here, both levels have been normalized.

Loudness normalization results in a consistent and even sound. It’s not perfect, but it’s close:

Loudness normalization is a great way to get multiple clips to the same loudness quickly. However, keep in mind that loudness normalization is not aware of context. If you use loudness normalization on a clip of dialogue and a clip of ambience, they will both be the same loudness. The ambience would be much louder than normal and it would sound rather odd.

Some audio editing platforms include loudness normalization tools to help you achieve similar levels when importing audio. As long as you are very careful about context, you may find it saves you time.

You can read more about loudness normalization in Transom’s The audio producer’s guide to loudness.

These days, peak normalization is not the most useful of tools for production. Peak normalization reacts solely to the highest peak in the audio file. It moves this peak up or down in level to reach the chosen target and then changes the rest of the audio the same amount. It reacts only to the waveform — the electrical level — and has nothing to do with the way the audio sounds.

Since peak normalization is based on the highest peak in the file, processing two audio clips from the same interview can have very different results. For example, if one clip includes a high peak from laughter and the other doesn’t, applying peak normalization to both will leave the clip with the laughter at a lower overall level.

Because this clip contains a high peak in the waveform, peak normalization won’t help make the two clips sound even. Here, the green clip is peak normalized — but the high peak means its level can’t be raised any higher.

A normalized clip with a high peak.

It sounds almost like the original — not very helpful:
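To make the difference concrete, here’s a short sketch of both kinds of normalization using the pyloudnorm library (the file name and targets are examples, not recommendations):

```python
# Loudness normalization vs. peak normalization.
# Requires: pip install soundfile pyloudnorm
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("quiet_clip.wav")  # placeholder file name

# Loudness: measure how the clip sounds, then move it to a LUFS target.
meter = pyln.Meter(rate)
measured = meter.integrated_loudness(data)
by_loudness = pyln.normalize.loudness(data, measured, -24.0)

# Peak: only the single highest sample matters. One burst of laughter
# in the clip and the rest of the audio barely gets raised at all.
by_peak = pyln.normalize.peak(data, -1.0)  # highest peak moved to -1 dBFS
```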


Plug-ins

The next two sections deal with equalization and compression. In your audio editor, you’ll likely access these functions via a plug-in. Here are some things to note when using plug-ins:

Learn how to “insert” plug-ins
To use a plug-in in your audio editor, you will need to learn how to “insert” a plug-in on the track you want to adjust. That word “insert” is key — it describes where the plug-in goes in the signal flow as audio moves through your audio editor and means that any audio you place on that track will be affected by the plug-in. Refer to the user manual for your particular audio editor.

Collaboration and plug-ins
If you plan to collaborate and share a project across multiple computers, everyone on the project needs to have the same plug-ins. Plug-ins don’t transfer with the project. If you use an EQ plug-in and your partner doesn’t have the same one, they won’t be able to hear the changes you made. If you stick to the plug-ins that come with your audio editor, this won’t be an issue. If you buy third-party plug-ins, everyone on the team will need to get them.

Before your next project, include a conversation about plug-ins. Which ones will you use? Does everyone on the team have them? If not, how will you manage that?

Plug-ins can change levels
EQ and compression plug-ins can change your levels. For example, if you use an EQ plug-in to reduce bass frequencies, you are lowering the level of a significant portion of the audio. Be mindful of this when you make changes to EQ or compression — you’ll want to make those changes early in your workflow so that levels stay consistent.

It is also possible to distort inside a plug-in. If you ask the plug-in to add too much level, it will distort. Many plug-ins come with built-in peak meters. Keep an eye on that meter to make sure it’s at a tolerable level!

Compromise and compare
Fixing audio with plug-ins like equalizers or compressors is a balancing act based on compromises. Rarely will you be able to get the sound perfect. The key: get to a point where you can decide if the fix is better than the original.

To help make that decision, compare the fixed audio to the original audio. While the audio is playing, use the plug-in’s “bypass,” “power,” or “on/off” button to switch the plug-in on (fixed) and off (original).

Here’s a video to show you how:

Sometimes the audio can’t be made to sound better, just different. It might be best to leave it alone.

Lorna White
Technical Director, NPR

Ask yourself: do you prefer the change you made, the original audio, or perhaps something in between?

Presets are your friend!
Most audio editors offer the ability to store plug-in settings as presets. They save time by recalling commonly used settings. However, presets are only a starting point. You’ll still need to adjust them every time. For example, you could save the compressor settings described in the compression section below. But when you recall the preset, you’ll need to make adjustments to at least the threshold.

Presets are your friend, but not a set-it-and-forget-it tool. Start with them and tweak. Save your own!

Corey Schreppel
Technical Director, American Public Media

Fancy plug-ins
Don’t let yourself get caught up in fancy plug-ins. Similarly, don’t let yourself be intimidated by them! You will find plug-ins that do all sorts of things and have beautifully dazzling interfaces. As you decide what to use, remember: the two most important tools are your ears and the audio editor’s level adjustment tool (ok, that’s really three tools). In most cases, simply balancing levels will improve the sound of your mix dramatically.


Equalization can bring clarity

Use EQ when you want to adjust tone. In audio storytelling, we come across recorded voices that sound muddy and boomy or shrill and tinny. These adjectives typically describe an excess of frequencies. Muddy, boomy audio describes audio that has too much of the lower bass frequencies. Tinny or harsh audio describes the opposite — too much high frequency. With an equalizer we can reduce the excessive frequencies and help the audio sound more natural (we can “equalize” it).

Can you recognize tonal issues? Take this quiz to find out, and learn more in the Ear training guide for audio producers.

There are two types of EQ useful in audio storytelling: the high-pass filter and the parametric EQ.

High-pass filters remove rumble
High-pass filters (also known as low-cut filters) remove audio below a certain frequency. Another way to put this: high-pass filters let the highs pass. We can use high-pass filters (HPFs from here on) to reduce boominess, muddiness, rumble, unnatural noise caused by wind, some kinds of handling noise, and plosives (p-pops). They’re especially useful for cleaning up field tape. Factories, traffic noise, HVAC systems, or wind can all create distracting low-frequency issues that a filter can fix.

High-pass filters let the highs pass, and remove low frequencies.

Here’s an example of how a high-pass filter can help. First you’ll hear the original audio, then the audio with a high-pass filter:

To use the high-pass filter in your EQ plug-in:

  1. Insert the EQ plug-in on the track.
  2. You may first have to turn the HPF on by clicking a button to engage it (it might say ON or HP).
  3. Next, set the slope of the filter. The higher the number of the slope, the more frequencies the filter cuts out. Slope is set in increments of 6 dB per octave. A slope of 12 dB per octave is a good place to start. Slopes of 18 or 24 dB per octave will sound less natural (and can cause other tonal problems) but are useful in situations when audio is incredibly muddy.
  4. Now set the frequency. A good place to start for many issues is 80 Hz. If the problem is not reduced, move the frequency up to 100 Hz. You can raise the frequency higher, but depending on the individual voice, anything above 110-120 Hz may start to sound unnatural.
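You can also prototype the same filter in code. In this sketch, a second-order Butterworth high-pass from scipy gives you the 12 dB-per-octave slope described above (the file name is a placeholder):

```python
# A 12 dB/octave high-pass filter at 80 Hz (a 2nd-order Butterworth).
# Requires: pip install soundfile scipy
import soundfile as sf
from scipy.signal import butter, sosfilt

data, rate = sf.read("boomy_tape.wav")  # placeholder file name
sos = butter(N=2, Wn=80, btype="highpass", fs=rate, output="sos")
filtered = sosfilt(sos, data, axis=0)   # axis=0 also handles stereo files
sf.write("boomy_tape_filtered.wav", filtered, rate)
```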

Here’s a video to show you how:

A trick I find helpful for producers, especially with HPFs, is to make the adjustment with your eyes closed. Roll up the frequency until you can hear it and then roll it back until the “fullness” of the source comes back. Generally, you find that you can get away with a higher frequency than you’d expect.

Corey Schreppel
Technical Director, American Public Media

Your mic may have a built-in high-pass filter!
Many microphones include a high-pass filter. You can turn it on with a simple switch, and it’s a fantastic problem-solver. Learn more in Don’t fear the filter from the Association of Independents in Radio.

The parametric equalizer
Parametric equalizers allow you to variably adjust many different “parameters” of tone. They are very helpful for fixing audio that has an excess of low or high frequencies.

Parametric equalizers have three main functions: gain, frequency, and bandwidth adjustment (also known as “Q”).

Parametric EQ plug-ins generally have the following features:

  • Multiple frequency bands that allow you to adjust more than one range of frequencies with a single instance of the plug-in. Each band has a center frequency at which any level change is concentrated.
  • A gain adjustment for each frequency band. This is so you can adjust the level of the selected frequency range.
  • A Q or bandwidth adjustment. This allows you to fine-tune the adjustment around the center frequency. A wider bandwidth (lower Q) will adjust more frequencies than a narrower bandwidth (higher Q).

Note: you’ll find other kinds of EQ available, like the “graphic” EQ, but the parametric EQ generally offers the most natural sound for audio storytelling work.

Find and fix a tonal problem with EQ
EQ allows us to adjust the level of frequencies — you can either raise or lower levels. As a rule of thumb in audio storytelling, we generally use EQ to reduce the level of frequencies, not raise it. The most common tonal problem in audio storytelling is a buildup of frequencies: muddy or boomy audio has too much low frequency, while tinny or bright audio has too much high frequency.

With this in mind, let’s use the EQ’s frequency selector to find a problem frequency and then reduce it:

1. First, make an educated guess as to where the problem frequencies are. Use this list as a rough guide:

  • Does it sound hollow? Try 300 Hz-800 Hz.
  • Does it sound nasally or pinched? Try 900 Hz-2.5 kHz.
  • Does it sound harsh or sibilant (pronounced “ess” sounds)? Try 4 kHz-7 kHz.

Common descriptions of tonal problems and their associated frequency.

2. In the EQ plug-in, find a frequency selector for one of the available bands. Set the frequency selector to the bottom of the matching range from Step 1. You can move the knob to the bottom of the range or type the number in directly.

3. Set the Q (bandwidth) control to 1.0.

4. Slowly boost the gain knob to about 6-8 dB (be careful, you are raising level. Don’t raise it so high or so quickly that you hurt your ears!).

5. Slowly move the frequency selector to the right, up the scale (in most plug-ins you can click and drag on the frequency selector and slowly glide up to higher frequencies).

6. As you sweep across the problem frequency, the problem will become more pronounced and louder. If nothing jumps out, you might have missed it. Start from the bottom of the range again and repeat. If you still don’t hear the problem jumping out at you, you might be in the wrong range. Move the frequency selector to the bottom of one of the neighboring ranges listed and sweep again.

Once you’ve found the problem frequency, most of the hard work is done. The next step is to reduce the level of the problem.

7. Set the gain knob back to 0 dB so that no EQ changes are taking place. Listen to this “un-EQ’d” setting for a moment to re-acquaint your ears with the problem.

8. Set the gain knob to -2 dB. You should hear the problem decrease in level. If you want to reduce the problem more, you can, but know that gain changes of more than 5 or 6 dB start to sound unnatural with most spoken-word material. Make your changes in 2 dB increments and stop when you notice that the problem feels better.
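For the curious, here’s what one band of that adjustment looks like as code. This sketch builds a “peaking” filter from the widely used Audio EQ Cookbook formulas and applies a 2 dB cut at 500 Hz with a Q of 1.0 (the frequency and file name are placeholders):

```python
# One band of parametric EQ: a "peaking" biquad (RBJ Audio EQ Cookbook).
import numpy as np
import soundfile as sf
from scipy.signal import lfilter

def peaking_eq(data, rate, freq, gain_db, q=1.0):
    amp = 10 ** (gain_db / 40)             # amplitude factor
    w0 = 2 * np.pi * freq / rate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * amp, -2 * np.cos(w0), 1 - alpha * amp])
    a = np.array([1 + alpha / amp, -2 * np.cos(w0), 1 - alpha / amp])
    return lfilter(b / a[0], a / a[0], data, axis=0)

data, rate = sf.read("hollow_voice.wav")   # placeholder file name
fixed = peaking_eq(data, rate, freq=500, gain_db=-2.0)  # cut the problem band
```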

Here’s a video to demonstrate:

After you reduce the level of the offending frequency range, you may notice another tonal problem appear that you hadn’t heard before. That’s ok, and is quite common. The first issue was probably masking this new one. Repeat the process to find the new problem, but be careful: this can be a rabbit hole that ends up with too many EQ changes and an unnatural sound. One or two reductions should be all you need for most fixes.

Do as little processing to the audio as possible. If a high-pass filter fixes the issue, stop there. If the high-pass filter plus some reduction … solves the issue, stop there. Only use the tools that are necessary.

J. Czys
Technical Director, NPR

More on EQ
You can read more about using EQ in Real world EQ at Transom.org.

You might be inclined to compensate for a buildup of frequencies by raising an opposite set of frequencies. For example, if you are presented with muddy audio, you might raise the high frequencies in order to add clarity. However, reducing a problem frequency usually sounds more natural than raising another frequency to compensate. Raising a range of frequencies also increases the overall level of the audio, which might have unintended consequences, like distortion.


Compression can bring control

We naturally speak with dynamics. We emphasize certain words and syllables to make points and convey emotion. In mixing, we use level automation to control the dynamics across sentences and phrases, and occasionally it’s helpful to control the peaks of words and syllables. But imagine drawing automation curves for every emphasized syllable in a 20-minute interview! You’d have hundreds of regions to fix, it would be incredibly tedious, and it probably wouldn’t sound very natural. That’s where compression comes in.

Compressors control words and syllables that are emphasized more than others. As audio moves through a compressor, the tool looks for level that moves above a threshold you set. Anything that passes above that threshold gets pushed lower. How much the audio gets pushed lower depends on a control called ratio.

What does compression look like?
Check out this fantastic visualization of compression: The animated guide to compression.

When done well, compression controls level without impacting the natural sound of speech. In fact, compression can help to tame exaggerated dynamics caused by miking voices closely. However, it is easily overused and can result in an aggressive, pumpy, in-your-face sound. This is sometimes a desired effect in music or sound design applications, but in audio storytelling it’s not pleasant. Keep in mind that a little compression goes a long way.

Be careful with compression. When I use it, I still want to hear people making points. It is there to help control the dynamics of a conversation, but not to remove the excitement, energy, or disagreements in a conversation. A polite conversation about gardening should not sound the same as a heated political debate.

Michael Raphael
Rabbit Ears Audio

This audio example compares overly compressed audio with natural sounding audio:

It’s also worth saying that compression is not a replacement for level automation. To maintain a natural sound, the majority of the balancing work should be done by level automation. The compressor will simply help round it out and add strength to the voice.

The standard controls on a compressor plug-in.

Compressor plug-ins have the following controls:

  • Threshold: Sets the level where the compressor will act on the audio. Any audio that goes beyond this point will be compressed.
  • Ratio: Tells the compressor how much reduction to apply. It will be listed as an actual ratio, like 2:1 or 5:1. For dialogue, stick with a ratio in the 1.5:1 to 2:1 range. Anything higher will yield an aggressive, unnatural, and over-compressed sound.
  • Gain (or make-up gain): The gain control allows you to boost or “make up” the level lost after the compressor has acted on it.

Compressors will also usually have two other controls called attack and release. They control how fast the compressor responds to input beyond the threshold. They are time-based settings marked in milliseconds (ms). Attack time determines how quickly the compressor reduces the level once it passes the threshold. Release time determines how quickly the compressor stops acting on the signal after the level falls back below the threshold.

Finally, a very important part of the compressor is the gain-reduction meter, which gives you a good visual of how much the audio is being compressed.

How to set a compressor
Using a compressor on dialogue involves the following steps:

  1. Insert a compressor plug-in on the dialogue track.
  2. Set the ratio to 1.5:1  (a setting appropriate for natural-sounding dialogue).
  3. Set the attack time to 11 ms.
  4. Set the release to 110 ms.
  5. Lower the threshold until the gain reduction meter consistently shows 2-3 dB of reduction, occasionally peaking to 5 or 6 dB when the speaker emphasizes a word.
  6. If you set the threshold and ratio appropriately, the overall level has likely dropped a couple of decibels. To make up for this drop in level, adjust the make-up gain up 2 or 3 dB.
  7. Listen to the fix and compare it to the original using the bypass or power button.
  8. If you think the voice needs more control (less dynamics), try 1.8:1, then 2:1.
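If it helps to see the logic spelled out, here’s a simplified Python sketch of a compressor with those controls. Real plug-ins are far more refined; this just shows the threshold/ratio/attack/release math in its plainest form. The threshold here is a placeholder — as the steps above explain, the right value depends on the recording:

```python
# A bare-bones compressor: threshold, ratio, attack/release, make-up gain.
import numpy as np
import soundfile as sf

def compress(x, rate, threshold_db=-20.0, ratio=1.5,
             attack_ms=11.0, release_ms=110.0, makeup_db=2.0):
    level_db = 20 * np.log10(np.abs(x) + 1e-10)        # instantaneous level
    over = np.maximum(level_db - threshold_db, 0.0)    # dB beyond threshold
    target = over * (1 - 1 / ratio)                    # desired gain reduction

    # Smooth the gain reduction with attack/release time constants.
    a_att = np.exp(-1.0 / (rate * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (rate * release_ms / 1000.0))
    reduction = np.zeros_like(target)
    for n in range(1, len(x)):
        coeff = a_att if target[n] > reduction[n - 1] else a_rel
        reduction[n] = coeff * reduction[n - 1] + (1 - coeff) * target[n]

    return x * 10 ** ((makeup_db - reduction) / 20)    # apply gain per sample

data, rate = sf.read("dialogue.wav")                   # placeholder mono file
out = compress(data, rate)
```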

Here’s a video showing you how:

Each voice will require a different threshold based on the level of the recording and how dynamically the person talks.

With spoken voice as well as music, too much compression can shift the timbre of the sound. Lower ratios coupled with lower thresholds can sound somewhat more natural than just squashing the loudest sections with high ratio compression or limiting.

Flawn Williams
Former Audio Engineer and Producer, NPR

Remember, to achieve a natural sound, compression should not be used as a replacement for level automation.

EQ then compress, or compress then EQ?
Plug-ins process the audio in the order in which they appear in the track (usually that means they work from the top down). When you insert plug-ins on a track, you have a choice to make: should the EQ plug-in go before the compressor, or vice versa? If you ask audio engineers this question, they will wax poetic about the merits of both approaches — and they’re not necessarily wrong.

In mixing for audio storytelling, we are primarily concerned with mitigating problems. We use EQ to cut back on frequencies that sound unnatural, and we compress to bring a little more control to a dynamic voice.

For that reason: EQ first, then compress. By doing this, you can clear up tonal problems before compressing. If you compress before you EQ, you may end up accidentally accentuating tonal problems.


Fixing common mixing problems

(Deborah Lee/NPR)

This section addresses some of the more common mixing problems you’ll encounter in audio storytelling:

Boomy, bassy voices
Phone audio
Plosives
Music-to-voice balance
Stereo or mono?

Bassy, boomy voices
A rumbly, boomy or bassy voice is a recipe for hard-to-understand dialogue in noisy listening environments. This problem is a prime candidate for the high-pass filter.

  • Start with the filter set at 100 Hz, with a slope of 12 dB/octave.
  • If 100 Hz thins the voice too much, lower to around 85 Hz.
  • If 100 Hz is too bassy, raise to 110-120 Hz.

This is an example of boomy audio with and without a high-pass filter:

Phone audio
Phone audio can be improved with the use of both a high-pass and a low-pass filter. It’s a trick that’s easy to implement and always works (as long as you are working with landline or cell phone calls).
Read more about an easy phone audio fix with step-by-step instructions.
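As a rough sketch of that fix in code, you can chain the two filters into a single band-pass that keeps only the voice band of the call (the cutoff frequencies and file name here are assumptions, not the exact settings from the linked guide):

```python
# Phone-audio fix: a high-pass plus a low-pass, keeping roughly the
# 250 Hz-3.4 kHz band that phone calls carry.
import soundfile as sf
from scipy.signal import butter, sosfilt

data, rate = sf.read("phone_tape.wav")  # placeholder file name
sos = butter(N=2, Wn=(250, 3400), btype="bandpass", fs=rate, output="sos")
sf.write("phone_tape_fixed.wav", sosfilt(sos, data, axis=0), rate)
```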

Here’s some hissy phone audio. First you’ll hear it untreated, and then with the fix:

This video walks through the steps to apply the phone audio fix:

You can also use noise reduction to fix phone tape, but it’s usually not necessary, often takes more time, and may accentuate some problems.

Plosives
Plosives, or p-pops, are blasts of air formed by words that begin with the letter “p” (and sometimes “b”, “t” and “k”). They are most easily prevented with good mic placement. But when they occur in recordings, you can sometimes snip them out with an audio editor. A high-pass filter can also help to remove low frequencies present in p-pops. Some noise reduction software has de-plosive modules. For more on dealing with plosives, see The ear training guide for audio producers from NPR Training and P-pops and other plosives, from Jeff Towne at Transom.

Music-to-voice balance
Creating a good balance between voice and music is tricky. Music mixed too loudly will compete with the voice; too low and it will be inaudible in louder listening environments. Music posts might sound perfect in headphones, and then too low on speakers. Keep in mind that mixing is always a compromise: listen on multiple devices and use the techniques in “Assess your mix” below to judge the balance.

Stereo or mono?
Should you work in stereo or mono? This question can come up at multiple points in the production process. If you are presented with stereo recordings (like music or field recordings) in the mixing stage, you need to determine if the audio should continue to be stereo or if you should make it mono.

The first step in your decision is informed by the final product. If your program or podcast format is distributed in mono, you should mix in mono. Forcing stereo elements into a mono show may present you with unexpected differences in level between the stereo and mono elements.

If the final destination is a stereo program, then you need to decide if the content itself is worth keeping in stereo. First, ask the person that recorded the content why they recorded it in stereo. Then, give it a good listen. Is there anything in the stereo field that enhances the story? Is there information in the stereo content that you wouldn’t get if it was mono? Or, is the stereo content distracting — perhaps needlessly pulling your attention from one side to the other? If the stereo nature of the audio distracts, or if there’s nothing compelling about it, you can decide to make it mono.

More on stereo recording
Flawn Williams is a master of recording stereo content in the field. Hear his thoughts on the topic in this interview with How Sound.


Is the stereo content music? If so, and your final destination is stereo, keep the content in stereo. Stereo music often works better under dialogue, and summing stereo content to mono (see below) can result in odd tonal shifts. Stereo music will simply sound better.

Stereo is often used to record in “split-track” — an interview with a different voice recorded in each channel. For example, the reporter might be recorded in the left channel, and the guest in the right. This allows individual control of levels during mixing, but each side would eventually need to be pulled onto its own track and mixed to mono (see below).

If you need to convert stereo content into mono, you have two options. You can either combine the two sides into one with panning (known as “summing”) or you can use just one side of the stereo audio and throw the other one out (called “pulling” left or right).

Use summing when there is useful information on both sides of the stereo field that you need to retain. For instance, use summing when you want to use stereo music in a mono podcast, because there are likely instruments or sounds on one side of the stereo image that aren’t on the other. To sum, simply use the audio editor’s pan function to pan each side of the stereo track to the center. Note that the audio level will audibly increase with this method (the left and right side will become more equal on the master meter, too).

Pull left or right when the information you want is only on one side (like in a split-track recording) or if the same information is on both sides (for instance, if a one-mic interview was accidentally recorded into a stereo file). Audio editors all have different workflows for this process, but most will let you copy stereo audio onto two mono tracks. At that point you can delete the track you don’t want (don’t forget to remove the original stereo track, too).
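Outside of a DAW, both options are one-liners. A small sketch, assuming a stereo file read as a (samples, 2) array:

```python
# Stereo to mono: "summing" both sides, or "pulling" a single side.
import soundfile as sf

data, rate = sf.read("stereo_interview.wav")  # placeholder; shape (samples, 2)

summed = data.mean(axis=1)   # summing: averaging avoids clipping, whereas
                             # panning both sides to center in a DAW adds
                             # them, which is why the level rises
left_only = data[:, 0]       # pulling the left side of a split-track recording
sf.write("mono.wav", summed, rate)
```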


Assess your mix

(Deborah Lee, NPR)

Listen to your mix in multiple ways
Your audience will listen to your work in various ways and in all sorts of environments, so you should do the same. Music engineers do this all the time. They’ll make critical decisions about a music mix in the studio on professional speakers, then they’ll listen in their car to learn how the mix translates outside of the studio.

Listen to your work on different devices: nice speakers, headphones and earbuds. Listen in multiple environments, like a quiet bedroom, a busy commute on foot or while doing the dishes. At a minimum, listen on quality speakers and headphones. All of these listening environments can help you judge balances, ambience levels, and intelligibility.

If you’ve spent a ton of time on a mix, you’re probably consumed by the details and the process of mixing. Separate yourself from it; get a good night’s rest and listen in the morning. At the very least take a break, go for a walk, and listen again. The space will bring a fresh perspective.

Ask a coworker or friend to listen to your mix and give you a few comments. One good variant: be there while your peer is listening, and watch their facial expressions while they listen.

Flawn Williams
Former Audio Engineer and Producer, NPR

Check transitions
Just like you double- and triple-check facts in your storytelling, make sure every transition in the mix is smooth and balanced. Poor transitions will distract the listener, so do a listen through the mix focused on them. Visually scan through your session to find every transition (crossfades, entrances of ambience or music, changes in scenes, edits, and fades) and ensure that they sound smooth and natural. Most audio editors have a keystroke to jump from one edit to the next, which will help you do this step quickly.

Use a rock and roll trick!
Take advantage of a trick that music engineers use to judge balances. Turn down the listening level significantly so you have to really focus to understand the words. The mix should be just audible enough for the words to be intelligible. If any part of your mix sticks out too much or drops away, you need to make an adjustment.

This video demonstrates the rock and roll mixing trick:

This trick works whether you’re on speakers or headphones. But you should generally keep your listening level consistent. When you use this trick, remember to note your original listening level so you can get right back to it.


Conclusion and thank-yous

You are now ready to put this new knowledge into practice! Check out our step-by-step guide to mixing to learn an efficient approach to mixing your stories.

And remember — the two most important tools at your disposal are your ears and the audio editor’s level automation tool. Use them!

Thank you to Serri Graslie, Dylan Scott, Deborah Lee, J. Czys, Lorna White, Casey Herman, Rund Abdelfatah, Sami Yenigun, Alex Drewenskus, Kevin Wait, Matt Fidler, Jeremy Bloom, Alex Kosiorek, Cameron Wiley, Kyle Wesloh, Andy Huether, Jamie Collazo, Zac Schmidt, Casey Holford, Matthew Boll, Jonathan Mitchell and Patrick Murray. And a special thank-you to Flawn Williams, Johnny Vince Evans, Corey Schreppel and Michael Raphael for your generous guidance and enthusiasm for this project.


Rob Byers was a Production Specialist with the NPR Training team, where he focused on audio engineering.