'Butt cut what?' A glossary of audio production terms

(Chelsea Conrad/NPR)

Let’s say you are producing an audio story, and you’re asked to dip the ambi under the track, butt cut the next two acts, and then sweep up and maintain the ambi. If that sentence is confusing, this glossary is for you. Terms for producing and mixing audio go back to the days of cutting real tape with razor blades, but most of them have lived on into the era of digital production.

Actuality (n) — The voices in a story that are not the reporter’s/narrator’s. Usually recorded on-location or in a studio interview.  Also known as “acts,” “cuts,” or “sound bites.”

Ambience (n) — The pervasive sound at a location. (E.g. Traffic on a road. Doors slamming. Sounds of a demonstration. Birds and wind in a forest.) Can be used as an actuality itself or mixed under narration or other actualities.  Also known as “ambi” or “nat sound” or less commonly as “sfx.” [Though, to be clear, ambience is not “sound effects”! It is real sound, not faked.]

Backtime (v) — To determine where to start playing an audio element so that it posts or ends at a specified time. Often used for deadrolls. (See “post” and “deadroll” definitions below).
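The arithmetic behind backtiming can be sketched in a few lines of Python. This is only an illustration, not a tool referenced in this glossary: the function names `backtime` and `as_clock` and the example timings are hypothetical. The idea is to subtract the time it takes the element to reach its post (or its end) from the clock time at which that moment must land.

```python
def backtime(target_seconds: float, seconds_until_hit: float) -> float:
    """Return when to start an element so a chosen moment lands on target.

    target_seconds: time in the mix when the post (or the ending) must occur.
    seconds_until_hit: how far into the element that moment falls. Use the
    element's full length if it should end at the target time.
    """
    start = target_seconds - seconds_until_hit
    if start < 0:
        raise ValueError("element is too long to hit the target time")
    return start


def as_clock(seconds: float) -> str:
    """Format a number of seconds as M:SS for a rundown or mix sheet."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical example: a 2:30 piece of music must end at 59:00.
start = backtime(59 * 60, 2 * 60 + 30)
print(as_clock(start))  # prints 56:30
```

Starting the music at 56:30, even inaudibly, means it reaches its natural end at 59:00.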

BEEB (n) — Nickname for the British Broadcasting Corporation.

Bed (n) — Sound running underneath a track or other audio. Not very dynamic – often music or background noise. Common use for ambience. (See image below)



An example of an ambi bed (pink) dipping under voice tracks (blue).


Butt Cut (v or n) — To place one actuality immediately after another, rather than dividing them with copy or ambience. Often used to create a transition point, reinforce a point or demonstrate a contrast.

Button (n) — A short piece of music that creates a transition between two unrelated stories, or stories with contrasting moods and tone. 

Cascade/Waterfall (n) — Type of montage. Three or more distinct pieces of audio combined by fading one into the next. 

Clipped (adj) — When audio is missing the beginning or end of a sound element or word. Also known as “upcut.”

Cross Fade (v) — To fade out one sound while fading in another – in order to make the transition seamless. Usually performed in the background, under other tape. When performed in the clear, a cross fade can indicate a transition.
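For readers who think in code, here is a minimal sketch of a cross fade, assuming audio is represented as plain Python lists of sample values and that a simple linear fade is acceptable. Editing software typically offers other fade curves as well. The function name `crossfade` and its parameters are hypothetical.

```python
def crossfade(outgoing, incoming, overlap):
    """Linearly cross fade two lists of samples over `overlap` samples.

    The last `overlap` samples of `outgoing` fade out while the first
    `overlap` samples of `incoming` fade in, and the two are added together.
    """
    if overlap < 1 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both pieces of audio")

    head = outgoing[:-overlap]   # outgoing audio before the fade begins
    tail = incoming[overlap:]    # incoming audio after the fade ends
    fade_start = len(outgoing) - overlap

    mixed = []
    for i in range(overlap):
        gain_in = (i + 1) / (overlap + 1)  # rises toward 1 across the overlap
        gain_out = 1.0 - gain_in           # falls toward 0 across the overlap
        mixed.append(outgoing[fade_start + i] * gain_out + incoming[i] * gain_in)

    return head + mixed + tail


# Hypothetical example: steady ambience at level 1.0 fading into silence.
result = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
print(len(result))  # prints 6
```

The combined audio is shorter than the two pieces laid end to end because they overlap for the length of the fade.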

Deadroll (n) — Sound or music that begins inaudibly at a specific time in a mix – so that it will come to its natural end at a specific time.

Dip/Duck (v) — To fade sound underneath a track or other audio that is at a higher volume.

Dub (n or v) — Making a recording of a recording (for example, recording audio from a video).

Establish (v) — After sound (usually ambience) is swept in, to maintain its volume (see “hold/maintain” below).

Fade (in, out, up, down, under) (v) — To adjust the volume of sound from low to high or high to low at a gradual pace.

Fade to black/Fade away (v) — To decrease the volume of a sound until it is inaudible — while still in the clear.

Hit Hot (v) — To begin playing at full volume.

Hit Warm (v) —  To begin playing at medium volume.

Hold/Maintain (v) — To keep the volume of an audio element at the same level.

In the clear (adj) — When sound is in the foreground without competition from any other sound. Used for ambience or actualities. (E.g. A reporter’s mixing instructions might say, “Maintain ambi of gunshots in the clear for 4 secs.”)

Mask (v) — To use existing ambient sound to cover over bad edits or to smooth transitions.

Montage (n) — Several pieces of audio combined sequentially to create a single sound element. (See image below)



An example of a montage with fades.


Mult box (n) — A piece of audio equipment that splits one audio signal into many. They are commonly found at press conferences so multiple reporters can record off of a single podium feed.

Nipper (n) — Affectionate nickname for NPR.

Post (v or n) — v: To bring up a sound at a specific point so that it is in the foreground. Used for actualities or ambience. (E.g. “Post ambi after politician says, ‘I’m fighting for you!’”) n: The point at which the sound appears. (E.g. “Hit the post.”)

Pre-produce (v) — To mix or record a piece or interview in advance of a live show. Can be done to facilitate production, to help when a segment is tight for time, or to simplify technical needs.

Rollover (n) — The recorded feeds of a program that occur after the initial live broadcast. The original live show is recorded and then fed again to allow stations flexibility in scheduling. Rollovers are frequently updated to fix mistakes and to add new information.

Room tone (n) — Indoor ambience recorded at the place where an interview is conducted or an event takes place. Usually low dynamic level.

Sneak (v)  — To slowly fade up or out.

SOC (n) — Short for “Standard Out Cue.” It’s the ID a reporter gives at the end of a piece. For example, “Nina Totenberg, NPR News, Washington.” (Format varies.)

Split-track (n) — An interview with different audio in the left and right channels. For example, in the field, a producer or engineer might record the host in the left channel and the guest in the right channel. This allows for independent control of levels during production. Split-tracking is a great tool — but audio must be mixed onto both channels, or “summed,” before it can be broadcast.
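As a rough illustration of “summing,” the sketch below mixes a split-track recording so that both channels carry the same audio. It assumes each channel is a plain Python list of samples. The function name `sum_split_track`, the gain parameters, and the halving of the mix to leave headroom are all assumptions made for this example, not a description of any particular editing software.

```python
def sum_split_track(left, right, left_gain=1.0, right_gain=1.0):
    """Mix a split-track recording so both channels carry the same audio.

    left, right: lists of samples, e.g. host on the left, guest on the right.
    left_gain, right_gain: independent level adjustments applied before mixing.
    Returns a list of (left, right) pairs in which both values are identical.
    """
    if len(left) != len(right):
        raise ValueError("both channels must be the same length")

    summed = []
    for l, r in zip(left, right):
        # Halving the sum is an assumption here, to keep the mix from overloading.
        mixed = 0.5 * (l * left_gain + r * right_gain)
        summed.append((mixed, mixed))
    return summed


# Hypothetical example: two samples of host audio and two of guest audio.
print(sum_split_track([0.25, 0.5], [0.75, 0.0]))  # prints [(0.5, 0.5), (0.25, 0.25)]
```

The separate gain parameters reflect the point of split-tracking: each voice can be adjusted on its own before the two are combined.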

Sweep (v or n) — To quickly fade up; a quick fade up.

Synch up (v) — To combine two or more pieces of audio so that they line up exactly. Usually done with audio that matches (for example, a tape sync).

Tape sync (n) — A variation of a split-track recording. In the case of an interview, the guest speaks to the host/reporter over the telephone. The producer/engineer goes to the scene and records what the interviewee says (or the guest records him/herself with a smartphone app). The guest’s side of the conversation is then combined or “synced” with the host’s side of the conversation. To the listener, it sounds like the host and the guest are in the same room.

Two-way (n; can also be used as a verb) — An on-air conversation between two people, usually a host and an interviewee. Common term used to describe conversations heard on newsmagazines. (A “three-way” is a host and two guests … and so on.)

Track/Voice Track (n) — The reporter’s narrative, read from their script.

Voicer (n) — A news spot involving only the reporter’s voice — no actualities.

Wrap (n) — A news spot featuring an actuality placed between the reporter’s tracks (the actuality is “wrapped” by the tracks).

Alison MacAdam was a Senior Editorial Specialist with the NPR Training team, where she focused on audio storytelling. Prior to that, she edited All Things Considered.