Technical production tips for voice overs

Technical Audio Production Tips for Voiceovers Part 1

This month sees the first in an occasional series where we look at some technical terms and things you need to know as a voiceover with your own studio.

I’m hoping you find this useful, not just for how you run your day-to-day recordings, but also for those times when you’re asked to do something slightly unusual, or when you’re interpreting a client request where a term has been misused or misheard (I still chuckle to myself about the client who asked for files to be supplied as forty-four kilo hearts).


1. Resolution

The first thing to consider is the format we record to. Most DAWs and editors will record to either a wav or an aiff file. We should always make sure we are recording at the highest resolution we can reasonably manage, because wav and aiff files can vary massively in quality depending on their resolution. There is a huge difference between an 8kHz 8-bit wav and a 48kHz 24-bit one. Here I’ll pause to explain some more terms.
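If a client sends you a file, or you want to double-check what your DAW has actually written, the resolution is stored in the file’s header. As a minimal sketch, Python’s standard-library `wave` module can read it for uncompressed PCM wav files (it does not handle aiff). The helper name `describe_wav` is hypothetical, and the example builds a silent 48kHz 24-bit file in memory so it runs on its own.

```python
import io
import wave

def describe_wav(source):
    """Return (sample rate in Hz, bit depth, channels) for a wav file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

# Build a one-second silent 48kHz 24-bit mono file in memory so the example is self-contained.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(3)          # 3 bytes per sample = 24-bit
    wav.setframerate(48000)
    wav.writeframes(b"\x00\x00\x00" * 48000)
buffer.seek(0)

rate, bits, channels = describe_wav(buffer)
print(f"{rate / 1000:g}kHz, {bits}-bit, {channels} channel(s)")
```

With a real file you would pass its path instead, e.g. `describe_wav("take_01.wav")` (a hypothetical filename).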

Bit depth and sample rate

These are terms we should be paying close attention to, because they dictate the quality of the audio we produce. But what are they? In short, they are the X and Y axes of a graph. The X axis is time, and the sample rate is measured along it in hertz (Hz) or kilohertz (kHz). In physics, hertz means cycles per second, and in music it is used to denote pitch (concert A is 440Hz). In digital audio, the sample rate states how many times a sound wave is measured per second: an 8kHz wav is ‘sampled’ 8,000 times per second, a 44.1kHz wav 44,100 times per second.
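To put those sample rates into context, here is a small sketch that works out how many measurements each rate takes during a single cycle of concert A (440Hz). The function name `samples_per_cycle` is just an illustrative helper; the numbers are simple division.

```python
CONCERT_A_HZ = 440.0  # the pitch example from the text

def samples_per_cycle(sample_rate_hz, tone_hz=CONCERT_A_HZ):
    """How many measurements a given sample rate takes during one cycle of a tone."""
    return sample_rate_hz / tone_hz

for rate in (8_000, 44_100, 48_000):
    print(f"{rate / 1000:g}kHz: {rate:,} samples per second, "
          f"about {samples_per_cycle(rate):.1f} per cycle of concert A")
```

At 8kHz each cycle of concert A is described by roughly 18 points; at 44.1kHz it gets roughly 100.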

As we progress along the X axis (through time), a reading is taken of where the sound wave is on the Y axis. Each reading is stored as a binary number, and the number of binary digits (bits) used for it is the bit depth of the audio. A 16-bit reading has 16 digits, giving 65,536 possible values; a 24-bit reading has 24, giving 16,777,216. The higher the bit depth, the bigger the dynamic range (the space between the loudest and quietest sounds the system can handle).
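The number of possible values doubles with every extra bit. This sketch shows the count for common bit depths, plus the range of whole numbers a signed sample can hold (the usual way wav files store 16-bit and 24-bit audio). The helper names are illustrative.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample of this many binary digits can take."""
    return 2 ** bit_depth

def sample_range(bit_depth):
    """Smallest and largest values a signed integer sample of this bit depth can hold."""
    half = 2 ** (bit_depth - 1)
    return -half, half - 1

for bits in (8, 16, 24):
    low, high = sample_range(bits)
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, from {low:,} to {high:,}")
```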

The combination of these two qualities gives a higher or lower resolution of sound as the wave is plotted out on the graph. In lower-resolution sound the wave is drawn with lots of square corners rather than the smooth line of the original analogue wave, and is therefore a less accurate replication of the original (see pic below).
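To put a number on ‘less accurate’, here is a minimal sketch that isolates the Y axis (bit depth): it samples a pure 440Hz sine wave at 48kHz, rounds every reading to the nearest level a signed integer of the given bit depth can store, and reports the biggest gap between the smooth wave and its stepped copy. The function names are hypothetical, and the audio is treated as floating-point values between -1.0 and 1.0.

```python
import math

def quantize(x, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest level a signed integer of this bit depth can store."""
    full_scale = 2 ** (bit_depth - 1) - 1   # e.g. 127 for 8-bit, 32767 for 16-bit
    return round(x * full_scale) / full_scale

def worst_error(bit_depth, sample_rate_hz=48_000, tone_hz=440.0):
    """Largest gap between a pure sine wave and its quantized copy over one second."""
    worst = 0.0
    for n in range(sample_rate_hz):
        x = math.sin(2 * math.pi * tone_hz * n / sample_rate_hz)
        worst = max(worst, abs(x - quantize(x, bit_depth)))
    return worst

for bits in (8, 16, 24):
    print(f"{bits}-bit: worst-case error {worst_error(bits):.2e} of full scale")
```

Each extra 8 bits shrinks the error by a factor of roughly 256, which is why the ‘square corners’ become vanishingly small at 24-bit.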

The difference between low-resolution and high-resolution audio is like the difference between Disney/Pixar animation and Teletext pictures. 8kHz 8-bit audio will always sound awful, so if you’re asked to deliver it and it sounds bad, don’t panic: you haven’t done anything wrong (probably).

Back to thinking about resolution

My first point was that we should always record at the highest resolution we can manage. Hopefully you can now see that better resolution produces better audio, but the flip side is that it takes more processing power and storage from our computers. So as a default, it’s best to record at the highest resolution we’re likely to be asked for.
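The storage side of that trade-off is easy to estimate, because uncompressed audio is just samples per second × bytes per sample × channels. A rough sketch (ignoring the small file header; the helper name is illustrative):

```python
def bytes_per_minute(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed PCM data rate: samples/second x bytes/sample x channels x 60 seconds."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60

for rate, bits in ((8_000, 8), (44_100, 16), (48_000, 24)):
    mb = bytes_per_minute(rate, bits) / 1_000_000
    print(f"{rate / 1000:g}kHz {bits}-bit mono: about {mb:.2f} MB per minute")
```

A minute of 48kHz 24-bit mono comes to about 8.6MB, around 18 times the size of the same minute at 8kHz 8-bit.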

Converting audio to a lower resolution is perfectly fine, but converting it to a higher sample rate or bit depth will not – I repeat, will not – improve the quality of the audio; it will just make it compatible with whatever system it’s going to be used on. I would recommend that when you open a new file you set it to 48kHz and 24-bit (or 32-bit float if you can – I won’t go into that here as it’s quite complicated).
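The ‘no free quality’ point is easiest to see with bit depth. In this sketch, a 16-bit sample is converted to 24-bit by shifting it up 8 bits, which is the usual lossless way to do it: the 8 new low bits are always zero, so no new detail appears, and shifting back down returns exactly what you started with. The same principle applies to sample rate, where the extra samples are calculated from the existing ones. Function names are hypothetical.

```python
def to_24_bit(sample_16):
    """Convert a signed 16-bit sample to 24-bit by shifting it up 8 bits (multiplying by 256)."""
    return sample_16 << 8

def to_16_bit(sample_24):
    """Convert a signed 24-bit sample back to 16-bit by dropping the bottom 8 bits."""
    return sample_24 >> 8

original = [0, 1, -1, 12_345, -32_768, 32_767]
upconverted = [to_24_bit(s) for s in original]
print(upconverted)
print(all(s & 0xFF == 0 for s in upconverted))          # True: the new low bits carry no detail
print([to_16_bit(s) for s in upconverted] == original)  # True: nothing was gained
```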

Once you’ve voiced and processed it, you can save it at that full resolution and then ‘save as’ whatever format your client requires. That way you can easily re-work the original recording if you need to without compromising quality. Most of the IP protocols for remote recording also work at 48kHz, so if you’re recording at 44.1kHz while using one of these platforms you could end up with all sorts of issues, from sample-rate conversion artefacts to audio playing back at the wrong speed and pitch.
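As a rough illustration of the master-plus-copy workflow, this sketch reads a 24-bit wav master and writes a separate 16-bit delivery copy, leaving the master untouched. It assumes uncompressed PCM wav files and uses only Python’s standard `wave` module. It changes bit depth only, by simple truncation; real editors usually round or add dither, and changing the sample rate needs a proper resampler, which is beyond this sketch. All names are hypothetical.

```python
import io
import wave

def make_16_bit_copy(master, delivery):
    """Read a 24-bit PCM wav master and write a 16-bit copy; the master itself is not modified."""
    with wave.open(master, "rb") as src:
        if src.getsampwidth() != 3:
            raise ValueError("expected a 24-bit master")
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Samples are 3-byte little-endian, so bytes 1 and 2 hold the top 16 bits of each one.
    out = bytearray()
    for i in range(0, len(frames), 3):
        out += frames[i + 1:i + 3]
    with wave.open(delivery, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(2)
        dst.setframerate(params.framerate)
        dst.writeframes(bytes(out))

# A tiny in-memory 48kHz 24-bit master so the example runs on its own.
master = io.BytesIO()
with wave.open(master, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(3)
    wav.setframerate(48000)
    wav.writeframes(b"".join(s.to_bytes(3, "little", signed=True)
                             for s in (0, 256, -256, 8_388_352)))
master.seek(0)

delivery = io.BytesIO()
make_16_bit_copy(master, delivery)
delivery.seek(0)
with wave.open(delivery, "rb") as wav:
    print(wav.getframerate(), wav.getsampwidth() * 8)
    data = wav.readframes(wav.getnframes())
print([int.from_bytes(data[i:i + 2], "little", signed=True) for i in range(0, len(data), 2)])
```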


2. Levels

In the olden days of analogue, when everything was recorded to tape, it was important to push the levels as high as possible to make the most of the tape’s dynamic range and hide the tape hiss behind the audio you wanted. This is still widely done with digital audio, but there really is no need to.

A good reel of 2″ tape has a dynamic range of 50–60dB. This means that between the tape hiss that forms the noise floor and the point where the tape overloads and distorts, there is 50–60dB of room for the program material you want. With 16-bit audio you get a theoretical dynamic range of about 96dB, and with 24-bit about 144dB (roughly 6dB for every bit).
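Those digital figures come from a simple formula: the theoretical dynamic range of linear PCM is 20 × log10(2 to the power of the bit depth), which works out at roughly 6.02dB per bit. A quick sketch comparing it with the upper end of the tape figure:

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits), roughly 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

TAPE_DB = 60  # upper end of the 50 to 60 dB figure for a good reel of 2-inch tape

for bits in (16, 24):
    dr = dynamic_range_db(bits)
    print(f"{bits}-bit: {dr:.1f} dB, about {dr - TAPE_DB:.0f} dB more than good tape")
```

Real converters don’t quite reach these theoretical numbers, but even with some loss 24-bit leaves a very wide margin.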

Given that the dynamic range of the human ear is somewhere around 136dB, there is plenty of room in a 24-bit system to get the best out of the audio without pushing the levels to the max. In fact, the thing you really have to avoid is digital clipping: when the signal goes above the loudest level the system can represent, the tops of the waveform are cut off and you get harsh distortion. The loudest level of any digital system is 0dBFS (decibels relative to full scale), so all signals are measured in negative numbers. I would advise recording your voiceovers so that the peaks sit at around -6dB. This is plenty high enough, and it still leaves lots of headroom for further processing, because every process will change the waveform and may increase the levels.
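To see how a peak reading relates to that -6dB target, here is a minimal sketch that converts the loudest sample in a take to dBFS and flags anything at full scale. It assumes floating-point samples where 1.0 (or -1.0) is full scale, i.e. 0dBFS; the function names and the way a result is reported are illustrative only.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples where 1.0 (or -1.0) is full scale."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def check_take(samples, target_dbfs=-6.0):
    """Treat anything at or beyond full scale as clipped; otherwise report the headroom left."""
    level = peak_dbfs(samples)
    if level >= 0.0:
        return f"peak {level:.1f}dBFS: clipped, re-record at a lower gain"
    return f"peak {level:.1f}dBFS: {-level:.1f}dB headroom (target around {target_dbfs:g}dBFS)"

print(check_take([0.1, -0.5, 0.3]))   # a peak of half full scale is about -6dBFS
print(check_take([0.2, 1.0, -0.4]))   # this one hits full scale
```

A peak of 0.5 (half of full scale) is almost exactly -6dBFS, which is a handy rule of thumb when reading waveform displays.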


3. Phantom Power

One last thing for today. As a professional voiceover you will almost certainly use a condenser mic, which needs phantom power switched on or it won’t work. Phantom power is activated on your desk, audio interface or preamp by a switch labelled either ‘phantom power’ or ‘+48V’. Switching it on sends 48 volts DC along both signal conductors of your XLR lead (pins 2 and 3), with the ground (pin 1) as the return path, to your condenser mic (or, if you’re using a mixing desk with one universal switch, to everything plugged into the XLR inputs on your desk).
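The 48V isn’t delivered directly: in the standard P48 arrangement described in IEC 61938, the supply feeds pins 2 and 3 through a 6.81kΩ resistor each, so the voltage that actually reaches the mic drops as the mic draws more current. This sketch assumes that arrangement; the current draws are hypothetical examples, since real mics vary.

```python
SUPPLY_V = 48.0
FEED_RESISTOR_OHMS = 6_810.0  # one per signal pin (2 and 3) in the P48 arrangement

def voltage_at_mic(current_ma):
    """Voltage left at the mic after its current flows through the two feed resistors in parallel."""
    effective_ohms = FEED_RESISTOR_OHMS / 2   # pins 2 and 3 share the current equally
    return SUPPLY_V - (current_ma / 1000) * effective_ohms

for ma in (2, 5, 10):   # hypothetical current draws in milliamps
    print(f"{ma} mA draw: about {voltage_at_mic(ma):.1f} V at the mic")
```

Because the same voltage sits on both signal pins, it cancels out at a balanced input, which is why the audio passes through unaffected.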

Condenser mics are also known as capacitor mics, and that gives a small clue as to why phantom power is necessary. A capacitor is an electrical component that stores a small amount of electrical charge, and the capsule of your mic acts as one. The capsule consists of the diaphragm and a backplate. The diaphragm carries a charge, and as you speak it vibrates, changing the gap between it and the backplate. This alters the amount of charge the ‘capacitor’ can hold, and as that fluctuates a tiny electrical signal is generated that mirrors your voice. In most studio condensers, phantom power supplies the voltage that charges the capsule and also powers the small amplifier inside the mic that boosts this very weak signal before it is sent down the cable into the rest of your set-up.
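The link between the gap and the charge can be sketched with the textbook parallel-plate approximation: capacitance C = ε0 × area ÷ gap, and the charge held at a given voltage is C × V. The capsule dimensions and polarising voltage below are hypothetical round numbers chosen for illustration, not figures for any real mic.

```python
import math

EPSILON_0 = 8.854e-12  # permittivity of free space, farads per metre

def capsule_capacitance(diameter_m, gap_m):
    """Parallel-plate approximation: C = epsilon_0 * area / gap."""
    area = math.pi * (diameter_m / 2) ** 2
    return EPSILON_0 * area / gap_m

# Hypothetical capsule: 25 mm diaphragm, 25 micrometre gap, polarised at 48 V.
DIAMETER = 0.025
GAP = 25e-6
VOLTS = 48.0

for gap in (GAP * 0.99, GAP, GAP * 1.01):   # diaphragm pushed in, at rest, pulled out
    c = capsule_capacitance(DIAMETER, gap)
    print(f"gap {gap * 1e6:.2f} um: C = {c * 1e12:.1f} pF, "
          f"charge held at {VOLTS:g} V = {c * VOLTS * 1e9:.3f} nC")
```

Halving the gap doubles the capacitance, so even microscopic movements of the diaphragm produce a measurable change in charge.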

The thing you need to remember about phantom power is that it can send a bit of a surge through your system, so when switching it on or off, make sure your speakers and headphones are muted or switched off. Modern condenser mics do have some surge protection built in, but it’s still a good idea to make sure all your mics are plugged in before you switch the phantom power on or off. If that protection fails, plugging things in or out badly can put a permanent charge across the diaphragm of any mic, whether it’s a condenser or not. And once your mic’s got a permanently charged diaphragm, it’s a write-off.


If you would like to arrange 1-2-1 audio training, or need some advice on your studio set-up, email Rob or contact him here.