A Basic Guide to Digital Audio Recording

The Digital Domain


Since the dawn of time, humans have been attempting to record music.  For the vast majority of human history, this has been really, really difficult.  Early cracks at getting music out of the hands of the musician involved mechanically triggered pianos whose instructions for what to play were imprinted onto long scrolls of paper.  These player pianos were difficult to manufacture and not really viable for casual music listening.  There was also the all-important phonograph, which recorded sound itself mechanically onto the surface of a wax cylinder.

If it sounds like the aforementioned techniques were difficult to use and manipulate, they were!  Hardly anyone owned a phonograph since they were expensive, recordings were hard to come by, and they really didn’t sound all that great.  Without microphones or any kind of amplification, bits of dust and debris which ended up on these phonograph records could completely obscure the original recording behind a wall of noise.

Humanity had a short stint with recording sound as electromagnetic impulses on magnetic tape.  This proved to be one of the best ways to reproduce sound (and do some other cool and important things too).  Tape was easy to manufacture, came in all different shapes and sizes, and offered a whole universe of flexibility for how sound could be recorded onto it.  Since tape recorded an electrical signal, carefully crafted microphones could be used to capture sounds with impeccable detail and loudspeakers could be used to play back the recorded sound at considerable volumes.  Also at play were some techniques engineers developed to reduce the amount of noise recorded onto tape, allowing the music to be front and center atop a thin floor of noise humming away in the background.  Finally, tape offered the ability to record multiple different sounds side-by-side and play them back at the same time.  These side-by-side sounds came to be known as ‘tracks’ and allowed for stereophonic sound reproduction.

Tape was not without its problems though.  Cheap tape would distort and sound poor.  Additionally, tape would deteriorate over time and fall apart, leaving many original recordings completely unlistenable.  Shining bright on the horizon in the late 1970s was digital recording.  This new format allowed for low-noise, low-cost, and long-lasting recordings.  The first pop music record to be recorded digitally was Ry Cooder’s Bop Till You Drop in 1979.  Digital had a crisp and clean sound that was rivaled only by the best of tape recording.  Digital also allowed for near-zero degradation of sound quality once something was recorded.

Fast-forward to today.  After 38 years of Moore’s law, digital recording has become cheap and simple.  Small audio recorders are available at low cost with hours and hours of storage for recording.  Also available are more hefty audio interfaces which offer studio-quality sound recording and reproduction to any home recording enthusiast.


Basic Components: What you Need

Depending on what you are trying to record, your needs may vary from the standard recording setup.  Most users interested in laying down some tracks will need the following.

Audio Interface (and Preamplifier): this component is arguably the most important as it connects everything together.  The audio interface contains both an analog-to-digital converter and a digital-to-analog converter; these allow it to both turn sound into the language of your computer for recording, and turn the language of your computer back into sound for playback.  These magical little boxes come in many shapes and sizes; I will discuss these in a later section, just be patient.

Digital Audio Workstation (DAW) Software: this software will allow your computer to communicate with the audio interface.  Depending on what operating system you have running on your computer, there may be hundreds of DAW software packages available.  DAWs vary greatly in complexity, usability, and special features; all will allow you the basic feature of recording digital audio from an audio interface.

Microphone: perhaps the most obvious element of a recording setup, the microphone is one of the most exciting choices you can make when setting up a recording rig.  Microphones, like interfaces and DAWs, come in all shapes and sizes.  Depending on what sound you are looking for, some microphones may be more useful than others.  We will delve into this momentarily.

Monitors (and Amplifier): once you have set everything up, you will need a way to hear what you are recording.  Monitors allow you to do this.  In theory, you can use any speaker or headphone as a monitor.  However, some speakers and headphones offer more faithful reproduction of sound without excessive bass and can be better for hearing the detail in your sound.


Audio Interface: the Art of Conversion

Two channel USB audio interface.

The audio interface can be one of the most intimidating elements of recording.  The interface contains the circuitry to amplify the signal from a microphone or instrument, convert that signal into digital information, and then convert that information back to an analog sound signal for listening on headphones or monitors.

Interfaces come in many shapes and sizes but all do similar work.  These days, most interfaces offer multiple channels of recording at one time and can record in uncompressed CD-audio quality or better.

Once you step into the realm of digital audio recording, you may be surprised to find a lack of mp3 files.  Turns out, mp3 is a very special kind of digital audio format and cannot be recorded to directly; mp3 can only be created from existing audio files in non-compressed formats.

You may be asking yourself, what does it mean for audio to be compressed?  As an electrical engineer, it may be hard for me to explain this in a way that humans can understand, but I will try my best.  Audio takes up a lot of space.  Your average iPhone or Android device maybe has 32 GB of space but most people can keep thousands of songs on their device.  This is done using compression.  Compression is the computer’s way of listening to a piece of music, and removing all the bits and pieces that most people won’t notice.  Soft and infrequent noises, like the sound of a guitarist’s fingers scraping a string, are removed while louder sounds, like the sound of the guitar, are left in.  This is done using the Fourier Transform and a bunch of complicated mathematical algorithms that I don’t expect anyone reading this to care about.

When audio is uncompressed, a few things are true: it takes up a lot of space, it is easy to manipulate with digital effects, and it often sounds very, very good.  Examples of uncompressed audio formats are .wav on Windows and .aif and .aiff on Macintosh; .flac, for all the free people of the Internet, shrinks the file without throwing any of the sound away (this is called lossless compression).  Uncompressed audio comes in many different forms but all have two numbers which describe their sound quality: ‘word length’ or ‘bit depth’ and ‘sample rate.’

The information for digital audio is contained in a bunch of numbers which indicate the loudness or volume of the sound at a specific time.  The sample rate tells you how many times per second the loudness value is captured.  This number needs to be at least twice the highest frequency you want to capture, otherwise the computer will perceive high frequencies as being lower than they actually are.  This is because of the Nyquist-Shannon sampling theorem which I, again, don’t expect most of you to want to read about.  Most audio is captured at 44.1 kHz, making the highest frequency it can capture 22.05 kHz, which is comfortably above the limits of human hearing.
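For the curious, here is a minimal Python sketch of what those numbers look like.  The function names are my own, made up for illustration, and the 1 kHz test tone is just an assumption so there is something to sample.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Capture n_samples loudness values of a sine tone, one per sample period."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sample rate: captures up to {nyquist_frequency(rate):.0f} Hz")

# The first few 'loudness numbers' of a 1 kHz tone captured at 44.1 kHz.
print([round(v, 3) for v in sample_sine(1_000, 44_100, 5)])
```

Each value in the printed list is one sample; at 44.1 kHz the computer stores 44,100 of them for every second of sound, per channel.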

The word length tells you how many bits are used to store each loudness value.  The number of different values for loudness can be up to 2^(word length).  CDs represent audio with a word length of 16 bits, allowing for 65536 different values for loudness.  Most audio interfaces are capable of recording audio with a 24-bit word length, allowing for exquisite detail.  There are some newer systems which allow for recording with a 32-bit word length but these are, for the most part, not available at low-cost to consumers.
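The arithmetic is easy to check for yourself.  The sketch below is only an illustration with function names of my own choosing; the dynamic range figure is not something discussed above, it is the usual textbook ratio between the largest and smallest loudness step, and real converters fall short of it.

```python
import math

def loudness_levels(word_length_bits):
    """Number of distinct loudness values: 2^(word length)."""
    return 2 ** word_length_bits

def dynamic_range_db(word_length_bits):
    """Theoretical ratio of largest to smallest step, in decibels."""
    return 20 * math.log10(loudness_levels(word_length_bits))

for bits in (16, 24, 32):
    print(f"{bits}-bit: {loudness_levels(bits):,} levels, "
          f"about {dynamic_range_db(bits):.0f} dB of theoretical range")
```

Every extra bit doubles the number of levels, which is why the jump from 16 to 24 bits is so much bigger than it sounds.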

I would like to add a quick word about USB.  There is a stigma, in the business, against USB audio interfaces.  Many interfaces employ connectors with higher bandwidth, like FireWire and Thunderbolt, and charge a premium for it.  It may seem logical: faster connection, better-quality audio.  Hear this now: no audio interface will ever be sold which has a connector that is too slow for the quality audio it can record.  This is to say, USB can handle 24-bit audio with a 96 kHz sample rate, no problem.  If you notice latency in your system, it is from the digital-to-analog and analog-to-digital converters as well as the speed of your computer; latency in your recording setup has nothing to do with what connector your interface uses.  It may seem like I am beating a dead horse here, but many people think this and it’s completely false.
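If you want to see the numbers behind that claim, here is a rough back-of-the-envelope sketch.  It assumes a USB 2.0 High Speed connection with its nominal 480 Mb/s signalling rate (my assumption, since I haven’t named a USB version above); real-world throughput is lower than the nominal figure, but the gap is still enormous.

```python
# Nominal USB 2.0 High Speed signalling rate (assumption; usable throughput is lower).
USB2_HIGH_SPEED_BPS = 480_000_000

def stream_bits_per_second(sample_rate_hz, bit_depth, channels):
    """Raw data rate of uncompressed audio, ignoring protocol overhead."""
    return sample_rate_hz * bit_depth * channels

for channels in (2, 8):
    rate = stream_bits_per_second(96_000, 24, channels)
    share = 100 * rate / USB2_HIGH_SPEED_BPS
    print(f"{channels} channels of 24-bit / 96 kHz: "
          f"{rate / 1e6:.3f} Mb/s ({share:.2f}% of nominal USB 2.0)")
```

Even eight simultaneous channels of 24-bit, 96 kHz audio use only a few percent of what the connector can nominally carry.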

One last thing before we move on to the DAW: I mentioned earlier that frequencies above half the recording sample rate will be perceived, by your computer, as lower frequencies.  These lower frequencies can show up in your recording and can cause distortion.  This phenomenon has a name and it’s called aliasing.  Aliasing doesn’t just happen with audible frequencies, it can happen with ultrasonic sound too.  For this reason, it is often advantageous to record at higher sample rates to avoid having these higher frequencies perceived within the audible range.  Most audio interfaces allow for recording 24-bit audio with a 96 kHz sample rate.  Unless you’re worried about taking up too much space, this format sounds excellent and offers the most flexibility and sonic detail.
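Here is a small sketch of where an aliased frequency ends up.  The function name and the 30 kHz example tone are my own assumptions for illustration, and the sketch ignores the anti-aliasing filters that real interfaces place in front of their converters.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency the computer 'hears' when freq_hz is sampled at sample_rate_hz."""
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

# A 30 kHz ultrasonic tone, well above human hearing.
print(alias_frequency(30_000, 44_100))  # folds down into the audible range
print(alias_frequency(30_000, 96_000))  # below half the sample rate, so it stays put
```

At 44.1 kHz the inaudible 30 kHz tone lands at 14.1 kHz, squarely where you can hear it; at 96 kHz it is captured as what it is.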


Digital Audio Workstation: all Out on the Table

Apple's pro DAW software: Logic Pro X

The digital audio workstation, or DAW for short, is perhaps the most flexible element of your home-studio.  There are many many many DAW software packages out there, ranging in price and features.  For those of you looking to just get into audio recording, Audacity is a great DAW to start with.  This software is free and simple.  It offers many built-in effects and can handle the full recording capability of any audio interface.  That is to say, if you record something well on this simple and free software, it will sound mighty good.

Here’s the catch with many free or lower-level DAWs like Audacity or Apple’s GarageBand: they do not allow for non-destructive editing of your audio.  This is a fancy way of saying that once you make a change to your recorded audio, you might not be able to un-make it.  Higher-end DAWs like Logic Pro and Pro Tools will allow you to make all the changes you want without permanently altering your audio.  This allows you to play around a lot more with your sound after it’s recorded.  More expensive DAWs also tend to come with a better-sounding set of built-in effects.  This is most noticeable with more subtle effects like reverb.

There are so many DAWs out there that it is hard to pick out a best one.  Personally, I like Logic Pro, but that’s just preference; many of the effects I use are compatible with different DAWs so I suppose I’m mostly just used to the user-interface.  My recommendation is to shop around until something catches your eye.


The Microphone: the Perfect Listener

Studio condenser and ribbon microphones.

The microphone, for many people, is the most fun part of recording!  They come in many shapes and sizes and color your sound more than any other component in your setup.  Two different microphones can occupy polar opposites in the sonic spectrum.

There are two common types of microphones out there: condenser and dynamic microphones.  I can get carried away with physics sometimes so I will try not to write too much about this particular topic.

Condenser microphones are a more recent invention and offer the best sound quality of any microphone.  They employ a charged parallel plate capacitor to measure vibrations in the air.  This is a fancy way of saying that the element in the microphone which ‘hears’ the sound is extremely light and can move freely even when motivated by extremely quiet sounds.

Because of the nature of their design, condenser microphones require a small amplifier circuit built into the microphone.  Most new condenser microphones use a transistor-based circuit in their internal amplifier but older condenser mics employed internal vacuum-tube amplifiers; these tube microphones are among the clearest and most detailed sounding microphones ever made.

Dynamic microphones, like condenser microphones, also come in two varieties, both emerging from different eras.  The ribbon microphone is the earlier of the two and observes sound with a thin metal ribbon suspended in a magnetic field.  These ribbon microphones are fragile but offer a warm yet detailed quality-of-sound.

The more common vibrating-coil dynamic microphone is the most durable and is used most often for live performance.  The prevalence of the vibrating-coil microphone means that the vibrating-coil is often dropped from the name (sometimes the dynamic is dropped too); when you use the term dynamic mic, most people will assume you are referring to the vibrating-coil microphone.

With the wonders of globalization, all microphones can be purchased at similar costs.  Though there is usually a small premium to purchase condenser microphones over dynamic mics, costs can remain comfortably around $100-150 for studio-quality recording mics.  This means you can use many brushes to paint your sonic picture.  Oftentimes, dynamic microphones are used for louder instruments like snare and bass drums, guitar amplifiers, and louder vocalists.  Condenser microphones are more often used for detailed sounds like stringed instruments, cymbals, and breathier vocals.

Monitors: can You Hear It?

Studio monitors at Electrical Audio Studios, Chicago

When recording, it is important to be able to hear the sound that your system is hearing.  Most people don’t think about it, but there are many kinds of monitors out there, from the screens on our phones and computers, which allow us to see what the computer is doing, to the viewfinder on a camera, which allows us to see what the camera sees.  Sound monitors are just as important.

Good monitors will reproduce sound as neutrally as possible and will only distort at very very high volumes.  These two characteristics are important for monitoring as you record, and hearing things carefully as you mix.  Mix?

Once you have recorded your sound, you may want to change it in your DAW.  Unfortunately, the computer can’t always guess what you want your effects to sound like, so you’ll need to make changes to settings and listen.  This could be as simple as changing the volume of one recorded track or it could be as complicated as correcting an offset in phase of two recorded tracks.  The art of changing the sound of your recorded tracks is called mixing.

If you are using speakers as monitors, make sure they don’t have ridiculously loud bass, like most speakers do.  Mixing should be done without the extra bass; otherwise, someone playing back your track on ‘normal’ speakers will be underwhelmed by a thinner sound.  Sonically neutral speakers make it very easy to hear what your finished product will sound like on any system.

It’s a bit harder to do this with headphones as their proximity to your ears makes the bass more intense.  I personally like mixing on headphones because the closeness to my ear allows me to hear detail better.  If you are to mix with headphones, your headphones must have open-back speakers in them.  This means that there is no plastic shell around the back of the headphone.  With no set volume of air behind the speaker, open-back headphones can effortlessly reproduce detail, even at lower volumes.

Closed-back vs. open-back headphones.

Monitors aren’t just necessary for mixing; they also help you hear what you’re recording as you record it.  Remember when I was talking about the number of different loudnesses you can have for 16-bit and 24-bit audio?  Well, when you make a sound louder than the loudest volume you can record, you get digital distortion.  Digital distortion does not sound like Jimi Hendrix, it does not sound like Metallica, it sounds abrasive and harsh.  Digital distortion, unless you are creating some post-modern masterpiece, should be avoided at all costs.  Monitors, as well as the volume meters in your DAW, allow you to avoid this.  A good rule of thumb is: if it sounds like it’s distorting, it’s distorting.  Sometimes you won’t hear the distortion in your monitors; this is where the little loudness bars on your DAW software come in.  Those bad boys should never hit the top.
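To make digital distortion a little less mysterious, here is a minimal sketch of what happens to the numbers.  Everything in it is an illustration of my own: the function names, the 440 Hz test tone, and the idea of expressing input level as a multiple of full scale are all assumptions, and a real DAW meter is more sophisticated than a simple count.

```python
import math

MAX_16_BIT = 32767    # loudest positive value a 16-bit sample can hold
MIN_16_BIT = -32768   # loudest negative value

def record_16_bit(level, freq_hz=440, sample_rate_hz=44_100, n_samples=441):
    """'Record' a sine tone; level 1.0 just reaches full scale."""
    samples = []
    for n in range(n_samples):
        value = round(level * MAX_16_BIT
                      * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        # Anything louder than the loudest storable value gets flattened.
        samples.append(max(MIN_16_BIT, min(MAX_16_BIT, value)))
    return samples

def clipped_count(samples):
    """How many samples are pinned at the very top or bottom."""
    return sum(1 for s in samples if s >= MAX_16_BIT or s <= MIN_16_BIT)

safe = record_16_bit(0.5)  # comfortably below full scale
hot = record_16_bit(1.5)   # 50% too loud
print("clipped samples at half level:", clipped_count(safe))
print("clipped samples when too hot:", clipped_count(hot))
```

The flattened tops of the too-hot recording are what you hear as that abrasive, harsh sound, and they are what the loudness bars hitting the top are warning you about.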


A Quick Word about Formats before we Finish

These days, most music ends up as an mp3.  Convenience is important so mp3 does have its place.  Most higher-end DAWs will allow you to make mp3 files upon export.  My advice to any of you learning sound engineers out there is to just play around with formatting.  However, a basic outline of some common formats may be useful…

24-bit, 96 kHz: This is the best format most systems can record to.  Because of large file sizes, audio in this format rarely leaves the DAW.  Audio of this quality is best for editing, mixing, and converting to analog formats like tape or vinyl.

16-bit, 44.1 kHz: This is the format used for CDs.  This format holds roughly a third of the data that you can record on most systems at 24-bit, 96 kHz, but it is optimized for playback by CD players and other similar devices.  Its file-size also allows for about 80 minutes of audio to fit on a typical CD.  Herein lies the balance between excellent sound quality, and file-size.
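If you want to see how these two formats compare on disk, here is a quick sketch.  The function name is my own and stereo (two channels) is an assumption; the figures are raw sample data only, with no file header counted.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw uncompressed audio, ignoring any file header."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds

cd_minute = pcm_bytes(44_100, 16, 2, 60)
studio_minute = pcm_bytes(96_000, 24, 2, 60)
cd_full_disc = pcm_bytes(44_100, 16, 2, 80 * 60)

print(f"16-bit / 44.1 kHz stereo: {cd_minute / 1e6:.1f} MB per minute")
print(f"24-bit / 96 kHz stereo:   {studio_minute / 1e6:.1f} MB per minute")
print(f"80 minutes at CD quality: {cd_full_disc / 1e6:.1f} MB")
```

The 80-minute figure comes out larger than the familiar 700 MB printed on blank discs; that label describes data storage, which spends more of the disc on error correction than audio does.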

mp3, 256 kb/s: Looks a bit different, right?  The quality of mp3 is measured in kb/s.  The higher this number, the less compressed the file is and the more space it will occupy.  The iTunes Store sells music at 256 kb/s (in AAC, a close cousin of mp3), and Spotify probably uses something closer to 128 kb/s to better support streaming.  You can go as high as 320 kb/s with mp3.  Either way, mp3 compression is always lossy so you will never get an mp3 to sound quite as good as an uncompressed audio file.
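To put kb/s on the same footing as bit depth and sample rate, here is a rough sketch.  The function names and the four-minute song are assumptions of mine, and I am treating 1 kb as 1000 bits.

```python
def uncompressed_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of raw audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bit_depth * channels / 1000

def file_size_mb(bitrate_kbps, seconds):
    """Approximate file size for a constant bitrate, in MB."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

cd_kbps = uncompressed_kbps(44_100, 16, 2)
song_seconds = 4 * 60  # a hypothetical four-minute song

print(f"CD audio: {cd_kbps:.1f} kb/s, {file_size_mb(cd_kbps, song_seconds):.1f} MB")
for mp3_kbps in (128, 256, 320):
    print(f"mp3 at {mp3_kbps} kb/s: {file_size_mb(mp3_kbps, song_seconds):.2f} MB, "
          f"about {cd_kbps / mp3_kbps:.1f}x smaller than CD audio")
```

That size difference is the whole appeal of mp3, and the information thrown away to get there is the price.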


In Conclusion

Recording audio is one of the most fun hobbies one can adopt.  Like all new things, recording can be difficult when you first start out but will become more and more fulfilling over time.  One can create their own orchestras at home now; a feat which would have been near impossible 20 years ago.  The world has many amazing sounds and it is up to people messing around with microphones in bedrooms and closets to create more.