Digital Audio Cookbook

The first chapter is about how audio is handled. Just playing around here. See how the WordPress editor works, how things look. And what plugins work for this.

Stereo Samples

Up to this point, we’ve been looking at functions that take a single input sample and produce a single output sample. You know, something like this:

float Process(float inputSample) {
  float outputSample = 2.0f * inputSample;
  return outputSample;
}

But, let’s say we want to create a mixer that takes two input samples, adds them together, and produces a single output sample. That’s not too bad. We could just extend our function’s argument list to include two input samples instead of one. Something like this:

float MixerProcess(float input1, float input2) {
  float output = input1 + input2;
  return output;
}

That wasn’t too bad. Of course, it means we will have different process functions for processes that take one input versus those that take two inputs, but we solved the issue for now.

But, what if we want to create a splitter process? A splitter takes one input sample and creates two output samples. But, first, let’s make it a little more interesting. Let’s say we want to make a phase splitter process which will have two outputs: the first output will be a copy of the input, and the second output will be the inverse phase of the input (the negative value of the input). To summarize, we will have the following outputs:

output 1 = input
output 2 = -input

That’s going to be really easy to code up. But how do we create two outputs? The first thing that comes to mind is to just create two functions:

float PhaseProcessLeft(float input) {
  return input;
}

float PhaseProcessRight(float input) {
  return -input;
}

Well, that’s one way to do it. But now we have two function calls each time we want to create a phase splitter. Plus, what if the left output needs to know something about the right output before computing? Having two separate functions, one for left and one for right, implies that the two functions are unrelated. I mean, in this example they are, but what about another type of effect? It sounds like it could lead us into limitations and trouble down the line. Let’s look for a better way.

Let’s create a struct that contains samples for two channels, left and right:

struct StereoSample {
  float left;
  float right;
};

Now we can create a function that returns a StereoSample type:

StereoSample PhaseSplitterProcess(float input) {
  StereoSample output;
  output.left = input;
  output.right = -input;
  return output;
}

Great! Now we can return a StereoSample type, which allows us to return more than one channel from the function call.

We can extend the idea to allow for the input sample to be a StereoSample type as well. Take this example:

StereoSample SwapChannels(StereoSample input) {
  StereoSample output;
  output.left = input.right;
  output.right = input.left;
  return output;
}

Here we just want to swap the channels, i.e. feed the right input channel to the left output channel and feed the left input channel to the right output channel. It’s really simple here, because the code reads exactly like what is being done.
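The same struct works in the other direction too. As a minimal sketch of a stereo-in, mono-out process (the name MixToMono and the 0.5 scaling are assumptions made for illustration, not something defined above), we could fold the two channels into one:

```cpp
// Two-channel sample, as defined earlier in this chapter.
struct StereoSample {
  float left;
  float right;
};

// Hypothetical example: fold a stereo sample down to one mono sample.
// Averaging (rather than plain summing) keeps the result within the
// range of the inputs.
float MixToMono(StereoSample input) {
  return 0.5f * (input.left + input.right);
}
```

Notice that the argument is a StereoSample and the return value is a plain float, so the shape of the function changes yet again.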

The problem with this, though, is that the function changes depending on how many channels we want to process. Look at the following prototypes we created so far:

float StereoIn_MonoOut_Process(StereoSample input);
StereoSample MonoIn_StereoOut_Process(float input);
StereoSample StereoIn_StereoOut_Process(StereoSample input);

This isn’t a problem inherently. But when we try to make a single base class handle multiple types of effects, it will make a lot more sense if we can have one standardized process function call type which can be overridden by whatever type of effect we dream up. That way, if we want to make a stereo effect with stereo inputs and stereo outputs, or a panner effect which takes a mono input and makes stereo outputs, or even an oscillator or synthesizer that takes zero (yes zero) inputs and produces stereo outputs, we can describe all of these with one function. A function like:

AudioChannels Process(AudioChannels input);

Now before we go and create an AudioChannels class and see where that takes us, let’s first look at how this is typically done in C and in efficient C++ implementations:

void Process(float **inputs, float **outputs);

If you are not afraid of pointers, you should be…and for double pointers, doubly so. Seriously though, what’s going on here? The first thing you probably noticed is that now the process function doesn’t return anything! The idea is that the process function takes, as arguments, pointers to arrays of float values: one for the inputs and one for the outputs. Instead of returning a result, the function writes its output samples into the memory that the outputs pointer refers to.
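To make that prototype a little more concrete, here is a rough sketch of the phase splitter written in this style. The numSamples argument and the one-array-per-channel layout are assumptions; the prototype above doesn’t say how the block length or channel count is communicated.

```cpp
// Sketch of a block-based, multi-channel process function.
// inputs[c] and outputs[c] each point to an array of numSamples floats
// holding the samples for channel c (assumed layout).
void PhaseSplitterProcess(float **inputs, float **outputs, int numSamples) {
  const float *in = inputs[0];   // the single input channel
  float *outLeft = outputs[0];   // first output channel
  float *outRight = outputs[1];  // second output channel
  for (int i = 0; i < numSamples; i++) {
    outLeft[i] = in[i];    // copy of the input
    outRight[i] = -in[i];  // inverted copy of the input
  }
}
```

The caller owns all of the arrays: it builds one small array of channel pointers for the inputs, another for the outputs, and passes both in. Nothing is returned because the results are already sitting in the caller’s output arrays.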

Before we look at what a float** is, let’s first look at what a float* is. A float* is a pointer to a float number. It answers the question “at what address is a float sitting in memory?” It’s a way of describing where to find a float. So, for example:

void Process(float *in, float *out) {
  float inputSample, outputSample;
  inputSample = *in;
  outputSample = 2.0f * inputSample;
  *out = outputSample;
}

float input;
float output;

input = 1.0f;
Process(&input, &output);

But what if we want to return more than just a single output sample? Suppose we want to create

x^2 + 4x + 3

Lather, Rinse, Repeat

Here’s a better model. Not quite exactly what really happens, but a good next model to look at on our way forward to understanding digital audio.

1. Trigger the ADC to begin converting the input signal into an input sample.
2. Wait for the ADC to complete its conversion.
3. Get the input sample from the ADC.
4. Process the input sample to produce an output sample.
5. Wait for the DAC to be ready to accept a new output sample.
6. Pass the output sample to the DAC.
7. Trigger the DAC to begin converting the output sample to an output signal.
8. If we wish to continue processing, go to step 1.

This makes perfect logical sense, at least regarding the order in which things seem to take place. Let’s take a look at this process in more detail and see if we can find issues with it or any improvements we can make to it.

Let’s first take a viewpoint where the ADC, the Process, and the DAC are each separate entities (which in actual fact, they are) and let’s see what time slots they occupy in our simple procedure.

The main issue here is that the computer hardware is spending most of its time waiting for the conversions to complete. While the ADC is converting the next input sample and the DAC is simultaneously converting the previous sample to the output, the CPU is just in a loop in step 2, waiting for them to complete. Worse still, if the ADC completes its conversion in Ts seconds and the process takes Tp seconds, we can’t issue an output sample until Ts + Tp seconds later, which means the sampling rate is reduced by the time it takes for the process to run and finish. Fortunately there is a better way.

The idea is to have the process working on the previous sample, while the ADC is converting the next sample. In this way, as long as the process takes less than Ts seconds of time to complete, the process is happening while the conversions are happening. And that’s a big deal. Let’s take a look now at what the procedure would look like:

1. Wait until both the ADC conversion is complete and the DAC is ready to accept a new output sample.
2. Put the next output sample to the DAC.
3. Grab the input sample from the ADC.
4. Immediately trigger the ADC and the DAC to both begin converting.
5. Process the input sample to create a new output sample.
6. If you wish to continue processing, go to step 1.

Now we have a different approach. Let’s take a look at the time scheduling for this approach:

Now we see that more things are happening concurrently, i.e. at the same time. The ADC begins converting the next input sample, the DAC begins converting the previous output sample, and the Process begins computing the current output sample from the current input sample, all at nearly the same time. The Process finishes (hopefully) at or before the time when the next ADC/DAC sample is ready, and it passes the new information to the DAC, grabs the next input sample from the ADC, and the process continues in this way until we decide to stop.

This system is a lot better than the previous system because now the processor is being used more efficiently. It means that for a given sampling period, you can get more processing out of the CPU. It also means you can use a less powerful CPU to get the same job done, because now it’s not spending a good portion of its time (and power consumption) spinning its wheels waiting in a loop.

There is one drawback though (although, as we shall see, not really). You may have noticed that now we are juggling three samples in our description here: the previous sample, the current sample, and the next sample. At any given time, the ADC is converting the next sample, the process is working on the current sample, and the DAC is converting the previous sample. Let’s number each audio sample in the order that they occur and see how they are scheduled in time with the ADC, Process, and DAC:

If you look carefully at each vertical stack (which represents a particular moment in time), the output sample is always 2 samples behind the input sample. It means that when you design your system in this way, you introduce latency.
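We can check that 2-sample figure with a small simulation of the overlapped procedure. This is only a model: there is no real ADC or DAC here, the function name SimulatePipeline is made up for this sketch, and the "hardware" is just a couple of variables that hold a value from one sampling period to the next.

```cpp
#include <vector>

// Stand-in for the real audio algorithm (here: a gain of 2).
float Process(float inputSample) {
  return 2.0f * inputSample;
}

// Simulates the overlapped procedure one sampling period at a time.
// analogIn[n] is the value present at the ADC input during period n.
// The returned vector holds the value the DAC is converting in each period.
std::vector<float> SimulatePipeline(const std::vector<float> &analogIn) {
  std::vector<float> dacOut;
  float adcResult = 0.0f;      // result of the conversion started last period
  float pendingOutput = 0.0f;  // output sample computed last period

  for (float x : analogIn) {
    float toDac = pendingOutput;    // put the next output sample to the DAC
    float inputSample = adcResult;  // grab the input sample from the ADC
    adcResult = x;                  // ADC starts converting (ready next period)
    dacOut.push_back(toDac);        // DAC starts converting
    pendingOutput = Process(inputSample);  // process while both convert
  }
  return dacOut;
}
```

Feeding in the samples 1, 2, 3, 4, 5 produces the DAC sequence 0, 0, 2, 4, 6. The processed version of the first input sample (2) shows up in the third period, two samples after it arrived at the ADC.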

Latency

Latency is the number of samples that the output is behind the input. If you put an input into an audio system, the resulting output does not emerge instantly. It emerges a small amount of time after the arrival of the input. This small amount of time is a delay, and it is called the latency of the system.
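Since latency gets talked about both as a count of samples and as an amount of time, it helps to be able to convert between the two. A minimal sketch (the function name is made up for this example):

```cpp
// Converts a latency measured in samples to milliseconds.
float LatencyMilliseconds(int latencySamples, float samplingRateHz) {
  return 1000.0f * (float)latencySamples / samplingRateHz;
}
```

Assuming a 48 kHz sampling rate, a latency of 2 samples works out to roughly 0.042 ms, while a latency of 48 samples is exactly 1 ms.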

Let’s first look at what our software is really aiming to accomplish in the simplest terms. We want to process audio. Audio is a stream of samples, numbers, arriving one at a time in sequence. Each number gets processed into a resulting number, called the output. And, the output is then sent out one at a time to get converted into an output audio signal. It works like this:

1. Wait for an input sample to arrive.
2. Process the input sample into an output sample.
3. Send the output sample to the output.
4. Go to step 1.

Our software implementation of this might look something like the following:

do {
  while (!inputSampleArrived); // wait for input sample to arrive
  float inputSample = getInputSample(); // get the input sample
  float outputSample = Process(inputSample); // process the input sample into an output sample
  while (!ableToReceiveOutputSample); // wait for output to be ready to receive output sample
  putOutputSample(outputSample); // send output sample to the output
} while (1); // repeat forever!

Now this is a very simple implementation, and a lot has been glossed over. It should be generating a lot more questions than it answers. But, it’s a start.

In this simple system, where we don’t have any control of how the input samples arrive, or how the output samples get shipped out, we have a very simple function that’s just the process function. For now, let’s assume we are doing mono (single channel) audio and our process function will look something like this:

float Process(float input) {
  float output;
  output = 2.0f * input;
  return output;
}

OK. That’s a pretty simple process. When an input

Polling

In its simplest form, polling for audio means to check if there is audio to process, and then, if there is, to go and process it:


if (thereIsAudioToProcess) {
  Process(theAudio);
}

But if only life were that simple.

Don’t despair. It really kind of is. You just need to add a few more things to this if you need more complexity. Life really is only as complicated as you make it. The purpose here, though, is to explore how things can get complicated. So. If you’re into that, keep reading.

Before we get complicated, however, let’s explore what the above snippet really means. First, let’s expand on how we detect if there is audio to process. Let’s start by defining what thereIsAudioToProcess actually represents.

bool thereIsAudioToProcess = false;

if (thereIsAudioToProcess) {
  Process(theAudio);
}

Ok. That’s a little better. Now we’ve made thereIsAudioToProcess into a boolean type, and it is initialized to false.

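Of course, now the audio will never be processed. Something from the outside world needs to tell you that audio data has been received and is waiting for you to process it. On an embedded system, that something is usually an interrupt handler. Note that the flag is now declared volatile, because it gets changed outside of the normal flow of the program: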

volatile bool thereIsAudioToProcess = false;

#define BUFFERSIZE 32
float audioPortBuffer[BUFFERSIZE];
float theAudio[BUFFERSIZE];

void AudioInterruptHandler(void) {
  // swap the contents of audioPortBuffer and theAudio
  float temp;
  for (unsigned int i=0; i<BUFFERSIZE; i++) {
    temp = theAudio[i];
    theAudio[i] = audioPortBuffer[i];
    audioPortBuffer[i] = temp;
  }
  thereIsAudioToProcess = true;
}

int main(void) {
  while (1) { // keep polling for new audio
    if (thereIsAudioToProcess) {
      Process(theAudio);
      thereIsAudioToProcess = false;
    }
  }
}

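The interrupt handler runs whenever the audio port has a fresh buffer of samples. It swaps the contents of audioPortBuffer and theAudio, and then sets the flag. The main code polls the flag, processes theAudio when the flag is set, and then clears the flag so the same buffer doesn’t get processed twice.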

There is an improvement we can make on the performance of that Audio Interrupt Handler. The audio interrupt swaps the contents of the two buffers by copying each sample, one by one, and this takes up valuable time. There is a better way, and we’ll see that now.

Now, if you are scared of pointers, you probably should be. But, take a deep breath, and face the beast. It’s not so bad. We’re going to make a simple use of pointers to help speed up our buffer swap. The main idea is that we will use pointers to point to the two buffers, and then swap the pointers instead of the whole contents of each buffer. This will be a huge speed up.

First, let’s rename the actual buffers something bland:

float audioBuffer1[BUFFERSIZE];
float audioBuffer2[BUFFERSIZE];

and then declare two pointers, and point them to the above buffers:

float *audioPortBuffer = &audioBuffer1[0];
float *theAudio = &audioBuffer2[0];

What did we just do here? Well, first we declared two pointers to floats, one for the audio port buffer and another for the buffer containing the audio to process. Then, to initialize them, we set each of those pointers to point to one of the two audio buffers we created above.

Now, when we want to read data from the audio port buffer, we can access the first pointer just like an array. Same goes for the audio to process. In fact, we could keep our previous swap, which copies the data, exactly as it was:

  for (unsigned int i=0; i<BUFFERSIZE; i++) {
    temp = theAudio[i];
    theAudio[i] = audioPortBuffer[i];
    audioPortBuffer[i] = temp;
  }

Nothing changed. Works just as before. Using brackets with a pointer is just like offsetting into an array. So all is still good. But, instead of copying the data, we can do this:

  float *temp = theAudio;
  theAudio = audioPortBuffer;
  audioPortBuffer = temp;

Whoa! What happened to the for loop? Yep. It’s gone. We can swap the pointers, and now that’s the same as moving the data…well, um, without actually moving the data. Now anything that references theAudio will actually be referencing what used to be audioPortBuffer, and anything that references audioPortBuffer now references what used to be theAudio, and nothing actually moved. We basically saved BUFFERSIZE - 1 swap operations which can now be used for something more useful in the future…like maybe some audio processing!

Finally, our interrupt handler looks something like this:

volatile bool thereIsAudioToProcess = false;

#define BUFFERSIZE 32
float audioBuffer1[BUFFERSIZE];
float audioBuffer2[BUFFERSIZE];
float *audioPortBuffer = &audioBuffer1[0];
float *theAudio = &audioBuffer2[0];

void AudioInterruptHandler(void) {
  // swap the audioPortBuffer and theAudio pointers!
  float *temp;
  temp = theAudio;
  theAudio = audioPortBuffer;
  audioPortBuffer = temp;
  thereIsAudioToProcess = true;
}

void Process(float *signal) {
  for (unsigned int i=0; i<BUFFERSIZE; i++) {
    signal[i] *= 2.0;
  }
}

int main(void) {
  while (1) { // keep polling for new audio
    if (thereIsAudioToProcess) {
      Process(theAudio);
      thereIsAudioToProcess = false;
    }
  }
}
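If you want to convince yourself that swapping the pointers really leaves the sample data where it is, here is a small stand-alone check. It uses a tiny buffer size and pre-filled buffers purely for the demonstration, and the helper name SwapBuffers is made up for this sketch.

```cpp
#define BUFFERSIZE 4  // small size, just for this demonstration

float audioBuffer1[BUFFERSIZE] = {1.0f, 1.0f, 1.0f, 1.0f};
float audioBuffer2[BUFFERSIZE] = {2.0f, 2.0f, 2.0f, 2.0f};
float *audioPortBuffer = audioBuffer1;
float *theAudio = audioBuffer2;

// Exchanges the two pointers; no sample data is copied.
void SwapBuffers(void) {
  float *temp = theAudio;
  theAudio = audioPortBuffer;
  audioPortBuffer = temp;
}
```

After one call to SwapBuffers, theAudio[0] reads 1.0 and audioPortBuffer[0] reads 2.0, yet audioBuffer1 and audioBuffer2 still hold exactly what they held before. Only the two pointers changed.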

How Digital Audio Flows In and Out of Your System

Audio flows into and out of your system by a serial port, usually I2S or TDM. We are going to describe this process in detail here.

Audio signals are converted into digital form by an Analog-to-Digital Converter, also known as an ADC. The ADC receives an electrical pulse telling it to begin a conversion. After a short time, called the sampling period, the analog signal which is at the input pin of the ADC is converted into a number which we call the input sample.

Read the previously converted number from the ADC. We call this number the “input sample”. If this is the first time reading the ADC, you might want to set the input sample to zero instead.
Process the input sample with an algorithm or function to produce an output number. We call this output number the “output sample”.
Check to see if the DAC is finished converting the previous sample and is now ready to receive another sample to convert. If not, wait until it is.
Copy the output sample to the DAC.
Trigger the DAC to use the output sample and begin to convert it into an output voltage.
Trigger the ADC to begin converting the input voltage into another input sample.

Changing Single Parameter Causes Many State Changes

A classic example of this is when you set the cutoff frequency of a filter, and this requires updating 5 or more filter coefficients. The danger is that a process call will run when only some of the coefficients have been changed, and a corrupted block of audio will result.

The solution is to have the DSP states all updated atomically during an update that is synchronous to the process call. This is often called “safeloading”. It means you also need two copies of the DSP states, one that is being updated by the parameter update, and the other that is being used by the process call. Only when the

This can also be solved using thread safe atomics.
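Here is one way the atomics approach might look, as a sketch rather than a definitive implementation. It assumes exactly one control context writing updates and one audio context consuming them, and every name in it (Coefficients, RequestUpdate, ApplyPendingUpdate) is invented for this example.

```cpp
#include <atomic>

// Hypothetical coefficient set for one second-order filter section.
struct Coefficients {
  float b0, b1, b2, a1, a2;
};

Coefficients activeCoeffs  = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};  // used by the process call
Coefficients pendingCoeffs = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};  // written by the parameter update
std::atomic<bool> updatePending{false};

// Called from the control (non-audio) context.
// Returns false if the previous update has not been picked up yet,
// in which case the caller can simply try again later.
bool RequestUpdate(const Coefficients &newCoeffs) {
  if (updatePending.load(std::memory_order_acquire)) {
    return false;
  }
  pendingCoeffs = newCoeffs;  // safe: the audio side only reads while the flag is set
  updatePending.store(true, std::memory_order_release);
  return true;
}

// Called at the start of each process call, in the audio context.
void ApplyPendingUpdate() {
  if (updatePending.load(std::memory_order_acquire)) {
    activeCoeffs = pendingCoeffs;  // the whole set changes between process calls
    updatePending.store(false, std::memory_order_release);
  }
}
```

The flag acts as a hand-off token. The control side only touches pendingCoeffs while the flag is clear, and the audio side only touches it while the flag is set, so neither ever sees a half-written set of coefficients.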

Abrupt State Changes

Audible clicks, pops, and other artifacts, including instabilities, can occur when DSP state variables change abruptly. The simplest example of this is when a gain coefficient changes. If it changes too abruptly between process calls, an audible click is heard.

The solution is to design the DSP process so that it has two copies of the DSP state. First, the target state that the process wishes to arrive at. Second, the current state that the DSP is using. And finally, the DSP process needs a method to move the current state to the target state in a way that is stable, safe, and doesn’t cause unwanted artifacts.
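For the gain example, a minimal sketch of this idea could look like the following. The one-pole smoothing used to move the current gain toward the target is just one common choice, and the smoothing value is an arbitrary assumption, not a recommendation.

```cpp
// Sketch of a gain stage with a target state and a current state.
struct SmoothedGain {
  float targetGain = 1.0f;   // where the user asked the gain to go
  float currentGain = 1.0f;  // what the process is using right now
  float smoothing = 0.001f;  // fraction of the remaining distance covered per sample (assumed value)

  float Process(float inputSample) {
    // Move the current state a small step toward the target state.
    currentGain += smoothing * (targetGain - currentGain);
    return currentGain * inputSample;
  }
};
```

Setting targetGain to 0 no longer slams the output to silence in a single sample. The first output after the change is still 0.999 times the input, and the gain then glides down toward zero over the following few thousand samples.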

Splitting Updates Into Asynchronous and Synchronous Parts

An update takes a set of user-supplied settings and creates from them a set of DSP states that a DSP process can use to produce an audio effect. Ultimately, this ends in a change to the DSP states, which, to provide “safeloading”, should happen inside the DSP process. However, leading up to this final change, there may be a large amount of computation that should not be done in the processing loop. For this reason, an update has to have a synchronous part (the part that the DSP process function performs) and an asynchronous part (the part that is performed outside of the DSP process).

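isr() {
  process() {
    if (pending_update) {
      // copy new coefficients to current coefficients
      pending_update = false; // the update has been consumed
    }
    filter(coeffs);
  }
}

main() {
  if (user_makes_a_change) {
    pending_update = false; // keep the isr from loading a half-written set
    // update the new coefficients
    pending_update = true; // the new set is complete and ready to load
  }
}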

Updates Per Sample, Per Block, Per Other

Updating DSP states from the given parameters

Inline Processing Vs. Separate Buffer Pitfalls

class AudioFilter {
public:
  // user settings...
  int type;
  float level_dB;
  float frequency;
  float q;
  
  // public functions...
  float Process(float inputSample);
  void Update(void);
  void Reset(void);
  AudioFilter(); // (constructor)
  
private:
  // dsp states...
  float b0;
  float b1;
  float b2;
  float a1;
  float a2;
  float x1;
  float x2;
  float y1;
  float y2;
};

float AudioFilter::Process(float inputSample) {
  float outputSample;
  outputSample = b0*inputSample + b1*x1 + b2*x2 + a1*y1 + a2*y2;
  y2 = y1;
  y1 = outputSample;
  x2 = x1;
  x1 = inputSample;
  return outputSample;
}

void AudioFilter::Reset(void) {
  x1 = 0.0f;
  x2 = 0.0f;
  y1 = 0.0f;
  y2 = 0.0f;
}

void AudioFilter::Update(void) {
  switch
}

AudioFilter::AudioFilter() {
  type = LOWPASS;
  level_dB = 0.0f;
  frequency = 1000.0f;
  q = 0.7071f;
  Update();
  Reset();
}
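The Update function above is left unfinished, and this is not an attempt to finish it. Purely as an illustration of the kind of work an update does, here is a sketch of how low-pass coefficients could be computed using the widely used Audio EQ Cookbook formulas. It is written as a free function with its own small struct, and samplingRateHz is an assumed extra parameter, since the class above does not store a sampling rate. The feedback coefficients are stored negated so that they fit the additive form used in Process above.

```cpp
#include <cmath>

struct BiquadCoefficients {
  float b0, b1, b2, a1, a2;
};

// Low-pass coefficients based on the Audio EQ Cookbook formulas.
// a1 and a2 are stored negated so they can be used directly in
//   y = b0*x + b1*x1 + b2*x2 + a1*y1 + a2*y2
BiquadCoefficients ComputeLowpass(float frequency, float q, float samplingRateHz) {
  const float pi = 3.14159265358979f;
  const float w0 = 2.0f * pi * frequency / samplingRateHz;
  const float cosw0 = std::cos(w0);
  const float alpha = std::sin(w0) / (2.0f * q);
  const float a0 = 1.0f + alpha;  // normalizing term

  BiquadCoefficients c;
  c.b0 = 0.5f * (1.0f - cosw0) / a0;
  c.b1 = (1.0f - cosw0) / a0;
  c.b2 = c.b0;
  c.a1 = 2.0f * cosw0 / a0;     // negated feedback coefficient
  c.a2 = -(1.0f - alpha) / a0;  // negated feedback coefficient
  return c;
}
```

A quick sanity check for any low-pass filter is its gain at DC, which should be 1. With this sign convention, that gain is (b0 + b1 + b2) / (1 - a1 - a2).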

C vs. C++

Most applications these days use C++. But, there are some embedded systems like microcontrollers and other systems in which C would be a better approach. Here we will take a quick look at how C and C++ would differ in an audio processing task.

Let’s use a simple pan control and look at how this would be implemented in C++ and C:

class PanControl {
public:
  // user settings
  float pan;
  
  // public functions
  float Process(float inputSample);
  void Update(void);
  void Reset(void);
  PanControl(); // (constructor)
private:
  float gainLeft;
  float gainRight;
};

float PanControl::Process(float inputSample) {
  

void PanControl::Update(void) {
  gainLeft = 1.0f - pan;
  gainRight = pan;
}

And now let’s look how we would implement this in C:

typedef struct {
  float pan;
} pan_settings_t;

typedef struct {
  float gainLeft;
  float gainRight;
} pan_state_t;

void pan_update(pan_settings_t *settings, pan_state_t *state) {
  state->gainLeft = 1.0f - settings->pan;
  state->gainRight = settings->pan;
}

void pan_reset

Hold Logic

The idea of a hold timer, or hold logic, is to wait until a condition remains true for a specified amount of time before accepting that it is true. And, likewise, wait until a condition remains false for a specified amount of time before accepting that it has become false. This achieves resistance to change.

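bool holdLogic(bool condition) {
  static bool state = false;
  static int counter = 0;
  const int timeout = 480; // hold time in samples (example value: 10 ms at 48 kHz)
  if (state != condition) {
    if (counter == timeout) {
      state = condition;
      counter = 0; // restart the timer for the next change
    } else {
      counter++;
    }
  } else {
    counter = 0;
  }
  return state;
}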

Here we see that state is the running output of our holdLogic function. It will only change when the input condition is different from the state AND the timer has timed out. If the input condition is the same as the state, then the timer is reset. So, the only way the state can change is if the input condition remains different from the state for a period of time represented by timeout.

As an example of how we could use this function, let’s see how it would apply to a noise gate. In a typical noise gate, the gate will open only when the input envelope is greater than the given threshold, otherwise the gate is closed:

bool gateOpen = (envelope > threshold);
if (gateOpen) {
  gain = 1.0f;
} else {
  gain = 0.0f;
}

Without the hold logic, the gate is sensitive to opening and closing many times during noisy transition of the envelope across the threshold. Using the holdLogic function, we can clean up this problem by only triggering the noise gate when the envelope is above a threshold for a given number of samples:

bool gateOpen = holdLogic(envelope > threshold);
if (gateOpen) {
  gain = 1.0f;
} else {
  gain = 0.0f;
}

Of course, holdLogic as a function can’t be used for multiple instances, because its state is kept in static storage that every caller shares. So, we rewrite it as a class so that we can reuse it multiple times as needed:

class HoldTimer {
public:
  HoldTimer() {
    setTimeout(0.010f, 48000.0f);
    reset(false);
  }
  
  void setTimeout(float seconds, float samplingRateHz) {
    timeout = (int) floorf(seconds * samplingRateHz);
  }
  
  void reset(bool setState) {
    state = setState;
    counter = 0;
  }
  
  bool process(bool condition) {
    if (state != condition) {
      if (counter == timeout) {
        state = condition;
        counter = 0; // restart the timer for the next change
      } else {
        counter++;
      }
    } else {
      counter = 0;
    }
    return state;
  }
  
private:
  bool state;
  int counter;
  int timeout;
};

Now, if you want to use this C++ version, you could do so as follows:

#include "HoldTimer.h"

HoldTimer holdTimer;

holdTimer.setTimeout(0.050f, 44100.0f); // hold time and sampling rate

bool gateOpen = holdTimer.process(envelope > threshold);
if (gateOpen) {
  // open the gate
  gain = 1.0f;
} else {
  // close the gate
  gain = 0.0f;
}
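To see the hold behavior in action without any audio hardware, here is a small self-contained sketch. It uses a condensed variant of the hold timer (the timeout is given directly in samples, and the counter is cleared whenever the state changes), and the names SampleHoldTimer and RunHoldTimer are invented for this example.

```cpp
#include <vector>

// Condensed hold timer with the timeout expressed in samples.
class SampleHoldTimer {
public:
  explicit SampleHoldTimer(int timeoutSamples)
      : state(false), counter(0), timeout(timeoutSamples) {}

  bool process(bool condition) {
    if (state != condition) {
      if (counter == timeout) {
        state = condition;  // the condition has persisted long enough
        counter = 0;        // the next change must wait the full hold time too
      } else {
        counter++;
      }
    } else {
      counter = 0;  // condition agrees with the state again: restart the timer
    }
    return state;
  }

private:
  bool state;
  int counter;
  int timeout;
};

// Runs a sequence of raw conditions through the timer and collects the outputs.
std::vector<bool> RunHoldTimer(const std::vector<bool> &conditions, int timeoutSamples) {
  SampleHoldTimer timer(timeoutSamples);
  std::vector<bool> held;
  for (bool c : conditions) {
    held.push_back(timer.process(c));
  }
  return held;
}
```

With a timeout of 3 samples and the chattering input true, false, true, true, true, true, false, true, the output stays false for the first five samples, turns true on the sixth, and then ignores the single-sample dropout that follows.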