The Power Spectrum Analyzer

 

The Power Spectrum Analyzer (PSA) is the analytical heart of AudioExplorer. The PSA is responsible for converting audio data from the time domain (i.e., audio signal vs. time) to the frequency domain (i.e., signal strength vs. frequency). Internally, AudioExplorer's Power Spectrum Analyzer makes use of a particularly elegant and powerful open-source module, FFTW ("Fastest Fourier Transform in the West"; see www.fftw.org).

The PSA is active during real-time playback and recording, and during batch generation of spectral files and NoteMaps.

 

Sample Preparation

While it might be possible to send all audio samples to the Power Spectrum Analyzer for one (extremely!) lengthy analysis, this would provide little useful information.  The resulting spectrum would describe all frequencies and notes in the entire musical selection, but would provide no information about when notes were played or how long they lasted.

To extract timing information from an audio selection, the audio data is sliced into discrete "chunks" as shown in the figure below. The number of samples between spectra determines the time resolution of the analysis. For example, "CD Quality" music has 44,100 audio samples for each second of music. If the number of samples between spectra is 512, then a chunk is analyzed for every 11.6 milliseconds (512/44100 seconds) of music.
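The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names `time_step_ms` and `chunk_starts` are hypothetical and are not part of AudioExplorer, and the rule that only full chunks are analyzed is an assumption.

```python
def time_step_ms(samples_between_spectra, sample_rate=44100):
    """Time between successive spectra, in milliseconds."""
    return 1000.0 * samples_between_spectra / sample_rate


def chunk_starts(total_samples, samples_between_spectra, samples_per_spectrum):
    """Start offsets of every chunk that fits completely in the recording.

    Assumption: a chunk is only analyzed if all of its samples exist.
    """
    last_start = total_samples - samples_per_spectrum
    return list(range(0, last_start + 1, samples_between_spectra))


# 512 samples between spectra at CD quality
step = time_step_ms(512)

# Chunks of 1024 samples, started every 512 samples, in a 4096-sample clip
starts = chunk_starts(4096, 512, 1024)
```

Note that the two settings are independent: here each chunk is 1024 samples long but a new chunk starts every 512 samples, so consecutive chunks share half of their samples.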

 

The number of samples per spectrum determines the frequency resolution of the analysis, as shown in the table. Larger "chunks" improve the frequency resolution, but also cause "smearing" of the notes - e.g., 32768 samples/spectrum provides excellent frequency resolution, but all notes played during any 743 millisecond interval will show up together, even if one note was played early in the time interval and another note at the end of the time interval. Furthermore, a large number of samples per spectrum requires much more processing power, which may become a limitation during real-time (record and playback) operations.

 

Audio Samples/Spectrum    Duration (milliseconds)    Frequency Resolution
512                       11.6                       172 Hz
1024                      23.2                       86.1 Hz
2048                      46.4                       43.1 Hz
4096                      92.9                       21.5 Hz
8192                      186                        10.8 Hz
16384                     372                        5.4 Hz
32768                     743                        2.69 Hz
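
The durations in the table follow from dividing the number of samples per spectrum by the 44,100 Hz sample rate. The frequency column is numerically consistent with 2 x 44100 / (samples per spectrum); that factor of 2 is inferred from the tabulated values and is not stated elsewhere in this document, so treat it as an assumption. A small Python sketch (function names are hypothetical) that recomputes both columns:

```python
SAMPLE_RATE = 44100  # "CD Quality" samples per second


def chunk_duration_ms(samples_per_spectrum, sample_rate=SAMPLE_RATE):
    """Length of one chunk in milliseconds."""
    return 1000.0 * samples_per_spectrum / sample_rate


def frequency_resolution_hz(samples_per_spectrum, sample_rate=SAMPLE_RATE):
    """Frequency resolution in Hz.

    Assumption: the factor of 2 is inferred from the table values.
    """
    return 2.0 * sample_rate / samples_per_spectrum


for n in (512, 1024, 2048, 4096, 8192, 16384, 32768):
    print(n, round(chunk_duration_ms(n), 1), round(frequency_resolution_hz(n), 2))
```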


In choosing suitable values for samples between spectra and samples per spectrum, a few considerations are:

  • Is the music very fast, or fairly slow?  If the music is very fast and notes occur in rapid succession, a smaller value for samples between spectra will improve the time resolution.
  • Does the music have intricate bass lines?  If so, a larger value for samples per spectrum may improve the bass resolution.
  • Can my hard drive accommodate an extremely large spectral file?  Smaller values of samples between spectra and larger values of samples per spectrum both lead to larger storage requirements for the spectral data.  Even if you are preparing a NoteMap and intend to discard the spectral file, this storage space is still required for transient use.

Spectrum Generation

The Power Spectrum Analyzer provides a few statistical tricks to improve the quality of the generated frequency spectra. 

First, each "chunk" of audio samples sent for analysis can be broken down into a number of "subchunks"; each "subchunk" is analyzed separately, and the results are averaged together. This averaging can reduce the effects of noise present in the audio recording. Note that if Averaging is set to 1, the original chunk is not subdivided and no averaging actually occurs. It is important to remember that when averaging is used, the actual frequency resolution of the resulting spectra is decreased proportionately, just as if a smaller value for samples per spectrum had been selected.
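
The averaging step can be sketched as follows. This is a minimal illustration, not AudioExplorer's implementation: a naive discrete Fourier transform stands in for FFTW, the power normalization is an arbitrary choice, and the names `power_spectrum` and `averaged_power_spectrum` are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each non-negative frequency bin (naive DFT, stand-in for an FFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def averaged_power_spectrum(chunk, averaging):
    """Split a chunk into `averaging` equal subchunks and average their spectra."""
    sub_len = len(chunk) // averaging
    spectra = [power_spectrum(chunk[i * sub_len:(i + 1) * sub_len])
               for i in range(averaging)]
    return [sum(values) / averaging for values in zip(*spectra)]


# A 64-sample pure tone that completes one cycle every 4 samples
tone = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
no_averaging = averaged_power_spectrum(tone, 1)   # one 64-sample analysis
with_averaging = averaged_power_spectrum(tone, 4)  # four 16-sample analyses
```

With Averaging set to 1 the 64-sample chunk yields 33 frequency bins; with Averaging set to 4 each 16-sample subchunk yields only 9 bins, which is the loss of frequency resolution described above.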

After dividing the data into "subchunks", a weighting scheme is applied to each subchunk.  In general, these weighting schemes apply a weight of 0 to audio samples at the extreme ends of each subchunk, a weight of 1.0 to the sample at the center of a subchunk, and transition smoothly through other regions of the subchunk.  Note that the square window applies a weight of 1.0 to all samples, effectively disabling the weighting process.  Weighting is an important part of the Power Spectrum Analyzer, as it reduces leakage of information from neighboring chunks and subchunks.  As an experiment, try turning weighting off (by selecting the Square weighting scheme) and comparing the quality of generated NoteMaps. 
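
This document does not give the formulas AudioExplorer uses for its weighting schemes. The sketch below uses common textbook forms that match the description above (weight 0 at both ends, 1.0 at the center). Treating "Parzen" as a triangular window follows the Numerical Recipes convention and is an assumption; other texts define the Parzen window differently. All function names are hypothetical.

```python
import math


def _position(j, n):
    """Map sample index j in [0, n-1] to a position x in [-1, 1] (requires n >= 2)."""
    return 2.0 * j / (n - 1) - 1.0


def square(j, n):
    return 1.0  # every sample weighted equally: weighting effectively off


def parzen(j, n):
    return 1.0 - abs(_position(j, n))  # assumed triangular form


def welch(j, n):
    return 1.0 - _position(j, n) ** 2  # parabolic


def hanning(j, n):
    return 0.5 * (1.0 + math.cos(math.pi * _position(j, n)))  # raised cosine


WINDOWS = {"Square": square, "Parzen": parzen, "Welch": welch, "Hanning": hanning}


def apply_window(samples, name):
    """Multiply each sample in a subchunk by its weight."""
    weight = WINDOWS[name]
    n = len(samples)
    return [s * weight(j, n) for j, s in enumerate(samples)]


# Weights seen by a 5-sample subchunk of constant amplitude 1.0
welch_weights = apply_window([1.0] * 5, "Welch")
```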

 

Weighting Schemes: Parzen, Square, Welch, Hanning

 

Overlapping is a third mechanism for improving the statistical quality of the PSA output. When overlapping is enabled, the length of each subchunk is doubled so that it overlaps with the subchunks before and after. Note that this reduces the actual number of subchunks by 1 (i.e., to Averaging - 1), so if overlapping is used, it may be desirable to also increase the averaging. Overlapping is disabled when averaging is set to 2 or less.
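
One way to picture the overlapped layout is to list the sample range covered by each subchunk. The sketch below is an interpretation of the description above, assuming each doubled subchunk starts one original subchunk length after the previous one; the function name `subchunk_spans` is hypothetical.

```python
def subchunk_spans(samples_per_spectrum, averaging, overlap):
    """Return (start, end) sample ranges for each subchunk; end is exclusive."""
    base = samples_per_spectrum // averaging
    if overlap and averaging > 2:
        # Doubled subchunks, each shifted by one original subchunk length,
        # which leaves averaging - 1 subchunks in total.
        return [(i * base, i * base + 2 * base) for i in range(averaging - 1)]
    # Overlapping off, or disabled because averaging is 2 or less
    return [(i * base, (i + 1) * base) for i in range(averaging)]


plain = subchunk_spans(4096, 4, overlap=False)
overlapped = subchunk_spans(4096, 4, overlap=True)
```

For a 4096-sample chunk with Averaging set to 4, the plain layout gives four 1024-sample subchunks, while the overlapped layout gives three 2048-sample subchunks that each share half of their samples with a neighbor.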