Note-Recognition in the Note Map

 

MIDI generation from the Note Map is AudioExplorer's batch-mode MIDI-generation tool.  Batch mode has a major advantage over real-time MIDI generation: the entire audio file can be scanned before any MIDI data is created, so note events can be determined more accurately and with much less guesswork.

Generation of MIDI from a Note Map is accomplished in three stages:

  • generation of a dynamic note-map image based on current note selections and the threshold/maximum envelopes.
  • calculation and display of "note regions".  The note regions have several properties which can be edited individually or as groups of notes.  These properties provide fine control over the MIDI generation step.
  • MIDI generation based on the note regions and their properties.

Dynamic Note-map Generation

The Note Map file stores a full signal vs. time profile for each of the 128 MIDI notes, with one time-point for each interval sampled and analyzed by the Power Spectrum Analyzer.  This allows AudioExplorer to very quickly generate a new graphic image in response to changes made to the note selection and/or the envelopes.

 


Calculation of Note Regions

As AudioExplorer examines the signal vs. time profiles of each tone in the Note Map, it first applies smoothing (if any) to each profile, and then uses the tone's threshold to determine the time intervals for which the tone is "on".  These above-threshold time intervals are referred to as "note regions".
Figure 1: Effects of smoothing
Instead of examining every time point sampled by the Power Spectrum Analyzer, the dynamic note map can use a windowed average, which smooths each note's signal vs. time curve.  An example of smoothing is shown below for the tone G3, which is seen to sound rhythmically during the time interval shown.  When smoothing is applied, many of the very rapid changes (e.g., "rough edges" of the peaks) are seen to disappear, and the highest peaks are reduced.

Smoothing = 0

Smoothing = 25
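
The windowed average described above can be sketched as a centered moving average.  This is only an illustration: the function name smooth and the interpretation of the smoothing value as a window width in sample points are assumptions, not AudioExplorer's documented behavior.

```python
def smooth(profile, window):
    """Centered moving average over `window` samples (hypothetical helper).

    A window of 0 or 1 leaves the profile unchanged, mirroring "Smoothing = 0".
    """
    if window <= 1:
        return list(profile)
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo = max(0, i - half)                  # clip the window at the edges
        hi = min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


spiky = [0.0, 0.0, 9.0, 0.0, 0.0]
smoothed = smooth(spiky, 3)   # the single sharp peak is spread out and lowered
```

As in the figure, the rapid change is flattened and the highest peak is reduced, which also means a peak that was just above threshold may fall below it after smoothing.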


Figure 2: Calculation of Note Regions from a tone's signal vs. time profile and threshold
The figure shows an area of the note-map image (including the note region overlay) superimposed over the signal vs. time profile from which it was generated.

The signal first rises above the threshold at 26.134 seconds and stays above it until 26.434 seconds, defining the first region.

The second region spans 26.502 to 27.229 seconds, and includes two small shoulder peaks along with the main central signal peak.

The third region spans 27.303 to 29.314 seconds, and clearly includes multiple "events" which have been interpreted as a single region.  Note that these "events" - or regular fluctuations in the tone's signal - may not be the result of musical events (plucking or bowing a string, pressing a key, etc.).  They may also result from special effects (such as delay or reverb) which have been applied to the recording.  Although AudioExplorer is capable of breaking the region into multiple sub-regions using the peak-detect function, this may not always be musically desirable.
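
The thresholding step can be sketched as a single pass over a tone's profile, opening a region when the signal rises above the threshold and closing it when the signal falls back.  The function name find_note_regions, the sample data, and the choice to end a region at the first below-threshold sample are assumptions made for illustration.

```python
def find_note_regions(times, signal, threshold):
    """Return (start, end) time pairs for each run of above-threshold samples."""
    regions = []
    start = None
    for t, s in zip(times, signal):
        if s > threshold and start is None:
            start = t                      # signal has just risen above threshold
        elif s <= threshold and start is not None:
            regions.append((start, t))     # signal has just dropped back
            start = None
    if start is not None:                  # still "on" at the end of the profile
        regions.append((start, times[-1]))
    return regions


times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
signal = [1, 5, 6, 2, 1, 7, 8, 9]
regions = find_note_regions(times, signal, threshold=4)
```

A pass like this treats any fluctuation that stays above the threshold as part of one region, which is why the third region in the figure contains several "events".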

 


Figure 3: Illustration of the effects of the minimum duration
In the example below, a tone pulses rhythmically and its signal periodically rises above threshold.  However, if the minimum duration were set to 0.25 seconds (250 ms), several of these rises would be discarded, since the time interval over which they remain above the threshold is less than the minimum duration.
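
A minimum-duration filter of this kind might look like the sketch below.  The region times are invented for illustration, and whether a region lasting exactly the minimum duration is kept is an assumption.

```python
def drop_short_regions(regions, min_duration):
    """Keep only regions that stay above threshold for at least min_duration seconds."""
    return [(start, end) for (start, end) in regions
            if (end - start) >= min_duration]


# Hypothetical pulses lasting about 0.10, 0.30, 0.20 and 0.50 seconds.
pulses = [(1.00, 1.10), (1.50, 1.80), (2.00, 2.20), (2.50, 3.00)]
kept = drop_short_regions(pulses, min_duration=0.25)
```

With a 250 ms minimum, the first and third pulses are discarded and the other two remain.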

 


When to use "Merge Neighbors"?

Due to limitations in the resolution of the Power Spectrum Analyzer, there can be "leakage" of signal from a strongly sounding frequency into neighboring frequencies.  This problem is especially common in the lowest frequencies.  The strategy used to counter this problem is similar to the "Shoulder Merging" used by the real-time Note Processor.
Figure 4A: To merge neighbors or not to merge?
In this figure, the note map clearly shows a cluster of three adjacent notes sounding at the same time, making this a candidate for neighbor merging.

Inspection of the signal vs. time profiles shows that the center note, B flat 2, is the strongest of the three signals, another bit of evidence that neighbor merging is appropriate. 

However, these profiles have two properties which suggest that these regions are independent and should not be merged.  First, each of the three notes starts at a different time, in order B2 - Bb2 - A2.  Second, each note's profile has a distinctive shape, quite unlike its neighbor's.

Figure 4B:

In this example, the center note (E3) is again the strong note.  Furthermore, the signal vs. time profiles are quite similar - a strong center with a shoulder on each side.  E3 is a strong candidate for application of neighbor merging.
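
The visual checks used in Figures 4A and 4B (a stronger center note, matching start times, similar profile shapes) could be approximated numerically.  The sketch below is one hypothetical heuristic, not a feature of AudioExplorer: the function names, the use of Pearson correlation as a shape measure, and the cut-off values are all assumptions.

```python
import math


def onset_index(profile, threshold):
    """Index of the first sample above threshold, or None if never above."""
    for i, s in enumerate(profile):
        if s > threshold:
            return i
    return None


def correlation(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def looks_like_leakage(center, neighbor, threshold, max_onset_gap=1, min_corr=0.9):
    """True if the neighbor is weaker, starts with the center, and has its shape."""
    on_c = onset_index(center, threshold)
    on_n = onset_index(neighbor, threshold)
    if on_c is None or on_n is None:
        return False
    weaker = max(neighbor) < max(center)
    same_onset = abs(on_c - on_n) <= max_onset_gap
    same_shape = correlation(center, neighbor) >= min_corr
    return weaker and same_onset and same_shape


center = [0, 2, 8, 10, 8, 2, 0]
shoulder = [0, 1, 4, 5, 4, 1, 0]       # same shape, half the strength
independent = [0, 0, 0, 0, 4, 6, 5]    # starts later, different shape
merge_shoulder = looks_like_leakage(center, shoulder, threshold=3)
merge_independent = looks_like_leakage(center, independent, threshold=3)
```

Here the shoulder profile would be flagged as a merge candidate, while the independent profile would not. The final decision still belongs to the user.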


Overtone Merging

If "pitch" as perceived by the human ear is the dominant frequency of a sound, then the overtones comprise the sound's "quality", "character", or "timbre". 

Dealing with overtones is, in a word, impossible.  A single note played on a violin might have a strong first overtone, resulting in a strong signal one octave above the fundamental note.  Alternatively, the same violin might be playing that note on one string and the note one octave higher on another string, perhaps resulting in a very similar pair of signals.  Without assuming monophony, there is no way to distinguish between these two possibilities without extra information provided by you, the omniscient audio explorer.

Marking a note region with "Merge Overtones" instructs AudioExplorer to assume that the marked region is a fundamental and to merge all available overtones into it.

Figure 5: When to merge overtones
Figure 5 shows the profiles of a note (A4) and of its overtones.  Signals for the first (A5) and second (E6) overtones are strong enough to have been interpreted as "notes" by AudioExplorer.  The profile shapes of these notes are all quite similar, and it is quite likely that they are in fact related as overtones.
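
The notes on which overtones fall can be estimated from the harmonic series: the k-th harmonic lies 12 x log2(k) semitones above the fundamental.  The sketch below rounds each harmonic to the nearest MIDI note, which is an assumption about how overtones map onto the 128 note profiles. It uses the common numbering in which A4 is MIDI note 69.

```python
import math


def overtone_notes(fundamental, count):
    """MIDI note numbers nearest to the first `count` overtones of a fundamental."""
    notes = []
    for k in range(2, count + 2):          # k = 2 is the first overtone
        note = fundamental + round(12 * math.log2(k))
        if note <= 127:                    # stay inside the MIDI note range
            notes.append(note)
    return notes


A4 = 69
first_three = overtone_notes(A4, 3)   # A5, E6, A6
```

For A4 this gives notes 81 (A5) and 88 (E6) as the first and second overtones, matching the notes shown in Figure 5.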

 


 

 

Generation of MIDI from the Note Regions

To generate MIDI, a new set of "derived" note regions is created based on the original set.  In creating the derived regions:

  • "hidden" regions are excluded.
  • for regions marked "Merge Neighbors", regions from neighboring notes covering the same time interval are merged and removed.
  • for regions marked "Merge Overtones", regions from the overtones covering the same time interval are merged and removed.
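
The three rules above can be sketched as a filter over a list of regions.  Everything in this example is hypothetical: the Region fields, the treatment of "covering the same time interval" as any time overlap, the neighbor distance of one semitone, and the number of harmonics considered.  It only removes absorbed regions and does not model how their signal might be combined into the surviving region.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    note: int                     # MIDI note number
    start: float                  # seconds
    end: float                    # seconds
    hidden: bool = False
    merge_neighbors: bool = False
    merge_overtones: bool = False


def overlaps(a, b):
    """True if the two regions share any stretch of time."""
    return a.start < b.end and b.start < a.end


def absorbed_notes(region, max_harmonic=8):
    """Notes whose overlapping regions this region absorbs."""
    notes = set()
    if region.merge_neighbors:
        notes.update({region.note - 1, region.note + 1})
    if region.merge_overtones:
        notes.update(region.note + round(12 * math.log2(k))
                     for k in range(2, max_harmonic + 1))
    return notes


def derive_regions(regions):
    visible = [r for r in regions if not r.hidden]       # rule 1: drop hidden
    removed = set()
    for r in visible:
        targets = absorbed_notes(r)                      # rules 2 and 3
        for i, other in enumerate(visible):
            if other is not r and other.note in targets and overlaps(r, other):
                removed.add(i)
    return [r for i, r in enumerate(visible) if i not in removed]


regions = [
    Region(52, 1.0, 2.0, merge_neighbors=True),   # E3 (assuming C4 = 60)
    Region(51, 1.0, 1.9),                         # lower shoulder, absorbed
    Region(53, 1.1, 2.0),                         # upper shoulder, absorbed
    Region(69, 3.0, 4.0, merge_overtones=True),   # A4
    Region(81, 3.0, 3.9),                         # first overtone, absorbed
    Region(88, 3.1, 3.8),                         # second overtone, absorbed
    Region(60, 5.0, 6.0, hidden=True),            # hidden, excluded
    Region(53, 7.0, 8.0),                         # no overlap with E3, kept
]
derived = derive_regions(regions)
derived_notes = [r.note for r in derived]
```

In this example three regions survive, so three MIDI note events would be produced.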

With all hidden and merged regions removed, generation of MIDI data from this derived set of note regions is straightforward - one MIDI note event is created from each remaining note region.  Note Velocities are calculated from the note's signal amplitude, threshold, and maximum: