201b Project: Multi Resolution Audio Transforms

Completed as part of a Master of Arts in Media Arts and Technology at UCSB, Graham Wakefield, Dec 2004.


Multi Resolution Audio

This project investigated techniques to analyze, transform and resynthesize audio signals through the use of multi-resolution analysis to individually process separate time/frequency arrays.

It makes use of the MXJ Java API for Max/MSP in the first project, the Java Swing API in the second, and the Steinberg VST plugin API in the third. The fourth, using PortAudio, has not yet reached a fully working state.

In composition I have often made use of frequency-domain transforms, mostly FFT or FFT-based phase vocoders, for spectral shaping, spectral delays, convolution filters, etc. Frustration with their time/frequency trade-offs led me to investigate different frequency-domain transforms, such as constant-Q transforms. The work presented here demonstrates preliminary experiments in this vein.

Why Multi-Resolution?

FFT analysis metaphorically splits the incoming audio into several separate channels, each representing the weight and phase of one frequency bin in the analysis (as idealised sine wave components). This information can then be used to resynthesize (an estimation of) the input signal; if the time/frequency information is altered before resynthesis, frequency-domain transforms can be performed, such as spectral filtering, spectral gating, spectral delays, etc.

Because FFT bins are spaced linearly in frequency, whereas we hear pitch roughly logarithmically, most of the bins of a typical FFT analysis lie in the upper reaches of our hearing. In addition, the time resolution (window size) of an FFT analysis is uniform across all frequency bins, in contrast to nature, where short-term events (transients) usually reside in the higher frequency ranges.
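
To put rough numbers on this, the sketch below counts how many bins of a linearly spaced FFT fall into each octave of the audible range. The 1024-point size and the 44.1 kHz sample rate are assumptions chosen for illustration; they are not values taken from this project, and the class and method names are hypothetical.

```java
// Illustrative only: fftSize and sampleRate are assumed values, not project settings.
final class FftBinSpread {

    /** Counts the FFT bins whose centre frequency lies in [loHz, hiHz). */
    static int binsInRange(int fftSize, double sampleRate, double loHz, double hiHz) {
        double binWidth = sampleRate / fftSize;   // linear spacing: every bin is equally wide
        int count = 0;
        for (int k = 1; k <= fftSize / 2; k++) {  // skip DC, stop at Nyquist
            double centre = k * binWidth;
            if (centre >= loHz && centre < hiHz) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int fftSize = 1024;
        double sampleRate = 44100.0;
        // Walk up the octaves from 20 Hz. Each octave is twice as wide in Hz
        // as the one below, so it receives roughly twice as many bins.
        for (double lo = 20.0; lo < sampleRate / 2; lo *= 2) {
            double hi = lo * 2;
            System.out.printf("%8.0f - %8.0f Hz: %d bins%n",
                    lo, hi, binsInRange(fftSize, sampleRate, lo, hi));
        }
    }
}
```

With these assumed values the octave from 40 Hz to 80 Hz contains a single bin, while the octave from 10240 Hz to 20480 Hz contains 238.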

In contrast, constant-Q and other multi-resolution analysis types use different bin heights (frequency resolution) and window lengths (time resolution) for different bins of frequency analysis. Thus MR analysis can concentrate on frequency resolution for low-frequency sounds and on temporal resolution for high-frequency sounds.

Caveat: the trade-off of height/length in frequency analysis still applies.

Though I intend to investigate many techniques (constant-Q, wavelets (Haar, Daubechies, etc.), cochleograms, etc.), the work presented mostly employs variations of the Haar wavelet transform.


MXJ API: a Java bridge for Max/MSP

I made use of a Java bridge for Max/MSP to develop and test the algorithms to be used.

I found this more productive than diving straight into C++ code, as Max/MSP provides an excellent space in which to experiment and prototype, with many utilities (file loading, waveform display, etc.) whose absence might otherwise have slowed my development down.

Using the MXJ API, a Java class that subclasses com.cycling74.max.MaxObject can be represented by a Max object within a Max patcher.

Calls to methods in the Java class can be made by sending Max messages to the object (where the first element of the message is the method to call, and subsequent elements are the arguments).

The com.cycling74.msp.MSPBuffer class gives Java code access to read and modify audio data held in Max/MSP buffers in RAM.

Developing with MXJ was useful, and since I kept all the actual processing code in class files separate from the interfacing MaxObject class file, the algorithms should be easily portable to other Java audio APIs such as JavaSound, JSyn, etc.


GWavelet

The first working demonstration of the GWavelet class implements the Haar wavelet transform. Haar wavelet analysis takes a vector of samples whose length is a power of 2 and creates two new vectors by applying a lowpass (scaling function) and a highpass (wavelet function) filter. The highpass wavelet output is kept as an amplitude-related value, whilst the lowpass scaling vector is passed back recursively for further analysis, until the vector reaches a length of 2 samples. The transform is normally used for data reduction, particularly in images.

For the purposes of research, the class simply reads in mono audio data from a Max/MSP buffer, performs the Haar analysis, stores the OWT (ordered wavelet transform) in a second Max/MSP buffer, and then resynthesises the original signal (plus transformations) into a third Max/MSP buffer. This constitutes the minimal conditions for evaluating the Haar wavelet transform as a technique for synthesis.

The analysis method per 1024-sample segment (10 bins):

  • Start at the highest bin and process the input array
  • Average this sample and the next one; put the result in the next slot of the estimate array
  • Subtract this average from the first input sample to get the error; place this in the next slot of this bin
  • Continue to the end of the segment
  • Move to the next lower bin and process the estimate array (until the estimate array is only 2 samples long)
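
The steps above can be sketched in Java roughly as follows. This is not the GWavelet source: the class name, the method names and the bandGain argument are hypothetical. Two assumptions are made for simplicity: the recursion continues until a single average remains, and the coefficients are stored coarse to fine in one array (a common ordering that may differ from the project's OWT layout).

```java
// A minimal sketch of averaging/error Haar analysis, under the assumptions stated above.
final class HaarSketch {

    /** Returns [overall average, coarsest error, ..., finest errors]. */
    static float[] analyse(float[] input) {
        int n = input.length;
        if (n < 2 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of 2");
        }
        float[] out = new float[n];
        float[] estimate = input.clone();
        for (int len = n; len >= 2; len /= 2) {
            int half = len / 2;
            float[] next = new float[half];
            for (int i = 0; i < half; i++) {
                float first = estimate[2 * i];
                float second = estimate[2 * i + 1];
                float average = 0.5f * (first + second);
                next[i] = average;               // lowpass: fed to the next level
                out[half + i] = first - average; // highpass: error kept for this band
            }
            estimate = next;
        }
        out[0] = estimate[0];
        return out;
    }

    /** Rebuilds the signal; bandGain[0] scales the coarsest band (hypothetical parameter). */
    static float[] resynthesise(float[] coeffs, float[] bandGain) {
        int n = coeffs.length;
        float[] estimate = { coeffs[0] };
        int band = 0;
        for (int half = 1; half < n; half *= 2) {
            float gain = bandGain[band++];
            float[] next = new float[half * 2];
            for (int i = 0; i < half; i++) {
                float error = gain * coeffs[half + i];
                next[2 * i] = estimate[i] + error;     // first sample of the pair
                next[2 * i + 1] = estimate[i] - error; // second sample of the pair
            }
            estimate = next;
        }
        return estimate;
    }
}
```

With every gain set to 1 the round trip returns the input; setting all but one gain to 0 leaves the overall average plus a single band.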

I built a means to isolate bands or adjust their weights through the setAmp(float[]) method; however, this makes the problem with the Haar scheme audible: each band just sounds like (and in fact is) a decimated, bandpass-filtered copy of the original! This made me try to think of a different algorithm that would allow me to interpolate the bins.

Multi-Res

In this implementation I introduced interpolation. The algorithm per 512-sample (10-bin) segment is:

  • Start at the lowest bin and process the input array
  • Process in windows of (512 >> bin) samples (512 for bin 0, 256 for bin 1, etc.)
  • Calculate the average value over this window, and store it as a time/frequency coefficient for this bin
  • Create an interpolated curve from the previous coefficient to this one, and subtract it from the input array
  • Continue to the end of the segment
  • Move to the next higher bin and process the (modified) input array (until the window length is 1 sample)
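
As a rough illustration of these steps, here is a minimal Java sketch. It is not the original MultiRes code. The names are hypothetical, only linear interpolation is shown, and it assumes that the "previous coefficient" at the start of each segment is zero (the original may instead carry it over from the preceding segment).

```java
// A minimal sketch of the averaging-plus-interpolation scheme, under the assumptions above.
final class MultiResSketch {
    static final int SEGMENT = 512;
    static final int BINS = 10; // window lengths 512, 256, ..., 1

    /** Linear ramp from prev to cur; i is the position inside a window of w samples. */
    static float ramp(float prev, float cur, int i, int w) {
        float t = (i + 1) / (float) w; // reaches 1.0 on the last sample of the window
        return prev + (cur - prev) * t;
    }

    /** Returns coeffs[bin][window]; the caller's segment is left untouched. */
    static float[][] analyse(float[] segment) {
        if (segment.length != SEGMENT) {
            throw new IllegalArgumentException("expected " + SEGMENT + " samples");
        }
        float[] residual = segment.clone();
        float[][] coeffs = new float[BINS][];
        for (int bin = 0; bin < BINS; bin++) {
            int w = SEGMENT >> bin;
            coeffs[bin] = new float[SEGMENT / w];
            float prev = 0f; // assumption: every segment starts from zero
            for (int k = 0; k < coeffs[bin].length; k++) {
                float sum = 0f;
                for (int i = 0; i < w; i++) sum += residual[k * w + i];
                float cur = sum / w; // average over this window
                coeffs[bin][k] = cur;
                // Remove the interpolated curve so higher bins only see what is left.
                for (int i = 0; i < w; i++) residual[k * w + i] -= ramp(prev, cur, i, w);
                prev = cur;
            }
        }
        return coeffs;
    }

    /** Sums the interpolated curves of all bins back into one segment. */
    static float[] resynthesise(float[][] coeffs) {
        float[] out = new float[SEGMENT];
        for (int bin = 0; bin < BINS; bin++) {
            int w = SEGMENT >> bin;
            float prev = 0f;
            for (int k = 0; k < coeffs[bin].length; k++) {
                float cur = coeffs[bin][k];
                for (int i = 0; i < w; i++) out[k * w + i] += ramp(prev, cur, i, w);
                prev = cur;
            }
        }
        return out;
    }
}
```

Because the last bin uses 1-sample windows, its curve absorbs whatever residual remains, so unmodified coefficients resynthesise the input up to floating-point rounding.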

Re-synthesis transform parameters include Amp[], Phase[], PhaseQuantise, InterpolationFactor, Gate threshold, Compression knee and Compress polarity.

This method was more successful at re-creating less noisy signals; however, I was surprised to find that changing the interpolation factor (linear <-> cosine interpolation) has little effect upon the cleanliness of the signal. I suspect better results may come from using a more sensitive averaging function.
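
One plausible way to blend between linear and cosine interpolation with a single factor is sketched below. How the original InterpolationFactor parameter was actually defined is not documented here, so the crossfade formula and the names are assumptions.

```java
// Hypothetical linear <-> cosine blend; not taken from the original source.
final class InterpolationSketch {

    /**
     * Interpolates from a to b at position t in [0, 1].
     * factor = 0 gives a straight line, factor = 1 a raised-cosine curve,
     * and values in between crossfade the two shapes.
     */
    static double interpolate(double a, double b, double t, double factor) {
        double cosine = 0.5 * (1.0 - Math.cos(Math.PI * t)); // eased position, still 0..1
        double shaped = t + factor * (cosine - t);
        return a + (b - a) * shaped;
    }
}
```

Both shapes pass through the same end points and the same midpoint, and differ only in how they move between them, which may be part of the reason the audible difference was small.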

For this version I also constructed a visualisation tool using the Java Swing Graphics2D API, which displays the data stored in each frequency bin.


MRvst


Example of the MRvst VST plugin in use in Ableton Live

Having achieved interesting sonic results, I decided to make the algorithm more concrete as an audio plugin. Building a VST plugin meant porting the Java code to C++, which was fairly trivial, but it also meant moving from a non-real-time to a real-time domain, which required significant rewriting of the code. Essentially, however, the main change is that it cycles through every bin for each 512-sample segment analysed, rather than processing an entire sample's worth of data one bin at a time.

A complication is that the buffer size of the VST host is unknown until runtime, and may be smaller or larger than the 512-sample buffer size required by my algorithm. To manage this, I had to introduce a further level of buffering, with a pre-process buffer and a post-process buffer, and manage the read/write pointers to these buffers to maximise performance whilst minimising latency.
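
The general idea can be sketched as follows, in Java for consistency with the earlier prototypes rather than in the plugin's C++. This is a deliberately simplified stand-in, not the plugin's buffering code: it uses a single shared position and exactly one block of latency, and all names are hypothetical.

```java
// Simplified sketch: adapts arbitrary host block sizes to a fixed internal block size.
final class BlockAdapter {

    /** Processes one fixed-size block in place (hypothetical callback). */
    interface BlockProcessor {
        void process(float[] block);
    }

    private final int blockSize;
    private final float[] pre;  // collects incoming samples until a block is full
    private final float[] post; // holds the last processed block while it is played out
    private final BlockProcessor processor;
    private int pos = 0;

    BlockAdapter(int blockSize, BlockProcessor processor) {
        this.blockSize = blockSize;
        this.pre = new float[blockSize];
        this.post = new float[blockSize]; // starts as silence: one block of latency
        this.processor = processor;
    }

    /** Called with whatever block size the host delivers; n may be smaller or larger than blockSize. */
    void process(float[] in, float[] out, int n) {
        for (int i = 0; i < n; i++) {
            float x = in[i];       // read first, in case in and out are the same array
            out[i] = post[pos];    // emit the sample processed one block earlier
            pre[pos] = x;
            if (++pos == blockSize) {
                processor.process(pre); // a full block is ready: run the analysis
                System.arraycopy(pre, 0, post, 0, blockSize);
                pos = 0;
            }
        }
    }
}
```

For example, new BlockAdapter(512, block -> { /* analyse and resynthesise */ }) would accept host blocks of any length and delay the signal by 512 samples.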

At present, the full delay (the length of the pre-process and post-process buffers) is fixed at 256*512 samples, but it would not be too difficult to set this to be any number to allow different delay sizes (it would mean changing the variable for the pointer wrapping limits from the buffer size and imposing appropriate upper and lower limits).

A further complication is that the number of parameters for transformation was too large to contemplate having a VST control for each one; instead I implemented a bin selector parameter to choose the bin to modify (with a special setting of 0 to modify all bins).
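
A bin selector of this kind might look like the sketch below (again in Java rather than the plugin's C++). VST 2 hosts pass parameters as normalised floats in the range 0 to 1; the mapping to bin numbers, the class and the method names are assumptions for illustration.

```java
// Hypothetical sketch of a "bin selector" parameter; 0 selects all bins.
final class BinParams {
    static final int BINS = 10;

    private final float[] amp = new float[BINS];
    private int selected = 0; // 0 = all bins, 1..BINS = one specific bin

    BinParams() {
        java.util.Arrays.fill(amp, 1f);
    }

    /** Maps a normalised host value in [0, 1] onto the selector range 0..BINS. */
    void setSelector(float normalised) {
        float clamped = Math.max(0f, Math.min(1f, normalised));
        selected = Math.round(clamped * BINS);
    }

    /** Applies the value to the selected bin, or to every bin when the selector is 0. */
    void setAmp(float value) {
        if (selected == 0) {
            java.util.Arrays.fill(amp, value);
        } else {
            amp[selected - 1] = value;
        }
    }

    float ampOf(int bin) {
        return amp[bin];
    }
}
```

The same selector can be shared by every per-bin parameter, so the plugin exposes one control per parameter type instead of one per bin.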


PortAudio command line MR implementation

Having worked through basic PortAudio demos and played around with them, I thought of trying to port the code from my VST plugin to make a command line implementation using PortAudio. This could then be used as the basis of a larger C++ application.

Unfortunately, I have not had time to bring this implementation to a working stage: since I began trying to implement it with classes, the PortAudio callback function has been complaining of invalid data types, and it is beyond my C++ skills to figure it out... But I've included the source so far.

Ideas for future development

  • The Java Swing representation for the MXJ MultiRes object is very useful, but it would perhaps be more useful still if the user could interact with the data, by stretching, re-ordering, scaling etc., before resynthesis.
  • I would also like to investigate more algorithms, particularly constant-Q and cochlear-auditory models.
  • Clearly the next step for the VST plugin would be to build an interface; however, I am not yet sure how to deal with the huge number of parameters that it ought to have (referencing each bin by choosing the bin and then moving the parameter is fine for a demo, but won't work in practice).
  • I would like to attempt to port the VST plugin to an AudioUnit, as a learning exercise.
  • Port the MXJ objects to real-time enabled C Max objects.
  • I'm interested in trying out techniques for blending analysis data from different audio analyses to synthesize compound sounds, but I suspect that the decimation noise will prevent this from being of much use...


Downloads

Mxj-GWavelet:
Mxj_MultiRes:
VST-Plugin
PortAudio command-line app
  • Sources (Xcode project & C/C++ files); currently does not compile
Sounds
