201b Project: Multi Resolution Audio Transforms
Completed as part of a Masters in Arts in Media, Arts and Technology at UCSB, Graham Wakefield, Dec 2004.
Multi Resolution Audio
This project investigated techniques to analyze, transform and resynthesize audio signals, using multi-resolution analysis to process separate time/frequency arrays individually. The first project makes use of the MXJ Java API for Max/MSP, the second the Java Swing API, and the third the Steinberg VST plugin API. The fourth, using PortAudio, has not yet reached a fully working state.
In composition I have often made use of frequency-domain transforms, mostly the FFT or FFT-based phase vocoders, for spectral shaping, spectral delays, convolution filters etc. Frustration with the time/frequency trade-offs led me to investigate different frequency-domain transforms, such as constant-Q transforms. The work presented here demonstrates preliminary experiments in this vein.
Why Multi-Resolution?
Because FFT bins are spaced linearly in frequency, most of the bins of a typical FFT analysis fall in the upper reaches of our hearing. In addition, the time resolution (window size) of an FFT analysis is uniform across all frequency bins, in contrast to nature, where short-term events (transients) usually reside in the higher frequency ranges. Constant-Q and other multi-resolution analysis types instead use different bin heights (frequency resolution) and window lengths (time resolution) for different bins of the analysis. Thus MR analysis can concentrate on frequency resolution for low-frequency sounds, and on temporal resolution for high-frequency sounds. Caveat: the trade-off of height/length in frequency analysis still applies.
Though I intend to investigate many techniques (constant-Q, wavelets (Haar, Daubechies etc), cochleograms etc), the work presented mostly employs variations of the Haar wavelet transform.
MXJ API: a Java bridge for Max/MSP
I found this more productive than diving straight into C++ code, as Max/MSP provides an excellent space in which to experiment and prototype, with many utilities (file loading, waveform display, etc) whose absence might otherwise have slowed my development down. Using the MXJ API, a Java class that subclasses com.cycling74.max.MaxObject may be represented by a Max object within a Max patcher. Calls to methods in the Java class can be made by sending Max messages to the object (where the first element of the message is the method to call, and subsequent elements are the arguments). The com.cycling74.msp.MSPBuffer class gives Java access to modify any audio data stored in Max/MSP's RAM. Using MXJ to develop was useful, and since I kept all the actual processing code in class files separate from the interfacing MaxObject class file, the algorithms should be easily portable to different Java audio APIs such as JavaSound, JSyn etc.
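The separation of processing code from interfacing code described above might look roughly like the following. This is a minimal sketch in plain Java, not the project's actual code: `SeparationSketch`, `BufferTransform`, `GainTransform` and `TransformHost` are hypothetical names, and the host class only stands in for a MaxObject subclass, using plain arrays where the real object would work on Max/MSP buffers.

```java
import java.util.Arrays;

final class SeparationSketch {
    /** Hypothetical processing interface: knows nothing about Max/MSP. */
    interface BufferTransform {
        float[] process(float[] input);
    }

    /** Example algorithm class: scales every sample by a fixed gain. */
    static final class GainTransform implements BufferTransform {
        private final float gain;

        GainTransform(float gain) {
            this.gain = gain;
        }

        @Override
        public float[] process(float[] input) {
            float[] out = new float[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = input[i] * gain;
            }
            return out;
        }
    }

    /**
     * Stand-in for the interfacing class (assumption: in an MXJ setting this
     * role would be played by a MaxObject subclass reading and writing
     * Max/MSP buffers). Plain arrays keep the sketch runnable without Max.
     */
    static final class TransformHost {
        private final BufferTransform transform;

        TransformHost(BufferTransform transform) {
            this.transform = transform;
        }

        float[] run(float[] source) {
            return transform.process(source);
        }
    }

    public static void main(String[] args) {
        TransformHost host = new TransformHost(new GainTransform(0.5f));
        System.out.println(Arrays.toString(host.run(new float[] {1f, -2f, 4f})));
    }
}
```

Only `TransformHost` would need rewriting to move the algorithm to another audio API; the transform classes stay untouched.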
GWavelet
The first working demonstration of the GWavelet class implements the Haar wavelet transform. The Haar wavelet analysis takes a vector of samples whose length is a power of 2, and creates two new vectors by applying a lowpass (scaling function) and a highpass (wavelet function) filter. The highpass wavelet function returns an amplitude-related value, whilst the vector returned by the lowpass scaling function is recursively passed back for further analysis, until the vector reaches a length of 2 samples. The transform is normally used for data reduction, particularly in images.
For the purposes of research, the class simply reads in mono audio data from a Max/MSP buffer, performs the Haar analysis, stores the OWT (ordered wavelet transform) in a second Max/MSP buffer, and then resynthesises the original signal (plus transformations) into a third Max/MSP buffer. This constitutes the minimal conditions for evaluating the usefulness of Haar wavelet transforms as a technique for synthesis. The analysis method per 1024-sample segment (10 bins):
I built a means to isolate bands or adjust their weights through the setAmp(float[]) method; however, now you can hear the problem with the Haar scheme: each band just sounds like (and in fact is) a decimated, bandpass-filtered copy of the original! This led me to look for a different algorithm that would allow me to interpolate the bins.
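For readers unfamiliar with the Haar scheme discussed in this section, the following is a minimal sketch of a Haar analysis and resynthesis in plain Java. It is not the GWavelet source. The class name `HaarSketch`, the averaging normalisation (dividing sums and differences by 2), the coarse-to-fine coefficient ordering, and the choice to recurse down to a single lowpass value are all assumptions made for illustration.

```java
import java.util.Arrays;

final class HaarSketch {
    private static void requirePowerOfTwo(int n) {
        if (n < 2 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of 2 (>= 2)");
        }
    }

    /** Forward Haar transform; output is ordered from coarsest to finest band. */
    static double[] forward(double[] signal) {
        int n = signal.length;
        requirePowerOfTwo(n);
        double[] out = signal.clone();
        double[] tmp = new double[n];
        // Each pass halves the lowpass vector and re-analyses it.
        for (int len = n; len >= 2; len /= 2) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                tmp[i] = (out[2 * i] + out[2 * i + 1]) / 2.0;        // lowpass (scaling)
                tmp[half + i] = (out[2 * i] - out[2 * i + 1]) / 2.0; // highpass (wavelet)
            }
            System.arraycopy(tmp, 0, out, 0, len);
        }
        return out;
    }

    /** Inverse transform: rebuilds the signal from the ordered coefficients. */
    static double[] inverse(double[] coeffs) {
        int n = coeffs.length;
        requirePowerOfTwo(n);
        double[] out = coeffs.clone();
        double[] tmp = new double[n];
        for (int len = 2; len <= n; len *= 2) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                tmp[2 * i] = out[i] + out[half + i];
                tmp[2 * i + 1] = out[i] - out[half + i];
            }
            System.arraycopy(tmp, 0, out, 0, len);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] c = forward(new double[] {4, 2, 6, 8});
        System.out.println(Arrays.toString(c));          // [5.0, -2.0, 1.0, -1.0]
        System.out.println(Arrays.toString(inverse(c))); // [4.0, 2.0, 6.0, 8.0]
    }
}
```

Scaling the coefficients of one band before calling `inverse` is the kind of per-band weighting that a method such as setAmp(float[]) would control.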
Multi-Res
In this implementation I introduced interpolation. The algorithm per 512-sample (10-bin) segment is:
Re-synthesis transform parameters include Amp[], Phase[], PhaseQuantise, InterpolationFactor, Gate threshold, Compression knee and Compress polarity.
This method was more successful at re-creating less noisy signals; however, I was surprised to find that changing the interpolation factor (linear <-> cosine interpolation) has little effect upon the cleanliness of the signal. I suspect better results may come from using a more sensitive averaging function. For this version I also constructed a visualisation tool using the Java Swing Graphics2D API, which displays the data stored in each frequency bin.
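The linear <-> cosine interpolation factor mentioned above could be realised in several ways. The sketch below shows one plausible reading, in which a factor between 0 and 1 crossfades between linear and cosine interpolation while a decimated band is brought back up to a higher sample rate. The class `InterpSketch`, its method names and the crossfade itself are assumptions for illustration, not the MultiRes implementation.

```java
final class InterpSketch {
    /** Straight-line interpolation between a and b, t in [0, 1]. */
    static double linear(double a, double b, double t) {
        return a + (b - a) * t;
    }

    /** Cosine interpolation: eases in and out between a and b. */
    static double cosine(double a, double b, double t) {
        double eased = (1.0 - Math.cos(Math.PI * t)) / 2.0;
        return a + (b - a) * eased;
    }

    /** factor = 0 gives pure linear, factor = 1 pure cosine (assumed mapping). */
    static double blend(double a, double b, double t, double factor) {
        return (1.0 - factor) * linear(a, b, t) + factor * cosine(a, b, t);
    }

    /**
     * Upsamples a decimated band by an integer step, interpolating between
     * consecutive values; the final value is simply held.
     */
    static double[] upsample(double[] band, int step, double factor) {
        double[] out = new double[band.length * step];
        for (int i = 0; i < band.length; i++) {
            double a = band[i];
            double b = (i + 1 < band.length) ? band[i + 1] : band[i];
            for (int k = 0; k < step; k++) {
                out[i * step + k] = blend(a, b, (double) k / step, factor);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] band = {0.0, 1.0, 0.0};
        for (double v : upsample(band, 4, 0.0)) System.out.printf("%.3f ", v);
        System.out.println();
        for (double v : upsample(band, 4, 1.0)) System.out.printf("%.3f ", v);
        System.out.println();
    }
}
```

Both curves pass through the same band values and differ only in between, which may be part of why the factor has limited audible effect.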
MRvst
Having achieved interesting sonic results, I decided to make the algorithm more concrete as an audio plugin. Building a VST plugin meant porting the Java code to C++, which was fairly trivial, but it also meant moving from a non-real-time to a real-time domain, which required significant rewriting of the code. Essentially, however, the main change is that it cycles through each bin for each 512-sample segment analysed, rather than processing an entire sample's worth of data one bin at a time.
A complication is that the buffer size of the VST host is unknown until runtime, and may be smaller or larger than the 512-sample buffer size required by my algorithm. To manage this, I had to introduce a further level of buffering, with a pre-process buffer and a post-process buffer, and manage the read/write pointers to these buffers to maximise performance whilst minimising latency. At present, the full delay (the length of the pre-process and post-process buffers) is fixed at 256*512 samples, but it would not be too difficult to make this configurable to allow different delay sizes (it would mean replacing the buffer size as the pointer wrapping limit with a variable, and imposing appropriate upper and lower limits).
A further complication is that the number of transformation parameters was too large to contemplate having a VST control for each one; instead I implemented a bin selector parameter to choose the bin to modify (with a special setting of 0 to modify all bins).
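The pre-process/post-process buffering described above can be illustrated with a much-simplified sketch. It is written in Java for consistency with the earlier projects, although the plugin itself is C++, and it is not the MRvst code: `BlockAdapterSketch` and `SegmentProcessor` are hypothetical names, and instead of long circular buffers with separate pointers it uses two segment-sized buffers and one shared position, giving a fixed latency of one segment.

```java
final class BlockAdapterSketch {
    /** Hypothetical hook: processes one full segment in place. */
    interface SegmentProcessor {
        void process(float[] segment);
    }

    private final int segmentSize;
    private final float[] pre;   // collects host input until a segment is full
    private final float[] post;  // holds the most recently processed segment
    private final SegmentProcessor processor;
    private int pos = 0;         // shared read/write position in both buffers

    BlockAdapterSketch(int segmentSize, SegmentProcessor processor) {
        this.segmentSize = segmentSize;
        this.pre = new float[segmentSize];
        this.post = new float[segmentSize];
        this.processor = processor;
    }

    /** Called by the host with blocks of any size n (smaller or larger than a segment). */
    void process(float[] in, float[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = post[pos]; // emit a sample from the previous processed segment
            pre[pos] = in[i];   // store the incoming sample for the next one
            pos++;
            if (pos == segmentSize) {
                // A full segment has arrived: process it and make it available.
                System.arraycopy(pre, 0, post, 0, segmentSize);
                processor.process(post);
                pos = 0;
            }
        }
    }

    public static void main(String[] args) {
        // Segment of 4 samples, host blocks of 3: output lags input by 4 samples.
        BlockAdapterSketch adapter = new BlockAdapterSketch(4, seg -> {
            for (int i = 0; i < seg.length; i++) seg[i] *= 2f;
        });
        float next = 1f;
        for (int block = 0; block < 4; block++) {
            float[] in = new float[3];
            float[] out = new float[3];
            for (int i = 0; i < 3; i++) in[i] = next++;
            adapter.process(in, out, 3);
            System.out.println(java.util.Arrays.toString(out));
        }
    }
}
```

The output is silent for the first segment and then follows the input one segment late. A longer delay, such as the 256*512 samples mentioned above, would need larger circular buffers with independent read and write pointers.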
PortAudio command line MR implementation
Having worked through basic PortAudio demos and played around with them, I thought of trying to port the code from my VST plugin to make a command line implementation using PortAudio. This could then be used as the basis of a larger C++ application. Unfortunately, I have not had time to complete this implementation to a working stage: since I began trying to implement it with classes, the PortAudio callback function has been complaining of invalid data types, and it is beyond my C++ skills to figure it out... But I've included the source so far.
Ideas for future development
Downloads
Mxj-GWavelet:
Mxj_MultiRes:
VST-Plugin
PortAudio command-line app
Sounds