This web page documents a body of personal research investigating various approaches to model design in audiovisual art. The research resulted in a set of working interactive studies, submitted as one third of the final portfolio for the Masters in Music, Composition (Studio) at Goldsmiths College, University of London, on 17th June 2004. These studies were performed at Goldsmiths College, University of London, on several dates in 2004, including the organised evaluation day.
Short example segment from Path Study (Quicktime capture)
I am inspired by the flexibility the digital environment offers for exploring the consequences of the interplay between algorithmic processes and human interaction in visual and auditory time-based domains.
One need only consider the effect that the emergence of notation systems had upon previously oral traditions of music to realise the extent to which codifying musical data in a visual domain can engender new structural complexities in the results. Translating time-based data into spatial data permits macroscopic perspectives free from the limits of our attention, affords visually founded pattern-matching and evaluation, and suggests directions for experimental transformation not limited by our physical memory. Now that we can codify musical (and other) data in a visual domain using digital tools, these representations need not be static; they can be transformed (or transform themselves) over time. Furthermore, the aesthetic qualities of visual musical codification can be developed and explored further in time-variant representations, perhaps engendering new art forms.
“It is only in very recent times that the means have become available to synthesise (or ‘render’) image and sound concurrently within computers, to produce genuinely integrated audiovisual art forms. However, this degree of integration is only possible when all aspects of the composition and performance process can take place within a coherent environment or framework” (Hunt, Kirk, Orton and Merrison, 1998: 199).
Integrating digital generation across distinct media demands some degree of algorithmic synthesis subject to parametric control; however, there are no norms for choosing any one implementation of any one established synthesis model over another. As Hunt et al. state, the problem is further compounded by the fact that “putting together the two media produces combinatorial relationships of such complexity that composers will need to develop extensive algorithmic control in order to maintain stylistic consistency” (Hunt, Kirk, Orton and Merrison 1998: 200).
Normally the algorithmic model consists of software implementing defined relationships on input data (from other algorithmic models, physical input devices or stored data files) to produce output data (whether digital data for further processing, or signals for sonic or visual output devices). The definition of the sonic model, the visual model and their interdependent relationships constitutes what Hunt et al. refer to as “the composer’s script. This script is a statement of the intentions for the entire system.” (ibid. 200) Part of this script is determining the parametric controls and degrees of freedom offered to the performer(s) in controlling the system.
The work presented in this document and the accompanying executable files consists of the author’s own explorations of different approaches to the composer’s script in this new domain, with the aim of producing a number of working studies, together with the resulting software and performances.
I chose a fixed framework of relationships within which to concentrate my investigations. As it is the novel time-variant aspect of digital audiovisual tools that interests me most, the research concentrated on developing audiovisual instruments for performance. The basic framework is described in the diagram opposite.
The quantity and complexity of the real-time input devices were also limited, for a number of reasons: to avoid clouding interpretation of the algorithmic design with complex multi-dimensional data, to achieve performable systems with approachable learning curves, and to keep the performer’s attention focused on the sonic and visual domains rather than on the physical input devices. The input controllers used are listed in the table below:
| Input device | Data | Controller type |
| --- | --- | --- |
| Mouse position | X & Y continuous* parameters | Continuous controllers |
| Mouse button | Button-down event, button-up event | Event controllers |
| Computer keyboard | Key-down events for a set of N keys | State controllers |
*Although the mouse cursor position is reported in discrete steps, fine granularity results in a perceptually continuous medium.
“Stated colloquially, reducing how many dimensions of control an instrument has makes it less frightening to its performer. More formally, such a reduction concentrates the set of all possible inputs into a more interesting set by avoiding the redundancy inherent in the exponential growth of increasing dimensionality.” (Goudeseune, 2002: 94).
As the primary control device, the mouse offers a potentially rich source of multidimensional data. Not only does it provide the expected two-dimensional Cartesian co-ordinates, but these same parameters in combination can also provide polar co-ordinates (angle and distance from an origin). Furthermore, digital systems may capture time as a third parameter, so the first and second derivatives of the co-ordinates with respect to time give velocity and acceleration data. More interestingly, by associating the co-ordinate data with the time at which it was input, the spatial data becomes an ordered set available to other kinds of transformation, e.g. recursion, statistical analysis, etc. In order to work further with this sequential ordering, I employ button state as a fourth dimension, permitting not only different behaviour on mouse data during button-down and button-up states, but also the grouping of input data between mouse button events into discrete path sets, or gestures. Such objectification may further grant special status to the co-ordinates at the start and end points of a gesture, for example.
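The derived quantities described above can be illustrated with a minimal Python sketch (not the original ActionScript). It assumes mouse readings arrive as hypothetical `Sample(t, x, y, down)` records; polar co-ordinates are taken relative to a chosen origin, velocity is approximated by finite differences, and runs of consecutive button-down samples are grouped into gestures.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical mouse reading."""
    t: float     # time in seconds
    x: float     # cursor x in pixels
    y: float     # cursor y in pixels
    down: bool   # True while the mouse button is held


def to_polar(x, y, ox=0.0, oy=0.0):
    """Angle (radians) and distance of (x, y) from the origin (ox, oy)."""
    dx, dy = x - ox, y - oy
    return math.atan2(dy, dx), math.hypot(dx, dy)


def derivative(values, times):
    """First-order finite differences of values with respect to time."""
    return [(v1 - v0) / (t1 - t0)
            for v0, v1, t0, t1 in zip(values, values[1:], times, times[1:])]


def split_gestures(samples):
    """Group runs of consecutive button-down samples into discrete gestures."""
    gestures, current = [], []
    for s in samples:
        if s.down:
            current.append(s)
        elif current:            # button released: close the current gesture
            gestures.append(current)
            current = []
    if current:                  # gesture still open at the end of the data
        gestures.append(current)
    return gestures
```

Applying `derivative` to a gesture's x values gives x-velocity; applying it again to that result (with the times shifted by one) gives a rough acceleration estimate. The first and last samples of each gesture are its start and end points.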
Clearly a huge range of potential data may be derived from even the simplest control input, so for the system to function as a performable tool it would be prudent to narrow this set to an easily comprehensible subset. Considering the precedents in tools for the spatial transfer of temporal data (from quill and canvas to modern graphics tablets), and the pragmatic advantages of a well-known means of control, I resolved to take a basic ‘mouse-drawn path’ visual control metaphor as the starting point of each study.
Path Study
This work was used by the author for an improvised performance as part of a Goldsmiths University Electronic Music Studios concert on 12th March 2004, and as part of the performance evaluation day at Goldsmiths College on 17th June 2004.

Short example segment (audio only)
Path gestures made with the mouse are captured by the visual interface and represented on screen. Two paths are recorded in each gesture. Both begin from the same original contact point; however, the second path is interpolated by taking, at each step, the average of the previous point on the path and the current mouse position, i.e. a simple first-order smoothing (low-pass) filter. At each step in the gesture, a line is drawn between the corresponding points of the two paths, rather than drawing the paths themselves. The net result is a visual representation reminiscent of the structural forms in Iannis Xenakis’ score for Metastasis (see screen capture above). Up to four such paths can be active at any time.
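A minimal Python sketch of the two-path construction, assuming the recursive reading of the description above (each point of the second path averages that path's previous point with the current mouse position). The function names are hypothetical; the connecting lines, not the paths, are what would be drawn.

```python
def smoothed_path(points):
    """Second path: starts at the contact point, then each new point is the
    average of its own previous point and the current mouse position."""
    if not points:
        return []
    out = [points[0]]
    for x, y in points[1:]:
        px, py = out[-1]
        out.append(((px + x) / 2, (py + y) / 2))
    return out


def connecting_lines(points):
    """Pairs each raw mouse point with the corresponding smoothed point."""
    return list(zip(points, smoothed_path(points)))


lines = connecting_lines([(0, 0), (8, 0), (8, 8)])
# lines == [((0, 0), (0, 0)), ((8, 0), (4.0, 0.0)), ((8, 8), (6.0, 4.0))]
```

Because the smoothed path lags behind the cursor, faster or sharper movements yield longer connecting lines.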
Continuing the influence of Iannis Xenakis, I chose granular synthesis of glissandi as the synthesis model. Each of the four paths may play up to five simultaneous grains of sound, each grain defined by the visual properties of the respective line within the path. Thus each path contains a set of potential grain sounds.
Each grain contains a synthesised wavetable tone that follows a pitch curve over the duration of the grain, determined by the vertical start and end points of the line, and is filtered by a band-pass filter whose centre frequency also follows a curve over the duration of the grain, determined by the horizontal start and end points of the line. The grain amplitude is set randomly for each grain, and the duration of the grain is proportional to the length of the line. Thus the timbre of the grain is defined by both the location and the angle of the line.
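The line-to-grain mapping can be sketched in Python as follows. The frequency ranges, canvas size and seconds-per-pixel factor are assumptions for illustration, not values from the original Max/MSP patch; frequencies are mapped exponentially so that equal distances on screen give roughly equal musical intervals.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Grain:
    pitch_start: float    # Hz, from the line's vertical start point
    pitch_end: float      # Hz, from the line's vertical end point
    centre_start: float   # band-pass centre in Hz, from horizontal start point
    centre_end: float     # band-pass centre in Hz, from horizontal end point
    amplitude: float      # random, 0.2..1.0
    duration: float       # seconds, proportional to line length


def exp_map(v, lo, hi):
    """Map v in [0, 1] exponentially onto [lo, hi]."""
    return lo * (hi / lo) ** v


def line_to_grain(line, width=640, height=480, secs_per_pixel=0.001, rng=random):
    (x0, y0), (x1, y1) = line
    # Screen y grows downwards, so invert it: higher on screen means higher pitch.
    pitch_start = exp_map(1 - y0 / height, 100.0, 2000.0)
    pitch_end = exp_map(1 - y1 / height, 100.0, 2000.0)
    centre_start = exp_map(x0 / width, 200.0, 8000.0)
    centre_end = exp_map(x1 / width, 200.0, 8000.0)
    length = math.hypot(x1 - x0, y1 - y0)
    return Grain(pitch_start, pitch_end, centre_start, centre_end,
                 amplitude=rng.uniform(0.2, 1.0),
                 duration=length * secs_per_pixel)
```

A line rising from the bottom-left to the top-right corner of the canvas would thus produce an upward glissando whose filter sweeps from dark to bright.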
The grains are played in sequence along the path (from the first line drawn to the last), with up to five-fold overlap; the overlap gradually decreases until the path ceases to sound as the number of sounding voices falls to zero, so the paths seem to begin quickly, gradually slow down and then stop. This sequential playback creates visual cues linking the curves of the paths to their sonification.
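One way to obtain this "fast start, gradual slowdown" behaviour is to let each grain's successor start after a fraction of its duration, with the overlap factor falling from five to one along the path. This Python sketch is a hypothetical scheduling scheme consistent with the description, not the original implementation.

```python
def schedule_onsets(durations, max_overlap=5):
    """Onset times (seconds) for grains played in drawing order.

    The overlap factor falls linearly from max_overlap to 1 along the path,
    so early grains start close together and later ones spread out.
    """
    onsets, t = [], 0.0
    n = len(durations)
    for i, d in enumerate(durations):
        onsets.append(t)
        frac = i / (n - 1) if n > 1 else 1.0
        overlap = max_overlap - (max_overlap - 1) * frac
        t += d / overlap   # the next grain starts after 1/overlap of this grain
    return onsets


print(schedule_onsets([1.0] * 5))   # gaps of 0.2, 0.25, 0.33..., 0.5 seconds
```

With equal grain durations, the widening gaps are what make a path appear to decelerate before falling silent.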
Besides the mouse control for creating new paths, a number of secondary switch controls on the computer keyboard are used. Some select between preset wavetables for the glissandi tones, giving the instrument a greater timbral range; two others increase or decrease the degree of curvature in the paths drawn, permitting more or less homogeneous gestures. Finally, the spacebar is used as a mute trigger, permitting the generation of short staccato gestures or sudden closures.
Because the rate of progression through the paths is slower than the rate of drawing, and the gestures persist over time as they decay, it is possible to draw ahead into the future of a sonic gesture, and hence construct polyphonic soundscapes or more complex relationships between events. I found this time-based extension to the instrument metaphor to be particularly interesting to work with, and thus I resolved to explore temporally persistent visual elements further in the next study.
Stroke Study
This work is a continued development of the previous study. It was presented at the Goldsmiths Electronic Music Studios Installation Day on 04 June 2004 (Figure 5), and as part of the performance evaluation day at Goldsmiths College on 17th June 2004.

Short example segment (audio only)
The author and Ian Stonehouse performed a variation of this piece in the Great Hall, Goldsmiths College University of London as part of the Interlace concert of 27th June.

A visitor using Stroke Study during the Goldsmiths EMS Installation Day, 04 June 2004
The same basic algorithm as in the previous work is used to generate the paths, but here each of the four possible simultaneous gestures is differentiated by colour. The most important difference, however, is that the paths, once drawn, persist both visually and sonically and, to avoid homogeneity, are no longer static objects. This temporal variation is achieved through a pseudo 3-D rotation algorithm that gradually rotates the paths about a central vertical axis, affording different perspectives on each gesture over time.
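The pseudo 3-D rotation can be sketched as a rotation about the vertical (y) axis followed by a simple perspective projection. This Python version assumes points are stored relative to the central axis, with z = 0 when a path is first drawn; the focal length and canvas centre are illustrative values.

```python
import math


def rotate_y(point, theta):
    """Rotate a 3-D point (x, y, z) about the vertical y axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)


def project(point, focal=500.0, cx=320.0, cy=240.0):
    """Perspective-project a 3-D point onto the canvas.

    Returns (screen_x, screen_y, depth); points with larger z appear
    smaller and closer to the centre (z must stay greater than -focal).
    """
    x, y, z = point
    scale = focal / (focal + z)
    return (cx + x * scale, cy + y * scale, z)


# Each animation frame, advance the angle slightly and redraw every path.
path = [(-50.0, -20.0, 0.0), (0.0, 0.0, 0.0), (60.0, 30.0, 0.0)]
frame = [project(rotate_y(p, math.radians(30))) for p in path]
```

Because the rotation preserves each point's distance from the axis, a gesture keeps its true 3-D shape while its projected lengths and positions change from frame to frame.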
The spatial transfer of temporal data becomes convoluted; the visual representation is less score-like and more instrument-like. This instrument-like aspect inspired further interactivity, achieved by treating the rotating lines as virtual instruments that can themselves be played by ‘striking’ or ‘strumming’ them with the mouse (in button-up mode). A line is struck when its on-screen segment intersects the segment from the previous mouse position to the current mouse position.
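The strike test amounts to a segment-intersection check between a rotating line and the latest mouse movement. The Python sketch below (hypothetical names) solves the two parametric line equations with 2-D cross products. It also returns how far along the line the strike occurred, measured from base (0) to tip (1), which is the kind of value a position-dependent mapping needs.

```python
def cross2(u, v):
    """Z component of the 2-D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def strike(base, tip, prev_mouse, mouse):
    """Fraction along base->tip where the mouse movement crosses the line,
    or None if the two segments do not intersect."""
    r = (tip[0] - base[0], tip[1] - base[1])
    s = (mouse[0] - prev_mouse[0], mouse[1] - prev_mouse[1])
    denom = cross2(r, s)
    if denom == 0:             # parallel or degenerate: treat as no strike
        return None
    w = (prev_mouse[0] - base[0], prev_mouse[1] - base[1])
    u = cross2(w, s) / denom   # position along the line
    v = cross2(w, r) / denom   # position along the mouse movement
    if 0 <= u <= 1 and 0 <= v <= 1:
        return u
    return None


print(strike((0, 0), (0, 10), (-5, 3), (5, 3)))   # 0.3
```

Testing against the segment between successive mouse positions, rather than the cursor point alone, means fast strums are not missed when the cursor jumps across a line between frames.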
For the sonic model, I created a simple polyphonic synthesizer with one voice per line in the visual model (up to a maximum polyphony of 200 voices per path). The properties of each line are transmitted to the auditory model at the moment the mouse intersects the line, and these properties determine the sound. Because the sonic model is based on the visual properties of the lines, their 3-D rotation continuously changes their sonification.
In choosing mapping strategies between visual and sonic models, I tried to maintain an intuitive approach where possible. For example, the amplitude of the sounding voice is proportional to the speed of the mouse movement at the intersection, a metaphor for the mapping between input and output energy present in most acoustic instruments. “Mapping strategies that are not one-to-one, and which utilise a measure of the user’s energy under the control of more than one limb (or body part), can be more engaging to users than one-to-one mappings.” (Hunt and Wanderley, 2002: 103).
Continuing in the vein of physical metaphor, the decay rate of the tones depends on how near the point of intersection lies to the line’s base or tip: a physically based metaphor for the increased resistance to oscillation in a physical body struck near a fixed rather than a free end. Finally, the position of the line in the 3-D space is mapped to sonic properties for spatialisation: horizontal position to panning, and perceptual depth to low-pass filtering and reverberation wet/dry mix.
I also related the base pitch to the length of the line, reminiscent of the relationship between pitch and the length of the string or chamber in string and wind instruments. The metaphor is slightly deformed, however, as I use the projected on-screen length rather than the absolute length in the 3-D model; this preserves the time-variant sonic result of the visual rotation in this study.
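Taken together, the mappings described above might be expressed as a single function from strike data to synthesizer parameters. All ranges and constants here are assumptions chosen only to show the direction of each mapping: amplitude rises with mouse speed, decay lengthens towards the free tip, pitch falls as the projected length grows, and depth makes the sound duller and more reverberant.

```python
def stroke_voice(u, mouse_speed, screen_length, screen_x, depth,
                 width=640.0, max_speed=2000.0):
    """Hypothetical parameter mapping for one struck line.

    u             -- strike position along the line (0 = fixed base, 1 = free tip)
    mouse_speed   -- mouse speed at the intersection, in pixels per second
    screen_length -- projected on-screen length of the line, in pixels
    screen_x      -- horizontal position of the line on the canvas, in pixels
    depth         -- perceptual depth, 0 (front) to 1 (back)
    """
    return {
        # input energy -> output energy, as in acoustic instruments
        "amplitude": min(mouse_speed / max_speed, 1.0),
        # struck near the fixed base -> heavily damped, short decay (seconds)
        "decay": 0.05 + 1.95 * u,
        # shorter projected line -> higher pitch (Hz), loosely like a string
        "pitch": 20000.0 / max(screen_length, 1.0),
        # left edge -> 0 (hard left), right edge -> 1 (hard right)
        "pan": screen_x / width,
        # farther away -> lower low-pass cutoff (Hz) and wetter reverberation
        "cutoff": 12000.0 * (1.0 - depth) + 500.0 * depth,
        "reverb_wet": depth,
    }
```

Using the projected length for pitch means the same line glides in pitch as it rotates towards or away from the viewer, which is the time-variant behaviour the study aims to keep.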
Less formally, I related visual colour to timbre (often referred to as ‘tone-colour’) by assigning different harmonic weights to the additive tones in each path. The relationship is quite arbitrary, based more on brightness and apparent register than on any mathematical relationship; an arbitrary mapping seemed adequate since the four timbres do not change.
Other mapping strategies are less obviously intuitive: for instance, the pitch curve is related both to the vertical position of the line and to that of the point at which the containing path began. These other mappings were chosen in order to maximise the performative range and expressive potential of the instrument.
In this piece I also introduced a second, time-based state controller to select different performance behaviours of the drawn paths. The path gestures repeat their triggering in sequence, automatically sounding each line of the gesture either in the order in which the lines were drawn or in the order in which the user stroked them. Which method is used depends on the time at which the path is begun; coloured borders around the canvas indicate, by their presence or absence, which method will be used. These sequences also gradually decay to make room for new ones. Introducing the frame as a visual indicator and time-based behaviour controller adds an interesting and challenging element for the performer, increasing the textural range of the instrument while effectively reducing the performer’s independence.
The only other state controllers used are two keyboard keys to navigate through the different timbre/colours for the subsequent path, and the spacebar again used as a mute/clear control.
Having indirect performer control over a complex but consistent system leads to the impression of a performable generative music instrument. It results in a steeper learning curve, but also permits the discovery of interesting techniques and methods of its use, and the discovery of new ranges of the instrument.
String Study
This third piece continues the drawn path metaphor, but explores a much greater degree of complexity in the visual algorithm. It was presented as part of the performance evaluation day at Goldsmiths College on 17th June 2004.

Short example segment (audio only)
The path steps in this piece use a different generation algorithm based on polar co-ordinate transformations: drawn line lengths and angles are based on the first-order derivatives of mouse angle and velocity. The sequential nature of the paths is also emphasised, both in the path onset (repeating the initial motions of the cursor end-on-end) and in the line segments (which follow each other like links in a chain).
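A Python sketch of one possible polar chain generator, assuming that each link's length is the distance the mouse moved in one step and that each link turns by the change in mouse direction, scaled by a hypothetical `turn_gain`. With `turn_gain = 1` the chain reproduces the shape of the cursor path up to a rotation; larger values exaggerate its curvature.

```python
import math


def chain_links(mouse_points, start=(0.0, 0.0), heading=0.0,
                turn_gain=1.5, length_gain=1.0):
    """Build chain links from successive mouse movements.

    Each link's length is the distance moved in one step (times length_gain);
    each link turns by the change in mouse direction (times turn_gain).
    Every link begins where the previous one ended.
    """
    links = []
    x, y = start
    prev_dir = None
    for (x0, y0), (x1, y1) in zip(mouse_points, mouse_points[1:]):
        direction = math.atan2(y1 - y0, x1 - x0)   # polar angle of this movement
        distance = math.hypot(x1 - x0, y1 - y0)    # polar length of this movement
        if prev_dir is not None:
            delta = direction - prev_dir
            turn = math.atan2(math.sin(delta), math.cos(delta))  # wrap to [-pi, pi]
            heading += turn_gain * turn
        prev_dir = direction
        nx = x + length_gain * distance * math.cos(heading)
        ny = y + length_gain * distance * math.sin(heading)
        links.append(((x, y), (nx, ny)))
        x, y = nx, ny
    return links
```

Working on changes of direction rather than absolute positions is what makes the result a transformation of the gesture rather than a copy of it.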
In this piece the mouse always creates new line segments, which gradually decay away, whether or not the mouse button is pressed. When the mouse button is held down, the movement is recorded into a gesture path memory, and all gesture paths repeat this movement in sequence, end-on-end. I introduced a unifying factor by synchronising each of the four paths to repeat according to the duration of the longest path; the apparent tempo of the piece may therefore vary quite rapidly according to the paths being performed.
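The synchronisation can be sketched as a shared loop period equal to the longest gesture, with shorter gestures falling silent until the next cycle. That choice (rather than stretching shorter gestures) is an assumption; the sketch also shows why the apparent tempo jumps whenever the longest path changes.

```python
def loop_state(durations, now):
    """Shared-loop position of each gesture path at time `now` (seconds).

    Returns (period, tempo_bpm, positions): every path restarts together once
    per period, the length of the longest path; positions[i] is how far path i
    has played in the current cycle, or None once it has finished.
    """
    period = max(durations)
    t = now % period
    positions = [t if t < d else None for d in durations]
    return period, 60.0 / period, positions


print(loop_state([2.0, 1.0, 0.5, 1.5], now=3.25))
# (2.0, 30.0, [1.25, None, None, 1.25])
```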
When the mouse button is not held down, trails begin to form from the ends of any last chain links on-screen, and these trails move according to the vector sum of their own original vector and the mouse movement. This rotation factor can create some quite unpredictable and visually rewarding results.
In the sonic domain, I wanted to broaden the studies beyond pure synthesis, so in this case granulation of a source sample was employed. Each animated bar corresponds to a set of parameters for a grain, chosen at random to create a grain cloud based on the total structure on screen. When the interface loads, the overall amplitude shape of the source waveform is drawn (as a polar co-ordinate path in blue) around the central point of the screen. The angle of the vector between this central point and a particular line segment therefore determines the offset within the source sample used to sonify the respective grain; thus paths drawn anticlockwise progress forwards through the source sample, while paths drawn clockwise progress backwards. Distance from the central point controls the playback pitch of the grains, quantised to an arbitrary discrete scale.
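The angle and distance mappings can be sketched in Python as below. The sketch assumes a source of `sample_len` frames, rings of a hypothetical 40-pixel width, and a pentatonic scale standing in for the unspecified arbitrary scale. The y co-ordinate is flipped so that anticlockwise motion on screen increases the angle and therefore moves forwards through the sample.

```python
import math

# Hypothetical stand-in for the arbitrary discrete scale: major pentatonic degrees.
SCALE = [0, 2, 4, 7, 9]


def grain_params(px, py, cx, cy, sample_len, ring=40.0):
    """Map a segment position (px, py) to (sample offset, playback rate).

    The angle around the centre (cx, cy) selects the read position in the
    source; the distance from the centre selects a step on the scale.
    """
    dx, dy = px - cx, cy - py            # flip y: screen y grows downwards
    angle = math.atan2(dy, dx) % (2 * math.pi)
    offset = int(angle / (2 * math.pi) * sample_len)
    step = int(math.hypot(dx, dy) // ring)          # which ring around the centre
    octave, degree = divmod(step, len(SCALE))
    semitones = 12 * octave + SCALE[degree]
    return offset, 2.0 ** (semitones / 12.0)        # rate 1.0 = original pitch
```

A point directly above the centre reads from a quarter of the way into the sample, and moving anticlockwise to the left of the centre advances to the halfway point.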
Using a source sample in this way introduces a static stored-data input element to the composer’s script. This stored data limits the sonic range of the granular textures, but guides the performer visually by defining regions of different activity and representation on the canvas. Introducing a spatial topology to the instrument considerably changes the approach to playing it, and this in itself can be quite a challenge.
In conjunction with the higher order complexities in the visual transformations, this interface has virtually no perceivable one-to-one mappings in performance; while it can produce quite a broad range of sound textures, the complexity in control over sound particulars leads to a performance role closer to governance than direct action.
Particles Study
This work employs a simple mapping between mobile visual agents (particles) and the voices of a polyphonic FM synthesizer. The mobile agents are subject to various forces and tendencies (gravitational, elastic, momentum, etc.) based on user interaction. While it was an interesting direction of study, I found it rarely led to the kind of sonic results that could be employed in performance, due in part to the lack of integration between individual particles: the rich complexity and interdependence between the particles visible in the visual representation almost entirely eludes auditory analysis.
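The particle behaviour can be sketched with a semi-implicit Euler update. This Python version is an assumption about the kind of forces involved (constant downward gravity, an elastic pull towards the mouse, and damped momentum), together with one hypothetical particle-to-FM-voice mapping; it does not model interactions between particles.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(p, mouse, dt=1 / 30, gravity=200.0, stiffness=4.0, damping=0.98):
    """Advance one particle by dt seconds (semi-implicit Euler)."""
    ax = stiffness * (mouse[0] - p.x)             # elastic pull towards the mouse
    ay = stiffness * (mouse[1] - p.y) + gravity   # screen y grows downwards
    p.vx = (p.vx + ax * dt) * damping             # momentum, slowly damped
    p.vy = (p.vy + ay * dt) * damping
    p.x += p.vx * dt
    p.y += p.vy * dt


def fm_voice(p, height=480.0):
    """Hypothetical mapping: height -> carrier Hz, speed -> modulation index."""
    carrier = 100.0 + 900.0 * (1.0 - p.y / height)
    index = min(math.hypot(p.vx, p.vy) / 100.0, 10.0)
    return carrier, index
```

With gravity present, a particle settles below the cursor, where the elastic pull balances gravity, at a sag of `gravity / stiffness` pixels (50 with these values).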
Branch Study
In this study, initial mouse gestures are analysed to deduce an estimated polynomial equation, which is used as a seed for a generative algorithm; the analogy to fractal models of branching is strong. These branches may then be ‘gene-spliced’ with new gestures using a drag-and-drop movement. While the generative element of this piece was very intuitive and satisfying, the reductive ‘gene-splicing’ algorithm tended to gradually homogenise the branch forms, so the resulting sonification was not a particularly rich musical result. I am keen to develop these ideas further in future research by introducing chaotic factors to keep the results continuously interesting.
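A minimal Python sketch of the seeding step: a least-squares polynomial is fitted to the gesture (assumed here to be normalised so that x runs from 0 to 1), then the fitted curve is drawn recursively, each branch splitting into two smaller, rotated copies at its tip. The spread and shrink factors are hypothetical, and the ‘gene-splicing’ step is not modelled.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):              # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def grow(coeffs, origin, angle, length, depth, spread=0.5, shrink=0.6, steps=8):
    """Return polylines: the fitted curve drawn from origin at angle, then
    two smaller copies from its tip, recursing depth more levels."""
    ox, oy = origin
    c, s = math.cos(angle), math.sin(angle)
    base = evaluate(coeffs, 0.0)
    pts = []
    for k in range(steps + 1):
        u = k / steps
        lx, ly = u * length, (evaluate(coeffs, u) - base) * length  # local frame
        pts.append((ox + lx * c - ly * s, oy + lx * s + ly * c))   # rotate, translate
    branches = [pts]
    if depth > 0:
        for turn in (-spread, spread):
            branches += grow(coeffs, pts[-1], angle + turn, length * shrink,
                             depth - 1, spread, shrink, steps)
    return branches
```

Because every branch reuses the same fitted curve, the whole tree inherits the character of the original gesture; a depth of `d` yields `2 ** (d + 1) - 1` polylines.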
Both Branch Study and Particles Study perhaps suffer from mathematical algorithms in the composer’s script that are too complex to be quickly understood and controlled in performance. I suspect, however, that these approaches are valid and interesting directions of research, tending towards the development of sound-generating tools rather than performance instruments as such.
I chose to build the visual models using Macromedia Flash (www.macromedia.com), which affords well-rendered animated vector graphics driven by an object-oriented scripting language (ActionScript, a derivative of ECMAScript). In the studies presented here, the polling or capture of the physical input device data takes place within the interactive movies created using Flash. The sonic models were built using Cycling ’74’s Max/MSP (Zicarelli 2002, www.cycling74.com), a graphical object-based environment that can be used to construct DSP tools (or ‘patches’).
The visual and sonic models communicate with each other by sending data strings as packets of binary data over a TCP connection. This was much simplified by using the XMLSocket feature in Flash and the Flashserver external for Max/MSP (Matthes 2004). The advantage of this mode of communication is that the visual and auditory models need not run on the same computer, so long as both computers share a network connection.
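Flash’s XMLSocket exchanges strings over TCP, each terminated by a zero byte, so any peer must split the incoming byte stream on that delimiter. The Python sketch below stands in for the receiving side purely for illustration; it is not the Flashserver external, and the port number and message contents are hypothetical.

```python
import socket


class ZeroTerminatedReader:
    """Accumulates TCP bytes and returns complete zero-terminated messages."""

    def __init__(self):
        self._buf = b""

    def feed(self, data: bytes) -> list:
        self._buf += data
        *complete, self._buf = self._buf.split(b"\0")   # last piece may be partial
        return [m.decode("utf-8") for m in complete]


def encode(message: str) -> bytes:
    """Frame a message with a trailing zero byte (UTF-8 text assumed)."""
    return message.encode("utf-8") + b"\0"


def serve(host="127.0.0.1", port=7000, handle=print):
    """Accept one visual-model client and pass each message to handle()."""
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        reader = ZeroTerminatedReader()
        with conn:
            while data := conn.recv(4096):
                for msg in reader.feed(data):
                    handle(msg)
```

The buffer matters because TCP preserves byte order but not message boundaries: one `recv` call may deliver half a message or several messages at once.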
Conceptual appraisal
Tying sonic and visual animation together assists in understanding the underlying integration (the composer’s script) through a kind of synaesthetic Gestalt. For example, in the Stroke Study the lines clearly become more opaque as they trigger a sound, so by following the visually beating rhythms and the changing lengths of the lines as they rotate, it can be easy to see which lines are responsible for which tones. By contrast, in the visually static world of the Path Study, it can be very difficult indeed to discern which parts of which paths represent which sounds. Clearly, time-based indications of these relationships are of the utmost importance in the design of the composer’s script, for “if the performer can comprehend the mappings embedded in an instrument, obviously a more refined performance can result.” (Goudeseune, 2002: 85).
However, the performer’s comprehension is only half of the equation: “Expression is the act of communicating meaning or feeling. Both player and listener, therefore, are involved in an understanding of the mapping between a player’s actions and the sounds produced.” (Fels, Gadd and Mulder, 2002: 110). Some of the studies presented here (and variations on them) have been performed in concert, with the visual element projected on large screens, so the audience must appreciate the audiovisual presentation without access to the personal, physical knowledge gained through actual performance. The statement of Fels et al. above was strongly supported by the audience members I talked to afterwards; many indicated in particular that because I had chosen to leave the mouse cursor visible in the projection, they were able to watch the performance gestures, which in turn helped them to comprehend the system and its instrumental use.
Of the three studies presented, I found the Stroke Study the most rewarding to use; I believe this is because the interaction incorporates two modes of activity, one being the instrument-building activity of creating paths, the other being closer to composition in playing these paths. Hunt et al. define this kind of interactivity as “…active-score interaction where the performer directly interacts with the audiovisual output of the piece (the score), influencing its subsequent evolution in a direct or indirect way.” (Hunt, Kirk, Orton and Merrison 1998: 202).
This is a very personal, subjective appraisal, however. While demonstrating the Stroke Study at the Goldsmiths College Installation Day (04 June 2004), it was very interesting to see how others responded to playing the instrument. Visitors took very different approaches to performance: some appeared to play more with the visual structures (many used the term ‘architectural’), others focussed on trying to discover the sonic range, and others still kept the interaction to a minimum by filling the canvas with paths and trigger sequences, then passively experiencing the gradually evolving visual landscape and soundscape, i.e. enjoying the algorithm!
Several particularly interesting areas for further research presented themselves during this project. Firstly, these studies have been visually driven; each was first inspired by a potential visual model, and the sonic model was chosen for its potential for an effective mapping strategy (with the exception of the decision to use an external source sound file for String Study). While there are many more (and more complex) archetypes of visual models to explore, it would be interesting to continue my studies of time-variant audiovisual canvasses driven primarily by sonic rather than visual models, for a contrasting perspective.
Secondly, the network separation of sonic and visual models in digital systems suggests another interesting area to consider: multi-client models, whether the multiplicity lies in the user interface, the sonic model and/or the visual model, or even the audience interface. For example, many ‘performers’ may drive their own visual interfaces, leading to a combined sonic output; or they may share a single audiovisual canvas and be able to transform or ‘play’ the marks made by other performers; or each audience member may have an individual sonic/visual model with which to interact.
Thirdly, the work for Particles Study and Branch Study suggested the development of audiovisual canvasses as tools for composers in the generation or arrangement of sound, outside of a performance context – the primary difference being the abstraction of composition-time from real-time. The most interesting challenge here would be that, as the visual model is time-variant, a distinct kind of spatial representation of composition-time would need to be chosen.
Finally, the research suggested the potential benefit of a generic model allowing various different algorithmic mappings between sonic, visual and physical input data to be constructed, tested and explored, as an experimental research tool.
The executable files were constructed using the Max/MSP runtime engine and Macromedia Flash MX, require Mac OS X 10.2.8 or higher, and assume the availability of the CoreAudio driver ‘Built-in audio controller’. Launch the sonic model application first, then launch the visual model application. Please refer to the README files in the .zip files for further information.
Fels, Sidney, Ashley Gadd and Axel Mulder, 2002: ‘Mapping transparency through metaphor: towards more expressive musical instruments’, Organised Sound Vol. 7, No. 2, pp. 109-126.
Goudeseune, Camille, 2002: ‘Interpolated mappings for musical instruments’, Organised Sound Vol. 7, No. 2, pp. 85-96.
Hunt, Andy, Ross Kirk, Richard Orton and Benji Merrison, 1998: ‘A generic model for compositional approaches to audiovisual media’, Organised Sound Vol. 3, No. 3, pp. 199-209.
Hunt, Andy and Marcelo Wanderley, 2002: ‘Mapping performer parameters to synthesis engines’, Organised Sound Vol. 7, No. 2, pp. 97-108.
Matthes, Olaf, 2004: ‘flashserver External for Max/MSP’, web source: http://www.nullmedium.de/dev/flashserver/download/flashserver.pdf
Mulder, A., S. Fels and K. Mase, 1997: ‘Empty-handed gesture analysis in Max/FTS’, web source: http://hct.ece.ubc.ca/publications/pdf/mulder-fels-mase-1997.pdf
Zicarelli, David, 2002: ‘How I Learned to Love a Program That Does Nothing’, Computer Music Journal Vol. 26, No. 4, pp. 44-51.