Embodied interaction in music
I too have “ditched”:http://interconnected.org/home/2005/04/12/my_40gb_ipod_has my large iPod for the “iPod Shuffle”:http://www.apple.com/ipodshuffle/, finding that “I love the white-knuckle ride of random listening”:http://www.cityofsound.com/blog/2005/01/the_rise_and_ri.html. But that doesn’t remove the need for a better small-screen-based music experience.
The pseudo-analogue interface of the iPod clickwheel doesn’t cut it. It can be difficult to control when accessing huge alphabetically ordered lists, and the acceleration or inertia of the view can be really frustrating. The combination of interactions: clicking into deeper lists, scrolling, clicking deeper still, turns into a long and tortuous experience if you are engaged in any simultaneous activity. Plus it’s difficult to use through clothing, or with gloves.
h3. Music and language
My first thought was something “Jack”:http://www.jackschulze.co.uk and I discussed a long time ago: using a phone keypad to type the first few letters of an artist, album or genre and seeing the results in real-time, much like “iTunes”:http://www.apple.com/itunes/jukebox.html does on a desktop. I find myself using this a lot in iTunes rather than browsing lists.
“Predictive text input”:http://www.t9.com/ would be very effective here, when limited to the dictionary of your own music library. (I wonder if “QIX search”:http://www.christianlindholm.com/christianlindholm/2005/02/qix_from_zi_cor.html would do this for a music library on a mobile?)
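As a rough sketch of what that would involve (in Python, with a made-up library and the standard keypad letter groups), matching typed digits against the digit sequences of names in your own collection might look like this:

```python
# Hypothetical sketch: T9-style lookup restricted to a music library.
# Each name is reduced to its keypad digit sequence, so a few keypresses
# narrow the whole library in real time. The library here is invented.

T9_KEYS = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}
LETTER_TO_DIGIT = {l: d for d, letters in T9_KEYS.items() for l in letters}

def to_digits(name):
    """Map a name to its keypad digit sequence, skipping other characters."""
    return ''.join(LETTER_TO_DIGIT.get(c, '') for c in name.lower())

def t9_search(library, typed_digits):
    """Return library entries whose digit sequence starts with the typed keys."""
    return [name for name in library
            if to_digits(name).startswith(typed_digits)]

library = ["Pixies", "Portishead", "Prince", "Sigur Ros"]
print(t9_search(library, "74"))   # '7'=pqrs, '4'=ghi: matches Pixies, Sigur Ros
```

Since the dictionary is just your own artists and albums, even two or three keypresses would usually be enough to get a shortlist on screen.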
Maybe now is the time to look at this as we see “mobile”:http://www.sonyericsson.com/spg.jsp?cc=gb&lc=en&ver=4000&template=pp1_loader&php=php1_10245&zone=pp&lm=pp1&pid=10245 “phone”:http://www.nokia.com/n91/ “music convergence”:http://www.engadget.com/entry/1234000540040867/.
h3. Navigating through movement
Since scrolling is inevitable to some degree, even within fine search results, what about using simple movement or tilt to control the search results? One of the problems with using movement for input is context: when is movement intended? And when is movement the result of walking or a bump in the road?
One solution could be a “squeeze and shake” quasi-mode: squeezing the device puts it into a receptive state.
Another could be more reliance on the 3 axes of tilt, which are less sensitive to larger movements of walking or transport.
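A toy sketch of how the squeeze quasi-mode and a tilt dead zone might combine (all thresholds and sensor scales here are invented):

```python
# Hypothetical sketch of the "squeeze and shake" quasi-mode: tilt only
# scrolls the list while the device is being squeezed, so walking motion
# and bumps are ignored the rest of the time. Values are made up.

SQUEEZE_THRESHOLD = 0.5   # normalised grip pressure needed to arm tilt input
TILT_DEADZONE = 5.0       # degrees of tilt ignored, to absorb hand tremor

def scroll_step(squeeze, tilt_degrees):
    """Map one (squeeze, tilt) sample to a scroll offset in list rows."""
    if squeeze < SQUEEZE_THRESHOLD:
        return 0                      # not in the quasi-mode: ignore tilt
    if abs(tilt_degrees) < TILT_DEADZONE:
        return 0                      # within the dead zone: hold position
    # Coarser tilt scrolls faster: one row per 10 degrees past the dead zone.
    step = int((abs(tilt_degrees) - TILT_DEADZONE) // 10 + 1)
    return step if tilt_degrees > 0 else -step

def scroll_list(position, length, squeeze, tilt_degrees):
    """Apply a scroll step, clamped to the list bounds."""
    return max(0, min(length - 1, position + scroll_step(squeeze, tilt_degrees)))
```

The point of the quasi-mode is that the accelerometer can stay noisy and cheap: intent is signalled by the grip, not inferred from the motion itself.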
I’m not sure about gestural interfaces: most of the prototypes I have seen are difficult to learn, and require a level of performativity that I’m not sure everyone wants to be doing in public space. But having accelerometers inside these devices would allow for hacking together other personal, adaptive gestural interfaces that would perhaps access higher-level functions of the device.
One gesture I think could be simple and effective would be covering the ear to switch tracks. To try this out we could add a light or capacitive touch sensor to each earbud.
With this I think we would have trouble with interference from other objects, like resting the head against a wall. But there’s something nicely personal and intimate about putting the hand next to the ear, as if to listen more intently.
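A toy version of the gesture logic, with made-up timing thresholds to filter out the wall-resting case:

```python
# Sketch of the ear-cover gesture, guarding against false triggers such as
# resting the head against a wall: only a brief cover-then-release on a
# single earbud counts as "skip track". Timings are guesses.

MIN_COVER = 0.1   # seconds: anything shorter is probably sensor noise
MAX_COVER = 1.0   # seconds: anything longer looks like resting on something

def is_skip_gesture(left_covered_for, right_covered_for):
    """True if exactly one earbud was covered for a deliberate, brief moment."""
    covered = [t for t in (left_covered_for, right_covered_for) if t > 0]
    if len(covered) != 1:
        return False          # both covered (hat? pillow?) or neither
    return MIN_COVER <= covered[0] <= MAX_COVER
```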
h3. More knobs
Things that are truly analogue, like volume and time, should be mapped to analogue controls. I think one of the greatest unexplored areas in digital music is real-time audio-scrubbing, currently not well supported on any device, probably because of technical constraints. But scrubbing through an entire album, with a directly mapped input, would be a great way of finding the track you wanted.
Research projects like the “DJammer”:http://www.hpl.hp.com/research/mmsl/projects/djammer/ are starting to look at this, specifically for DJs. But since music is inherently time-based there is more work to be done here for everyday players and devices. Let’s skip the interaction design habits we’ve learnt from the CD era and go back to vinyl 🙂
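To make concrete what a directly mapped scrub control implies: one absolute input position, a knob or slider from 0 to 1, addressing the whole album timeline. A sketch with invented track lengths:

```python
# Sketch of a directly mapped scrub control: sweeping one absolute input
# from 0.0 to 1.0 sweeps through every track on the album, vinyl-style.
# Track lengths (in seconds) are invented for illustration.

def scrub(tracks, position):
    """Map position in [0, 1] to (track index, seconds into that track)."""
    total = sum(tracks)
    t = max(0.0, min(1.0, position)) * total
    for i, length in enumerate(tracks):
        if t < length:
            return i, t
        t -= length
    return len(tracks) - 1, tracks[-1]   # position == 1.0: end of last track

album = [200, 100, 300]                  # a three-track album, 600 s total
print(scrub(album, 0.5))                 # halfway: start of the third track
```

With the playhead mapped absolutely rather than through accelerated scrolling, your hand learns where tracks live on the album, the way it does on a record.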
h3. Evolution of the display
Where displays are required, I hope we can be free of small, fuzzy, low-contrast LCDs. With new displays being printable on paper, textiles and other surfaces there’s the possibility of improving the usability, readability and “glanceability” of the display.
We are beginning to see signs of this with the OLED display on this “Sony Network Walkman”:http://dapreview.net/comment.php?comment.news.1086, where the display sits under the surface of the product material, without a separate “glass” area.
For the white surface of an iPod, the high-contrast, “paper-like surfaces”:http://www.polymervision.com/New-Center/Downloads/Index.html of technologies like e-ink would make great, highly readable displays.
So I really need to get prototyping with accelerometers and display technologies, to understand simple movement and gesture in navigating music libraries. There are other questions to answer: I’m wondering if using movement to scroll through search results would create the appearance of a large screen space, through the lens of a small screen. As with “bumptunes”:http://interconnected.org/home/2005/03/04/apples_powerbook, I think many more opportunities will emerge as we make these things.
h3. More reading
“Designing for Shuffling”:http://www.cityofsound.com/blog/2005/04/designing_for_s.html
“Thoughts on the iPod Shuffle”:http://interconnected.org/home/2005/04/22/there_are_two
“On the body”:http://people.interaction-ivrea.it/b.negrillo/onthebody/
These are some of my notes from Mikael Fernström’s lecture at AHO.
The aim of the “Soundobject”:http://www.soundobject.org/ research is to liberate interaction design from visual dominance, to free up our eyes, and to do what small displays don’t do well.
Reasons for focusing on sound:
* Sound is currently under-utilised in interaction design
* Vision is overloaded and our auditory senses are seldom engaged
* In the everyday world we are used to hearing a great deal
* Adding sound to existing, optimised visual interfaces does not add much to usability
Sound is very good at attracting our attention, so we have alarms and notification systems that successfully use sound in communication and interaction. We talked about using ‘caller groups’ on mobile phones where people in an address book can be assigned different ringtones, and how effective it was in changing our relationship with our phones. In fact it’s possible to sleep through unimportant calls: our brains are processing and evaluating sound while we sleep.
One fascinating thing that I hadn’t considered is that sound is our fastest sense: it has an extremely high temporal resolution (ten times faster than vision), so for instance our ears can hear pulses at a much higher rate than our eyes can watch a flashing light.
h3. Disadvantages of sound objects
Sound is not good for continuous representation, because we cannot shut sound out the way we can divert our visual attention. It’s also not good for absolute display: pitch, loudness and timbre are perceived relatively by most people, and even people with absolute pitch can be affected by contextual sounds. And context is a big issue: loud or quiet environments, libraries and airplanes for example, affect the way that sound must be used in interfaces.
There are also big problems with spatial representation in sound: techniques that mimic the position of sound based on binaural differences are inaccessible to about a fifth of the population. This perception of space in sound is also intricately linked to the position and movement of the head. “Some Google searches on spatial representation of sound”:http://www.google.com/search?&q=spatial+representation+of+sound. See also “Psychophysical Scaling of Sonification Mappings [pdf]”:http://sonify.psych.gatech.edu/publications/pdfs/2000ICAD-Scaling-WalkerKramerLane.pdf
‘Filling a bottle with water’ is a sound that could work as part of an interface, representing actions such as downloading and uploading, or as a replacement for progress bars. The sound can be abstracted into a ‘cartoonification’ that works more effectively: the abstraction separates simulated sounds from everyday sounds.
Mikael cites inspiration from “foley artists”:http://en.wikipedia.org/wiki/Foley_artist working on film sound design, who are experienced in emphasising and simplifying sound actions, and in creating dynamic sound environments, especially in animation.
A side effect of this ‘cartoonification’ is that sounds can be generated in simpler ways: reducing processing and memory overhead in mobile devices. In fact all of the soundobject experiments rely on parametric sound synthesis using “PureData”:http://www.puredata.org/: generated on the fly rather than using sampled sound files, resulting in small, fast, adaptive interface environments (sound files and the PD files used to generate the sounds can be found at the “Soundobject”:http://www.soundobject.org/ site).
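As a toy, non-PureData illustration of the parametric idea: the bottle-filling sound can be reduced to a resonant pitch that rises as the bottle fills, synthesised on the fly from the progress value alone. The frequency range here is an arbitrary choice of mine:

```python
# A minimal parametric take on the bottle-filling sound: as the "bottle"
# fills (a download progresses), the resonant pitch rises, the way a real
# bottle's gurgle does. Frequencies are arbitrary, not from soundobject.

import math

EMPTY_HZ = 220.0    # pitch of the empty bottle
FULL_HZ = 880.0     # pitch just before overflowing

def fill_pitch(progress):
    """Map progress in [0, 1] to a resonant frequency in Hz."""
    p = max(0.0, min(1.0, progress))
    # Exponential interpolation, since pitch perception is logarithmic.
    return EMPTY_HZ * (FULL_HZ / EMPTY_HZ) ** p

def fill_tone(progress, samples=64, rate=8000):
    """Synthesise a short sine burst at the current fill pitch."""
    f = fill_pitch(progress)
    return [math.sin(2 * math.pi * f * n / rate) for n in range(samples)]
```

A few lines of maths and a sine oscillator, rather than banks of sampled audio files: that is the small, fast, adaptive quality of the parametric approach.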
One exciting and pragmatic idea that Mikael mentioned was simulating ‘peas in a tin’ to hear how much battery is left in a mobile device. Something that seems quite possible, reduced to mere software, with the accelerometer in the “Nokia 3220”:http://www.nokia.com/phones/3220. Imagine one ‘pea’ rattling about, instead of one ‘bar’ on a visual display…
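A sketch of how little software the pea rattle would need (pea count, shake threshold and event names are all invented here):

```python
# Sketch of the 'peas in a tin' battery display: the battery level decides
# how many virtual peas are in the tin, and an accelerometer spike (a shake)
# triggers that many rattle events. All values are illustrative guesses.

MAX_PEAS = 5
SHAKE_THRESHOLD = 2.0    # g-force above which a movement counts as a shake

def peas_for_battery(level):
    """Map battery fraction in [0, 1] to a pea count (at least 1 if any charge)."""
    if level <= 0:
        return 0
    return max(1, round(level * MAX_PEAS))

def rattle(level, acceleration_g):
    """Return one rattle event per pea if the device was shaken, else none."""
    if acceleration_g < SHAKE_THRESHOLD:
        return []
    return ['rattle'] * peas_for_battery(level)
```

A full tin rattles busily, a nearly dead one gives a single lonely click: the state of the battery becomes something you feel and hear without looking.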
h3. Research conclusions
The most advanced prototype of a working sound interface was a box that responded to touch, with invisible soft-buttons on its surface that could only be perceived through sound. The synthesised sounds responded to the movement of the fingertips across a large touchpad-like device (I think it was a “tactex”:http://www.tactex.com/ device). These soft-buttons used a simplified sound model that synthesised _impact_, _friction_ and _deformation_. See “Human-Computer Interaction Design based on Interactive Sonification [pdf]”:http://richie.idc.ul.ie/eoin/research/Actions_And_Agents_04.pdf
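My guess at how that three-event model might map touchpad samples to sounds (this is my reconstruction for illustration, not code from the paper):

```python
# Guessed sketch of the impact/friction/deformation model: moving a
# fingertip onto a soft-button fires an 'impact', dragging across it
# sustains 'friction' (louder with speed), and pressing into it adds
# 'deformation'. Event names, units and thresholds are mine.

def button_events(on_button_before, on_button_now, speed, pressure):
    """Return (event, gain) pairs for one touchpad sample."""
    events = []
    if on_button_now and not on_button_before:
        events.append(('impact', 1.0))                 # finger arrives on button
    if on_button_now and speed > 0:
        events.append(('friction', min(1.0, speed)))   # gain tracks finger speed
    if on_button_now and pressure > 0.5:
        events.append(('deformation', pressure))       # pressing into the surface
    return events
```

Fed into a synthesiser, a stream of these events would let you feel your way around the invisible buttons by ear alone.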
The testing involved asking users to feel and hear their way around a number of different patterns of soft-buttons, and to draw the objects they found. See “these slides”:http://www.flickr.com/photos/timo/tags/soundobjects/ for some of the results.
The conclusions were that users were almost as good with sound interfaces as with conventional visual soft-button interfaces, and that auditory displays are certainly a viable option for ubiquitous, especially wearable, computing.
h3. More reading
“Gesture Controlled Audio Systems”:http://www.cost287.org/