Auditory Augmented Reality for Well-Being

Written By

Michael Frishkopf and Scott Smallwood

Submitted: 23 March 2025 Reviewed: 30 June 2025 Published: 25 August 2025

DOI: 10.5772/intechopen.1011786

From the Edited Volume

Augmented Reality - Situated Spatial Synergy [Working Title]

Prof. Michael Cohen

Abstract

Augmented reality (AR) harbors an ocularcentric bias toward visual augmented reality (VAR), at the expense of auditory augmented reality (AAR). The latter is more readily available, technologically simpler, and lower in cost, with great potential for improving human well-being, an application we call AAR for Well-being (AAR4W). In this chapter, we discuss the history of AR from this perspective, then present several AAR4W projects, each extending reality with acousmatic sound and informed, in varying proportions, by sound art, electroacoustic composition, ethnomusicology, and the health sciences; for each we provide context, technical description, experience, and future implications. Each project is designed to enhance well-being using soundscapes, which we define as steady-state, nonperiodic sound, primarily admixtures of natural, musical, and synthetic sounds. Soundscapes offer enormous potential to alter mood and promote mental health while remaining subliminal. Our projects are designed to support reflection, meditation, relaxation, and focus, with explicit intentions ranging among the aesthetic priorities of sound artists, the multicultural educational priorities of ethnomusicologists, and the therapeutic priorities of health care providers. We consider soundscapes an ideal AR vehicle for well-being, as they blend seamlessly with the real world, mix harmoniously with each other, require minimal technological intervention or cost, and fade non-distractively into the perceptual background, unlike VAR presentation of visual information, which tends to be perceptually and cognitively disruptive.

Keywords

  • augmented reality
  • sound
  • soundscapes
  • sound art
  • mental health
  • well-being
  • machine learning
  • soundwalking

1. Introduction: The varieties of AR experience

1.1 Definitions: AR, VAR, AAR

In a broad sense, augmented reality (AR) can be defined as a simulated sensory environment seamlessly superposed upon ordinary reality, rather than replacing it entirely, as in virtual reality (VR), such that the user perceives this environmental fusion through sensory channels. While the human senses are more numerous than the Aristotelian five,1 AR encompasses primarily visual and auditory channels; olfactory, haptic, and gustatory channels are rarely addressed due to technological obstacles. But for the most part, AR combines possibilities offered by visual augmented reality (VAR) and auditory augmented reality (AAR).2 AAR and VAR may aim for coordinated, congruent sensory fusion, providing auditory and visual perspectives on a common multisensory simulated environment (e.g., matching sights and sounds of wild animals, a moving vehicle, or a trumpet player), or they may generate independent auditory and visual environments.

Frequently, AR is narrowly equated (sometimes tacitly) with VAR. As Yang, Barde, and Billinghurst state in their review, referencing earlier authors: “…the overwhelming majority of AR research has been focusing on visual augmentation” [6, 9, 10, 11]. Prevailing definitions center on computer technology generating graphical images augmenting natural vision, using optical see-through displays to superimpose virtual visual objects over reality [12]. The following entry in a technical dictionary is typical of the ocularcentric bias:

augmented reality the idea that an observer's experience of an environment can be augmented with computer generated information. Usually this refers to a system in which computer graphics are overlaid onto a live video picture or projected onto a transparent screen as in a head-up display [13].

Modern VAR is predicated upon high-tech devices developed only in recent decades—such as stereographic (binocular) head-mounted or head-up displays (HMDs, HUDs), AR eyeglasses and contact lenses, and virtual retinal displays (VRDs)—to present virtual images to the user [14, 15]. Indeed, though AR dates to Morton Heilig’s 1962 Sensorama, which combined visual, acoustic, and olfactory stimulation [16, 17], and Sutherland’s early HMD [18], the term “augmented reality” was not coined until 1992, with the development of digital HMD technologies [19], which came to be intimately associated with the concept [20]. Textbook treatments of AR generally give short shrift to sound, limiting it to a support role in the service of virtual images [15] (for instance, both virtual avatars and non-player characters (NPCs) may need to speak), despite the fact that AAR is, in many respects, simpler, less expensive, and more emotionally powerful.

Actually, Durlach and Mavor’s foundational 1995 edited volume on virtual environments devotes its entire third chapter to the auditory channel, with prefatory remarks that presciently echo our own observations on low-tech AAR:

As indicated previously, the accomplishments and needs associated with the auditory channel differ radically from those associated with the visual channel. Specifically, in the auditory channel, the interface devices (earphones and loudspeakers) are essentially adequate right now. In other words, from the viewpoint of synthetic environment (SE) systems, there is no need for research and development on these devices and no need to consider the characteristics of the peripheral auditory system to which such devices must be matched. What is needed, however, is better understanding of what sounds should be presented using these devices and how these sounds should be generated. Accordingly, most of the material presented in this section is concerned not so much with auditory interfaces as with other aspects of signal presentation in the auditory channel [21].

There follows a discussion of how to communicate information to the user through the auditory channel, including (on p. 136) the related but distinct techniques of data audification and sonification (these terms are elaborated below), as well as spatial sound, all within the broader field of auditory display technology [22], while noting that “…relatively little attention has been given to augmented reality in the auditory channel” [21].

However, despite this broad “vision” for AAR early in AR’s history, in practice most subsequent AR research has focused on the visual field, as well as practical informational applications (medical, robotic, etc.), rather than creative or therapeutic ones, at least until the more recent era of AR gaming. Several widely cited overviews of AR exclude the auditory field entirely. Thus, Dey et al., citing Azuma: “Augmented Reality (AR) is a technology field that involves the seamless overlay of computer generated virtual images on the real world, in such a way that the virtual content is aligned with real world objects, and can be viewed and interacted with in real time (Azuma, 1997)” [9], or Zhou et al.: “Augmented Reality (AR) is a technology which allows computer generated virtual imagery to exactly overlay physical objects in real time” [10].

Yet, in his pioneering article, Ronald Azuma himself defined AR broadly as a system with three characteristics: “1) Combines real and virtual; 2) Interactive in real time; 3) Registered in 3-D” ([23], section 1.2). And while nearly all his examples are visual, he also emphasizes AR’s multisensory nature, specifically mentioning the possibility of AAR:

Augmented Reality might apply to all senses, not just sight. So far, researchers have focused on blending real and virtual images and graphics. However, AR could be extended to include sound. The user would wear headphones equipped with microphones on the outside. The headphones would add synthetic, directional 3–D sound, while the external microphones would detect incoming sounds from the environment. This would give the system a chance to mask or cover up selected real sounds from the environment by generating a masking signal that exactly canceled the incoming real sound (Durlach and Mavor 1995). While this would not be easy to do, it might be possible [23].

Azuma’s definition of AR emphasizes an important ingredient noted by many other authors (e.g., [20]), namely interactivity: the simulated sensory environment (whatever its modalities) is required to respond to the subject’s spatio-temporal environment: location, orientation, movement, time, and potentially internal physiological variables (e.g., heart rate) as well. While perhaps “not easy to do” at the time of his writing (1997), interactive augmentation in the auditory field is readily available today. For instance, earbuds can directionalize sound depending on head orientation or cancel unwanted noises, enabling greater focus on desired sounds. Hearing aids are a case in point. AAR may also interact with the visual field, for instance, reading visible texts aloud [8] or providing an audible dimension to virtual images. Several projects have developed sophisticated AAR for the blind and visually impaired, rendering the world as sound [24, 25, 26, 27, 28, 29]. But AAR need not be technologically complex to be effective. Unfortunately, nearly 30 years after Azuma’s speculations, comprehensive explorations of AAR concepts, technologies, applications, taxonomies, and potentials remain scarce, though several recent articles and one book (by a sound designer) herald an emerging disciplinary maturity (see [5, 6, 7, 30]).

1.2 AAR and VAR compared

VAR generally entails the technological complexity and cost of a mobile, head-mounted visual display in order to effectively fuse the simulated visual field with the real one. But when AR is defined as a computer-generated overlay occupying any sensory modality [31], a far lower-tech version is possible in the acoustic domain, where “…virtual auditory content is blended into the physical world to augment the user’s real acoustic environment.” Yet, “audio augmented reality (AAR) remains relatively under-explored” [6] as well as under-theorized, though the recent grounded theory taxonomy by Dam et al. [5, 30] represents a salutary step toward establishing AAR on a more solid foundation. As in feature film or video game soundtracks, auditory channels, though universally acknowledged as critical, tend to be subordinate to the visual ones. But, like VAR, AAR is powerful, though in a different way, and presents a much lower technological bar.

In its broadest form, AAR is familiar as recorded audio content played over a loudspeaker. Such AR offers a potentially enormous range of content without the need for more advanced technology, and thus at a much lower cost, with fewer barriers to entry. No digital technology is required for AAR; open-backed or bone-conduction headphones suffice, allowing the audio stream to blend seamlessly with the real environment. Indeed, any acousmatic3 sound fulfills AR’s broad sense definition. With the advent of portable, personal listening devices starting with the cassette-based Walkman (1979), through the portable CD and MP3 players, to the contemporary smartphone and the latest crop of smart glasses and lightweight speakers (headphones or earbuds), the radical impact of acousmatic environmental augmentation became portable too [33, 34, 35, 36].

Played through loudspeakers, sound is broadcast relatively omnidirectionally, providing a collective augmented reality by diffusing, reflecting, and diffracting throughout a space, circumnavigating solid objects that would block light, and thereby affecting all who inhabit it, as well as drawing them together through shared auditory experience. Played through personal listening devices, sound becomes mobile and can be personalized. Either way, acousmatic sound can transform mood and color other perceptual modalities far more readily than in the visual domain. The low-tech VAR equivalent to the loudspeaker, for instance, projecting images on one wall of a room,4 is far less immersive. Even when the projected image is large enough to encompass peripheral vision, the user is required to look in a particular direction. But augmentation of visual reality via smart eyewear, the equivalent of a personal listening device, requires the complexity and expense of an HMD in order to blend seamlessly with visual reality.5 Furthermore, at its simplest, the auditory modality is emotionally powerful, widely viewed as more affective, unmediated, soulful, or embodied than the visual, as numerous philosophers and neuroscientists have variously discussed across the ages ([40, 41, 42, 43]; [44] book 3, section 402). Even newer AR technologies have begun to recognize the advantages of AAR. Indeed, Meta’s first foray into the world of AR glasses, the Ray-Ban Meta smart glasses, has resulted in a system that is mostly geared toward sound and lacks an HMD due to current constraints of technology and cost.6

One technology critic has noted that music is the simpler, original AR: “Augmenting our visual field is a new feat, but augmenting our sense of sound is one of the oldest human tricks… When music plays in the background, it provides an emotional undertone for everyone in its presence.” And it does so without the need for intrusive, distracting devices strapped to one’s body. “Having music at our fingertips gives us something many of us lack but crave–control over our environment as well as control of our emotions… Even Google may have realized that augmented reality solutions don’t need a screen. They just need sound” [46].

AAR’s relative simplicity and lower cost depend in part on the fact that providing an immersive auditory field does not require bringing auditory objects into spatial focus, unlike the visual field, where reality augmentation requires spatial focusing on the retina. Indeed, the power of acoustic AR lies partly in the unfocused nature of sound, which tends to permeate enclosed spaces without casting sharp shadows. The tonotopic cochlea performs frequency analysis, and while auditory spatialization follows in the auditory cortex, sonic spatialization is relatively crude compared to spatialization of the visual field. Acousmatic sound takes advantage of these facts while radically widening the range of sonic possibility to anything that has been recorded. Acoustically transparent headphones are thus a simple, relatively low-tech way to augment acoustic reality without blocking it.

However, simply playing music or other sounds in the background does not satisfy everyone’s definition of AR. As stated earlier, in a narrower sense, AR (and VR) require a simulated interactive sensory environment responsive to, as well as superposed upon, the real one. To meet this definition, both AR and VR must respond to the user’s spatial attributes, location, and orientation, using sensors and processors to track GPS position, as well as head, hand, eye, or body movements. Interactivity makes both VAR and AAR more difficult and costly: simulated visual and acoustic fields must be continuously recomputed with corresponding changes in the user position. Such requirements entail additional input and output technology: cameras, accelerometers, magnetometers, gyroscopes, microphones, and GPS to map the environment and detect body and eye location, orientation, and movement; binocular displays and binaural speakers to render the virtual sensible. AR can thereby adjust the projected scene according to the real scene, combined with subject movement.

But even subject to this narrower definition, AAR is simpler than VAR. Given a user trajectory within an AV environment, each AR audio channel is a map from time to amplitude, while each video channel is a map from time and 2D raster pixel space to the four channels defining each pixel (e.g., R, G, B, alpha). And displaying visual information is far more complex than displaying acoustical information, as has been noted. Furthermore, when performing realistic AR modeling through binaural and binocular representations, auditory channels are less variable than the visual due to the more omnidirectional propagation of sound compared to light in the real domain and therefore also in the virtual one. Vision is highly directional, while audition is more omnidirectional and diffused, even when fully modeling binaural hearing by computing a head-related transfer function (HRTF). When the subject moves or turns, VAR necessitates rapid computations corresponding to the quickly changing spatial visual field; in contrast, the spatial auditory field remains more constant, even when computing binaural sound as a function of position, at least when simulating realistic sensory environments.
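
To make the asymmetry concrete, the following back-of-the-envelope sketch (in Python, with illustrative sample rates and resolutions that are our assumptions rather than figures from any particular AR system) compares the number of values per second a single audio channel and a single video channel must deliver:

```python
# Back-of-the-envelope comparison of per-channel data rates.
# All figures below are illustrative assumptions, not specifications of any AR device.

audio_sample_rate = 48_000           # samples per second; one amplitude value each
video_frame_rate = 60                # frames per second
video_resolution = (1920, 1080)      # pixels per frame (per eye, for a binocular HMD)
channels_per_pixel = 4               # e.g., R, G, B, alpha

audio_values_per_second = audio_sample_rate
video_values_per_second = (video_frame_rate
                           * video_resolution[0] * video_resolution[1]
                           * channels_per_pixel)

print(f"audio channel: {audio_values_per_second:,} values/s")
print(f"video channel: {video_values_per_second:,} values/s")
print(f"ratio: roughly {video_values_per_second // audio_values_per_second:,}x")
```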

Interactive sonic technologies have been developed with far fewer technological machinations than required for their visual counterparts. To quote the same technological critic: “Further augmenting its effect, the music we hear can now be responsive to our environments. Spotify allows users to discover popular tracks in their cities or even sync their music to their run. Technology also exists to sync music with gaming experiences or to soundtrack your surroundings” [46, 47, 48, 49, 50].

But interactivity may also entail responsiveness to a user’s physiological attributes by means of biometric data, including heart rate (HR, indicating physiological arousal), heart rate variability (HRV, reflecting stress and autonomic nervous system activity), galvanic skin response (GSR, measuring emotional arousal), respiration rate (RR, related to stress and relaxation), and skin temperature, as well as data from electroencephalography (EEG, indicating brain activity) and electromyography (EMG, indicating muscle activation). This sort of interactivity is crucial in several AAR applications we discuss in this chapter.

VAR and AAR also differ in relation to cognitive (informational) communication versus emotional (affective) evocation: mentation vs. mood. The visual field superimposed in VAR is highly effective for the former, for instance, conveying airspeed to a pilot via a HUD, rendering critical data as a colored plot, or indicating positions of monsters in Pokémon Go [51]. AAR may function in the same way, carrying cognitive content via language (for instance, a synthesized voiceover describing a scene, virtual or real), as well as nonverbal auditory displays, including sonification (data-controlled sound) and audification (data rendered as sound) [22]. Using sonification, one might detect patterns in a dataset (perhaps a lengthy DNA sequence) by hearing it, while audification can render bounded oscillations (seismological data, say) audible. Whole research labs are dedicated to such topics [52]. Likewise, both VAR and AAR evoke mood and arouse emotion. But here the latter is more effective. Perhaps the affective power of sound is most widely recognized in the form of music. However, in our projects, we focus not on music as usually understood, but rather on “soundscapes.”
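
As a minimal illustration of audification (a hedged sketch, not drawn from the auditory display literature cited above, and using a synthetic stand-in for real measurement data), the following renders a one-dimensional data series directly as a waveform by normalizing it and writing it to a WAV file at an assumed playback rate:

```python
import wave
import numpy as np

def audify(series, playback_rate=8000, out_path="audified.wav"):
    """Audification: render a 1-D data series directly as a waveform.

    The series is centered, peak-normalized, and written as 16-bit mono PCM;
    playback_rate controls how quickly the data are swept past the ear.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                    # remove DC offset
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak                    # peak-normalize to [-1, 1]
    pcm = np.int16(x * 32767)           # scale to the 16-bit integer range
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)               # mono
        w.setsampwidth(2)               # 2 bytes = 16 bits per sample
        w.setframerate(playback_rate)
        w.writeframes(pcm.tobytes())

# Synthetic stand-in for bounded oscillatory data (e.g., a decaying oscillation).
t = np.linspace(0, 10, 10 * 8000)
audify(np.sin(2 * np.pi * 220 * t) * np.exp(-t / 4), playback_rate=8000)
```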

The soundscape concept first came to prominence through the work of Canadian composer and environmentalist R. Murray Schafer, who in the early 1970s wrote an influential text, The Tuning of the World, in which he introduced the term, encouraging the reader to consider the world soundscape as “a macrocosmic musical composition” [53]. With this text, along with a series of research projects out of Simon Fraser University, Schafer and his colleagues founded the field of acoustic ecology, along with the World Forum for Acoustic Ecology, an organization that still exists today, with its own journal, entitled The Soundscape. Schafer’s project, part activism and part artistic inquiry, inspired a new generation of listeners who combined this “world soundscape” sensibility with Deep Listening methodologies of Pauline Oliveros, an American composer and improviser who cultivated an immersive life practice of listening that complemented many of Schafer’s ideas [54].

Our projects center on soundscapes not as occurring naturally in the world, but rather soundscapes that are deliberately fashioned for aesthetic-educational-therapeutic purposes and generated acoustically. Therefore, our definition is much narrower. For our purposes, we succinctly define a soundscape as a stochastically steady-state, nonperiodic (and hence extensible), non-distractive (hence non-denotative7) acousmatic sound. Such sounds include collages of static natural sounds, such as rain, wind, surf, insects, or dawn choruses; sounds of human activity (e.g., highway traffic, cafe babble); synthetic “colored” noise (white, brown, pink, etc.; see [55]); as well as ambient and minimalist music, especially drones, and so-called sonic or acoustic “wallpaper” [56, 57]. Brian Eno, for example, who coined the term ambient music in the 1970s, conceived of room ambiences enhanced through musical processes, by which spaces could be suffused with “an atmosphere, or a surrounding influence: a tint” [58]. A great variety of sounds fit, or can be made (through judicious editing) to fit, our definition of soundscape, including various combinations of natural sounds, synthetic sounds, and musical sounds in genres such as ambient, drone, or minimalism, though not the broader category of music as usually conceived (and especially not song!). Following from this definition, it is easy to see that the set of all soundscapes (quite unlike music) is additively closed: mixing two soundscapes produces a third. This closure property—not shared with music—is quite useful, as we will observe. Furthermore, again, unlike most music, soundscapes tend to be non-distractive, fading into the perceptual background, rendering them more suitable for inducing mental calm and focus.
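
The closure property can be stated computationally: a weighted sum of two soundscape buffers, re-normalized, is again a soundscape. The following minimal sketch (with synthetic stand-ins for recorded material; names and weights are illustrative assumptions) shows one way such mixing might be done:

```python
import numpy as np

def mix_soundscapes(a, b, weight=0.5, headroom=0.9):
    """Mix two soundscape buffers (mono float arrays at the same sample rate).

    Because soundscapes are stochastically steady-state and nonperiodic,
    a weighted sum of two of them is itself a soundscape (additive closure).
    The result is peak-normalized to leave some headroom.
    """
    n = min(len(a), len(b))                          # truncate to the shorter buffer
    mixed = weight * a[:n] + (1.0 - weight) * b[:n]
    peak = np.max(np.abs(mixed))
    return mixed if peak == 0 else mixed * (headroom / peak)

# Example with synthetic stand-ins: noise standing in for rain, sines for a drone.
sr, seconds = 44_100, 5
t = np.arange(sr * seconds) / sr
rain = np.random.default_rng(0).normal(scale=0.3, size=sr * seconds)
drone = 0.2 * (np.sin(2 * np.pi * 110 * t) + np.sin(2 * np.pi * 165 * t))
ambience = mix_soundscapes(rain, drone, weight=0.6)
```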

While visual content may trigger emotional responses, as in the visual arts, and especially via emotionally disturbing imagery, the auditory channel is widely regarded as far more potent in this regard [59, 60], generating an affective response by indexing environment features, carrying paralinguistic emotion, evoking affective memory, or setting a mood via musical properties like timbre, melody, and harmony. Our response to natural soundscapes is probably shaped by evolution, stimulating the autonomic nervous system directly, triggering a stress response (sympathetic nervous system) for some sounds (e.g., roaring lion) and a relaxation response (parasympathetic nervous system) for others (e.g., chirping crickets) [61, 62, 63, 64, 65, 66, 67]. Further, audition is naturally immersive; unlike light, sound waves suffuse an enclosed space, and our auditory system is, correspondingly, always “open.” There is no auditory counterpart to “looking away” (“hearing away”?); as Schafer famously remarked, “there are no earlids” [68].

Yet another contrast between VAR and AAR lies in the possibilities for a collective social media experience. VAR can, of course, be synchronized across users, but for the purposes of realism, the technology must uniquely tailor visual imagery to each user in order to achieve perspectival interactivity; unless similarly positioned, no two people will receive the same signal (though they may access different perspectives on the same virtual space, by which the technology can nevertheless induce mediated socialization). However, AAR opens the possibility of a realistic collective social experience in which a group of people simultaneously hears the same sounds. This collective experience enables AAR to generate broad social connections in a way that VAR does not, just as a concert, which may gather thousands to participate in a synchronous communal aural event, is a more effective generator of collective experience than a museum exhibit, in which shared experiences are scarce and largely unsynchronized.

1.3 Auditory augmented reality for well-being (AAR4W)

In this chapter, we are primarily interested in the use of AAR to generate an independent simulated auditory environment, without any coordinated VAR analog, to enhance well-being. Compared to VAR, the relative technical simplicity, immersiveness, emotional power, and collective experience of AAR render it more effective, as well as more practical and economical, toward the promotion of well-being.

Well-being admits a wide array of definitions; here we focus on its broadest psychological and sociological dimensions: human health in a general sense—physical, mental, and social—particularly stress reduction, serenity, focus, insight, and interpersonal connection. Soundscapes appear to be ideal for well-being in this general sense. In our work, we therefore focus upon soundscapes, either as intentionally therapeutic tools or as sound art compositions inducing well-being as a side effect of an intended aesthetic experience. Being steady-state and non-distractive, soundscapes are extensible to arbitrary length, fading into the sensory background while strongly, if subliminally, shaping mood. The closure property mentioned earlier enables sonic mixing and collaging, opening new possibilities for their creation, deployment, and optimization: effective soundscapes can be combined and composited.

Such soundscapes serve to alter the intersensory perception of one’s environment, powerfully and unobtrusively shaping consciousness as surely as other environmental factors, such as lighting, temperature, and wall color. While both sounds and the AR devices that produce them may become objects of perceptual focus, they can also recede into the background realm of utility as naturalized practical instruments for interacting with the world, what Heidegger called “ready-to-hand” [69]. Just as a seasoned pianist, for whom the piano is a virtual bodily extension, needn’t think about her musical tool, neither must the wearer of a portable acousmatic playback device, which likewise becomes a bodily extension.

There is a large literature on the value of music and sound for well-being [63, 70, 71, 72, 73, 74], including among students [75, 76, 77, 78, 79, 80]. Many of these studies focus on soundscapes using a definition comparable to ours (steady-state aperiodic), though typically they do not describe these possibilities as AR. Sounds that activate the parasympathetic autonomic nervous system and thus elicit a relaxation response are widely regarded as therapeutic in a broad as well as specific sense, from accelerating convalescence in the hospital to enhancing focus in the classroom and generally improving mood [81, 82, 83]. Soundscapes (by our definition) tend to have this effect. The literature suggests that colored noise [84, 85] and predictable natural soundscapes are particularly effective [86, 87, 88, 89, 90, 91, 92]. No doubt human beings have evolved over millennia to interpret predictable (hence low-information) nature soundscapes featuring stochastically steady-state sounds resulting from the sonic summation of myriad unthreatening agents (such as chirping crickets, croaking frogs, rustling leaves, light rain, and gentle surf) as auditory streams requiring minimal perceptual attention, to which we therefore respond with decreased stress, increased relaxation, and improved mood. Such sounds contrast with unpredictable high-information impulse sounds emitted by potentially dangerous agents, like the roar of a wild animal or unexpected footsteps in one’s home, which trigger a high perceptual alert, requiring rapid processing and stimulating a stressful fight-flight response8 [86, 94, 95, 96].

Another immersive practice, soundwalking, adds to the already recognized benefits of walking (for physical and mental health) a method of listening and attunement to the sense of place along a specific, often artistically curated pathway. Soundwalks can be self-directed or curated and guided, and at a basic level involve moving attentively through an environment while practicing what Pauline Oliveros called Deep Listening: listening intentionally and with attention to analytical detail [54]. Hildegard Westerkamp—an original member of the aforementioned acoustic ecology movement of R. Murray Schafer, Barry Truax, and others—has written extensively on soundwalking as a practice, describing it as an “intense introduction to the experience of uncompromised listening” [97].

Soundwalking can also be mediated through technology, and this is where it leans into an AR practice. For example, Janet Cardiff’s audio walks, a huge part of her oeuvre since the 1990s, require the listener to wear open-backed headphones, which allow the local soundscape to leak through and blend with a mediated version of that same soundscape that Cardiff recorded earlier, alongside her reflections, instructions, musings, and storytelling [98]. In another example, Christina Kubisch’s Electrical Walks uses special headphones and instructions for listening to sounds created through electromagnetic induction by modern electronics spread throughout cities, including ATMs, security electronics, and other electronic equipment found in everyday urban environments. In this work, the listener hears sounds that would normally be impossible to hear, bringing into sonic awareness a whole layer of existence that we would not otherwise notice [99]. These two examples differ in their approaches to the normative soundscape of the environments in which they happen: Cardiff’s is more attuned to the sense of place and its potential for narrative and historical reflection, whereas Kubisch’s work brings us into a hidden layer where our imaginations can interpret the experience in personal ways. Additionally, both of these examples involve solo reflections, although often they are presented in groups of individual listeners who walk together.

Soundwalks can be mediated by older technologies as well, such as musical instruments or other sounds. One of the current authors experienced such a soundwalk in Koli National Forest in Finland, led by Pessi Parvianinen in 2010. In this walk through the forest, Parvianinen secretly populated the trail with musicians, each in a solo environment playing excerpts of music by Finnish composers or improvising with birds. It was a lovely way to bring an additional sonic layer into the natural sounds of the environment, inviting the participants to consider generations of musicians whose music was influenced by this environment.

Whether intended as aesthetic, therapeutic, or wellness interventions, and whether deliberately combined with real sounds (as in soundwalks) or not, AR soundscapes can contribute to well-being.

2. AAR4W: Past and present

Soundscape-based AR for well-being lies on a continuum between art and utility. At the former pole, projects seek to extend the scope of music composition by integrating sound with the lived environment, beyond the concert hall, embedding sound through installations. At the latter, projects explicitly seek to enhance well-being, using AR soundscapes to support mental health, especially mitigating stress and promoting calm focus. What follows is a review of some representative case studies.9

2.1 Coronium 3500 (Lucie’s Halo)

In 2014, the second author (Smallwood) created an environmental, site-specific sound installation called Coronium 3500 (Lucie’s Halo) on the site of Caramoor Center for Music and the Arts in Katonah, New York [101, 102, 103]. Part of a larger group of sound installations by various artists, this piece sought to tie the soundscape directly to the environment by (a) creating sounds inspired by the natural sounds of the site’s birds and insects and (b) creating sounds that were entirely dependent on available sunlight (see Figure 1). The resulting “solarsonic” voices, which used custom electronics, played through several small speakers distributed around a grassy area. These speakers were powered by solar panels without batteries, which thus produced a variable level of real-time energy, affecting both the intensity and tempo of musical phrases (see [101] for technical details).

Figure 1.

Coronium 3500 (Lucie’s Halo), installed at the Caramoor Center for the Arts in Katonah, NY, in 2014 and 2015. Photograph by Scott Smallwood.

Although it has since been exhibited elsewhere, the original intention was to create a piece tied directly to Caramoor, incorporating its sounds into a mix of other sounds. Field recordings of the installation site were used to create the sound-making circuitry in order to engineer a soundscape inclusive of the local one. Then, when the piece was installed, the environment itself “played” the piece through a combination of sunlight and shadows created by large deciduous trees surrounding the site. Indeed, even the changing seasons played a role in the resulting soundscape, since the sun’s angle, the presence or lack of leaves on the trees, and other environmental factors had a direct bearing on the sounds of the electronic “critters.” This was a risky proposition, since the piece was exhibited on-site for 2 years, and the visitor’s aural experience of the piece could not be controlled or determined. On a cloudy day, for example, the piece would be very quiet, with some voices never sounding at all. It was fun to watch people experiencing the piece, especially when the sun would suddenly come out from behind clouds to awaken previously silent voices, to the surprise and delight of the visitors.

2.2 Emergences/Expanded Anatomies

In another work from 2024, entitled Emergences/Expanded Anatomies, Smallwood collaborated with visual artist Sean Caulfield on a site-specific installation on the historic townsite of Malvina, Quebec, a colonial-era village near the New Hampshire border [104]. Combining multichannel audio with Caulfield’s prints and drawings, many of which were made on-site, the piece captured the historic and current reality of Malvina, inclusive of a garden with its wooden fence and small garden cottage (see Figure 2). The sound piece included sounds of found objects, including many historical objects from the original town, loaned by local residents, and worked into drones and textures created on-site by the artist. In addition to these sounds, the idea was also to include the remarkable sounds of Malvina’s birds, the wind and rain, the creek running by, and occasional tractors driving down the gravel road. Due to the lack of highway noise, these sounds were ever present in this “high fidelity” environment, and so rather than attempting to record them, the piece simply left room for them by including sparse spaces and silences through which they could be heard.

Figure 2.

Emergences, a 4-channel audio piece, with Sean Caulfield’s Expanded Anatomies, in Malvina, Quebec, August and September 2024. Photographs by Scott Smallwood.

In both of these examples, the “AR” is clear: the pieces incorporate the sounds of the place where the piece happens, making the place itself integral to the work. This work owes some inspiration to many of Schafer’s concert works, in which he eschewed the “temples of silence” that are concert halls in favor of natural environments that become part of the music and stories told in the work, often requiring the audience to travel to remote locations in northern Ontario and elsewhere [105]. Schafer’s work reflects a more general shift towards inclusion of space and place as active parameters, due, in part, to the rise of multichannel sound, but also to the opening up of sonic materials to include our human soundscape. In fact, the rise of sound art in the 1980s, a genre of art born in visual art culture in which sound is an artistic medium, introduced ways of thinking about the artistic uses of sound beyond music, and this includes sound in the context of site-specificity.

2.3 Sounding the Garden

Sounding the Garden [106] is an interdisciplinary, educational AR sound art project situated at the intersection of fine arts, humanities, social sciences, artificial intelligence, and engineering. It draws on student contributors, including electroacoustic music composers, world music performers, designers, and ethnomusicologists, aiming to support their development as artists and scholars.

In 2013, the Canadian Centre for Ethnomusicology (CCE) [107] and the Aga Khan Trust for Culture [108] collaborated on an international event entitled “I am a bird from Heaven’s Garden: Music, Sound, and Architecture in the Muslim World,” heralding the creation of a new Islamic garden (later named the “Aga Khan Garden”) to be funded by the Aga Khan and located on the site of the University of Alberta’s Botanic Garden [109, 110]. Out of this initiative emerged an AR project funded by the Canadian government’s Social Science and Humanities Research Council (SSHRC) entitled “Evolving the Botanic Garden,” which provided AR overlays as educational and aesthetic enhancements of the garden experience via visitors’ mobile devices [111].

As CCE director, the first author (Frishkopf) led construction of an AAR layer, Sounding the Garden [106], aiming to enhance the garden experience with poetry, music, and other sound, with a Sufi interpretation: the garden stroll becoming a mystical-aesthetic path. A team was recruited, including both authors along with graduate students in composition, ethnomusicology, computer science, design, and drama.

Sounding the Garden is a spatial mapping of mystical language, projecting a famous poem by the Persian Sufi poet Farid al-Din Attar (c. 1145–c. 1221) entitled “Language of the Birds” (Mantiq al-Tayr), with its seven-stage spiritual journey toward enlightenment, onto the physical garden. We associated each stage—or “valley” (wādi), to adopt Attar’s metaphor—with a section of the garden. An ensemble comprising students, alumni, and faculty of our music program then collectively created and recorded ambient improvisatory music in a Persian dastgah (melodic mode) corresponding (according to prominent Iranian music scholar, performer, and composer Majid Kiani) to each such “valley.” We also embedded relevant tracks from a CD series, “Music of Central Asia” [112], at specific locations in the garden’s augmented reality space, along with bird calls and poetic recitations of Attar’s masterpiece in Persian, with English translations and interpretations.

Traversing the garden counterclockwise (aligning with Islam’s spiritual direction), one progresses through the seven valleys in order while experiencing associated poetry, sounds, and music (see Figure 3). User position is tracked via GPS (or signaled by tapping on-screen icons), triggering the soundscape. In this way, the piece provides visitors a new sonic layer overlaid upon the remarkable soundscape of the garden itself, deepening affective understanding and enhancing well-being through sound.

Figure 3.

Sounding the Garden entails mapping the seven mystical valleys of Attar’s “Language of the Birds” onto the Aga Khan Garden in a counterclockwise circuit (starting from the Garden entrance, upper left): 1. The valley of the quest, 2. The valley of love, 3. The valley of insight into mystery, 4. The valley of detachment, 5. The valley of unity, 6. The valley of bewilderment, 7. The valley of poverty and nothingness.
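
A minimal sketch of how such GPS triggering might be organized is shown below; the coordinates, radii, and file names are hypothetical placeholders rather than the garden’s actual layout, and the geofencing scheme is illustrative rather than a description of the deployed app:

```python
import math

# Hypothetical geofence table: (name, latitude, longitude, radius in meters, audio file).
# These coordinates are placeholders, not the Aga Khan Garden's actual layout.
VALLEYS = [
    ("quest",   53.4075, -113.7595, 30, "valley1_quest.mp3"),
    ("love",    53.4072, -113.7601, 30, "valley2_love.mp3"),
    ("insight", 53.4069, -113.7607, 30, "valley3_insight.mp3"),
    # ... remaining valleys ...
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def current_valley(lat, lon):
    """Return the name and audio file for the zone containing the visitor, if any."""
    for name, vlat, vlon, radius, audio in VALLEYS:
        if haversine_m(lat, lon, vlat, vlon) <= radius:
            return name, audio
    return None  # outside all zones: only the garden's own soundscape is heard
```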

2.4 Autonomous adaptive soundscapes

In 2012, sound artist and composer Yoko Sen found herself in the emergency room of a hospital in New York, and the sonic hellscape she described and sought to escape was almost more traumatic than her medical reason for being there. As an electronic musician and composer, Sen started to imagine ways that such a hospital soundscape could be “composed” by sound artists to create a calming space, rather than one of chaos. Her project to redesign hospital alarm sounds and noises has caught the attention of hospital administrators and researchers, and experiments are now underway to collaborate with equipment manufacturers to create a more supportive sonic environment [113].

Hospital environments are excellent candidates for augmenting a stressful reality with soothing sounds. High stress levels and anxiety, associated with delirium and sleep deprivation, are very common in critically ill patients and may compromise recovery and survival, as well as increase the length and costs of hospital stays [114, 115, 116]. Pharmacological approaches typically used to treat these conditions have non-negligible expense, limited effectiveness, and serious side effects. As a means to counter high stress levels and related effects, music and sound therapies are low-cost and noninvasive, with far more limited side effects than medication. Research (including our own systematic review) has shown them to be highly effective if customized to the patient [117, 118]. However, critically ill patients cannot be expected to communicate effectively with music therapists, who are scarce, frequently unavailable, and costly. Linguistic or cultural differences between the patient and therapist may also limit their effectiveness. Furthermore, as discussed above, soundscapes (as we define them) present many advantages over music.

Building on and integrating knowledge from multiple disciplines (music, music therapy, computer science, nursing, and critical care medicine), we developed autonomous adaptive soundscapes (AAS), an intelligent bio-algorithmic system generating therapeutic soundscapes for critically ill patients, using machine learning and biofeedback to induce relaxation, improve sleep, and reduce agitation, anxiety, and delirium [81]. AAS seeks to optimize a patient’s sonic environment by dynamically selecting, tuning, and mixing soundscape files (currently around 10) drawn from an audio library spanning a wide range of recordings (natural, musical, and synthetic). A reinforcement learning approach [119] developed by computer scientist Martha Steenstrup guides the search of the soundscape space based on autonomic biosignals indicating the patient’s current state, thereby delivering a customized soundscape to the patient (see Figure 4). No conscious, active engagement with the system is required from the patient. Our objective is a system suitable for the intensive care unit (ICU), one that is highly effective, always available, simple to operate, minimally intrusive, and low risk.

Figure 4.

The autonomous adaptive soundscape system (from poster session; see [120]).
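
As an illustration of how autonomic biofeedback might be reduced to a scalar reward for such a learner (a hedged sketch; the reward function actually used in AAS may differ), consider mapping the change in mean heart rate while a soundscape plays to a value in [−1, 1]:

```python
def heart_rate_reward(hr_before, hr_during, max_change=10.0):
    """Map a change in mean heart rate to a reward in [-1, 1].

    A drop in heart rate while a soundscape plays is treated as relaxation
    (positive reward); a rise as arousal (negative reward). max_change sets
    how many beats per minute correspond to a full-scale reward. This is an
    illustrative reduction, not the published AAS reward function.
    """
    delta = (sum(hr_before) / len(hr_before)) - (sum(hr_during) / len(hr_during))
    return max(-1.0, min(1.0, delta / max_change))

# Example: mean heart rate fell from about 78 to about 73 bpm during playback.
print(heart_rate_reward([78, 79, 77], [74, 73, 72]))   # about 0.5
```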

AAS also promises enormous potential beyond the ICU, as stress, anxiety, and insomnia are pervasive social problems. Recent evidence-based research supports the use of music and sound for sleep [121] and has led to the specification of sonic criteria (rhythm, pitch, frequency, volume, genre, and duration) suitable for relaxation [122]. However, this research has not yet been operationalized in autonomous adaptive soundscapes. Experimentation with AAS was difficult to conduct in practice due to restrictions surrounding ICU access, particularly during the pandemic, but we ran experiments using a medical simulation laboratory [123]; results are pending.

2.5 Mindful Social Listening (MSL)

We decided to modify AAS to provide adaptive soundscapes for relatively healthy subjects outside medical facilities but nevertheless in need of support, starting with those closest to us: our students. For Mindful Social Listening (MSL), we removed the constraint that no conscious, active engagement with the system is required. However, we also imposed a new one: the MSL system should select soundscapes suitable for group rather than individual listening in order to catalyze student socialization and reduce isolation, exacerbated by the use of headphones and earbuds while studying in the library.

The motivation for such a system stemmed from a simple observation: Academic life is replete with exploration, discovery, learning, creativity…and stress. How can soundscapes enhance student well-being and academic success? Across all disciplines, postsecondary students suffer from the stress that inevitably accompanies the harried occupation of university life, impeding learning and jeopardizing well-being. This dire situation has become a veritable mental health crisis. As university mental health programs are woefully under-resourced, students rely on various forms of self-care. Our pilot study revealed that over 75% of students experience high levels of stress, 94% use music and sound to cope, and about 80% find this use highly effective, wearing personal listening devices (headphones, earbuds) to enhance concentration, mask distractions, and reduce stress. However, students may not be optimizing their sonic environments for stress reduction and focus; often, their selected soundscapes, or preferred genres of music, are actually distracting them from work by demanding attention. Furthermore, personal listening devices can not only be deleterious to hearing but also contribute to social isolation. We sought instead to develop an autonomously evolving, shared, non-distracting augmented sonic reality: a soothing sonic factor analogous to such common (but nonadaptive) ergonomic factors as room temperature, lighting, seating, and interior design.

We address these limitations by developing an autonomous system that generates a responsive soundscape environment in a social listening space. Building on AAS, combining data mining, machine learning, and algorithmic composition, and drawing on field recordings, applied ethnomusicology, health science research, and creative sound art, we have designed a responsive soundscape environment that can autonomously adapt to feedback from socially interacting users, enhancing mindful calm and supporting academic well-being. Installed in a library study room on the University of Alberta campus, this octophonic system10 operates by gathering conscious feedback from students, then compiling this information to update a “value” for each soundscape, as shown in Figure 5.

Figure 5.

Left: Autonomous adaptive feedback cycle to optimize soundscape selection. Right: Flowchart representation of the MSL algorithm [124].

MSL deploys a “multi-armed bandit” reinforcement learning algorithm, balanced between exploration (seeking the optimal soundscape) and exploitation (making use of current knowledge), quite similar to that of AAS, and requiring calibration of three critical parameters: α, ε, and T. Each soundscape has a value ∈ [−1, 1], adjusted over time according to user feedback; together, these values comprise a value vector. Initially, all values are set to 1. After playing a soundscape for T minutes, we compute its reward as the average user response: reward ∈ [−1, 1]. Next, we recompute its value as a weighted average between the new reward (weight = α) and the old value (weight = 1−α). Finally, we select the next soundscape: with probability ε, select at random (exploration); with probability 1−ε, select the highest-valued soundscape (“greedy” exploitation). Currently, we explore a small set of 10 soundscapes (like AAS) but plan to develop the algorithm to enable exploration of larger sets (see below). Adaptation is difficult due to change: users may enter or exit, and their preferences may shift. Simulations show that, with proper parameter settings, such a system can adapt to play effective sounds addressing user preferences, provided the change is not too rapid.
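
The update and selection rules just described can be sketched in a few lines of Python; the class and variable names are illustrative, and the parameter values shown are placeholders rather than our calibrated settings:

```python
import random

class SoundscapeBandit:
    """Epsilon-greedy multi-armed bandit over a small soundscape library.

    Each soundscape's value starts at 1 (optimistic initialization) and is
    updated as a weighted average of the newest reward and the old value,
    as described in the text: value <- alpha * reward + (1 - alpha) * value.
    """

    def __init__(self, soundscapes, alpha=0.3, epsilon=0.1):
        self.values = {s: 1.0 for s in soundscapes}
        self.alpha = alpha
        self.epsilon = epsilon

    def select(self):
        """Explore with probability epsilon, otherwise exploit the best value."""
        if random.random() < self.epsilon:
            return random.choice(list(self.values))        # exploration
        return max(self.values, key=self.values.get)       # greedy exploitation

    def update(self, soundscape, reward):
        """reward in [-1, 1]: the average user response after T minutes of play."""
        old = self.values[soundscape]
        self.values[soundscape] = self.alpha * reward + (1 - self.alpha) * old

# One cycle: play the selected soundscape for T minutes, average the feedback, update.
bandit = SoundscapeBandit([f"soundscape_{i}" for i in range(10)])
choice = bandit.select()
bandit.update(choice, reward=0.4)   # e.g., mean of user ratings collected during play
```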

3. Conclusion: Possible AAR4W futures

As Yogi Berra (or perhaps Niels Bohr) famously quipped, it is always difficult to make predictions, especially about the future. Yet, in our exploration of auditory augmented reality for well-being, many future avenues have suggested themselves, and we are currently exploring several.

AAS currently depends on heart rate alone, a biosignal that is easy to measure but does not fully capture stress. We are considering other noninvasive real-time metrics, including HRV, GSR, RR, EMG, and EEG, and currently experimenting with the latter outside the ICU. We intend to extend MSL by incorporating a portal system, allowing users to log in so that individual value vectors can be computed and persistently retained from one session to another. Combining this concept with the idea of a single-user system would allow users to establish a value vector representing their own preferences prior to entry into a common listening environment. Then the social listening system would adjudicate among known preferences for the group of users presently inhabiting the space, rather than having to relearn for every group of users, as is now the case. Using collective or individual value vectors to identify the most popular soundscapes and then mixing them together (relying upon the additive closure property) is a possible strategy toward maximizing MSL’s effectiveness, highlighting a distinct advantage of soundscapes over music.

For both AAS and MSL, it may be possible to estimate the value vector rather than learning it from scratch, given each user’s demographic profile and sufficient data linking such profiles to sonic preferences. We have already conducted experiments along these lines, correlating various demographic parameters (including gender, age, place of birth, personality traits, education, musical preferences, and medical history) with sonic preferences, and all that is required is to gather more data.
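
One simple way to operationalize such warm-starting (an illustrative scheme under stated assumptions, not the method used in our experiments) is to average the stored value vectors of the most demographically similar past users:

```python
import numpy as np

def estimate_value_vector(new_profile, past_profiles, past_values, k=5):
    """Warm-start a newcomer's soundscape value vector from demographic similarity.

    new_profile: 1-D feature vector encoding demographics (age, preferences, ...).
    past_profiles: (n_users, n_features) array of previous users' encoded profiles.
    past_values: (n_users, n_soundscapes) array of their learned value vectors.
    Returns the mean value vector of the k nearest past users, a starting point
    the bandit can then refine.
    """
    d = np.linalg.norm(past_profiles - new_profile, axis=1)   # Euclidean distance
    nearest = np.argsort(d)[:k]
    return past_values[nearest].mean(axis=0)

# Example with synthetic data: 50 past users, 6 demographic features, 10 soundscapes.
rng = np.random.default_rng(1)
profiles, values = rng.normal(size=(50, 6)), rng.uniform(-1, 1, size=(50, 10))
start_vector = estimate_value_vector(rng.normal(size=6), profiles, values)
```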

It may also be possible to explore far larger collections of soundscapes if they can be arranged in a hierarchical or structured way, rather than as a flat data set, so as to facilitate heuristic search. For instance, if an algorithm can determine that users wish to hear sounds of water flowing rather than bird sounds, then the category of water flowing sounds can be explored through structured browsing, analogous to “drill-down” menus, or a progressive gallery exposure.
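
A minimal sketch of such drill-down exploration, assuming a hypothetical two-level category tree and reusing the epsilon-greedy value updates described above, might look as follows:

```python
import random

# Hypothetical two-level library: categories first, then soundscapes within each.
LIBRARY = {
    "water":  ["gentle_rain", "creek", "ocean_surf"],
    "birds":  ["dawn_chorus", "forest_birds"],
    "drones": ["gong_wash", "synth_pad"],
}

# One value table per level, all optimistically initialized to 1.
category_value = {c: 1.0 for c in LIBRARY}
item_value = {c: {s: 1.0 for s in items} for c, items in LIBRARY.items()}

def drill_down(epsilon=0.1):
    """Pick a category, then a soundscape within it (epsilon-greedy at both levels)."""
    def pick(vals):
        if random.random() < epsilon:
            return random.choice(list(vals))    # exploration
        return max(vals, key=vals.get)          # greedy exploitation
    category = pick(category_value)
    return category, pick(item_value[category])

def update(category, soundscape, reward, alpha=0.3):
    """Propagate the observed reward to both levels of the hierarchy."""
    category_value[category] = alpha * reward + (1 - alpha) * category_value[category]
    item_value[category][soundscape] = (alpha * reward
                                        + (1 - alpha) * item_value[category][soundscape])

category, soundscape = drill_down()
update(category, soundscape, reward=0.3)
```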

In any sort of social listening space, it may also be possible to localize soundscapes by spatially tracking users, delivering relevant sounds to the subjects who prefer them (while sparing those who do not). This sort of tracking is increasingly possible with advances in geolocation technologies together with the availability of directional loudspeakers, as well as recently emerging technologies for constructing audible enclaves that can be transmitted through curved paths in space [125], though taken to an extreme, personalized sound undermines the principle of social listening (and could be implemented more simply and effectively through headphones). MSL could also incorporate autonomic biosignal data, similar to AAS.

Moving beyond biosignals, nonhuman environmental signals in the listening space could also be included in an MSL-type system, including acoustic, optical, thermal, temporal, and geospatial sensors (e.g., adapting sound to the user’s perceptual field, using cameras and AI to link to physical landmarks or lighting, as well as temperature, season, time of day, or place). Imagine a computer equipped with sensors for sound (microphones), lighting, temperature, humidity, GPS, and a clock, and a sensor to estimate how many people are in the room. All such data could be combined with biosignals and conscious responses as a means of determining, for example, that people prefer a certain kind of soundscape at dusk versus dawn.

MSL team member Khurram Javed has developed a prototype spatial sound single-user system, allowing the subject to position sounds relative to an on-screen avatar, then indicate the degree of satisfaction for each configuration [126]. The idea is that the system will explore the possibilities while bearing in mind configurations that have already been evaluated in this way. The same system can be used for social listening situations with public speakers, combining the spatial sound preferences of everybody in the room. Experimentation with this system, in collaboration with students at NYU’s MARL [127], to address the needs of attention deficit hyperactivity disorder (ADHD) students is in its initial phases.

New soundscapes for these and other projects can be developed using audio resources gleaned by combing through archives, online and off, as well as new recordings made both in the studio and in the field, and new compositions in the emerging domain of sound art. Indeed, we have already curated over 700 soundscapes using online sources alone. Stacy Bliss [128]—educator, musician-improviser, and sound artist/therapist—specializes in the sonorities of gongs and bowls for sound baths of meditative healing. In 2023, we invited her to the University of Alberta to record improvised gong performances in our Sound3Lab recording studio. With these recordings, we generated a library of gorgeous textures, which we hope to use in future systems as layers for soundscape generation, potentially linked to user profiles and preferences. Two graduate students are currently working on 8-channel pieces utilizing these recordings. Similarly, composer and sound designer Greg Mulyk contributed field recordings, and he and composer Nicolás Arnáez composed soundscapes for MSL.

It is also possible to conceptualize generalizations of soundscapes that operate beyond the range of hearing, exploring low and infrasonic frequencies that are sensed haptically more than auditorily, or tapping into the auditory system directly in a manner analogous to retinal displays. Not only are low frequencies known to be therapeutic, but also such a system would be advantageous for the hard-of-hearing or deaf community.

However, soundscapes extend beyond data in the form of sound files and could include procedural algorithms, including stochastic processes and generative AI. One of our graduate students, Deepak Paramashivan, is developing promising generative AI techniques to produce therapeutic soundscapes from smaller bits of sound (granular synthesis), potentially down to the level of an individual sample. Such a system might also be able to generate a soundscape given a user’s stated musical preferences. Although music and soundscapes are different according to our definitions, it may be possible to transform music into a soundscape using AI or other techniques, focusing on static aspects, such as timbre, sonority, harmony, and texture, rather than melody, rhythm, or developmental form.
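
As a hedged illustration of the general technique (not Paramashivan’s system), the following sketch scatters short windowed grains of a source recording across an output buffer, discarding melody and rhythm while retaining timbre and sonority:

```python
import numpy as np

def granular_texture(source, out_seconds=30, sr=44_100,
                     grain_ms=80, density=40, seed=0):
    """Build a steady-state texture by scattering windowed grains of a source sound.

    source: 1-D float array (any recording, even a short musical fragment).
    grain_ms: grain length in milliseconds; density: grains started per second.
    Grains are Hann-windowed and overlap-added at random positions, which
    discards melody and rhythm while preserving timbre and sonority.
    """
    rng = np.random.default_rng(seed)
    grain_len = int(sr * grain_ms / 1000)
    out = np.zeros(int(sr * out_seconds))
    window = np.hanning(grain_len)
    for _ in range(int(out_seconds * density)):
        src_start = rng.integers(0, len(source) - grain_len)
        dst_start = rng.integers(0, len(out) - grain_len)
        out[dst_start:dst_start + grain_len] += (
            source[src_start:src_start + grain_len] * window)
    peak = np.max(np.abs(out))
    return out if peak == 0 else out / peak * 0.9

# Example: turn a few seconds of any recording (here a synthetic chord) into a texture.
sr = 44_100
t = np.arange(sr * 4) / sr
chord = 0.3 * (np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 277 * t))
texture = granular_texture(chord, out_seconds=30, sr=sr)
```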

Extending to other sensory systems, such soundscapes as have been described could be combined with visual displays. Projecting a soundscape into the visual field can be effected in a variety of ways, from the low-tech (e.g., variable lighting in the room, including color, or corresponding images on a display screen) to the high-tech (all the apparatus of AR, from glasses to retinal projection).

As we enter an age of increasing use of wearable technology with lower costs, lighter weight, and better quality, sounds tied to bodily states and daily function are increasingly likely use cases. Already, one can walk to the gym, swim laps, and walk from there to work, while one’s smartwatch tracks every step and stair climb and knows how many strokes one swam and how many calories one burned, all with very little prompting. And already, one’s wireless earbuds serve endless musical and podcast content, often controlled through voice commands and no physical interfacing. Features such as noise cancelation can shut out much of the world, while bone-conductive technologies can provide good-quality sound without blocking the ears’ listening mechanisms.

The technologies of sound production are also now linked to technological forms of personalization, including physiological responses through sensors, but also to our social media data; yet, the linking of these things has only just begun to come into focus.11 One aspect of this is the personal space of the phone/earphone-scape, but what is really exciting to us is the possibility of group soundscape generation, where the soundscape is linked to a group of individuals in real time, reflecting a kind of “public hum,” inside of which would be places for individual escape or contemplation.

In addition to technological affordances, cultural use of sound has expanded beyond music produced in “temples of silence” [105] and includes a whole panoply of soundscape and nature recordings, ambient music, site-specific sound art, and other sound designs meant to enhance one’s mood within their current environment. We have surveyed some artists who work in these realms, but it is also worth noting the various genres of meditative practices that commonly use music designed specifically for mindfulness and healing. Many artists have found places for themselves within the larger ecosystems of mindfulness practices. For example, Kaitlyn Aurelia Smith’s 2013 album Tides: Music for Meditation and Yoga was commissioned by her mother for use in her meditation practice and became a sensation in the electronic music community, being rereleased with special vinyl editions in 2019. Much of this music is created with analog synthesizers and harmonized vocals, which create uniquely rich forms of droning textures.

In summary, whether explicitly geared toward meditation, mental health, or aesthetic experience, and whether conceived as sound art, music, education, or therapy, the affective power of acousmatic auditory augmented reality offers innumerable possibilities for creative innovation, experimentation, and deployment, ranging far beyond AR’s usual ocularcentric limitations to support human well-being in the widest possible sense.

Acknowledgments

Martha Steenstrup developed the ML algorithms for AAS, with system programming from Yourui Guo and Marc-André Haley, using Health Gauge watches for heart rate measurement; Martha led AAS experimentation at the NAIT medical simulation unit with support from Mariia Ostroha, Marc-André, Shaista Meghani, and Usha Pant. Greg Mulyk and Corona Wang gathered online soundscape recordings; Greg also recorded new ones in situ. Octophonic soundscapes were edited and composed by Greg and Nicolás Arnáez. Greg programmed MSL, and Khurram Javed developed the spatial sound version. MSL experimentation at the University of Alberta could not have proceeded without support from the library, which authorized speaker installation in a study room funded by a grant from United Way. MSL research at the University of Calgary was led by Chelsea Taylor, supervised by Stephanie Plamondon. Sounding the Garden ambient music recordings were organized by Hossein Hosseiniparvar, with sound design by Greg Mulyk and GPS programming by Yourui Guo. Central Asian tracks are courtesy of Fairouz Nishanova, Director of the Aga Khan Music Programme, and Smithsonian Folkways Recordings. The larger Aga Khan Garden web app project was led by Hussein Keshani. Tom Merklinger’s tireless work keeps the Sound Studies Institute running. We also wish to acknowledge and deeply thank our funders: the University of Alberta’s Killam Foundation (MSL) and Vice President Research and Innovation Pilot Seed Grant (AAS), as well as two of Canada’s federal research programs, the New Frontiers in Research Fund (AAS) and the Social Sciences and Humanities Research Council (Sounding the Garden).

Thanks

The authors would like to express their sincere gratitude to the editor of this volume, Michael Cohen, for his incisive critiques, valuable recommendations, and exemplary editorial precision.

Abbreviations

AAR: auditory augmented reality, audio augmented reality, or augmented aural reality
AAR4W: auditory augmented reality for well-being
AAS: autonomous adaptive soundscapes
AR: augmented reality
CCE: Canadian Centre for Ethnomusicology
EEG: electroencephalography
EMG: electromyography
GPS: global positioning system
GSR: galvanic skin response
HR: heart rate
HRTF: head-related transfer function
HRV: heart rate variability
HUD: head-up display
ICU: intensive care unit
LBE: location-based entertainment
LFE: low-frequency effects
MSL: Mindful Social Listening
RR: respiration rate
VAR: visual augmented reality

References

1. Polansky RM. Aristotle’s De Anima. New York: Cambridge University Press; 2007
2. Aristotle. On the Soul: On Breath; Parva Naturalia. Cambridge, Mass./London: Harvard University Press/W. Heinemann Ltd.; 1935. (Loeb Classical Library)
3. Brandt T, Dieterich M, Huppert D. Human senses and sensors from Aristotle to the present. Frontiers in Neurology. 2024;15:1404720
4. Fritzsch B, editor. The Senses: A Comprehensive Reference. 2nd ed. Cambridge, MA: Academic Press; 2021. Available from: https://www.sciencedirect.com/science/referenceworks/9780128054093
5. Dam A, Siddiqui A, Leclercq C, Jeon M. Taxonomy and definition of audio augmented reality (AAR): A grounded theory study. International Journal of Human-Computer Studies. 2024;182:103179
6. Yang J, Barde A, Billinghurst M. Audio augmented reality: A systematic review of technologies, applications, and future research directions. Journal of the Audio Engineering Society. 2022;70(10):788-809
7. Harju M. Audio Augmented Reality: Concepts, Technologies and Narratives. London: CRC Press; 2025
8. Manthorpe R. Listen Up: Augmented Reality is Coming to Your Ears. Wired; 2016. Available from: https://www.wired.com/story/aural-augmented-reality/
9. Dey A, Billinghurst M, Lindeman RW, Swan JE. A systematic review of 10 years of augmented reality usability studies: 2005 to 2014. Frontiers in Robotics and Artificial Intelligence. 2018;5:1-28. Available from: https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2018.00037/full
10. Zhou F, Duh HBL, Billinghurst M. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. In: 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality. New York City: ACM; 2008. pp. 193-202. Available from: https://ieeexplore.ieee.org/document/4637362
11. Kim K, Billinghurst M, Bruder G, Duh HBL, Welch GF. Revisiting trends in augmented reality research: A review of the 2nd decade of ISMAR (2008-2017). IEEE Transactions on Visualization and Computer Graphics. 2018;24(11):2947-2962
12. Butterfield A, Ngondi GE, Kerr A, editors. Virtual reality. In: A Dictionary of Computer Science. Oxford, UK: Oxford University Press; 2016. Available from: https://www.oxfordreference.com/display/10.1093/acref/9780199688975.001.0001/acref-9780199688975-e-5714
13. Laplante PA, editor. Dictionary of Computer Science, Engineering and Technology. Boca Raton: CRC Press; 2017
14. Mohn E. Augmented reality. In: Salem Press Encyclopedia of Science. Amenia, NY: Salem Press; 2024
15. Peddie J. Augmented Reality. Cham: Springer International Publishing; 2017. Available from: http://link.springer.com/10.1007/978-3-319-54502-8
16. Rheingold H. Virtual Reality. Ann Arbor: Summit Books; 1991. Available from: http://archive.org/details/virtualreality00rhei_0
17. Heilig M. Sensorama simulator. Patent #3,050,870. 1962
18. Sutherland IE. A head-mounted three dimensional display. In: Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I. New York, NY, USA: Association for Computing Machinery; 1968. pp. 757-764. (AFIPS ‘68 (Fall, part I))
19. Caudell TP, Mizell DW. Augmented reality: An application of heads-up display technology to manual manufacturing processes. In: Hawaii International Conference on System Sciences. New York City: ACM; 1992. pp. 659-669. Available from: https://tweakers.net/files/upload/329676148-Augmented-Reality-An-Application-of-Heads-Up-Display-Technology-to-Manual-Manufacturing-Processes.pdf
20. Rampolla J, Kipper G. Augmented Reality: An Emerging Technologies Guide to AR. Waltham, MA: Syngress; 2013
21. Mavor AS, Durlach NI. Virtual Reality: Scientific and Technological Challenges. Washington, D.C.: National Academies Press; 1995. (Committee on Virtual Reality Research and Development, Computer Science and Telecommunications Board, National Research Council)
22. Kramer G. Auditory Display: Sonification, Audification, and Auditory Interfaces. Reading, Mass: CRC Press; 1994
23. Azuma RT. A survey of augmented reality. Presence: Teleoperators & Virtual Environments. 1997;6(4):355
24. Loomis JM, Golledge RG, Klatzky RL. Navigation system for the blind: Auditory display modes and guidance. Presence: Teleoperators and Virtual Environments. 1998;7(2):193-203
25. Meijer PBL. An experimental system for auditory image representations. IEEE Transactions on Biomedical Engineering. 1992;39(2):112-121
26. Capelle C, Trullemans C, Arno P, Veraart C. A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Transactions on Biomedical Engineering. 1998;45(10):1279-1293
27. Auvray M, Hanneton S, O’Regan JK. Learning to perceive with a visuo-auditory substitution system: Localisation and object recognition with ‘The vOICe’. Perception. 2007;36(3):416-430
28. Liu Y, Stiles NR, Meister M. Augmented reality powers a cognitive assistant for the blind. Rieke F, Marder E, editors. eLife. 2018;7:e37841
29. Albouys-Perrois J, Laviole J, Briant C, Brock AM. Towards a multisensory augmented reality map for blind and low vision people: A participatory design approach. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery; 2018. pp. 1-14. (CHI ‘18)
30. Dam A, Siddiqui A, Leclerq C, Jeon M. Extracting a definition and taxonomy for audio augmented reality (AAR) using grounded theory. Proceedings of the Human Factors and Ergonomics Society Annual Meeting. 2022;66(1):1220-1224
31. Cipresso P, Giglioli IAC, Raya MA, Riva G. The past, present, and future of virtual and augmented reality research: A network and cluster analysis of the literature. Frontiers in Psychology. 2018;9:2086
32. Schaeffer P. Treatise on Musical Objects: An Essay across Disciplines. Berkeley, California: University of California Press; 2019. Available from: https://www.degruyter.com/document/doi/10.1525/9780520967465/html
33. Hosokawa S. The Walkman effect. Popular Music. 1984;4:165-180
34. Bull M. Sound Moves: iPod Culture and Urban Experience. New York: Routledge; 2008. (International Library of Sociology)
35. Sterne J. MP3: The Meaning of a Format. Durham: Duke University Press; 2012. (Sign, storage, transmission)
36. Sterne J. The Audible Past: Cultural Origins of Sound Reproduction. Durham: Duke University Press; 2003
37. AirPods 4. Apple. 2025. Available from: https://www.apple.com/airpods-4/
38. AirPods Max. Apple. 2025. Available from: https://www.apple.com/airpods-max/
39. Apple Vision Pro. Apple. 2025. Available from: https://www.apple.com/apple-vision-pro/
40. Langer SK. Feeling and Form; A Theory of Art. 1st ed. New York: Scribner; 1953. Available from: https://bac-lac.on.worldcat.org/oclc/301358920
41. Schopenhauer A. The World as Will and Representation. New York: Dover Publications; 1966. (Dover books of general interest)
42. Tsushima Y, Nakayama K, Okuya T, Koiwa H, Ando H, Watanabe Y. Brain activities in the auditory area and insula represent stimuli evoking emotional response. Scientific Reports. 2024;14(1):21335
43. Barthes R. Image, Music, Text. London: Fontana; 1977
44. Plato. The Republic: The Complete and Unabridged Jowett Translation. Vintage classics ed. New York: Vintage Books; 1991. Available from: http://catdir.loc.gov/catdir/enhancements/fy0703/90055688-d.html
45. Heath A. Meta’s Big Tease. The Verge; 2024. Available from: https://www.theverge.com/24253908/meta-orion-ar-glasses-demo-mark-zuckerberg-interview
46. Bair T. Augmented Reality is Already Here. It’s Called Music. Medium; 2015. Available from: https://medium.com/@TorBair/augmented-reality-is-already-here-it-s-called-music-fe60113bcdfc
47. Meilus L. Spotify’s Musical Map - The Most Popular Songs in 1,000 Cities - Thrillist. 2015. Available from: https://www.thrillist.com/travel/nation/spotify-s-musical-map-the-most-popular-songs-in-1-000-cities
48. Spotify Running. Available from: https://www.spotify.com/us/running/
49. Aurora. 2016. Available from: https://www.auroraapp.com/
50. Sonic Bloom. 2023. Available from: https://www.koreographer.com
51. Pokémon GO. Available from: https://pokemongo.com/en
52. Georgia Tech Sonification Lab -- home. Available from: http://sonify.psych.gatech.edu/
53. Schafer RM. The Tuning of the World. New York City: Knopf; 1977
54. Oliveros P. Deep Listening: A Composer’s Sound Practice. New York, NY: iUniverse; 2005
55. Zhivomirov H. A method for colored noise generation. Romanian Journal of Acoustics and Vibration. 2018;15(1):14-19
56. Kivy P. The fine art of repetition. In: The Fine Art of Repetition: Essays in the Philosophy of Music. Cambridge [England]: Cambridge University Press; 1993. pp. 327-359
57. Makomaska S. “Acoustic wallpaper” under control – The case of musique d’ameublement and Muzak. Interdisciplinary Studies in Musicology. 2021;21:39-55
58. Eno B. Ambient: Music for Airports. New York: Polydor EG Records; 1978
59. Chua P, Makris D, Herremans D, Roig G, Agres K. Predicting emotion from music videos: Exploring the relative contribution of visual and auditory information to affective responses. arXiv. 2022. Available from: http://arxiv.org/abs/2202.10453
60. Bradley M, Lang P. Affective reactions to acoustic stimuli. Psychophysiology. 2000;37:204-215
61. Valenti VE, Guida HL, Frizzo ACF, Cardoso ACV, Vanderlei LCM, de Abreu LC. Auditory stimulation and cardiac autonomic regulation. Clinics (São Paulo, Brazil). 2012;67(8):955-958
62. Joyner MJ, Charkoudian N, Wallin BG. The sympathetic nervous system and blood pressure in humans: Individualized patterns of regulation and their implications. Hypertension. 2010;56(1):10-16
63. Erfanian M, Mitchell AJ, Kang J, Aletta F. The psychophysiological implications of soundscape: A systematic review of empirical literature and a research agenda. International Journal of Environmental Research and Public Health. 2019;16(19):3533
64. Michels N, Hamers P. Nature sounds for stress recovery and healthy eating: A lab experiment differentiating water and bird sound. Environment and Behavior. 2023;55(3):175-205
65. Jo H, Song C, Ikei H, Enomoto S, Kobayashi H, Miyazaki Y. Physiological and psychological effects of forest and urban sounds using high-resolution sound sources. International Journal of Environmental Research and Public Health. 2019;16(15):2649
66. Patel AD. Music, Language, and the Brain. Oxford: Oxford University Press; 2008
67. Horowitz SS. Universal Sense: How Hearing Shapes the Mind. Paperback ed. New York: Bloomsbury; 2013
68. Schafer RM. The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books; 1994
69. Heidegger M. Being and Time. Oxford: Blackwell; 2006
70. Ahonen H, Deek P, Kroeker J. Low frequency sound treatment promoting physical and emotional relaxation qualitative study. International Journal of Psychosocial Rehabilitation. 2012;17(1):45-58
71. Cockerham D, Lin L, Chang Z, Schellen M. Cross-sectional studies investigating the impacts of background sounds on cognitive task performance. In: Parsons TD, Lin L, Cockerham D, editors. Mind, Brain and Technology: Learning in the Age of Emerging Technologies. Cham: Springer International Publishing; 2019. pp. 177-194. (Educational Communications and Technology: Issues and Innovations). DOI: 10.1007/978-3-030-02631-8_10
72. Lesiuk T. The effect of music listening on work performance. Psychology of Music. 2005;33(2):173-191
73. Shih YN, Huang RH, Chiang HY. Background music: Effects on attention performance. Work. 2012;42(4):573-578
74. Zhu Y, Huang N, Weng Y, Tong H, Wang X, Chen J, et al. Does soundscape perception affect health benefits, as mediated by restorative perception? Forests. 2023;14(9):1798
75. Dolegui AS. The impact of listening to music on cognitive performance. Inquiries Journal. 2013;5(09). Available from: http://www.inquiriesjournal.com/articles/1657/the-impact-of-listening-to-music-on-cognitive-performance
76. Kiss L, Linnell KJ. The effect of preferred background music on task-focus in sustained attention. Psychological Research: An International Journal of Perception, Attention, Memory, and Action. 2021;85(6):2313-2325
77. Kumar N, Wajidi MA, Chian YT, Vishroothi S, Ravindra S, Aithal A. The effect of listening to music on concentration and academic performance of the student: Cross-sectional study on medical undergraduate students. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2016;7:1190-1195
78. Morgan E. Music: A weapon against anxiety. Music Educators Journal. 1975;61(5):38-91
79. Muslimah M, Apriani W. The effect of listening to music on concentration and academic performance of the students: Cross-sectional study on English education college students. Journal of English Teaching, Applied Linguistics and Literatures (JETALL). 2020;3(1):27-32. Available from: https://ppjp.ulm.ac.id/journal/index.php/jetall/article/view/7779
80. Peretti PO, Swenson K. Effects of music on anxiety as determined by physiological skin responses. Journal of Research in Music Education. 1974;22(4):278-283
81. Frishkopf M. Autonomously adaptive soundscapes for stress reduction in the intensive care unit and beyond. AIP Conference Proceedings. 2023;2909(1):110001
82. Shu S, Ma H. Restorative effects of classroom soundscapes on children’s cognitive performance. International Journal of Environmental Research and Public Health. 2019;16(2):293
83. Iyendo TO. Exploring the effect of sound and music on health in hospital settings: A narrative review. International Journal of Nursing Studies. 2016;63:82-100
84. Boggs LJ, Fisher D, Flint GA. Technical note: The “pink” noise generator—An apparatus for inducing relaxation. Behavior Therapy. 1973;4(2):267-269
85. Singh D, Jain A, Jain D, Goyel V. Effect of white, brown and pink noises on anxious pediatric dental patients. Cardiometry. 2022;25:1252-1258
86. Buxton RT, Pearson AL, Allou C, Fristrup K, Wittemyer G. A synthesis of health benefits of natural sounds and their distribution in national parks. PNAS. 2021;118(14):1-6. Available from: https://www.pnas.org/content/118/14/e2013097118
87. Cutshall SM, Anderson PG, Prinsen SK, Wentworth LJ, Olney TL, Messner PK, et al. Effect of the combination of music and nature sounds on pain and anxiety in cardiac surgical patients: A randomized study. Alternative Therapies in Health & Medicine. 2011;17(4):16-23
88. Davis C, Nussbaum GF. Ambient nature sounds in health care. Perioperative Nursing Clinics. 2008;3(1):91-94
89. Febriandirza A. The effect of natural sounds and music on driving performance and physiological. Engineering Letters. 2017;25:455-463
90. Gatti MFZ, da Silva MJP. Ambient music in the emergency services: The professionals’ perception. Revista Latino-Americana de Enfermagem. 2007;15:377-383
91. Medvedev O, Shepherd D, Hautus MJ. The restorative potential of soundscapes: A physiological investigation. Applied Acoustics. 2015;96:20-26
92. Nishida K, Oyama-Higa M. The influence of listening to nature sounds on mental health. In: Pham TD, Ichikawa K, Oyama-Higa M, Coomans D, Jiang X, editors. Biomedical Informatics and Technology. Berlin, Heidelberg: Springer; 2014. pp. 319-323. (Communications in Computer and Information Science)
93. Meyer LB. Emotion and Meaning in Music. Chicago: University of Chicago Press; 1956
94. Gould van Praag CD, Garfinkel SN, Sparasci O, Mees A, Philippides AO, Ware M, et al. Mind-wandering and alterations to default mode network connectivity when listening to naturalistic versus artificial sounds. Scientific Reports. 2017;7(1):45273
95. Ulrich RS, Simons RF, Losito BD, Fiorito E, Miles MA, Zelson M. Stress recovery during exposure to natural and urban environments. Journal of Environmental Psychology. 1991;11(3):201-230
96. Kaplan R, Kaplan S. The Experience of Nature: A Psychological Perspective. Cambridge: Cambridge University Press; 1989
97. Westerkamp H. Soundwalking. In: Carlyle A, editor. Autumn Leaves: Sound and the Environment in Artistic Practice. Paris, France: Association Double-Entendre in Association with CRISAP; 2007
98. Cardiff J, Schaub M. Janet Cardiff: The Walk Book. 1st ed. Köln: Walther König; 2005
99. Kubisch C. Christina Kubisch: Inaudible, Invisible: 1974-2023. Bourogne: Espace Multimédia Gantner; 2023. Available from: http://catalogue.bnf.fr/ark:/12148/cb47341119q
100. Smallwood S. Scott Smallwood Homepage. 2025. Available from: http://scott-smallwood.com/works.html
101. Smallwood S. Coronium 3500: A solarsonic installation for Caramoor. In: NIME: Proceedings of the International Conference on New Interfaces for Musical Expression. 2016. Available from: https://zenodo.org/record/1176127
102. Coronium 3500 (Lucie’s Halo). 2016. Available from: https://vimeo.com/196516663
103. Caramoor | Katonah, NY | In The Garden of Sonic Delights. Caramoor. 2014. Available from: https://caramoor.org/in-the-garden-of-sonic-delights/
104. Emergences. Instagram. 2024. Available from: https://www.instagram.com/p/C_yCzLTJ0ku/
105. Schafer RM. Voices of Tyranny: Temples of Silence. Ontario, Canada: Arcana Editions; 1993
106. Frishkopf M. Sounding the Garden – Canadian Centre for Ethnomusicology. 2019. Available from: http://bit.ly/soundingthegarden
107. The Canadian Centre for Ethnomusicology. Available from: https://www.artsrn.ualberta.ca/ccewiki/index.php/The_Canadian_Centre_for_Ethnomusicology_(CCE)
108. Aga Khan Trust for Culture. Aga Khan Development Network. Available from: https://the.akdn/en/how-we-work/our-agencies/aga-khan-trust-culture
109. Anonymous. New Sounds from the Arab Lands - I’m a Bird from Heaven’s Garden, 7 Min. 2014. Available from: https://vimeopro.com/akmp/new-sounds-from-arab-lands/video/96997112
110. Aga Khan Garden, Alberta | University of Alberta Botanic Garden. Available from: https://www.ualberta.ca/en/botanic-garden/whats-on/gardens/aga-khan-alberta.html
111. Aga Khan Garden Web App. Available from: https://akg.ok.ubc.ca/
112. Music of Central Asia Series (Co-produced by Smithsonian Folkways and Aga Khan Music Programme). Smithsonian Folkways Recordings. Available from: https://folkways.si.edu/music-of-central-asia-series
113. Rueb E. To Reduce Hospital Noise, Researchers Create Alarms that Whistle and Sing. New York Times; 2019. Available from: https://www.nytimes.com/2019/07/09/science/alarm-fatigue-hospitals.html
114. Canadian Institute for Health Information. Care in Canadian ICUs. Ottawa, Ontario: Canadian Institute for Health Information; 2016
115. Devlin JW, Skrobik Y, Gélinas C, Needham DM, Slooter AJC, Pandharipande PP, et al. Executive summary: Clinical practice guidelines for the prevention and management of pain, agitation/sedation, delirium, immobility, and sleep disruption in adult patients in the ICU. Critical Care Medicine. 2018;46(9):1532-1548
116. Cuesta JM, Singer M. The stress response and critical illness: A review. Critical Care Medicine. 2012;40(12):3283-3289
117. Chlan L, Tracy MF. Music therapy in critical care: Indications and guidelines for intervention. Critical Care Nurse. 1999;19(3):35-41
118. Papathanassoglou E, Pant U, Meghani S, Saleem Punjani N, Wang Y, Brulotte T, et al. A systematic review of the comparative effects of sound and music interventions for intensive care unit patients’ outcomes. Australian Critical Care. 2025;38(3):101148
119. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press; 2018. Available from: http://incompleteideas.net/book/the-book-2nd.html
120. Frishkopf M, Papathanassoglou E, Steenstrup M. Autonomously Adaptive Soundscapes for Reducing Stress in Critically-Ill Patients. Toronto Metropolitan University; 2021. Available from: https://www.torontomu.ca/canadian-srs/program/
121. Chang ET, Lai HL, Chen PW, Hsieh YM, Lee LH. The effects of music on the sleep quality of adults with chronic insomnia using evidence from polysomnographic and self-reported analysis: A randomized control trial. International Journal of Nursing Studies. 2012;49(8):921-930
122. Chi GCHL, Young A. Selection of music for inducing relaxation and alleviating pain: Literature review. Holistic Nursing Practice. 2011;25(3):127-135
123. Centre for Advanced Medical Simulation. NAIT.ca. Available from: https://www.nait.ca/centre-for-advanced-medical-simulation
124. Frishkopf M. Mindful Social Listening: Intelligent Immersive Soundscape Environments for Student Wellbeing. Berlin: IAMM-ISfAM: The Future of Music and Arts in Medicine and Health; 2024. Available from: https://iammonline.com/berlin2024/program/
125. Zhong J, Jing Y. Researchers created sound that can bend itself through space, reaching only your ear in a crowd. The Conversation. 2025. Available from: http://theconversation.com/researchers-created-sound-that-can-bend-itself-through-space-reaching-only-your-ear-in-a-crowd-252266
126. Javed K. PersonalScapes. 2024. Available from: https://soundscape-4937f.web.app/index.html
127. Music and Audio Research Laboratory | NYU Steinhardt. Available from: https://steinhardt.nyu.edu/marl
128. Stacey Bliss, PhD. Available from: https://blissresearch.org/

Notes

1. As Aristotle enumerated and affirmed in his De Anima, Book 3, Part 1, “one may be satisfied that there are no senses apart from the five (I mean vision, hearing, smell, taste, and touch)…” [1, 2]. Modern perceptual science recognizes at least seven senses, including proprioception and the vestibular sense [3, 4].
2. Others have used the same initialism for the roughly synonymous terms audio augmented reality [5, 6, 7] or augmented aural reality [8].
3. The acousmatic, a concept developed by Pierre Schaeffer, founder of musique concrète, is that which is heard without the cause being seen [32].
4. It is possible, if expensive and usually impractical, to project omnidirectionally across walls and ceilings of special-purpose spaces, such as a planetarium, or the projective VAR popular at location-based entertainment (LBE) venues such as Meow Wolf (USA) and TeamLab (Japan), thereby obviating special eyewear.
5. Excellent earbuds are available for around $100 USD, headphones for around $300, while a good HMD costs over $1000. Within Apple’s pricier product line, the AirPods 4 earbuds retail for $129, AirPods Max headphones for $549, and Apple Vision Pro HMD for $3499 [37, 38, 39].
6. Meta’s new augmented reality system, Orion, is deemed “too complicated and expensive to manufacture right now” [45].
7. This attribute—required because all denotative reference tautologically entails distraction, through the cognitive shift from sonic signifier to signified—is relative to the listener. In particular, any verbal content must be incomprehensible, either due to dense polyphony (the buzz of a crowded cafe) or linguistic unfamiliarity (a monolingual English speaker hearing Mandarin, though even the uncomprehending perception of meaningfulness may be distracting).
8. Or perhaps also the unpredictability of most music, requiring intensive perceptual processing, triggering an affective response by undercutting expectations, as hypothesized by Leonard Meyer [93].
9. Descriptions of Scott Smallwood’s projects are available in greater detail online [100].
10. Actually 8.2, with all speakers securely mounted on walls and floor. Soundscapes are octophonic; subwoofer LFE channels are derived automatically via low-pass filtering. We have also developed a simpler, more portable stereo version that can be set up anywhere, in use for experimentation at the University of Calgary.
11. Ed. note: Sony has a demo in their Sony Park showroom in the Ginza that samples the heartbeats of about a dozen listeners, then composes music and icons that express that pulse, displayed in a shared space: https://www.sonypark.com/e/activity/003/
