Sounds Of Virtual Reality Technology
Virtual reality, otherwise known as a virtual environment, immerses users in an increasingly realistic technology in which digital audio plays a crucial part in creating a persuasive experience. Beyond the visuals displayed to the user, audio cues shape our sense of spatial awareness and presence, whether in a game or in an actual, physical space. The developers behind the Oculus Rift headset stated in their development documents: “We hope to establish that audio is crucial for creating a persuasive VR experience. Because of the key role that audio cues play in our cognitive perception of existing in space, any effort that development teams devote to getting it right will pay off in spades, as it will contribute powerfully to the user’s sense of immersion.” To achieve an immersive experience, spatial sound rendering software must account for the location of the sound source within three-dimensional space (both lateral and vertical localization) as well as the user’s head orientation and position. Virtual reality is best described by the Virtual Reality Society as: “the term used to describe a three-dimensional, computer generated environment which can be explored and interacted with by a person. That person becomes part of this virtual world or is immersed within this environment and whilst there, is able to manipulate objects or perform a series of actions.”
The technologies designed to produce such immersion include headsets, omnidirectional treadmills, and special haptic gloves, all of which stimulate the senses to create the illusion of reality. Virtual reality has a wide variety of applications, including architecture, medicine, entertainment, sport, and the arts. Before virtual reality audio technology reached consumers through headphones, audio had always played a crucial part in computer and video gaming, starting with arcades filled with the sounds of digital explosions. The state of computer audio has steadily improved over its history: “….from simple wave generators (SID, 1983) to FM synthesis (AdLib, 1987), evolving on to 8-bit mono samples (Amiga OCS, 1985; SoundBlaster, 1989) and 16-bit stereo samples (SoundBlaster Pro), culminating in today’s 5.1 surround sound systems on modern gaming consoles (Xbox, 2001).” (Oculus Rift) The ability to track the user’s head orientation and position, however, has driven further advances in audio technology. In three dimensions, humans localize sounds through psychoacoustics and inference, factoring in timing, level, phase, and spectral modifications. The ability to pinpoint a sound source within three-dimensional space is known as sound localization.
With that, the two primary components of sound localization are direction and distance. The easiest type to determine is the lateral localization of sound: when a sound source is closer to one ear than the other, the sound is heard earlier and louder in the nearer ear. Two cues are used to localize a sound source laterally: the interaural time difference (ITD) and the interaural level difference (ILD). The interaural time difference is the delay between the sound arriving at one ear and at the other. The interaural level difference is the difference in the loudness of the sound source between the two ears. Aside from one ear being relatively closer, another factor that makes a sound source louder in one ear is the “shadowing effect” of the human head, which partially blocks the sound from propagating to the ear on the far side. This affects high frequencies more than low frequencies, because low-frequency sound waves have wavelengths larger than the dimensions of an average human head.
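The interaural time difference described above can be illustrated with a simple spherical-head approximation. The sketch below uses the Woodworth formula, a common textbook model (not mentioned in the original text, so treat it as an illustrative assumption); the head radius and speed of sound are typical assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, speed of sound in air at ~20 °C
HEAD_RADIUS = 0.0875    # m, a commonly assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Approximate ITD (in seconds) for a distant source using the
    Woodworth spherical-head model: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source directly ahead (0 degrees) reaches both ears at the same time;
# a fully lateral source (90 degrees) produces the maximum delay.
print(interaural_time_difference(0))   # 0.0
print(interaural_time_difference(90))  # roughly 0.00066 s (about 0.66 ms)
```

The model captures the key idea: the farther a source sits to one side, the larger the arrival-time gap between the two ears, which the brain reads as lateral direction.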
Across frequencies, the usefulness of each cue changes. Below 800 Hz, level differences are hard to distinguish, but the half wavelength of the sound is larger than the dimensions of an average human head, which makes it possible for the human brain to evaluate phase differences between the ears for low frequencies. For frequencies above 1500 Hz, whose half wavelengths are smaller than an average human head, phase becomes unreliable, so we must rely on level differences produced by the head’s “shadowing effect.” The frequency range from 800 Hz to 1500 Hz is a transitional zone in which a combination of the interaural time difference and interaural level difference is used for lateral localization. Compared to lateral localization, front, back, and elevation localization is more challenging. The interaural time and level differences alone cannot resolve whether a sound is in front of, behind, above, or below the listener: sources at the same distance in these directions produce exactly the same time difference and level difference at both ears, making them hard to distinguish from one another.
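The frequency bands above can be summarized in a small helper. This is a minimal sketch of the rule of thumb the text describes; the 800 Hz and 1500 Hz thresholds are the approximate boundaries from the paragraph, not hard physical constants.

```python
def dominant_localization_cue(frequency_hz):
    """Return which interaural cue dominates lateral localization
    at a given frequency, per the approximate bands described above."""
    if frequency_hz < 800:
        return "ITD"        # phase/timing differences are reliable
    elif frequency_hz > 1500:
        return "ILD"        # head shadowing yields usable level differences
    else:
        return "ITD+ILD"    # transitional zone: both cues are combined

print(dominant_localization_cue(440))    # ITD
print(dominant_localization_cue(1000))   # ITD+ILD
print(dominant_localization_cue(4000))   # ILD
```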
To resolve this ambiguity, humans rely on spectral modifications of the sound caused by the head and body. These spectral modifications act as filters and reflections on the human body, and they vary from person to person depending on the shape and size of the torso, shoulders, neck, head, and outer ear. For example, a sound source located below the listener is “shadowed” by the body before it reaches the listener’s ears; conversely, a sound source above the listener is refracted by the body on its way to the ears. This is known as direction-selective filtering: the “shadowing” or “refraction” of sound by the human body alters the sound levels measured at the ears. Furthermore, the direction-selective filtering of the human body can be encoded as a head-related transfer function (HRTF), which is the cornerstone of most modern three-dimensional spatialization techniques.
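In practice, spatialization engines apply an HRTF by convolving a mono source with a measured per-ear impulse response (HRIR). The sketch below shows the mechanism only: the two “HRIRs” are toy, hand-written filters (real ones are measured for each direction and listener), and the convolution is a deliberately simple direct-form loop.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution, stdlib only."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by filtering it with the
    left- and right-ear head-related impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs for a source to the listener's right: the right ear hears the
# sound immediately and at full level, the left ear later and attenuated
# (a crude stand-in for the head's shadowing and the interaural delay).
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.4]   # two samples of delay, reduced level

left, right = spatialize([1.0, 0.5], hrir_left, hrir_right)
print(right)  # [1.0, 0.5, 0.0, 0.0]
print(left)   # [0.0, 0.0, 0.4, 0.2]
```

Feeding the two output channels to headphones is what lets head-tracked VR audio place a source at a fixed point in space as the listener turns.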