
1. Masters Project Introduction

Updated: Jun 23, 2022

Hi, and welcome to my blog!


The following series of blog posts in the ‘Masters Project Planning’ category will provide an overview of the final project I am working on for my MSc in Audio and Music Technology at the University of York.


Project Introduction

The project is titled ‘Spectral and Temporal Manipulations of ORTF Recording for Personalised Binaural Experiences’. The project was proposed by Sony Interactive Entertainment, who are overseeing this project alongside the Electronic Engineering department at the University of York.


ORTF Recording

ORTF is a popular stereo (two-microphone) recording technique that is used extensively in studio recording, broadcasting, and live concert recording. The two microphones are spaced 17 cm apart and angled 110° from each other.

[Image: ORTF microphone configuration]

This configuration provides a close approximation of how humans naturally hear, as the spacing of the microphones is similar to the spacing of the two ears on either side of the head.


As a result, the ORTF technique produces a wide and spacious sound that is often described as ‘natural’ and ‘realistic’ (Rumsey and McCormick, 2014).


Binaural Audio

‘Binaural’ is the technical term for how we naturally hear as humans. Like eyesight, our two ears give us a greater perception of space and distance, with the additional benefit that our sense of hearing is fully three-dimensional: we can perceive sounds from in front, behind, above, or below us with a high level of accuracy (Howard and Angus, 2009).


This ability is possible thanks to three main ‘binaural cues’, which are small differences in the characteristics of a sound arriving at the left and right ears. The theory behind binaural hearing will be covered in more detail in an upcoming post; for now, a brief overview of the three cues is given below.


Binaural Cues

The Interaural Time Difference (ITD), the Interaural Level Difference (ILD), and spectral cues are the three main binaural cues (Howard and Angus, 2009; van Opstal, 2016).


The Interaural Time Difference is the difference in arrival time of a sound at the left and right ears, caused by the spacing between them.


The Interaural Level Difference is the difference in level (amplitude/intensity) of the sound at the left and right ears. It is created by the acoustic shadow of the head, which blocks some of the sound from reaching the ear furthest from the source.

[Image: Binaural ITD and ILD cues (Narbutt et al., 2020)]

Spectral cues arise from the filtering of a sound source by the shape of the outer ears and by reflections from the ears, head, and torso. This filtering changes the spectrum of the sound reaching each ear.


The strength of each cue will change depending on the direction of the source.
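To make the ITD cue concrete, the widely used spherical-head (‘Woodworth’) approximation relates the time difference to the source direction and head size. A minimal sketch in Python, assuming a typical head radius of 8.75 cm and a speed of sound of 343 m/s (illustrative textbook values, not measurements from this project):

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate the interaural time difference (in seconds) for a
    distant source, using the Woodworth spherical-head model:
        ITD = (r / c) * (sin(theta) + theta)
    where theta is the azimuth in radians (0 = straight ahead,
    positive = towards the right ear)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(theta) + theta)
```

For a source directly to one side (90°) this gives roughly 0.66 ms, in line with the commonly quoted maximum ITD of around 0.6–0.7 ms for an average adult head.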



Binaural Synthesis in Interactive Media

Over the past decade, the term ‘binaural’ has appeared more frequently in technology and media, as platforms such as music streaming, TV, and gaming use binaural audio to offer a more immersive listening experience.


This is known as ‘binaural synthesis’: a processing technique that makes recorded sounds appear to come from a specific point in space. It works by applying the three main binaural cues to the sound source, and is most effective when the listener is wearing headphones.



Thanks to the increase in computing power over the past couple of decades, Binaural Synthesis can be processed in an interactive environment in real-time (Roginska and Geluso, 2018; Suh and Prophet, 2018). This is particularly useful in games and VR/AR, as the 3D processing of sound can respond to the movement of the listener inside the virtual environment, as well as the sound emitters around them (Armstrong et al., 2018; Roginska and Geluso, 2018).


How does Binaural Synthesis work?

HRTFs (Head-Related Transfer Functions) are sets of filters that apply the binaural cues to a sound source (Armstrong et al., 2018; Roginska and Geluso, 2018). Essentially, HRTFs simulate 3D human hearing when you listen to audio through headphones. Traditionally, HRTFs are created by placing small microphones in a person’s ears and recording short bursts of noise from different directions – a process known as obtaining an impulse response.
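As a minimal sketch of how this works in practice (not the specific tooling used by this project): given a measured pair of head-related impulse responses (HRIRs, the time-domain form of HRTFs) for the desired direction, rendering a static source reduces to one convolution per ear.

```python
import numpy as np

def binauralise(mono, hrir_left, hrir_right):
    """Render a mono signal to 2-channel binaural audio by convolving it
    with the left- and right-ear HRIRs for the desired source direction."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # Zero-pad to a common length so the two channels can be stacked.
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=-1)  # shape: (samples, 2)
```

In an interactive engine the HRIR pair is swapped (and interpolated) as the source or listener moves, which is essentially what tools such as Steam Audio do in real time.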



There are several tools that can produce 3D sound in interactive media by applying HRTFs to different sound sources in real time. These include Steam Audio (Steam Audio, 2021) and Google’s Resonance Audio, which uses HRTFs gathered at the University of York (SADIE | Spatial Audio For Domestic Interactive Entertainment, 2017; Resonance Audio, 2018; Armstrong et al., 2018).


Opportunity (What problem this project aims to address)

The major limitation of binaural synthesis is that the characteristics of each binaural cue vary from person to person, since we all have unique physiology (head sizes, ear shapes). The issue with non-individualised HRTFs is that listeners may not be able to accurately determine where a sound is located, or may not even perceive the sound as coming from an external source (i.e., the sound may seem to be located inside their head) (Armstrong et al., 2018).


A further issue with non-individualised HRTFs is that the spectral filtering caused by the shape of the ear is likely to introduce undesirable filtering characteristics (van Opstal, 2016). In other words, the quality of the binauralised sound suffers if the HRTFs are not well matched to the listener.


The optimum solution would be to create an individual set of HRTFs for everyone; however, this is impractical for several reasons. Obtaining HRTFs requires specialist facilities and equipment, is extremely time-consuming (over an hour to gather), and usually requires people to stay completely still the entire time, which is both uncomfortable and impractical (Guezenoc and Seguier, 2018).



[Image: Measuring HRTFs (Armstrong et al., 2018)]

Major companies, including Sony, Dolby, and Steinberg, are exploring alternative ways of generating individualised HRTFs; however, none of these methods currently offers flawless results (Guezenoc and Seguier, 2018).


There is therefore an opportunity to develop an alternative approach to binaural synthesis that prioritises the quality of the source material, while still offering a personalised experience that enhances the immersion and localisation of sound sources in interactive media.


In cases where sound sources are heard for extended periods, or are closely tied to the emotional narrative of the media, such as dialogue, music, and ambience, it becomes especially important that the quality of the original recording is preserved. Otherwise, listeners will quickly become frustrated with the low-quality audio caused by poorly matched HRTFs.


Solution (What will this project do to address the problem?)

The ORTF technique produces interaural time and level differences similar to those encountered in binaural hearing (Rumsey and McCormick, 2014). It could therefore be used to create a virtual renderer for 3D audio that provides ITD and ILD cues without greatly affecting the spectral quality of the sound.


This means that instead of spatialising sound with HRTFs, a virtual ORTF microphone configuration would be used to apply time and level differences to the sound, creating a 3D audio experience.
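A rough sketch of how such a renderer might derive its per-channel cues (function and parameter names are hypothetical, not from the project; it assumes the standard ORTF geometry of 17 cm capsule spacing with capsules angled ±55° from the front, idealised first-order cardioid patterns, and a distant plane-wave source):

```python
import math

def virtual_ortf_cues(azimuth_deg, spacing_m=0.17, capsule_angle_deg=55.0,
                      speed_of_sound=343.0):
    """Return (delay_s, gain) for the left and right virtual capsules.
    Positive azimuth = source towards the right of the pair."""
    az = math.radians(azimuth_deg)
    # Time-of-arrival difference between the two capsules for a plane wave.
    itd = spacing_m * math.sin(az) / speed_of_sound
    delay_left = max(itd, 0.0)    # left capsule is farther for right-hand sources
    delay_right = max(-itd, 0.0)
    # Idealised first-order cardioid pattern: g = 0.5 * (1 + cos(angle off axis)).
    axis = math.radians(capsule_angle_deg)
    gain_left = 0.5 * (1.0 + math.cos(az + axis))   # left capsule axis at -55 deg
    gain_right = 0.5 * (1.0 + math.cos(az - axis))  # right capsule axis at +55 deg
    return (delay_left, gain_left), (delay_right, gain_right)
```

A frontal source yields equal gains and zero delay; as the source moves to one side, the far channel is both delayed and attenuated, mimicking the ITD and ILD cues.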


This project aims to further investigate the correlation between the ORTF recording method and binaural hearing, to assess whether further processing of the ORTF technique could produce an individualised binaural experience. The project will also look at manipulating the ITD and ILD cues within an ORTF recording to match those estimated for individual listeners (based on simple head-geometry data).
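One simple, illustrative way such a match could work (a hypothetical linear scaling, not a method the project has settled on) is to stretch or compress the recording's time difference by the ratio of the listener's ear spacing to the 17 cm ORTF spacing:

```python
def personalise_itd(ortf_itd_s, listener_ear_spacing_m, ortf_spacing_m=0.17):
    """Linearly rescale an ORTF-derived time difference (in seconds) so its
    range matches the ITD expected for an individual's ear spacing.
    Assumes ITD grows roughly linearly with the distance between the ears."""
    return ortf_itd_s * (listener_ear_spacing_m / ortf_spacing_m)
```

Whether such a first-order adjustment is perceptually sufficient is exactly the kind of question the listening tests are intended to answer.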


Furthermore, the project will investigate whether ITD and ILD manipulation alone is enough to produce effective localisation, or whether some degree of spectral manipulation is also necessary.


Project Aim

This project aims to develop a virtual renderer for 3D sound that models the ORTF recording technique for a personalised binaural listening experience. The renderer will be made available for use with game engines and audio middleware such as Unity and Wwise.


Project Objectives

  • The theory behind ORTF recording and binaural audio will be explored and documented in a series of follow-up blog posts and in a literature review.

  • A virtual ORTF renderer for interactive media will be developed.

  • The performance of the ORTF renderer will be evaluated using perceptual listening tests, which will involve participants estimating where sound sources are coming from.


Reflective Summary - Driscoll's What Model (Driscoll, 2007)


What?

This blog aimed to introduce and provide a contextual background to the project that I will be working on for my Masters Thesis in Audio and Music Technology. The project is titled ‘Spectral and Temporal Manipulations of ORTF Recording for Personalised Binaural Experiences’, and is in collaboration with Sony Interactive Entertainment.


The project aims to develop a virtual ORTF renderer for 3D audio that can be used in game audio as an alternative to traditional binaural rendering.


So What?

In this blog I have discussed how binaural audio is used in games to provide realistic 3D audio experiences over headphones. However, binaural audio is not effective for many people, because generic HRTFs do not match their individual ear shapes. This is a problem I have personally experienced when playing games; in severe cases it can cause poor audio quality and poor externalisation of sound sources.


Initially, I had not considered alternative solutions to this problem other than traditional audio formats such as stereo and surround sound. However, after doing some preliminary reading around ORTF recording, I am excited to see what results can be obtained from the virtual ORTF renderer once I begin development.


What next?

The following blog posts will dive deeper into the theory and applications surrounding ORTF and binaural audio. I also aim to explore more examples of 3D audio in interactive media, and investigate current industry approaches to implementing 3D audio in video games.


References

Armstrong, C. et al. (2018) ‘A Perceptual Evaluation of Individual and Non-Individual HRTFs: A Case Study of the SADIE II Database’, Applied Sciences, 8(11), p. 2029. doi:10.3390/app8112029.


Driscoll, J. (ed.) (2007) Practicing Clinical Supervision: A Reflective Approach for Healthcare Professionals. Edinburgh: Elsevier.


Rumsey, F. and McCormick, T. (2014) Sound and Recording: Applications and Theory. Burlington, MA: Routledge. Available at: https://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=707650&site=ehost-live (Accessed: 4 May 2022).


Guezenoc, C. and Seguier, R. (2018) ‘HRTF Individualization: A Survey’, in Audio Engineering Society Convention 145. Audio Engineering Society. Available at: https://www.aes.org/e-lib/browse.cfm?elib=19855 (Accessed: 12 May 2022).


Howard, D.M. and Angus, J.A.S. (2009) Acoustics and Psychoacoustics. 4th edn. Amsterdam; Heidelberg: Elsevier.


Narbutt, M. et al. (2020) ‘AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio’, Applied Sciences, 10(9), p. 3188. doi:10.3390/app10093188.

van Opstal, J. (2016) ‘Acoustic Localization Cues’, in The Auditory System and Human Sound-Localization Behavior. Elsevier, pp. 171–208. doi:10.1016/B978-0-12-801529-2.00007-6.


Resonance Audio (2018). Available at: https://resonance-audio.github.io/resonance-audio/ (Accessed: 18 May 2022).


Roginska, A. and Geluso, P. (eds) (2018) Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio. New York; London: Routledge, Taylor & Francis Group.


SADIE | Spatial Audio For Domestic Interactive Entertainment (2017). Available at: https://www.york.ac.uk/sadie-project/ (Accessed: 18 May 2022).


Steam Audio (2021). Available at: https://valvesoftware.github.io/steam-audio/ (Accessed: 18 May 2022).


Suh, A. and Prophet, J. (2018) ‘The state of immersive technology research: A literature analysis’, Computers in Human Behavior, 86, pp. 77–90. doi:10.1016/j.chb.2018.04.019.

