6. Coordinate Systems in Object-Based Audio

Issac Thomas
May 19, 2022

Updated: Jun 23, 2022

Introduction

Following on from the previous post on Object-Based Audio, the next few blog posts will explore some of the technical considerations when mixing in Object-Based Audio. Covering topics such as coordinate systems, headtracking, conditional mixing and High-Dynamic Range, these posts will look at some of the approaches that make 3D Audio sound as good as possible.

This first post will explore some of the coordinate systems used in Object-Based Audio, and discuss how and when to use them.

Coordinate Systems

Coordinate systems are used to provide the positional data of sound sources in an object-based audio environment. When it comes to object-based audio there are two main coordinate systems: Spherical Coordinates and Cartesian Coordinates. However, before discussing the coordinate systems themselves, it is important to establish why different coordinate systems are useful. Generally, each coordinate system is better suited to one of two different types of ‘frame of reference’ (Roginska and Geluso, 2018).

The two fundamental types of ‘frame of reference’ are ‘Egocentric’ and ‘Allocentric’.

Egocentric Frame of Reference

An Egocentric Frame of Reference describes the position of sounds in relation to the listener. In a first-person game for instance, an egocentric frame of reference would be used to describe the position of the objects and characters around you. This frame of reference is useful for describing the perception of the listener.

As Ceri Thomas demonstrates in this video, egocentric frames are a key aspect of music mixing in Dolby Atmos.

Allocentric Frame of Reference

An Allocentric Frame of Reference describes the positions of sounds in relation to a reference location. For instance, in game engines such as Unity, the positions of game objects are provided in relation to the origin. This frame of reference is better for describing a scene where there is no single listener position.

Traditionally, channel-based audio mixing was done using an allocentric frame of reference. This is because sound is positioned between the speakers, instead of at defined positions relative to the listener. Thinking back to the post on Phantom Source Shift, the position of a Phantom Image is a form of allocentric frame of reference, because the position relates to other objects (the loudspeakers), and not the listener.

Mixed Frame of Reference

In many situations it can be beneficial to use a combination of the two. For instance, when developing a first-person game, it is useful to map sound emitters to certain points in the room using an allocentric frame of reference. However, when the player navigates through the game, it is beneficial to describe how the sounds are perceived as the player moves among the emitters using an egocentric frame of reference (Roginska and Geluso, 2018).
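As a rough illustration of that allocentric to egocentric step, the sketch below takes an emitter and a listener placed in a top-down (2D) world and returns the emitter's distance and azimuth as heard by the listener. The function name `world_to_listener`, its parameters, and the axis choices are assumptions made for this example: x points forward and y to the right in the world (matching the Cartesian convention described later in this post), and both the listener's heading and the returned azimuth increase anticlockwise. It is not taken from any particular game engine.

```python
import math

def world_to_listener(emitter_xy, listener_xy, listener_heading_deg):
    """Convert a world (allocentric) position to a listener-relative one.

    Assumed conventions (hypothetical, for illustration only):
    - world x is forward, world y is to the right, so left is -y
    - heading and azimuth are in degrees and increase anticlockwise
    Returns (distance, azimuth_deg) with azimuth in [0, 360).
    """
    # Offset of the emitter from the listener in world coordinates.
    dx = emitter_xy[0] - listener_xy[0]
    dy = emitter_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Bearing of the emitter in the world, anticlockwise from +x.
    bearing_deg = math.degrees(math.atan2(-dy, dx))
    # Subtract the listener's heading to get the angle they perceive.
    azimuth_deg = (bearing_deg - listener_heading_deg) % 360.0
    return (distance, azimuth_deg)

# An emitter 10 m ahead in the world, heard by a listener who has
# turned 90 degrees to their left: the sound is now on their right.
print(world_to_listener((10, 0), (0, 0), 90))  # (10.0, 270.0)
```

The emitter never moves in the world; only the listener-relative description changes as the player turns or walks.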

Furthermore, as demonstrated in the Phantom Source Shift blog, the source shift value (%) can be converted into degrees if the speaker configuration is known. This is a form of allocentric to egocentric conversion.

This brings us back to coordinate systems!

Spherical Coordinates

First up, we have Spherical Coordinates. This system uses the distance (radius 'r'), the vertical angle (polar angle, 'θ'), and the horizontal angle (azimuth angle, 'φ'), to describe the position of an object. You will see this written in the form (r, θ, φ).

Spherical Coordinates are great for Egocentric Frames, as the listener is always placed in the centre of the coordinate system.

Confusingly, the horizontal angle (azimuth angle) increases anticlockwise (North = 0°, West = 90°, South = 180°, East = 270°), and the vertical angle (polar angle) is measured from the z axis rather than from the horizontal plane, as shown below (Mattes et al., 2011).

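For an object that is 5 meters away, elevated 60° above the listener's horizontal plane, and 55° to the left of the listener, the polar angle is 90° - 60° = 30° (because it is measured from the z axis), so the coordinates would be written as: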

(5, 30, 55)
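The same bookkeeping can be sketched in a few lines of Python. The helper name `spherical_from_listener` and its parameters are hypothetical; it simply assumes the convention described above, where the polar angle is measured from the z axis and the azimuth increases anticlockwise, so angles to the listener's left are positive.

```python
def spherical_from_listener(distance_m, elevation_deg, azimuth_left_deg):
    """Build an (r, polar, azimuth) tuple from a listener's description.

    elevation_deg is measured up from the horizontal plane.
    azimuth_left_deg is positive to the left, negative to the right.
    """
    # The polar angle is measured down from the z axis (straight up).
    polar_deg = 90.0 - elevation_deg
    # Wrap into [0, 360) so that angles to the right land in 180-360.
    azimuth_deg = azimuth_left_deg % 360.0
    return (distance_m, polar_deg, azimuth_deg)

print(spherical_from_listener(5, 60, 55))  # (5, 30.0, 55.0)
```

An object 55° to the right instead would be passed in as -55 and come out with an azimuth of 305°.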

Cartesian Coordinates

The Cartesian coordinate system uses (x, y, z) coordinates to position an object relative to a common origin. Unlike Spherical Coordinates, all three values are distances.

The origin is usually (0,0,0), and the x axis describes the front-back distance, the y axis describes the left-right distance, and the z axis describes the height.

For an object that is positioned 20 meters in front of the origin, 10 meters to the left, and 5 meters high, the Cartesian coordinates would be (20, -10, 5).

Cartesian Coordinates are often used in allocentric environments because the origin is usually a fixed point.

Cartesian Coordinate System. By Jorge Stolfi - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=6692547

Converting between Coordinate Systems

It is possible to convert between Spherical and Cartesian coordinates using trigonometric functions. Instead of including them here, I have provided a link to a page that presents each function and also has a calculator that performs the conversion for you.
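For readers who prefer code to a calculator, here is a minimal sketch of the conversion in Python. It assumes the conventions used in this post: the polar angle is measured from the z axis, the azimuth increases anticlockwise from the front, x is front-back, z is height, and y is negative to the left (as in the (20, -10, 5) example). That last assumption is why the y terms carry a minus sign; with a different handedness the signs would change. The function names are illustrative only.

```python
import math

def spherical_to_cartesian(r, polar_deg, azimuth_deg):
    """(r, polar, azimuth) in degrees -> (x, y, z)."""
    theta = math.radians(polar_deg)
    phi = math.radians(azimuth_deg)
    x = r * math.sin(theta) * math.cos(phi)
    # Positive azimuth is to the left, and left is -y in this post.
    y = -r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (r, polar, azimuth) in degrees."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # The angles are undefined at the origin; return zeros.
        return (0.0, 0.0, 0.0)
    # Clamp to guard against rounding pushing the ratio past +/-1.
    cos_theta = max(-1.0, min(1.0, z / r))
    polar_deg = math.degrees(math.acos(cos_theta))
    azimuth_deg = math.degrees(math.atan2(-y, x)) % 360.0
    return (r, polar_deg, azimuth_deg)

# Round trip of the spherical example from earlier in the post.
x, y, z = spherical_to_cartesian(5, 30, 55)
print(cartesian_to_spherical(x, y, z))
```

Under these assumptions, the Cartesian example (20, -10, 5) comes out at a distance of about 22.9 meters, a polar angle of about 77.4°, and an azimuth of about 26.6° to the left.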

Conclusion

This post has looked at different coordinate systems for object-based audio, and discussed the two different 'frames of reference' utilised in interactive media.

These concepts matter for this project, as both coordinate systems and frames of reference are needed to describe the behaviour of sound objects in interactive media. Additionally, calculating the phantom source shift requires an understanding of allocentric to egocentric conversion using its own unique coordinate conversion system.

References

Mattes, S. et al. (2011) ‘Towards a perceptual model for 3D Sound Localisation’, Proceedings of the Institute of Acoustics, 33, p. 13.

Roginska, A. and Geluso, P. (eds) (2018) Immersive sound: the art and science of binaural and multi-channel audio. New York ; London: Routledge, Taylor & Francis Group.
