Polynomial Curve Fitting-based Early Room Reflection Analysis using B-Format Room Impulse Response Measurements for Ambient Sound Reproduction

doi:10.23940/ijpe.21.03.p6.307313

Vuppala Swathi¹, Sandeep Chitreddy¹,*

¹ Koneru Lakshmaiah Education Foundation, Hyderabad, 500075, India

* Corresponding author. E-mail address: csreddyiitk@klh.edu.in

Abstract

This work discusses an early room reflection parameter analysis based on polynomial curve fitting. In particular, it uses B-Format microphone measurements to analyze the spatial directions of ground reflections, which are usually hard to extract from an omnidirectional RIR (Omni-RIR). A simulated model based on the ray approximation of sound propagation is presented first. Subsequently, a recently developed parameterization approach, the Reverberant Spatial Audio Object (RSAO), is discussed for extracting early reflection parameters from measured B-Format RIRs. A publicly available classroom B-Format RIR database is used to extract RSAO parameters for 10 different source-receiver distances, and the parameters obtained from the simulated and RSAO methods are analyzed for these distances. Polynomial curve fitting is performed on the parameters obtained from both methods, and the optimal order that generalizes the parameters of the 10 samples is also obtained. The results show that the method is suitable for obtaining parameters for non-measured directions from a sparse set of B-Format RIR measurements.

Keywords: RSAO; spatial audio; RIR

Cite this article

Vuppala Swathi, Sandeep Chitreddy. Polynomial Curve Fitting-based Early Room Reflection Analysis using B-Format Room Impulse Response Measurements for Ambient Sound Reproduction. International Journal of Performability Engineering, 2021, 17(3): 307-313. doi:10.23940/ijpe.21.03.p6.307313

Introduction

Humans have a remarkable ability to experience the ambience of nature through their sensory organs. Technology has made it possible to capture this ambience as signals and to reproduce it at will. Reproducing ambient sound captured in a closed room, in particular, requires a thorough understanding of how sound interacts with the environment. These interactions can be characterized using Room Impulse Responses (RIRs). Over the past few decades, many RIR databases have been developed to understand these interactions and to reproduce the ambience of closed rooms [1-2]. Broadly, an RIR contains three components: direct, early, and late [3]. While some crucial room acoustic parameters can be extracted from an omnidirectional RIR (Omni-RIR), the spatial information (azimuth and elevation) of the direct and early components cannot. Obtaining this spatial information requires an array of spherically distributed microphones [4]. One such arrangement is the B-Format microphone, whose capsules are arranged in a tetrahedron [5]. Because RIRs measured in B-Format capture spatial variations, the spatial information of the direct sound and the early reflections can be extracted from them.

Recently, a method to fully parameterize these components, called the Reverberant Spatial Audio Object (RSAO), was developed [6-12]. RSAO exploits the spatial arrangement of the B-Format microphone to extract these parameters. The direct component parameters of RSAO consist of the level, onset time, and direction of the direct signal [13-17]. The early component parameters consist of the number of reflections considered, the level and onset time of each reflection, and the spectral characteristics of each reflection. The late component is divided into nine octave sub-bands of the audio frequency spectrum, each parameterized by an exponential decay coefficient. Together, these parameters capture the behavior of the direct sound and the early and late reflections. Although RSAO can compute these parameters, the lack of any ground truth makes it hard to test their accuracy. In this work, the authors attempt an error analysis of the extracted parameters. In particular, parameters that can be clearly related to the physical dimensions of the room are analyzed using Polynomial Curve Fitting (PCF) [18] to generalize the sampled parameters.

The rest of the paper is organized as follows. Section 2 discusses early reflection parameter analysis using both the simulated model and RSAO, and applies polynomial curve fitting to the elevation angles and onset times across distances. Section 3 analyzes the results obtained for the early parameters with polynomial curve fitting. Section 4 concludes the paper.

Early Reflection Parameter Analysis using Reverberant Spatial Audio Object and Simulated Models

Early reflections are the signals that arrive at the microphones after reflecting from the surfaces of the room. Techniques such as RSAO make it possible to compute various parameters from the signals measured by a B-Format microphone. More importantly, the spatial arrangement of the B-Format capsules allows the direction of the reflections to be computed using beamforming techniques. However, it is challenging to identify non-spurious reflections that map one-to-one to the walls of the room. This is evident from the fact that most reflection directions show no systematic pattern when the parameters are computed for various source-receiver distances. The only early reflections that exhibit such a mapping are the ground (floor) reflections [19]. Even so, the computed directions remain inconsistent when compared against simulated models such as the image source method. Consider B-Format RIRs measured at N different source-receiver distances in a classroom-like environment, as shown in Figure 1. The N microphone positions lie on a line, uniformly spaced at a distance d, and the source is also assumed to be at a distance d from the first B-Format microphone position, so the n-th microphone is at a distance nd from the source. The early reflection parameters extracted from the measured B-Format RIRs and from the simulated models are discussed below.
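As a minimal illustration of how a B-Format recording encodes direction, the Python sketch below estimates the azimuth and elevation of a short, windowed reflection segment from the time-averaged pseudo-intensity vector. This estimator is a common, simpler alternative to beamforming that is swapped in here for illustration. It is not claimed to be the estimator used inside RSAO. The function name `direction_from_bformat` and the channel convention (W omnidirectional, X to the front, Y to the left, Z up) are assumptions.

```python
import math
from typing import Sequence, Tuple


def direction_from_bformat(
    w: Sequence[float],
    x: Sequence[float],
    y: Sequence[float],
    z: Sequence[float],
) -> Tuple[float, float]:
    """Return (azimuth, elevation) in degrees for one B-Format segment.

    Uses the time-averaged pseudo-intensity vector I = <w*x, w*y, w*z>.
    Any positive gain applied to W cancels out of the angles.
    A negative elevation means the sound arrives from below (e.g. the floor).
    """
    n = len(w)
    if n == 0 or not (len(x) == len(y) == len(z) == n):
        raise ValueError("channels must be non-empty and of equal length")
    ix = sum(a * b for a, b in zip(w, x)) / n
    iy = sum(a * b for a, b in zip(w, y)) / n
    iz = sum(a * b for a, b in zip(w, z)) / n
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation


if __name__ == "__main__":
    # Synthetic plane wave from azimuth 0 deg (in line with the source) and
    # elevation -20 deg (arriving from below), for illustration only.
    az, el = math.radians(0.0), math.radians(-20.0)
    s = [math.sin(0.3 * k) for k in range(200)]
    x = [v * math.cos(az) * math.cos(el) for v in s]
    y = [v * math.sin(az) * math.cos(el) for v in s]
    z = [v * math.sin(el) for v in s]
    print(direction_from_bformat(s, x, y, z))  # approximately (0.0, -20.0)
```

In a real RIR, the segment would be a short window around the reflection onset. Diffuse energy in that window biases the estimate, which is one reason the measured azimuths do not sit exactly at zero.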

Figure 1. Angular variation of the sound reflections from the ground for microphones located at various distances.

2.1. Computation of Ground Reflection Elevations using Simulated Room Impulse Response and Measured B-Format Room Impulse Response

The ground reflections are represented as straight lines, or rays, following the assumption made in the image source method [20-21]. As shown in Figure 1, the first early reflection travels equal distances before and after reflecting from the ground. The angle subtended with the ground by the first reflection reaching the n-th microphone is given by:

As shown in Figure 1, the angle subtended by the reflection with the ground equals the angle subtended at the microphone with the horizontal axis (computed in RSAO), because the two are alternate angles. Note that the tangent of this angle is inversely proportional to the distance between the source and the microphone. Although this relation assumes that sound propagates as rays, it gives insight into how the reflections should change with distance. In this work, the authors call this the Simulated Model and use it as a reference for analyzing the angular variation of the first reflection extracted from B-Format RIRs. The analysis models the elevation angles of the ground reflection obtained from the measured B-Format RIRs using polynomial curve fitting.
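Under the ray model, the relation above can be made concrete with one additional assumption that the text does not state: the source and every microphone sit at the same height h above a flat floor. This is consistent with the equal path lengths before and after the reflection. The reflection point then lies midway between the source and the n-th microphone at horizontal distance nd, so tan θ_n = h/(nd/2) = 2h/(nd). The Python sketch below evaluates this. The function names and the values of h and d in the example are illustrative and are not taken from the Classroom database.

```python
import math
from typing import List


def ground_reflection_elevation(n: int, d: float, h: float) -> float:
    """Elevation angle theta_n (degrees) of the first ground reflection.

    Assumes (hypothetically) that the source and microphone are both at height h
    above a flat floor, with the n-th microphone at horizontal distance n*d,
    so the specular reflection point is at n*d/2 and tan(theta_n) = 2h / (n*d).
    """
    if n < 1 or d <= 0.0 or h <= 0.0:
        raise ValueError("need n >= 1, d > 0 and h > 0")
    return math.degrees(math.atan2(2.0 * h, n * d))


def simulated_elevations(num_mics: int, d: float, h: float) -> List[float]:
    """Simulated elevation angles for microphones n = 1..num_mics."""
    return [ground_reflection_elevation(n, d, h) for n in range(1, num_mics + 1)]


if __name__ == "__main__":
    # Illustrative geometry only: 10 positions, 0.5 m spacing, 1.5 m height.
    for n, theta in enumerate(simulated_elevations(10, d=0.5, h=1.5), start=1):
        print(f"n={n:2d}  distance={n * 0.5:4.1f} m  theta={theta:5.1f} deg")
```

The angles decrease monotonically with n, which matches the behavior of the Simulated Model curve in Figure 3.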

2.2. Computation of Ground Reflection Onset Times using Simulated RIR and Measured B-Format RIR

The time required for a signal to travel from the source to the receiver is called the onset time. In an RIR measured in a room, the direct signal has the smallest onset time. The next dominant component is the ground reflection, assuming that the ground is closer than the walls and the ceiling. In the ideal case, the travel time of the sound reflected from the ground is given by:

where c = 343 m/s is the speed of sound in air and t_n is the onset time of the ground reflection travelling from the source to the n-th microphone. Onset times are also obtained from the B-Format signals using RSAO, which uses the DYPSA algorithm [22] to identify the ground reflection component and its onset time. In this work, the ground reflection onset times obtained by both methods, RSAO and Equation 3, are analyzed using polynomial curve fitting.
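With the same hypothetical equal-height geometry used for the elevation angle (source and microphones at height h, n-th microphone at horizontal distance nd), each leg of the reflected path has length sqrt((nd/2)² + h²). The ideal onset time is therefore the full path length divided by c. The minimal Python sketch below, under these assumptions, also returns the delay after the direct path nd/c, because RSAO reports onset times relative to the direct sound (Section 3.3). The function name and the height h are illustrative assumptions.

```python
import math
from typing import Tuple

SPEED_OF_SOUND = 343.0  # m/s, the value of c used in the text


def ground_reflection_onset(
    n: int, d: float, h: float, c: float = SPEED_OF_SOUND
) -> Tuple[float, float]:
    """Return (absolute onset, delay after the direct sound) in seconds.

    Hypothetical geometry: source and n-th microphone both at height h,
    separated horizontally by n*d, with a specular reflection off a flat floor.
    """
    if n < 1 or d <= 0.0 or h <= 0.0 or c <= 0.0:
        raise ValueError("need n >= 1 and positive d, h, c")
    horizontal = n * d
    reflected_path = 2.0 * math.hypot(horizontal / 2.0, h)  # two equal legs
    t_reflection = reflected_path / c
    t_direct = horizontal / c  # direct path when both are at the same height
    return t_reflection, t_reflection - t_direct


if __name__ == "__main__":
    # Reflected path is 2 * 2.5 m = 5 m; the direct path is 3 m.
    t_abs, t_rel = ground_reflection_onset(n=1, d=3.0, h=2.0)
    print(f"{t_abs * 1e3:.3f} ms absolute, {t_rel * 1e3:.3f} ms after the direct sound")
```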

2.3. Polynomial Curve Fitting based Early Parameter Analysis

Let M be the model order and p_n the early parameter of the ground reflection captured by the B-Format microphone at the n-th position. The M-th order curve fitting model is then given by:

Figure 2. Azimuthal angle variation for various source-receiver distances.

Figure 3. Elevation angles obtained using the Simulated Model and RSAO for various source-receiver distances.

Here, p_n denotes either θ_n, the elevation angle of the ground reflection, or t_n, the onset time to the n-th microphone, and k_0, k_1, …, k_M are the polynomial coefficients.

This model serves two purposes in this work. First, it gives the simulated and measured early parameters a common representation that is independent of the number of sample positions. Second, it allows the elevation angles to be predicted for non-measured directions. For N > M, the polynomial curve fitting problem can be solved by least squares as follows.
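A minimal plain-Python sketch of such a least-squares fit is shown below. It assumes that the regressor is the source-receiver distance x_n = nd (the text does not say whether n or nd is used). It builds the Vandermonde matrix with rows [1, x_n, …, x_n^M] and solves the normal equations (XᵀX)k = Xᵀp by Gaussian elimination. The function names are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ k = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular system: too few distinct sample positions")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    k = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        k[r] = (aug[r][size] - sum(aug[r][c] * k[c] for c in range(r + 1, size))) / aug[r][r]
    return k


def fit_polynomial(x: Sequence[float], p: Sequence[float], order: int) -> List[float]:
    """Least-squares coefficients [k_0, ..., k_M] of p_n ~ sum_m k_m * x_n**m."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    if order < 0 or len(x) <= order:
        raise ValueError("least squares needs N > M samples")
    rows = [[xi ** m for m in range(order + 1)] for xi in x]  # Vandermonde matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order + 1)] for i in range(order + 1)]
    xtp = [sum(r[i] * pn for r, pn in zip(rows, p)) for i in range(order + 1)]
    return _solve(xtx, xtp)


def evaluate(k: Sequence[float], x: float) -> float:
    """Evaluate the fitted polynomial at x, e.g. at a non-measured distance."""
    return sum(km * x ** m for m, km in enumerate(k))


if __name__ == "__main__":
    # Hypothetical values lying exactly on 2 + 3x - 0.5x^2.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ps = [2.0 + 3.0 * v - 0.5 * v * v for v in xs]
    k = fit_polynomial(xs, ps, order=2)
    print([round(c, 6) for c in k])  # close to [2.0, 3.0, -0.5]
    print(evaluate(k, 2.5))          # prediction between measured positions
```

Solving the normal equations is the simplest approach, but it becomes poorly conditioned as the order grows. In practice, `numpy.polyfit(x, p, M)` solves the same least-squares problem more robustly and returns the coefficients with the highest power first.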

The results obtained for both these analyses are discussed in Section 3.

Result Analysis

In this section, the database used for RSAO parameter analysis is first discussed. Subsequently, results obtained for the ground reflection elevations, azimuths and onset times for both simulated and measured RIRs are presented. Finally, the polynomial curve fitting performance is also discussed.

3.1. B-Format RIR Database

Very few databases provide B-Format RIRs measured systematically over various distances [5,23]. One such database is the QMUL database [5], which was measured over a grid of sampling points in three different rooms: Classroom, Octagon, and Great Hall. In this work, the Classroom B-Format RIRs are used to extract the RSAO parameters.

3.2. Performance Analysis of Ground Reflection Directions for Simulated and RSAO Approaches

A ground reflection direction is represented as an ordered pair of azimuth and elevation angles. Because the sound source is in line with the microphones, the azimuth angle should ideally be zero. When extracted from the measurements, however, the computed azimuth varies around zero degrees, as shown in Figure 2. The elevation angle variation is the key quantity for analyzing the ground reflections. As shown in Figure 1, the elevation angle decreases as the distance between the source and the microphone increases. The angle subtended at the microphone with the horizontal axis (computed in RSAO) equals the angle subtended by the reflection with the ground (computed in the Simulated Model using Equation 1). Figure 3 shows the elevation angles computed through RSAO and using Equation 1. Because the latter is a closed-form expression, its curve in Figure 3 is monotonic. The measured elevations, however, fluctuate, which makes it difficult to identify parameters for non-measured directions. The solution explored in this work is to apply polynomial curve fitting. Figure 4 shows, as scattered points, the elevation angles obtained for the simulated model (top) and RSAO (bottom) at the 10 distances in the Classroom database, together with polynomial curve fits of order 1, 3, 5, and 7 to these 10 sampling points. Because the simulated elevation angle varies monotonically, the lower orders model the sampling points well. The elevation angles obtained using RSAO, however, need a higher order to generalize the sampling points properly. Increasing the order brings the curve closer to the data, but PCF orders greater than 5 over-fit the data, as can be observed in Figure 4. Fixing a particular order then makes it possible to estimate the ground reflection elevation angles for non-measured directions.
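One way to make the choice of order systematic is leave-one-out cross-validation. This criterion is assumed here for illustration and is not taken from the text. Each sample is held out in turn, the polynomial is fitted to the remaining N - 1 samples, and the squared prediction error at the held-out sample is averaged. The order with the lowest average error is selected. Held-out end points are extrapolated, so over-fitted high orders are penalized. The Python sketch below is self-contained, so it repeats a compact least-squares fit. Its function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def _lstsq_poly(x: Sequence[float], p: Sequence[float], order: int) -> List[float]:
    """Compact least-squares polynomial fit via the normal equations."""
    size = order + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(size)] for i in range(size)]
    b = [sum(pi * xi ** i for xi, pi in zip(x, p)) for i in range(size)]
    for col in range(size):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv], b[col], b[piv] = a[piv], a[col], b[piv], b[col]
        for r in range(size):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [arc - f * acc for arc, acc in zip(a[r], a[col])]
                b[r] -= f * b[col]
    return [b[i] / a[i][i] for i in range(size)]


def _predict(k: Sequence[float], x: float) -> float:
    return sum(km * x ** m for m, km in enumerate(k))


def loo_error(x: Sequence[float], p: Sequence[float], order: int) -> float:
    """Mean squared leave-one-out prediction error for one model order."""
    n = len(x)
    if n - 1 <= order:
        raise ValueError("need more than order + 1 samples for leave-one-out")
    total = 0.0
    for i in range(n):
        xs = [v for j, v in enumerate(x) if j != i]
        ps = [v for j, v in enumerate(p) if j != i]
        k = _lstsq_poly(xs, ps, order)
        total += (_predict(k, x[i]) - p[i]) ** 2
    return total / n


def select_order(
    x: Sequence[float], p: Sequence[float], orders: Iterable[int]
) -> Tuple[int, Dict[int, float]]:
    """Return the order with the smallest LOO error and the error for every order.

    Errors are rounded to 12 decimal places before comparison so that
    floating-point ties are broken in favor of the lower order.
    """
    errors = {m: loo_error(x, p, m) for m in orders}
    best = min(errors, key=lambda m: (round(errors[m], 12), m))
    return best, errors
```

With the 10 RSAO samples, `select_order(distances, elevations, [1, 3, 5, 7])` would compare the orders shown in Figure 4. No outcome is implied here for the Classroom data.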

Figure 4. Elevation angles obtained by the simulated model and RSAO for 10 different distances in the Classroom database, with polynomial curve fits of order 1, 3, 5, and 7 to the 10 samples.

Figure 5. Onset times obtained by the simulated model and RSAO for 10 different distances in the Classroom database, with polynomial curve fits of order 1, 3, 5, and 7 to the 10 samples.

3.3. Performance Analysis of Onset Times for Simulated and RSAO Approaches

Figure 5 presents the onset times computed for the 10 sampling positions using the simulated model (Equation 3) and using RSAO. As with the elevation angles discussed in the previous section, the onset times of the simulated model form a monotonic curve because they follow a closed-form expression. The onset times obtained through RSAO from the measured signals, however, vary more. Polynomial curve fitting generalizes the sampling points and thereby makes it possible to estimate onset times for non-measured directions. Note that the onset times obtained through RSAO are relative to the time instant of the direct signal. The direct signal onset time is therefore added to the RSAO onset times to obtain the total onset time of the ground reflection. In this manner, polynomial curve fitting helps identify the early parameters for non-measured directions.
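The bookkeeping described above is a per-position addition. The minimal Python sketch below converts RSAO onset times (relative to the direct sound) into total onset times and reports their deviation from the simulated onset times of Equation 3. The function names are hypothetical. The direct-sound onsets are taken as inputs, for example from the RSAO direct-component parameters.

```python
from typing import List, Sequence


def total_onsets(relative: Sequence[float], direct: Sequence[float]) -> List[float]:
    """Total ground-reflection onset = direct-sound onset + onset relative to it (seconds)."""
    if len(relative) != len(direct):
        raise ValueError("one direct onset is needed per position")
    return [t_dir + t_rel for t_rel, t_dir in zip(relative, direct)]


def onset_deviation(measured: Sequence[float], simulated: Sequence[float]) -> List[float]:
    """Measured minus simulated onset time for each position (seconds)."""
    if len(measured) != len(simulated):
        raise ValueError("sequences must have equal length")
    return [m - s for m, s in zip(measured, simulated)]
```

The total onsets, rather than the relative ones, are the quantities that can be compared directly with Equation 3 and passed to the polynomial fit.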

Conclusion

Early parameters vary more monotonically in simulated models than the parameters extracted with the Reverberant Spatial Audio Object (RSAO). Polynomial curve fitting of RSAO parameters for ground reflections needs a higher model order than the simulated model does for the equivalent parameters. The polynomial curve fitting method is able to generalize the observed sample parameters and enables the parameters to be computed for non-measured directions. This work can be extended in several ways. First, a slightly denser grid of B-Format RIRs could be measured in a closed room so that an error analysis of the early parameters for non-measured directions can be performed. Second, the analysis could be extended to early parameters other than spatial direction and onset time.

Reference

[1] Farina, A. Simultaneous measurement of impulse response and distortion with a swept-sine technique. Journal of the Audio Engineering Society, February 2000.

[2] Reddy, C.S. and Hegde, R.M. Design and development of bionic ears for rendering binaural audio. In 2016 International Conference on Signal Processing and Communications (SPCOM), pp. 1-5, 2016.

[3] Habets, E.A. Speech dereverberation using statistical reverberation models. In Speech Dereverberation, Springer, London, pp. 57-93, 2010.

[4] Remaggi, L., Jackson, P.J., Coleman, P. and Wang, W. Acoustic reflector localization: novel image source reversion and direct localization methods. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2), pp. 296-309, 2016.

[5] Stewart, R. and Sandler, M. Database of omnidirectional and B-format room impulse responses. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 165-168, 2010.

[6] Remaggi, L., Jackson, P. and Coleman, P. Estimation of room reflection parameters for a reverberant spatial audio object. In Audio Engineering Society Convention 138. Audio Engineering Society, 2015.

[7] Coleman, P., Franck, A., Jackson, P., Hughes, R.J., Remaggi, L. and Melchior, F. Object-based reverberation for spatial audio. Journal of the Audio Engineering Society, 65(1/2), pp. 66-77, 2017.

[8] Coleman, P., Franck, A., Jackson, P., Hughes, R., Remaggi, L. and Melchior, F. On object-based audio with reverberation. In Audio Engineering Society Conference: 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech). Audio Engineering Society, 2016.

[9] Woodcock, J., Francombe, J., Franck, A., Coleman, P., Hughes, R., Kim, H., Liu, Q., Menzies, D., Simón Gálvez, M.F., Tang, Y. and Brookes, T. A framework for intelligent metadata adaptation in object-based audio. In Audio Engineering Society Conference: 2018 AES International Conference on Spatial Reproduction-Aesthetics and Science. Audio Engineering Society, 2018.

[10] Francombe, J., Mason, R., Jackson, P.J., Brookes, T., Hughes, R., Woodcock, J., Franck, A., Melchior, F. and Pike, C. Media device orchestration for immersive spatial audio reproduction. In Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, pp. 1-5, 2017.

[11] Coleman, P., Franck, A., Menzies, D. and Jackson, P.J. Object-based reverberation encoding from first-order Ambisonic RIRs. In Audio Engineering Society Convention 142. Audio Engineering Society, 2017.

[12] Breebaart, J., Engdegård, J., Falch, C., Hellmuth, O., Hilpert, J., Hoelzer, A., Koppens, J., Oomen, W., Resch, B., Schuijers, E. and Terentiev, L. Spatial audio object coding (SAOC) - The upcoming MPEG standard on parametric object based audio coding. In Audio Engineering Society Convention 124. Audio Engineering Society, 2008.

[13] Geier, M., Ahrens, J. and Spors, S. Object-based audio reproduction and the audio scene description format. Organised Sound, 15(3), pp. 219-227, 2010.

[14] Bleidt, R., Borsum, A., Fuchs, H. and Weiss, S.M. Object-based audio: Opportunities for improved listening experience and increased listener involvement. SMPTE Motion Imaging Journal, 124(5), pp. 1-13, 2015.

[15] James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning with Applications in R. New York: Springer, 2013.

[16] Coleman, P., Franck, A., Francombe, J., Liu, Q., de Campos, T., Hughes, R., Menzies, D., Simón Gálvez, M., Tang, Y., Woodcock, J. and Melchior, F. S3A Audio-Visual System for Object-Based Audio, 2018.

[17] Kim, H., Remaggi, L., Jackson, P.J. and Hilton, A. Immersive spatial audio reproduction for VR/AR using room acoustic modelling from 360 images. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 120-126, 2019.

[18] Bishop, C.M. Pattern Recognition and Machine Learning. Springer, 2006.

[19] Chitreddy, S.R. and Jackson, P.J.B. Source distance perception with reverberant spatial audio object reproduction of real rooms. In Proceedings of 9th Forum Acusticum, 2020.

[20] Allen, J.B. and Berkley, D.A. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4), pp. 943-950, 1979.

[21] Lehmann, E.A. and Johansson, A.M. Prediction of energy decay in room impulse responses simulated with an image-source model. The Journal of the Acoustical Society of America, 124(1), pp. 269-277, 2008.

[22] Kounoudes, A., Naylor, P.A. and Brookes, M. The DYPSA algorithm for estimation of glottal closure instants in voiced speech. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, pp. I-349, 2002.

[23] Merimaa, J., Peltonen, T. and Lokki, T. Concert Hall Impulse Responses - Pori, Finland: Reference. Technical Report, 2005.
