Evaluation of Clipped-Sample Restoration Software
April 2010 - Volume 12 - Number 2
Bruce E. Koenig
Audio/Video Forensics Expert
BEK TEK LLC
Douglas S. Lacey
Audio/Video Forensics Expert
BEK TEK LLC
Abstract
The introduction of clipped-sample restoration software into forensic applications has led some examiners to question its validity for enhancing investigative and other legal recordings. Clipped samples occur in digital audio files when the amplitude of an input signal exceeds the maximum quantization level of the recording system, producing completely flattened waveforms in the highest-amplitude portions, or peaks. Theoretically, clipped-sample restoration algorithms should decrease the overall or local amplitude of the recorded signal, estimate what the missing peaks should look like, and replace the flattened portions of the recording with the estimated values, thus “restoring the peaks.”
To evaluate the effects of restoration algorithms, two widely available software programs were tested using both discrete sine waves and speech. The results of the restoration processes were then compared to the original source and clipped recordings using statistical waveform cross-correlation, total harmonic distortion, spectrographic, and critical listening analyses to assess the software “corrections.” Based on this research, detailed testing of clipped-sample restoration software is strongly recommended before it is used to enhance digital audio evidence in forensic laboratories.
Introduction
The enhancement of digital audio recordings to improve their voice intelligibility (Koenig et al. 2007) is a forensic field that seldom provokes controversy. However, the introduction of clipped-sample restoration software has led some examiners in both private and government laboratories to question its use on overdriven investigative and other legal recordings. Clipped samples occur in digital audio recordings when the amplitude of the input signal exceeds the maximum recording level of the system, producing a waveform that is completely flattened in the highest-amplitude portions, or peaks. For instance, in 16-bit quantization, 65,536 (2¹⁶) discrete values define the positive and negative amplitudes of an audio waveform above and below the zero axis (Pohlmann 2005); in signed form, these values run from –32,768 to +32,767. If the amplitude of the audio information exceeds the range covered by these 65,536 values, the digital recording system assigns it the largest positive or negative quantization value possible, depending on whether the signal is above or below the zero axis. This inaccurate assignment of maximum and minimum values produces audible distortion, because the flattened areas create a localized square-wave effect that adds harmonics to the recorded signal.
Figure 1a is a waveform display of a 1.00 kilohertz (kHz) sine wave with 5.0 milliseconds of time on the horizontal axis and amplitude on the vertical axis. Figure 1b is the same waveform display as Figure 1a, except that the top 6.0 decibels (dB) of its positive and negative peaks have been “clipped.” For illustrative purposes, Figures 1a and 1b are displayed with the same amplitude scaling, whereas in actual clipped digital audio recordings the flattened peaks normally lie at the maximum quantization level.
Figure 1: Waveform displays of (a) a 1.00 kHz sine wave over 5.0 milliseconds and (b) the same 1.00 kHz sine wave clipped to +6 dB.
Figures 2a and 2b are narrow-band spectrum displays produced using fast Fourier transform (FFT) software; FFT theory is further described in the Total Harmonic Distortion (THD) section below. These two displays have a frequency range of 0 to 6.00 kHz on the horizontal axis and the same amplitude range on the vertical axis. Figures 2a and 2b display FFT analyses of the same information used for Figures 1a and 1b, respectively. Figure 2a shows the 1.00 kHz sine wave as a single peak at 1.00 kHz, whereas Figure 2b has the same 1.00 kHz peak plus square-wave-induced, high-level harmonics at 3.00 kHz and 5.00 kHz and lower amplitude frequency artifacts, none of which existed in the original 1.00 kHz signal.
Figure 2: Fast Fourier transform displays of the same signals as (a) Figure 1a and (b) Figure 1b.
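The effect shown in Figures 1b and 2b can be reproduced with a short NumPy sketch. It assumes a 16 kHz sampling rate (one of the formats used later in this study), a 1-second signal so that each FFT bin is exactly 1 Hz wide, a 6 dB overdrive chosen to mirror Figure 1b, and saturation to the signed 16-bit limits; the function name `clipped_sine_spectrum` is hypothetical.

```python
import numpy as np

def clipped_sine_spectrum(f0=1000, fs=16000, overdrive_db=6.0, seconds=1.0):
    """Hard-clip a sine driven overdrive_db above 16-bit full scale.
    Returns (int16 samples, FFT magnitudes); with a 1-second signal the
    magnitude at index k belongs to the bin at k Hz."""
    t = np.arange(int(fs * seconds)) / fs
    x = 32767 * 10 ** (overdrive_db / 20) * np.sin(2 * np.pi * f0 * t)
    # Out-of-range values receive the largest positive or negative code,
    # which flattens the peaks.
    clipped = np.clip(np.round(x), -32768, 32767).astype(np.int16)
    mag = np.abs(np.fft.rfft(clipped.astype(float)))
    return clipped, mag

clipped, mag = clipped_sine_spectrum()
for f in (2000, 3000, 4000, 5000):
    level = 20 * np.log10(mag[f] / mag[1000] + 1e-12)
    print(f"{f} Hz: {level:7.1f} dB relative to 1 kHz")
```

In this idealized case the 2 kHz and 4 kHz bins stay far below the fundamental, because symmetric clipping preserves the waveform's half-wave symmetry, while the 3 kHz harmonic is strong. At other sampling rates, harmonics above the Nyquist frequency alias back into the spectrum and may account for some lower-amplitude artifacts.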
Theoretically, clipped-sample restoration software should decrease the overall or local amplitude of the recorded signal, estimate what the missing “peaks” should look like, and replace the flattened portions of the recording with estimated sample values, thus “restoring the peaks” and decreasing the audible distortion. Ideally, these programs should not change the waveform shape of the recorded audio anywhere other than in the overdriven portions. The debate has arisen because this feature, available in many enhancement and audio-editing programs, modifies clipped-sample values in a recording to estimate the original signal. Critics believe that such modifications to the audio data are not appropriate for forensic audio enhancement, because the clipped portions cannot be perfectly reconstructed; proponents believe that this tool, though not perfect, produces a waveform shape much closer to that of the original audio signal. Many government and private examiners consider it another excellent tool in the enhancement arsenal that, when used prudently and correctly, improves the voice intelligibility of some overdriven recordings.
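The three steps described above can be illustrated with a deliberately simple declipper. This is a hypothetical sketch, not the algorithm used by either commercial program tested here: it finds runs of samples stuck at the clip level, fits a low-order polynomial to a few unclipped neighbors on each side of each run, replaces the run with the fitted values (never letting an estimate fall below the clip level), and then attenuates the result to create headroom. The names `declip_poly`, `context`, and `order` are illustrative assumptions.

```python
import numpy as np

def declip_poly(y, clip_level, context=4, order=3, attenuation_db=6.0):
    """Hypothetical polynomial-interpolation declipper (illustration only).

    y: clipped samples; clip_level: magnitude at which samples were flattened;
    context: unclipped neighbors examined on each side of a run;
    order: degree of the fitted polynomial.
    """
    y = np.asarray(y, dtype=float)
    out = y.copy()
    clipped = np.abs(y) >= clip_level
    n = len(y)
    i = 0
    while i < n:
        if not clipped[i]:
            i += 1
            continue
        sign = np.sign(y[i])
        j = i
        # A run is a block of consecutive clipped samples with the same sign.
        while j < n and clipped[j] and np.sign(y[j]) == sign:
            j += 1
        # Candidate anchors: up to `context` samples on each side of the run,
        # keeping only those that are not themselves clipped.
        idx = np.r_[max(0, i - context):i, j:min(n, j + context)]
        idx = idx[~clipped[idx]]
        if len(idx) > order:
            # Offsets from the run start keep the least-squares fit well conditioned.
            coeffs = np.polyfit(idx - i, y[idx], order)
            est = np.polyval(coeffs, np.arange(j - i))
            # The original samples exceeded the clip level, so never go below it.
            out[i:j] = sign * np.maximum(sign * est, clip_level)
        i = j
    # Attenuate everything so the reconstructed peaks have headroom.
    return out * 10 ** (-attenuation_db / 20)
```

Even for a clean sine, the reconstruction is only an estimate: the fitted peaks come closer to the original than the flattened samples do, but they do not match it, which is the crux of the forensic debate.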
To determine the specific effects of representative clipped-sample restoration algorithms, a testing regimen was formulated using both discrete sine waves and voice information and then employed to evaluate two widely available commercial programs. The results of the restoration processes were then compared to the original source and clipped samples using waveform cross-correlation, total harmonic distortion, spectrographic, and critical listening analyses to assess the programs’ “corrections.” The following sections describe these analyses. The two programs are not identified because the purpose of this article is to set forth an overall methodology for individual government and private laboratories to use to evaluate such programs. In addition, these programs are regularly updated by their manufacturers, which may produce different results than the versions tested.
Test Recordings
To evaluate the effects of the clipped-sample restoration algorithms using the two selected software programs, the authors produced three types of test digital recordings: source, clipped, and restored. The source recordings were the original, unprocessed digital files of laboratory-prepared sine waves and speech samples; the clipped recordings were the source recordings after their amplitudes had been increased beyond their maximum quantization levels, causing clipping of the waveform peaks; and the restored recordings were the clipped recordings after being processed by the two commercial software programs to restore their peaks.
Source Recordings
The source digital recordings consisted of three sets of test recordings, described below, produced in sampling and quantization formats typical of many forensic recordings. Each test was produced as five separate “native” pulse-code modulation (PCM) wavefiles, using sampling frequency/quantization formats of 8 kHz/8-bit, 11.025 kHz/8-bit, 11.025 kHz/16-bit, 16 kHz/8-bit, and 16 kHz/16-bit (Pohlmann 2005). The following three test sets were prepared using a standard sound-editing program:
- Test-Set 1: 30.0-second wavefiles containing a synthesized 1 kHz sine wave, peaking at 1.00 dB below the maximum positive and negative quantization levels. The 1 kHz sine wave was chosen because it is a repetitive, simple signal located near the maximum energy frequency of speech (Levitt and Webster 1991).
- Test-Set 2: 30.0-second wavefiles containing an equal mix of four synthesized sine waves (500 Hz, 1400 Hz, 2600 Hz, and 3400 Hz), peaking at 1.00 dB below the maximum quantization level. The four discrete tones were chosen because they are not harmonically related and all fall within the expected frequency range of most forensic recordings.
- Test-Set 3: Four spoken sentences, two by male and two by female talkers, randomly selected from the phonetically varied sentences of American English in the Defense Advanced Research Projects Agency (DARPA) Texas Instruments/Massachusetts Institute of Technology (TIMIT) Acoustic-Phonetic Continuous Speech Corpus (Garofolo et al. 1993). Each talker’s sentence was retained as a separate digital file, and each file in each of the five formats was then normalized to 1.00 dB below its maximum quantization level. The four sentences were as follows:
- A woman saying, “She had your dark suit in greasy wash water all year.”
- A man saying, “Beg that guard for one gallon of gas.”
- A woman saying, “Put the butcher-block table in the garage.”
- A man saying, “Bob papered over the living-room murals.”
The six files from the test sets in the five different formats produced a total of 30 separate source wavefiles.
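A sketch of how Test-Sets 1 and 2 might be synthesized and normalized to 1.00 dB below full scale follows. It assumes zero starting phase for every tone, an equal-amplitude mix, and signed integer output (8-bit WAV files actually store unsigned samples offset by 128, which a file writer would add). The function name `make_test_tones` is hypothetical; the authors used a standard sound-editing program rather than code.

```python
import numpy as np

def make_test_tones(freqs_hz, fs, bits, seconds=30.0, peak_dbfs=-1.0):
    """Equal-amplitude sum of sines with its largest sample `peak_dbfs` below
    positive full scale, rounded to signed integers (8- or 16-bit)."""
    if bits not in (8, 16):
        raise ValueError("only 8- and 16-bit formats are sketched here")
    t = np.arange(int(round(fs * seconds))) / fs
    mix = sum(np.sin(2 * np.pi * f * t) for f in freqs_hz)   # zero-phase tones
    full_scale = 2 ** (bits - 1) - 1                          # 127 or 32767
    mix *= full_scale * 10 ** (peak_dbfs / 20) / np.max(np.abs(mix))
    return np.round(mix).astype(np.int8 if bits == 8 else np.int16)

# Test-Set 1 and Test-Set 2 in two of the five native formats.
test_set_1 = make_test_tones([1000], fs=8000, bits=8)
test_set_2 = make_test_tones([500, 1400, 2600, 3400], fs=16000, bits=16)
```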
To prepare the clipped test recordings, the authors used a sound-editing program to increase the amplitude of the source recordings, which were all at 1.00 dB below their maximum value, by 4.00, 7.00, 10.00, and 13.00 dB. This produced recordings that were overdriven, respectively, by 3.00, 6.00, 9.00, and 12.00 dB above their maximum quantization levels, producing portions with flattened peaks at their highest amplitudes. Table 1 shows the percentage of clipped samples present in each of the 120 clipped source files; that is, the number of clipped samples divided by the total number of samples for a particular source file.
Table 1: Percentage of clipped samples for each of the clipped source recordings
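The clipping step and the Table 1 statistic can be sketched as follows, assuming signed samples: the gain is applied in floating point, the result is saturated to the signed integer range, and a sample counts as clipped when it lands on either limit. Because the sources peak 1.00 dB below full scale, gains of 4, 7, 10, and 13 dB correspond to overdrives of 3, 6, 9, and 12 dB. For a single sine wave overdriven by X dB, the expected share of clipped samples is also available in closed form, 1 – (2/π)·arcsin(10^(–X/20)), since that is the fraction of each cycle the sine spends beyond the clip threshold; this serves only as a cross-check and does not apply to the multitone or speech files. The function names are hypothetical.

```python
import numpy as np

def overdrive(samples, gain_db, bits=16):
    """Boost `samples` by gain_db, saturate to the signed `bits`-bit range,
    and return (clipped samples, percentage of samples at either limit)."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    boosted = np.round(np.asarray(samples, dtype=float) * 10 ** (gain_db / 20))
    clipped = np.clip(boosted, lo, hi).astype(np.int64)
    at_limit = (clipped == lo) | (clipped == hi)
    return clipped, 100.0 * np.count_nonzero(at_limit) / clipped.size

def expected_sine_clip_pct(overdrive_db):
    """Share of a sine's cycle spent beyond a threshold overdrive_db below its peak."""
    c = 10 ** (-overdrive_db / 20)
    return 100.0 * (1.0 - (2.0 / np.pi) * np.arcsin(c))
```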
To determine if resampling/re-quantization would improve or worsen the restoration process, the authors clipped, then converted, the 8 kHz/8-bit tests into two additional wavefiles with 8 kHz/16-bit and 44.1 kHz/16-bit formatting; likewise, the 11.025 kHz/8-bit tests were converted to 11.025 kHz/16-bit, the 11.025 kHz/16-bit tests were converted to 44.1 kHz/16-bit, the 16 kHz/8-bit tests were converted to 16 kHz/16-bit, and the 16 kHz/16-bit tests were converted to 44.1 kHz/16-bit. Appropriate antialiasing filters were employed during each of the resampling processes. The resulting files were referred to as “reformatted clipped” files. Because the six test files had been sampled and quantized in five different formats, amplitude adjusted at four different levels, and then six new formats added after clipping, a total of 264 separate clipped wavefiles were produced.
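Re-quantizing an 8-bit file to 16 bits adds no new information. Under the common convention sketched below (an assumption; the authors' editing software may differ), each unsigned 8-bit WAV sample is re-centered on zero and shifted left by 8 bits, so flattened peaks remain flat and are simply expressed as 16-bit codes. Resampling to 44.1 kHz, by contrast, requires an antialiasing interpolation filter and is not shown. The function name `u8_to_s16` is hypothetical.

```python
import numpy as np

def u8_to_s16(u8):
    """Convert unsigned 8-bit WAV samples (0..255, midpoint 128) to signed
    16-bit samples by re-centering and scaling by 256 (a left shift of 8)."""
    centered = np.asarray(u8, dtype=np.int16) - 128   # now -128..127
    return (centered * 256).astype(np.int16)          # now -32768..32512
```

Note that 8-bit positive full scale maps to 32,512 rather than 32,767 under this convention, so a clip detector applied to converted files should look for the converted extremes rather than the native 16-bit limits.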
Finally, the restored files were produced by processing the clipped files through the two selected software programs, which would have produced a total of 528 separate digital recordings. However, program #1 could not process the 48 wavefiles with an 8 kHz sampling rate, so only 480 restored recordings were produced. Both programs processed the clipped files with their clipped-peak algorithms, using a manually set attenuation to create sufficient headroom for the reconstructed peaks. No automatic limiter or other features were enabled.
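The file counts quoted above can be checked by enumerating the test design. The format lists are transcribed from the text; the data structures themselves are illustrative.

```python
NATIVE = [(8000, 8), (11025, 8), (11025, 16), (16000, 8), (16000, 16)]
# Post-clipping conversions, keyed by native format (sample rate, bits).
REFORMAT = {
    (8000, 8): [(8000, 16), (44100, 16)],
    (11025, 8): [(11025, 16)],
    (11025, 16): [(44100, 16)],
    (16000, 8): [(16000, 16)],
    (16000, 16): [(44100, 16)],
}
N_TEST_FILES = 6                # Test-Set 1, Test-Set 2, and four sentences
CLIP_LEVELS_DB = [3, 6, 9, 12]

source = [(f, fmt) for f in range(N_TEST_FILES) for fmt in NATIVE]
clipped = [
    (f, level, fmt, out_fmt)
    for f, fmt in source
    for level in CLIP_LEVELS_DB
    for out_fmt in [fmt] + REFORMAT[fmt]   # native clipped file plus conversions
]
# Program #1 could not process files with an 8 kHz sampling rate.
restored = [
    (prog, item)
    for prog in (1, 2)
    for item in clipped
    if not (prog == 1 and item[3][0] == 8000)
]
print(len(source), len(clipped), 2 * len(clipped), len(restored))
# prints: 30 264 528 480
```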
Test Procedures
The original 30 source files, the 264 clipped files, and the 480 restored files were then analyzed using four procedures: waveform cross-correlation, total harmonic distortion, spectrographic, and critical listening analyses.
Waveform Cross-Correlation Analysis
The waveform cross-correlation process compared the source files to their respective clipped, restored, and reformatted restored files, using a MATLAB (The Mathworks, Inc., Natick, Massachusetts) routine that allowed for a sliding sample-by-sample comparison between the respective signals. The equation for this cross-correlation is set forth in the following formula (The Mathworks, Inc. 2008):
where x and y = audio files of sample length N, and m = displacement in time between x and y.
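Assuming the coefficient-normalized form of MATLAB's cross-correlation, which matches the properties described next (exactly 1 at m = 0 for identical real-valued files, and no sensitivity to amplitude scaling), the computation can be written as:

```latex
\rho_{xy}(m) =
\begin{cases}
\dfrac{\displaystyle\sum_{n=0}^{N-m-1} x_{n+m}\, y_{n}}
      {\sqrt{\displaystyle\sum_{n=0}^{N-1} x_{n}^{2}\,\sum_{n=0}^{N-1} y_{n}^{2}}}, & 0 \le m \le N-1,\\[3ex]
\rho_{yx}(-m), & -(N-1) \le m < 0
\end{cases}
```

Here m runs over 2N – 1 displacements, and the denominator is what makes the measure insensitive to overall amplitude.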
The output of the above cross-correlation is a sequence of values having a length of 2N – 1, the total number of time displacements between x and y. The correlation equation will produce a value of exactly 1 when m = 0 and the x and y audio files are identical. Similarly, two identical signals that are 180 degrees out of phase would result in a cross-correlation value of exactly –1 when m = 0. Two signals that have absolutely no correlation, which would be exceedingly rare, would produce a cross-correlation value of exactly 0 when m = 0.
Differences in amplitude between two audio signals of identical shaping will have no effect on the results of this cross-correlation process. For example, a cross-correlation between two identical sine waves having peak quantization levels of ±30,000 and ±15,000 would result in a cross-correlation value of 1 (with m = 0), just as if the ±30,000 or ±15,000 signals were compared to themselves.
Once the cross-correlation values were computed for each pair of audio files, the maximum absolute value from each sequence, representing the highest level of correlation between the two files, was documented. For this experiment, the maximum absolute values always resulted from the largest positive cross-correlation value (never from the greatest negative value). For the restoration process to improve a cross-correlation value, the result of the native-to-restored comparison must be larger than that of the respective comparison between the native and clipped files.
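A NumPy sketch of this comparison follows. It assumes equal-length, real-valued files normalized so that identical files give exactly 1 at zero displacement; `peak_xcorr` and `restoration_gain` are hypothetical names. `np.correlate` in `'full'` mode is used for clarity, but it scales with the square of the file length; for 30-second files an FFT-based routine such as `scipy.signal.correlate(..., method="fft")` is much faster.

```python
import numpy as np

def peak_xcorr(x, y):
    """Return (value, lag) of the largest-magnitude normalized cross-correlation
    between two equal-length files; identical files give (1.0, 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("files must have the same number of samples")
    # 'full' mode returns all 2N - 1 displacements; index N - 1 is m = 0.
    r = np.correlate(x, y, mode="full") / np.sqrt(np.dot(x, x) * np.dot(y, y))
    k = int(np.argmax(np.abs(r)))
    return float(r[k]), k - (len(y) - 1)

def restoration_gain(native, clipped, restored):
    """Native-to-restored minus native-to-clipped peak correlation;
    positive when restoration moved the file closer to the native one."""
    return abs(peak_xcorr(native, restored)[0]) - abs(peak_xcorr(native, clipped)[0])
```

Scaling one file by 0.5 leaves the peak value at 1, and inverting it gives –1 at zero lag, matching the behavior described above.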
Total Harmonic Distortion Analysis
The total harmonic distortion percentages of the 1 kHz sine waves were measured with a stand-alone FFT analyzer whose display shows frequency on the horizontal axis and amplitude on the vertical axis. The underlying FFT algorithm used by this analyzer converts a signal from the time domain to the frequency domain through a mathematical relationship discovered by Baron Jean Baptiste Joseph Fourier in 1807 but not published until 1822 in his book The Analytical Theory of Heat (as cited in Bracewell 1989). This “transformation” is based upon the periodicity inherent in sine and cosine functions, which allowed Fourier to represent any continuous, periodic signal in the frequency domain as a summation of these trigonometric functions. The Fourier transform was later adapted to the discrete operation of computers in a sampled form called the discrete Fourier transform (DFT). The DFT was, in turn, optimized into the FFT, which requires far fewer computations and thus provides an appreciable increase in processing speed. A detailed description of FFT theory is beyond the scope of this article, but many excellent texts are available on the subject (see Bracewell 2000; Kammler 2000; Owen 2007; Wallace and Koenig 1989).
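To make the DFT-to-FFT step concrete, the sketch below computes the DFT directly from its definition, X[k] = Σ x[n]·e^(–2πikn/N), which costs on the order of N² operations, and then with a recursive radix-2 FFT, which splits the sum into even- and odd-indexed samples and costs on the order of N log N. Both are checked against NumPy's `np.fft.fft`. The radix-2 version is a textbook illustration restricted to power-of-two lengths, not the algorithm inside any particular analyzer.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    k = n[:, None]
    return np.exp(-2j * np.pi * k * n / len(x)) @ x

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    return np.concatenate([even + twiddle, even - twiddle])
```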
The THD percentage was determined for the clipped and restored 1 kHz sine wave recordings, allowing a detailed measurement of the harmonic distortion of the single discrete signal. The THD is defined mathematically as the square root of the ratio of the harmonic power (the sum of the squared magnitudes of the harmonics) to the sum of the fundamental and harmonic powers, as set forth in the following formula (Stanford Research Systems, Inc. 1995):
where P1 = the power of the fundamental (the 1 kHz sine wave in the tests), and P2 through Pn = the powers of the harmonics (2 kHz, 3 kHz, and so forth in the tests).
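Reading the verbal definition above as the square root of a ratio of summed powers, the formula can be written as:

```latex
\mathrm{THD} = \sqrt{\frac{\sum_{i=2}^{n} P_i}{\sum_{i=1}^{n} P_i}} \times 100\%
```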
This analysis determined the percentage of the power of all of the harmonics (2 kHz, 3 kHz, and so forth) compared to the power of the fundamental frequency (1 kHz), thus determining the harmonic distortion ratio produced by the clipping and restoration processes.
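The measurement can be approximated in software as sketched below. This is an assumption-laden stand-in for the stand-alone analyzer: it uses a rectangular window and requires the file to contain a whole number of cycles of the fundamental so that each harmonic falls exactly on one FFT bin (a hardware analyzer would normally apply a window and may integrate several bins). Harmonics above the Nyquist frequency are ignored. The function name `thd_percent` is hypothetical.

```python
import numpy as np

def thd_percent(x, fs, f0):
    """THD in percent: sqrt(sum of harmonic powers / sum of fundamental and
    harmonic powers) x 100, reading each power from a single FFT bin."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    bin_hz = fs / len(x)
    powers = []
    h = 1
    while h * f0 <= fs / 2:                  # stop at the Nyquist frequency
        k = h * f0 / bin_hz
        if abs(k - round(k)) > 1e-6:
            raise ValueError("harmonics must fall exactly on FFT bins")
        powers.append(power[int(round(k))])
        h += 1
    powers = np.array(powers)
    return 100.0 * np.sqrt(powers[1:].sum() / powers.sum())
```

A pure sine gives a THD of essentially zero, while hard-clipping the same sine raises it sharply.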
Spectrographic Analysis
The spectrographic analysis compared the speech source files with both the clipped and restored files, using a software program that transformed the voice information into a display of time (horizontal axis) versus frequency (vertical axis) versus energy (gray scale) for a specific time period (see Figures 8 and 9 in the Results section below). These displays show the individual time and frequency components of all of the spoken words and other vocal sounds in the original source recordings, as well as the effects of the clipping and restoration processes, revealing vocal changes and added artifacts. The most obvious features of these spectrograms are the dark, mostly horizontal bands called formants, which show the peak energies of the vocal tract resonances.
The software parameters were set to Hamming window weighting; 0.800 preemphasis; linear time and frequency scaling; a 20-dB gray-scaling range; and FFT analysis sizes of 300 ± 50 Hz (depending upon the sample rate of the file). More detailed descriptions of spectrographic displays and analyses are available in a number of sources (see Bolt et al. 1979; Potter et al. 1947; Tosi 1979).
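These spectrographic settings can be approximated with the NumPy sketch below: first-order pre-emphasis y[n] = x[n] – 0.8·x[n – 1], Hamming-windowed FFT frames, and a display floor 20 dB below the maximum. How the tested software translates an analysis size in hertz into a window length is not stated, so the rule used here (window length ≈ sampling rate / analysis bandwidth) is an assumption, as are the 50% frame overlap and the function name `speech_spectrogram`.

```python
import numpy as np

def speech_spectrogram(x, fs, bandwidth_hz=300.0, preemphasis=0.8, range_db=20.0):
    """Return (times_s, freqs_hz, level_db[freq, time]) for a gray-scale display."""
    x = np.asarray(x, dtype=float)
    # First-order pre-emphasis boosts higher frequencies before analysis.
    y = np.concatenate(([x[0]], x[1:] - preemphasis * x[:-1]))
    n_win = max(8, int(round(fs / bandwidth_hz)))   # assumed window-length rule
    hop = max(1, n_win // 2)                        # assumed 50% overlap
    if len(y) < n_win:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(y) - n_win) // hop
    window = np.hamming(n_win)
    frames = np.stack([y[i * hop:i * hop + n_win] * window for i in range(n_frames)])
    level_db = 10 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12)
    # Limit the dynamic range: anything more than range_db below the peak
    # is drawn at the floor (white).
    level_db = np.maximum(level_db, level_db.max() - range_db)
    freqs = np.fft.rfftfreq(n_win, d=1 / fs)
    times = (np.arange(n_frames) * hop + n_win / 2) / fs
    return times, freqs, level_db.T
```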
Critical Listening Analysis
The authors performed critical listening comparisons between the source, clipped, and restored speech files. The tests were conducted using professional headphones, an external sound card, and an A/B switch that allowed for nearly instantaneous, short-term listening comparisons. This subjective process allowed for the identification of similarities and differences involving overall voice quality, listenability, and added artifacts.
Results
Waveform Cross-Correlation Analysis
The cross-correlation values calculated for each comparison of the native files with their clipped, restored, and reformatted restored versions are shown in Tables 2 through 7. Table 8 shows, for each scenario, the correlation value differences averaged over all four clipping levels (+3, +6, +9, and +12 dB).
Table 2a: Correlation values and differences for the clipped-sample restoration of the 1 kHz sine wave signal, using the two programs
Table 2b: Correlation values and differences for the reformatted clipped-sample restoration of the 1 kHz sine wave signal, using the two programs
Table 3a: Correlation values and differences for the clipped-sample restoration of the multiple sine wave signal, using the two programs
Table 3b: Correlation values and differences for the reformatted clipped-sample restoration of the multiple sine wave signal, using the two programs
Table 4a: Correlation values and differences for the clipped-sample restoration of the first sample sentence, using the two programs
Table 4b: Correlation values and differences for the reformatted clipped-sample restoration of the first sample sentence, using the two programs
Table 5a: Correlation values and differences for the clipped-sample restoration of the second sample sentence, using the two programs
Table 5b: Correlation values and differences for the reformatted clipped-sample restoration of the second sample sentence, using the two programs
Table 6a: Correlation values and differences for the clipped-sample restoration of the third sample sentence, using the two programs
Table 6b: Correlation values and differences for the reformatted clipped-sample restoration of the third sample sentence, using the two programs
Table 7a: Correlation values and differences for the clipped-sample restoration of the fourth sample sentence, using the two programs
Table 7b: Correlation values and differences for the reformatted clipped-sample restoration of the fourth sample sentence, using the two programs
Table 8a: Average correlation value differences for the clipped-sample restorations using program #1
Table 8b: Average correlation value differences for the clipped-sample restorations using program #2
Reviewing Tables 2 through 8 reveals the following characteristics:
- The comparison between the native and clipped files always resulted in a correlation value below 1.0000, reflecting the changes in the signal as a result of the clipping processes.
- The 1 kHz sine wave signal produced the best overall restoration results. The other signals tested mostly produced decreased values, or only very slightly increased values, with both programs.
- The restorations of the native 8 kHz and 11.025 kHz files resulted in overall poorer correlations than did the native 16 kHz files.
- Re-quantization of the native 8-bit files to 16-bit files produced very little to no improvement in the correlation values for all of the samples tested.
- Resampling of the native files to 44.1 kHz files usually provided improved correlation values, except for the 1 kHz sine wave signal.
The files with the greatest negative difference between the cross-correlation values of the native-to-clipped and native-to-restored comparisons were the 11.025 kHz/8-bit and 11.025 kHz/16-bit multiple sine wave signals, clipped to +12 dB and restored using program #1. The cross-correlation differences were –0.4425 and –0.4436, respectively. The waveforms for the native and restored versions of the 11.025 kHz/16-bit multiple sine wave signal are displayed in Figure 3.
Figure 3: Waveform displays of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the 11.025 kHz/16-bit multiple sine wave signal; the restorations were applied after the file was clipped to +12 dB. Each of the waveforms displays the same 15.0-millisecond range on the horizontal axis with the same relative amplitude on the vertical axis.
Restoration of the +12 dB clipped versions of the 11.025 kHz/8-bit and 11.025 kHz/16-bit 1 kHz sine wave signals using program #1 resulted in correlation values of 0.9025 and 0.9032, respectively. However, the waveforms for the restored files contained significant differences when compared with the native file. Figure 4 illustrates the native 11.025 kHz/8-bit sine wave signal and the restored versions of the +12 dB clipped signal using both programs. The cross-correlation value was 0.9714 for the version restored using program #2, compared to 0.9025 for the version restored using program #1.
Figure 4: Waveform displays of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the 11.025 kHz/8-bit 1 kHz sine wave signal; the restorations were applied after the file was clipped to +12 dB. Each of the waveforms displays the same 60.0-millisecond range on the horizontal axis with the same relative amplitude on the vertical axis.
The greatest negative difference for the voice samples restored using program #2 occurred with the 8 kHz/8-bit file for sample sentence #2, clipped to +6 dB. The cross-correlation value dropped from 0.9836 for the clipped-to-native comparison to 0.8873 for the restored-to-native comparison. Figure 5 reflects the “s” sound from the word “gas” in the native and the program #2-restored files.
Figure 5: Waveform displays of the (a) native and (b) program #2-restored versions of the 8 kHz/8-bit sample sentence #2 file; the restoration was applied after the file was clipped to +6 dB. Each of the waveforms displays the same 120.0-millisecond range on the horizontal axis with the same relative amplitude on the vertical axis.
Total Harmonic Distortion Analysis
The results of the THD evaluation of the native clipped, restored, and reformatted restored versions of the 1 kHz sine wave files are contained in Tables 9a and 9b, which revealed the following characteristics:
- Both restoration programs appreciably reduced the THD values compared to the clipped samples.
- Program #1 produced lower (better) THD values than program #2, although program #1 could not process the 8 kHz files.
- Reformatting the clipped files did not appreciably improve the THD values and often increased them.
Table 9a: Total harmonic distortion values for the clipped 1 kHz sine wave files and the restored versions using both programs
Table 9b: Total harmonic distortion values for the clipped 1 kHz sine wave files and the reformatted restored versions using both programs
As examples of the harmonic distortion characteristics introduced by the clipped-sample restoration processes, Figures 6 and 7 display FFT plots of the native, program #1-restored, and program #2-restored versions of the 11.025 kHz/16-bit 1 kHz sine wave signal (clipped to +12 dB) resampled to 44.1 kHz and the 16 kHz/16-bit multiple sine wave signal (clipped to +12 dB) resampled to 44.1 kHz, respectively. THD measurements were not taken of the multiple sine wave signals.
Figure 6: FFT displays of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the 11.025 kHz/16-bit 1 kHz sine wave signal (clipped to +12 dB), resampled to 44.1 kHz. Each of the FFT displays has a frequency range of 0 to 10 kHz.
Figure 7: FFT displays of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the 16 kHz/16-bit multiple sine wave signal (clipped to +12 dB), resampled to 44.1 kHz. Each of the FFT displays has a frequency range of 0 to 10 kHz.
Spectrographic Analysis
The spectrographic analysis of the speech samples revealed the following characteristics:
- The most obvious effect of clipping was the presence of extraneous formants in the vowel sounds, which became more pronounced at higher clipping levels.
- Both restoration programs often boosted the energy in the lower frequencies of fricatives (consonant sounds such as those represented by the English letters “f,” “s,” and “z”), sometimes added transient events, produced amplitude reductions, and smeared the structure of the higher-frequency formants during some vowel sounds.
Figure 8 displays the native, program #1-restored, and program #2-restored spectrograms for the word “murals” in sample sentence #4 at 11.025 kHz/16-bit, clipped to +12 dB. Among the changes are increases in the low-frequency energy of the fricative sound “s” and higher-frequency artifacts of the vowel sounds.
Figure 8: Spectrograms of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the word “murals” from the 11.025 kHz/16-bit sample sentence #4 file. The horizontal axis reflects time over 0.6 second, and the vertical axis reflects frequency from 0 to 5.5 kHz.
Figure 9 displays the native, program #1-restored, and program #2-restored spectrograms for the words “all year” in sample sentence #1, from the 11.025 kHz/8-bit file that was reformatted to 11.025 kHz/16-bit and clipped to +12 dB. Among the changes are added transient sounds and a loss of higher-frequency formant information.
Figure 9: Spectrograms of the (a) native, (b) program #1-restored, and (c) program #2-restored versions of the words “all year” in sample sentence #1, from the 11.025 kHz/8-bit file that was reformatted to 11.025 kHz/16-bit and clipped to +12 dB.
Critical Listening Analysis
The critical listening analysis of the speech samples revealed the following characteristics:
- There was general, overall agreement with the waveform cross-correlation values, but the cross-correlation analysis provided a more quantifiable measure of sample quality.
- The overall dynamic range of the speech samples was better in the restored files compared to the clipped files.
- The restored files often had artifacts, such as transients and added sounds, even when there was no loss in overall quality.
- There were no overall improvements in voice quality in the restored speech samples, even those with positive waveform cross-correlation difference values.
- There was no audible, overall deterioration in the speech samples with negative cross-correlation difference values between 0.0000 and –0.0500; however, noticeable quality losses were present with cross-correlation difference values of –0.0500 or below.
Conclusions
The authors’ overall conclusion regarding the forensic value of the two clipped-sample restoration algorithms, as determined through waveform cross-correlation, total harmonic distortion, spectrographic, and critical listening analyses, is that the two programs produced few improvements in the recording quality and listenability, added noises that were not present in the clipped files, and often degraded the voice and other signals. Although the evaluation of the two representative programs revealed that they probably should not be used with many typical investigative file formats, other available software may perform differently.
As a disclaimer, the authors did not evaluate professional-quality PCM wavefiles (such as at 24-bit quantization and 48 kHz sampling), wavefiles with limited signal-to-noise ratios or inherent linear/nonlinear distortion, or other available clipped-sample restoration software or hardware. Additionally, the authors tested only hard-clipped audio recordings and did not test clipping produced by overdriven microphones, recording-system limitations, or acoustic distortion.
Based on the authors’ evaluations, it is strongly recommended that forensic audio enhancement facilities perform detailed testing of any clipped-sample restoration software that is already installed in the laboratory or is being considered for enhancing digital audio evidence. This testing should incorporate the methodology set forth in this article, or a similar protocol, in an effort to determine the viability of such programs.
Acknowledgments
The authors thank the following individuals who reviewed this paper and provided important technical and grammatical improvements: Steven A. Killion (BEK TEK LLC, Clifton, Virginia); Thomas M. Daniel (Daniel Technology, Inc., Norwalk, Connecticut); Catalin Grigoras (Forensic Science Center, Bucharest, Romania); David J. Hallimore (Forensic Audio/Video Laboratory, Houston Police Department, Houston, Texas); Sim Lai Hua (Singapore Police Force, Singapore); J. Keith McElveen (Wave Sciences Corporation, Hartsville, South Carolina); Peter Mosher (Electronics Section, Digital Evidence Unit, Centre of Forensic Sciences, Ministry of Community Safety and Correctional Services, Toronto, Ontario, Canada); Wayne R. Runion (Treat Mountain Forensic Services, LLC, Tallapoosa, Georgia). The authors also thank Michael Piper (Washington, D.C.) for his guidance with the MATLAB cross-correlation function.
References
Bolt, R. H., Cooper, F. S., Green, D. M., Hamlet, S. L., McKnight, J. G., Pickett, J. M., Tosi, O. I., and Underwood, B. D. On the Theory and Practice of Voice Identification. National Academy of Sciences, Washington, D.C., 1979.
Bracewell, R. N. The Fourier transform, Scientific American (1989) 260:86–95.
Bracewell, R. N. The Fourier Transform and Its Applications. 3rd ed. McGraw-Hill, Boston, 2000.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., and Zue, V. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC 93S1 [CD-ROM], Linguistic Data Consortium (1993).
Kammler, D. W. A First Course in Fourier Analysis. Prentice Hall, Upper Saddle River, New Jersey, 2000.
Koenig, B. E., Lacey, D. S., and Killion, S. A. Forensic enhancement of digital audio recordings, Journal of the Audio Engineering Society (2007) 55(5):352–371.
Levitt, H. and Webster, J. C. Effects of noise and reverberation on speech. In: Handbook of Acoustical Measurements and Noise Control. 3rd ed. C. M. Harris, Ed. McGraw-Hill, Inc., New York, 1991, pp. 16.1–16.4.
MATLAB® R2008b Product Help. The Mathworks, Inc., Natick, Massachusetts, 2008.
Operating Manual and Programming Reference for Model SR780 Network Signal Analyzer, Revision 2.0. Stanford Research Systems, Inc., Sunnyvale, California, 1995.
Owen, M. Practical Signal Processing. Cambridge University Press, Cambridge, United Kingdom, 2007, pp. 53–57.
Pohlmann, K. C. Principles of Digital Audio. 5th ed. McGraw-Hill, New York, 2005, pp. 23–39, 51–54.
Potter, R. K., Kopp, G. A., and Kopp, H. G. Visible Speech. Dover Publications, Inc., New York, 1947 (reprinted 1966).
Tosi, O. Voice Identification: Theory and Legal Applications. University Park Press, Baltimore, 1979.
Wallace, A. Jr. and Koenig, B. E. An introduction to single channel FFT analysis, Crime Laboratory Digest (1989) 16:33–39.