Hewlett-Packard RM500SL Manual

0611 RM500SL User’s Guide Version 2.8 Page 87

19.5 Speech signal analysis

FastFacts 19.5: Speech signal analysis

One of the most widely used measures of a speech signal is the long-term average speech spectrum (LTASS). This is a 1/3-octave spectrum averaged over a portion of the speech material long enough to yield a stable curve. In practice, a 10-second average meets this requirement; for this reason, all RM500SL passages are at least 10 seconds long.
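As an illustration of the averaging step, the sketch below combines short-term 1/3-octave band levels into an LTASS. The manual does not say how the RM500SL performs its average; the power (energy) averaging used here, the list-of-frames input format, and the function names `energy_average_db` and `ltass_db` are assumptions for illustration only.

```python
import math


def energy_average_db(levels_db):
    """Average dB levels on a power basis (not an arithmetic mean of dB values)."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def ltass_db(frame_levels_db):
    """Hypothetical LTASS helper.

    frame_levels_db: one list of 1/3-octave band levels (dB) per short-term
    frame, every frame having the same number of bands (assumed format).
    Returns one long-term average level per band.
    """
    n_bands = len(frame_levels_db[0])
    return [
        energy_average_db([frame[band] for frame in frame_levels_db])
        for band in range(n_bands)
    ]


# Two frames, two bands: the louder frame dominates the first band's average.
print([round(level, 1) for level in ltass_db([[60, 50], [70, 50]])])  # [67.4, 50.0]
```

An arithmetic mean of the dB values in the first band would give 65 dB; averaging on a power basis gives about 67.4 dB because the louder frame carries more energy.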

The dynamic nature of speech is often characterized by the distribution of short-term levels in each 1/3-octave band. These levels are determined by calculating a spectrum for each of a series of short time periods within the passage. Historically, time periods of 120, 125, or 128 ms have been used. The RM500SL uses a 128 ms time period, which yields 100 levels (or samples) in each 1/3-octave band for a 12.8-second passage (12.8 s / 0.128 s = 100).

The level in each band that is exceeded by 1% of the samples (called either the 1st or the 99th percentile) has historically been referred to as the speech peak for that band. The curve for these 1% levels lies approximately 12 dB above the LTASS. The level in each band that is exceeded by 70% of the samples (called either the 70th or the 30th percentile) has historically been called the valley of speech for that band. The curve for these 70% levels lies approximately 18 dB below the LTASS. The region between these two curves, about 30 dB wide, is often called the speech region, speech envelope, or speech “banana”.

The speech envelope, when derived in this way, has significance for both speech detection and speech understanding. Generally, speech will be detectable if the 1% level is at or near threshold. The Speech Intelligibility Index (SII) is maximized when the entire speech envelope (idealized as a 30 dB range) is above the (masked) threshold. This maximum will not be an SII of 100% (or 1) because of loudness distortion factors, but higher SII values will not produce significantly higher scores on most test material. The speech-reception threshold (SRT) is reached when the LTASS is approximately at threshold; the exact relationship depends on the test material and the individual.
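The percentile definitions above can be sketched as follows. The 128 ms period, the 1% and 70% exceedance fractions, and the +12 dB / -18 dB offsets come from the text; the non-overlapping frames, the rank-based percentile rule (no interpolation), and all function names are assumptions for illustration. In particular, `audible_fraction` is a hypothetical, simplified stand-in and not the ANSI S3.5 SII procedure: it treats a single band and ignores band-importance weighting and the distortion factors mentioned above.

```python
FRAME_MS = 128  # RM500SL short-term analysis period (from the text)


def frames_in_passage(passage_ms, frame_ms=FRAME_MS):
    """Whole frames in a passage, assuming non-overlapping frames."""
    return passage_ms // frame_ms


def level_exceeded_by(levels_db, fraction):
    """Level exceeded by `fraction` of the short-term samples in one band.

    Rank-based: sort loudest first and take the sample that has
    round(fraction * n) samples above it (no interpolation).
    fraction=0.01 -> speech peak (99th percentile)
    fraction=0.70 -> speech valley (30th percentile)
    """
    ordered = sorted(levels_db, reverse=True)
    k = round(fraction * len(ordered))
    k = min(max(k, 0), len(ordered) - 1)  # keep the index in range
    return ordered[k]


def band_envelope(levels_db):
    """(peak, valley) for one band from its measured short-term levels."""
    return level_exceeded_by(levels_db, 0.01), level_exceeded_by(levels_db, 0.70)


def idealized_envelope(ltass_band_db):
    """Rule of thumb from the text: peaks ~12 dB above, valleys ~18 dB below."""
    return ltass_band_db + 12, ltass_band_db - 18


def audible_fraction(peak_db, valley_db, threshold_db):
    """Hypothetical simplification: share of the envelope above threshold.

    0.0 when the peak is at or below threshold (at best just detectable),
    1.0 when the whole envelope is above threshold.
    """
    span = peak_db - valley_db
    return min(max((peak_db - threshold_db) / span, 0.0), 1.0)


n = frames_in_passage(12_800)              # 12.8 s passage -> 100 samples
levels = list(range(1, n + 1))             # toy band levels, 1..100 dB
print(n, band_envelope(levels))            # 100 (99, 30)
peak, valley = idealized_envelope(60)      # (72, 42): a 30 dB range
print(audible_fraction(peak, valley, 57))  # 0.5
```

With 100 samples, the peak value (99) is exceeded by exactly one sample and the valley value (30) by exactly 70 samples, matching the 1% and 70% definitions.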