ACOUSTICS OF SPEECH
FINAL EXAM STUDY GUIDE
Consonant Sound Production:
• Obstruents: fricatives, stops and affricates. Manner of sounds produced. Sound source is turbulence and is all about point of constriction.
• Sonorants: most like vowels. Include approximants (semi-vowels) and nasals. Resonance through the entire vocal tract, with little constriction and little turbulence in the oral cavity.
Vowels vs. Consonants:
• Formants tell the story for vowels and diphthongs; stable, gradually changing pattern.
• For consonants, the picture changes due to constriction in the oral region. There is an acoustic discontinuity that interrupts the smooth formant patterns.
3 Aspects of Consonant Production:
• Voicing: source of the energy (can come from fricative noise or stop noise)
o Voicing refers to whether the vocal folds are vibrating. A consonant where the vocal folds don’t vibrate is not voiced, but may have a fricative noise because of constriction (t). If a consonant is voiced, the vocal folds vibrate and it may also have a fricative noise; it has both (d).
• Manner: stop, affricate, fricative. How is it produced.
• Place: where does the constriction occur? (bilabial, alveolar)
• Vertical striations on spectrogram for voicing
• Nasals (manner) tend to be weaker than surrounding vowels
• Formant transitions: indicative of place. Rapid bend or change in formant structure. Exact form is dependent on both the vowel and consonant involved.
Watched a 6-minute video on manner, place and voicing.
Place of articulation:
Bilabial: between the lips
Labiodental: between the lip and teeth
Interdental: tongue between upper and lower teeth
Alveolar: ridge above upper teeth
Palatal: roof (hard palate)
Velar: velum (soft palate)
Glottal: glottis (space between vocal folds)
Manner of articulation:
Stops: constriction and release of air
Fricatives: one articulator approaches but doesn’t touch another; bottleneck of airflow
Affricates: stop plus fricative
Nasal: velum is lowered; air passes through nasal cavity
Liquid: air flows through one or both sides of tongue
Glides: little constriction of air flow
Tap: rapid flick of tongue to some place of articulation
Order: voice, place, manner (ex. voiced alveolar fricative)
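The voice, place, manner naming order can be sketched as a small lookup. This is only an illustration: the `CONSONANTS` table and the `describe` helper are made-up names, and the table covers just a handful of English consonants.

```python
# Illustrative feature table (voicing, place, manner); not a complete inventory.
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "m": ("voiced", "bilabial", "nasal"),
    "n": ("voiced", "alveolar", "nasal"),
}


def describe(symbol: str) -> str:
    """Return the label in the order used in class: voice, place, manner."""
    voicing, place, manner = CONSONANTS[symbol]
    return f"{voicing} {place} {manner}"


print(describe("z"))  # voiced alveolar fricative
```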
• Nasals, glides and liquids (manner)
• Little to no constriction in oral cavity
• Produced with continuous voicing
• Acoustic characteristics are similar to vowels
1. Approximants (semi-vowels)
a. Divided into 2 groups (glides and liquids)
b. Behave as both vowels and consonants
c. Never form nucleus of a syllable, so cannot be classified as a vowel
d. They are all voiced
• /j/ and /w/ (/j/ sounds like y in yes).
• Gliding onto other sounds
• Never occur in the final position of the word (spelling vs. sound)
• Always prevocalic (before the vowel)
• /j/ similar to /i/ has a high F2
• /w/ similar to /u/ low F2
• Spectrogram: “say Wes, say yes” (slide 13)
o Darker lines are the vowels and glides
o “s” are lighter and higher in frequency
o Glide and vowels are really close together on spectrogram
o /w/ low F2 and /j/ high F2
• /r/ and /l/
• /r/ can release a vowel (ran, wright) or can terminate a vowel (for, cure)
o One of the later developing sounds and frequently produced in error by kids
• /l/ can be classified as a liquid or lateral (both manner)
o Kids also have a lot of errors with /l/
• Are harder to produce; more distinct articulation than glides (more placement distinction)
• Rapid articulator movements
• Distinguished from each other (place) by F3
o Anti-formants are the same as anti-resonance: frequency regions where amplitude of source is severely attenuated (goes down). See slide 16.
How is /l/ Produced?
• Side branch on top of the tongue (lateral):
o Another area sound is coming out (sound is resonating on top of tongue)
o Anti-formants
o Most energy in lower frequencies of sound
• Relatively high F2 and F3
o Appear evenly spaced on spectrogram
• Steady state prior to formant transitions: there isn’t a lot of movement
Comparing /l/ To Other Sounds:
• /l/ vs. /t/: place is the same, but manner and voicing are different.
• /l/ vs. /n/: place is about the same and both are voiced, but manner is different.
• What sounds do kids typically substitute for /l/?
o /w/ or /j/
• What about end of word?
• Retroflex: tongue tip is up, tongue tense, and back or… tongue is bunched with the middle portion raised
• Narrowing at palatal region makes F3 low (place= palatal)
o F3 is lower for this sound than for any other sound, and it sits close to F2.
• No anti-formants
• Rake: silence period before the K on the spectrogram
• Retroflex vs. bunched
o Example on slide 22
• Manner, place, and voiced
o Manner: liquid
o Place: palatal
o Voiced: yes
• Common mistakes
o /r/ in initial position -> /w/
o /r/ in final position -> [o]
• Acoustic measures -> feedback on the difference between accurate and inaccurate /r/ productions
• Baseline for /r/ F3, then post-treatment production of /r/ to show the difference
• Only 3 of them
o /m/, /n/, /ŋ/
• Defining a nasal
o Nasal airflow
o Place of obstruction in the oral cavity
• Manner: nasals
• Voicing: all voiced
• Place: /m/ is bilabial. /n/ is alveolar, and /ŋ/ is velar
• Changes in oral cavity size changes resonance in the vocal tract
• Velum is lowered for nasals, but up for all other consonants
Acoustics of Nasal sounds:
• Sound resonates in the pharyngeal cavity, the dead end of the closed oral cavity, and the spacious chambers of the nasal cavities
• Addition of nasal branches to vocal tract creates larger, longer resonator- frequency is lower. (Helmholtz resonator).
• Pipe model: red bars are “closed-ends.” Nasal cavity is open end. Slide 33.
• Decrease in amplitude
• Compared to vowels, nasals are more damped
• Soft vocal tract walls absorb acoustic energy
• Nasals look lighter on spectrograms. Darkness refers to amplitude and nasals are less intense, so they are lighter.
• All sounds are damped, but nasals are damped more because there is more surface area and tissue to absorb the sound.
• Light gray on slide 41
• Negative resonance when velum is lowered
• Decrease in intensity of nasal and vowel formants
• Lowest frequency for /m/ (long oral tract side branch)
o It has the largest amount of space
• Intermediate for /n/
o Some space
• Highest in /ŋ/
o Because short oral tract side branch
• White spaces are anti-formants slide 41
• Formants produced by both the nasal and oral cavity
• Side branch result in anti-formants (negative resonance)
• Anti-formants: arise from changes in the oral cavity when it becomes a side branch of the vocal tract; decrease in intensity.
• Caused because of acoustic energy that radiates outward from the nasal cavity during production of nasals.
o As a result, we get nasal formant (first formant)
• Characterized by series of formants, the first of which is the nasal formant (easily seen between 250-300 Hz). Relatively low frequency.
• Slide 41, nasal formant is the dark line at bottom of nasals in spectrogram
• Hypernasality: too much nasal resonance (for non-nasal sounds)
o Flaccid dysarthria: decreased innervation of velum, so they can’t raise the velum up enough
o Cleft palate
• Hyponasality: too little nasal resonance for nasals
o Degenerative disorders of the nervous system (muscle weakness or poor timing)
• Damping: sound is not as loud; all speech sounds are damped to some extent because they are traveling through tissue and cavities
• Anti-formants: show up as white spaces on spectrograms
• Don’t need to know 3rd bullet about formant bandwidth, we won’t talk about it
• Can vowels become nasalized? Yes!
o Context: Co-articulation, assimilation
o Languages and dialects may have more nasalized vowels
Slide 44: 1st one “ban” (nasal formant), 2nd one “bash” (high frequency), 3rd “bat” (stop gap).
• Where and how the air is obstructed
• 3 types
▪ voiced and voiceless
▪ combo of stop and fricative
• high in frequency (pitch)
• Narrow constriction: turbulence at site makes it a fricative
• shorter in duration than vowels
• random, continuous noise pattern in higher frequency regions
• Know the IPA symbols for the nine fricatives
• Place of articulation
o Lingua-dental: tongue between teeth
o Labiodental: lower lip against upper teeth
o Alveolar: tongue is touching alveolar ridge (behind teeth)
o Palatal: tongue touches hard palate
o Glottal: sound is constricted in the glottis
9 Fricatives Broken Down:
• Sibilants: /s/, /z/, /ʃ/, /ʒ/
• Non-sibilants: /f/, /v/, “th” voiced /ð/ and unvoiced /θ/
• Aspirate/glottal fricative /h/
Acoustic Properties of Fricatives:
• Produced with narrow constriction of the vocal tract
• Voiceless fricatives
o Aperiodic source (sound doesn’t come from vibrated vocal folds)
▪ Turbulent airflow
• Passing through a narrow constriction at a high rate
• Air pressure fluctuating randomly “noise”
• For speech-concentrated at certain frequency ranges
▪ The filter
• Most important cavity for resonance is the one anterior to the constriction (ex. /s/ is a higher frequency than /ʃ/ because /s/ is closer to the front so the area anterior to the sound is smaller).
• If small- high frequencies are amplified, if large- low frequencies amplified (all relative)
• Voiced fricatives
o Aperiodic and periodic source
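A rough way to see why a smaller front cavity means higher frequencies: treat the cavity in front of the constriction as a tube closed at one end, whose lowest resonance is c / (4 × length). This is a minimal sketch, not something from the slides. The quarter-wavelength tube model, the 35,000 cm/s speed of sound, and the cavity lengths are all assumptions picked for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000  # assumed approximate value for warm, moist air


def front_cavity_resonance_hz(length_cm: float) -> float:
    """Lowest resonance of a tube closed at one end: f = c / (4 * L)."""
    if length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)


# Hypothetical front-cavity lengths, chosen only to show the trend.
for label, length_cm in [("/s/ (alveolar)", 2.0), ("/ʃ/ (palatal)", 4.0)]:
    print(label, front_cavity_resonance_hz(length_cm), "Hz")
```

With these made-up lengths, the shorter /s/ cavity resonates higher than the longer /ʃ/ cavity, which is the same direction as the notes: smaller cavity = higher frequency.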
Slide 12 shows 5 different spectrograms with different voiceless consonants. The /f/ and /th/ are higher in frequency because they are more fronted in the mouth (smaller oral cavity = higher frequency), the /s/ and /sh/ are darker meaning they have a higher amplitude.
• Most intense of all fricatives
• Strident: all sibilants are strident sounds, but not all strident sounds are sibilants.
o Strident sounds have noise of relatively high intensity
• Stridency deletion: either the omission or substitution of another sound for a strident
o Ear infections can cause children to omit stridents: with high-frequency hearing loss, they can’t hear stridents in other people’s speech.
o Caused by narrow constriction at hard palate or alveolar ridge
o Air also hits upper and lower teeth
o If constriction is inappropriate, or if front teeth are missing, the acoustics of the turbulence will be affected.
• The four sibilants /s/, /z/, /ʃ/, /ʒ/
o Perceived as being louder than non-sibilants (most intense sounds)
o Alveolar fricatives /s/ and /z/
▪ Cavity in front of constriction is much smaller
▪ Frequencies above 4,000 Hz emphasized (remember)
o Palatal fricatives /ʃ/, /ʒ/
▪ Large cavity in front of constriction
▪ Lips are usually rounded (lengthens vocal tract = lower frequency)
▪ Frequencies above 2,000 Hz emphasized (larger cavity = lower frequency)
/s/ vs. /sh/:
• /sh/ or /ʃ/ has a greater intensity
• /ʃ/ has palatal placement, /s/ has alveolar placement
• /s/ and /ʃ/ have comparatively large acoustic energy and so produce darker patterns than /f/ and /θ/
• Primary spectral energy lower for /ʃ/ than /s/; noise in /s/ has the highest frequencies
Voiced vs. Voiceless Fricatives:
• Voiceless fricatives will appear darker on spectrograms than voiced fricatives due to greater intensity (greater acoustic energy)
• Spectrograms of voiced fricatives may have vertical striations throughout the period of noise.
• Vowels preceding voiced consonants are longer than those preceding voiceless consonants. (important!)
Clinical Applications For /s/:
• Spectrograms can be used to give visual feedback for lisping
o /s/ has a higher overall amplitude than “th” (non-strident fricative)
o /s/ has more amplitude at higher frequencies than /ʃ/
Non-sibilants or Non-stridents:
• Least intense fricatives
• /f/ (voiceless), /v/ (voiced), /θ/ (voiceless), /ð/ (voiced)
o lower amplitude of vibration (quieter)
o noise source at point of constriction (labiodental (/f/, /v/), lingua-dental (/θ/, /ð/))
• Cavity anterior to constriction is very small (basically non-existent)
o Very high frequency resonances
o No real impact on spectrum of sound
• Voiceless “th” /θ/ is the least intense of all the English phonemes (Small text said /h/ and /θ/ are about the same intensity)
Resonance Frequencies of non-sibilants:
• Primarily determined by size of cavity in front of constriction
• Spectrogram on slide 30: /f/ and /θ/ have high frequencies compared to the other consonants.
• Spectrogram on slide 31: /v/ and /ð/ not seeing much sound in the higher frequencies.
Glottal Fricative /h/:
• Voiceless: have constriction, but no vocal fold vibration
• Vocal folds are approximated to produce turbulence
• Largest cavity in front of constriction- lowest frequency spectrum
• Spectrogram: no fundamental frequency. Most energy at lower frequencies.
• Main concept: point of constriction and how it affects resonant frequency. How big the cavity is anterior to the point of constriction determines resonant frequency.
• Difficulty perceiving fricatives because of high frequency spectra
• Non-sibilant fricatives lack intensity
• Difficulty discerning between minimal pairs like “thin/fin” and “elf/else”
o Can use mouth position as visual cues to discern which one someone is saying.
• What defines fricatives?
o Point of constriction where turbulence is happening.
• Stridents vs. non-stridents
o Stridents higher frequency more intense
o Which consonants fit under each type
• Intensity of fricatives
• Significance of point of constriction; resonating cavity
• Hearing impairment in high frequencies
• Bilabial stops
o /b/ and /p/
• Lingua-alveolar stops
o /d/ and /t/
• Velar stops
o /g/ and /k/
• Voiced consonants on the left of each list
• Voiced and voiceless cognate pairs (differ by one aspect: voicing)
• Least intense of all sound classes
Stop Consonants (Plosives):
• Plosive is another name for stop: burst of sound
• Aspiration: “puff of air” for /p/, /t/, /k/
o Occurs after the release of the stop
o Usually in the prevocalic position
o Voiceless stop phonemes only
o Looks like high-frequency noise on spectrograms
• Stop gap: precedes the release of a stop sound. It is a silent period that indicates the build-up of intra-oral pressure. Pressure in the mouth before release of stop consonant. Present in all stop consonants.
• Voiceless plosives are perceived to be louder than voiced plosives because of this acoustic energy of the aspiration.
3 Stages of Stops:
• Shutting: movement of articulator toward a stop closure
o Formant transition that goes into the stop
• Closure: articulators coming together to point of constriction
o Stop gap (before you release stop)
• Release: release of point of constriction
o Aspiration/noise burst (opens glottis allows breath stream to flow)
• Silent period in the closure phase in production in stop sound
o How long the vocal tract is closed before sound is released
o Can measure on spectrogram or waveform
• Low frequency energy band during the stop gap of voiced stops, reflecting vibration of vocal folds
• A dark bar shown at the low frequencies, usually below 200 Hz
• Among the stops, only the voiced plosives /b, d, g/ show it. It is a primary indicator of voicing in the spectrogram: all kinds of voiced sounds, including vowels, show this voicing bar at such low frequencies.
• /k/ and /g/ (velar sounds)
• place of articulation cue; closeness of F2 and F3 during velar stop production.
• occurs in both consonant-vowel and vowel-consonant productions at around 2,000 Hz. Doesn’t happen when consonant is in isolation; only happens when paired with a vowel (ga).
• Implications for hearing impaired individuals with losses over 1,000 Hz: they’re not going to be able to perceive the velar pinch. Won’t be able to perceive place of articulation (velar), so they might perceive it as an alveolar sound; could misperceive /k/ for /t/.
Voice Onset Time (VOT):
• Time differential between the release of the stop burst and the onset of the voicing of the vowel.
o In “pay”, /p/ isn’t voiced, so the VOT is the time from the stop release to the start of voicing in the vowel. In “bay”, /b/ is voiced, so voicing starts right around the release, or even before it.
• Salient cue in differentiating voiced from voiceless stop consonants in the initial syllable position.
• For example, VOT for /p/ in “pay” is 86 msec and /b/ in “bay” is 10 msec.
• Therefore, a shorter VOT would indicate a stop consonant was voiced. Can have a negative VOT. (important!)
• Voicing voiceless consonants and not voicing voiced consonants
o People with dysarthria
o Kids with speech disorders
• Voiced stops have a voice onset time noticeably less than zero, a negative VOT, meaning the vocal cords start vibrating before the stop is released
• Voiceless unaspirated stops have a voice onset time at or near zero, meaning that the voicing of the following sonorant (such as a vowel) begins at or near to when the stop is released
• Voiceless aspirated stops have a voice onset time greater than unaspirated stops, called a positive VOT.
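The VOT definition and the three categories above can be sketched in a few lines. The function names are made up for illustration, and the 30 ms cutoff between short-lag and long-lag stops is an assumed boundary for English-like stops, not a fixed constant.

```python
def voice_onset_time_ms(burst_release_ms: float, voicing_onset_ms: float) -> float:
    """VOT = time of voicing onset minus time of the stop release burst."""
    return voicing_onset_ms - burst_release_ms


def vot_category(vot_ms: float, boundary_ms: float = 30.0) -> str:
    """Sort a VOT value into the three categories from the notes.

    boundary_ms is an assumed short-lag/long-lag cutoff.
    """
    if vot_ms < 0:
        return "voicing lead"  # negative VOT: voicing starts before the release
    if vot_ms < boundary_ms:
        return "short lag"     # voicing starts at or near the release
    return "long lag"          # voicing is delayed; aspirated


# Class examples: "pay" = 86 msec, "bay" = 10 msec.
print("pay:", vot_category(86.0))
print("bay:", vot_category(10.0))
```

The shorter VOT in “bay” falls on the voiced side of the boundary, and the longer VOT in “pay” falls on the voiceless aspirated side.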
Reviewed velar pinch on slide 14 and reviewed the acoustic features on slide 12.
• Slide 14: velar pinch refers to when F2 and F3 pinch together before the vowel.
• Slide 12: Voice bar for /b/ and not /p/. The duration of /p/ is longer than /b/. Stop gap is during the closure phase; illustrated by white area. Can really see stop gap in the prevocalic position (bottom 2).
• Slide 16, can see aspiration and the stop gap (white space in the middle).
• Slide 17, can see the voice bar during the stop gap (white area).
• Only see aspiration in voiceless stops /p/, /t/, and /k/.
Voice Onset Time:
• Time differential between the release of the stop burst and the onset of the voicing in the vowel.
• Help differentiate between voiced and voiceless stop consonants in the initial syllable position
o Patients with hearing impairment may not be able to distinguish the difference
o Non-native speakers may have trouble distinguishing
• For example, /p/ in “pay” 86 msec, and /b/ in “bay” is 10 msec
• Shorter VOT would indicate a stop consonant was voiced.
o Slide 19, the top one has /b/ because it has a shorter VOT
• Voiced (unaspirated) stops: have a voice onset time noticeably less than zero, a negative VOT, meaning the vocal folds start vibrating before the stop is released.
• Voiceless unaspirated stops (not released): have a voice onset time at or near zero, meaning that voicing of the following vowel begins at or near to when the stop is released.
• Voiceless aspirated stops: have a voice onset time greater than unaspirated stops, called a positive VOT.
• Slide 21: squiggly line is representing voicing. Plosive = stop. Use terms: shutting, closure, and release NOT closure, blockage, and release (like slide 21).
Formant Transitions in Stops:
• F1 transitions are always rising
o F1 in stops is caused by the constricted vocal tract, which is much more constricted than in vowels
o High tongue position lowers F1 value
• F2 and F3 transitions signal place of articulation
o Bilabial: rising to the vowel
o Alveolar: lowering (except for front vowels)
o Velar: lowering to the vowel
• Slide 24: left side to right side, the stop consonant transitions into vowels going from front vowel to back vowel. The hook/tail at the end of the formants is the formant transition.
o White space on spectrogram
• Stop release
o Vertical line on spectrogram
o Noise burst on spectrogram (voiceless stops only)
• Formant transitions
o Going into the vowel
o Think about the burst of energy and the frequencies of the vowel that follow
Slide 27: darker intensity in F1 of /b/ than /p/. Aspiration at /p/, which identifies it as a voiceless consonant. They are stops because there is a stop release.
Post-vocalic Stop (on slide 27):
• Closure (stop gap and voice bar)
o After vowel
• Formant transitions into stop
o Go from vowel to stop
• Release burst (or unreleased consonant)
o “hop” is released, there is aspiration at the end.
o Slide 29: second one is released—there is aspiration at the end.
• Affricate means “blend”. In this case, the blend is that of a stop and a fricative to produce: tʃ (church) and dʒ (judge).
o One is voiced (dʒ) and one is voiceless (tʃ)
• Affricates are obstruents
• A stop that is released into a fricative: combo
• Distinguished acoustically from fricatives by shorter duration and faster amplitude rise time
o Amount of time it takes to reach maximum amplitude
• Slide 32: affricate duration and rise time are shorter than fricatives.
• Affricates have aspiration, stops may or may not.
• Acoustic features of stops: phases, VOT, stop gap, velar pinch, aspiration, voice bar, formant transition
• F1 and F2 characteristics for stops
• Affricates: definition and spectrograms
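Rise time (the amount of time it takes to reach maximum amplitude) can be measured from an amplitude envelope. The sketch below assumes the envelope starts at the onset of the sound. The `rise_time_ms` helper and both envelopes are invented for illustration, not taken from slide 32.

```python
from typing import Sequence


def rise_time_ms(envelope: Sequence[float], sample_rate_hz: float) -> float:
    """Time from the first sample to the first envelope peak, in milliseconds."""
    if not envelope:
        raise ValueError("envelope must not be empty")
    # max() with a key returns the first index holding the maximum value.
    peak_index = max(range(len(envelope)), key=lambda i: envelope[i])
    return 1000.0 * peak_index / sample_rate_hz


# Hypothetical envelopes sampled at 1,000 Hz (one value per millisecond).
affricate_like = [0.0, 0.5, 1.0, 0.9, 0.8]
fricative_like = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.9]

print(rise_time_ms(affricate_like, 1000.0))
print(rise_time_ms(fricative_like, 1000.0))
```

The affricate-like envelope reaches its peak sooner, which is the faster rise time the notes describe.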
• Important for intelligibility
• Syllable is the level, or unit, of stress, NOT the word
• Word stress is used to distinguish among noun/verb word pairs (lexical stress: stress that occurs within the word)
• Phrase or sentence stress (the “pointer” indicating the most important or new information). Ex. It was MY daughter who won the race.
How Do We Add Stress?
• More effort
• Higher fundamental frequency (greatest effect)
o A higher fundamental frequency means that the harmonics in the spectrogram will be spaced farther apart; the formant positions depend on vocal tract shape.
• Greater duration
• Greater intensity
• Spectrogram example (slide 40)
o Record myself vs. play a record.
o The yellow line shows intensity
o The verb record (left) is stressed and has a longer duration
• Can convey information about
o Differences in meaning (ex. Statements have falling intonation, questions have rising intonation)
o Literal vs. sarcastic
• Similar to stress
How Do We Express Intonation?
• Rise-fall intonation curves (universal feature of language)
o Question that’s not a yes/no question
o Special emphasis
• End-of-utterance pitch rise (Ex. Is your name Beth?)
o Yes/no questions
o Incomplete sentence
• Clinical application: Some populations, like children with autism spectrum disorder, have or may have difficulty with the production and comprehension of suprasegmentals, which can lead to difficulty understanding the differences between questions and statements.
• Certain sounds are longer than others (diphthongs and tense vowels are longer than lax vowels, and continuant consonants are longer than stops). This is a hard-and-fast fact about sound production.
• Vowels are longer before voiced consonants (leave vs leaf)
o “ea” is a longer vowel because “v” is voiced vs. “ea” in leaf because “f” is voiceless (important).
• Vowels are longer before continuants, compared to stops (leave vs. leap)
o Consonants that follow the vowel are continuants.
• We lengthen syllables at the end of sentences
o Ex. “Yesterday was really hot” is longer than “It was really hot yesterday.”
• Faster rate, articulators undershoot their targets
o An undershot target leads to decrease in speech intelligibility
o Kids who try to speak at a fast rate will sound mumbled and are undershooting their targets.
• Pauses: mark syntactic boundaries, increase listener’s sense of anticipation, indicate hesitations for utterance planning or word retrieval.
• Relates to syllable affiliations (which sounds belong to which syllables)
o Ex. Peace talks vs. pea stalks
• Indicated by varying the degree of aspiration of a voiceless stop
o Degree of aspiration changes: more aspiration in peace talks (the /t/ starts the syllable) than in pea stalks (the /t/ follows /s/).
• Indicated by varying the duration of consonant closures
Review of Prosody:
• Areas of prosody/types
• How we achieve each
• Ex. What is syllable affiliation and how do we indicate it?
• Auditory ability: shapes speech perception
o Compensation for coarticulation
• Phonetic knowledge: shapes speech perception
o Categorical perception: stimulus continuum from /da/ to /ga/
o Phonetic coherence: duplex perception, McGurk effect
• Linguistic knowledge: shapes speech perception
o Slips of the ear
o Ganong effect
o Phonemic restoration
• What is speech perception?
o Human ability to seek and recognize patterns. Infants pay attention and learn something about voice and speech prior to birth
o Infant speech perception videos
▪ Head turn response. Watched a 5-minute video.
• Young babies discriminate sounds from any of the world’s languages that adults have difficulty hearing.
• 6-8 month olds are able to discriminate between /ba/ and the two different types of da (English and Hindi). 10-12 month olds can tell the difference between /ba/ and /da/, but not /da/ and a similar Hindi sound.
▪ 3 procedures. Watched a 12-minute video.
• High amplitude sucking procedure
o Baby sucks on the pacifier until hearing a sound that is
different, then stops sucking to listen. When the sound is
habituated, the baby starts sucking again.
o By 5-6 months, babies learn the particulars of their
language and lose the ability to recognize similar sounds
in other languages.
• Head-turn preference behavior
o Records amount of time babies attend to a stimulus.
o When a light comes on and a sound is played, the baby
looks over and stops looking when they are bored.
o As old as 24 months, infants are sensitive to more
complex speech components that they don’t yet
• Preferential looking procedure
o Looking to see if infants can understand the association
between a word and object.
Normal Phonological Development:
• Categorical perception: we perceive speech sounds according to phonemic categories of native language
• Discrimination of non-native sounds: infants up to 6-8 months can discriminate among non-native sounds that are similar.
• Perceptual consistency: for vowels and consonants, children from 5.5-10 months have the ability to identify the same sound across different speakers, pitches, etc.
• Longitudinal study of children at 6 months and then at 13, 16 and 24 months showed that early perceptual abilities appear to be related to later language development
o Therefore, phonetic perception may play an important role in language acquisition
Inability to perceive subtle differences between sounds was thought to be the cause of speech sound disorders.
Theories of Speech Perception:
• Motor theories: based on the connection between speech perception and speech production. Ability to hear and produce speech sounds.
• Auditory theories: based on view that speech perception is primarily auditory and emphasized the sensory, filtering mechanisms of listeners. Stronger focus on auditory piece, don’t generally talk about production.
• Wernicke implicated the left temporoparietal area as key to speech recognition and linguistic expression (Wernicke’s area).
o Patients with damage in the area speak fluently, but conversation doesn’t make sense; have difficulty recognizing the meaning of words.
• Right ear advantage: normal subjects make fewer sound identification mistakes in the right ear (left side of brain) when sounds are simultaneously presented in both ears.
• 2 examples of the auditory system constraining speech perception. We identify patterns based on:
o Voice onset time
▪ Measure of the delay in voicing onset following a stop release burst
▪ We distinguish between aspirated and unaspirated stops with a 30 ms voice onset time boundary. If it gets below 30 ms, we can’t distinguish.
▪ Ex. /pa/ = 86 ms VOT, /ba/ = 10 ms VOT
o Compensation for co-articulation
▪ Our perception of place depends on the preceding phonetic context.
▪ Subjects were presented sounds on a stimulus continuum from [da] to [ga] in 5 steps. The study moved the sound progressively from [da] to [ga] and
found that we make our judgment (which one we hear) based on what
Phonetic knowledge shapes speech perception:
• Categorical perception: we tend to perceive speech categorically rather than continuously. We’re going to decide which sound we hear, we’re not going to say it’s between two.
o When people listen to sounds on a stimulus continuum, their response is categorical—people usually call first 3 [da] and last 2 [ga].
o Discrimination ability is predictable from the labels we use to identify members of the continuum
o There are both identification tasks and discrimination tasks
▪ Identification: listen and identify what the sound is
▪ Discrimination: which one was the /r/ sound?
▪ Watched a short video on categorical perception.
• Phonetic coherence: we can experience phonetic coherence with acoustic components that should be incoherent (duplex perception and McGurk effect)
o Duplex perception: F3 is integral in determining if we hear [da] or [ga]
o Base signal presented to right ear ([da] or [ga] sound missing F3)
o “Chirp” noises presented to left ear (80 ms glide with typical F3)
o identification of the syllable as [da] or [ga] is determined by the “chirp”; the “chirp” influences phonetic perception of the base.
o McGurk Effect
▪ Perceptual illusion that only goes away when you close your eyes; shows that we combine info from our ears and eyes to judge what is being said.
▪ Watched a 3-minute video on this effect.
• If we hear someone saying /ba/ but their lips look like they’re
producing an /f/, we think they are saying /fa/.
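Categorical perception of the 5-step [da] to [ga] continuum can be sketched as a labeling function with one sharp boundary. The boundary between steps 3 and 4 follows the "first 3 [da], last 2 [ga]" pattern from the notes. The function names are made up, and the rule that only pairs with different labels are discriminated is a simplification.

```python
def label_step(step: int, boundary: float = 3.5) -> str:
    """Label a continuum step (1-5) as 'da' or 'ga' with a sharp boundary."""
    return "da" if step < boundary else "ga"


def predicted_discriminable(step_a: int, step_b: int) -> bool:
    """Simplified prediction: listeners hear a difference only across labels."""
    return label_step(step_a) != label_step(step_b)


labels = [label_step(step) for step in range(1, 6)]
print(labels)                         # ['da', 'da', 'da', 'ga', 'ga']
print(predicted_discriminable(1, 2))  # False: both labeled 'da'
print(predicted_discriminable(3, 4))  # True: the pair straddles the boundary
```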
Linguistic Knowledge Shapes Speech Perception:
• Slips of the ear
o Errors of misperception in listening, like misheard lyrics
o Occur when the listener mistakes a word or phrase for a similar-sounding word or phrase in conversation (ex. Coke and a Danish vs. coconut Danish)
o When listeners misplace word boundaries, they tend to insert them between a weak syllable and a strong syllable (ex. Acute back pain vs. a cute back pain).
• Ganong effect
o Phoneme category boundaries can shift depending on lexical expectations
o If you play listeners a series of stimuli with a word at one end of the continuum and a non-word at the other, people want to hear the word. Even if you play 5 non-words and 1 word, listeners will say there were more real words.
o You get more real word responses; words are like perceptual magnets
o The lexical effect is even stronger when the sound to be identified is at the end of the word.
▪ Ex. dad vs. zad. Zad is not a real word, so people will say they heard the word “dad” more often, but the difference is at the beginning of the
word, so people will correctly identify them more than if the words were baf and bad. Baf isn’t a word, but the difference is at the end
• Phonemic restoration
o Listeners don’t notice that the [s] is missing in the noise-replaced version of “legislation”
o The brain uses lexical knowledge to fill in the missing phoneme
▪ People hear the same word even if there is a phoneme missing.
Watched a video on recent research in perception and intelligibility.
• How can this type of study help us clinically as SLPs?
o Speech intelligibility has a lot to do with the listener. Consider what the listener is perceiving. There’s gray area between children with dysarthria and typically developing kids. Monitor group of kids in gray area to see if there is something going on with their speech that would warrant speech therapy.
• What did she find out about within-listener variability?
o There is more variability within listeners than between listeners. The listeners weren’t consistent from one trial to another.
• How did she measure speech intelligibility?
o 5 listeners listened to each of the kids with dysarthria and the typically developing kids and wrote down what they thought the kids said.
• Kids with intelligibility between 75-85% were in which group?
o Gray area: some dysarthric kids and some typically developing kids.
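A transcription-based intelligibility score like the one described above could be computed roughly as follows. This is a sketch under stated assumptions, not the researcher's actual scoring method: words are matched position by position, the sentence and the five transcripts are invented, and the function names are hypothetical.

```python
def percent_words_correct(target: str, transcript: str) -> float:
    """Percent of target words the listener wrote down in the same position."""
    target_words = target.lower().split()
    heard_words = transcript.lower().split()
    if not target_words:
        raise ValueError("target sentence must contain at least one word")
    matches = sum(1 for t, h in zip(target_words, heard_words) if t == h)
    return 100.0 * matches / len(target_words)


def mean_intelligibility(target: str, transcripts: list) -> float:
    """Average the per-listener scores for one speaker."""
    scores = [percent_words_correct(target, t) for t in transcripts]
    return sum(scores) / len(scores)


def in_gray_area(score: float) -> bool:
    """Gray area from the notes: intelligibility between 75% and 85%."""
    return 75.0 <= score <= 85.0


# Invented example: one target sentence and five listeners' transcripts.
target = "the boy ran home"
transcripts = [
    "the boy ran home",
    "the boy ran on",
    "the toy ran home",
    "the boy ran home",
    "a boy ran home",
]
score = mean_intelligibility(target, transcripts)
print(score, in_gray_area(score))
```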
Review of Speech Perception:
• Speech perception is shaped by:
o General properties of the auditory system (VOT and compensation for co-articulation)
o Our phonetic knowledge (categorical perception and phonetic coherence)
o Our lexical knowledge (slips of the ear, Ganong, phonemic restoration)
o Why do we care?
• How might acoustics be applied clinically?
• Watched a video of a young adult with a high-pitched voice that was strained. After therapy, his voice was much deeper and sounded less strained.
• Acoustic measures: could generate spectrograms on Praat to show his production and get a baseline/show progress.