What is the difference between vowels and consonants?


School: University of Oregon
Department: Communication Science and Disorders
Course: Acoustics of Speech
Professor: Jill Potratz
Term: Winter 2017
Tags: CDS443, Acoustics, speech pathology, and general and speech acoustics
Cost: 50
Name: CDS 443 Final Exam Study Guide
Description: This study guide includes all of the material lectured on after Exam 2.
Uploaded: 03/19/2017



CDS 443


Consonant Sound Production: 

• Obstruents: fricatives, stops and affricates (grouped by manner of production). The sound source is turbulence, and it is all about the point of constriction.

• Sonorants: are most like vowels. Include approximants (semi-vowels) and nasals. Resonance through the entire vocal tract, with little constriction and little to no turbulence in the oral cavity.

Vowels vs. Consonants: 

• Formants tell the story for vowels and diphthongs: a stable, gradually changing pattern.

• For consonants, the picture changes due to constriction in the oral region. There are acoustic discontinuities that interrupt the smooth formant patterns.


3 Aspects of Consonant Production: 

• Voicing: source of the energy (can come from fricative noise or stop noise)

o Voicing refers to whether the vocal folds are vibrating. A consonant where the vocal folds don’t vibrate is not voiced, but may have fricative noise because of constriction (t). If a consonant is voiced, the vocal folds vibrate and it may also have fricative noise; it has both (d).

• Manner: how the sound is produced (stop, affricate, fricative).

• Place: where does the constriction occur? (bilabial, alveolar)

Acoustic Cues: 

• Vertical striations on spectrogram for voicing

• Nasals (manner) tend to be weaker than surrounding vowels


• Formant transitions: indicative of place. Rapid bend or change in formant structure. Exact form is dependent on both the vowel and consonant involved.

Watched a 6-minute video on manner, place and voicing.

Place of articulation:

Bilabial: between the lips

Labiodental: between the lip and teeth

Interdental: tongue between the upper and lower teeth

Alveolar: ridge above upper teeth

Palatal: roof (hard palate)

Velar: velum (soft palate)

Glottal: glottis (space between vocal folds)

Manner of articulation:

Stops: constriction and release of air

Fricatives: approaches but doesn’t hit an articulator; bottleneck of airflow

Affricates: stop plus fricative

Nasal: velum is lowered; air passes through nasal cavity

Liquid: air flows through one or both sides of tongue  

Glides: little constriction of air flow

Tap: rapid flick of tongue to some place of articulation

Order: voice, place, manner (ex. voiced alveolar fricative)
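The voice, place, manner ordering can be sketched as a small lookup table. This is only an illustration: the table layout and the `describe` helper are assumptions made for the example, and the feature values are limited to consonants whose place and manner are stated in these notes.

```python
# Feature table: symbol -> (voicing, place, manner).
# The layout is an assumption for illustration; the values follow the notes.
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "m": ("voiced", "bilabial", "nasal"),
    "n": ("voiced", "alveolar", "nasal"),
}


def describe(symbol: str) -> str:
    """Return the label in voice, place, manner order."""
    voicing, place, manner = CONSONANTS[symbol]
    return f"{voicing} {place} {manner}"


print(describe("z"))  # voiced alveolar fricative
print(describe("p"))  # voiceless bilabial stop
```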


Sonorants:

• Nasals, glides and liquids (manner)

• Little to no constriction in oral cavity

• Produced with continuous voicing

• Acoustic characteristics are similar to vowels  

1. Approximants (semi-vowels)

a. Divided into 2 groups (glides and liquids)

b. Behave as both vowels and consonants

c. Never form nucleus of a syllable, so cannot be classified as a vowel

d. They are all voiced

2. Nasals


Glides:

• /j/ and /w/ (/j/ sounds like y in yes).

• Gliding onto other sounds

• Never occur in the final position of the word (spelling vs. sound)

• Always prevocalic (before the vowel)

• /j/ is similar to /i/ and has a high F2

• /w/ is similar to /u/ and has a low F2

• Spectrogram: “say Wes, say yes” (slide 13)

o Darker lines are the vowels and glides

o “s” are lighter and higher in frequency

o Glide and vowels are really close together on spectrogram

o /w/ low F2 and /j/ high F2


Liquids:

• /r/ and /l/

• /r/ can release a vowel (ran, wright) or can terminate a vowel (for, cure)

o One of the later-developing sounds and frequently produced in error by kids

• /l/ can be classified as a liquid or lateral (both manner)

o Kids also make a lot of errors with /l/

• Are harder to produce; more distinct articulation than glides (more placement  distinction)

• Rapid articulator movements

• Distinguished from each other (place) by F3

o Anti-formants are the same as anti-resonance: frequency regions where  amplitude of source is severely attenuated (goes down). See slide 16.

How is /l/ Produced? 

• Side branch on top of the tongue (lateral):

o Another area sound is coming out (sound is resonating on top of tongue)

o Anti-formants

o Most energy in lower frequencies of sound

• Relatively high F2 and F3

o Appear evenly spaced on spectrogram  

• Steady state prior to formant transitions: there isn’t a lot of movement  

Comparing /l/ To Other Sounds: 

• /l/ vs. /t/: place is the same, but manner and voicing are different.

• /l/ vs. /n/: place is about the same and both are voiced, but manner is different.

• What sounds do kids typically substitute for /l/?

o /w/ or /j/

• What about end of word?

o Vowel  

/r/ (Rabbit): 

• Retroflex: tongue tip is up, tongue tense, and back or… tongue is bunched with the  middle portion raised

• Narrowing at palatal region makes F3 low (place= palatal)

o F3 is lower for this sound than for any other sound. Close to F2.

• No anti-formants

• Rake: silent period before the /k/ on the spectrogram

• Retroflex vs. bunched

o Example on slide 22  

• Manner, place, and voiced

o Manner: liquid

o Place: palatal

o Voiced: yes

• Common mistakes

o /r/ in initial position -> /w/

o /r/ in final position -> [o]

Clinical Application: 

• Acoustic measures -> feedback on the difference between accurate and inaccurate /r/ productions

• Baseline for /r/ F3 and then post-treatment production of /r/ to show the difference
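A baseline versus post-treatment comparison of /r/ F3 might be tallied as in the sketch below. The F3 values are hypothetical numbers invented for the example, not measurements from the course, and `mean_f3_change_hz` is an illustrative helper name.

```python
from statistics import mean


def mean_f3_change_hz(baseline_f3_hz, post_f3_hz):
    """Mean F3 after treatment minus mean F3 at baseline (negative = F3 dropped)."""
    return mean(post_f3_hz) - mean(baseline_f3_hz)


# Hypothetical F3 measurements (Hz) from three /r/ tokens per session.
baseline = [2750, 2800, 2850]
post_treatment = [1950, 2000, 2050]

change = mean_f3_change_hz(baseline, post_treatment)
print(f"Mean F3 change: {change} Hz")
```

Because an accurate /r/ has a low F3, a negative change would be the direction of improvement in this sketch.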


Nasals:

• Only 3 of them

o /m/, /n/, /ŋ/

• Defining a nasal

o Nasal airflow

o Place of obstruction in the oral cavity

o Sonorants

o Voiced  

• Manner: nasals 

• Voicing: all voiced 

• Place: /m/ is bilabial. /n/ is alveolar, and /ŋ/ is velar 

• Changes in oral cavity size change resonance in the vocal tract

• Velum is lowered for nasals, but up for all other consonants  

Acoustics of Nasal sounds: 

• Sound resonates in the pharyngeal cavity, the dead end of the closed oral cavity, and the spacious chambers of the nasal cavities

• Addition of nasal branches to the vocal tract creates a larger, longer resonator, so its frequency is lower (Helmholtz resonator).

• Pipe model: red bars are “closed-ends.” Nasal cavity is open end. Slide 33.


Damping:

• Decrease in amplitude

• Compared to vowels, nasals are more damped

• Soft vocal tract walls absorb acoustic energy

• Nasals look lighter on spectrograms. Darkness refers to amplitude and nasals are less intense, so they are lighter.

• All sounds are damped, but nasals are damped more because there is more surface area and more tissue to absorb energy.

• Light gray on slide 41


Anti-formants:

• Negative resonance when velum is lowered

• Decrease in intensity of nasal and vowel formants

• Lowest anti-formant frequency for /m/ (long oral tract side branch)

o It has the largest amount of space  

• Intermediate for /n/

o Some space  

• Highest in /ŋ/ 

o Because short oral tract side branch

• White spaces are anti-formants (slide 41)
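The ordering of anti-formant frequencies (/m/ lowest, /ŋ/ highest) follows from the length of the closed oral side branch. A minimal sketch, assuming the branch behaves like a tube closed at one end so that its lowest anti-resonance is c / (4L); the speed of sound and the branch lengths below are round, hypothetical values chosen only to show the ordering.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for air in the vocal tract


def antiformant_hz(branch_length_cm: float) -> float:
    """Lowest anti-resonance of a closed side branch (quarter-wavelength rule)."""
    return SPEED_OF_SOUND_CM_S / (4.0 * branch_length_cm)


# Hypothetical side-branch lengths in cm: longest for /m/, shortest for /ŋ/.
branch_lengths_cm = {"m": 8.0, "n": 5.0, "ŋ": 2.5}

for nasal, length_cm in branch_lengths_cm.items():
    print(f"/{nasal}/: anti-formant near {antiformant_hz(length_cm):.0f} Hz")
```

The longer the side branch, the lower the frequency at which energy is absorbed, which matches the /m/, /n/, /ŋ/ ordering in the notes.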

Nasal Consonants: 

• Formants produced by both the nasal and oral cavity

• Side branch results in anti-formants (negative resonance)

• Anti-formants: arise from changes in the oral cavity when it becomes a side branch of the vocal tract; decrease in intensity.

Nasal Murmur: 

• Caused because of acoustic energy that radiates outward from the nasal cavity during  production of nasals.

o As a result, we get nasal formant (first formant)

• Characterized by series of formants, the first of which is the nasal formant (easily seen  between 250-300 Hz). Relatively low frequency.  

• Slide 41, nasal formant is the dark line at bottom of nasals in spectrogram

Speech Disorders: 

• Hypernasality: too much nasal resonance (for non-nasal sounds)

o Flaccid dysarthria: decreased innervation of velum, so they can’t raise the velum  up enough

o Cleft palate

• Hyponasality: too little nasal resonance for nasals

o Degenerative disorders of the nervous system (muscle weakness or poor timing)


• Damping: sound is not as loud; all speech sounds are damped to some extent because they are traveling through tissue and cavities

• Anti-formants: show up as white spaces on spectrograms

• Don’t need to know 3rd bullet about formant bandwidth; we won’t talk about it

• Can vowels become nasalized? Yes!

o Context: Co-articulation, assimilation

o Languages and dialects may have more nasalized vowels

Slide 44: 1st one “ban” (nasal formant), 2nd one “bash” (high frequency), 3rd “bat” (stop gap).


Obstruents:

• Where and how the air is obstructed

• 3 types

o fricatives

▪ sibilant

▪ non-sibilant

o stops

▪ voiced and voiceless

o affricates  

▪ combo of stop and fricative


Fricatives:

• High in frequency (pitch)

• Narrow constriction: turbulence at site makes it a fricative

• shorter in duration than vowels

• random, continuous noise pattern in higher frequency regions

• Know the IPA symbols for the nine fricatives  

• Place of articulation

o Linguadental: tongue between teeth

o Labiodental: lower lip against upper teeth

o Alveolar: tongue is touching alveolar ridge (behind teeth)

o Palatal: tongue touches hard palate

o Glottal: sound is constricted in the glottis

9 Fricatives Broken Down: 

• Sibilants: /s/, /z/, /ʃ/, /ʒ/ 

• Non-sibilants: /f/, /v/, “th” voiced /ð/ and unvoiced /θ/ 

• Aspirate/glottal fricative /h/ 

Acoustic Properties of Fricatives: 

• Produced with narrow constriction of the vocal tract

• Voiceless fricatives

o Aperiodic source (sound doesn’t come from vibrated vocal folds)

▪ Turbulent airflow  

• Passing through a narrow constriction at a high rate  

• Air pressure fluctuating randomly “noise”

• For speech: concentrated at certain frequency ranges

▪ The filter

• Most important cavity for resonance is the one anterior to the constriction (ex. /s/ is a higher frequency than /ʃ/ because /s/ is closer to the front so the area anterior to the sound is smaller).

• If small, high frequencies are amplified; if large, low frequencies are amplified (all relative)

• Voiced fricatives

o Aperiodic and periodic source  

Slide 12 shows 5 different spectrograms with different voiceless consonants. The /f/ and /θ/ are higher in frequency because they are more fronted in the mouth (smaller oral cavity = higher frequency); the /s/ and /ʃ/ are darker, meaning they have a higher amplitude.


Sibilants:

• Most intense of all fricatives

• Strident: all sibilants are strident sounds, but not all strident sounds are sibilants.

o Strident sounds have noise of relatively high intensity

• Stridency deletion: either the omission of a strident or the substitution of another sound for it

o Ear infections can cause children to omit stridents: high-frequency hearing loss keeps them from hearing stridents in other people’s speech.

• Turbulence  

o Caused by narrow constriction at hard palate or alveolar ridge

o Air also hits upper and lower teeth

o If constriction is inappropriate, or if front teeth are missing, the acoustics of the  turbulence will be affected.  

• The four sibilants /s/, /z/, /ʃ/, /ʒ/ 

o Perceived as being louder than non-sibilants (most intense sounds)

o Alveolar fricatives /s/ and /z/

▪ Cavity in front of constriction is much smaller  

▪ Frequencies above 4,000 Hz emphasized (remember)

o Palatal fricatives /ʃ/, /ʒ/ 

▪ Large cavity in front of constriction 

▪ Lips are usually rounded (lengthens vocal tract = lower frequency)

▪ Frequencies above 2,000 Hz emphasized (larger cavity = lower frequency)
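The link between front-cavity size and emphasized frequency can be sketched by turning the question around: how long would the cavity in front of the constriction be if its lowest resonance sat at the 4,000 Hz and 2,000 Hz figures above? This assumes the front cavity acts as a simple tube closed at the constriction and open at the lips, and uses an assumed round speed of sound; real cavities are not uniform tubes, so treat the results as rough.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for air in the vocal tract


def front_cavity_length_cm(lowest_resonance_hz: float) -> float:
    """Length of a quarter-wavelength tube whose lowest resonance is the given frequency."""
    return SPEED_OF_SOUND_CM_S / (4.0 * lowest_resonance_hz)


# Lower edges of the emphasized frequency regions quoted in the notes.
for label, freq_hz in [("/s/, /z/", 4000.0), ("/ʃ/, /ʒ/", 2000.0)]:
    print(f"{label}: front cavity of about {front_cavity_length_cm(freq_hz):.1f} cm")
```

Under these assumptions the palatal fricatives need a front cavity about twice as long as the alveolar ones, in line with "larger cavity = lower frequency".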

/s/ vs. /sh/: 

• /sh/ or /ʃ/ has a greater intensity 

• /ʃ/ has palatal placement, /s/ has alveolar placement 

• /s/ and /ʃ/ have comparatively large acoustic energy and so produce darker patterns than /f/ and /θ/

• Primary spectral energy lower for /ʃ/ than /s/; noise in /s/ has the highest frequencies 

Voiced vs. Voiceless Fricatives: 

• Voiceless fricatives will appear darker on spectrograms than voiced fricatives due to  greater intensity (greater acoustic energy)

• Spectrograms of voiced fricatives may have vertical striations throughout the period of  noise.  

• Vowels preceding voiced consonants are longer than those preceding voiceless  consonants. (important!)

Clinical Applications For /s/: 

• Spectrograms can be used to give visual feedback for lisping  

o /s/ has a higher overall amplitude than “th” (non-strident fricative)

o /s/ has more amplitude at higher frequencies than /ʃ/ 

Non-sibilants or Non-stridents: 

• Least intense fricatives  

• /f/ (voiceless), /v/ (voiced), /θ/ (voiceless), /ð/ (voiced) 

o lower amplitude of vibration (quieter)

o noise source at point of constriction (labiodental for /f/, /v/; linguadental for /θ/, /ð/)

• Cavity anterior to constriction is very small (basically non-existent)

o Very high frequency resonances

o No real impact on spectrum of sound

• Voiceless “th” /θ/ is the least intense of all the English phonemes (Small text said /h/  and /θ/ are about the same intensity) 

Resonance Frequencies of non-sibilants: 

• Primarily determined by size of cavity in front of constriction

• Spectrogram on slide 30: /f/ and /θ/ have high frequencies compared to the other  consonants. 

• Spectrogram on slide 31: /v/ and /ð/ not seeing much sound in the higher  frequencies. 

Glottal Fricative /h/: 

• Voiceless: there is constriction, but no vocal fold vibration

• Vocal folds are approximated to produce turbulence

• Largest cavity in front of constriction- lowest frequency spectrum  

• Spectrogram: no fundamental frequency. Most energy at lower frequencies.

• Main concept: the point of constriction and how it affects resonant frequency. How big the cavity is anterior to the point of constriction determines resonant frequency.

Hearing Loss: 

• Difficulty perceiving fricatives because of high frequency spectra

• Non-sibilant fricatives lack intensity  

• Difficulty discerning between minimal pairs like “thin/fin” and “elf/else”

o Can use mouth position as a visual cue to discern which one someone is saying.


Review of Fricatives:

• What defines fricatives?

o Point of constriction where turbulence is happening.  

o Obstruent

• Stridents vs. non-stridents  

o Stridents: higher frequency, more intense

o Which consonants fit under each type

• Intensity of fricatives

• Significance of point of constriction; resonating cavity

• Hearing impairment in high frequencies  

Stop Consonants: 

• Bilabial stops

o /b/ and /p/

• Lingua-alveolar stops

o /d/ and /t/  

• Velar stops

o /g/ and /k/  

• Voiced consonants on the left of each list

• Voiced and voiceless cognate pairs (differ by one aspect: voicing)

• Least intense of all sound classes

Stop Consonants (Plosives): 

• Plosive is another name for stop: burst of sound

• Aspiration: “puff of air” for /p/, /t/, /k/

o Occurs after the release of the stop

o Usually in the prevocalic position

o Voiceless stop phonemes only

o Looks like high-frequency noise on spectrograms

• Stop gap: precedes the release of a stop sound. It is a silent period that indicates the  build-up of intra-oral pressure. Pressure in the mouth before release of stop consonant.  Present in all stop consonants.  

• Voiceless plosives are perceived to be louder than voiced plosives because of the acoustic energy of the aspiration.

3 Stages of Stops: 

• Shutting: movement of articulator toward a stop closure

o Formant transition that goes into the stop

• Closure: articulators coming together to point of constriction

o Stop gap (before you release stop)

• Release: release of point of constriction

o Aspiration/noise burst (opens glottis allows breath stream to flow)

Stop gap: 

• Silent period in the closure phase of stop production

o How long the vocal tract is closed before sound is released

o Can measure on spectrogram or waveform  

Voicing Bar: 

• Low frequency energy band during the stop gap of voiced stops, reflecting vibration of  vocal folds

• A dark bar shown at the low frequencies, usually below 200 Hz

• Among stops, present only for voiced plosives /b, d, g/. It is a primary indicator of voicing in the spectrogram: all kinds of voiced sounds, including vowels, show this voicing bar at such low frequencies.

Velar Pinch: 

• /k/ and /g/ (velar sounds)

• place of articulation cue; closeness of F2 and F3 during velar stop production.

• occurs in both consonant-vowel and vowel-consonant productions at around 2,000 Hz.  Doesn’t happen when consonant is in isolation; only happens when paired with a vowel  (ga).  

• Implications for hearing impaired individuals with losses over 1,000 Hz: they’re not going to be able to perceive the velar pinch. Won’t be able to perceive place of articulation (velar), so they might perceive it as an alveolar sound; could misperceive /k/ as /t/.

Voice Onset Time (VOT): 

• Time differential between the release of the stop burst and the onset of the voicing of  the vowel.  

o In “pay”, /p/ isn’t voiced, so the VOT is the time from the stop release to the start of voicing in the vowel. In “bay”, /b/ is voiced, so voicing starts at about the time of the release, or even before it.

• Salient cue in differentiating voiced from voiceless stop consonants in the initial syllable  position.

• For example, VOT for /p/ in “pay” is 86 msec and /b/ in “bay” is 10 msec.

• Therefore, a shorter VOT would indicate a stop consonant was voiced. Can have a negative VOT. (important!)

• Voicing voiceless consonants and not voicing voiced consonants

o People with dysarthria

o Non-native speakers

o Kids with speech disorders

VOT Continued: 

• Voiced stops have a voice onset time noticeably less than zero, a negative VOT, meaning  the vocal cords start vibrating before the stop is released

• Voiceless unaspirated stops have a voice onset time at or near zero, meaning that the  voicing of the following sonorant (such as a vowel) begins at or near to when the stop is  released

• Voiceless aspirated stops have a voice onset time greater than unaspirated stops, called  a positive VOT.  
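The three VOT categories above can be sketched as a small classifier. The function names are illustrative, the time stamps are hypothetical, and treating 30 ms (the aspirated/unaspirated boundary quoted in the Auditory Ability section of these notes) as the upper edge of "at or near zero" is an assumption made for the example.

```python
def voice_onset_time_ms(release_ms: float, voicing_onset_ms: float) -> float:
    """VOT: time from the stop release burst to the onset of voicing."""
    return voicing_onset_ms - release_ms


def vot_category(vot_ms: float, aspiration_boundary_ms: float = 30.0) -> str:
    """Sort a VOT value into the three categories listed above.

    The 30 ms default is an assumed cut-off between near-zero and
    clearly positive VOT.
    """
    if vot_ms < 0:
        return "negative VOT (voicing starts before the release)"
    if vot_ms < aspiration_boundary_ms:
        return "near-zero VOT (unaspirated)"
    return "positive VOT (aspirated)"


# Hypothetical time stamps (ms) read off a waveform; the resulting VOTs match
# the "pay" (86 ms) and "bay" (10 ms) examples in the notes.
for word, release, voicing in [("pay", 120.0, 206.0), ("bay", 120.0, 130.0)]:
    vot = voice_onset_time_ms(release, voicing)
    print(word, vot, vot_category(vot))
```

With these inputs "pay" lands in the aspirated range and "bay" in the short, near-zero range, which is the shorter-VOT pattern the notes associate with the voiced stop.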

Reviewed velar pinch on slide 14 and reviewed the acoustic features on slide 12.

• Slide 14: velar pinch refers to when F2 and F3 pinch together before the vowel.

• Slide 12: Voice bar for /b/ and not /p/. The duration of /p/ is longer than /b/. Stop gap is during the closure phase; illustrated by white area. Can really see stop gap in the prevocalic position (bottom 2).

Inter-vocalic Stop: 

• Slide 16, can see aspiration and the stop gap (white space in the middle).

• Slide 17, can see the voice bar during the stop gap (white area).

• Only see aspiration in voiceless stops /p/, /t/, and /k/.

Voice Onset Time: 

• Time differential between the release of the stop burst and the onset of the voicing in  the vowel.  

• Help differentiate between voiced and voiceless stop consonants in the initial syllable  position  

o Patients with hearing impairment may not be able to distinguish the difference

o Non-native speakers may have trouble distinguishing

• For example, /p/ in “pay” 86 msec, and /b/ in “bay” is 10 msec

• Shorter VOT would indicate a stop consonant was voiced.  

o Slide 19, the top one has /b/ because it has a shorter VOT

• Voiced (unaspirated) stops: have a voice onset time noticeably less than zero, a negative VOT, meaning the vocal folds started vibrating before the stop is released.

• Voiceless unaspirated stops (not released): have a voice onset time at or near zero, meaning that voicing of the following vowel begins at or near to when the stop is released.

• Voiceless aspirated stops: have a voice onset time greater than unaspirated stops, called a positive VOT.

• Slide 21: squiggly line is representing voicing. Plosive = stop. Use terms: shutting,  closure, and release NOT closure, blockage, and release (like slide 21).  

Formant Transitions in Stops: 

• F1 transitions are always rising

o F1 in stops caused by constricted vocal tract, much more than in vowels

o High tongue position lowers F1 value

• F2 and F3 transitions signal place of articulation

o Bilabial: rising to the vowel

o Alveolar: lowering (except for front vowels)

o Velar: lowering to the vowel

• Slide 24: L side to R side – movement transitioning from stop consonant from front  vowel to back vowel. Hook/tail at the end of formants, is the formant transition.  

Prevocalic Stops: 

• Closure

o White space on spectrogram

• Stop release  

o Vertical line on spectrogram

• Aspiration  

o Noise burst on spectrogram (voiceless stops only)

• Formant transitions

o Going into the vowel

o Think about the burst of energy and the frequencies of the vowel that follow

Slide 27: darker intensity in F1 of /b/ than /p/. Aspiration at /p/, which identifies it as a voiceless  consonant. They are stops because there is a stop release.

Post-vocalic Stop (on slide 27): 

• Closure (stop gap and voice bar)

o After vowel

• Formant transitions into stop

o Go from vowel to stop  

• Release burst (or unreleased consonant)

o “hop” is released, there is aspiration at the end.  

o Slide 29: second one is released—there is aspiration at the end.


Affricates:

• Affricate means “blend”. In this case, the blend is that of a stop and a fricative to produce: tʃ (church) and dʒ (judge).

o One is voiced (dʒ) and one is voiceless (tʃ) 

• Affricates are obstruents 

• A stop that is released into a fricative: combo 

• Distinguished acoustically from fricatives by shorter duration and faster amplitude rise time

o Rise time: amount of time it takes to reach maximum amplitude

• Slide 32: affricate duration and rise time are shorter than fricatives. 

• Affricates have aspiration, stops may or may not. 


Review of Stops and Affricates:

• Acoustic features of stops: phases, VOT, stop gap, velar pinch, aspiration, voice bar, formant transition

• F1 and F2 characteristics for stops

• Affricates: definition and spectrograms

Prosody: (suprasegmentals) 

• Important for intelligibility

• Stress

• Intonation  

• Duration  

• Juncture  


Stress:

• The syllable, NOT the word, is the level or unit of stress

• Word stress is used to distinguish among noun/verb word pairs (lexical stress: stress that occurs within the word)

• Phrase or sentence stress: the “pointer” indicating the most important or new information. Ex. It was MY daughter who won the race.

How Do We Add Stress? 

• More effort  

• Higher fundamental frequency (greatest effect)

o A higher fundamental frequency means that the harmonics in the spectrogram will be higher up and spaced farther apart (the formants are set by the vocal tract, not by the fundamental).

• Greater duration  

• Greater intensity

• Spectrogram example (slide 40)

o Record myself vs. play a record.  

o The yellow line shows intensity  

o The verb record (left) is stressed and has a longer duration  


Intonation:

• Can convey information about

o Attitudes  

o Differences in meaning (ex. Statements have falling intonation, questions have  rising intonation)

o Literal vs. sarcastic

• Similar to stress

How Do We Express Intonation? 

• Rise-fall intonation curves (universal feature of language)

o Declarative

o Question that’s not a yes/no question  

o Special emphasis  

• End-of-utterance pitch rise (Ex. Is your name Beth?)

o Yes/no questions

o Incomplete sentence

• Clinical application: Some populations, like children with autism spectrum disorder, may have difficulty with the production and comprehension of suprasegmentals, which can lead to difficulty in understanding the differences between questions and statements.


Duration:

• Certain sounds are longer than others (diphthongs and tense vowels are longer than lax vowels, and continuant consonants are longer than stops). This is a hard-and-fast fact about sound production.

• Vowels are longer before voiced consonants (leave vs. leaf)

o “ea” is longer in leave because “v” is voiced, and shorter in leaf because “f” is voiceless (important).

• Vowels are longer before continuants, compared to stops (leave vs. leap)

o Consonants that follow the vowel are continuants.

• We lengthen syllables at the end of sentences

o Ex. “Yesterday was really hot” is longer than “It was really hot yesterday.”

• Faster rate, articulators undershoot their targets

o An undershot target leads to decrease in speech intelligibility  

o Kids who try to speak at a fast rate will sound mumbled and are undershooting  their targets.  

• Pauses: mark syntactic boundaries, increase listener’s sense of anticipation, indicate  hesitations for utterance planning or word retrieval.  


Juncture:

• Relates to syllable affiliations (which sounds belong to which syllables)

o Ex. Peace talks vs. pea stalks

• Indicated by varying the degree of aspiration of a voiceless stop

o Degree of aspiration changes: more aspiration in peace talks (syllable-initial /t/) than in pea stalks (/t/ after /s/).

• Indicated by varying the duration of consonant closures

Review of Prosody: 

• Areas of prosody/types

• How we achieve each

• Ex. What is syllable affiliation and how do we indicate it?

Speech Perception: 

• Auditory ability: shapes speech perception  

o VOT  

o Compensation for coarticulation  

• Phonetic knowledge: shapes speech perception

o Categorical perception: stimulus continuum from /da/ to /ga/

o Phonetic coherence: duplex perception, McGurk effect  

• Linguistic knowledge: shapes speech perception  

o Slips of the ear

o Ganong effect

o Phonemic restoration  

• What is speech perception?

o Human ability to seek and recognize patterns. Infants pay attention and learn  something about voice and speech prior to birth

o Infant speech perception videos

▪ Head turn response. Watched a 5-minute video.  

• Young babies discriminate sounds from any of the world’s languages that adults have difficulty hearing.

• 6-8 month olds are able to discriminate between /ba/ and the two different types of da (English and Hindi). 10-12 month olds can tell the difference between /ba/ and /da/, but not /da/ and a similar Hindi sound.

▪ 3 procedures. Watched a 12-minute video.

• High amplitude sucking procedure

o Baby sucks on the pacifier until hearing a sound that is different, then stops sucking to listen. When the sound is habituated, the baby starts sucking again.

o By 5-6 months, babies learn the particulars of their language and lose the ability to recognize similar sounds in other languages.

• Head-turn preference behavior

o Records amount of time babies attend to a stimulus.

o When a light comes on and a sound is played, the baby looks over and stops looking when they are bored.

o As old as 24 months, infants are sensitive to more  

complex speech components that they don’t yet  


• Preferential looking procedure

o Looking to see if infants can understand the association between a word and object.

Normal Phonological Development: 

• Categorical perception: we perceive speech sounds according to phonemic categories of  native language  

• Discrimination of non-native sounds: infants up to 6-8 months can discriminate among  non-native sounds that are similar.  

• Perceptual consistency: for vowels and consonants, children from 5.5-10 months have the ability to identify the same sound across different speakers, pitches, etc.

• Longitudinal study of children at 6 months and then at 13, 16 and 24 months showed that early perceptual abilities appear to be related to later language development

o Therefore, phonetic perception may play an important role in language acquisition

Inability to perceive subtle differences between sound was thought to be the cause of speech  sound disorders.  

Theories of Speech Perception: 

• Motor theories: based on the connection between speech perception and speech  production. Ability to hear and produce speech sounds.  

• Auditory theories: based on the view that speech perception is primarily auditory; they emphasize the sensory, filtering mechanisms of listeners. Stronger focus on the auditory piece; don’t generally talk about production.


• Wernicke implicated the left temporoparietal area as key to speech recognition and linguistic expression (Wernicke’s area).

o Patients with damage in the area speak fluently, but conversation doesn’t make sense; have difficulty recognizing the meaning of words.

• Right ear advantage: normal subjects make fewer sound identification mistakes in the  right ear (left side of brain) when sounds are simultaneously presented in both ears.  

Auditory Ability: 

• 2 examples of the auditory system constraining speech perception. We identify patterns  based on:

o Voice onset time

▪ Measure of the delay in voicing onset following a stop release burst  

▪ We distinguish between aspirated and unaspirated stops with a 30 ms  voice onset time boundary. If it gets below 30 ms, we can’t distinguish.  

▪ Ex. /pa/ = 86 ms VOT, /ba/ = 10 ms VOT

o Compensation for co-articulation

▪ Our perception of place depends on the preceding phonetic context.

▪ Subjects were presented with sounds on a stimulus continuum from [da] to [ga] in 5 steps. Listening to the sound move progressively from [da] to [ga] showed that we make our judgment (which one we hear) based on what precedes it.

Phonetic knowledge shapes speech perception: 

• Categorical perception: we tend to perceive speech categorically rather than  continuously. We’re going to decide which sound we hear, we’re not going to say it’s  between two.  

o When people listen to sounds on a stimulus continuum, their response is  categorical—people usually call first 3 [da] and last 2 [ga].  

o Ability is predictable from labels we use to identify members of continuum

o There are both identification tasks and discrimination tasks

▪ Identification: listen and identify what the sound is

▪ Discrimination: which one was the /r/ sound?

▪ Watched a short video on categorical perception.  

• Phonetic coherence: we can experience phonetic coherence with acoustic components  that should be incoherent (duplex perception and McGurk effect)

o Duplex perception: F3 is integral in determining if we hear [da] or [ga]

o Base signal presented to right ear ([da] or [ga] sound missing F3)

o “Chirp” noises presented to left ear (80 ms glide with typical F3)

o identification of the syllable as [da] or [ga] is determined by the “chirp”; the  “chirp” influences phonetic perception of the base.  

o McGurk Effect

▪ Perceptual illusion that only goes away when you close your eyes; shows that we combine info from our ears and eyes to judge what is being said.

▪ Watched a 3-minute video on this effect.

• If we hear someone saying /ba/ but their lips look like they’re producing an /f/, we think they are saying /fa/.
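The categorical labelling of the five-step [da] to [ga] continuum, and the idea that discrimination is predictable from the labels, can be sketched as follows. The strict rule used here (a pair sounds different only if the two steps get different labels) is an idealization assumed for the example, and the function names are illustrative.

```python
# Per the notes, listeners usually call the first 3 steps [da] and the last 2 [ga].
LABELS = {1: "da", 2: "da", 3: "da", 4: "ga", 5: "ga"}


def identify(step: int) -> str:
    """Identification task: which syllable does this step sound like?"""
    return LABELS[step]


def predicted_discriminable(step_a: int, step_b: int) -> bool:
    """Idealized prediction: two steps are told apart only if their labels differ."""
    return identify(step_a) != identify(step_b)


# Adjacent pairs are physically equally spaced, but only one crosses the boundary.
for a, b in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    print(f"steps {a}-{b}: discriminable = {predicted_discriminable(a, b)}")
```

Only the 3-4 pair straddles the category boundary, so it is the only adjacent pair this sketch predicts listeners will hear as different.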

Linguistic Knowledge Shapes Speech Perception: 

• Slips of the ear

o Errors of misperception in listening, like misheard lyrics

o Occur when the listener mistakes a word or phrase for a similar-sounding word  or phrase in conversation (ex. Coke and a Danish vs. coconut Danish)

o When listeners misplace word boundaries, they tend to insert them between a weak syllable and a strong syllable (ex. Acute back pain vs. a cute back pain).

• Ganong effect

o Phoneme category boundaries can shift depending on lexical expectations

o If you play listeners a series of stimuli with a word at one end of the continuum and a non-word at the other, people want to hear the word. Even if you play 5 non-words and 1 word, listeners will say there were more real words.

o You get more real word responses; words are like perceptual magnets

o The lexical effect is even stronger when the sound to be identified is at the end of the word.

▪ Ex. dad vs. zad. Zad is not a real word, so people will say they heard the word “dad” more often, but the difference is at the beginning of the word, so people will correctly identify them more than if the words were baf and bad. Baf isn’t a word, but the difference is at the end.

▪ Ex.  

• Phonemic restoration

o Listeners don’t notice that the [s] is missing in the noise-replaced version of  “legislation”

o The brain uses lexical knowledge to fill in the missing phoneme

▪ People hear the same word even if there is a phoneme missing.  

Watched a video on recent research in perception and intelligibility.  


• How can this type of study help us clinically as SLPs?

o Speech intelligibility has a lot to do with the listener. Consider what the listener is perceiving. There’s a gray area between children with dysarthria and typically developing kids. Monitor the group of kids in the gray area to see if there is something going on with their speech that would warrant speech therapy.

• What did she find out about within-listener variability?

o There is more variability within listeners than between listeners. The listeners  weren’t consistent from one trial to another.  

• How did she measure speech intelligibility?

o 5 listeners listened to each of the kids (kids with dysarthria and typically developing kids) and wrote down what they thought the kids said.

• Kids with intelligibility between 75-85% were in which group?

o Gray area: some dysarthric kids and some typically developing kids.
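A transcription-based intelligibility score of the kind described above might be computed as in this sketch. The notes only say that listeners wrote down what they heard, so the position-by-position word matching, the sentence, and the five transcriptions are all assumptions made for illustration; only the 75-85% gray-area range comes from the notes.

```python
def intelligibility_percent(target: str, transcription: str) -> float:
    """Percent of target words the listener wrote down in the same position."""
    target_words = target.lower().split()
    heard_words = transcription.lower().split()
    matches = sum(t == h for t, h in zip(target_words, heard_words))
    return 100.0 * matches / len(target_words)


def mean_intelligibility(target: str, transcriptions: list[str]) -> float:
    """Average the score across listeners."""
    scores = [intelligibility_percent(target, t) for t in transcriptions]
    return sum(scores) / len(scores)


def in_gray_area(score: float, low: float = 75.0, high: float = 85.0) -> bool:
    """True if the score falls in the 75-85% range described in the notes."""
    return low <= score <= high


# Hypothetical target sentence and five listener transcriptions.
target = "the big dog ran home"
transcriptions = [
    "the big dog ran home",
    "the big dog ran on",
    "the pig dog ran home",
    "the big dog can home",
    "a big dog ran on",
]

score = mean_intelligibility(target, transcriptions)
print(f"{score:.1f}% intelligible, gray area: {in_gray_area(score)}")
```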

Review of Speech Perception: 

• Speech perception is shaped by:

o General properties of the auditory system (VOT and compensation for co-articulation)

o Our phonetic knowledge (categorical perception and phonetic coherence)

o Our lexical knowledge (slips of the ear, Ganong, phonemic restoration)

o Why do we care?

Clinical Applications: 

• How might acoustics be applied clinically?

• Watched a video of a young adult with a high-pitched voice that was strained. After therapy, his voice was much deeper and sounded less strained.

• Acoustic measures: could generate spectrograms on Praat to show his production and  get a baseline/show progress.
