Speech Science II Perception
Speech Science II Perception SPA 3123
University of Central Florida
This 43 page Class Notes was uploaded by Vladimir Yost on Thursday October 22, 2015. The Class Notes belongs to SPA 3123 at University of Central Florida taught by John Ryalls in Fall. Since its upload, it has received 31 views. For similar materials see /class/227507/spa-3123-university-of-central-florida in OTHER at University of Central Florida.
CH10: Parallel Distributed Processing Models; Bottom-Up Versus Top-Down

Bottom-up versus top-down
- Review the schematic of linguistic levels. This is a somewhat more detailed map of the sound-to-meaning process. Simply remember that what arrives at the ears is at the bottom of the perception-recognition-comprehension hierarchy; then you won't mix up the order or the terms "bottom-up" and "top-down." (Our ears are lower on our heads than our brain.) Going from sound to meaning is basically what is being illustrated here.

1. How does bottom-up differ from top-down?
   a. Was the traditional study of speech perception bottom-up or top-down?
      - Bottom-up; in other words, going from the acoustic signal to meaning or word recognition.

Speech-error example: "Speech Fience" instead of "Speech Science"
- Selection of an /f/ phoneme in place of an /s/ phoneme.
- This is possible because both are voiceless fricatives; they differ only in the feature of place of articulation (/f/ is labiodental and /s/ is alveolar).
- These two would be minimal pairs if "fience" were a word, but it is not a word. There is another word of English which closely resembles it. What word in English is close to "fience"? Finance.
- The speaker may have mis-selected the /f/ sound instead of the /s/ sound but correctly produced it. Or is it possible that the speaker correctly selected the right phoneme but incorrectly produced it, so that the place of articulation changed from alveolar to labiodental?
- Speakers' errors are typically close to their intended targets; usually they are only a single phonetic feature away.

What was one of the pieces of evidence for top-down processes discussed in this chapter?
- The fact that listeners sometimes fill in missing phonemes (phonemic restoration).
- Another: whether a sequence of phonemes is a real word or not. In other words, using phonotactics (phonology) to decide whether or not the utterance is actually a word of the language. So in the example of the speech error "fience" for "science," does this error VIOLATE the phonotactic constraints of the English language? No, it does not.

What is pointed out about the "computer as a model for the brain" metaphor?
- That with parallel networks the opposite influence has occurred: the brain has served as a metaphor for the computer.

What is the main difference about parallel processing?
- One doesn't have to wait for the results of one step to proceed to the next one.
- Speech also seems to carry information in parallel. Remember that the acoustic information specifying one phoneme is not complete before the next one begins; almost every portion of the acoustic signal carries information about more than one phoneme.

Priming is the name given to the fact that seeing a word related to the target word speeds up the recognition response time.
- If you are shown the word "cat" on a screen, you can then recognize the word "dog" faster. This is an example of SEMANTIC priming.
- "gat" primes "cat"; that is phonological priming.
- Something interesting: "gat" also primes "dog." This appears to be a combination of phonological and semantic priming.

Marslen-Wilson and colleagues have shown that listeners are able to shadow speakers with very short latencies, about 250 ms, which is about one syllable or even less in length. If Marslen-Wilson and colleagues are correct, what does this mean for speech perception? In other words, how would data from experiments showing this to be true provide strong evidence that top-down effects are real?
- It means that in some cases listeners can recognize, and begin producing, a word before its last syllable has even been heard.
- In fact, Marslen-Wilson has claimed in the cohort model that words are recognized just as soon as all other possible candidates have been eliminated. It will take a subject longer to recognize a word when there are a lot of similar-sounding words (at least in the beginning portion) and less time when there are few similar-sounding words.

What is the clinical impact of top-down influences in speech perception?
- It helps explain why children with hearing loss don't immediately catch up with their hearing peers once their audition has been augmented by a hearing aid.
- Understanding top-down influences may help us develop better strategies for speech-language therapy.

Where are top-down effects stronger: at the beginnings of words or near the ends of words? Why?
- At the ends of words. At the beginning of a word, listeners are dependent on the acoustic signal; as more and more candidates are eliminated and sufficient sound information has been gathered, the listener can turn attention to what word makes sense in a particular context.
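The cohort model's "recognized as soon as all other candidates are eliminated" idea can be sketched in a few lines of code. This is an illustrative sketch only, not anything from the chapter or from Marslen-Wilson's own materials: letters stand in for phonemes, the tiny lexicon is made up, and the function name `uniqueness_point` is invented for the example.

```python
# Toy lexicon; real cohorts would be computed over phoneme strings
# drawn from a full mental lexicon.
LEXICON = ["candle", "candy", "cane", "cat", "dog"]

def uniqueness_point(word, lexicon):
    """Return the 1-based position at which `word` no longer shares a
    prefix with any other lexicon entry (None if it never becomes unique)."""
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        competitors = [w for w in lexicon
                       if w != word and w.startswith(prefix)]
        if not competitors:
            return i  # all other cohort members have been eliminated
    return None
```

Note how the model predicts the timing effect described above: "dog" is unique after its first segment, while "candle" must wait until its fifth segment because "candy" stays in the cohort that long.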
Chapter 1 (August 25, 2014)

- Speech is faster than any other code system. Speech is produced with the goal of perception on the part of the listener; we speak not just to hear ourselves but to relay a specific message.
- According to Liberman and colleagues, speech is about 10 times faster than any other code system, such as Morse code. Morse coders learn to transmit very quickly, but the best coders are still only about 1/10 as rapid as speech transmission.
- Speech can even be sped up considerably, and even though the pitch of the voice goes up, the speech remains understandable at very fast rates.
- Speech is both rapid and reliable. The notion supporting speech reliability is that it is about 50% redundant, which means about 50% of the information can be obtained in some other form, such as context.
  - An example of redundancy in written text is shorthand notes that leave out many vowel letters. We can't always delete all the vowels, though (for a word like "you").
- We understand spoken words through both their sounds and their meaning.
- Speech has an inherently ephemeral (short-lived) nature: speech sounds only remain suspended in the air for a few brief moments.
- There is more emphasis on the production of speech than on the perception of speech because we can physically see production through the moving of the lips and tongue, while perception tends to be invisible and effortless. The only evidence of speech perception we have is the confused look on people's faces when they have not understood what was said to them.
- There are sounds that are easy to make but not easily perceived by human beings, so they're not used in speech. Speech is not made up of random sequences of sounds.
- Perception: not only must a sound be easy to make, it must also be easily perceived or understood. This principle relates to another principle, quantal theory: some sounds are more acoustically stable than others.
  - The way this principle is explained (p. 206) in the text: the thesis of the quantal theory is that certain relatively large changes in articulator position will cause little change in the acoustic signal, while other relatively small changes in articulator placement will cause large changes in the acoustic signal. The extent of the acoustic change appears to be related to the particular region of the vocal tract where the articulation is located. In certain critical regions a slight adjustment of articulatory placement will cause a quantal change in sound.
  - There are sounds which can be produced with relatively imprecise articulation without much effect on the acoustic signal; these are quantal sounds.
    - Examples: vowels i, a, u; consonants p, t, k and b, d, g.
  - Because they are quantal, these sounds occur over and over again in the various languages of the world. They can be produced quickly without a great deal of articulatory precision.
  - What advantage do these sounds have? Speech can be produced at a faster rate.
  - What is the physical change? The angle between the oral and pharyngeal vocal tract.
- Many people can speak and perceive speech who have not learned to read their own language.
- Speech production and speech perception are tightly linked together; the motor theory of speech perception is based on this.

Vowels
- Two main groups (classes) of speech sounds: vowels and consonants. Rhymes are based on the same or similar vowel sounds; tongue twisters involve sound sequences where the vowel is shared.
- Vowels are more basic, or more "prime," speech sounds than consonants.
- In general, vowel sounds are produced with an open vocal tract, and consonants with a closed, or closing-then-opening, vocal tract. Fricatives are produced (you'll remember) with the vocal tract obstructed or occluded; stops are produced with a closing then an opening of the vocal tract.
- Vowels are almost always produced with the vocal folds vibrating (voiced), unless whispered. In whispered vowels the vocal folds still move but are not set into regular or full vibration.
- Vibrating vocal folds provide the source for vowels, which is then modified or filtered by the particular shape of the mouth. The shape of the mouth is largely determined by the movement of the tongue and lips.
  - Describing speech in terms of a sound source which is then modified by the vocal tract is called the source-filter theory of speech production.
  - Source: sound emitted from the vibrating vocal folds.
  - Filter: sound shaped by the supralaryngeal vocal tract.
    - "Supralaryngeal vocal tract" refers to all of the tract above the vocal folds, including the oral cavity.
- Synesthesia is the occurrence of associating vowel sounds with colors (seeing yellow for the "ee" sound).
- Description of the vowel sounds of English:
  - Two main relevant dimensions of tongue movement: up-and-down and front-to-back.
  - Vowels can be considered relatively slower to change, or more steady-state, in nature. Even though they are relatively brief events (usually only several hundred milliseconds in duration), they are much slower to change than consonants.
  - Consonants, on the other hand, have much more rapid and quickly changing articulation, and consequently the acoustic information that specifies them also changes very rapidly.
  - Vowels that change from one vowel to the next are diphthongs ("bite," "boy," or "cow").

Consonants
- Consonants are rapidly changing.
- Classic description of speech sounds in terms of voice, place, and manner.
- Consonants are impossible to produce without a vowel.

Description of Consonant Sounds

Resonants versus Occlusives
- The first major distinction in consonant type is between resonants and occlusives.
- Occlusive consonants are produced by restricting or occluding the airstream as it makes its way up from the lungs, through the vocal folds, and out of the mouth.
- Resonant consonants are closer in nature to vowels because, like vowels, the vocal tract is not obstructed or occluded.

Resonant Consonants
- There are 2 types of resonant consonants: semivowels and nasals.
- There are 2 types of semivowels: liquids and glides.
  - The glides are the "sometime vowels" y and w.
  - The liquids are r and l.
- During nasal sounds the soft palate (velum) is opened and the airstream escapes out the nose, since the mouth is closed.
- The place where the mouth is shut the tightest (the place of maximal closure) determines the particular sound of each of the three nasal consonants:
  - Bilabial: m
  - Alveolar: n
  - Velar: ŋ

Occlusive Consonants
- There are 3 different types of occlusives: fricatives, stops, and affricates. The type of occlusive depends on the degree to which the airstream is obstructed.
- Occlusives can be produced with or without the vocal folds vibrating (voiced or voiceless).
- Fricative consonants
  - The mouth is not shut, and the airstream is not completely stopped; the airstream is directed through a narrowed space, which causes it to have a frication noise.
  - Alveolars: s, z
  - Palatals: ʃ, ʒ
  - "Sibilants" refers to these subgroups of fricatives because of the distinctive hissing sound they produce.
  - There are two groups of non-sibilant fricatives:
    - Labiodental: f and v
    - Interdental: θ and ð
- Stops (Plosives)
  - Stop consonants are produced with the mouth completely closed, at least for a brief moment; they are referred to as "stops" because the airstream is completely stopped.
    - Bilabial: p and b
    - Alveolar: t and d
    - Velar: k and g
    - Glottal
- Affricates
  - Begin like a stop: they start out with complete closure of the vocal tract, like a stop consonant, but then they are changed, or released, into a fricative. Considered a combination of a stop and a fricative.

Features
- More than just descriptions of how consonant sounds are articulated; they also have some psychological reality.
- Basically, features are useful for explaining articulation and are also relevant for understanding speech perception.
- Remember: vowels are more steady-state while consonants are quickly changing; therefore the two require somewhat different perceptual mechanisms.
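The voice-place-manner description above can be put into a small lookup table, which also makes the earlier "Speech Fience" point concrete: /f/ and /s/ differ in exactly one feature. This is a minimal sketch; the table covers only a few consonants, uses plain-letter keys instead of IPA, and the function name is invented for the example.

```python
# Toy voice-place-manner table for a few English consonants, following
# the chapter's descriptions (illustrative, not exhaustive).
FEATURES = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "f": ("voiceless", "labiodental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
}

def feature_distance(c1, c2):
    """Count the voice/place/manner features on which two consonants differ."""
    return sum(a != b for a, b in zip(FEATURES[c1], FEATURES[c2]))
```

With a table like this, the observation that speech errors are usually "a single phonetic feature away" becomes a checkable claim: the /f/-for-/s/ error has a feature distance of 1 (place only).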
Chapter 9: Dichotic Listening

- Dichotic listening refers to any situation in which different sounds are presented to the two ears.
- If two different, competing acoustic signals are delivered to the two ears at the same time, generally the right ear does a better job of reporting verbal stimuli. This happens because, although both ears are connected to both sides of the brain, there are usually somewhat stronger contralateral connections (i.e., right ear to the left hemisphere of the brain, left ear to the right hemisphere).
  - The left hemisphere is more specialized for the treatment of at least some forms of verbal stimuli; thus subjects are usually more accurate at reporting the words they heard with their right ear than those they heard with their left.
- The dichotic listening procedure was first developed by Broadbent, who used spoken digits. It was then further developed by Kimura, who also offered a neurological explanation.
  - The explanation for the right-ear superiority on the digits test was that the right ear had better connections with the left hemisphere than did the left ear, and since the left hemisphere was the one in which speech sounds were presumably analyzed, right-ear sounds had the advantage of better access to these speech centres.
- Each ear IS connected to each hemisphere. It is NOT true that each ear is ONLY connected to the opposite hemisphere; it is just that more fibers (larger connections) exist between each ear and the opposite hemisphere. There is also a significant amount of variation from individual to individual.
- The dichotic effect occurs only under somewhat special conditions. The acoustic signals delivered to the ears must be in competition: they must be of similar intensity and length, and they must be delivered to the two ears at the same time.
  - The effect is not the same when one sound is louder than the other, or if one is longer than the other. These are the reasons why true dichotic listening conditions are rare or almost nonexistent outside the psychoacoustic laboratory.
- The greater accuracy for stimuli presented to the right ear is known as the right-ear advantage. The right-ear advantage reflects the superiority of the left hemisphere for processing speech.
- We know that the dichotic effect is due to the left hemisphere's specialization for speech because people become aphasic much more often as a result of neurological insult to the left hemisphere than as a result of damage to the right hemisphere. There are also electrophysiological studies, and techniques such as the PET scan, that show greater activity associated with processing verbal stimuli on the left side of the brain than on the right. There are, in fact, independent sources of evidence for the superiority of the left hemisphere in processing verbal stimuli.

Consonant-Vowel Differences
- In the earliest experiments using dichotic listening, there was a clear and statistically significant advantage only for stop consonants in CV syllables. This means that, similar to categorical perception, there seems to be a difference between consonants and vowels in dichotic listening.
- Ear effects for vowels are not clear, but there are conditions under which right-ear advantages are elicited for vowel stimuli.
  - For example, manipulations that increase the difficulty of the perceptual task seem to increase the right-ear advantage for vowels. If vowels are presented in noise, or if the differences between the vowels are reduced, the right-ear advantage becomes stronger.
- Overall, however, results are not very clear for vowels. The ear advantage for vowels is much less reliable than the one found for consonants. Some studies have found a left-ear advantage for vowel stimuli (i.e., the right hemisphere is better at processing vowels).

Speech versus Music
- The right hemisphere seems to be somewhat better than the left for certain types of musical signals. Musical stimuli are better processed by the left ear because the right hemisphere seems to be better at processing these stimuli.
- The musical competence of the individual must also be taken into consideration: if the person is a musical expert, then it seems that the left hemisphere is better at processing music, because music is a kind of language for musical experts. Expert musicians can associate verbal labels with musical stimuli.
- We must consider the nature of the stimuli (musical versus verbal, consonant versus vowel), the type of subject (musically sophisticated or not), and the exact task listeners are instructed to perform.

Right Brain, Left Brain, Whole Brain
- The right hemisphere seems to be somewhat better at processing intonation, the more melodic or music-like aspects of speech.
- Remember that the two halves of the brain communicate through the corpus callosum. The two halves are also connected at their bases and share many common subcortical structures. There are also connections provided through longer U-shaped fibers.
- It is popular to emphasize the differences between the two hemispheres today because in the past we thought for such a long time that the two hemispheres were exactly the same. Differences between the hemispheres are more apparent under artificial laboratory conditions. It might be better to focus on how the brain works as a whole than on how the two halves work independently.

Summary
- Dichotic listening reveals that vowels and consonants recruit somewhat different speech-processing areas in the brain. The dichotic listening paradigm is one of the important ways in which differences between the two hemispheres of the brain have been studied. Typically the left hemisphere is better at processing verbal material.
- The media have played a role in promoting the differences between the two halves of the brain as two independently functioning systems; however, we should remember that both hemispheres contribute to language processing in an integrated manner.
Chapter 3: Perception of Vowels (September 8, 2014)

- Formant frequencies are extracted at the level of the ears; this information is matched up with a particular vowel in the brain. But we come across a big problem with matching formant frequencies to a particular vowel. What is the problem?
  - Vowel normalization. We need to understand why we need vowel normalization in order to understand what this process is. It is sometimes called speaker or vocal tract normalization.
  - The reason we need it is that there are differences in vocal tract size. A female speaker has a smaller head size, in general, than male speakers, and children have even smaller vocal tracts. Since the smaller the resonating cavity, the higher the resonance, the formant frequencies of females are higher than those for the same vowel produced by a male speaker, and the formant frequencies for children are even higher still.
  - Some researchers think the i vowel is useful for normalizing to the vocal tract of the speaker.
- Spectrograms are a way of making speech visible, enabling us to measure various acoustic properties of speech such as formant frequencies.
- LPC (linear predictive coding) is a mathematical procedure used in computer-assisted speech analysis.
  - An advantage of computer-assisted speech analysis over spectrographic analysis is that the computer can usually also be programmed to do a number of other things besides analyze formants.
  - The spectrograph is NOT adaptable: it is only good for making spectrograms, and someone still needs to interpret them and determine the actual formant frequencies.
- Basics of vowel perception
  - Formant frequencies ARE present in vowel sounds when they are produced by a speaker.
  - How do humans extract formants from the speech they hear?
    - There is evidence that the human auditory nerve already reacts directly to formant frequencies. The ear is capable of effecting a kind of basic spectrographic analysis before speech is sent to the brain for processing.
    - The cochlea resembles the spiral-shaped nautilus shell. The fluid-filled cochlea's nerve cells react to speech in an order arranged by their frequency sensitivity. This frequency organization is referred to as tonotopic organization; the cochlea functions like a bank of frequency filters.
    - With this information, remember there is not a lot of direct evidence of this formant-frequency-analysis function of the ear. The ear can perform an elementary speech analysis; we still need our brain to interpret the information.
  - The brain has a lot to do with what is perceived in speech: the brain, and not the ears, is what makes speech out of the sounds that we hear.
  - Vowel sounds are specified by their formant frequencies. These formants relate most directly to the position of the tongue and the length and size of the vocal tract.
  - There is evidence that speakers CAN arrive at the same acoustic goal using quite different articulations. This is why an acoustic goal is more important than a particular articulatory configuration.

Vowel Normalization
- Speakers vary in the size and mass of their vocal folds, and their vocal tracts can also vary considerably. These physical differences result in considerable differences in the formant frequencies for a particular vowel.
- There is a considerable degree of overlap of vowel spaces between adults and children.
- Children are capable of normalizing vowels for different speakers. Infant children can recognize the same vowel across different speakers despite the acoustic differences.
  - Because of this capability, researchers have suggested that vowel normalization is innate and present at birth.
- Vowel normalization is a necessary component of speech perception, and it is very unlikely that listeners can perform this task on the basis of visual information alone.

Summary
- We can basically perceive vowels on the basis of their formant frequencies. When we change the vowel we are producing and then measure it acoustically, we find that the formants have changed. Also, if we synthesize vowels and change the first two formant frequencies, listeners hear a change in the vowel. The first two formants determine the various vowel sounds.
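The claim that the first two formants determine the vowel can be sketched as a nearest-neighbor lookup in (F1, F2) space. This is an illustrative sketch only: the stored formant values are approximate adult-male figures chosen for the example (real values vary by speaker, which is exactly the normalization problem discussed above), and the function name is invented.

```python
import math

# Approximate (F1, F2) values in Hz for three corner vowels of a
# single hypothetical adult male speaker (illustrative values).
VOWEL_SPACE = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}

def classify_vowel(f1, f2):
    """Return the vowel whose stored formants are nearest (Euclidean distance)."""
    return min(VOWEL_SPACE,
               key=lambda v: math.dist((f1, f2), VOWEL_SPACE[v]))
```

A measured token near (280, 2250) lands on /i/, which also shows why /i/ is special: its F2-F1 spread is by far the largest, as the notes point out for vowel normalization.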
Chapter 5: Categorical Perception (September 15, 2014)

- Categorical perception is a difference between the perception of vowels and consonants.
- The difference between voiced and voiceless stop consonants is actually one of the relative timing of the onset of vocal fold vibration.
  - This timing difference is referred to as voice onset time (VOT).
  - Lisker and Abramson were the first researchers to define VOT and to investigate how it is produced in a variety of languages.
- Voiced stop consonants have a relatively short VOT, whereas voiceless consonants have a longer VOT (long voice lag). Negative VOT = prevoicing, AKA voice lead.
- Listeners can usually discriminate many more different sounds than they can absolutely identify.
  - Liberman has suggested that we can discriminate about 1200 different pitches but can ONLY absolutely identify about 7.
- The disadvantages mentioned for measuring VOT on the sound spectrograph:
  - Once the spectrogram is removed from the machine, there is no longer any way to hear the acoustic signal which has been analyzed.
  - The VOT portion of the spectrogram is very small, and therefore it is easy to make a mistake in estimating VOT.
- What were the two things mentioned about the VOT plot in Figure 5.3?
  - That there are two distinct areas (categories) with no overlap, and also that there is a considerable range of variation even in the normal speaker.
  - That the normal speaker tends to avoid the area BETWEEN the two categories, as if he or she is aware of what such a production could be.
- Two stimuli, one at -73 ms and one at 0 ms, are both heard as "da"; that's an overall VOT difference of 73 ms. Yet this same one at 0 ms and another one at +22 ms are heard as different, one as "da," the other as "ta"; the overall difference here is only 22 ms.
- These are the two characteristics of categorical perception:
  1. Listeners are not sensitive to differences between speech stimuli within a category, but only to differences between two different categories.
  2. For sounds for which there is categorical perception (i.e., stop consonants), discrimination is limited by identification.
     a. This is just as important but a little more difficult to grasp. What we measure by this is that the listener cannot reliably discriminate any more sounds than he can reliably identify. For most auditory stimuli, a listener can discriminate many more sounds than he or she can identify: most of us can discriminate thousands of different tones and pitches, yet reliably identify only a few of them, or even none. For most auditory stimuli there is a huge discrepancy between the number of stimuli that can be discriminated and the number that can be identified. But for speech stimuli like the voicing distinction in stop consonants, listeners can only discriminate as many sounds as they can identify.
- What does the chapter say about vowel stimuli that are changed in similar gradual steps?
  - Gradual changes in vowels are heard as gradual; vowels are not perceived categorically.
- The advantage that categorical perception has given to speech is to allow listeners to hear in terms of the phonemes of their particular language and to ignore nonessential variation within a category.
- Categorical perception is one of the essential differences between the perception of speech and other sounds. While listeners can usually discriminate many more sound differences than they can reliably identify, in categorical perception listeners can only accurately discriminate about as many differences in speech sounds as they can identify.
  - This allows listeners to perceive speech at a much more rapid rate than other sound sequences.
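The two characteristics of categorical perception can be sketched as a single category boundary on the VOT continuum. This is a toy illustration, not a model from the chapter: the boundary is placed at +15 ms purely as an assumption (the notes only show that it falls somewhere between 0 and +22 ms for the da/ta stimuli), and both function names are invented.

```python
# Assumed /da/-/ta/ category boundary for illustration only.
BOUNDARY_MS = 15

def categorize(vot_ms):
    """Label a stimulus by which side of the VOT boundary it falls on."""
    return "da" if vot_ms < BOUNDARY_MS else "ta"

def discriminable(vot1_ms, vot2_ms):
    """Categorical perception: only across-category pairs are reliably heard
    as different, so discrimination is limited by identification."""
    return categorize(vot1_ms) != categorize(vot2_ms)
```

This reproduces the example above: -73 ms and 0 ms (a 73 ms difference) get the same label and are not discriminable, while 0 ms and +22 ms (only 22 ms apart) straddle the boundary and are.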
Chapter 2: Basic Speech Acoustics

Fundamental frequency
- When the vocal folds are set into periodic vibration, they create a sound which is also periodic. "Periodic" means it repeats at regular intervals: there is one cycle, or period, for each complete opening and closing of the vocal folds.
  - This is the only case in speech where there is a one-to-one, direct relationship between a physiological event of speech and the acoustic signal.
- Why is the rate of vocal fold vibration different in men than in women?
  - Men have longer vocal folds with greater mass; the larger and heavier the vocal folds, the slower the typical rate of vibration.
  - Children have shorter vocal folds with less mass, so they have faster typical rates of vocal fold vibration.
- Some fundamental frequency values given in the book:
  - Men: about 100 Hz to 120 Hz.
  - Women: about 160 Hz to 200 Hz.
  - Young children may have fundamental frequencies of 300 Hz or higher.
- Remember that we use the term "fundamental frequency" for the actual rate of vibration of the vocal folds; "pitch" refers to the perception of the fundamental frequency. The speaker produces the fundamental frequency, and pitch is perceived by the listener.
- Why is a different term used for the actual physical vibration of the vocal folds and for its perception?
  - There is not an exact correlation between the two. It takes a greater and greater difference in fundamental frequency to effect as large a difference in perception, especially at higher frequencies.
- If we looked at the acoustic signal of the vibrating vocal folds alone, it would in fact be a simple periodic wave, a sinusoid: a simple S-shaped wave, from the baseline up to a maximum, down to the baseline, then down to a minimum and back to baseline.
- Figure 2.1 is not a sinusoid; it is a complex wave, not a simple wave. The shape of this waveform is more complex than a simple up-and-down sinusoid.
  - What has happened to the sound of the vibrating vocal folds in this figure? It has been filtered, or shaped, by the supralaryngeal vocal tract.
  - Figure 2.1 is a vowel sound. We can't tell which particular vowel just by looking at the waveform, but we do know it is a vowel because it is a complex wave. In other words, because people have heads, the source is always filtered by a vocal tract.

Harmonics
- Harmonics are simple multiples of the fundamental frequency, so you get them by simple multiplication. The first harmonic is in fact the fundamental (1 x F0), so we usually don't start talking about harmonics until the second one, since the first harmonic IS the fundamental frequency.
- Once we know the fundamental frequency, we know all the harmonics. Even though harmonics go out to infinity, we usually only speak about a limited set of harmonics. (Why don't we talk about harmonics above a certain frequency?)
- What harmonic would be 5000 Hz for a fundamental of 100 Hz? The 50th harmonic.
- Once you know the fundamental frequency you know the harmonics, and vice versa: there is a direct, simple relationship (multiples, or multiplication) between the harmonics and F0. This is not the case for formant frequencies.

Formants
- Formants are resonance frequencies, the way the vocal tract filters the sound of the vibrating vocal folds.
- There are basically two resonating chambers in speech. Do you remember what they are? One is formed in front of the tongue out to the lips (the oral cavity or chamber), and another is formed in back of the tongue, extending down to the larynx (the pharyngeal cavity or chamber).
- Formants do not bear any relationship to the fundamental frequency: knowing F0 will not tell you anything about formants.
- There is one exception to the rule that we have to know tongue position for formants:
  - For a mid-central vowel like schwa, we can predict formant frequencies based on the speed of sound and the length of the vocal tract.
  - The approximate first formant for an average adult male producing a schwa vowel:
    F1 = speed of sound / (4 x length of vocal tract) = (350 m/s) / (4 x 0.175 m) = 350 / 0.7 = 500 Hz
  - This gives the approximate value for F1, the lowest formant. Then come only ODD multiples, so F2 = 3 x 500 and F3 = 5 x 500; we get F1 = 500, F2 = 1500, and F3 = 2500 Hz.
- Formant frequencies for the vowel i (page 23): i is the vowel with the greatest difference between F1 and F2. You'll remember that this relationship will be very important in learning the relationship between the tongue and the various vowels. Since the tongue is in its highest, most front position, the oral cavity is smallest and the pharyngeal cavity the largest.

Formant Frequencies
- Formant frequencies are the frequencies at which certain harmonics of the fundamental frequency are emphasized. The smaller the cavity, the higher the resonant frequency. A formant frequency may or may not correspond to an exact harmonic of the fundamental frequency.
- Formants account for the differences among particular vowels. Remember: a vowel with a high front position of the tongue will have a high F2 and a low F1, since the oral cavity in front of the tongue is small and resonates at a high frequency, and the pharyngeal cavity in back of the tongue is relatively large and resonates at a low frequency.
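The two relationships above, harmonics as integer multiples of F0 and the schwa formants as odd multiples of a quarter-wavelength resonance, can be checked numerically. This is a minimal sketch of the chapter's own arithmetic; the function names and default parameter values (350 m/s, 17.5 cm tract) are taken from the worked example, and the uniform-tube assumption holds only for a schwa-like neutral vocal tract.

```python
def harmonics(f0_hz, up_to_hz):
    """All harmonics of f0 (integer multiples, starting at 1 x f0 = the
    fundamental itself) up to a frequency ceiling."""
    return [f0_hz * k for k in range(1, int(up_to_hz // f0_hz) + 1)]

def schwa_formants(n, speed_m_s=350.0, tract_m=0.175):
    """First n resonances of a uniform tube closed at one end:
    Fn = (2n - 1) * c / (4 * L), i.e. only ODD multiples of F1."""
    return [(2 * k - 1) * speed_m_s / (4 * tract_m) for k in range(1, n + 1)]
```

Running `schwa_formants(3)` reproduces the 500 / 1500 / 2500 Hz values in the notes, and `harmonics(100, 500)` shows the simple-multiplication structure that formants, by contrast, do not follow.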
formant frequencies based on the speed of sound and the length of the vocal tract 0 The approximate formant frequencies for an average adult male producing a schwa vowel is I Fspeed of sound4 x length of vocal tract 350 meters per second 4 x 175 centimeters 350 4 x 175 or 3507500 Hertz o This gives the approximate for F1 the lowest one Then only ODD multiples so F23 x 500 and F3 5 x 500 so we get Fl500 F21500 and F32500 Hertz Formant frequencies for the vowel i o Page 23 i is the vowel with the greatest difference between F l and F2 You ll remember that this relationship will be very important in learning relationship between the tongue and the various vowels Since the tongue is in the highest most front position the oral cavity is smallest and the pharyngeal cavity the largest Formant Frequencies Formant frequencies are the frequencies at which certain harmonics of the fundamental frequency are emphasized Knowing that the smaller cavity the higher is the resonant frequency A formant frequency may or may not correspond to an exact harmonic of the fundamental frequency Formants account for the differences in particular vowels Remember A vowel with a high front position of the tongue will result in a vowel with a high F2 and a low Fl since the oral cavity in front of the tongue is small and resonates at a high frequency and the pharyngeal cavity in back of the tongue is relatively large and resonates at a low frequency Chapter 7 Distinctive Features and Phonetic Feature Detectors Distinctive Features 0 What was particular about the feature system that J akobson and colleagues were trying to develop 0 It was universal which means it could work for any language They were strictly acoustic in nature and they were binary and minimal So these four characteristics I Universal I Acoustic I Binary I Minimal 0 Minimal here is in the Formal sense of information theory the minimal amount of information that is required to uniquely specify some piece of information o What is the 
formula for describing the number of binary features needed to uniquely specify a particular piece of information?
  - 2 to the nth power (2^n), where n is the number of binary features.
  - So two binary features are certainly all that would be needed to specify the three places of articulation of stop consonants. (MISSED THIS SLIDE!!)
  - 2^2 = 4 contrasts, and we only have three places of articulation in English.
- How many contrasts could three minimal binary features specify?
  - 2^3, or 2 cubed, is 2 x 2 x 2 = eight contrasts.
  - This is how we know that the terms bilabial, alveolar, and velar are in no sense minimal features in the sense of Jakobson, Fant, and Halle.
  - These are descriptive terms for place of articulation, not features for place of articulation, and certainly not minimal acoustic features along the lines of Jakobson, Fant, and Halle.
  - Why is that? If these were minimal binary features in the sense of information theory, three features would account for 2 to the 3rd power = 2 x 2 x 2 = 8 contrasts, far more than three terms are needed for.
- Another way to think of these minimal distinctive features is to think of specifying a number (the number-guessing demonstration he did in class).
  - Four features: three would give us eight distinctions, but four would allow us to uniquely distinguish 2 to the 4th power, or 2 x 2 x 2 x 2.
  - Four features, or binary questions, would allow us to specify 16 different contrasts.
- SPE features, from the Sound Pattern of English, written by Chomsky and Halle and published in 1968:
  - These features are strictly for the English language and not intended to be universal, even though they have been borrowed for other languages. They are a combination of articulatory and acoustic-like features (phonological and articulatory features).
- Then what are the two features used by Jakobson, Fant, and Halle for the three places of articulation in English?
  - Compact (so its opposite, diffuse) and acute (or its opposite, grave). Since these are polar opposites, we can just pick one of each pair, so I suggest we use compact and acute.
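The 2^n counting argument above can be checked with a short script. This is only an illustration of the arithmetic from the notes; the function names are mine, not anything from the textbook:

```python
import math

def contrasts(n_features: int) -> int:
    """Number of distinct categories that n binary features can specify."""
    return 2 ** n_features

def features_needed(n_contrasts: int) -> int:
    """Fewest binary features that can uniquely specify n_contrasts categories."""
    return math.ceil(math.log2(n_contrasts))

# Two binary features cover the three English stop places of articulation:
# 2^2 = 4 possible combinations, and only 3 are needed.
print(contrasts(2), features_needed(3))    # 4 2

# Three features give 8 contrasts; four give 16 (the number-guessing game).
print(contrasts(3), contrasts(4))          # 8 16
```

The same logic explains the summary's claim that the 40-some phonemes of English reduce to a handful of binary contrasts: each added feature doubles the number of phonemes that can be distinguished.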
- Bilabials would then be [-compact, -acute]; alveolars are [-compact, +acute]; velars are [+compact] (we don't have to specify whether they are + or - acute, since they are the only place of articulation in English that is compact).
- Acute corresponds to a rising spectral pattern; grave to a falling one.
- I don't know yet exactly what it is in the acoustic signal that is compact or acute, or what those terms would look like acoustically. But in the 1950s Jakobson and colleagues really could not specify exactly how these acoustic features appeared in the acoustic signal either. Remember, all they had available for analyzing speech at that time was the sound spectrograph, and there weren't a whole lot of them around in the 1950s either.
- Now why is it that we would want an analysis of speech into features in the 1st place?
  - We could speed up the time it would take for template matching if we had a PRE-ANALYSIS in terms of features. If we classified the acoustic signal into features, then we could speed up the time needed to find a match.
- What is some evidence that human listeners hear or perceive in terms of features?
  - Miller and Nicely's experiments with speech presented in noise.
- What did these researchers find?
  - That listeners' errors are far from random but rather are organized in terms of features. For example, errors were usually very close to the correct sound.
- What was another piece of evidence?
  - Speakers' errors are also not random but highly organized. Most errors are closely related to their correct targets and oftentimes are the result of a change in a single feature.

DISTINCTIVE FEATURES
- Articulatory-phonetic features describe speech sounds in terms of the articulatory gestures required to produce them.
  - For example, /p/ and /b/ have a bilabial place of articulation because they are produced with the two lips.
  - /t/ and /d/ are produced at the alveolar ridge.
  - /k/ and /g/ are velar, since the tongue back makes contact with the velum.
  - A more precise description would also include the activity of the tongue, so alveolars would be
apicoalveolar and velar sounds would be classified as dorsovelar.
- Distinctive features are the absolute minimal contrasts between phonemes in a language.
- Roman Jakobson attempted to define a set of UNIVERSAL distinctive features and the underlying acoustic property that accounted for each particular distinction.
  - The result of this work, in collaboration with colleagues Gunnar Fant and Morris Halle, was a monograph first published in the early 1950s entitled Preliminaries to Speech Analysis: The Distinctive Features.
  - The features proposed in this work were intended to be universal, which means applicable, with minor adjustments, to every language.
  - They were strictly acoustic in nature and NOT based on articulation.
  - They were also binary (distinctive) and minimal, representing the relevant contrasts between phonemes with the fewest possible number of features.
- Human speech errors are highly organized; this is shown in Miller and Nicely's study.
  - Study: listeners were presented with CV syllables in the presence of noise. The pattern of results was organized by features. Voicing errors were common ("pig" for "big").
  - Another source of evidence that human speech is organized in terms of features comes from speech errors, or so-called slips of the tongue.
    - These errors can often be explained in terms of a change in a single feature; they are not the result of some random substitution of one speech sound for another.

Phonetic Feature Detectors
- Phonetic feature detectors are a theoretical construct employed to help explain speech perception.
- No one has yet isolated phonetic feature detectors in the human brain, despite the fact that some even call them "neural feature detectors."
- The main source of evidence for phonetic feature detectors comes from the selective adaptation procedure.
  - In selective adaptation, a certain acoustic property detector is fatigued through repeated presentation of auditory stimuli with that acoustic property present.
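The fatigue idea behind selective adaptation can be sketched with a toy model. To be clear, everything numeric here is invented for intuition only: the 25 ms VOT category boundary and the fixed shift per adaptor repetition are hypothetical values, not measurements from Eimas and Corbit's study.

```python
# Toy model of selective adaptation for the voiced/voiceless (VOT) contrast.
# Assumption (hypothetical): a stop is heard as "voiced" when its voice onset
# time falls below a category boundary, and each presentation of a voiced
# adaptor fatigues the voiced detector, pulling the boundary to shorter VOTs.

def label(vot_ms: float, boundary_ms: float) -> str:
    """Classify a stimulus as voiced or voiceless relative to the boundary."""
    return "voiced" if vot_ms < boundary_ms else "voiceless"

def adapt(boundary_ms: float, n_repetitions: int, shift_per_rep: float = 0.5) -> float:
    """Each repetition of the voiced adaptor nudges the boundary downward."""
    return boundary_ms - n_repetitions * shift_per_rep

baseline = 25.0                              # hypothetical pre-adaptation boundary (ms)
adapted = adapt(baseline, n_repetitions=10)  # boundary after hearing /ba/ 10 times

stimuli = [10, 15, 20, 22, 30, 40]           # ambiguous and clear VOTs (ms)
before = [label(v, baseline) for v in stimuli]
after = [label(v, adapted) for v in stimuli]

# After adapting to a voiced sound, more stimuli are heard as voiceless,
# matching the direction of the effect Eimas and Corbit reported.
print(before.count("voiceless"), after.count("voiceless"))   # 2 4
```

The point of the sketch is only the direction of the shift: fatiguing the voiced detector moves ambiguous stimuli into the voiceless category.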
- Phonetic feature detectors may help explain the extreme rapidity with which speech is perceived.

Summary
- Although a variety of feature systems are available for describing speech, minimal distinctive acoustic features have a very restricted definition.
- The goal of such features is to break speech down into its minimal essential acoustic distinctions.
- Using such features, the 40-some phonemes of American English could be broken down into approximately 7 binary contrasts.
- Acoustic features also imply that the information required to distinguish the sounds of speech is directly available in the acoustic signal of speech.
- The concept of phonetic feature detectors is an outgrowth of feature theory.
- Phonetic feature detectors imply that there is a specific neural substrate devoted to detecting the presence or absence of each particular phonetic contrast.
- Two notions help us get a clearer picture of how speech perception might occur: speech can be broken down into minimal essential sound contrasts, and there are neural detectors that react specifically to the presence or absence of these same acoustic properties.
- In Eimas and Corbit's investigation of the voicing contrast, they found that after repeated presentation of a voiced sound, listeners heard more stimuli as voiceless.

Chapter 4: Perception of Consonants - The Problem of Acoustic Invariance

Considering the perception of consonants:
- Do not confuse the PROBLEM of acoustic invariance with the THEORY of acoustic invariance.
- The formant transitions, the portion of the speech signal that we know is responsible for stop consonant perception, are not constant or invariant across different vowel contexts. In other words, there is nothing obviously constant in the speech signal which could explain the constant perception of the place of articulation of the stop consonants (in this case /d/).
- Figure 4-3 is a schematic illustration of the formant frequencies. It is a simplification: it shows only the first two formant
frequencies. Natural speech has more formants, but recognizable speech can be synthesized on the basis of two formants alone, so we are using a simplification. The second thing is that the bursts are not included in this schematic.
- Is the schematic based on a wide-band or narrow-band spectrogram? Wide band.
- How many different syllables? 7 different syllables: /d/, the alveolar voiced stop consonant, in front of 7 different vowels.
- The problem of acoustic invariance: the fact that the second formant frequency transitions are so different in front of different vowels.
- Two solutions were proposed: one is known as the motor theory of speech perception, and the other is known as the theory of acoustic invariance.
  - The second formant frequency transition starts out pointing down in front of /i/ and ends up pointing sharply upwards by the time we get to /u/.
- The locus theory is the first proposed solution to the problem of acoustic invariance.
- What is the problem with the locus theory?
  - Pg. 41: Even though there was nothing constant in the formant transition patterns when the same consonant was produced in front of different vowel sounds, the second formant frequency transitions all seem to be pointing toward the same frequency. If we were to extend the formant transitions back in time, there would be a point to the left where they would eventually cross each other.
  - The problem is actually worse: locus theory isn't really the best solution. We need to cut back the F2 transition.
- Consonant perception is more complex than vowel perception.
  - Consonants depend on vowels for their perception.
  - Stop consonant sounds cannot be isolated from vowels without destroying their perception.
  - The transition is the portion of the acoustic signal that delivers the perception of a consonant.
  - If the vowel sound is cut out of the audio, the consonant sound will disappear.
    - The sound that remains no longer sounds like speech; it sounds like a chirp-like noise.
- Formant transitions change quickly
and then smooth out and become stable.
- These formant "tails" are known as formant transitions.
- It was long thought that consonant perception is delivered by these rapidly changing formant transitions.
- The formant transition patterns look very different when the /d/ consonant occurs in front of various vowels. The lack of something constant in the spectrographic representation that could signal the consonant is known as the lack of acoustic invariance.
  - There is nothing acoustically invariant, or constant, between this consonant produced in different vowel contexts.
- A solution that researchers seem to favor is the motor theory of speech perception. Here they argue that since the motor commands sent to the articulators to produce a /d/ must be the same even in different vowel contexts, these motor commands might be used for speech perception.
- Speech can be understood at rates of about 30 phonemes per second.
- Coarticulation in speech production refers to the fact that the articulation for one phoneme impinges on the articulation of another.
  - Basically, the articulators are usually already beginning to move toward the position necessary for the next phoneme even before they have completed their movement for the present phoneme.
  - The fact that speech units are squeezed together, or encoded, has led some researchers to consider that perception may depend on units other than phonemes, such as syllables.

September 17, 2014

Chapter 6: Motor Theory of Speech Perception
- Remember, formant frequencies work well to explain vowel perception.
- Then we came across a problem, which was acoustic invariance.
- Acoustic invariance:
  - There is nothing constant, or invariant ("not varying"), in the formant frequency transitions (especially F2) that could be used to explain consonant perception in front of the different vowel contexts.
- Then we considered the locus theory.
  - An abstract theory at best.
  - The 1st solution to the problem of acoustic invariance.
  - The 2nd solution will be considered later today.
- Motor
Theory
- It is called motor theory because it is based on the motor commands from the brain to the articulators.
  - In this theory the listener somehow makes use of the motor commands that would have been sent to the articulators to produce a particular phoneme.
  - It is pointed out that even if motor theory were only partially correct, it would have large implications for SLP.
- Why do you think that is so?
  - It would suggest that a motor speech problem would have consequences for perception. It might suggest that production and perception are two facets of the same coin, so that improvement in production would generalize to perception, and perhaps even vice versa.
- What is the 1st problem with motor theory discussed in the next section?
  - It has never been made explicit how listeners are able to extract motor commands from the acoustic signal.
- What was one of the ways that motor theory was tested?
  - EMG studies. Researchers wanted to see whether the motor commands were invariant for a particular phoneme.
- What did these studies reveal?
  - There wasn't a correlation; the motor commands seemed to be very variable.
- Does evidence from aphasia seem to support or go against motor theory?
  - The fact that Broca's aphasics, who have a motor speech problem, still have intact comprehension seems to be an argument against motor theory. When Broca's aphasics try to correct themselves, they get closer to their target.
  - Wernicke's aphasics have the poorer comprehension, yet they don't have a problem with the motor output of speech. When Wernicke's aphasics try to correct themselves, they seem to get further away from the target.
- What about evidence supporting motor theory? What was some of the evidence the book considered in support of motor theory?
  - Foreign-language listeners who move their lips, mimicking the talker, when they are trying to understand.
  - The phonetician's technique of attempting to produce a phoneme so that it can be transcribed more reliably.
  - The fact that many people with hearing impairment can read speech from
observing facial gestures.
  - The McGurk effect.
- What is the McGurk effect?
  - If you combine a visual image with a different auditory signal, perceivers make an agglomeration of the two.
  - So if you have a visual image of someone saying "ga" while the sound track is playing "ba", listeners perceive "da", which is intermediary between the two.
  - The sound track has to be perfectly coordinated with the visual track.
    - Example in lab.
- How was motor theory revised in the second version?
  - The concept of specialized modules in the brain was added.
  - The concept, or experimental paradigm, of duplex perception was also added.
- What is duplex perception?
  - If one ear is presented with F1 information and the other ear is presented with F2 information, the two are combined, but the listener hears the signal as speech and as non-speech at the same time.
  - This shows that the listener can simultaneously hear in both a speech and a non-speech mode. This suggests there may be a specialized listening mode which is triggered by speech and which perhaps sets into play a specialized process of detecting MOTOR COMMANDS.
- To summarize up to this point: no final consensus has been reached in regard to motor theory. It is still around and fairly popular today.
  - Part of the reason for this is perhaps that there weren't much better explanations around at the time that many people in the field received their training. Most SLPs out there were probably taught that it is formant transitions which are the acoustic correlates that account for the perception of place of articulation.
- A clinical concept in speech-language therapy that contradicts motor theory and is unproven:
  - That perception precedes production.
    - In therapy you have to make sure that the client can perceive a particular sound distinction before you can work on improving the client's production of that phoneme.
- What is the other theory discussed here, related to motor theory?
  - This is the theory of analysis-by-synthesis. What it shares with
motor theory is that the listener would still be attempting to get access to motor commands. The way the listener would go about it here is to synthesize the speech that he or she is listening to.
- How is analysis-by-synthesis different from motor theory?
  - The difference is the concept of template matching, which comes from analysis-by-synthesis theory.
  - Template matching would be the means by which the listener compares the internally synthesized speech to the incoming sound signal. This concept has proven very useful in explaining speech perception, and it relates directly to the theory of acoustic invariance.
  - The basic idea comes from the study of perception in psychology and the concept of pattern recognition. The use of stencils is an analogy to explain this concept.
    - Even though their purpose is to produce various letter patterns, we can also use the stencils "backwards," so to speak, to recognize which letters have been produced.
    - The same analogy can be used for spectral patterns as well. We can use a spectral pattern in reverse to discover what counts as an instance of a particular pattern.
    - The template can be very specific or very general. The wider and more general the template, the easier it will be to find matches to that particular template.
- What was the drawback to a template-matching procedure that was mentioned in this chapter?
  - We could never be sure that we had the best match until we went through the entire set of templates.
- What is the solution proposed to alleviate this problem?
  - A pre-analysis.
    - In the case of speech, the signal could be broken down into some of its features. For example, with letters we could divide the set of letters into those made with round strokes and those made without round portions.
- There is a motor-theory-influenced view of speech called action theory, which basically conceives of speech production as motor gestalts that can be used for perception.
- Mirror neurons in Broca's area:
  - There are mirror neurons which fire during the observation of a motor task as well
as during the actual performance of a motor task. It has been proposed that this is why autistic children have difficulty empathizing with other people: their mirror neurons may not be functioning normally.

Chapter 8: The Theory of Acoustic Invariance
- What were the two earlier proposals?
  - Locus theory and motor theory.
- Be sure to remember that the problem of acoustic invariance is essentially a problem of explaining place-of-articulation perception for stop consonants.
- So why is it that the theory of acoustic invariance can be considered a step "back in time" in one respect?
  - Since the theory of acoustic invariance seeks to solve the problem by looking for the solution in the acoustic signal itself, it is a throwback to the past. The acoustic signal was what Liberman had first considered with locus theory.
- Remember also that the theory of acoustic invariance builds upon the universal, binary, minimal acoustic features proposed by Jakobson and colleagues at MIT. It specifies what these features would look like in the acoustic signal. (See diagram from class: as soon as it's compact, it's velar.)
- This theory is somewhat similar to Jerome Bruner's theories of visual perception.
- Stevens and Blumstein's theory makes specific what the acoustic features proposed by Jakobson and colleagues would look like in the acoustic signal. They make use of the concept of "template matching" from the theory of analysis-by-synthesis.
- Analysis-by-synthesis was first proposed by Stevens and Halle. Stevens is one of the names you should associate with acoustic invariance; he is a professor at MIT. Halle is the third name in the group who developed the minimal acoustic features (Jakobson, Fant, and Halle). Gunnar Fant is an acoustician with a physics and engineering background from Sweden. Roman Jakobson was from Russia, so you see we have a very international group.
- Acoustic invariance is very much an MIT-based theory, that is, from the Massachusetts Institute of Technology, whereas both locus theory and motor theory are associated with Haskins Laboratories, now in
New Haven, Connecticut. Haskins Laboratories was in New York City when it was first established but moved to New Haven more than thirty years ago. Several universities are associated with Haskins Laboratories, but the main ones would be the University of Connecticut and the CUNY Graduate Center, and also Yale University to a lesser degree. Haskins is more directly connected with the CUNY Graduate Center (City University of New York) because it used to be based in New York City. But this is a bit of a digression.
- One of the important advantages to remember about the theory of acoustic invariance is that it does not propose "new machinery" to solve the problem. In other words, it is a PARSIMONIOUS theory because it makes use of "machinery," or mechanisms, already required to explain vowel perception.
- "Parsimonious" is a good GRE word; does anyone know what it means?
  - In this case, making use of what is already available: the LPC spectrum or spectrographic type of analysis already required to explain vowel perception. Remember that we said that since the cochlea can perform a form of Fourier analysis, this type of processing would not even have to take place in the brain.
- OK, so acoustic invariance doesn't need any new type of processing of the acoustic signal, which is certainly not the case for motor theory. For motor theory we need access to motor commands to solve acoustic invariance for stop consonants, yet we don't need these motor commands to explain vowel perception. In this respect motor theory is NOT parsimonious at all: it needs wholly different machinery to solve place-of-articulation perception than it needs to explain vowel perception.
- Is acoustic invariance, as proposed by Stevens and Blumstein, a static or a dynamic theory? What do I mean by this? Does it make use of information from one particular place in the acoustic signal, or does it integrate information over some portion of time?
  - It is more STATIC, because it makes use
of specific information at a particular discrete location in the acoustic signal: at the burst and several tens of milliseconds afterwards. This is a short period of time, and it does not track changing information over the length of a formant transition, for example.
- Notice, with "the release of the consonant, or the burst, and up to some 20 ms afterwards" (bottom of page 78): is this going to include formant transitions?
  - No, not at all for voiceless stops, where voicing and the formant transitions have not yet begun.
- What about for voiced stops?
  - Well, in voiced stops this can include some of the vowel formant transitions.
- What does the book say about formant transitions in the theory of acoustic invariance?
  - Page 79, 2nd sentence.
- Why is it important to consider the acoustic information provided in the burst? In other words, what is the advantage of the spectral information from this particular point in time?
  - At the very beginning, at the burst, the vocal tract is just opening from the stable position that is most characteristic of the place of articulation (i.e., bilabial, alveolar, or velar). In other words, the articulators are closest to the place of articulation; they have not yet moved very far from the place-of-articulation position.
  - This is really similar to one idea from motor theory: that at this point in time the vocal tract is in a position closest to the place of articulation. But the theory of acoustic invariance adds the idea that the acoustic signal is also more invariant at this point in time. The vocal tract has not had time to move very far, so we are not yet into the rapidly changing formant transitions. So acoustic invariance uses the burst because the burst is the most invariant part of the signal, since the vocal tract is not yet fully into the formant transitions.
- So is Blumstein and Stevens's theory of acoustic invariance a STATIC or a DYNAMIC theory? In other words, do they propose to make use of acoustic invariance
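The template-matching idea that runs from analysis-by-synthesis into the theory of acoustic invariance can be sketched in a few lines of code. This is a toy illustration, not Stevens and Blumstein's actual procedure: the "spectra" are made-up lists of numbers standing in for burst-onset spectral shapes, and similarity is just a correlation score against each stored template.

```python
# Toy template matching over short-term "spectra" (hypothetical numbers).
# Each template is a coarse spectral shape for one place of articulation;
# classification picks the template that best correlates with the input.

def correlate(a: list[float], b: list[float]) -> float:
    """Cosine similarity (normalized dot product) between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Invented burst-onset templates along Jakobsonian lines:
# grave = falling, acute = rising, compact = mid-frequency peak.
templates = {
    "labial (diffuse-falling)":  [5, 4, 3, 2, 1, 1],
    "alveolar (diffuse-rising)": [1, 1, 2, 3, 4, 5],
    "velar (compact)":           [1, 2, 5, 5, 2, 1],
}

def classify(spectrum: list[float]) -> str:
    """Exhaustive search: we cannot be sure of the best match until every
    template has been tried, the very drawback noted in the chapter."""
    return max(templates, key=lambda name: correlate(spectrum, templates[name]))

# A noisy rising spectrum still matches the rising (acute) template.
print(classify([1, 2, 2, 3, 5, 4]))   # alveolar (diffuse-rising)
```

Because the comparison is made against a single stored spectral shape per category, the sketch is "static" in exactly the sense discussed above: it looks at one slice of the signal (the burst region) rather than integrating change over a formant transition.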