South Asian Sexual Cultures

by: Patrick Langworth

This 9 page Class Notes was uploaded by Patrick Langworth on Friday October 23, 2015. The Class Notes belongs to 113 127 at University of Iowa taught by Staff in Fall. Since its upload, it has received 21 views. For similar materials see /class/228064/113-127-university-of-iowa in anthropology, evolution, sphr at University of Iowa.

Date Created: 10/23/15
Filters and Answers: The University of Iowa TREC-9 Results

Elena Catona (1), David Eichmann (2, 3) and Padmini Srinivasan (1, 2)

(1) Department of Management Sciences
(2) School of Library and Information Science
(3) Computer Science Department
The University of Iowa
{elenacatona, davideichmann, padminisrinivasan}@uiowa.edu

The University of Iowa participated in the adaptive filtering and question answering tracks of TREC-9. The filtering system used was an extension of the one used in TREC-7 [1] and TREC-8 [2]. Question answering was done using a rule-based system that employed a combination of public domain technologies and the SMART retrieval system.

1 Adaptive Filtering

Our approach to filtering involves a two-level dynamic clustering technique. Each filtering topic is used to create a primary cluster that forms a general profile for the topic. Documents that are attracted into a primary cluster participate in a topic-specific second-level clustering process, yielding what we refer to as secondary clusters. These secondary clusters, depending upon their status, are responsible for declaring (i.e., retrieving) documents for the topic.

As documents are temporally processed, they are attracted to a primary cluster if their similarity with the cluster vector is above a primary threshold. These documents enter the secondary clustering stage where, again based on similarity to cluster vectors and a secondary threshold, they either join an existing secondary cluster or start a new one. If at some point the similarity between a secondary cluster and the primary cluster exceeds a third (declaration) threshold, then the document most recently added to the secondary cluster is retrieved for the user.

When deriving representations we use TF-IDF weights after stemming the terms using Porter's stemmer. We also limit document vectors and cluster vectors to the best 100 and 200 stems respectively.

In TREC-8, adaptation was explored at several different levels [2]. First, a secondary cluster's future behavior would depend upon past performance. If a
secondary cluster declares a document that turns out to be relevant, then it is colored green. This means that it declares all documents that join it in the future. If instead the declared document is non-relevant, then the cluster is colored red and all future documents are not declared. A non-relevant document that joins a green cluster spawns an independent red cluster, allowing the original cluster to remain green.

Another adaptive dimension was to have the primary cluster vector adapt as relevance judgements were obtained. A version of Rocchio's feedback method is built into the system for this purpose. A differential adaptation scheme is also built in for this purpose. The key distinction is that in the differential scheme, positive and negative term vectors consist only of terms not found in the other vector or in the original query vector.

Recent experiments conducted with TREC-8 data explored additional dimensions of adaptation. For example, we experimented with adapting the primary threshold as the performance measure varied. For this, the performance measure, such as the utility score, was computed at regular intervals when a snapshot of the system was taken. We also explored adaptation of secondary and declaration thresholds. In all these, the most profitable approach appears to be adaptation of the break threshold using a step function that responds to changes in performance across snapshots. Our OHSU runs use the system as described above, with adaptation of the break threshold.

Other key extensions to the system for TREC-9 include the ability to specify the type of index vectors to utilize. A phrase recognizer loads dictionaries of phrases derived from sources such as the WordNet thesaurus, and matched phrases are included into the document vectors. More recently, a rule-based entity recognizer has been developed that allows the indexing of documents by person names, organizations, locations and events. Our MESH run
includes this technology as well as special support for medical terminology. The MeSH hierarchy, an associated lexicon of synonyms and a supplementary list of concepts such as drug names were used. The MESH run involved index vectors that were populated using only the entities extracted from the source text.

OHSU Runs

For TREC-9 we submitted two OHSU runs. These runs employed word-based indexing, Rocchio feedback for the profile adaptation, and adaptation of the declaration threshold. Both OHSU runs used the controlled vocabulary field (MeSH terms). The two runs differ only in their starting threshold values. The primary, secondary and declaration thresholds were 0.3, 0.32 and 0.3 respectively for OHSU1, and 0.25, 0.27 and 0.25 respectively for OHSU2. The declaration threshold was adapted in each case using a stepwise strategy.

Figure 1 shows the performance in terms of utility for our OHSU1 run. The dashed bars represent median performance across systems for each topic. There are 24 topics for which OHSU1 was better than the median and another 24 for which it was below the median.

We conducted several experiments after the official submission deadline to better understand the different aspects of our filtering system and its weak performance on the OHSU task. The first question asked was whether the primary filter was effective. In other words, how good was it at filtering out non-relevant documents while allowing through the relevant documents? Figure 2 shows the percentages filtered through over time, with snapshots taken every 1000 documents. The figure shows that if we divide the snapshots into three groups, then the primary filter allows about 50%, 60% and then 59% of the relevant documents that arrive over the first, second and third sequence of snapshots respectively. At the same time, the percentage of non-relevant documents allowed through stays less than 1% of the number seen.

We then examined the effectiveness of the secondary filter. Note that this analysis of the secondary filter was limited to
those documents allowed through by the primary filter. Figure 3 shows that the secondary filter was successful in reducing the percentage of non-relevant documents allowed through (dashed bars). However, at the same time it also restricts the passage of relevant documents, although not as severely.

Next we took a different tack in our analysis and examined the effectiveness in adapting the declaration threshold. Figure 4 displays these results. We can observe that if we eliminate break threshold adaptation, performance degrades significantly over time (dashed bars). In contrast, the adaptive mode is able to stay somewhat steady, although on the negative side of the performance axis. At this point we suspected that our break threshold may not be restrictive enough. Figure 5 shows the effect of testing this by contrasting a run where the break threshold was increased from the original 0.25 (dashed bars) to 0.3 (solid bars).

[Figure 1: Performance of OHSU1, by topic. Dashed bar: OHSU1. Solid bar: median performance.]

[Figure 2: Assessment of Primary Filter. Percentage filtered through (averaged across queries), by snapshot.]

[Figure 3: Assessment of Secondary Filter. Relevant and non-relevant documents filtered through, by snapshot.]

[Figure 4: Assessment of Declaration Threshold, by snapshot. Solid bar: adaptive. Dashed bar: non-adaptive.]

[Figure 5: Assessment of Higher Break Threshold. Dashed bar: 0.25. Solid bar: 0.3.]

MeSH Run

Figure 6 shows the performance in terms of utility score for our MESH run. As mentioned before, for this task we employed a rule-based entity recognizer which uses the MeSH hierarchy, an associated lexicon of synonyms and a supplementary list of concepts such as drug names. This run involved index vectors that were populated using only the entities extracted from the source text.

[Figure 6: Performance of MESH Run, by topic. Solid line: median performance. Dashed line: MESH run.]

Entity-based performance on the MeSH subset proved to be quite intriguing. In 92 of the topics our system yielded the highest score, in some cases substantially higher than median performance. At the same time, in 147 of the topics our system yielded the lowest score, again in some cases substantially lower than median performance. We conjecture that the pure entity scoring yields high quality results, but for some topics our secondary cluster scheme is generating too many high-relevance clusters that prove to be off-topic. This may be due in part to the score being generated by ancestor-descendant MeSH term tree matches.

Figure 7 presents performance (utility score) as a function of the number of relevant documents present for a given topic. The figure shows 500 data points, one for each topic. One may observe a general trend that as the availability of relevant documents improves, performance increases. Moreover, most of the scores are on the positive side of the Y axis.

[Figure 7: Performance versus Number of Relevant Documents for Topic.]

[Figure 8: Maximum Depth of Topic's MeSH Phrases versus Performance.]

Figure 8 explores a different aspect. Our process extracts MeSH descriptors for each topic description from the MeSH hierarchy. In the figure we plot the maximum depth expressed by the group of extracted MeSH phrases for a topic, and plot this against utility score. There are 50 data points corresponding to the first 50 MeSH topics. One may observe that except for a few outliers there is a
slight trend for scores to improve with the ability to identify deeper (i.e., more specific) MeSH phrases. Interestingly, the same sort of analysis using minimal MeSH depth for the topic, as shown in Figure 9, does not yield a recognizable trend.

[Figure 9: Minimum MeSH Depth for Topic versus Score.]

[Figure 10: Number of Entities Recognized from Topic versus Performance.]

We also explored the effect of entity recognition on performance. Figure 10 represents the number of entities recognized on the Y axis and performance on the X axis. The graph shows that, barring a few exceptions, there appears to be a slight trend for performance to improve as the number of entities recognized increases.

In summary, the switch in domain from the newswire domain to MEDLINE proved to be challenging. The thresholds used in our submitted run were essentially our best guesses. For the future, we also plan to explore different term weighting strategies as well as query expansion strategies prior to starting the filtering run.

2 Question Answering

We submitted two runs for this track: UIQA001 and UIQA002. Both utilized only the top 50 documents that were retrieved and distributed by Singhal. UIQA001 gave the better performance score, with mean reciprocal rank of 0.227 (strict) and 0.245 (lenient). The other run gave almost identical scores. Our overall QA approach is outlined below.

Document processing:
1. Extract only the textual parts of each document.
2. Apply a sentence detection program to identify distinct sentences. We use the publicly available mxterminator program for this.
3. Apply part-of-speech tagging on each sentence.
4. Apply our rule-based entity tagger on each sentence.
5. Create a database of sentences formatted for the SMART retrieval system. Here each record corresponds to a single sentence
with 3 different fields. The first holds the original untagged sentence, the second field holds the tagged sentence, while the last field of the record holds only the particular entities extracted from each sentence.
6. Retrieve the top N sentences for each query. Maintaining the three distinct fields for each sentence record allows us to explore the relative merits of using different types of information for retrieving the sentence most likely to contain the answer. Using SMART allows us to explore different weighting schemes during retrieval.
7. Post-process each of the N sentences to extract the top five 250-byte segments.

Query processing:
1. Apply part-of-speech tagging on each query.
2. Apply our rule-based entity tagger on each query. Notice that in the case of the query, where possible, its focus (a specific entity type) is identified in addition to all the entities contained in the query. This focus is utilized during the post-processing step in 7 above.

The two runs differ very slightly in the post-processing stage. Generally this step includes cleanup of the sentences to remove any non-informative strings, reduction of each sentence to a 250-byte string around the query focus (if known), removal of duplicate answer strings, and selection of the top 5 phrases. The difference between the two is in the extent to which cleanup of the sentences was done. As our results show, this did not influence performance in any way, since the two runs yield almost identical results.

Error analysis indicates much room for improvement. Due to insufficient time we were able to implement only very simplistic 250-byte segment selection strategies, which proved to be a significant problem for our system. Secondly, our performance was limited by the availability of the answer within the top 50 document sets distributed. Again, with less time pressure we should be able to explore the 1K datasets and also conduct our own retrieval runs for the top 1K or so documents. The results indicate that our approach managed to extract the answers
for about 38% to 40% of the questions.

References

[1] Eichmann, D., M. E. Ruiz, and P. Srinivasan. Cluster-Based Filtering for Adaptive and Batch Tasks. Seventh Conference on Text Retrieval, NIST, Washington, D.C., November 11-13, 1998.
[2] Eichmann, D. and P. Srinivasan. Filters, Webs and Answers: The University of Iowa TREC-8 Results. Eighth Conference on Text Retrieval, NIST, Washington, D.C., November 1999.
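
As an illustration of the two-level clustering filter described in Section 1, the following is a minimal Python sketch, not the system actually used in the runs. Several things here are assumptions made only for illustration: the class and method names (TopicFilter, SecondaryCluster, process, feedback) are hypothetical, cosine similarity stands in for the unspecified similarity measure, a cluster vector is taken to be the sum of its members' vectors, and the truncation of vectors to the best 100 or 200 stems is omitted. The default thresholds are the OHSU1 starting values (0.3, 0.32, 0.3); threshold adaptation and Rocchio feedback are left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SecondaryCluster:
    """Hypothetical secondary cluster: summed member vectors plus a color."""

    def __init__(self, vec):
        self.vec = Counter(vec)  # assumption: cluster vector = sum of members
        self.color = None        # None (undecided), "green" or "red"


class TopicFilter:
    """Hypothetical per-topic filter with one primary cluster (the profile)."""

    def __init__(self, profile, primary=0.3, secondary=0.32, declaration=0.3):
        self.profile = dict(profile)
        self.primary = primary
        self.secondary = secondary
        self.declaration = declaration
        self.clusters = []

    def process(self, doc):
        """Return (declared, cluster); cluster is None if the primary filter rejects doc."""
        if cosine(doc, self.profile) <= self.primary:
            return False, None
        # Join the most similar secondary cluster, or start a new one.
        best = max(self.clusters, key=lambda c: cosine(doc, c.vec), default=None)
        if best is None or cosine(doc, best.vec) <= self.secondary:
            best = SecondaryCluster(doc)
            self.clusters.append(best)
        else:
            best.vec.update(doc)
        if best.color == "green":   # green clusters declare every new member
            return True, best
        if best.color == "red":     # red clusters never declare
            return False, best
        # Undecided cluster: declare if it is close enough to the profile.
        return cosine(best.vec, self.profile) > self.declaration, best

    def feedback(self, cluster, doc, relevant):
        """Color a cluster once the judgement for a declared document is known."""
        if relevant:
            cluster.color = "green"
        elif cluster.color == "green":
            # A non-relevant document in a green cluster spawns its own red
            # cluster; the green one stays green. Simplification: the document
            # is not removed from the green cluster's vector.
            red = SecondaryCluster(doc)
            red.color = "red"
            self.clusters.append(red)
        else:
            cluster.color = "red"
```

In use, each arriving document vector is passed to process(), and feedback() is called with the relevance judgement whenever a document was declared. A document that fails the primary threshold never reaches the secondary stage, which mirrors the separate primary and secondary filter assessments in Figures 2 and 3.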

