TOP: INTRO TO MULTIMEDIA NTWRK (CS 510)
This 12-page set of class notes was uploaded by Orrin Rutherford on Tuesday, September 1, 2015. The notes belong to CS 510 at Portland State University, taught by David Maier in Fall. Since its upload, it has received 8 views. For similar materials see /class/168260/cs-510-portland-state-university in Computer Science at Portland State University.
Evaluation and Relevance 2
Lecture 9, CS 410/510: Information Retrieval on the Internet
CS 510, Winter 2007. © 2007 Susan Price and David Maier

Limitations of test collections
- Scale
  - Early ones too small
  - Large ones too expensive
- Pooled relevance judgments are incomplete
- Doesn't reflect the way users work
  - Interactivity
  - Query formulation
- Dependence on relevance judgments

Assumptions of the Cranfield paradigm
- All relevant documents are known
  - Violated in large test collections
- A single set of judgments for a topic is representative of the user population
- Relevance can be approximated by topical similarity, which implies:
  - All relevant docs are equally desirable
  - Relevance of one doc is independent of other docs
  - The user's information need is static
Based on Voorhees, E.M., "The Philosophy of Information Retrieval Evaluation," in C. Peters et al. (Eds.): CLEF 2001, LNCS 2406, pp. 355-370, 2002.

Incomplete relevance judgments
- Use sufficiently deep pools (100 documents in TREC)
- Use a metric that is more robust to incomplete relevance judgments (bpref)
- Use selection instead of pooling
  - Select the document most likely to discriminate between the systems being compared, based on its effect on AP
  - Stop judging when the desired confidence level is reached

Inconsistency of relevance judgments
- Sormunen (2002) reassessed 38 TREC topics to assign graded relevance criteria
  - Of documents judged irrelevant by TREC assessors, 94% were judged irrelevant again
  - Of documents judged relevant by TREC assessors, 25% were rated irrelevant and 36% were rated marginally relevant

Inconsistency of relevance judgments: TREC assessor agreement
- Overlap (the intersection of the relevant document sets from each assessor) < 50%
- Testing permutations based on the relevance judgments of 3 assessors:
  - Values of the metrics changed
  - System rankings remained highly correlated
- Assessor disagreement is unlikely to alter the results of system performance comparisons

What is relevance?
- A measure
  of the effectiveness of a contact between a source and a destination in a communication process (Saracevic, JASIS, 1975)
- Aboutness: document d is about topic T
  - An intellectual assessment of whether d is about T
- Pertinence: as perceived by a user
  - The user's interpretation of the information need and of the document
- Situational relevance: usefulness in a particular situation or context
  - Related to a particular task
  - Related to the user's existing knowledge

What is relevance? Multidimensional
- Relevance assessment differs among users
  - Perspective / aspects of interest
  - Level of knowledge
- Dynamic: relevance describes the relationship between information and a need at a particular time
- Relevance changes over time for the same user
  - As the user accumulates additional information
  - As the context changes

Is relevance binary?
- Some documents provide more information than others
  - Cover more aspects of an information need
  - Cover the topic in more depth
  - Cover the topic from a more desirable perspective for a particular user
- Hence graded relevance judgments

Graded relevance judgments: example
- Irrelevant document: contains no information about the topic
- Marginally relevant document: only points to the topic; does not contain more or other information than the topic statement
- Fairly relevant document: contains more information than the topic statement but is not exhaustive; if the topic is multifaceted, covers only some subthemes
- Highly relevant document: discusses the topic exhaustively; if the topic is multifaceted, covers most subthemes
Paraphrased from Järvelin and Kekäläinen, "Cumulated gain-based evaluation of IR techniques," ACM TOIS, Vol. 20, pp. 422-446, 2002.

Relevance assessment exercise

Using graded relevance judgments
- Only consider highly relevant documents
  - Apply a threshold to create binary judgments
  - Fewer relevant documents, less stable results
- Evaluate a ranked list by cumulated gain
  - Cumulative gain (CG): highly relevant documents
    contribute more value than marginally relevant documents
  - Discounted cumulative gain (DCG): relevant documents contribute more value when they appear at higher ranks than at lower ranks

Cumulative Gain (CG)
Let G be a vector of values representing the graded relevance judgments for each document in a ranked list, and let G[i] be the graded relevance of the document in the i-th position of a results list.

  CG[i] = G[1]              if i = 1
  CG[i] = CG[i-1] + G[i]    otherwise

Discounted Cumulative Gain (DCG)
With G defined as above:

  DCG[i] = CG[i]                        if i < b
  DCG[i] = DCG[i-1] + G[i] / log_b i    if i >= b

- The choice of b allows modeling user impatience vs. persistence
- Smaller values of b cause greater discounting of documents retrieved at lower ranks
- b = 2 models a more impatient user, while b = 10 models a more persistent user willing to examine more documents

Discounted Cumulative Gain (DCG): example (b = 2)

  Rank  DocID  Relevance level
     1  0234   0
     2  0132   2
     3  0115   3
     4  0193   0
     5  0123   1
     6  0345   3
     7  0337   0
     8  0256   2
     9  0078   1
    10  0311   2

- CG vector: <0, 2, 5, 5, 6, 9, 9, 11, 12, 14>
- DCG vector: <0, 2, 3.90, 3.90, 4.33, 5.49, 5.49, 6.16, 6.47, 7.08>
  (for b = 2, divide G[i] by log2 i: n/a, n/a, 1.58, 2, 2.32, 2.58, ...)
- Average the vectors over a set of queries to get average performance

Cumulative gain metrics
- Relevance assessments may be made on an ordinal scale
- Assign weights to the levels of relevance
  - Turns an ordinal scale into a ratio scale
- The previous example shows a 4-point relevance scale with weights 0, 1, 2, 3
  - Could assign 0, 1, 10, 100, or any other weighting that fits the use scenario

Normalized DCG (nDCG)
- Normalize cumulative gain or discounted cumulative gain by comparing results to the theoretical best results for each query
- Create an ideal vector: fill the first
  i positions with the value for the highest relevance level, then fill the next j positions with the value for the next relevance level, where i = the number of docs at the highest relevance level and j = the number of docs at the next highest level
- Does not assume a theoretical best result of all docs being relevant
- Divide the CG or DCG vector by the ideal vector to get the normalized vector

Normalized DCG (nDCG): example (b = 2)
- CG vector: <0, 2, 5, 5, 6, 9, 9, 11, 12, 14>
- DCG vector: <0, 2, 3.90, 3.90, 4.33, 5.49, 5.49, 6.16, 6.47, 7.08>
- Ideal vector: <3, 3, 2, 2, 2, 1, 1, 0, 0, 0>
- Ideal CG vector: <3, 6, 8, 10, 12, 13, 14, 14, 14, 14>
- Ideal DCG vector: <3, 6, 7.26, 8.26, 9.12, 9.51, 9.87, 9.87, 9.87, 9.87>
- nDCG vector: <0, 0.33, 0.54, 0.47, 0.47, 0.58, 0.56, 0.63, 0.66, 0.72>

Rpref
- A proposed generalization of bpref for use with graded relevance judgments
- Weighted counts of documents ranked higher than documents judged more relevant
- A misordered pair of documents incurs a penalty proportional to the difference in their relevance values
- A new metric, not well studied yet

Do batch results predict user results? Hersh et al. (2001)
- TREC Interactive track; instance recall task (e.g., find all the discoveries made by the Hubble telescope)
- Two systems with identical user interfaces
- The similarity algorithm in the experimental system was shown to be better: 17.6% better instance precision than the baseline system for this document collection, using the description of the task as a query in batch mode
- All 24 subjects searched 6 topics, 3 with each system
- No difference in user performance between the systems
  - Queries, reading speed, reading comprehension

Do batch results predict user results? Turpin and Hersh (2001)
- Design similar to the instance recall study: same UI, different underlying system performance
- The improved system had 67% higher MAP on users' queries
- New task: find answers to 2 types of questions
  - Find all of a small number of answers for a topic (e.g., name four films in which Orson Welles actually appeared)
  - Select
    the correct answer of two (e.g., is Denmark larger or smaller in population than Norway?)
- Measured the user's rate of answering questions correctly
- Found user performance 6% worse with the better IR system (statistically insignificant)

Do batch results predict user results? Turpin and Scholer (2006)
- Two tasks
  - Find one relevant document (precision-oriented)
  - Find as many relevant docs as possible in 5 minutes (recall-oriented)
- 30 students searched on 50 TREC topics
- 10 systems; users searched 5 topics on each system
- Each system returned a doc list with a known AP (0.55-0.95), regardless of the user's query

Do batch results predict user results? Turpin and Scholer (cont.)
- Precision-based task: no relationship between user performance and system performance
- Recall-based task: weak relationship between user performance and system performance
  - Significant difference comparing MAP 0.55 to 0.75 and 0.65 to 0.75
  - Magnitude of the effect very small
  - 47.2% of searches found no relevant documents in 5 minutes despite high AP

Next: query expansion and relevance feedback
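The CG, DCG, and nDCG definitions above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function names are mine, and the printed vectors reproduce the slides' worked example (gains 0, 2, 3, 0, 1, 3, 0, 2, 1, 2 with b = 2) up to rounding, since the slides appear to round the logarithms before dividing.

```python
import math

def cg(gains):
    """Cumulative gain: CG[i] = G[1] if i == 1, else CG[i-1] + G[i]."""
    out, total = [], 0
    for g in gains:
        total += g
        out.append(total)
    return out

def dcg(gains, b=2):
    """Discounted cumulative gain:
    DCG[i] = CG[i] for i < b; DCG[i-1] + G[i] / log_b(i) for i >= b.
    (At i == b the discount log_b(b) = 1, so ranks 1..b are undiscounted.)"""
    out, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        out.append(total)
    return out

def ndcg(gains, b=2):
    """Normalized DCG: divide DCG by the DCG of the ideal ordering,
    i.e., the same gains sorted in decreasing order of relevance."""
    ideal = dcg(sorted(gains, reverse=True), b)
    return [d / i for d, i in zip(dcg(gains, b), ideal)]

# Worked example from the slides: ranks 1..10, graded relevance 0-3.
gains = [0, 2, 3, 0, 1, 3, 0, 2, 1, 2]
print(cg(gains))                           # [0, 2, 5, 5, 6, 9, 9, 11, 12, 14]
print([round(x, 2) for x in dcg(gains)])   # [0.0, 2.0, 3.89, 3.89, 4.32, 5.48, 5.48, 6.15, 6.47, 7.07]
print([round(x, 2) for x in ndcg(gains)])  # [0.0, 0.33, 0.54, 0.47, 0.47, 0.58, 0.56, 0.62, 0.66, 0.72]
```

Note that `ndcg` builds the ideal vector exactly as the slide describes, by placing the highest relevance values first, so it does not assume that every retrieved document could be relevant.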