# Social Networks INFS 797

These 14 pages of class notes were uploaded by Hazle Turcotte on Monday, September 28, 2015. They belong to INFS 797 at George Mason University, taught by Staff in Fall.

Date Created: 09/28/15

## Intelligent Methods in Database Integration

### 2. Inconsistency Resolution: Types of Inconsistency

- Intensional inconsistencies (semantic differences). Examples:
  1. Structural differences: telephone numbers stored in one field or in several fields.
  2. Unit differences: dollars vs. euros.
  3. Differences in the semantics of attributes: yearly salary vs. monthly salary.
- Extensional inconsistencies (data conflicts):
  1. They surface only after all intensional inconsistencies have been resolved.
  2. Example: two different values for the date of birth of the same person.
  3. The subject has received much less attention.

### Classification of Solutions to Extensional Inconsistencies

Extensional inconsistencies can be handled in different ways:

1. Multi-answer: the complete set of inconsistent answers (a disjunctive answer). This is raw information that should be resolved outside the database.
2. Ranked answer: the complete set of answers, ranked according to likelihood of being correct. The ranking is usually derived from the rate of recurrence.
3. Random answer: a single value selected at random. Useful when the differences among the alternatives are considered inconsequential.
4. Preferred answer: the top value in a ranked answer.
5. Fused answer: a new value synthesized from the set of answers. Normally the fusion formula is provided by experts who know the sources.

### Weaknesses of Current Approaches

1. Multi-answer/Random: these are naive solutions that require no further investigation. We focus on the Ranked/Preferred and Fusion solutions.
2. Ranked/Preferred: because these solutions are based only on voting, they are essentially useless when the set of alternatives is small or the degree of recurrence is low.
3. Fusion: there is no measure to indicate whether the fusion improves on the original values.
4. Fusion: there is no proof that the expert prescribed the best fusion.

### Our Approach to Inconsistency Resolution

Assumptions:

1. Assume a set of performance measures that quantify the performance of the sources (accuracy, cost, etc.).
2. Assume each alternative answer is associated with a value for each performance measure.
3. Assume a utility function that expresses overall value to individual users by means of a linear combination of the performance measures.

Expected advantages:

1. Ranking/Preferred: define the ranking based on utility.
2. Fusion: calculate the utility of the fusion and check whether it exceeds the utility of each of the original values.
3. Fusion: find the optimal fusion, the one with the highest utility.

### Performance Measures

1. Recentness ($t$): the time at which the information was published; basically, the timestamp.
2. Cost ($c$): the expense (download seconds, access fee) of materializing the answer.
3. Availability ($v$): the probability that the source will be available when needed.
4. Accuracy ($s$): assume database values are estimates of the true value, normally distributed around the stored value; the standard deviation is the measure of accuracy.
5. Priority ($p$): a preference based on past performance or a level of authority granted by a certifying agency.
6. Quality ($q$): essentially any specification which the data is warranted to meet or exceed.

Other measures are possible. We argue here more for the approach than for individual parameters.

### Utility

Utility is a linear combination of the performance measures:

1. Assume performance measures $p_1, p_2, \ldots, p_m$.
2. Assume weights $w_1, w_2, \ldots, w_m$ with $0 \le w_i \le 1$ and $\sum_{i=1}^{m} w_i = 1$.
3. Utility: $u = \sum_{i=1}^{m} w_i p_i$.

### Ranking

Straightforward:

1. Assume the inconsistent values $x_1, x_2, \ldots, x_n$.
2. The utilities $u(x_1), u(x_2), \ldots, u(x_n)$ are calculated.

- Ranked answer: the values are sorted according to their utility.
- Preferred answer: the value with the highest utility.

### Fusion

Fusion is a linear combination of the given values:

1. Assume numerical values $x_1, x_2, \ldots, x_n$.
2. Assume coefficients $a_1, a_2, \ldots, a_n$ with $0 \le a_i \le 1$ and $\sum_{i=1}^{n} a_i = 1$.
3. Fusion: $x = \sum_{i=1}^{n} a_i x_i$.

### Performance of the Fusion

To compute the utility of the fusion $x = \sum_{i=1}^{n} a_i x_i$ we must derive each of its performance measures:

1. Recentness: $t(x) = \text{now}$
2. Cost: $c(x) = \sum_{i=1}^{k} c(x_i)$
3. Availability: $v(x) = \prod_{i=1}^{k} v(x_i)$
4. Accuracy: $s(x)^2 = \sum_{i=1}^{n} a_i^2 \, s(x_i)^2$
5. Priority: $p(x) = \sum_{i=1}^{n} a_i \, p(x_i)$
6. Quality: $q(x) = \min_{i=1}^{k} q(x_i)$

Here $a_1, \ldots, a_k$ are assumed to be the positive coefficients.

### Normalization of the Performance Measures

To facilitate finding the appropriate weights in the utility, each performance measure is normalized to the range $[0, 1]$, with 1 corresponding to the best performance and 0 to the worst. To make each measure a function of all $n$ coefficients, whether zero or positive, we use:

1. Cost: $\lceil a_i \rceil \equiv$ if $a_i = 0$ then $0$ else $1$
2. Availability: $\max(v(x_i),\, 1 - \lceil a_i \rceil) \equiv$ if $a_i = 0$ then $1$ else $v(x_i)$
3. Quality: $\max(q(x_i),\, 1 - \lceil a_i \rceil) \equiv$ if $a_i = 0$ then $1$ else $q(x_i)$

### Normalization of the Performance Measures (cont.)

The normalized performance measures of the fusion:

1. Recentness: $t(x) = 1$
2. Cost: $c(x) = 1 - \sum_{i=1}^{n} \lceil a_i \rceil \, c(x_i)$
3. Availability: $v(x) = \prod_{i=1}^{n} \max(v(x_i),\, 1 - \lceil a_i \rceil)$
4. Accuracy: $s(x) = 1 - \sqrt{\sum_{i=1}^{n} a_i^2 \, s(x_i)^2}$
5. Priority: $p(x) = \sum_{i=1}^{n} a_i \, p(x_i)$
6. Quality: $q(x) = \min_{i=1}^{n} \max(q(x_i),\, 1 - \lceil a_i \rceil)$

### Utility of the Fusion

$$u(x) = w_1 t(x) + w_2 c(x) + w_3 s(x) + w_4 p(x) + w_5 v(x) + w_6 q(x)$$

1. We regard fusion as an attempt to improve upon the initial values.
2. Hence, fusion is justified if $u(x) > \max_{i=1}^{n} u(x_i)$.
3. But even if the fusion is justified, it may not be the best fusion possible: the fusion formula prescribed by the expert may not be optimal with respect to utility.

### Optimizing the Fusion

The utility of the fusion $x = \sum_{i=1}^{n} a_i x_i$ is expressed as a function of the coefficients $a_i$:

$$
\begin{aligned}
u(a_1, a_2, \ldots, a_n) ={} & w_1 \cdot 1 \\
& + w_2 \left(1 - \sum_{i=1}^{n} \lceil a_i \rceil \, c(x_i)\right) \\
& + w_3 \left(1 - \sqrt{\sum_{i=1}^{n} a_i^2 \, s(x_i)^2}\right) \\
& + w_4 \sum_{i=1}^{n} a_i \, p(x_i) \\
& + w_5 \prod_{i=1}^{n} \max(v(x_i),\, 1 - \lceil a_i \rceil) \\
& + w_6 \min_{i=1}^{n} \max(q(x_i),\, 1 - \lceil a_i \rceil)
\end{aligned}
$$

There are methods and packages to optimize such functions.

### Optimizing the Fusion: Example

Performance data on a multi-answer of 5 values:

| Property (raw) | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ |
|---|---|---|---|---|---|
| Recentness (timestamp) | 10 | 20 | 30 | 30 | 60 |
| Cost (cents) | 80 | 50 | 30 | 10 | 140 |
| Accuracy (standard deviation) | 2.5 | 0.5 | 2 | 1 | 1.5 |
| Availability (probability) | 0.6 | 0.4 | 0.7 | 0.9 | 0.3 |
| Priority (on a scale of 0 to 5) | 4 | 2 | 5 | 1 | 3 |
| Quality (on a scale of 0 to 10) | 7 | 6 | 3 | 4 | 5 |

| Property (normalized) | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ |
|---|---|---|---|---|---|
| Recentness | 0 | 0.053 | 0.105 | 0.105 | 0.263 |
| Cost | 0.258 | 0.161 | 0.097 | 0.032 | 0.452 |
| Accuracy | 0 | 0.800 | 0.200 | 0.600 | 0.400 |
| Availability | 0.6 | 0.4 | 0.7 | 0.9 | 0.3 |
| Priority | 0.8 | 0.4 | 1.0 | 0.2 | 0.6 |
| Quality | 0.7 | 0.6 | 0.3 | 0.4 | 0.5 |
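The ranked and preferred answers for the example data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes give no weights, so the equal weights below are hypothetical, and the normalized cost row is read as a share of total cost (lower is better), so it enters the utility as 1 minus cost, in line with the normalized cost of the fusion. The names `NORMALIZED`, `WEIGHTS`, `utility`, and `ranked_answer` are made up for this sketch.

```python
# Normalized performance data for x1..x5, copied from the example table.
NORMALIZED = {
    "recentness":   [0.0, 0.053, 0.105, 0.105, 0.263],
    "cost":         [0.258, 0.161, 0.097, 0.032, 0.452],
    "accuracy":     [0.0, 0.800, 0.200, 0.600, 0.400],
    "availability": [0.6, 0.4, 0.7, 0.9, 0.3],
    "priority":     [0.8, 0.4, 1.0, 0.2, 0.6],
    "quality":      [0.7, 0.6, 0.3, 0.4, 0.5],
}

# Hypothetical weights (not from the notes): equal importance, summing to 1.
WEIGHTS = {name: 1.0 / len(NORMALIZED) for name in NORMALIZED}


def utility(i, data=NORMALIZED, weights=WEIGHTS):
    """Utility of value x_(i+1): weighted sum of its performance measures."""
    total = 0.0
    for name, values in data.items():
        # Assumption: cost is a share of total cost, so best performance is 1 - cost.
        score = 1.0 - values[i] if name == "cost" else values[i]
        total += weights[name] * score
    return total


def ranked_answer(data=NORMALIZED, weights=WEIGHTS):
    """All values sorted by decreasing utility (the ranked answer)."""
    n = len(next(iter(data.values())))
    scored = [(f"x{i + 1}", utility(i, data, weights)) for i in range(n)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = ranked_answer()
preferred = ranking[0][0]  # the preferred answer is the top of the ranking

for name, u in ranking:
    print(f"{name}: {u:.3f}")
```

Different weights would express a different user's priorities and can change the order, which is the point of ranking by utility rather than by voting.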

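The utility of a fusion, the test for whether it is justified, and a search for better coefficients can be sketched as follows. The sketch reads the normalized fusion measures as: recentness 1, cost 1 minus the summed cost shares of the values used, accuracy 1 minus the square root of the sum of squared weighted deviations, priority as the weighted sum, availability as the product over the values used, and quality as the minimum over the values used. Several things here are assumptions rather than part of the notes: the equal weights `W`, the normalization of the standard deviations by their maximum (which reproduces the table's accuracy row), and the coarse grid search, which is a simple stand-in for the optimization methods and packages the notes mention.

```python
import itertools
import math

# Example data for x1..x5 from the notes' tables.
RECENT = [0.0, 0.053, 0.105, 0.105, 0.263]
COST = [0.258, 0.161, 0.097, 0.032, 0.452]    # assumed: share of total cost
SD_RAW = [2.5, 0.5, 2.0, 1.0, 1.5]
S = [sd / max(SD_RAW) for sd in SD_RAW]       # assumed normalization of s(x_i)
AVAIL = [0.6, 0.4, 0.7, 0.9, 0.3]
PRIORITY = [0.8, 0.4, 1.0, 0.2, 0.6]
QUALITY = [0.7, 0.6, 0.3, 0.4, 0.5]

# Hypothetical equal weights for t, c, s, p, v, q (they sum to 1).
W = {key: 1.0 / 6 for key in "tcspvq"}


def value_utility(i, w):
    """Utility of a single original value x_(i+1)."""
    return (w["t"] * RECENT[i] + w["c"] * (1 - COST[i]) + w["s"] * (1 - S[i])
            + w["p"] * PRIORITY[i] + w["v"] * AVAIL[i] + w["q"] * QUALITY[i])


def fusion_utility(a, w):
    """Utility of the fusion with coefficients a (non-negative, summing to 1)."""
    used = [1 if ai > 0 else 0 for ai in a]   # ceiling of a_i: 0 if unused, else 1
    t = 1.0                                   # the fusion is created now
    c = 1 - sum(u * ci for u, ci in zip(used, COST))
    s = 1 - math.sqrt(sum((ai * si) ** 2 for ai, si in zip(a, S)))
    p = sum(ai * pi for ai, pi in zip(a, PRIORITY))
    v = math.prod(max(vi, 1 - u) for vi, u in zip(AVAIL, used))
    q = min(max(qi, 1 - u) for qi, u in zip(QUALITY, used))
    return (w["t"] * t + w["c"] * c + w["s"] * s
            + w["p"] * p + w["v"] * v + w["q"] * q)


def is_justified(a, w):
    """Fusion is justified if it beats the best original value."""
    return fusion_utility(a, w) > max(value_utility(i, w) for i in range(len(a)))


def grid_search(w, steps=10):
    """Coarse search over coefficients in multiples of 1/steps."""
    best_a, best_u = None, -math.inf
    for combo in itertools.product(range(steps + 1), repeat=len(COST)):
        if sum(combo) != steps:
            continue                          # coefficients must sum to 1
        a = [k / steps for k in combo]
        u = fusion_utility(a, w)
        if u > best_u:
            best_a, best_u = a, u
    return best_a, best_u


a = [0, 0, 0.5, 0.5, 0]                       # average of x3 and x4
print(round(fusion_utility(a, W), 4), is_justified(a, W))
best_a, best_u = grid_search(W)
print(best_a, round(best_u, 4))
```

Because cost, availability, and quality depend only on which coefficients are nonzero, the utility is discontinuous in the coefficients; that is one reason a derivative-free search is used in this sketch.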
