MIS 373 Midterm 1 Bundle Notes
This 39 page Bundle was uploaded by Christopher Notetaker on Wednesday February 24, 2016. The Bundle belongs to MIS 373 at University of Texas at Austin taught by Chakrabarti in Spring 2016. Since its upload, it has received 31 views. For similar materials see Social Media Analytics in Business, management at University of Texas at Austin.
Date Created: 02/24/16
Community Structure II 01/25/2016
1. Why care about communities
   a. Opinion formation
      i. Politicized crowds on Twitter
   b. Customer engagement and support
      i. Brand clusters and community forums on Twitter
   c. If people are affected by their friends’ opinions (there is some support for this in some circumstances, but not always), then being in a tightly knit community may expose you to the same point of view over and over again.
   d. What determines a tightly knit community?
      i. Are the ties mutual?
2. Max-flow min-cut
   a. Pick a “source”
   b. And a “sink”
   c. Think of the links as water pipes with fixed carrying capacity
   d. All nodes are junctions of pipes
   e. Send as much water as possible from source to sink
      i. “Maximum flow”
   f. The bridge links become the bottleneck
      i. “Minimum cut”
         1. If source and sink are in separate clusters,
            a. then the flow is limited by the bridges
            b. removing this minimum cut can create good communities
   g. A nice intuitive method of finding “cuts” in the graph
      i. Problems?
         1. Min-cuts need not be “balanced”
         2. The “source” and “sink” must be in different communities
         3. What if there are more than two communities?
      ii. Advantage
         1. Works on directed graphs
3. Hierarchical clustering
   a. Compute similarity weights between all pairs of people
      i. Number of paths between the two people
         1. The more paths, the more similar/connected they are
         2. But should long paths count for as much as short paths?
      ii. All paths, but weigh shorter paths more (Katz score): a good measure of similarity
         1. Similarity between two nodes depends mainly on the number of short paths between them
         2. Example
            a. No direct edge (path length 1)
            b. Paths of length 2: 3
            c. Paths of length 3: 4
         3. All paths, but weigh shorter paths more
            a. We use a discount factor of 0.8
               i. A smaller factor means we care more about short paths
            b. Any discount factor < 1.0 works
            c. Small factor → shorter paths are more important
   b. Compute similarity weights between all pairs of people
      i. Start with all nodes disconnected
      ii. Connect the people with the highest similarity weight
         1. Iterate
      iii. A “slice” of this hierarchy gives a clustering of people
         1. How can we choose the right slice?
            a. Slice at the green line and see where the communities fall
               i. No easy answer
               ii. Pick the slice that gives the desired number of communities
      iv. Problems
         1. Nodes on the periphery typically have small similarity weights
         2. Such nodes go into isolated communities of their own
         3. Though they should belong to the closest community
            a. Isolated nodes are very common in actual networks. Still, this isn’t a huge problem, because at high levels of the hierarchy they should merge with their true cluster.
         4. Why are the president and the instructor so close in hierarchical clustering?
            a. Both have high degree
            b. Both are connected to all the bridges
               i. Many paths between them
            c. More paths connect them to each other than to their students
4. Betweenness Centrality (Clustering)
   a. Look for a particular edge that a lot of people use on their shortest paths
      i. Find bridge links, and remove them
      ii. Keep iterating
      iii. Until the network breaks into two communities
   b. How bridge-like is a link?
      i. But bridges are the links that connect communities, so to get the bridges we need the communities
      ii. And for the communities, we need the bridges?
   c. How bridge-like is a link?
      i. What makes the middle link special?
      ii. Removing it would split the graph into two pieces
      iii. Could this be a definition of a bridge?
         1. Not robust: such bridges are rare in social networks
      iv. Think of the network as a road network
         1. Links represent interstates
         2. Removing I-35 would make some people’s commutes longer
      v. Whose commute becomes longer?
         1. Folks for whom I-35 lies on the shortest path
         2. Betweenness of I-35 = number of pairs of nodes for which I-35 lies on the shortest path
         3. Find the shortest path for each pair of nodes
            a. Compute betweenness for all links
            b. What if a link is on one of 2 shortest paths? Count it as 1/2
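The link-betweenness count described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in class: the function names (`bfs_counts`, `edge_betweenness`) and the example graph `two_triangles` are hypothetical, the graph is assumed to be undirected and unweighted, and every unordered pair of nodes is counted once. When a pair has several shortest paths, each link receives the fraction of those paths that pass through it (the "1/2" rule for two shortest paths).

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distance from source and the number of
    distinct shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(graph):
    """For each undirected link, sum over all node pairs the fraction of
    that pair's shortest paths that use the link."""
    info = {s: bfs_counts(graph, s) for s in graph}
    edges = {frozenset((u, v)) for u in graph for v in graph[u]}
    score = {e: 0.0 for e in edges}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:              # s and t are not connected
            continue
        for e in edges:
            u, v = tuple(e)
            for a, b in ((u, v), (v, u)):  # try both directions of the link
                on_path = (a in dist_s and b in dist_t
                           and dist_s[a] + 1 + dist_t[b] == dist_s[t])
                if on_path:
                    score[e] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return score


# Hypothetical example: two triangles joined by the single link C-D.
two_triangles = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = edge_betweenness(two_triangles)
bridge = max(scores, key=scores.get)     # the link with the highest score
```

In this example the C-D link carries all nine pairs that cross between the triangles, so it has the highest score and would be the first link removed.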
      vi. Betweenness clustering
         1. Find the link with the highest betweenness (it lies on many different shortest paths) and remove it
         2. Re-compute betweenness scores (they may change!)
         3. Keep iterating until the graph splits into communities
      vii. Problems
         1. Must compute betweenness between all pairs
         2. Must re-compute after every iteration
         3. Considers only shortest paths
5. Modularity
   a. If I gave you a cluster, can you measure how good it is?
      i. Want more links within the cluster
      ii. Only a few links crossing the boundary
      iii. “More within, few across”: can we make this precise?
      iv. E within
         1. Will be higher if we have lots of edges in the community
      v. E expected
         1. In a nice network, E within will be higher than E expected
      vi. Compute the difference between E within and E expected
         1. Many links within the cluster make for a modular network
         2. More links across clusters increase “E expected” and decrease modularity overall
         3. Positive number
            a. Modular community
      vii. Good clustering
         1. A community that has high modularity
            a. Try to merge communities
      viii. Major advantage
         1. A community that has more links within itself than links across is a modular community
▯ Summary
We saw several community detection methods
Max-flow min-cut
   o Which nodes are the bottleneck
Hierarchical clustering
   o Start with individual nodes
      Similarity between nodes
      Closeness of nodes
Betweenness clustering
   o Start off with the entire network and then try to split it into two
      Remove the bridge
      Do this again and again until the network splits into two
Modularity
   o Try to come up with a measure for a good community
   o For a random community the measure will be zero
   o High modularity means a good cluster
General notes
   o Are shortest paths important? Then betweenness
   o Do you already know some members of each cluster? Maybe max-flow min-cut
   o Modularity is a good general-purpose solution
   o There are many other methods as well.
1. Why care about communities?
   a. Opinion formation
      i. Possible for each community to have a separate opinion
      ii. People in each community tend to hear the same viewpoint from their friends
         1. If people are affected by their friends’ opinions (some support for this in some circumstances, but not always), then being in a tightly knit community may expose you to the same point of view over and over again.
   b. Opinion formation
      i. Communities may converge on one product (iPhone or Android)
      ii. or one investment philosophy (real estate only, index funds only)
   c. Customer engagement
      i. A forum where users of a product can exchange ideas, workarounds, fixes
      ii. Businesses can learn about impending product issues
         1. How worried are Blizzard users about game cheats and hacks?
   d. Customer support
      i. A forum also offers a controlled environment where customers can interact with business representatives
         1. Intuit’s TurboTax product has a community for asking questions, where both other users and Intuit representatives can respond
   e. Customer analytics
      i. Also enables measurement of splits in the user community
      ii. and isolated sub-groups
2. What determines a tightly-knit community?
   a. Are the ties mutual?
      i. Reciprocity implies closer connections
         1. On Twitter, reciprocal ties → real-life friendships
      ii. Unreciprocated ties could be customer → celebrity/authority
         1. Perhaps the authority holds the community together?
   b. How close are the members?
      i. Everyone knows everyone is best, but unlikely
      ii. Everyone knows everyone through a friend
      iii. Generally, short paths between members
   c. How close and densely-connected are the members?
      i. Everyone should have connections to at least k others
      ii. Avoids a single point of failure
   d. Is the community separated from the rest of the world?
      i. Group members should have relatively more connections within the community than outside it
   e. What determines a tightly-knit community?
      i. Are the ties mutual?
      ii. Are the members close to each other?
      iii. Are they densely-connected?
      iv. Is the community separated from the rest of the world?
3. What are the common community patterns?
   a. Clique
      i. Must have
         1. Everybody knows everybody
         2. Most densely connected
         3. Can have overlapping cliques
      ii. But not robust
         1. Even a single missing edge disqualifies the community
         2. So typically we only find small cliques, which are not interesting
   b. K-core
      i. Everybody knows at least k others in the group
         1. At least one node knows exactly k others
      ii. Every clique is a k-core
      iii. Looser version of a clique
      iv. As we go from large k to small k
         1. The community structure becomes weaker
         2. But we can find bigger communities
            a. K-cores offer this tradeoff between within-community linkage and community size.
   c. n-clique
      i. Another generalization of cliques
      ii. Everyone knows everybody, within n hops
         1. At least one pair of nodes is exactly n hops away
   d. What are the common community patterns?
      i. Clique
      ii. K-core
      iii. n-clique
         1. General information
            a. Members are close to each other: n-clique
            b. Members are densely connected: k-core
            c. Ideally you want both
            d. The community is separated from the rest of the world
            e. The first 2-clique has “strong ties” only, while the second one has only “weak ties” (though in this case, topologically they are identical)
4. Twitter Topic Communities
   a. Consider a community of Twitter users using a hashtag
      i. X replies to Y
      ii. X mentions Y
      iii. We can build a replies-and-mentions network for the community
   b. What do these communities look like?
5. Team Assembly
   a. P = newcomer
   b. Q = incumbent
   c. You want to assemble a team of individuals
      i. How many?
      ii. Do you pick newcomers or incumbents?
      iii. If you pick incumbents, do you pick those who have worked together in the past?
   d. You want the team to bring in
      i. different ideas
      ii. different skills
      iii. different resources
   e. But working with new folks might not always be a boon
      i. potential for conflict
      ii. too much time spent “figuring each other out”
   g. Decide team size (say, 3 team members)
   h. Pick first member
      i. Incumbent or newcomer?
         1. Say, incumbent
      ii. Pick an incumbent at random
         1. Say, person number 4
   i. Pick second member
      i. Incumbent or newcomer?
         1. Say, incumbent
      ii. Past collaborator?
         1. Say, yes
      iii. Pick a past collaborator of 4
         1. Pick person 3
   j. Pick third member
      i. Incumbent or newcomer?
         1. Say, newcomer
      ii. Pick one newcomer at random
   k. Team is complete
      i. Add it to the collaboration network
   l. You want to assemble a team of individuals
      i. How many?
      ii. Do you pick incumbents? → probability p
      iii. If you pick incumbents, do you pick those who have worked together in the past? → probability q
      iv. Each team adds to the underlying network of collaborations
      v. What probabilities p and q work best?
   m. The third image is possibly just right?
      1. Knowledge exists in one reasonably large component, but there are still very distinct viewpoints in different parts of that component.
      2. We need newcomers to generate new knowledge
   n. Blue is too many newcomers, red is too many incumbents. The green line is where you want to be
6. How should I pick my team?
   a. Picking incumbents = 50-60%
      i. Roughly half of team members should be new blood
   b. Picking past collaborators of incumbents = 70-80%
      i. Mostly proven relationships → less disruption
      ii. but also some new connections between old hands
   c. Ideally close to the tipping point
      i. Enough connected individuals
      ii. But still having distinct ideas
7. Summary
   a. Communities are critical for businesses hoping to understand their customers
   b. Basic structure
      i. Reciprocated links
      ii. Members are close to each other, and densely connected
      iii. More links within community than outside
      iv. Several such examples from Twitter
   c. Assembling teams
      i. Pick enough newcomers (incumbents about 50-60%)
      ii. But pick past collaborators of incumbents (70-80%)
▯ Outline
Reputation in e-commerce
   o Reputation as asymmetric information
The Market for Lemons
Case Study: TripAdvisor
Reputation in e-commerce
Consider eBay’s problems
   o eBay connects buyers to sellers
      But buyers and sellers never meet face to face
      So why should buyers trust sellers? Or trust eBay?
▯ The Market for Lemons
With asymmetric information there may be no market at all
When can we have a deal?
   o If there is a set of products for which
   o the expected value for the buyer > the maximum value for the seller,
   o then a market exists for those products
      The dealer will always make a profit
      The buyer will expect to make an average profit greater than zero
When does a market exist?
   o If both buyers and sellers can judge the car type
      For each type of car: value for buyer > value for seller
   o If neither buyer nor seller can judge the car type
      Expected value for buyer > expected value for seller
   o Asymmetric information
      Need expected value for buyer > maximum value for seller
▯ TripAdvisor
Largest travel site in the world
   4.4M businesses
   315M unique monthly visitors
   200M reviews
Engaged community
   o 26M user photo uploads
   o Active community forum
   o 85% of questions are answered within 24 hours
Business landscape
   o Information aggregators
      TripAdvisor
      Google and other search engines
      Kayak
   o Online Travel Agencies (OTA)
      Expedia, Orbitz
   o Hotels
Reviews
   o TripAdvisor reviews/rankings
      No proof of stay needed to post a review
      Ads + fees for business listings
      Businesses who list do not get special display on the website
      Hotels can’t offer discounts for reviews
   o Online Travel Agencies reviews
      Verified reviews (links sent to guests who stayed)
      Businesses are charged for completed sales
   o Hotels
      Testimonials/reviews
   o Which reviews are most believable?
      Online Travel Agencies
      TripAdvisor
Problems
   o How do people find reviews?
      40% consulted multiple websites
      33% went to the hotel’s site
      15% asked questions on social media
   o Why go directly to the hotel’s site?
   o How do people perceive reviews?
      Those who read reviews were 81% more likely to buy from the site
      72% of users trust online reviews as much as recommendations from family and friends
   o So much trust?
      2%-6% of reviews are fake or deceptive
      Paid reviews
   o People apply strategies to deal with fakes
      “Disregard the top 5% and the bottom 5-10% of reviews”
      50% of customers paid little attention to the most extreme reviews
      Number of reviews?
Why write reviews?
   o Because people care
      Help others make good decisions (90%)
      To share experiences (85%)
      Reward a business that had provided good service (80%)
      To help a business improve (75%)
   o Agrees with other sites
      Slashdot: users get “moderator points”
      Yahoo! Answers: get expert points
      Amazon: Top 500 reviewer
Do reviews matter to hotels?
   o “Marketing-included… word of mouth generates twice the sales of paid advertising”
      A 10% improvement in online ratings for hotels can support
         an 8% increase in average daily prices
         2% increased room occupancy
How have hotels responded?
   o Self-promotion: “Loved by Foursquare Mayors”
   o Encouraging reviewing: “Please write about your stay”
   o Fast responses to reviews and social media comments
      Posting specials and offers is less effective than assisting customers with reservations and hotel stays
▯ Force Atlas Layout (click Run)
Repulsion strength
   o Spreads out nodes that don’t have strong connections and clumps nodes that have strong connections
      Graph too squished: increase repulsion strength
      Graph too large: decrease repulsion strength
Edge weight scale
   o Increases/decreases the edge darkness
▯ Node Attributes
Degree
   o Makes the nodes change color based on degree
      Highest-degree nodes get darker colors
   o High degree means that many people connect with this person; they have the most edges
▯ Change size of nodes
Graphic
   o Attributes of nodes
   o Betweenness centrality
      Min: 10
      Max: 50
      Betweenness centrality means that people have to go through this person in order to reach other parts of the network
      Click this to change the size of nodes
▯ Calculating Average Path Length in order to get centrality measures
Go to Statistics
   o Go to Average Path Length
      Click OK
▯ Run Modularity
Statistics
Resolution
   o Lower number = more communities
   o Higher number = fewer communities
▯ Filters
Topology
   o Filter by degree
      This gets rid of the stragglers in the network with low degree
▯ To export your final project
Go to Project
▯ Degrees and Distributions
Does Justin Bieber have high in-degree or high out-degree? In-degree! More people follow him than he follows.
The ratio of the most populous to the least populous city is ~150,000
Most cities have a very small population. Megacities skew the data and raise the average greatly.
The average is not a good estimate in this situation. Should use the mode.
A social network is similar to cities. Most people have low popularity, and then there are celebrities with huge popularity. Skewed right.
▯ Power Laws
We can’t see much at all in the left plot. Points are too close together, hence the log-log plot.
Squish the plot so that powers of 2 are equidistant.
Popularity changes from the x axis to the y axis.
When talking about the long tail, we are generally talking in the “business sense”
▯ Business Implications
Songs are available at Walmart and Rhapsody (most popular songs), but Rhapsody also provides “niche music” in the long tail.
Same as Amazon vs. a book store
Same as Netflix vs. Blockbuster
▯ Measuring Power Laws
The pink line is the true slope
If you do a regression, you get the gray line. The line is off because of the concentration in the tail. This will not work.
To fix the inaccuracy of the regression line:
-split the data into equal-sized buckets (on a log scale)
-count the total frequency in each bin, and then plot those numbers (numbers in red)
Slight problem: this slope is shifted from the original. There is an easy formula to fix this (next slide)
Power law exponent (green) = slope of binned points (red) + 1
The power law only holds to the right of the red line
1. 4 distinct groups: split by age and race
2. Mostly split by race
3. Generally, people of one race seek out the other. (Opposites are preferred; no homophily)
4. Opposites are also preferred in a food web (same species don’t eat each other). Homophily does not exist here.
5. A positive assortativity coefficient means you prefer others like you.
6. Assortativity coefficient = degree of homophily
7. Baseline for no homophily: half and half?
8. But what if there are more Republicans in the population than Democrats?
   a. What if the Democrat/Republican ratio among friends = the Democrat/Republican ratio of the overall population? Is that the baseline?
   b. Most people follow celebrities. If all celebrities are Democrats, then Democrats will be over-represented.
      i. Need to account for the celebrities that you follow
Measuring Homophily
Twitter-style network. Some cross-connections. They could be just from randomness, too.
To fill in the middle of the random baseline:
Top vs. bottom
Left vs. right
Need to find expected values
Each connection falls in one of the four cells randomly.
The chance that it falls in the first row is 40/74. The probability that it falls in the first column is 42/74. A basic fact of probability is that if two events are independent, prob(event 1 AND event 2) = prob(event 1) * prob(event 2). Here, event 1 is “falls in first row” and event 2 is “falls in first column”. So prob(falls in first cell) = prob(row 1 AND column 1) = prob(row 1) * prob(column 1) = 40/74 * 42/74. This is for one connection; there are 74 of them. So expected connections in the first cell = 74 * 40/74 * 42/74 = 40 * 42 / 74 ≈ 22.7.
So basically, fill in the expected values for the inside boxes!
Expected counts calculations
Chi-square test to see whether the data fit the random baseline. If the data are far away from the baseline, it is most likely homophily. This result (a very small chi-square p-value) means homophily likely exists
▯ Homophily Mechanisms
Bigger nodes represent obese individuals. Red border = women, blue border = men. Yellow interior = obese, green interior = non-obese. Colors of edges denote different types of ties.
Is obesity innate, or do you become obese because your friends are obese (like a virus)?
The study attempts to predict whether a person becomes obese from certain factors: age, gender, and the obesity level of friends.
Alter = another person
Findings: to some extent, true. Controversial study. Mutual friendships show the strongest increase in risk of obesity.
Instead of a social network of people, you have people and affiliations. This leads to an inferred social network, connecting directors on the same company board.
▯ Affiliation Networks
Triadic closure: if I have 2 strong friends, they are likely to become friends (Anna and Claire gain a tie)
Selection: Anna befriends Daniel because they both have a taste for karate; you select friends based on taste
Influence: Anna introduces Bob to karate; Anna influenced Bob
More common friends, more chance of friendship. This holds.
The more you and I have in common, the more likely you and I are to become friends.
This holds up to a certain point (3 affiliations). There might be an effect of “I’m different”.
▯ Segregation from homophily
Can homophily at a local level explain this, or is there a more global force at work?
A-A neighborhoods grow in concentration. Is this because of government policies, or because people prefer people similar to themselves? → Can homophily cause such a large-scale pattern?
If people want 3 individuals of the same type living near them (this number varies), then eventually neighborhoods become segregated.
When the number becomes 4, communities eventually end up almost 50/50 segregated.
Small changes in homophily locally cause large amounts of segregation on a large scale
So clearly strong segregation can occur from minor homophily at the individual level
▯ Homophily Mechanisms
Social distance = length of the shortest path in the social network
Notes MIS 373 1/27/16
1. It’s a small world
   a. A couple of interesting facts
      i. It isn’t just Kevin Bacon
      ii. It is surprisingly hard to find long paths
      iii. We’ll add to this later
2. Outline
   a. The small world experiments
      i. Milgram’s experiment and follow-ups
   b. Basic search in networks
      i. Breadth-first
      ii. Shortest path
   c. Models
      i. Watts-Strogatz
      ii. Geographical
3. The Small World Experiments
   a. Paths are not really shorter nowadays
      i. Only 384 chains made it
      ii. The ones that didn't might actually have required longer paths
   b. We must account for attrition
      i. Participation rate = 37% everywhere along the path
         1. At each new “chain link”, 37% participated and 63% did not
      ii. Longer chains are more likely to get “lost” along the way
      iii. If we observe N chains of length 1, there were probably N / 0.37 actual chains of length 1
         1. For length 2: N / (0.37 * 0.37)
         2. For length 3: N / (0.37 * 0.37 * 0.37)
   c. Median chain length after accounting for attrition
      i. Chains within your country: 5
      ii. Chains across countries: 7
      iii. All chains: 7
   d. How do you know the next recipient?
      i. Most of the time people pick a friend of theirs as the next contact (mostly from work/school/university)
         1. 25% of the time, these people are weak ties
   e. Why did you pick this person as the next recipient?
      i. 1267 total “target” individuals
         1. How many of your friends do you use?
            a. Almost 50% of the time, 35 friends are used
         2. Why pick these friends?
            a. 47% work-related reasons
            b. 45% geographical reasons
            c. 7% other (e.g., “they know lots of people”)
   f. But are people making mistakes?
      i. Asked N = 105 people how they would route a letter to any of the others
         1. I know this person, or
         2. I don’t know him. Instead, I’ll forward the letter to this other person I know
      ii. Compared these chains against the optimal shortest path
      iii. They find
         1. The mean small-world path length (3.23) is 40% longer than the mean of the actual shortest paths (2.30)
         2. The model suggests that people make less-than-optimal small-world choices more than half the time
4. It’s a small world
   a. Interesting facts
      i. People can find these paths using only local information
         1. Primarily using geographical and work-related reasons
         2. But the paths they find are 40% longer than the shortest paths
      ii. And yet, we expect a high clustering coefficient
         1. My friends’ friends tend to be my friends too
5. Basic Search in Networks
   a. How would you find the shortest path in a network?
      i. What is the shortest path from s to y? From s to everyone else?
   b. Breadth-first search
      i. The world is split between
         1. “discovered” nodes: we know all their friends
            a. Green
         2. “undiscovered” nodes: we know nothing about them
            a. White
         3. “frontier” nodes: we know some of their friends
            a. Light blue
      ii. Initialization
         1. Only person s is on the “frontier”
         2. Everyone else is “undiscovered”
      iii. In each step
         1. Expand the frontier
            a. Each time you expand a node out from the frontier it becomes discovered
      iv. Steps 1-7 (figures not reproduced)
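The breadth-first search described above can be written out as a short Python sketch. The graph `friends`, the function names, and the node labels other than s and y are hypothetical, chosen only to illustrate the idea; the network is assumed to be unweighted, so "shortest" means fewest hops.

```python
from collections import deque


def breadth_first_search(graph, source):
    """Return (dist, parent): hop distance from source to every reachable
    node, and the node from which each one was first reached."""
    dist = {source: 0}           # nodes reached so far
    parent = {source: None}
    frontier = deque([source])   # reached, but friends not yet examined
    while frontier:
        node = frontier.popleft()
        # Looking at all of node's friends makes node "discovered".
        for friend in graph[node]:
            if friend not in dist:       # friend was "undiscovered"
                dist[friend] = dist[node] + 1
                parent[friend] = node
                frontier.append(friend)  # friend joins the frontier
    return dist, parent


def path_to(parent, target):
    """Walk the parent links back from target to recover one shortest path."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


# Hypothetical friendship network containing the nodes s and y.
friends = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "y"],
    "y": ["c"],
}
dist, parent = breadth_first_search(friends, "s")
route = path_to(parent, "y")
```

A single run from s gives the shortest distance to everyone else as well, which answers both questions in 5.a at once.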
   c. Frontier grows very quickly
      i. If every discovered person brings in 5 new undiscovered friends
         1. Step 0: 1 person
         2. Step 1: 5 people
         3. Step 2: 5*5 = 25 people
         4. Step 3: 5*5*5 = 125 people
         5. Step 4: 5*5*5*5 = 625 people
            a. Add all the people up (1+5+25+125+625 = 781)
      ii. In real life people have many more friends (~150 people)
         1. Many more pairs of people are connected by short paths as a result of larger frontiers
6. The Watts-Strogatz “Small World” model
   a. Parameters
      i. N = number of nodes
      ii. k = number of close friends
      iii. p = rewiring probability
   b. Idea
      i. Everyone has lots of close friends and a few “weak ties” to far-off acquaintances
      ii. Weak ties are random throughout the network in the Watts-Strogatz model
   c. The process
      i. Start with a ring of N nodes
      ii. Connect each node to its k nearest neighbors
      iii. For each edge (u, v), with probability p, rewire it to (u, w), where w is chosen uniformly at random
   d. Starting graph
      i. High clustering coefficient
      ii. But long paths to people on the opposite side
   e. Ending graph (if we rewired every edge)
      i. Edges are random
      ii. There is no clustering
      iii. The frontier grows quickly, since each discovered node brings in lots of random undiscovered friends → shorter paths
   f. Somewhere in the middle
      i. Just right
   g. Summary
      i. Solid lines: “strong” ties to close neighbors
      ii. Dashed lines: “weak” ties to far-off acquaintances
      iii. Strong ties give it high clustering
      iv. Weak ties give it the short paths
7. How do we incorporate geography into the model?
   a. Main idea
      i. Use the Watts-Strogatz small-world model
      ii. Except “weak” ties are not random in the geographic model
         1. They are determined by geographic distance
   c. Decay rates
      i. Suppose r = 5 (decay rate)
         1. The chance of meeting someone at distance d is 1/d^r
            a. Distance = 2: 1/32 chance of meeting them
            b. Distance = 3: 1/243 chance of meeting them
         2. If the decay rate is lower, then it is more likely that I will be able to meet them through weak acquaintances
         3. Smaller decay rate
            i. r = 0
               1. Weak ties are haphazard
            ii. r = 4
               1. Mostly local connections
            iii. Weak ties are everywhere when decay rate r = 0
      ii. There is a trade-off
         1. Small r → no effect of geography → random graph
         2. High r → no far-off connections → can’t have short paths to everyone
      iii. How do we navigate the grid?
         1. Weak ties can “halve” the distance in each step (roughly)
            a. Use weak ties initially for long-range jumps
            b. Use strong ties later to home in on the target
Notes MIS 373 1/25/16
1. Intro
   a. Finding a job
      i. Go through a random acquaintance to find your employer
         1. Even though you have a bunch of close friends, it’s your global structure/friends that help you find jobs
            a. The new job is not that far away in the social network
      ii. People find jobs through weak connections
         1. Despite finding jobs through weak connections, the structure of the network is still well connected
            a. Just 1 or 2 hops from the new employer
      iii. At a higher level
         1. Who has what information
         2. Who has new information
   b. Main idea
      i. “Strong” ties with close friends
         1. Connected in deliberate ways
      ii. “Weak” ties with far-off acquaintances
         1. Connected in far-off, different ways
      iii. These ties are structurally different
2. Lecture Outline
   a. Triadic closure
      i. Main idea
      ii. Measuring via clustering coefficient
   b. Bridges
      i. Connection to triadic closure
      ii. Measuring via neighborhood overlap
3. Triadic Closure
   a. Basic principle
      i. If two people B and C have a common friend A, then
      ii. B and C are likely to become friends themselves
         1. The B-C edge “closes the triangle”
   b. Why
      i. Opportunity
      ii. B and C are likely to be introduced via A
   c. Trust
      i. B trusts C because they have a common friend A
   d. Incentive
      i. A feels latent stress if B and C are not friends
   e. Homophily
      i. A, B, and C all like the same things
   f. End effect
      i. We expect friendships to “close the open edge”
      ii. Leading to lots of triangles
      iii. Many more than in a “random” network
         1. Social networks close the connections
      iv. How do we measure triadic closure?
         1. Clustering coefficient
4. Measuring triadic closure
   a. What are the properties of a score for triadic closure?
      i. Consider two nodes A and B
      ii. Suppose they have no common friends
         1. Does it matter to triadic closure?
            a. For triadic closure there needs to be a common friend
      iii. Situations
         1. No common friends
            a. No effect on triadic closure
         2. Three nodes with two connections
            a. High chance
         3. Three nodes all connected
            a. Higher chance
         4. Mutual friends all connected
            a. Highest chance
      iv. Effects
         1. No common friends -> no effect
         2. With common friend(s) -> closing the triangle is positive
         3. More common friends -> stronger effect
5. Clustering coefficient (CC)
   a. Look only at “V” shapes around each node
   b. Score depends on whether the ends of the V are friends or not
   c. CC of A = (number of “closed” V shapes around A) / (number of V shapes around A)
      i. 0 = horrible triadic closure
         1. None of my friends know each other
      ii. 1 = perfect triadic closure
         1. My friends and I are all connected to each other
   d. CC of A = 2 * (number of edges between friends of A) / ((number of friends of A) * (number of friends of A - 1))
6. What does triadic closure tell us?
   a. High triadic closure
      i. My friends all know each other
      ii. We are all close to each other
      iii. So we probably have access to the same kinds of information
   b. My close friends might not have new information
      i. The job openings they know of, I already know about
   c. Then who can give me new information?
      i. Bridges
7. Bridges
   a. Bridge
      i. An edge that is part of every path between the green and red nodes
      ii. Removing the bridge would disconnect the groups
   b. However, bridges are rare
      i. Networks are more robust
         1. Internet
         2. Power grid
   c. Local bridge
      i. Removing a local bridge
      ii. doesn’t disconnect the network
      iii. but increases path lengths
         1. Makes the commute harder between two nodes
   d. Net effect remains the same
      i. New information arrives via the bridge
8. Bridges and tie strengths
   a. Bridges bring new information
      i. Why should a bridge be an acquaintance?
         1. Strong ties between A and C and between A and B make a connection between B and C more likely
         2. Social networks have…
            a. STRONG TRIADIC CLOSURE
               i. If A has strong friendships with B and C, then B and C must be connected
         3. Suppose A --- B is a bridge
            a. Can it be a strong tie?
               i. Yes, but then closure must exist between all mutual friends of A and B
               ii. **** If a bridge is the only connection between two communities, then it MUST be a weak tie. It must be an acquaintance ****
                  1. Otherwise this creates interconnected communities and there is no longer a bridge
   b. “I got the job from an acquaintance, not a friend”
      i. You heard about the job from bridges, because your close friends are all in a clique, and new information only comes from bridges
   c. Who are bridges?
      i. Two close friends of mine are likely to know each other
9. Measuring bridge-ness
   a. Our earlier intuition
      i. Removing a bridge pushes communities farther apart
         1. In practice, this is too rigid
         2. A more robust measure is called neighborhood overlap
   b. Neighborhood overlap (bridges)
      i. How much overlap is there between friends of A and friends of B?
         1. Less overlap means
            a. Less exchange of information
            b. The communities of A and B are farther apart
            c. The A----B edge is more bridge-like
      ii. Jaccard Coefficient (JC) = (number of friends of both A and B) / (number of friends of either A or B)
         1. Smaller values -> more bridge-like, less closely connected
10. Tie Strengths on Facebook
   a. Maintained relationship
      i. Simply viewed the other profile
   b. One-way communications
      i. Commented or posted
   c. Mutual communications
      i. Comments reciprocated
   d. Analysis
      i. As we move up the hierarchy of communication, fewer and fewer friends appear and clusters break down
Notes MIS 373 1/20/16
1. Gephi and NodeXL
2. Group Project
   a. Present using Gephi and NodeXL
3. Group Assignment
   a. Two group assignments
      i. One group assignment
      ii. One group assignment with presentation
         1. Each group gets $50
            a. Find a client that has a Facebook page and build an advertising campaign
            b. Write up a report and present in front of the class: what worked, what did not work
4. Grading
   a. 1 group assignment (10%)
   b. 1 group assignment with presentation (20%)
   c. 1 group project with presentation (25%)
   d. 1 midterm
   e. 1 final
5. Material
   a. Some quantitative material
6. Attendance
   a. Attend both presentation days for each presentation (four days)
7. Book
   a. Networks, Crowds, and Markets: Reasoning About a Highly Connected World, by Easley and Kleinberg
8. Reading packet online