Information Visualization, CS 7450
These class notes were uploaded by Alayna Veum on Monday, November 2, 2015. The notes belong to CS 7450 at Georgia Institute of Technology - Main Campus, taught by Staff in Fall. Since upload, they have received 14 views. For similar materials see /class/234149/cs-7450-georgia-institute-of-technology-main-campus in Computer Science at Georgia Institute of Technology - Main Campus.
Date Created: 11/02/15
Information Visualization

Data Explosion
An incredible amount of data is available: weather [rest of list illegible]. Greatly facilitated by the Internet and the Web.

How Much Data?
Estimates between 1 and 2 [unit and remainder illegible].

Big challenge: How can we make sense of the data?

Data mining: software that analyzes a database and extracts interesting features.
Information visualization: visual tools that allow users to examine the data themselves.

Human Vision
Highest-bandwidth sense. Extends memory and cognitive capacity. People think visually.

Example: [illegible] the Data
[Data table not recoverable.]

Even Tougher?
What if you could only see 1 state's data at a time (e.g., the Census Bureau's website)?
What if I read the data to you?

Illustrates the Idea
Provide tools that present data in a way to help people understand and gain insight from it.
Cliches: "Seeing is believing." "A picture is worth a thousand words."

Visualization
Often thought of as the process of making a graphic or an image.
Really is a cognitive process: form a mental image of something; internalize an understanding.
"The purpose of visualization is insight, not pictures."
Insight: discovery, decision making, explanation.

Main Idea
Visuals help us think: [illegible] storage area.
External cognition [illegible] and reason.

An Example
[Figures: Napoleon's March; London Subway]

What is "Information"?
Items, entities, things which do not have a direct physical correspondence [illegible].
Examples: [illegible] connections between [illegible] or attributes.

Information Visualization
What is visualization? "The use of computer-supported, interactive visual representations of data to amplify cognition." From [Card, Mackinlay, Shneiderman '98].

Information Visualization: Essence
Taking items without a physical correspondence and mapping them to a 2D or 3D physical space.
Giving information a visual representation that is useful for
analysis and decision making.

Two Key Attributes
Scale: Challenge often arises when data sets become very large.
Interactivity: Want to show multiple different perspectives on the data.

Domains for Info Vis
Text
Statistics
Financial/business data
Internet information
Software

Components of Study
Data analysis: data items with attributes or variables; generate data tables.
Visual structures: spatial substrate, marks, graphical properties of marks.
UI and interaction.
Analytic tasks to be performed: browse, correlate, identify, associate.

More Examples
Seeing is believing.
Eureka [rest of slide illegible]

Case Study: Understanding [illegible] Hierarchies
Definition: [illegible; mentions nodes that have parents or ancestors].
Examples: file directory systems on computers, organization charts, [illegible] genus, object-oriented software classes.

Treemap
Space-filling representation developed by Shneiderman and Johnson, Vis '91.
Children are drawn inside their parent [illegible].
[Figures: Treemap examples; Map of the Market; Sunburst]

InfoVis Techniques
Aggregation: [illegible].
Overview and detail: provide both global overview and detail [illegible].
Focus + Context: show details of one or more regions in a more global context (e.g., fisheye).
Mantra: overview first, zoom and filter, details on demand.

To Learn More
[illegible]

Infovis Tools Evaluation - Graham Coleman

Datasets and Tasks
I chose my datasets based on interest, and I chose questions based on interest and an idea of what tasks might be theoretically difficult (clustering, similarity, prediction). In retrospect, some of these questions are pretty hard to ask without more complete statistics packages backing them up, but the questions are at least plausible for mutual fund and films databases.

Films
F1. What was the worst film in the data set?
F2. What was the worst film despite a popular actor?
F3. Who are the best actresses of medium
popularity?
F4. Display information on Through a Glass Darkly.
F5. Find similar films to Through a Glass Darkly that are directed by (a) Ingmar Bergman, (b) some other director.

Mutual Funds
M1/2. Find a pair of funds that are similar.
M3. Find a pair of funds that are very different.
M4. Make an interesting prediction about the future.
M5. Find some characteristics correlated with success for a fund.

Hypothesis
Spotfire - the scatter/correlation tool
InfoZoom - the spreadsheet/parallel coordinates tool
SeeIt - the presentation graphics/histogram tool
Eureka - the spreadsheet-on-steroids/||coords tool

Since InfoZoom and Eureka have similar approaches to infovis, they should perform similarly on my tasks. Spotfire should perform well on the correlation tasks and on the clustering tasks (M1-3, M5). SeeIt summarizes things by binning them and should be able to produce high quality chart outputs.

Questions
Which tool do I personally like the best?
Which feature seemed to afford the most insight?
What were surprising facts about the datasets I learned from each tool?
Write a section on why I chose the tasks.

General comments

Spotfire
Pro: View Tip is an interesting and surprising tool. Details-on-demand looks like it could be used for arbitrary media, so you could link it to a more in-depth domain knowledge tool. Filtering via dynamic query is something I'd like to try. I love the jitter feature; it is kind of fun.

SeeIt
The undo didn't work as expected. I made an unwanted change in the Binning but couldn't undo the change.

InfoZoom
From Overview mode, I see the prior distributions in the data very nicely. I like how operations in InfoZoom were constructive: filtering and derived attributes are reversible and only add to your view on the data, unlike in Eureka.

Eureka
Is it simplistic? There don't seem to be too many options. It seems to be hard to copy/paste some of the items.

Task processing

Spotfire
F1. Plotting year vs. popularity in the 2D scatter plot, I filtered the popularity down to the lowest entries using the dynamic query objects,
then zoomed on the popularity axis to get a better view of what was left. From there I could see the line of Popularity 0 objects. Selecting them and the Details-on-demand HTML led to a table representation of the films at the bottom left corner. I copy/pasted these to Excel.

Year | Title | Actor | Actress | Director | Popularity
1987 | Hot Child in the City | Prysirr, Geof | Hendrix, Leah Ayres | Florea, John | 0
1968 | Shalako | Connery, Sean | Bardot, Brigitte | Dmytryk, Edward | 0
1980 | It's My Turn | Douglas, Michael | Clayburgh, Jill | Weill, Claudia | 0
1976 | Shout at the Devil | Marvin, Lee | Parkins, Barbara | Hunt, Peter R | 0
1993 | Employee's Entrance | William, Warren | Young, Loretta | | 0
1983 | After the Rehearsal | Josephson, Erland | Olin, Lena | Bergman, Ingmar | 0
1963 | Silence, The | | Thulin, Ingrid | Bergman, Ingmar | 0
1930 | Anna Christie | Bickford, Charles | Garbo, Greta | Brown, Clarence | 0
1932 | Number Seventeen | Missing | | Hitchcock, Alfred | 0
1990 | A Chorus of Disapproval | Irons, Jeremy | Pigg, Alexandra | Winner, Michael | 0
1988 | Nightmare at Noon | Hauser, Wings | Beck, Kimberly | Mastorakis, Nico | 0
1992 | Flame & the Arrow, The | Lancaster, Burt | Mayo, Virginia | | 0
1953 | Tales of Tomorrow | Karloff, Boris | Missing | | 0
1970 | Airport | Lancaster, Burt | Bisset, Jacqueline | Seaton, George | 0

F2. We first want to filter on the popular actors and actresses. We'll define this as an actor/actress with at least one 75th-percentile-popularity movie, rather than with a particular frequency of popular movies.

[Screenshot: Spotfire view; contents not recoverable]

We drag the dynamic query object to display only popular movies. At this point, again, we have a hard time filtering on the actors represented. It doesn't help that the dynamic query for Actors can't be turned into a checkbox. I tried doing a histogram and using average values over popularity, but this didn't work, as you couldn't sort on the averaged value. I suppose I could produce the additional field in a spreadsheet, but I think that defeats the purpose. Why can't I select a subset of actors in the DQ object? Annoying.

F3. This task is similar to F2, on a higher order of difficulty.

F4. It took me a while to scroll
through items using the single item slider. This could be improved.

F5. This was simple enough. I located the movie, sorted on popularity and length, and filtered on Subject: Drama. I zoomed in on a cluster surrounding TAGD and selected a group, which appeared in the detail-on-demand subwindow.

M1/2. I decided to use Net Assets and Yield as similarity measures, as well as Morningstar rating. It looked like Net assets and Yield needed a log scale; this seemed to better distribute the data across the ranges. I encoded Morningstar rating by the color, and the size encoded the expense ratio. Finding a group or pair of items just boiled down to finding a group of similarly shaped and colored items close together.

[Screenshot: Spotfire scatter plot with Net assets and Yield axes. Legible entries in the Name column of the selection: BlackRock Intl Instl; MPIEX; Large Cap Inst; Large Cap Value II; Stock GMO Value; Small Cap Value Instl; PNSEX; Group Small Cap Value]

M3. We'll select a pair of diametrically opposed funds: a different size and color. This should be easy because of visual comparison skills.

Name | Symbol | YTD | Yield | Morningstar rating, Expense ratio, Mgr tenure, Net assets | Category
SSgA Active International | SSAIX | 1354 | 38 | [illegible] | International Stocks
American Century Equ Growth Instl | AMEIX | 2553 | 101 | [illegible] | Large Value
(Decimal points were lost in extraction.)

M4. I realize more and more this is a ridiculous question.

M5. This is where I pulled out the Spotfire coup de grace: the View Tip window. It displays pairs of variables, much like a correlation matrix, sorted in order of correlation of each pair of variables. That means it suggests which pairs of variables to look at, which is a nice hint. The 3MO vs. Yield graph looks interesting. It seems to suggest that companies with no yield have a variety of 3MO possibilities, but more yield constrains 3MO the more
it increases.

[Screenshot: Spotfire View Tip window; contents not recoverable]

Eureka
F1. This one couldn't have been easier. Sort by popularity, then focus: click on a value and drag until you pass the last popularity-zero entry. Your focused entries are displayed in maximum clarity, as opposed to the Spotfire bottom right details-on-demand.

F2. Filter by right-clicking on the column header; select only the top half of popularity movies. Then I hit a roadblock. I was trying to figure out how to filter on the global data using the subset of Actors that were represented in the high popularity movies. But for variables represented as Text, Eureka allows only substring matches. After chugging on this for a little while, I realized I could have it interpret this data in other ways: Column Properties > Type > Category. This allowed me to set up a filter based on the list of currently represented actors. But I hit another roadblock. There doesn't seem to be a way to undo the first (popularity > 44) filter, leaving only the popular actors filter, nor is there a way to store the filter as a file and refilter the global data. In this regard, it seems a senseless decision to have filters be destructive.

I'll work from a modified question: given the top ten actors in terms of average film popularity, find the worst film. There doesn't seem to be a derived variables capability in Eureka, so I'm not sure there is a way to calculate averages.

OK, for my third pass, I'll try to use one of Eureka's capabilities to answer my question. By sorting on Popularity, I can select the actors in popularity-88 movies. Then I filter the Actor column by the names currently in focus (after a sort on actor names, scrolling down the list and adding the top names by scrolling and inspection, holding the Ctrl key). This produces a smaller list, which we sort by popularity. Focusing on the lowest popularity ones, we read off a few:
1953 - Tales of Tomorrow with Boris Karloff, pop 0
1981 - Taming of the
Shrew with John Cleese, pop 2
1986 - Hoosiers with Gene Hackman, pop 2
followed by a string of John Wayne movies.

F3. To save time and pain, I'll do my best to define this one idiomatically. I'll focus a small swath of actors within the medium range of popularity and filter on those actresses. From there I can see which ones made popular movies and which ones got awards. Actresses that produced award-winning popular movies as well as mid-popularity movies: Meryl Streep (2), Sophia Loren (2), Kathleen Turner.

F4. Originally I was having trouble finding the entry by waving the mouse over the list. Then I filtered using a substring of the title, "glass", which worked. The entry is displayed in tabular format.

F5. (a) Focus on Through a Glass Darkly, subjectively selecting some variables which imply similarity. We can sort on these rows. I chose Director, Actress, and Subject. We see a small pocket of dramas with the Harriet Anderson / Ingmar Bergman team. Besides TAGD, they are Cries and Whispers, Dreams, and The Naked Night.
(b) Focus on TAGD and sort on Harriet Anderson; expand focus. Apparently in this database she has only acted under Bergman as director. We will choose another similarity criterion. We try similar-length-and-popularity dramas that won awards. By inspection we choose Sex, Lies and Videotape.

F6. I haven't really noticed any surprises yet with this tool, so I figure I should go hunting around for one. Sorting by subject, I was interested in what things could be correlated with movie genre. Looking to the left column, it was pretty clear that Drama movies had a large percentage of awards (16% of all dramas), whereas Comedy and Action movies seem to have fewer. However, Eureka seemed to offer no method of verifying this beyond scrubbing to count row indices and division in one's head.

[Screenshot: Inxight Eureka window showing the films table]

M1. Since the sorting seemed to be awkward for
complex criteria, I looked for funds that had similar YTD and Morningstar attributes. By sorting on these attributes and focusing on a group, I could choose a result. Once we've focused on a group, we can find additional similarities by sorting on some of the other attributes and eyeballing the others. Republic Equity A, Johnson Growth, IDS New Dimension IDS, and Kent Growth and Income Instl seem to be similar based on Morningstar, YTD, 3MO, and 3YR.

M3. AIM Weingarten Instl seems to be dissimilar to Lazard Bantam Value Instl.

Columns: Name, Symbol, YTD, 3MO, 1YR, 3YR, 5YR, 10YR, Yield, Morningstar rating, Expense ratio, Mgr tenure, Net assets, Category
(Values are listed in source order; blank cells were dropped in extraction, so they cannot be reliably aligned to columns.)
AIM Weingarten Instl: 33.030000, 27.610000, 33.030000, 25.770000, 21.950000, 0.390000, 0.640000, 7.000000, 89623964.000000, Large Growth
Lazard Bantam Value Instl (LABVX): 13.820000, 11.940000, 13.820000, 0.000000, 1.050000, 2.000000, 59952338.000000, Small Value

M4. I think M4, as I asked it, could be an impossible question. I'll try to restrict my question to finding a fund that performs well. Given my limited knowledge of mutual funds, I'll try to choose a subset of the variables that might be good and select a fund on that basis. By sorting on YTD, assets, and expenses, I hoped to find a good company. I chose Profunds UltraOTC Inv.

M5. I have no idea how to define success for a fund, but for this example we try to find correlations in the data with high YTD, 1YR, and 3YR values. Sorting on these fields, I tried to identify correlations within the parallel coordinate views. High YTD, 1YR, and 3YR values seem to be strongly positively correlated with 5YR values, correlated with high Morningstar rating, negatively correlated with Yield, and weakly correlated with Net Assets.

InfoZoom
F1. This process is similar to Eureka. In Compressed mode, we sort on Popularity by clicking the arrow to the left of the row. Then we go to Wide mode to see the results, which we see legibly by row.

F2. This was as easy as it should have been in the other tools, which didn't have some way of handling this operation, similar to a JOIN in relational
databases. I selected the upper percentage of movies by Popularity (72-88) in Compressed mode, then I selected the list of Actors represented by right-clicking the row and selecting all. Then I removed the constraint on Popularity and went to Wide mode to see the low popularity movies. It returned a smaller set than before, which I checked by undoing the constraint on actor afterwards.

Results: Karloff, Boris; Douglas, Michael; Mastroianni, Marcello; and Lancaster, Burt all made good movies (Pop > 71) but also made absolute stinkers (Pop 0).

[Screenshot: InfoZoom Wide mode showing the low-popularity films of these actors, including It's My Turn; remaining contents not recoverable]

F3. I wanted to use the derived attribute feature that I just discovered, so I created an attribute for Average quality per Actress. In this case I want the highest avg quality given a minimum number of films. First, I needed to create another attribute representing the number of films per actress, excluding the films in which the actress was Missing or blank. Next, we can filter out actresses with fewer than 5 films. Finally, we sort on Avg Quality and learn: Sandahl Bergman was in five films with an avg popularity of 68.4: Raw Nerve, Programmed to Kill, Hell Comes to Frogtown, Conan the Barbarian, and Kandyland.

F4. To find TAGD, I went to Wide mode and scrolled until I found the entry. After widening the column, everything is displayed clearly.

F5. First I zoom into Drama, then into Harriet Anderson's leading partner, Gunnar Björnstrand, who acted with her in two other dramas, Dreams and The Naked Night, also directed by Bergman. Other similarity might have been a little more difficult, but this seemed pretty easy. Currently my filters included only Bergman movies, so I had to zoom out a bit. To add other filters, I had to use value lists to select ranges, which was much easier than it seemed it would be. I selected dramas of equivalent length (85-95) made around the same decade
(1960-1970). Of the 20-something records returned, The Man Who Haunted Himself, with Roger Moore and Hildegard Neil, seemed to be the most popular.

M1. My general strategy here is to zoom in on regions of a similar sorted variable, then sort on another variable. By using categories of Expense ratio, Net assets, Mgr tenure, and Category, I was able to pick the pair of Nicholas-Applegate Intl Core Gr I and William Blair International Growth. But this strategy worked only so-so. Many of the yearly and monthly prices were different, so the similarity of these items is dubious.

M3. I pushed the iterative zoom method further and was able to filter on many attributes, maintaining high and low sets for each attribute. I came up with two funds.

M4. Bypassed for lack of good ideas.

M5. We start by finding a group of successful funds. After choosing a group of funds that had high 3YR and filtering on that variable, then sorting on Yield, it was clear that lower yield funds tended to have higher 3YR values.

SeeIt
F1. Just to play around with SeeIt, I did some stuff with bins and histograms, and then for the obvious answer I sorted the table and found the same list. Handy.

F2/3. There didn't seem to be a good way to do these types of filtering. Once I selected a set of popular movies, for example, there didn't seem to be a fast way to filter on those actresses, much like Eureka and Spotfire. The only tool that did this well was InfoZoom.

F4. To find TAGD, I sorted the film list by title and found the record I was seeking.

F5. First I filtered on Subject, then graphed by Length and Popularity. By brushing over the entry with TAGD, I found the similar films by length, pop, and subject.

M1. We graph on Morningstar and 3YR and then encode additional properties into the color, width, and height of the records. The problem of similarity is reduced to finding closely spaced records that look similar.

M3. We find dissimilar items by zooming out and selecting things that are far apart and dissimilarly shaped. We can then sort by selection status.
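The SeeIt M1 and M3 strategies treat similarity as closeness in the plot and dissimilarity as distance. As a rough sketch of the same idea outside the tool, one could scale a few attributes to a common range and compare Euclidean distances between funds. Everything specific here is an assumption made for illustration: the fund names, the attribute names, the values, and the choice of min-max scaling with Euclidean distance, none of which come from the tools or the dataset.

```python
from itertools import combinations
from math import sqrt

# Hypothetical fund records: names and values are invented for illustration.
funds = {
    "Fund A": {"morningstar": 4, "three_year": 22.0, "yield": 0.5},
    "Fund B": {"morningstar": 4, "three_year": 21.0, "yield": 0.6},
    "Fund C": {"morningstar": 1, "three_year": 5.0, "yield": 3.0},
    "Fund D": {"morningstar": 3, "three_year": 15.0, "yield": 1.5},
}


def normalize(records):
    """Min-max scale each attribute to [0, 1] so no attribute dominates."""
    attrs = sorted(next(iter(records.values())))
    lo = {a: min(r[a] for r in records.values()) for a in attrs}
    hi = {a: max(r[a] for r in records.values()) for a in attrs}
    return {
        name: [
            (r[a] - lo[a]) / (hi[a] - lo[a]) if hi[a] > lo[a] else 0.0
            for a in attrs
        ]
        for name, r in records.items()
    }


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def closest_and_farthest(records):
    """Return (most similar pair, most different pair) of record names."""
    points = normalize(records)
    pairs = sorted(
        combinations(sorted(points), 2),
        key=lambda p: distance(points[p[0]], points[p[1]]),
    )
    return pairs[0], pairs[-1]


most_similar, most_different = closest_and_farthest(funds)
print("most similar:", most_similar)
print("most different:", most_different)
```

The scaling step matters for the same reason the Spotfire section needed a log scale for Net assets: without it, the attribute with the largest raw range would decide every comparison.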
[Screenshot: SeeIt view of the mutual funds table; contents not recoverable]

M5. We graph 1YR and 3YR on the X and Y axes as measures of success. We encode the Yield, Mgr Tenure, and Expense Ratio in other visual parameters of the data: width, color, and height. The correlation with width and X/Y pops out immediately: it turns out Yield is negatively correlated with success.
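A visual impression such as "Yield is negatively correlated with success" can be checked numerically with a Pearson correlation coefficient, which none of the walkthroughs above computes directly. The sketch below is only an illustration: the yield and 3YR values are made up, not taken from the mutual funds dataset, and the function name `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: invented to show a negative relationship.
fund_yield = [0.0, 0.5, 1.0, 2.0, 3.0]
three_year = [30.0, 25.0, 22.0, 15.0, 10.0]

r = pearson(fund_yield, three_year)
print(f"correlation between yield and 3YR: {r:.2f}")
```

A coefficient near -1 would support the visual reading, while a value near 0 would suggest the apparent pattern is weaker than the encoding makes it look.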