# Class Note for CUIN 7317 at UH

This 53 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Houston taught by a professor in Fall. Since its upload, it has received 19 views.

Date Created: 02/06/15

## Visualisation 1: Graphical Communication

Student notes. ITTI Computer Graphics and Visualisation.

Computer Graphics Unit, Manchester Computing Centre, University of Manchester
Department of Computer Science, University of Manchester

## Table of Contents

- 1 Introduction
  - 1.1 Scope
  - 1.2 Definitions
  - 1.3 Parts of a graph
- 2 Communication
  - 2.1 Why draw a graph?
  - 2.2 Measuring information content
  - 2.3 Measuring truthfulness
- 3 Chart forms
  - 3.1 One dimensional charts
  - 3.2 2D scatter charts
  - 3.3 Extending the scatter plot
  - 3.4 Line and area charts
  - 3.5 Glyph charts
  - 3.6 Histograms
  - 3.7 Composite charts
  - 3.8 Time charts and series
  - 3.9 Geographic charts
  - 3.10 Charts with non-orthogonal axes
- 4 Drawing together
  - 4.1 Selecting a chart type
  - 4.2 Layout and design
  - 4.3 Getting more from the data
- A Glossary
- B References

## 1 Introduction

### 1.1 Scope

This module is about communication using graphics, for the scientist, engineer or medic. It is the first of two which cover visualisation. This module concentrates on the graphical depiction of relatively small data sets to convey information about the behaviour of that data to others. This process may be iterative, but in general it produces a static, non-interactive result which may, for example, be printed and used in a journal article.

The next module applies the techniques learned here to the visualisation of larger, probably multidimensional data sets, to interactive exploration of the data, and to the production of graphical representations which may be dynamic, possibly interactive, and might perhaps be output to video tape to be shown at a conference.

The present module draws on many fields, from computer graphics to graphical design, from statistics to cartography. Where appropriate, pointers will be given to particular areas which are only touched on here but can be followed up in more detail by those with a
particular interest.

### 1.2 Definitions

It is well to have the definition of some terms clear at the start. Throughout these notes, terms in bold, like this, will be found in the Glossary (Appendix A).

The terms graph, chart and, to a lesser extent, diagram and plot are used interchangeably in the literature; examples of each will be found in these notes.

Graphs depict the value or values of at least one variable: the changeable quantity being measured or simulated. One dimensional graphs depict the value of a single variable for one or more cases, for example the mass of three animals. Two dimensional graphs depict the relationship between two primary variables, such as mass and metabolic rate. The visual display of relationship is a prime reason for drawing graphs, hence two dimensional graphs are more frequently encountered than one dimensional graphs. The value of additional, secondary variables may be shown on a two dimensional graph, but their relationship with the primary variables is not shown. Three dimensional graphs show the relationship between three or more variables; higher dimensional graphs are possible but infrequently encountered.

This dimensionality of the graph should be distinguished from the dimensionality of the data. For example, the concentration of seven chemicals might be measured in an industrial process at various times and at known conditions of pressure and temperature. This yields a four dimensional data set: pressure, temperature, concentration and time. Each chemical is a discrete entity; the seven chemicals do not constitute seven dimensions, only one.

A graph might be drawn showing the relationship between temperature and the concentration of one chemical, giving a two dimensional graph. In this case temperature is the independent variable because its value changes regularly. Concentration is a dependent variable because its value depends on the values of the independent
variables. In general the values of independent variables are set or adjusted; the resulting values of dependent variables are then measured. The set of possible values of an independent variable is termed its domain.

Conventionally, the value of the independent variable is plotted on the horizontal or x axis and is termed the abscissa; the value of the dependent variable is plotted on the vertical or y axis and is termed the ordinate. These conventions may be relaxed for some charts, as will be seen. In polar coordinates the independent variable is the angular measure and the dependent variable the radial measure.

A variable may take continuous or discrete values. Temperature is an example of a continuous variable: given any two temperatures, there is another in between them. Any sort of count is a discrete variable; a vehicle can have three wheels or four, but not three and a half, for example. Discrete variables can have continuous variables derived from them, of course. The mean count of something is a continuous variable.

Data may appear to be discrete due to limitations of sampling, but the underlying entity is still continuous. For example, an instrument to measure depth may give readings which are only accurate to three metres. In this case the experimental design should take this into account to avoid bias, and the variable is treated as continuous for the purposes of calculation.

Some authorities describe a third class of variable, one whose values are drawn from an enumerated set. The values of such variables cannot be ordered and so cannot be plotted against a scale; instead they are arranged in arbitrary order along an axis. This is certainly a valid view. However, in these notes such a situation is referred to as a multiple, and the set of these instances is not considered a variable as such.

### 1.3 Parts of a graph

The names of the constituent parts of a graph are shown in Figure 1.
Figure 1: The parts of a graph. (The example plot, "Population distribution in three areas", is annotated with its title, key, x and y axes, tick marks, tick labels, axis labels, data points and data source.)

## 2 Communication

### 2.1 Why draw a graph?

Consider Figure 2, which shows the population of Greater Manchester. It is a large, misleading and essentially useless graph which takes nearly twenty square inches to display two numbers; the bar labelled "both" is just the sum of male and female. Because the population scale does not start at zero, it appears that the sum is much greater than its two constituents. The fill patterns used in the bars distract from the data.

The same information could be displayed more succinctly and with greater accuracy by simply quoting the numbers: 1,269,557 females and 1,185,536 males. The graph communicates nothing.

Figure 2: An uninformative graph. (Bar chart titled "Population of Greater Manchester 1991"; y axis: population in millions; bars: Both, Female, Male. Source: The 1991 Census, Crown Copyright, ESRC purchase.)

Now look at Figure 3, which is just made up data. The crescent shaped distribution is immediately obvious, as is the compact and distinct group of outliers. In this case the graphical presentation has clearly communicated something which could only with difficulty be detected in the raw numerical data.

Figure 3: An informative graph.

From these two examples, some desirable properties of graphs emerge. An effective graph should:

- present a reasonable amount of data
- say something about the behaviour of that data
- avoid giving a false impression of the data

In other words, the graph must communicate something. This has been well put: There cannot be too much emphasis on our need to see behaviour. Graphs force us to note the unexpected;
nothing could be more important. (Tukey 1977, pp. 128, 157)

### 2.2 Measuring information content

#### 2.2.1 Absolute measures

A simple numerical measure of the amount of data presented by a graph was proposed by Tufte (1983):

$$\text{data density} = \frac{\text{number of entries in data matrix}}{\text{area of data graphic}}$$

The data matrix refers to the amount of information in the graphic. Figure 2 has four entries (two numbers and two labels), giving a data density of 0.2 per square inch. Figure 3, with 196 entries, has a data density of 28.4 per square inch, a considerable improvement.

While a good data density is desirable, thought must still be given to the design to avoid confusing clutter. Effective graphs do not just happen.

#### 2.2.2 Relative measures

One way to combine high data density with lack of clutter is to maximise the proportion of the graphic which deals with the data and minimise the non-data parts such as grids, shading, hatching and so on. This leads to the concept of data ink, which represents the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented (Tufte 1983, p. 93).

Data ink ratio is simply the proportion of the total ink which is data ink. A good design will increase the data ink ratio while still being easy to understand. For example, the data shown in Figure 4 is obscured by the heavy grid, the frame and the large lettering. Each data point is represented three times: by the height of the left edge of the bar, the height of the right edge, and the position of the top of the bar.

Figure 4: Low data-ink ratio graph. (y axis: mean monthly temperature, °C; x axis: date, 1984 to 1987.)

Figure 5, which shows the same information, is more readily understood as the data ink forms a greater part of the whole.

Figure 5: A version of Figure 4 with a higher data-ink ratio.

Designing
for ease of assimilation can conflict with the goal of maximising data ink. It is possible to be too minimalistic; some redundancy speeds understanding. While numerical measures can be helpful, it is the perceived crowding and busyness of the graph which must be minimised, and this does not always tally with the data ink ratio.

### 2.3 Measuring truthfulness

Consider the graph in Figure 6, which shows the daily number of dives made by a single elephant seal.

Figure 6: Misleading histogram. (Title: "Number of Dives for a Female Elephant Seal in Early February 1991"; y axis: number of dives; x axis: date.)

Note that in this histogram the bars do not start at the origin. This emphasises the variation in the data, but can be misleading. The variation in height of the smallest and largest bars is (74 − 60) / (63 − 60), or 4.67; however, the variation in the data is 74 / 63, or 1.17.

Tufte (1983) uses the term lie factor to describe this effect. In this case the lie factor is 4.67 / 1.17, or 4. Tufte states that an acceptable value of lie factor is 0.95 to 1.05, although he gives examples of particularly poor graphs that achieve lie factors of 14 or more.

Here is another version of the same data, with the bars starting at the origin.

Figure 7: Revised histogram. (Same title and axes as Figure 6.)

It is clearly seen from this graph that the number of dives per day is quite similar.

## 3 Chart forms

It is helpful to attempt a classification of the different categories of graph so that they may be compared and the strengths, weaknesses and suitability of each assessed. There are many possible means of classification, varying in scope and complexity, and also in the reason for attempting a classification. The method adopted here is convenient for the purposes of these notes, but is not presented as superior to alternative schemes. Examples of other approaches include
Brodlie (1992), Earnshaw (1991) and Haber, Lucas and Collins (1991).

### 3.1 One dimensional charts

The essential feature of a one dimensional chart, as the term is used in these notes, is that a single axis of measurement is used. Other axes may be used to prevent data points overlying one another, but these are merely for spacing and the position along these extra axes is arbitrary.

#### 3.1.1 Single axis graph

As the name implies, this one dimensional graph has points marked off along a single axis. The example in Figure 8 has a text label to identify each point.

Figure 8: A distance line. (Axis: miles, 0 to 300.)

#### 3.1.2 Bar graph

Although this is superficially a two dimensional graph, the bars are not ordered on the x axis; it is simply used to space them out. Effectively the bar graph is a small multiple, although it is not usually considered as such. The x axis is an enumeration rather than an ordered set of values.

Bar graphs are valuable for comparing a group of values from an unordered set, because the shared baseline allows the heights of the bars to be readily compared. The example in Figure 9 shows the population of each district in Greater Manchester. Although it looks to be a two dimensional graph at first glance, it clearly could be redrawn in the style of Figure 8.

Figure 9: Population levels in Greater Manchester. (y axis: population, thousands.)

Be aware that some books use the terms bar chart and histogram loosely, or even interchangeably. The general usage followed in these notes is:

- unordered values: bar chart
- single ordered x,y pairs: impulse chart (Section 3.2.2)
- grouped ordered x,y pairs: histogram (Section 3.6)

#### 3.1.3 Block diagram

This compares area rather than height. It can be misleading, especially if a 3D effect leads one to compare volume rather than area. It is
often seen in advertising, masquerading as a bar chart: while seeming to use height, it invites the viewer to compare areas and obtain a false picture of the data. An example of the genre is shown in Figure 10, which uses pictorial blocks to compare the size of various armies. The issue of misleading dimensionality is discussed in Huff (1973).

Figure 10: Misleading block diagram. (Size of army, millions.)

The block diagram is fine as long as it is clearly labelled as an area comparison, and it is a good way of avoiding too large a spread of bar heights in a bar chart when the data is a mixture of very small and very large values. An alternative is to draw a bar chart of the square root of the variable.

#### 3.1.4 Pie chart

This venerable and much used chart form has little to recommend it. It is used to display the proportions of a whole. Only a single data set can be shown on each graph, which tends to make it data poor. It is harder to compare angular measure than linear measure such as heights, even within a single chart; between charts it is even harder. To compensate for this, it is common practice to include the numerical percentages on the chart or in the legend, as seen in Figure 11. The data might then be as well presented as a small table.

Figure 11: Simple pie chart. (Title: "Percentage Aged in Greater Manchester"; segments: children, young adults, working age, pensioners.)

Pie charts can be used in multiples, in which case the radius can be varied to encode another variable. The use of such charts as glyphs is considered further in Section 3.5.

#### 3.1.5 Divided rectangle

Like the pie chart, this shows proportions; however, the use of a linear rather than an angular measure makes comparison of the segments easier. The divided rectangle chart is useful in small multiples as an alternative to pie charts. An example is shown in Figure 12.
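As an illustration of the linear measure used by a divided rectangle, the following minimal sketch turns a set of counts into segment boundaries along a bar. The function name `divided_rectangle` and the example counts are hypothetical and are not taken from the notes; only the category names echo Figure 11.

```python
def divided_rectangle(counts, width=1.0):
    """Split a bar of the given width into segments proportional to counts.

    counts maps a label to a non-negative number. Returns a list of
    (label, start, end) tuples, in the order the labels were supplied.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive total")
    segments = []
    start = 0.0
    for label, value in counts.items():
        # Each segment's length is its share of the whole, on a linear scale.
        end = start + width * value / total
        segments.append((label, start, end))
        start = end
    return segments


# Made-up proportions, purely for illustration.
example = {"children": 20, "young adults": 15, "working age": 45, "pensioners": 20}
for label, start, end in divided_rectangle(example, width=10.0):
    print(f"{label}: {start:.1f} to {end:.1f}")
```

Because every segment is measured along the same straight edge, segment lengths can be compared directly, which is the advantage over angular measure claimed above.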
Figure 12: The same data as Figure 11 expressed as a divided rectangle chart.

### 3.2 2D scatter charts

This is the basic 2D chart type from which other types are conceptually derived. A scatter plot shows the relationship between two variables. It may use cartesian or polar coordinates as appropriate. Cartesian (rectangular) coordinates are met with more commonly, but polar (radial) coordinates are a good choice when the range of the independent variable is periodic: an hour, day, year or rotation angle.

Conventionally the independent variable is plotted on the x axis; the measured value of the dependent variable is plotted on the y axis. The position of each datum is shown by a small marker.

It is also possible to plot the relationship between two dependent variables, for example when looking for correlation. This has been done in Figure 13, which shows the measured rates of ascent and descent on each of 407 dives made by a female southern elephant seal over a five day period.

Figure 13: Scatter diagram showing little correlation. (Title: "Comparison of ascent and descent speeds"; x axis: descent rate, m/s; y axis: ascent rate, m/s.)

The example in Figure 13 shows little relationship between ascent and descent rates for each dive. There is a suggestion of a diagonal line at low rates, but too few data points to make this firm. The layout of this particular plot could also be criticised: the y axis label is badly placed, as is the title, and the legend is obscured by the data. A legend is superfluous in any case, as only one data set is being plotted.

#### 3.2.1 Multiple scatter plots

It is possible to plot multiple data sets on a
single scatter chart if each set is clearly distinguished by size, shape, colour or, preferably, a combination of these attributes, and clearly labelled. Colours should be readily distinguishable. Undesirable crowding of the data points can be reduced by a suitable choice of axes. For example, in Figure 14 a logarithmic x axis gives a more even distribution of data points.

Figure 14: Reducing crowding by choice of axes.

A tabular matrix form of scatter plot, described in Becker, Cleveland and Wilks (1987), is sometimes used to plot all combinations of a few (typically between three and seven) variables. An example is shown in Figure 15. Note that each scatter plot is depicted twice, mirrored on the major diagonal, and that a significant amount of space is taken up by axis labelling.

Figure 15: A scatterplot matrix. (Variables: diving depth, m; descent rate, m/s; ascent rate, m/s.)

#### 3.2.2 Impulse chart

This simple variation of the scatter plot, shown in Figure 16, has a thin line from the x axis to each data point. As with other forms of scatter plot, it emphasises point values rather than trends. The lines do little to aid comprehension and result in a worsening of the data-ink ratio; they may also obscure groups of values which have the same x value but differing y values.

Figure 16: An impulse chart of the same data as Figure 13. (x axis: descent rate, m/s; y axis: ascent rate, m/s.)

### 3.3 Extending the scatter plot

The 2D scatter plot develops in four ways, as shown in Figure 17:

- (a) adding connectivity to each point, to give line and area charts
- (b) adding encoded information to each point, with glyphs
- (c) adding grouping, such as with histograms
- (d) forming composite charts: two or more types of chart superimposed
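The grouping step in item (c) can be sketched in a few lines. This is a minimal illustration only, assuming equal-width, half-open intervals; the function name `group_counts` and its parameters are hypothetical and do not come from the notes.

```python
import math


def group_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal intervals.

    Interval i covers [start + i * width, start + (i + 1) * width).
    Values outside the covered range are ignored (an assumption made
    here for simplicity).
    """
    counts = [0] * n_bins
    for v in values:
        index = math.floor((v - start) / width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Five made-up x values grouped into three unit-wide intervals from 0.
print(group_counts([0.5, 1.5, 1.7, 2.0, 9.9], start=0, width=1, n_bins=3))
```

The resulting counts are what a histogram plots as bar heights, with one bar per interval.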
Figure 17: Development of the scatter plot.

### 3.4 Line and area charts

These connect the data points in order to show the connections between points and the order in which they come, if there is an order. Figure 18 shows how two apparently similar scatter charts are shown to be very different once connectivity information is displayed. Line charts also emphasise trends, compared to scatter plots.

Figure 18: Line charts display connectivity.

#### 3.4.1 Multiple line charts

More than one data set can be plotted. As with scatter charts, the different sets are distinguished by colour or style of line. Figure 19 shows a multiple line chart.

Figure 19: Line chart with multiple lines. (Series: female elephant seal, male elephant seal; x axis: time of day, hours; y axis: dive duration, minutes.)

Markers for points, used in addition to lines, make it clear where the data is and what is interpolation. This is particularly important when the data closely follows a straight line or a smooth curve and it is not easy to see how many data points have been plotted. This is shown in Figure 20: given a line alone, it is difficult to tell how well it represents the behaviour of the population from which the sample was drawn. Adding small markers gives an immediate impression of sample size; the graph to the left has very few data points, whereas the graph to the right supports the idea that the knee in the data is genuine and not an artefact of sampling.

Figure 20: Showing data positions in a line graph.

#### 3.4.2 Stacked line chart

This is a method of displaying a multiple line graph where it is sensible to add together the data sets to form a grand total. Each
data set is stacked on the previous one, forming a cumulative total. This has the advantage that lines do not cross, as they may in a multiple unstacked line graph. This design also emphasises consistent variation between data sets. Figure 21 shows the amount of time spent at the sea bottom by a female elephant seal over a three day period. By stacking the lines for each day, consistent diurnal variation is emphasised over random day-to-day variation.

Figure 21: Cumulative bottom time over three days. (x axis: time, hours; y axis: cumulative bottom time, minutes; one series per day in February 1991.)

The disadvantage is the difficulty in comparing different parts of the graph, as there is no stable baseline. This effect can be somewhat reduced by putting the most stable elements at the bottom.

As an alternative to stacked line charts, consider using small multiples of line charts, or use two charts: a line chart to show the overall trend (or with multiple lines, if required, to show absolute amounts) and a group of divided rectangle charts to show the proportions for each sample.

#### 3.4.3 Area chart

This chart form uses a filled area: either the closed area bounded by a line or, for open lines, the area between a line and one of the axes. Of little use for a single data set, it is good for more than one. If each data set forms a closed line, this is often termed a bounded region chart. Figure 22 shows the closing price of a fictional share over the course of six days. An area chart shows the highest and lowest prices each day by filling to the x axis.

Figure 22: Sample area chart. (Series: price limits, close price; x axis: date; y axis: share price, pence.)

#### 3.4.4 Radar chart

This is the radial form of the line or area chart: the independent variable is plotted as an angle and the value of the dependent variable as a radius. Figure 23 shows the depth of each dive made by a female
elephant seal over the course of 24 hours; because the time of day is a circular measure, it is suitable to be plotted in a polar form.

Figure 23: Radar chart of dive depths over one day. (Title: "Depth of Dive for a Female Seal on the 4th February 1991"; angle: time of dive, hours; radius: depth of dive.)

#### 3.4.5 Stacked radar chart

This is the radial form of the stacked line graph and has similar applicability.

Figure 24: Cumulative bottom time over three days. (Angle: time, hours.)

#### 3.4.6 Divided circle chart

Similar to the stacked radar chart, this shows proportion on the radial axis. The distinguishing feature is that the outer edge is a circle, corresponding to 100%, whereas a stacked radar chart generally has a ragged edge.

Figure 25: An example divided circle chart. (Angle: time of day, hours.)

### 3.5 Glyph charts

The distinguishing feature of glyph charts is that extra information is encoded into each datum by forming some sort of symbol or glyph whose interpretation is clear and unambiguous. This implies that the glyph should be explained somewhere, particularly if it is complex or encodes a large number of variables. Simple glyphs can be described on the graph by showing some examples as a key.

#### 3.5.1 Vector plots

These have an arrow as the marker for each datum, the angle of the arrow conveying additional directional information. Examples of usage include flows and stresses. Figure 26 plots the computed ion density in a simulation of solar activity over two days. The longitude and latitude of the flow are used to form an arrow glyph by using the longitude as an x offset and the latitude as a y offset for the arrowhead. The offsets are scaled appropriately so that the visual effect of a given change in longitude is the same as a similar change in latitude.
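One way the scaling described above might be done is sketched below. This is an assumption-laden illustration, not the method used to draw Figure 26: the function name `arrow_head`, its parameters, and the idea of converting through the physical size of the plot are all hypothetical.

```python
def arrow_head(x, y, lon, lat, x_span, y_span, width, height, length_per_degree):
    """Return the data coordinates of the arrowhead for a glyph based at (x, y).

    lon, lat          -- direction components in degrees (x and y offsets)
    x_span, y_span    -- range of data values covered by each axis
    width, height     -- physical size of the plot area (same unit for both)
    length_per_degree -- physical arrow length drawn per degree
    """
    # Data units per unit of physical length, separately for each axis.
    x_units = x_span / width
    y_units = y_span / height
    # Convert the same physical length per degree into each axis's data units,
    # so one degree of longitude looks as long as one degree of latitude.
    dx = lon * length_per_degree * x_units
    dy = lat * length_per_degree * y_units
    return x + dx, y + dy


# Made-up values: a glyph at day 16.0 and density 2.0 on a square plot area.
head = arrow_head(16.0, 2.0, lon=10.0, lat=10.0,
                  x_span=2.0, y_span=1.0, width=4.0, height=4.0,
                  length_per_degree=0.01)
print(head)
```

Without the per-axis conversion, an arrow drawn on axes with very different ranges would be stretched along one of them, and its angle would no longer reflect the direction of the flow.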
Figure 26: Plasma flows in the solar atmosphere. (x axis: day number; y axis: ion density per cc.)

#### 3.5.2 Numeric glyphs

These are occasionally seen, the numbers indicating the value of some additional variable. When the number of data points becomes even moderately large, numbers overlap and are difficult to read. They also do not present a graphical impression of the data; it would be better to form a geometric glyph.

In some cases numbers are used to denote the sequence of the data values when this is not obvious. The resulting chart can look cluttered and resembles a child's dot-to-dot puzzle book. A much clearer communication of the sequence can be made by joining up the dots to form a line graph. Examples of both forms, with the same data, are shown in Figure 27.

Figure 27: Two presentations of connectivity.

Where the resulting line crosses itself repeatedly, the data can be split into a number of sections and each connected by a different coloured line.

#### 3.5.3 Text labels

If a plot shows the relationship between two numerical properties of an enumerated set, text labels can be used as glyphs to identify the data points. This technique works best when the data set is small and shows little clustering. As a text string only indicates an area rather than a point, it is conventional to indicate the position of each datum by a small marker and place the label to one side of this. The clarity of these graphs can often be improved by manually rearranging the labels to give less overlap.
Figure 28: Relationship between two benchmarks, an example of text label glyphs. (x axis: floating point, SPECfp92; points are labelled with workstation models such as DEC Alpha AXP, HP PA-RISC and Sun SPARCstation.)

#### 3.5.4 Error bars

These show the expected limits of observational error, and are commonly used when a single datum represents the mean of a series of measurements. Errors may be shown in the x direction, the y direction, or both. The usual measure to plot is the standard error of the mean (s.e.), which is the standard deviation $\sigma$ divided by the square root of the number of observations $n$: $\text{s.e.} = \sigma / \sqrt{n}$.

Figure 29: Error bars. (x axis: time, minutes; y axis: median filtration rate, ml/s.)

Error bars are frequently symmetrical but need not be, particularly when a transformation such as the inverse, logarithm or square of the data is plotted.

Packages vary in their ability to plot and correctly handle error bars. Some require that the ends of the error bar data be specified as absolute positions, some as relative offsets. Good packages allow full specification of positive and negative error bar positions in both x and y. Unfortunately, most packages do not correctly compute error bars when data is transformed, or when two data sets are added or subtracted.

#### 3.5.5 Box and whisker plots

A single box and whisker plot is a means of graphically depicting the statistical properties of a data set. It generally shows a five figure summary: the median, hinges and extremes. These summaries are described in Tukey (1977).

Figure 30: Box and whisker plot. (Labelled parts, top to bottom: upper extreme, upper hinge, median, lower hinge, lower extreme.)

A single box and whisker plot is not especially useful, and a simple tabulation of the five numerical values may be preferable. In multiples, as glyphs, however, as in Figure 31, they can be a succinct means of summarising the behaviour of a large amount of data.
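The five figure summary behind each box and whisker glyph can be computed as in the sketch below. The notes do not spell out how the hinges are defined, so this assumes the usual convention that a hinge is the median of one half of the sorted data, with the overall median included in both halves when the number of values is odd. The function names are hypothetical.

```python
def _median(sorted_values):
    """Median of a non-empty list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_figure_summary(values):
    """Return (lower extreme, lower hinge, median, upper hinge, upper extreme)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    # Size of each half; the middle value is shared when n is odd (assumed convention).
    half = (n + 1) // 2
    lower_hinge = _median(data[:half])
    upper_hinge = _median(data[n - half:])
    return data[0], lower_hinge, _median(data), upper_hinge, data[-1]


# Made-up sample, purely for illustration.
print(five_figure_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
```

Other definitions of the quartiles exist and give slightly different box edges for small samples, so a plotting package's boxes may not match these values exactly.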
Figure 31: Box and whisker glyphs. (x axis: month; y axis: monthly sunspot count.)

#### 3.5.6 HiLo charts

In these charts the glyphs show the range of each datum, and also possibly the mean or median value. A variant commonly used by economists to represent share values depicts the opening and closing values by horizontal bars and the highest and lowest values by a vertical bar.

Figure 32: HiLo glyphs. (Labelled parts: high, close, open, low.)

Figure 33 depicts the prices of a fictional share over a six day period using a hilo chart. The same data is also shown on an area chart, which shows the share's performance more readily. However, the hilo chart is the conventional form for this type of information; presenting information in a form with which the target audience is familiar speeds uptake of information and hence enhances graphical communication.

Figure 33: Share price charts using hilo glyphs (left) and area charts (right). (x axis: date; y axis: share price, pence.)

#### 3.5.7 Icons

These glyphs use a variety of symbols to depict the values of a discrete variable. The icons should be readily distinguishable. The shape of each icon should preferably convey some meaning, to avoid too frequent reference to a key, which should nevertheless be provided.

For example, the geographic location of the finds at an archaeological dig might be presented as a scatter plot. Different colours or marker styles could represent different categories of artefact. Using icons such as a stylised vase for pottery, a sword for weapons, a coin for money and a bone for animal remains would communicate the meaning of the data more readily than representing these categories by less meaningful markers such as circles, squares or stars. Historical period could then be represented by, for example, colour.

Examples of this chart form are most
commonly seen in historical atlases; they may also be found in leisure and tourism maps. The concept is not, however, limited to cartography.

3.5.8 Chernoff faces

When the number of additional values to be shown at each glyph becomes large, confusion readily results. One way to represent a large number of extra variables in a single glyph was first described by the psychologist Chernoff. His method uses the ability of the human brain to recognise a face and encodes each variable as part of a stylised face. Overall size and shape, size and shading of the eyes, inter-eye spacing, whether the mouth curves up or down and the degree of its curvature can all be used to encode information. Used as glyphs, Chernoff faces allow rapid identification of misfits or 'strangers'.

[Figure 34: Chernoff faces]

With some practice this method can give good results, although most general purpose packages cannot draw them; typically a special program must be used. One limitation is that the method does not readily show up variation in combinations of variables. For this, separate plots of each pairwise combination of variables, or a statistical analysis such as principal component analysis, would be more suitable.

3.6 Histograms

This class of charts develops the scatter plot by grouping data into intervals and plotting the count of data points falling within each group. Histograms use the x axis to represent a grouped continuous variable and should be distinguished from bar charts, where the x axis is used merely for spacing or enumerated values.

3.6.1 Simple histogram

Figure 35 shows a simple histogram of the number of dives made each day by a female elephant seal. Note that the bars touch, indicating that each bar covers the entire range of each interval. For convenience the actual numeric counts have been placed at the top of each bar. This is one particular trade-off between data ink and speed of comprehension; strictly, either the numbers or
the axis labels should be deleted.

[Figure 35: A simple histogram of dives per day. Title: Number of Dives for a Female Elephant Seal in Early February 1991. Axes: number of dives against date.]

3.6.2 Stacked histogram

Like stacked line charts, the values in each data set are added onto the cumulative sum of the preceding data sets.

[Figure 36: A stacked histogram showing age distribution in three districts of Greater Manchester (Manchester, Stockport and Tameside). Title: Population distribution in three areas. Axes: population (thousands) against age. Source: The 1991 Census, Crown Copyright, ESRC purchase.]

3.6.3 Floating histogram

Histograms are constrained to the positive dependent variable axis, as counts can be zero or positive but not negative. The floating histogram takes advantage of this to display two related data sets, drawing one histogram to one side of this axis and another on the other side. The magnitudes of each component can be readily compared, as the eye is quite good at detecting symmetry and departures from it. Swapping the axes from their conventional positions, to put the dependent variable on the x axis, enhances this effect by making the axis of symmetry left-right rather than up-down. Natural objects such as faces and people are laterally symmetrical.

[Figure 37: Age distribution of female and male Manchester residents compared. Title: Female and male population in Greater Manchester. Axes: female population (thousands) and male population (thousands).]

3.6.4 Grouped histogram

The x axis represents a continuously increasing value of some variable. Grouped histograms such as Figure 38 are thus misleading, because they imply that the different bars in a group correspond to different x values. Also, the need to fit in all the bars means that each individual bar takes up less than the full width of the interval it
represents.

[Figure 38: A grouped histogram. Title: Population distribution in three areas (Manchester, Stockport, Tameside). Axes: population (thousands) against age.]

One way around this is the stacked histogram. Another is to use a pseudo third dimension; for once, a good use of pseudo 3D.

[Figure 39: Using a third dimension to display multiple histograms. Title: Population distribution in three areas (Manchester, Stockport, Tameside), plotted against age. Source: The 1991 Census, Crown Copyright, ESRC purchase.]

3.7 Composite charts

Combining more than one chart type in a single graph can be an effective way of communication. For example, Figure 40 is a composite chart consisting of three parts: a scatter plot and two line charts. The scatter plot shows the duration of each dive made in a single day by a female elephant seal. The line charts are a curve fitted to the data, to show the trend over the day, and the residuals, or deviations from that curve, to show how well the fitted curve models the observations. This composite chart adds useful information to the basic scatter plot without greatly increasing the overall complexity.

[Figure 40: Composite scatter chart with cubic regression curve and residuals. Axes: dive duration (minutes) against time of day (hours).]

3.8 Time charts and series

In general a time series chart is just another form of two dimensional chart where the x axis happens to represent time. Several of the preceding examples have been time charts. However, there are some particular techniques which are more appropriate to time series.

3.8.1 Periodicity

One common requirement is to look for repeating patterns
in the data. Autocorrelation is one method of doing this. The correlation coefficient between the data and a shifted copy is determined; if there is a repeating pattern it shows up as increased correlation when the shift distance is equal to the periodicity.

Fourier analysis, which analyses a waveform as a weighted sum of sines and cosines, may also be used for this. Figure 41 shows a line chart of the mean sunspot counts for each month from 1760 to 1992. A marked periodicity is evident in this graph. Above it is the discrete Fourier transform, which shows not only the dominant 11 year cycle but also smaller subsidiary cycles whose presence was masked in the line chart. This is a good example of using graphical techniques to reveal hidden aspects of the data.

[Figure 41: Sunspot observations analysed by discrete Fourier transform. Upper panel: magnitude against period (years). Lower panel: mean sunspot count against year.]

3.8.2 Prediction

Another frequent requirement with time series data is to predict values which cannot be measured because they lie in the future. Observing the residuals in the data once trends and cyclical patterns have been subtracted allows a model to be built up to predict future values. If these are plotted on a graph, the predicted and observed values should be clearly distinguished.

Displaying on a single graph the raw data, the derived components and the results as each component in the model is subtracted can be a powerful and succinct means of showing the behaviour of a set of data. Further details of these techniques, which are outside the scope of this module, may be found in books on time series analysis such as Cryer 1986.

3.9 Geographic charts

Graphs which are maps, or have a cartographic component, are a special case of 2D graph which requires
some special techniques. As many people require this form of graph but are not geographers, it is also helpful to have some understanding of the properties of maps in general before considering their use as graphs.

3.9.1 Overview of maps

Maps are a 2D representation of a 2D manifold in 3D space: the surface of the earth, or indeed another planet. Because of this there is some distortion of areas introduced by mapping to 2D. The size of this distortion depends on the area covered: it is largest for a map of the whole globe and near negligible for a one kilometre square. This distortion must be taken into account when creating a chart that includes a map as one of its elements.

The globe has an axis of rotation, and the equator lies in the plane through the centre of the globe perpendicular to this axis. The equator is an example of a great circle because it exactly bisects the globe. The position of a point on the surface may be specified by its latitude and longitude. Latitude is an angular measure of how far the point lies from the plane of the equator and thus varies between 90° N and 90° S. Other lines of equal latitude are not great circles; they are circles parallel to the equator. Longitude is the angular measure east or west from an arbitrary starting point, the arc of a great circle passing through Greenwich Observatory, England. Longitude varies between 180° W and 180° E, and lines of equal longitude are meridians, arcs of great circles passing through the poles.

Three points on the globe form a spherical triangle, the sum of whose angles is greater than 180°. The larger the area of the triangle, the greater the sum. When projected onto a plane, the angles in the resulting triangle sum to exactly 180°. If two points on the triangle are the correct scale distance apart, either the angles or the distances to the third point may be retained, but not both.

Assuming for the sake of simplicity that the earth is a sphere, its surface may be projected onto another surface such as a flat plane, a cylinder or a cone. There are many possible projections but
they are commonly classified in three groups:

- equal area projections
- conformal projections
- the rest

Equal area projections have a constant area scale in different parts of the map, but the linear scale varies with direction and angles are distorted. This is clearly the type to use when a map is making comparisons of areas. Examples of equal area projections are the Albers, Bonne and Lambert azimuthal projections.

Conformal projections have a constant linear scale in all directions around a given point, and angles around that point are correct, but the linear scale varies from place to place. This type of projection is suitable for charts that compare angles. Examples of conformal projections include the Mercator and the Lambert conic.

The rest of the projections are neither equal area nor conformal.

3.9.2 Filled area maps

Maps such as that shown in Figure 42 are a particular case of the bounded region plot. It is effectively a multiple area plot; however, besides showing the shape of each area, the colour or greyness of the shading denotes the value of some variable, in this case the population density of each county or region in the UK.

The map in Figure 42 is drawn on the National Grid, the standard method of drawing maps of the UK. It uses a transverse Mercator projection, which is conformal.

One drawback of this method is that large areas tend to dominate the first impression of the graph, and that there is no indication of the spread of values within an area. For example, some areas may contain mostly empty moorland which swamps the effect of a few cities. A better approach, if the data is available, is to split the map into a small grid and plot each grid point separately.

[Figure 42: Bounded region plot of UK population density, drawn on National Grid coordinates with a key giving density on log10 and linear scales. Source: The 1991 Census, Crown Copyright, ESRC purchase.]

3.9.3 Maps as background

There are several types of graph in which the map is a background in a composite chart, supplying meaning to the measurements on the axes. In this sense it is not data ink, but it aids comprehension. The non-map part may be a scatter plot, line graph or glyph chart. Figure 43 is of this type and shows the network connections between various UK universities, colleges and research laboratories and their network operations centre. Connections to sites in Northern Ireland are not shown in this example. In this case the background map provides sufficient orientation, so the axes have no tics or labels. To avoid overwhelming the data, the background map is greyed out.

[Figure 43: Connections of JANET sites to Network Operations Centres]

3.9.4 Distorted maps (cartograms)

The area of each region on a map is separately scaled to encode the value of some variable, but in such a way that the boundaries between regions remain in contact with one another. These are hard to draw automatically and only of use for well understood maps, where the distortion is immediately apparent and its magnitude readily estimated. Simpler methods are probably equally effective. Examples of such cartograms may be found in, for example, Lockwood 1969, pp 100 and 101.

3.9.5 Space time charts

A logical fusion of the geographical chart forms and the time series chart, space time charts show sequences of position. Containing two or three variables to represent space, one to represent time and at least one other for the quantity being plotted, space time charts are richly multivariate and consequently difficult to design
without appearing overly complex for a static non-interactive graph. Tufte 1983 gives several good examples (pp 40-43) which succeed in presenting multivariate data in an apparently simple and obvious way which is thus readily understood.

3.10 Charts with non-orthogonal axes

3.10.1 Nomograms

A nomogram is a graphical representation of a system of equations. Originally a calculation aid, it may be considered a form of visualisation.

3.10.2 Triangular proportion diagrams

An equilateral triangle is used to form a three axis graph. Because the axes are not orthogonal there are only two degrees of freedom. These graphs are used to represent mixtures of three items where the sum of the components is constant. The value of any of the three variables may be read off directly.

Figure 44 shows the CIE 1931 chromaticity diagram, used to predict the colour of additive mixtures. The three parameters x, y and z are normalised forms of CIE X, Y and Z such that x + y + z = 1. Hence the value of z can be predicted given x and y. It is conventional to draw the chromaticity diagram as a plot of x against y. Note that the upper diagonal of the graph area cannot contain any data points, because x + y would be greater than one. Points on the graph are labelled in nanometres, and the portion from 550nm to 660nm lies along this diagonal.

Figure 45 re-plots this data on triangular axes. Notice that more of the plotting area is used, and the portion from 550nm to 660nm is clearly seen to lie along one of the axes.

[Figure 44: CIE 1931 chromaticity scale, conventional orthogonal axes]

[Figure 45: The same data as Figure 44, triangular axes]

4 Drawing together

4.1 Selecting a chart type

4.1.1 Classifying the data

Firstly, determine how many dimensions the data has. Secondly, consider the volume of the data.

If the data has only one dimension, or you wish to focus on one, pick an appropriate form to display
its magnitude and make comparisons between data sets.

If there are two or more dimensions, select two to see if they are related. Do a quick scatter plot. Is the data clumped? Would a transformation of one or both axes help to distribute or straighten the data?

Consider the number of data points. If there are a lot, a scatter diagram might be appropriate. If there are too many data points, consider dividing the domain into segments and forming a histogram, or displaying a statistical summary of the data such as a box and whisker plot.

Is the order of the data points important, and the number of data points not too high? Some form of line graph might be useful.

Having briefly explored the data, decide what you wish to show and who your intended audience is. Select a few chart types and plot them. Do they pay their way? Do they say something useful about the data? If not, discard them. Keep clear what the graph is for and ensure that it shows what you have discovered without distortion.

4.1.2 Establishing the context

A graph must conform to the formats expected by its audience, else it is seen to be difficult. A new graph type may be information rich, but if it appears confusing to its audience the message is lost.

You must decide on the aims of a chart. Will the graph stand alone, or will it be shown together with tabulated data or a written description? Is it intended to deliver a single message at a glance, or to repay careful study at the risk of being initially confusing? A good graphic will communicate something straight away yet present additional information on further examination.

Prescribed formats for books, journal articles and departmental reports must be followed if the work is to be accepted for publication. In some cases these formats can conflict with the principles discussed in these notes.

4.2 Layout and design

Having examined a number of different graph forms, it is time to select from them the best elements of graphical design.
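
As an aside on the exploration step of Section 4.1.1, the advice to divide the domain into segments, or to fall back on a statistical summary, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `bin_counts` and `five_number_summary` and the sample dive durations are invented for this example, and the quartile convention used (the "inclusive" method of the standard `statistics` module) is one of several in common use.

```python
from statistics import quantiles


def bin_counts(values, lo, hi, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The top edge is assigned to the last bin so that hi itself is counted.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum (box and whisker inputs)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical dive durations in minutes, invented for illustration only.
durations = [12, 15, 15, 18, 21, 22, 22, 23, 25, 30, 34, 41]

print(bin_counts(durations, 10, 50, 4))
print(five_number_summary(durations))
```

The bin counts are the bar heights of a simple histogram, and the five numbers are what a box and whisker glyph would encode; either can be inspected before committing to a chart form.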

4.2.1 Title

The title of a graph should in general be at the top, outside the frame. Other positions may be preferred in some cases to make better use of space. An alternative is to omit the title entirely and use a caption at the foot of the graph. Examples of both formats are found in the preceding section.

4.2.2 Keys, annotation & legends

If there is more than one data set a key should always be included. Conventionally these go inside the frame at top right, as in Figure 1, but this depends on the distribution of the data.

Annotation of features of interest can be valuable. However, be careful not to patronise the viewer by pointing out the obvious. The definition of obvious will vary with the audience.

4.2.3 Axes and scales

Axes should be clearly labelled and have an adequate but not excessive number of tic marks, and the unit of measurement should be stated.

4.2.4 Typography

Text should be easy to read in small amounts, implying the use of sans-serif faces such as AvantGarde, Helvetica or Univers. The use of too many different point sizes in a single graph should be avoided.

Labelling on the y axis is generally rotated to be parallel to that axis, the alternative being to put the label at the top left of the graph. Other text and numbers should all be the same way up.

4.2.5 Notes

The source of the data and any copyright or license information should be included, either on the graph or in the caption.

If the graph is included in a document it is helpful if the text makes an explicit reference to the graph, either by name or by figure number. This avoids the tendency to read around the graph without ever actually coming to a suitable point to look at it.

4.2.6 Interpolation/extrapolation

As was noted in Section 3.4.1, make it clear which are the data values and which the interpolated values.

4.2.7 Small multiples

Tufte 1983 discusses presenting several small graphs that are similar to show the behaviour of an additional variable. This is effectively
adding another dimension to your graph.

4.3 Getting more from the data

4.3.1 Straightening curves

When attempting to find a mathematical model to explain the data, a straight line relationship between dependent and independent variables is desirable. While data can be fitted to curves, a straight line is often simpler. Tukey 1977 describes a hierarchy of expressions which can be applied to one or both axes to straighten the data. Plotting logs, roots or squares of the data can help here. Experiment a little.

4.3.2 Line and curve fitting

If some mapping of the data is seen to approximate a straight line, the equation of that line can be determined by a method such as least squares (see any statistics book). This line may be plotted on the graph but should be clearly labelled as a fitted line rather than an observation. The goodness of fit can be expressed as a correlation coefficient, and it is often helpful to display this in the legend for the fitted line.

In the case of a persistently curved data set, a variety of techniques are available to fit a curve, and the goodness of fit can again be expressed by a correlation coefficient. Such techniques are outside the scope of these notes but are discussed in Tukey 1977.

4.3.3 Residuals

Residuals are the deviations from the predicted or fitted line, and were mentioned in Section 3.7. Subtracting the fitted values from the observed values allows many sloping graphs to be straightened out. This permits the y axis to be stretched, displaying deviations more clearly. To do this, some portion of the data must approximate a line or curve.

A Glossary

This glossary is not presented as a set of formal or complete definitions. It is deliberately kept informal and colloquial, and is intended to refresh the memory when an unfamiliar word is encountered.

- abscissa: The x coordinate of a point; the shortest distance from that point to the y axis.
- annotation: Small amounts of text on the body of a graph to explain points of detail.
- autocorrelation: Correlating a data set against a shifted version of itself to look for periodicity.
- axis: The calibrated edge of a graph.
- curve fitting: Finding a simple mathematical description of the data which partially (or rarely, fully) models it.
- correlation: A measure of the strength of relationship between two variables.
- dependent variable: A thing whose measured value depends on the independent variables.
- domain: The set of possible values of an independent variable.
- extrapolation: Predicting values which lie outside the range of the data.
- Fourier transform: A transformation of a data set which yields a frequency spectrum.
- frame: The boundary of a graph. Often omitted.
- glyph: A shape which is used to encode extra information besides position of a data point.
- independent variable: The thing being altered in an experiment or simulation.
- intercept: Where a line crosses one of the axes.
- interpolation: Prediction of values within the range of a data set.
- key: A tabulated annotation that explains the symbols, shading or colour used in a graph.
- ordinate: In general, the shortest distance of a point from one axis, measured parallel to the other axis. In the context of graphs it usually refers to the y coordinate of a point.
- residual: The difference between a partial fit and the raw data.
- slope: The gradient of a line.
- tic: A small marker used to form the calibration marks on an axis.
- variable: Something whose value changes.

B References

Becker R A, Cleveland W S, Wilks A R (1987) Dynamic graphics for data analysis, in Cleveland W S, McGill M E (eds) (1988) Dynamic Graphics for Statistics. Belmont, California: Wadsworth & Brooks. ISBN 0-534-98052-X

Brodlie K W (1992) Visualisation techniques, Chapter 3 in Brodlie K W, Carpenter L, Earnshaw R A, Gallop J R, Hubbold R J, Mumford A M, Osland C D, Quarendon P (eds) (1992) Scientific Visualisation - Techniques and Applications. Berlin: Springer-Verlag. ISBN 3-540-54565-4

Chernoff H (1973) Using faces to represent points in K-dimensional space graphically. J Amer Statist Assoc 68, 361-368

Earnshaw R A (1991) Scientific visualisation - transforming numeric data into visual information, in Graphics, Interaction and Visualisation - The Challenge of the 1990s (1991), Proceedings of International State-of-the-Art Seminar, 4 December 1991. London: British Computer Society

Haber R B, Lucas B, Collins N (1991) A data model for scientific visualisation with provisions for regular and irregular grids, in Visualisation 91

Huff D (1973) How to Lie With Statistics. London: Penguin Books. ISBN 0-14-013629-0

Lockwood A (1969) Diagrams: A Visual Survey of Graphs, Maps, Charts and Diagrams for the Graphic Designer. London: Studio Vista. British SBN 289-37030-2

Tufte E R (1983) The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press

Tukey J W (1977) Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley. ISBN 0-201-07616-0
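
A worked sketch for Sections 4.3.2 and 4.3.3: the notes describe fitting a straight line by least squares, quoting a correlation coefficient as the goodness of fit, and then plotting residuals. The Python below illustrates those three steps under stated assumptions. The helper names (`fit_line`, `correlation`, `residuals`) and the sample data are hypothetical, and the Pearson coefficient is assumed to be the correlation coefficient intended.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def residuals(xs, ys, slope, intercept):
    """Observed values minus fitted values, one per data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
r = correlation(xs, ys)
res = residuals(xs, ys, slope, intercept)

# A legend entry for the fitted line, labelled as a fit and quoting the goodness of fit.
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f} (r = {r:.3f})")
print([round(e, 2) for e in res])
```

Plotting `res` against `xs` on its own stretched y axis is the straightening described in Section 4.3.3: the slope has been removed, so small deviations occupy the full height of the plot.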
