<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Visualization</title>
	<atom:link href="http://interactiondesign.sva.edu/classes/datavisualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://interactiondesign.sva.edu/classes/datavisualization</link>
	<description>Shawn Allen</description>
	<lastBuildDate>Thu, 29 Jul 2010 01:45:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>The Value of Many Eyes</title>
		<link>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/29/many-eyes/</link>
		<comments>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/29/many-eyes/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 01:44:45 +0000</pubDate>
		<dc:creator>shawnallen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://interactiondesign.sva.edu/classes/datavisualization/?p=139</guid>
		<description><![CDATA[When we last left off, I was leading the class on a charting expedition. My intention was to do this on paper, under the assumption that, if we used a medium with which everyone is familiar, we could avoid getting hung up on the implementation details (namely, programming syntax). The class decided that this wasn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>When we last left off, I was leading the class on a <a href="http://interactiondesign.sva.edu/classes/datavisualization/2010/07/15/charted-territory/">charting expedition</a>. My intention was to do this on paper, under the assumption that, if we used a medium with which everyone is familiar, we could avoid getting hung up on the implementation details (namely, programming syntax). The class decided that this wasn&#8217;t the best use of their time, though—and I admit that charting with pens and using calculators to interpolate values on scales would have been tedious, but instructive nonetheless—so last week we took a look at <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">Many Eyes</a>, a project by the fine folks at the <a href="http://www.research.ibm.com/visual/">IBM Visual Communication Lab</a>.</p>
<p>Many Eyes&#8217; goal, <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/page/About.html">according to its creators</a>, is &#8220;to &lsquo;democratize&rsquo; visualization and to enable a new social kind of data analysis,&#8221; the idea being that both the use of social visualization tools and the public release of the underlying data can lead to new insights. To test this theory, I played with a few of the data sets already uploaded to the site and sought out a few of my own to contribute. In just a few hours, I had:</p>
<ul>
<li>attempted to correlate <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/vehicle-miles-traveled-versus-traf">vehicle miles traveled and rates of traffic fatality</a> in US states;</li>
<li>created a <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/2008-hate-crimes-by-type-and-state">small multiples map of hate crimes by type and state</a>;</li>
<li>visualized the <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/fuel-use-by-transportation-mode">high proportion of fuel used by light-duty passenger vehicles</a>;</li>
<li>tweaked the color contrast on <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/89ade5ae1d2998b0011d457a59eb1e4b/comments/1d4689b495ab11df9309000255111976">a treemap of CO<sub>2</sub> emissions by transportation mode</a> to better show proportional differences;</li>
<li>shown <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/highway-transport-energy-consumpti">the rise of light and heavy trucks in highway transportation energy consumption from 1970 to 2003</a>;</li>
<li>modified a matrix chart to more clearly communicate <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/f7ed39dc95ce11dfb5d8000255111976/comments/72cee5da9a5f11dfbd61000255111976">gender bias by political party represented on The BBC&#8217;s Question Time</a>;</li>
<li>discovered <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/datasets/b9501f60585b11df8629000255111976">a data set</a> that&#8217;s particularly interesting when viewed in Many Eyes&#8217; <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/challenges-by-task-and-use-case-di-4">matrix</a> <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/challenges-by-task-and-use-case-di-6">chart</a>;</li>
<li><a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/89ade5ae16262a800116266f1c66018b/comments/379cc67a95a111df9c9c000255111976">clarified</a> a map illustrating <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/the-shifting-of-irelands-populatio">the shifting of Ireland&#8217;s population</a> by using circles rather than colored areas;</li>
<li>and created a line chart comparing <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/annual-bike-vs-car-production-mill">annual bicycle and car production worldwide</a>.</li>
</ul>
<p>The day&#8217;s experiment was successful, so last week I had the class use Many Eyes as a tool for visualizing their own data sets. There were some issues, particularly:</p>
<ul>
<li>While it was great that everyone already had their tabular data (the first week&#8217;s homework was to gather data sets in comma-separated value format), Many Eyes requires you to upload data by pasting it into a text field as tab-separated values. This requires opening the CSV file in a spreadsheet application (such as Microsoft Excel, Apple Numbers, or <a href="http://spreadsheets.google.com/">Google Docs</a>) and copying cells by hand.</li>
<li>&#8220;Null&#8221; values (which appeared as &#8220;N/A&#8221; or other non-numeric strings in some people&#8217;s data) sometimes led Many Eyes to interpret columnar fields as qualitative text rather than quantitative numeric values. This resulted in some of the automated visualizations failing to assign fields to particular axes, and often required students to re-upload their data sets.</li>
</ul>
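<p>Both of these issues can be worked around before you paste anything into Many Eyes. Here&#8217;s a minimal Python sketch, not anything Many Eyes itself provides: the function name, the sample rows, and the list of &#8220;null&#8221; markers are all my own assumptions, and whether Many Eyes then reads a column with blank cells as numeric is something you&#8217;d want to verify with your own upload.</p>

```python
import csv
import io

# Strings treated as missing values. This set is an assumption; adjust it
# to match whatever placeholders appear in your own data.
NULL_MARKERS = {"n/a", "na", "null", "none", "-"}


def csv_to_tsv(csv_text, null_markers=NULL_MARKERS):
    """Convert CSV text to tab-separated text, blanking out null-like cells."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row)  # leave the header row untouched
            continue
        cleaned = ["" if cell.strip().lower() in null_markers else cell for cell in row]
        writer.writerow(cleaned)
    return out.getvalue()


# Hypothetical sample rows, made up for illustration.
sample = "state,miles,fatalities\nOhio,1200,N/A\nUtah,800,15\n"
print(csv_to_tsv(sample))
```

<p>The output can be pasted straight into the upload field, which skips the spreadsheet round trip.</p>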
<p>The biggest issue, however, appeared to be that students quickly ran into the limitations of Many Eyes visualizations. They wanted to change the colors, filter the data interactively, or cross-reference multiple data sets. We learned in this process that, while Many Eyes is a great tool for creating an initial picture of a data set, it doesn&#8217;t provide &#8220;all&#8221; of the tools one would need to really explore the data. There are many other sites and paid products that claim to do just that, but it should be obvious to anyone who&#8217;s used them that no generalized system (yet, anyway) is capable of adjusting itself to suit the needs of every possible data set.</p>
<p>As a remedy for this, I suggested that the students use Many Eyes (or another service) to do the heavy lifting of deriving scales and interpolating values, then use its visual output as input for a more manual, bespoke visualization process. Once you&#8217;ve fed your data set into a <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/page/Bar_Chart.html">bar</a> or <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/page/Bubble_Chart.html">bubble</a> chart, you can use the calculated relative sizes of each <em>mark</em> as the basis for a new representation. E.g.:</p>
<ul>
<li>Take a screenshot of Many Eyes and adjust the colors in Photoshop.</li>
<li>Trace over the shapes in Illustrator and create your own arrangement.</li>
<li>Mix and match your data sets by incorporating elements from two or more visualizations into your own canvas.</li>
<li>Measure the sizes of elements in pixels and recreate them in HTML and CSS.</li>
</ul>
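<p>If you do get curious about the last option, the HTML itself can be generated rather than typed. This is only a sketch under my own assumptions: the function name, the inline styles, and the pixel measurements are all hypothetical, standing in for sizes you&#8217;d measure off a screenshot.</p>

```python
import html


def bars_to_html(sizes, bar_height=20, color="#336699"):
    """Build one div per (label, width in pixels) pair, stacked vertically."""
    rows = []
    for label, width in sizes:
        style = "width:{}px; height:{}px; background:{}; margin:2px 0;".format(
            width, bar_height, color
        )
        # Escape the label so stray quotes or angle brackets cannot break the markup.
        rows.append('<div style="{}" title="{}"></div>'.format(style, html.escape(label)))
    return "\n".join(rows)


# Hypothetical measurements, in pixels, read off a screenshot.
measured = [("Cars", 240), ("Trucks", 130), ("Buses", 15)]
print(bars_to_html(measured))
```

<p>Paste the printed markup into any HTML file and you have a bar chart you can restyle with plain CSS.</p>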
<p>Don&#8217;t assume that you can&#8217;t make something interesting without programming skills. Wield the tools that you already know how to use and do whatever it takes to bang the data—be they the original numbers or the normalized output sizes—into a form that you can work with.</p>
<p>And when you bump up against <em>those</em> limits, then you can consider taking up programming and using visualization libraries like <a href="http://vis.stanford.edu/protovis/">protovis</a> (with JavaScript) or <a href="http://flare.prefuse.org/">Flare</a> (with Flash). And if you get into visualizing large amounts of data you&#8217;ll quickly discover that neither Flash nor SVG is capable of moving more than a couple thousand points around on the screen at once. At that point you&#8217;ll have three options: work with aggregations of data and create interactive interfaces that filter those points into manageable subsets; employ non-interactive tools to generate static representations of the data; or discover hardware acceleration in more &#8220;serious&#8221; (and, unfortunately, less &#8220;web-friendly&#8221;) programming languages like Java and C.</p>
<p>My point isn&#8217;t that this stuff is particularly &#8220;hard&#8221;, but rather that it&#8217;s only worth really figuring out if it&#8217;s applicable to your goals and you have the time to learn it. Just as visualization can be seen as a process for educating yourself about data, visualization is also a useful programming exercise. Many of the <a href="http://processing.org/learning/">Processing</a> examples could be easily adapted to incorporate real data rather than generating a series of random shapes. Conversely, you may wish to use visualization libraries to generate <a href="http://www.flickr.com/photos/shazbot/4827711059/">artistic renditions</a> of random data. I spend a lot of time trying to automate or prematurely optimize processes that would probably have taken less time to do manually. Whatever process helps you do what you need to do quickly and effectively, by all means <em>use it</em>.</p>
<p>Coming back to Many Eyes, though, let&#8217;s consider the advantages of its visualizations existing within an open structure on the web. The name itself is meaningful: <em>The more eyes on this stuff the better</em>. That&#8217;s a belief that we espouse at <a href="http://stamen.com">Stamen</a> too, because the whole point of our work is to make data more relevant, accessible, and desirable. And how can you do that without showing it to lots of people? Above all else, your choices of medium and technology should be driven by your ability to make something with them that you can share. A <a href="http://www.flickr.com/photos/walkingsf/sets/72157624209158632/">series of static graphics on the web</a> is more interesting and useful than the half-baked interactive interface that you started but never finished. And incremental improvements made in the public eye both help other people learn about your process and invite valuable input along the way. There&#8217;s nothing much worse than spending a lot of time on something in private, only to have it picked apart once you finally release it to the world.</p>
<p>Use the tools with which you&#8217;re comfortable, and experiment with new stuff if you&#8217;ve got the time. In the words of my esteemed peer Matt Jones, though, and above all else, just:</p>
<p><a href="http://www.flickr.com/photos/blackbeltjones/3365682994/"><img src="http://farm4.static.flickr.com/3604/3365682994_b257c0c52d.jpg" width="349" height="500" alt="GET EXCITED AND MAKE THINGS" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/29/many-eyes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Charted Territory</title>
		<link>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/15/charted-territory/</link>
		<comments>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/15/charted-territory/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 21:39:51 +0000</pubDate>
		<dc:creator>shawnallen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://interactiondesign.sva.edu/classes/datavisualization/?p=68</guid>
		<description><![CDATA[Data visualization's most recognized form is the chart. You've seen charts all over the place, from <a href="http://www.nytimes.com/2010/04/27/world/27powerpoint.html">PowerPoint presentations</a> and <a href="http://www.google.com/finance">stock tickers</a> to <a href="http://people-press.org/">public polling results</a> and <a href="http://www.fivethirtyeight.com/">election predictions</a>. Even the humble and oft-misused pie chart, though <a href="http://seedmagazine.com/content/article/getting_past_the_pie_chart/">derided by visualization critics</a> for its perceptual shortcomings, is still useful for comic effect:

<img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/pie_eaten.jpg" alt="Pie I have eaten/not" width="500" height="330" />

As a society we've practically dismissed many of the popular forms of charting as useless because most of the charts that we see are just ugly at best or, at worst, fail to communicate any actionable information. But while charts are often deemed failures unless they illustrate dramatic changes or unseen trends, their increasing abundance in popular media has also led to an increase in literacy that makes our job of communicating visual information a lot easier than it has been historically. (We don't have to explain to our audience what a <a href="http://en.wikipedia.org/wiki/Time_Series">time series</a> is anymore!) Note, though, that it's important to keep common assumptions in mind when you're creating graphs. For instance, since most people expect time to be represented left to right on the x-axis, presenting it vertically or from right to left may confuse your audience no matter how clearly your axes are marked.

Before we get into the perceptual (and even cultural) qualities of various charting forms, though, let's step back and wrap our heads around what it is, exactly, that makes a chart a chart.
]]></description>
			<content:encoded><![CDATA[<p>Data visualization&#8217;s most recognized form is the chart. You&#8217;ve seen charts all over the place, from <a href="http://www.nytimes.com/2010/04/27/world/27powerpoint.html">PowerPoint presentations</a> and <a href="http://www.google.com/finance">stock tickers</a> to <a href="http://people-press.org/">public polling results</a> and <a href="http://www.fivethirtyeight.com/">election predictions</a>. Even the humble and oft-misused pie chart, though <a href="http://seedmagazine.com/content/article/getting_past_the_pie_chart/">derided by visualization critics</a> for its perceptual shortcomings, is still useful for comic effect:</p>
<p><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/pie_eaten.jpg" alt="Pie I have eaten/not" width="500" height="330" /></p>
<p>As a society we&#8217;ve practically dismissed many of the popular forms of charting as useless because most of the charts that we see are just ugly at best or, at worst, fail to communicate any actionable information. But while charts are often deemed failures unless they illustrate dramatic changes or unseen trends, their increasing abundance in popular media has also led to an increase in literacy that makes our job of communicating visual information a lot easier than it has been historically. (We don&#8217;t have to explain to our audience what a <a href="http://en.wikipedia.org/wiki/Time_Series">time series</a> is anymore!) Note, though, that it&#8217;s important to keep common assumptions in mind when you&#8217;re creating graphs. For instance, since most people expect time to be represented left to right on the x-axis, presenting it vertically or from right to left may confuse your audience no matter how clearly your axes are marked.</p>
<p>Before we get into the perceptual (and even cultural) qualities of various charting forms, though, let&#8217;s step back and wrap our heads around what it is, exactly, that makes a chart a chart.</p>
<h4 id="chart-anatomy">Anatomy of a Chart</h4>
<p>With respect to <a href="http://www.infovis-wiki.net/index.php?title=Visual_Variables">Bertin&#8217;s variables</a>, charts deal primarily with the <em>position</em> and <em>size</em> of visual elements. For our purposes, charts have at least one <em>axis</em> (timelines are an example of a chart with only one real axis) along which elements are placed to distinguish varying values from one another. I&#8217;m also intentionally excluding the genre of <a href="http://flowingdata.com/2010/05/06/the-boom-of-big-infographics/">&#8220;big infographics&#8221;</a> that <a href="http://www.good.is/post/the-carbon-cost-of-spam-an-infographic/">lack</a> any perceptual component whatsoever, because that&#8217;s what essentially distinguishes a &#8220;chart&#8221; from a &#8220;diagram&#8221;.</p>
<p>In most charts, <a href="http://en.wikipedia.org/wiki/Cartesian_coordinate_system">cartesian coordinates</a> describe the position of an element relative to one or more linear axes, commonly called <var>x</var> on the horizontal and <var>y</var> on the vertical, and written as <b>(<var>x</var>,<var>y</var>)</b>. In computer screen coordinate systems (specifically, on web pages and in most visual programming environments) the upper lefthand corner serves as the origin, or <b>(0, 0)</b>. As <var>x</var> values increase an element moves toward the right edge of the screen, and positive <var>y</var> values move the element toward the bottom. On paper we may choose to think of the origin as the lower lefthand corner, and position positive <var>y</var> values above it.</p>
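<p>Switching between the paper convention and the screen convention is a one-line flip of the <var>y</var> value. Here&#8217;s a small sketch in Python (the function name and the 100-pixel chart height are my own choices for illustration):</p>

```python
def to_screen(x, y, chart_height):
    """Convert a point measured from a lower-left origin (paper style)
    into screen coordinates, where the origin is the upper-left corner."""
    return (x, chart_height - y)


# A point 25 units up from the bottom of a 100-pixel-tall chart
# sits 75 pixels down from the top of the screen area.
print(to_screen(10, 25, 100))  # (10, 75)
```

<p>Only <var>y</var> changes; <var>x</var> increases to the right in both systems.</p>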
<p>It&#8217;s important to note that axes can be made for both quantitative (numeric, or <em>continuous</em>) and qualitative (categorical, or <em>discrete</em>) variables. The humble <a href="http://en.wikipedia.org/wiki/Bar_chart">bar chart&#8217;s</a> quantitative axis (in this case, <var>y</var>) determines the height of each bar, and the other (<var>x</var>) evenly spaces out each bar so that its height can be easily compared to the others:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Incarceration_Rates_Worldwide_ZP.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Incarceration_Rates_Worldwide_ZP.svg/300px-Incarceration_Rates_Worldwide_ZP.svg.png" width="300" height="333" /></a></p>
<p>Often, as is the case in the above graph, the elements are sorted on the discrete axis according to their value on the other so that you can easily see the <a href="http://en.wikipedia.org/wiki/Statistical_distribution">distribution</a> of values in the set. The <a href="http://en.wikipedia.org/wiki/Histogram">histogram</a>, a cousin to the bar chart in some respects, replaces the qualitative axis with a quantitative one. The <a href="http://en.wikipedia.org/wiki/Time_Series">time series</a> plots continuous values of a quantitative variable over time, usually on the horizontal axis. For some other examples, check out Nathan Yau&#8217;s <a href="http://flowingdata.com/2010/01/07/11-ways-to-visualize-changes-over-time-a-guide/">guide to visualizing changes over time</a>.</p>
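<p>To make the histogram idea concrete: the quantitative axis is cut into equal-width <em>bins</em>, and each bar&#8217;s height is the number of values that fall into its bin. The sketch below is one simple way to do that in Python, assuming equal-width bins; the function name and sample values are hypothetical.</p>

```python
def histogram(values, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so they all belong in the first bin.
        return [len(values)] + [0] * (bin_count - 1)
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((value - lo) / width), bin_count - 1)
        counts[index] += 1
    return counts


# Hypothetical values, made up for illustration.
print(histogram([1, 2, 2, 3, 9, 10], 3))
```

<p>Changing the number of bins can change the story the chart tells, so it&#8217;s worth trying a few.</p>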
<p>The more generalized <a href="http://en.wikipedia.org/wiki/Scatter_plot">scatter plot</a> is particularly useful for illustrating the relationship between two quantitative variables. This one, also from Wikipedia, plots eruptions of the <a href="http://en.wikipedia.org/wiki/Old_Faithful">Old Faithful</a> geyser in Yellowstone National Park using two variables: the duration of each eruption on the horizontal, and the time since the previous eruption on the vertical:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Oldfaithful3.png"><img src="http://upload.wikimedia.org/wikipedia/commons/0/0f/Oldfaithful3.png" width="401" height="400" alt="Old Faithful Eruptions" /></a></p>
<p><a href="http://en.wikipedia.org/wiki/Polar_coordinate_system">Polar coordinates</a> are used to plot points in circular arrangements, such as pie and <a href="http://en.wikipedia.org/wiki/Radar_chart">radar charts</a>. In this system, coordinates are expressed not as <var>x</var> and <var>y</var>, but as <var>angle</var> and <var>radius</var>. Polar charts are best suited for plotting cyclical values, such as <a href="http://en.wikipedia.org/wiki/Wind_rose">wind direction</a>, time of day (i.e., a clock), or categorical values that, when displayed as <a href="http://en.wikipedia.org/wiki/Small_multiple">small multiples</a>, can reveal similarities in shape:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Star_plot.gif"><img src="http://upload.wikimedia.org/wikipedia/commons/3/38/Star_plot.gif" width="380" height="280" alt="star plot" /></a></p>
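<p>Screens still want <var>x</var> and <var>y</var>, so polar values have to be converted before anything can be drawn. Here&#8217;s a minimal sketch of the standard trigonometric conversion; the function name, the use of degrees, and the optional center point are my own assumptions.</p>

```python
import math


def polar_to_cartesian(angle_degrees, radius, center_x=0.0, center_y=0.0):
    """Convert an (angle, radius) pair into (x, y), measured from a center point."""
    theta = math.radians(angle_degrees)
    x = center_x + radius * math.cos(theta)
    y = center_y + radius * math.sin(theta)
    return (x, y)


# Twelve evenly spaced points around a circle, like the hours on a clock face.
for hour in range(12):
    x, y = polar_to_cartesian(hour * 30, 100, center_x=150, center_y=150)
    print(hour, round(x, 1), round(y, 1))
```

<p>Keep in mind that with a screen&#8217;s upper-left origin, increasing angles run clockwise rather than counter-clockwise.</p>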
<p>For more examples, check out <a href="http://queue.acm.org/detail.cfm?id=1805128">A Tour through the Visualization Zoo</a> by <a href="http://vis.stanford.edu/">Stanford Vis Group&#8217;s</a> Jeff Heer, Michael Bostock and Vadim Ogievetsky, which profiles a variety of common visualization forms made with their <a href="http://vis.stanford.edu/protovis/">protovis</a> library. And if you&#8217;re going to plot more than two variables against one another using only position, you might consider the <a href="http://en.wikipedia.org/wiki/Ternary_plot">ternary plot</a>, 3D, animation, or even an interactive interface that allows the user to adjust one of the variables in realtime.</p>
<h4 id="scales">Scales</h4>
<p>Rarely will you find a data set expressed in terms of the same coordinates used to display it. In order to convert data values into display coordinates we apply one or more <em>scales</em>. A scale is the means by which we plot a variable on a given axis. Each scale has a <em>minimum</em> and a <em>maximum</em> (usually built from the calculated minima and maxima, but sometimes chosen specifically to over- or under-emphasize distributions), and defines a method for <a href="http://en.wikipedia.org/wiki/Interpolation">interpolating</a> values between them. The <a href="http://en.wikipedia.org/wiki/Linear_scale">linear scale</a> on this <a href="http://en.wikipedia.org/wiki/File:Scale_from_NOAA_Chart_13272.png">NOAA chart</a> shows the reader how to convert measurements on a map into distances in real life:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Scale_from_NOAA_Chart_13272.png"><img src="http://upload.wikimedia.org/wikipedia/commons/e/e5/Scale_from_NOAA_Chart_13272.png" width="520" alt="NOAA linear scale" /></a></p>
<p>Let&#8217;s take another look at the example table from my <a href="http://interactiondesign.sva.edu/classes/datavisualization/2010/07/08/introduction/">introductory blog post</a>:</p>
<p><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/table01.png" alt="An example table" width="517" height="109" /></p>
<p>If we wanted to create a bar chart of the subjects&#8217; incomes, we would need to devise a scale for the <var>y</var> axis. The natural minimum for this scale would be <tt>100</tt>, and the maximum <tt>30,000</tt>. This example is easy because there are only 3 elements to plot: Jane goes at the bottom of the scale, and Alex at the top. In order to figure out where Joe goes, though, we have to do a little bit of math. Here I&#8217;m using <var>y</var> as a relative measurement of how far along the scale the value <var>n</var> should be positioned, where 0 would be the bottom and 1 the top. This is generally referred to as a process of <a href="http://en.wikipedia.org/wiki/Normalization">normalization</a>:</p>
<pre>y =  (n - min) / (max - min)
y =  (20,000 - 100) / (30,000 - 100)
y =  19,900 / 29,900
y = ~0.665</pre>
<p>So, if our chart were 100 pixels tall, Joe&#8217;s bar would have a height of 66 pixels (or 67, if we round up):</p>
<p><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/bar_chart_example01.png" alt="Bar chart example #1" width="387" height="135" /></p>
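<p>The same arithmetic can be written as a small function. This is just a sketch of the normalization formula above in Python; the function names are mine, and the incomes are the ones from the example table.</p>

```python
def normalize(n, lo, hi):
    """Return how far n sits between lo and hi, as a fraction from 0 to 1."""
    return (n - lo) / (hi - lo)


def bar_height(n, lo, hi, chart_height):
    """Scale the normalized value to a height in pixels."""
    return normalize(n, lo, hi) * chart_height


# Incomes from the example table above.
incomes = {"Jane": 100, "Joe": 20000, "Alex": 30000}
lo, hi = min(incomes.values()), max(incomes.values())

for name, income in incomes.items():
    print(name, round(bar_height(income, lo, hi, chart_height=100), 1))
```

<p>Joe&#8217;s bar comes out at roughly 66.6 pixels, matching the hand calculation. Note that the function divides by zero if the minimum and maximum are equal, so a real implementation would need to handle that case.</p>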
<p>One problem with this, though, is that Jane&#8217;s bar essentially has zero height because her low income corresponds to the bottom of the scale: <tt>(100 - 100) / (30,000 - 100) = 0</tt>. We can&#8217;t really &#8220;fix&#8221; that, but we can make it clearer (and avoid having to use a calculator!) by thinking of the <var>y</var> axis as 100-dollar increments (the <a href="http://en.wikipedia.org/wiki/Greatest_common_divisor">greatest common divisor</a> of this particular collection) and setting the minimum of the scale to zero. This way, you simply divide each number by 100 to get the height in pixels; so Jane&#8217;s bar is exactly 1 pixel tall, Joe&#8217;s is 200, and Alex&#8217;s is 300. It also simplifies the labeling of the vertical axis, because you can split it into nice, round numbers:</p>
<p><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/bar_chart_example02.png" alt="Bar chart example #2" width="387" height="345" /></p>
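<p>For the curious, here&#8217;s a sketch of the zero-based version in Python. It finds the greatest common divisor of the incomes and uses it as the size of one pixel; the function name is my own, and treating the divisor as the pixel unit only works this neatly because the example numbers are so round.</p>

```python
from functools import reduce
from math import gcd


def zero_based_heights(values):
    """Divide every value by the collection's greatest common divisor,
    giving whole-number heights on a scale that starts at zero."""
    unit = reduce(gcd, values.values())
    heights = {name: value // unit for name, value in values.items()}
    return unit, heights


# Incomes from the example table above.
incomes = {"Jane": 100, "Joe": 20000, "Alex": 30000}
unit, heights = zero_based_heights(incomes)
print(unit, heights)
```

<p>With messier data the divisor will often be 1, in which case you&#8217;d pick a round unit by hand instead.</p>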
<p>Obviously this is an over-simplified example, but I hope that it illustrates why your choice of scale is important. We can emphasize or de-emphasize variances by making our bar charts short or tall, or we can intentionally set the scale minimum or maximum to a value outside the range of the data, as in &#8220;Miracles in nature and Science&#8221;, from the <a href="http://www.toriljohannessen.no/Words_and_Years_page_1.html">Words and Years</a> exhibit by <a href="http://www.toriljohannessen.no/">Toril Johannessen</a>, which plots the number of occurrences of the word &#8220;miracle&#8221; over time in the two eponymous periodicals:</p>
<p><a href="http://www.toriljohannessen.no/Words_and_Years_page_3.html"><img src="http://www.toriljohannessen.no/bilder/Words_and_years_Miracles.jpg" width="470" height="600" alt="Miracles in nature and Science" /></a></p>
<p>Of course, it&#8217;s worth mentioning that unscaled values in their original unit of measurement might better suit some contexts for visualization than scaled values. This <a href="http://www.youtube.com/watch?v=6Eg_SEAnE-M">energy saving campaign</a> depicts greenhouse gases produced by energy use as black balloons, each containing the volumetric equivalent of 50 grams. Imagine seeing Chris Jordan&#8217;s <a href="http://chrisjordan.com/gallery/rtn/#plastic-bottles">field of plastic bottles</a> in real life. Most data sets probably aren&#8217;t worth expressing natively like this, but you should certainly consider displays that emphasize the physical dimensions of a particular data set as a useful way of drawing attention or raising awareness.</p>
<h4 id="other-aspects">Other Visual Aspects</h4>
<p>Once we&#8217;ve exhausted the physical dimensions of our chart as a means to communicate information, we may need to resort to modifying some other visual aspects of our elements:</p>
<h5 id="color">Color</h5>
<p>Color, with respect to <a href="http://www.infovis-wiki.net/index.php?title=Visual_Variables">Bertin&#8217;s variables</a>, is expressed in two ways:</p>
<ol>
<li><b>Hue</b>: the color itself—red, blue, green, orange, purple, etc.</li>
<li><b>Value</b>: the brightness, or intensity of a color. You can think of this as some combination of the <em>value</em> and <em>saturation</em> components in the <a href="http://en.wikipedia.org/wiki/HSL_and_HSV">HSV color space</a>.</li>
</ol>
<p>We&#8217;ll go a bit more in depth on color in the next couple of weeks. For now, though, let&#8217;s see how far we can get without having to use it. Feel free to experiment with varying color for categorical variables, but be warned that creating <a href="http://www.visibone.com/">color scales</a> for continuous variables is <a href="http://en.wikipedia.org/wiki/Color_blindness">fraught with peril</a>.</p>
<h5 id="shape">Shape</h5>
<p>Varying the shapes of visual elements is a great way of encoding categorical variables. We&#8217;ll touch on a couple examples of this with your data sets tonight if it&#8217;s applicable.</p>
<h5 id="size">Size</h5>
<p>Size is well suited for positional arrangements on multiple axes, such as scatter plots. <a href="http://www.gapminder.org/">Gapminder</a>, for instance, tends to encode a country&#8217;s population in its dot size. Note that, in many cases, research has revealed that circles of varying sizes are difficult for people to compare because we tend to interpret the area of a circle more easily than its radius. If you do size circles by value, then, make the area (not the radius) proportional to the value. You can calculate a proportional radius by taking the square root of the desired area divided by pi:</p>
<pre>r = sqrt(area / π)</pre>
<p>And vice-versa, the area from a radius:</p>
<pre>area = π • r<sup>2</sup></pre>
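<p>In practice you rarely care about the literal area, only that areas stay proportional to values. Here&#8217;s a small Python sketch of both formulas, plus a helper that sizes a circle relative to the largest one; the function names and the choice to scale against a maximum radius are my own assumptions.</p>

```python
import math


def radius_from_area(area):
    """r = sqrt(area / pi)"""
    return math.sqrt(area / math.pi)


def area_from_radius(radius):
    """area = pi * r^2"""
    return math.pi * radius ** 2


def radius_for_value(value, max_value, max_radius):
    """Radius whose circle area is proportional to value, where the
    largest value in the set is drawn with max_radius."""
    return max_radius * math.sqrt(value / max_value)


# A value one quarter the size of the largest gets half the radius.
print(radius_for_value(25, 100, 40))  # 20.0
```

<p>The pi cancels out when you compare two circles, which is why the helper only needs the square root.</p>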
<h5 id="texture">Texture</h5>
<p>Texture is often useful in visualization forms like bar and area charts, in which you may wish to encode a categorical variable of each element. It&#8217;s also particularly useful in maps to denote different types of area or foliage.</p>
<h4 id="perception">Visual Perception</h4>
<p>Rigorous scientific research into visual perception is not a particularly recent development. As noted <a href="http://interactiondesign.sva.edu/classes/datavisualization/2010/07/08/introduction/#dark-ages">previously</a>, figures like Willard Cope Brinton and <a href="http://en.wikipedia.org/wiki/Jacques_Bertin">Jacques Bertin</a> illuminated many of the problems common to the statistical graphics of the 20th century and attempted to codify rules for designing representations that people could better understand. Statistical analyst <a href="http://en.wikipedia.org/wiki/John_Tukey">John Tukey</a> contributed a significant body of work not only to the practice of statistical analysis itself, but also to the modern-day understanding of how people &#8220;read&#8221; visual representations of data. More recently, William S. Cleveland and Robert McGill unveiled the findings of their research on the perception of visual cues in their paper <a href="https://secure.cs.uvic.ca/twiki/pub/Research/Chisel/ComputationalAestheticsProject/cleveland.pdf">Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods</a> [PDF], published in the <a href="http://www.amstat.org/publications/jasa.cfm">Journal of the American Statistical Association</a>. They found the following aspects of visual elements to be most successful (this ranked list comes from Nathan Yau&#8217;s <a href="http://flowingdata.com/2010/03/20/graphical-perception-learn-the-fundamentals-first/">blog post</a> on graphical perception):</p>
<ol>
<li>Position along a common scale e.g. <a href="http://www.b-eye-network.com/view/2468">scatter plot</a></li>
<li>Position on identical but nonaligned scales e.g. <a href="http://psycnet.apa.org/journals/pas/19/1/images/pas_19_1_88_fig7a.gif">multiple scatter plots</a></li>
<li>Length e.g. <a href="http://flowingdata.com/2009/07/02/whos-going-to-win-nathans-hot-dog-eating-contest/">bar chart</a></li>
<li>Angle &amp; Slope (tie) e.g. <a href="http://flowingdata.com/2008/06/09/what-do-you-use-to-analyze-andor-visualize-data-poll-results/">pie chart</a></li>
<li>Area e.g. <a href="http://flowingdata.com/2007/10/02/americans-prefer-watered-down-beer/">bubbles</a></li>
<li>Volume, density, and color saturation (tie) e.g. <a href="http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/">heatmap</a></li>
<li>Color hue e.g. <a href="http://newsmap.jp/">newsmap</a></li>
</ol>
<p>Some researchers are even studying the <a href="http://www.creativesynthesis.net/blog/?p=158">aesthetic qualities of visualization</a> in an attempt to learn which forms people find most beautiful. A less formal, but no less actionable, form of visual perception research and analysis is taking place in books by <a href="http://edwardtufte.com">Edward Tufte</a>, and sites like <a href="http://chartjunk.karmanaut.com/">chartjunk</a> (and <a href="http://junkcharts.typepad.com/">junkcharts</a>) exist to critique charts in the media (and sometimes <a href="http://chartjunk.karmanaut.com/?p=9">correct them</a> quite handily). Some business journals regularly feature articles that suggest <a href="http://www.b-eye-network.com/view/2468">graphing strategies</a> for particular types of data. The <a href="http://extremepresentation.typepad.com/">Extreme Presentation</a> blog published this guide that suggests specific types of charts for certain types of data, or aspects of it to be visualized (or jump straight to the <a href="http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf">PDF</a>):</p>
<p><a href="http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html"><img src="http://extremepresentation.typepad.com/photos/uncategorized/choosing_a_good_chart.jpg" width="520" alt="Choosing a Good Chart" /></a></p>
<h4 id="process">Visualization as a Process</h4>
<p>As we create our visualizations, it&#8217;s important to consider that process as a way to learn something new about the data: to derive new information from it. Try out as many of the forms as you can (within reason, of course, and keeping in mind which ones are appropriate for different types and aspects of data), and see if you can draw any interesting conclusions from the distribution of particular values (remember to sort your values first!), or find potential correlations between two variables (by matching up two different sources of data with a common variable, or by using a scatter plot). Perhaps most important of all, <em>save your work often</em> (whether that means keeping a paper sketch or saving multiple versions of a file on your hard drive) and <em>create artifacts along the way</em>. Even experiments gone &#8220;wrong&#8221; can produce clues for how to visualize particular aspects of your data differently.</p>
<h4 id="day2-homework">Homework <em>UPDATED!</em></h4>
<p>I&#8217;ll be posting a new entry with some specifics about your updated homework. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/15/charted-territory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction</title>
		<link>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/08/introduction/</link>
		<comments>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/08/introduction/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 20:43:11 +0000</pubDate>
		<dc:creator>shawnallen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://interactiondesign.sva.edu/classes/datavisualization/?p=14</guid>
		<description><![CDATA[<i>Data visualization</i> is a pretty literal term that means, quite simply, the visual representation of quantitative data. In this course we'll learn common techniques for visualizing data, as well as some strategies for managing information digitally. But first, a brief history.]]></description>
			<content:encoded><![CDATA[<p><i>Data visualization</i> is a pretty literal term that means, quite simply, the visual representation of quantitative data. In this course we&#8217;ll learn common techniques for visualizing data, as well as some strategies for managing information digitally. But first, a brief history.</p>
<h3 id="history">A Brief History of Visualization</h3>
<h4 id="early-days">The Early Days</h4>
<p>Although visualization wasn&#8217;t widely recognized as a discipline in its own right until fairly recently, today&#8217;s most popular forms date back more than two centuries. Geographical exploration, mathematics, and popularized history spurred the creation of early maps, graphs, and timelines as far back as the 1600s, but <a href="http://en.wikipedia.org/wiki/William_Playfair">William Playfair</a> is widely credited as the inventor of the modern chart, having created the first widely distributed line and bar charts in his <i>Commercial and Political Atlas</i> of 1786, and what is generally considered to be the first pie chart in his <i>Statistical Breviary</i>, published in 1801.</p>
<p><a href="http://en.wikipedia.org/wiki/File:Playfair_TimeSeries.png"><img src="http://upload.wikimedia.org/wikipedia/commons/d/d8/Playfair_TimeSeries.png" alt="William Playfair's time series graph of trade balances between England and Denmark &amp; Norway" width="510" height="356" /></a></p>
<p>In that same year geologist <a href="http://en.wikipedia.org/wiki/William_Smith_(geologist)">William Smith</a> drew his first sketch of the <a href="http://en.wikipedia.org/wiki/File:Geological_map_Britain_William_Smith_1815.jpg">1815 geological map of Great Britain</a>, which many cartographers even today refer to as &#8220;The Map that Changed the World&#8221;:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Geological_map_Britain_William_Smith_1815.jpg"><img src="http://upload.wikimedia.org/wikipedia/commons/9/98/Geological_map_Britain_William_Smith_1815.jpg" width="407" height="600" alt="William Smith's 1815 geological map of Great Britain" /></a></p>
<p>The 1800s saw the invention of many new mapping and visualization forms, from <a href="http://en.wikipedia.org/wiki/Francis_Galton">Francis Galton&#8217;s</a> weather maps, to the innovative time lapse photography of scientist <a href="http://en.wikipedia.org/wiki/Étienne-Jules_Marey">Étienne-Jules Marey</a>, which he used to study the motion of people, birds, horses, cats, smoke, and fluids.</p>
<p><a href="http://en.wikipedia.org/wiki/Étienne-Jules_Marey"><img src="http://content.stamen.com/talks/where_20_2008/files/marey_pole.jpg" width="520" alt="Marey" /></a></p>
<p>In 1858, nurse and statistician <a href="http://en.wikipedia.org/wiki/Florence_Nightingale">Florence Nightingale</a> pioneered the use of circular area charts to show that more British soldiers had died during the Crimean War as a result of poor hygienic conditions in battlefield hospitals than in combat. Her famous charts eventually became known as the &#8220;coxcombs&#8221; of a voluminous Royal Commission report: not because they looked like the crest of a rooster, but because they were its most colorful and ostentatious part, immediately communicating useful information and galvanizing public support for reforms.</p>
<p><a href="http://understandinguncertainty.org/node/213"><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/florence_nightingale-coxcomb.jpg" alt="Florence Nightingale&#039;s Coxcomb" width="500" height="332" /></a></p>
<p>Perhaps the most notable innovator of information graphics during this period was <a href="http://en.wikipedia.org/wiki/Charles_Minard">Charles Minard</a>, who in 1869 published a geographical chart illustrating the decimation of Napoleon&#8217;s army during the 1812 Russian campaign. Popular visualization critic <a href="http://en.wikipedia.org/wiki/Edward_Tufte">Edward Tufte</a> says that this &#8220;may well be the best statistical graphic ever drawn&#8221;, and rightly so:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Minard.png"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/29/Minard.png/520px-Minard.png" width="520" alt="Charles Minard's chart of Napoleon's 1812 Russian campaign" /></a></p>
<p>Many other &#8220;greatest hits&#8221; of visualization were invented in the 1800s, and most of them are chronicled in both Tufte&#8217;s <a href="http://www.edwardtufte.com/tufte/books_vdqi">books</a> and the <a href="http://datavis.ca/milestones/">Milestones in Visualization project</a> (from which I&#8217;ve culled many of these examples).</p>
<h4 id="dark-ages">The So-Called &#8220;Dark Ages&#8221;</h4>
<p>The 1900s saw the rise of a more formal, empirical attitude toward visualization, which tended to focus on aspects such as color, value scales, and labeling. Willard Cope Brinton&#8217;s <i>Graphic Presentation</i> details hundreds of charts, graphs, and maps; and suggests methods for improving the legibility of each form. You can read the entire book for free online at <a href="http://www.archive.org/details/graphicpresentat00brinrich">archive.org</a>, or check out Michael Stoll&#8217;s <a href="http://www.flickr.com/photos/mstoll/sets/72157619121678127/">selected photos of the hard copy</a>:</p>
<p><a href="http://www.flickr.com/photos/mstoll/3592480739/in/set-72157619121678127/"><img src="http://farm4.static.flickr.com/3359/3592480739_4a0482c6a2.jpg" width="500" height="387" alt="Willard Cope Brinton – Graphic Presentation /_03 by Michael Stoll on Flickr" /></a></p>
<p>In the mid-1900s cartographer and theorist <a href="http://en.wikipedia.org/wiki/Jacques_Bertin">Jacques Bertin</a> published his <i>Semiologie Graphique</i>, which <a href="http://www.infovis.net/printMag.php?lang=2&amp;num=116">some say</a> serves as the theoretical foundation of modern information visualization. While most of his patterns are either outdated by more recent research or completely inapplicable to digital media, many are still very relevant to what we&#8217;re doing in this course. Particularly, his definition of six <em>visual variables</em> is directly applicable in any graphical visualization. John Krygier and Denis Wood more recently incorporated these variables into their book, <a href="http://www.amazon.com/gp/product/1593852002?ie=UTF8&amp;tag=theelearningc-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1593852002">Making Maps: A Visual Guide to Map Design for GIS</a>, which expands upon Bertin&#8217;s theories and illustrates how each variable, when applied to various map symbologies, better communicates quantitative or qualitative differences in geographical data. This image is from <a href="http://understandinggraphics.com/visualizations/information-display-tips/">Understanding Graphics</a>:</p>
<p><a href="http://understandinggraphics.com/visualizations/information-display-tips/"><img src="http://understandinggraphics.com/wp-content/uploads/2010/01/retinal-variables.png" width="520" alt="" /></a></p>
<h4 id="aughts">Recent History</h4>
<p>Fast forward about 50 years, and here we are in the 2000s. In the last ten years the internet has emerged as a new medium for visualization, and brought with it a bag full of new tricks. Not only has the worldwide, digital distribution of both data and visualization made them more accessible to a broader audience (raising <a href="http://www.visual-literacy.org/periodic_table/periodic_table.html">visual literacy</a> along the way), but it has also spurred the design of new forms that incorporate interaction, animation, graphics rendering technology unique to screen media, and real-time data feeds to create immersive environments for communicating and consuming data. On the internet, visualization has graduated from the status of chart sidebar on a newspaper page to <a href="http://www.nytimes.com/interactive/2009/07/31/business/20080801-metrics-graphic.html">the interface that tells the story</a>.</p>
<p>People are, seemingly all of a sudden, interested in <em>data</em>; and that interest has in turn sparked a need for visual tools that help them understand it. Visualization, in response to this need, has become increasingly dynamic. It&#8217;s no longer practical to create most charts or graphs by hand. Instead, we&#8217;ve designed new patterns for dynamic value scales; new interfaces for interactively manipulating chart dimensions, such as time; and we&#8217;ve developed new tools for managing data. For example:</p>
<ul>
<li><a href="http://www.google.com/finance?q=NASDAQ:AAPL">Google Finance</a> has popularized the interactive timeline chart; and their <a href="http://spreadsheets.google.com">Spreadsheets</a> offering has effectively removed the software barriers (in particular, expensive desktop applications such as Microsoft Excel) to collecting and storing data.</li>
<li><a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">IBM Many Eyes</a>, by Martin Wattenberg and Fernanda Viegas, has made it possible to plug arbitrary data into a variety of well-designed, interactive visualizations that can be embedded and shared elsewhere on the internet.</li>
<li>Nicholas Felton&#8217;s <a href="http://daytum.com/">Daytum</a> is a delightfully simple tool for collecting and displaying everyday data that describes our own personal habits, thoughts, and aspirations.</li>
</ul>
<p>Cheap hardware sensors and DIY frameworks for <a href="http://www.arduino.cc/">building your own</a> are driving down the costs of collecting analog data. Countless other applications, software tools, and low-level code libraries are springing up even as I write this to help people collect, organize, manipulate, visualize, and understand data from practically any source. The internet has also served as a fantastic distribution channel for visualizations; and a diverse (though not very &#8220;tight-knit&#8221;) community of designers, programmers, cartographers, tinkerers, and data wonks has assembled to disseminate all sorts of new ideas and tools for working with data in both visual and non-visual forms. Here is just a tiny sampling of my favorite visualization projects on the web:</p>
<ul>
<li><a href="http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html">The Jobless Rate for People Like You</a>, by The Times&#8217; <a href="http://shancarter.com/">Shan Carter</a>, illustrates unemployment rates for 192 demographic combinations of race, gender, age, and education.</li>
<li><a href="http://www.marumushi.com/apps/newsmap/">Marcos Weskamp&#8217;s Newsmap</a> is, even by today&#8217;s standards, the most beautiful interactive treemap on the internet.</li>
<li><a href="http://www.babynamewizard.com/voyager">Martin Wattenberg&#8217;s Baby Name Voyager</a> is an interactive stacked area chart that visualizes 120 years of baby naming trends.</li>
<li><a href="http://benfry.com/salaryper/">Ben Fry&#8217;s Salary vs. Performance</a> is an interactive visualization comparing American baseball teams&#8217; total player salaries with their win-loss records.</li>
<li><a href="http://www.ge.com/visualization/health_visualizer/">GE&#8217;s Health Visualizer</a> visualizes American health statistics and allows you to interactively compare gender, risk factors, and conditions.</li>
<li><a href="http://www.everyblock.com/">EveryBlock</a>, though not particularly innovative in terms of visualization, presents a common visual vocabulary for quantitative and geographical data and applies it to everything from building inspections, to restaurant reviews, to crime reports.</li>
<li><a href="http://www.good.is/departments/transparency/">GOOD Magazine&#8217;s Transparency series</a> regularly features bespoke visualizations by a variety of designers, on topics ranging from serious ones like <a href="http://awesome.good.is/transparency/web/1005/oil-consumption/flat.html">energy</a> and <a href="http://awesome.good.is/transparency/web/1006/rise-of-walking-and-biking/flat.html">transportation</a> to decidedly less political ones such as <a href="http://awesome.good.is/transparency/012/trans012animals.html">zoo populations</a>.</li>
<li>I&#8217;d be remiss if I neglected to mention the <a href="http://labs.digg.com/stack/">Stack</a>, <a href="http://labs.digg.com/swarm/">Swarm</a>, and <a href="http://labs.digg.com/arc/">Arc</a> visualizations from <a href="http://labs.digg.com/">Digg Labs</a>, which all present the same real-time social news activity in their own ways. My colleagues and I at <a href="http://stamen.com">Stamen</a> both developed these visualizations and designed the API (or <a href="http://en.wikipedia.org/wiki/Application_programming_interface">Application Programming Interface</a>) that drives them.</li>
</ul>
<p>You&#8217;ll find many, many more fantastic examples (published on the web and elsewhere) on sites such as <a href="http://infosthetics.com/">information aesthetics</a>, <a href="http://flowingdata.com/">Flowing Data</a>, and <a href="http://www.visualcomplexity.com/vc/">visual complexity</a>.</p>
<p><a href="http://maps.google.com">Google Maps</a> has also single-handedly democratized both the interface conventions (click to pan, double-click to zoom) and the technology (256-pixel square map tiles with predictable file names) for displaying interactive geography online, to the extent that most people <em>just know what to do</em> when they&#8217;re presented with a map online. <a href="http://www.adobe.com/products/flashplayer/">Flash</a> has served well as a cross-browser platform on which to design and develop rich, beautiful internet applications incorporating interactive data visualization and maps; and now, new browser-native technologies such as <a href="https://developer.mozilla.org/en/HTML/Canvas">canvas</a> and <a href="https://developer.mozilla.org/en/SVG">SVG</a> (sometimes collectively included under the umbrella of <a href="https://developer.mozilla.org/en/HTML/HTML5">HTML5</a>) are emerging to challenge Flash&#8217;s supremacy and extend the reach of dynamic visualization interfaces to mobile devices.</p>
<p>Advocates for various causes have also embraced visualization as a medium for communicating the breadth and depth of the problems they seek to communicate and, ultimately, solve. Hans Rosling, an expert on world development, used a specially designed tool called <a href="http://www.gapminder.org/">Gapminder</a> as a storytelling device in this rousing, visualization-driven <a href="http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html">TED talk</a>. Artists have also latched onto visualization as a medium for expressing information. <a href="http://chrisjordan.com/">Chris Jordan</a>, who creates sprawling images of American consumption, explains in his own <a href="http://www.ted.com/index.php/talks/chris_jordan_pictures_some_shocking_stats.html">TED talk</a> how visualization is an effective and necessary means for evoking emotional responses to data.</p>
<h3 id="data">What <i>Is</i> Data, Anyway?</h3>
<p>So what, exactly, are we referring to when we say &#8220;data&#8221;? York University professor and data visualization historian <a href="http://www.datavis.ca/personal/">Michael Friendly</a> defines it as &#8220;information which has been abstracted in some schematic form, including attributes or variables for the units of information&#8221;. I find it useful, though, not to think of data as an abstraction, but rather as an <em>expression of occurrence</em>. At <a href="http://stamen.com">Stamen</a> we&#8217;re fond of saying that our favorite data to work with is anything that is created by humans, such as:</p>
<ul>
<li>Activity on social networking sites</li>
<li>Geographical locations and categorizations of crime</li>
<li>Health, education, and economic indicators for nations of the world, over time</li>
<li>Financial transactions, often categorized by the type of goods or service they purchased; or grouped by day, month, or financial quarter when they relate to businesses</li>
<li>Tons of CO<sub>2</sub> emitted by specific activities, people&#8217;s aggregate activity, averaged by nation, etc.</li>
<li>Web site visits, typically grouped by time and date</li>
</ul>
<p>All of these examples can be expressed in tabular structure, which you&#8217;ll commonly encounter stored as an Excel file or in a <a href="http://docs.google.com/">Google Docs</a> spreadsheet. In fact, most of the data that you&#8217;ll ever encounter will be tabular, and even some data models that you wouldn&#8217;t think of as tabular can be expressed as rows and columns, or what we sometimes refer to as a matrix. The spreadsheet, despite having been tarred by Microsoft&#8217;s notoriously unwieldy tools and proprietary file formats, is today the most well-understood and universal form in which to encode and transmit data.</p>
<h4 id="data-formats">Data Formats</h4>
<p>Typically, tabular data will be provided as a plain-text CSV (comma-separated values) file or a binary Excel file. Excel files can be converted into CSV by uploading them to <a href="http://docs.google.com">Google Docs</a> and exporting them from there. Here are some sites that provide CSV or Excel versions of some, if not all, of their data:</p>
<ul>
<li><a href="http://www.data.gov/">Data.gov</a>, the federally maintained clearinghouse for US government data</li>
<li><a href="http://www.nyc.gov/html/datamine/html/data/raw.shtml">NYC data mine</a></li>
<li><a href="http://data.un.org/">UNdata</a>, the United Nations data catalog</li>
<li><a href="http://www.gapminder.org/data/">Gapminder&#8217;s global indicators</a> are the numbers behind Hans Rosling&#8217;s <a href="http://www.ted.com/speakers/hans_rosling.html">amazing TED talks</a></li>
<li><a href="http://data.worldbank.org/data-catalog">World Bank Data Catalog</a></li>
</ul>
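<p>If you&#8217;re curious what reading a CSV file looks like in code, here is a minimal sketch in Python using the standard library&#8217;s <code>csv</code> module. The column names and rows are hypothetical values that I&#8217;ve made up for illustration, and <code>read_rows</code> is just a helper name of my own choosing:</p>
<pre><code class="language-python">import csv
import io

# Hypothetical CSV text, made up for illustration. The first line names
# the columns and every following line is one row.
CSV_TEXT = """name,age,income
Alice,34,52000
Bob,45,61000
Carol,29,48000
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(CSV_TEXT)
for row in rows:
    # Every value arrives as text, so convert numbers before charting them.
    print(row["name"], int(row["age"]), int(row["income"]))
</code></pre>
<p>To read a file that you&#8217;ve downloaded instead, you would pass an opened file to <code>csv.DictReader</code> in place of the <code>io.StringIO</code> wrapper.</p>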
<h4>Data Models</h4>
<p>A tabular structure can be used to express a variety of different <i>data models</i>. A data model, as far as it relates to a table, is simply a description of its rows and columns. In most models columns represent <i>attribute</i> or <i>variable</i> names, and each row represents a sample with a value for each column. Consider the following:</p>
<p><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/table01.png" alt="An example table" width="517" height="109" /></p>
<p>Here we&#8217;ve got several columns, each of which might serve as a potential variable for visualization. Age and income stand out as the ones most suitable for graphing, simply because they&#8217;re both numbers. Gender, if we had more than three rows, might be an attribute suitable for coloring dots or lines. Profession, like gender, is a <em>qualitative</em> (referring to quality rather than quantity, sometimes referred to as <em>categorical</em>) variable that might not lend itself to any particular visual distinction; but it may serve as an interesting filter. Name is the one that I would refer to as the &#8220;identifier&#8221;, the (hopefully) unique column that we could use to label individual points.</p>
<p>Some data sets may not have unique identifiers for each row, but may instead describe changes in one variable relative to another. For instance, a table of average temperatures for major American cities over time wouldn&#8217;t need unique identifiers for each row; we could make a line chart that connects the plotted points for each city with a unique color. (It&#8217;s worth noting that, in data sets which describe changes in one or more variables over time, time is itself one of the variables.)</p>
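<p>As a rough sketch of that temperature example, assuming a table with hypothetical <code>city</code>, <code>month</code>, and <code>avg_temp_f</code> columns (the names and numbers are invented, not taken from a real data set), this Python snippet regroups the rows into one series of points per city, which is the shape a line chart needs:</p>
<pre><code class="language-python">from collections import defaultdict

# Hypothetical table of average temperatures; the numbers are invented.
# Each row is one sample with a value for every column.
ROWS = [
    {"city": "New York", "month": 1, "avg_temp_f": 33},
    {"city": "New York", "month": 7, "avg_temp_f": 77},
    {"city": "Chicago", "month": 1, "avg_temp_f": 26},
    {"city": "Chicago", "month": 7, "avg_temp_f": 75},
]

def series_by_city(rows):
    """Group rows into one list of (month, temperature) points per city."""
    series = defaultdict(list)
    for row in rows:
        series[row["city"]].append((row["month"], row["avg_temp_f"]))
    # Sort each city's points by month so a line connects them in time order.
    return {city: sorted(points) for city, points in series.items()}

lines = series_by_city(ROWS)
for city, points in lines.items():
    print(city, points)
</code></pre>
<p>Notice that no row needs its own identifier here: the city picks the line (and its color), and the month positions each point along it.</p>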
<h4 id="aggregation">Aggregation and Granularity</h4>
<p>One important thing to note is that most data is not, in fact, &#8220;raw&#8221; in any sense. The word &#8220;raw&#8221;, to me, implies collection closest to the source of the activity that it describes. Financial information is rarely expressed as a list of transactions, but as an <i>aggregation</i> of transaction totals grouped by uniform time periods. Aggregation is not necessarily a bad thing, though, because most data sets aren&#8217;t particularly revealing in their &#8220;raw&#8221; form. We typically refer to most aggregations and mathematical operations on more granular data as <a href="http://en.wikipedia.org/wiki/Statistics">statistics</a>.</p>
<p>Aggregations of highly granular data allow us to understand changes in variables over long periods of time; at the scales of cities or countries; or relative to particular demographic groups, such as gender, age, and political orientation. There is such a thing as &#8220;premature aggregation&#8221;, though. Collecting data and &#8220;bucketing&#8221; it without recording all of the variables may result in a loss of useful information, and later prevent aggregation along interesting axes.</p>
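<p>Here is a small Python sketch of what aggregation looks like in practice. The transactions are hypothetical (I&#8217;ve invented the dates, categories, and amounts), and <code>total_by</code> is a helper of my own naming rather than anything from a library:</p>
<pre><code class="language-python">from collections import defaultdict

# Hypothetical transactions, invented for illustration.
TRANSACTIONS = [
    {"date": "2010-06-03", "category": "food", "amount": 12.50},
    {"date": "2010-06-17", "category": "books", "amount": 30.00},
    {"date": "2010-07-02", "category": "food", "amount": 8.25},
    {"date": "2010-07-21", "category": "food", "amount": 15.00},
]

def total_by(rows, key):
    """Sum the amount column for each distinct value returned by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row["amount"]
    return dict(totals)

# Bucket by month: the first seven characters of an ISO date are YYYY-MM.
by_month = total_by(TRANSACTIONS, lambda row: row["date"][:7])
# Because each row still records its category, we can regroup along that axis.
by_category = total_by(TRANSACTIONS, lambda row: row["category"])

print(by_month)
print(by_category)
</code></pre>
<p>If we had kept only the monthly totals and thrown away the individual rows, the breakdown by category could never be recovered. That is the cost of premature aggregation.</p>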
<h4 id="special-variables">&#8220;Special&#8221; Variables</h4>
<p>As you can see by browsing through some of the data catalogs I&#8217;ve listed above, most data loses nothing by being expressed in a table. A few variables are exceptions that you should be aware of, though; I&#8217;m going to refer to them as &#8220;special&#8221; because they deserve extra attention when it comes to formatting input values, labeling, and positioning.</p>
<h5 id="variable-time">Time</h5>
<p>As mentioned previously, time is one of the most common variables. Time is special because it can be represented on many different scales, from the second (and, in some cases, the millisecond) to one or more years. Often, the rawest forms of data are aggregated into tables that list totals of other variables grouped by hour, day, month, or year. You&#8217;ll need to be sensitive to the time scale of certain data sets when plotting them on charts and graphs.</p>
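<p>To make the idea of time scales concrete, here is a minimal Python sketch that counts the same set of timestamps by hour, by day, and by month. The timestamps are hypothetical web site visits that I&#8217;ve made up, and the <code>SCALES</code> table and <code>count_by_scale</code> function are illustrative names, not part of any library:</p>
<pre><code class="language-python">from collections import Counter
from datetime import datetime

# Hypothetical timestamps of web site visits, invented for illustration.
VISITS = [
    "2010-07-08 09:15:00",
    "2010-07-08 09:45:00",
    "2010-07-08 14:05:00",
    "2010-07-09 10:30:00",
]

# One strftime pattern per time scale we might want to chart.
SCALES = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}

def count_by_scale(timestamps, scale):
    """Count timestamps after truncating each one to the given time scale."""
    pattern = SCALES[scale]
    parsed = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in timestamps)
    return Counter(moment.strftime(pattern) for moment in parsed)

for scale in SCALES:
    print(scale, dict(count_by_scale(VISITS, scale)))
</code></pre>
<p>The same four visits produce three buckets at the hourly scale, two at the daily scale, and only one at the monthly scale, so the scale you choose determines how much detail a chart can show.</p>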
<h5 id="variable-location">Location</h5>
<p>The word &#8220;location&#8221; can mean many things:</p>
<ul>
<li>A street address</li>
<li>The name of a specific place, such as a park or lake</li>
<li>A more general area, such as city, state, or country</li>
<li>Precise geographical coordinates, typically expressed as latitude and longitude</li>
</ul>
<p>To some degree of precision, the first three can be turned into the last. <a href="http://maps.google.com/">Google Maps</a>, for example, exists primarily as a service for translating an address or place name into geographical coordinates so that it can be pinpointed on a map. This process is generally called <a href="http://en.wikipedia.org/wiki/Geocoding">geocoding</a>, and can be <a href="http://en.wikipedia.org/wiki/Reverse_geocoding">reversed</a> to derive place names from geographical coordinates. Latitude and longitude aren&#8217;t typically necessary (or even useful) unless you&#8217;re working with tools that understand them, though, so we&#8217;ll likely be focusing on the less specific location types. Street addresses, for instance, can be plotted on a map that lists block numbers (or you can just <a href="http://maps.google.com/maps?q=132+W+21st+St,+New+York,+NY">Google the address</a> and use the map as guidance).</p>
<p>Location, like time, is also ripe for aggregation. Certain data is much less interesting at the street level than it is at the neighborhood or city level. The <a href="http://www.census.gov/">US Census Bureau</a> collects multivariate data aggregated by &#8220;tracts&#8221;, which they designate to contain a relatively uniform number of people so that they can be compared independent of population density. Tract statistics are then aggregated into cities, counties, and states. For privacy reasons, the Census Bureau never releases the &#8220;raw&#8221; results of their surveys.</p>
<h4 id="model-patterns">Data Model Patterns</h4>
<p>A number of common patterns have arisen to deal with the visualization of common data models, which often deal with one or both of our &#8220;special&#8221; variables: time and location. The line chart is an obvious example that tracks the change of one or more variables over time. The <a href="http://en.wikipedia.org/wiki/Choropleth_map">choropleth</a>, or &#8220;heat map&#8221;, is another. This is the New York Times&#8217; excellent 2008 electoral map which demonstrates aggregation of presidential voting tallies by both county and state:</p>
<p><a href="http://elections.nytimes.com/2008/results/president/map.html"><img src="http://interactiondesign.sva.edu/classes/datavisualization/files/2010/07/ntyimes_electoral_map01.png" alt="New York Times 2008 Presidential Election Map" width="520" height="349" /></a></p>
<p>We&#8217;ll investigate a few of these patterns in the next three weeks.</p>
<h3>Next Week</h3>
<p>Next week we&#8217;re going to visualize some data. Your homework in the meantime is to collect three distinct data sets that you find interesting, and would like to learn something about through the process of visualization. For the purposes of sorting and filtering data, your tables should be saved either in Excel (or Numbers) locally, or on <a href="http://docs.google.com">Google Docs</a>, which allows you to upload CSV and binary spreadsheets from Excel or Numbers. Feel free to share your spreadsheets with me (shawn at stamen dot com) and I will gladly look them over to ensure that they&#8217;re usable for next week&#8217;s exercise.</p>
<p>For inspiration and some more historical perspective, <a href="http://edwardtufte.com/">Edward Tufte&#8217;s</a> books come highly recommended. The <a href="http://oreilly.com/catalog/0636920000617">Beautiful Visualization</a> O&#8217;Reilly book also provides a nice overview of both modern visualization and some of the canonical classics.</p>
<p>If you&#8217;re interested in creating your own data set, I would recommend reading <a href="http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html">The Data-Driven Life</a>, a Times article summarizing a decade&#8217;s worth of innovation in the field of self-quantification via automated and/or obsessive data collection. I can highly recommend both <a href="http://daytum.com/">Daytum</a> and <a href="http://your.flowingdata.com/">your.flowingdata</a> as collection tools, but you may find it easier just to carry around a pen and paper.</p>
<p>Good luck, and see you next week!</p>
]]></content:encoded>
			<wfw:commentRss>http://interactiondesign.sva.edu/classes/datavisualization/2010/07/08/introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

