Data visualization’s most recognized form is the chart. You’ve seen charts all over the place, from PowerPoint presentations and stock tickers to public polling results and election predictions. Even the humble and oft-misused pie chart, though derided by visualization critics for its perceptual shortcomings, is still useful for comic effect:

As a society we’ve practically dismissed many of the popular forms of charting as useless because most of the charts that we see are just ugly at best or, at worst, fail to communicate any actionable information. But while charts are often deemed failures unless they illustrate dramatic changes or unseen trends, their increasing abundance in popular media has also led to an increase in literacy that makes our job of communicating visual information a lot easier than it has been historically. (We don’t have to explain to our audience what a time series is anymore!) Note, though, that it’s important to keep common assumptions in mind when you’re creating graphs. For instance, since most people expect time to be represented left to right on the x-axis, presenting it vertically or from right to left may confuse your audience no matter how clearly your axes are marked.
Before we get into the perceptual (and even cultural) qualities of various charting forms, though, let’s step back and wrap our heads around what it is, exactly, that makes a chart a chart.
Anatomy of a Chart
With respect to Bertin’s variables, charts deal primarily with the position and size of visual elements. For our purposes, charts have at least one axis (timelines are an example of a chart with only one real axis) along which elements are placed to distinguish varying values from one another. I’m also intentionally excluding the genre of “big infographics” that lack any perceptual component whatsoever, because that’s what essentially distinguishes a “chart” from a “diagram”.
In most charts, Cartesian coordinates describe the position of an element relative to one or more linear axes, commonly called x on the horizontal and y on the vertical, and written as (x, y). In computer screen coordinate systems (specifically, on web pages and in most visual programming environments) the upper lefthand corner serves as the origin, or (0, 0). As x values increase an element moves toward the right edge of the screen, and positive y values move the element toward the bottom. On paper we may choose to think of the origin as the lower lefthand corner, and position positive y values above it.
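As a minimal sketch of the difference between the two conventions, the Python snippet below flips a paper-style point (origin at the lower left, y increasing upward) into screen space (origin at the upper left, y increasing downward). The names `to_screen` and `chart_height` are illustrative, not taken from any particular library.

```python
def to_screen(x, y, chart_height):
    """Convert a paper-style point to screen coordinates.

    Paper: origin at the lower left, positive y goes up.
    Screen: origin at the upper left, positive y goes down.
    """
    return (x, chart_height - y)


# A point sitting on the bottom edge of a 100-pixel-tall chart
# ends up at y = 100 on screen; a point on the top edge ends up at y = 0.
bottom = to_screen(10, 0, 100)
top = to_screen(10, 100, 100)
print(bottom, top)
```

Only the y value changes; x runs left to right in both systems.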
It’s important to note that axes can be made for both quantitative (numeric, or continuous) and qualitative (categorical, or discrete) variables. The humble bar chart’s quantitative axis (in this case, y) determines the height of each bar, and the other (x) evenly spaces out each bar so that its height can be easily compared to the others:
Often, as is the case in the above graph, the elements are sorted on the discrete axis according to their value on the other so that you can easily see the distribution of values in the set. The histogram, a cousin to the bar chart in some respects, replaces the qualitative axis with a quantitative one. The time series plots continuous values of a quantitative variable over time, usually on the horizontal axis. For some other examples, check out Nathan Yau’s guide to visualizing changes over time.
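To make the histogram's quantitative axis a little more concrete, here is a rough Python sketch that counts values into equal-width bins. The function name `histogram` and the choice of equal-width bins are assumptions for illustration; it also assumes the values are not all identical, and it places the maximum value in the last bin.

```python
def histogram(values, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count  # assumes hi > lo
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    return counts


counts = histogram([1, 2, 2, 3, 9, 10], 3)
print(counts)
```

Each count becomes the height of one bar, and the bars sit side by side along the quantitative axis in bin order rather than being sorted by height.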
The more generalized scatter plot is particularly useful for illustrating the relationship between two quantitative variables. This one, also from Wikipedia, plots eruptions of the Old Faithful geyser in Yellowstone National Park using two variables: the duration of each eruption on the horizontal, and the time since the previous eruption on the vertical:
Polar coordinates are used to plot points in circular arrangements, such as pie and radar charts. In this system, coordinates are expressed not as x and y, but as angle and radius. Polar charts are best suited for plotting cyclical values, such as wind direction, time of day (i.e., a clock), or categorical values that, when displayed as small multiples, can reveal similarities in shape:
For more examples, check out A Tour through the Visualization Zoo by Stanford Vis Group’s Jeff Heer, Michael Bostock and Vadim Ogievetsky, which profiles a variety of common visualization forms made with their protovis library. And if you’re going to plot more than two variables against one another using only position, you might consider the ternary plot, 3D, animation, or even an interactive interface that allows the user to adjust one of the variables in realtime.
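Returning to the polar system described above, this short Python sketch shows one way to convert an angle and radius into x and y. It assumes the common mathematical convention (angle measured counter-clockwise from the positive x axis, y pointing up), and the helper `hour_on_clock` is a hypothetical example of a cyclical value placed on a clock face.

```python
import math


def polar_to_cartesian(angle_degrees, radius):
    """Convert (angle, radius) to (x, y), with the angle measured
    counter-clockwise from the positive x axis and y pointing up."""
    theta = math.radians(angle_degrees)
    return (radius * math.cos(theta), radius * math.sin(theta))


def hour_on_clock(hour, radius):
    """Place an hour (0-12) on a clock face: 12 at the top, moving clockwise."""
    # 12 o'clock sits at 90 degrees; each hour moves 30 degrees clockwise.
    return polar_to_cartesian(90 - hour * 30, radius)


x, y = hour_on_clock(3, 1)
print(round(x, 6), round(y, 6))
```

On a screen you would still need to flip the y value, since screen coordinates increase downward.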
Scales
Rarely will you find a data set expressed in terms of the same coordinates used to display it. In order to convert data values into display coordinates we apply one or more scales. A scale is the means by which we plot a variable on a given axis. Each scale has a minimum and a maximum (usually built from the calculated minima and maxima, but sometimes chosen specifically to over- or under-emphasize distributions), and defines a method for interpolating values between them. The linear scale on this NOAA chart shows the reader how to convert measurements on a map into distances in real life:
Let’s take another look at the example table from my introductory blog post:

If we wanted to create a bar chart of the subjects’ incomes, we would need to devise a scale for the y axis. The natural minimum for this scale would be 100, and the maximum 30,000. This example is easy because there are only 3 elements to plot: Jane goes at the bottom of the scale, and Alex at the top. In order to figure out where Joe goes, though, we have to do a little bit of math. Here I’m using y as a relative measurement of how far along the scale the value n should be positioned, where 0 would be the bottom and 1 the top. This is generally referred to as a process of normalization:
y = (n - min) / (max - min)
y = (20,000 - 100) / (30,000 - 100)
y = 19,900 / 29,900
y ≈ 0.6656
So, if our chart were 100 pixels tall, Joe’s bar would have a height of 66 pixels (or 67, if we round up):
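The normalization above can be sketched in a few lines of Python. The three incomes are the ones used in the worked example; the names `normalize` and `CHART_HEIGHT` are illustrative, and rounding to the nearest pixel is one choice among several.

```python
def normalize(n, lo, hi):
    """Return how far n sits along the scale from lo (0.0) to hi (1.0)."""
    return (n - lo) / (hi - lo)


incomes = {"Jane": 100, "Joe": 20_000, "Alex": 30_000}
CHART_HEIGHT = 100  # pixels

lo, hi = min(incomes.values()), max(incomes.values())

# Bar height in pixels, rounded to the nearest whole pixel.
heights = {
    name: round(normalize(income, lo, hi) * CHART_HEIGHT)
    for name, income in incomes.items()
}
print(heights)
```

With these numbers Joe's bar comes out at 67 pixels, Alex's fills the full 100, and Jane's has no height at all.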

One problem with this, though, is that Jane’s bar essentially has zero height because her low income corresponds to the bottom of the scale. ((100 - 100) / (30,000 - 100) = 0) We can’t really “fix” that, but we can make it clearer (and avoid having to use a calculator!) by thinking of the y axis as 100-dollar increments (the greatest common divisor of this particular collection) and setting the minimum of the scale to zero. This way, you simply divide each number by 100 to get the height in pixels; so Jane’s bar is exactly 1 pixel tall, Joe’s is 200, and Alex’s is 300. It also simplifies the labeling of the vertical axis, because you can split it into nice, round numbers:
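Here is a small Python sketch of that zero-based scale, under the assumption that one pixel stands for one increment of the greatest common divisor. The variable names are illustrative.

```python
import math
from functools import reduce

incomes = {"Jane": 100, "Joe": 20_000, "Alex": 30_000}

# The largest increment that divides every income evenly.
dollars_per_pixel = reduce(math.gcd, incomes.values())

# With the scale minimum at zero, a bar's height is just income / increment.
heights = {name: income // dollars_per_pixel for name, income in incomes.items()}
print(dollars_per_pixel, heights)
```

The trade-off is a much taller chart (300 pixels instead of 100), in exchange for a bar for Jane that is actually visible and axis labels that fall on round numbers.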

Obviously this is an over-simplified example, but I hope that it illustrates why your choice of scale is important. We can emphasize or de-emphasize variances by making our bar charts short or tall, or we can intentionally set the scale minimum or maximum to a value outside the range of the data, as in “Miracles in nature and Science”, from the Words and Years exhibit by Toril Johannessen, which plots the number of occurrences of the word “miracle” over time in the two eponymous periodicals:
Of course, it’s worth mentioning that unscaled values in their original unit of measurement might better suit some contexts for visualization than scaled values. This energy saving campaign depicts greenhouse gases produced by energy use as black balloons, each containing the volumetric equivalent of 50 grams. Imagine seeing Chris Jordan’s field of plastic bottles in real life. Most data sets probably aren’t worth expressing natively like this, but you should certainly consider displays that emphasize the physical dimensions of a particular data set as a useful way of drawing attention or raising awareness.
Other Visual Aspects
Once we’ve exhausted the physical dimensions of our chart as a means to communicate information, we may need to resort to modifying some other visual aspects of our elements:
Color
Color, with respect to Bertin’s variables, is expressed in two ways:
- Hue: the color itself—red, blue, green, orange, purple, etc.
- Value: the brightness, or intensity of a color. You can think of this as some combination of the value and saturation components in the HSV color space.
We’ll go a bit more in depth on color in the next couple of weeks. For now, though, let’s see how far we can get without having to use it. Feel free to experiment with varying color for categorical variables, but be warned that creating color scales for continuous variables is fraught with peril.
Shape
Varying the shapes of visual elements is a great way of encoding categorical variables. We’ll touch on a couple examples of this with your data sets tonight if it’s applicable.
Size
Size is well suited for positional arrangements on multiple axes, such as scatter plots. Gapminder, for instance, tends to encode a country’s population in its dot size. Note that research has found circles of varying sizes to be difficult for people to compare, and that we tend to read a circle by its area rather than its radius. To keep the marks proportional, scale each circle’s area to the value and derive the radius from it by taking the square root of the desired area divided by pi:
r = sqrt(area / π)
And vice-versa, the area from a radius:
area = π • r²
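As a quick sketch of those two formulas in Python (the function names are illustrative):

```python
import math


def radius_for_area(area):
    """Radius of a circle with the given area: r = sqrt(area / pi)."""
    return math.sqrt(area / math.pi)


def area_for_radius(radius):
    """Area of a circle with the given radius: area = pi * r^2."""
    return math.pi * radius ** 2


# If one value is four times another and we encode the values as areas,
# the larger circle's radius is only twice the smaller one's.
small = radius_for_area(100)
large = radius_for_area(400)
print(large / small)
```

Scaling the radius directly by the value would instead make the larger circle sixteen times the area of the smaller one, which exaggerates the difference.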
Texture
Texture is often useful in visualization forms like bar and area charts, in which you may wish to encode a categorical variable of each element. It’s also particularly useful in maps to denote different types of area or foliage.
Visual Perception
Rigorous scientific research into visual perception is not a particularly recent development. As noted previously, figures like Willard Cope Brinton and Jacques Bertin illuminated many of the problems common to the statistical graphics of the 20th century and attempted to codify rules for designing representations that people could better understand. Statistical analyst John Tukey contributed a significant body of work not only to the practice of statistical analysis itself, but also to the modern-day understanding of how people “read” visual representations of data. More recently, William S. Cleveland and Robert McGill unveiled the findings of their research on the perception of visual cues in their paper Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods [PDF], published in the Journal of the American Statistical Association. They found the following aspects of visual elements to be most successful (this ranked list comes from Nathan Yau’s blog post on graphical perception):
- Position along a common scale e.g. scatter plot
- Position on identical but nonaligned scales e.g. multiple scatter plots
- Length e.g. bar chart
- Angle & Slope (tie) e.g. pie chart
- Area e.g. bubbles
- Volume, density, and color saturation (tie) e.g. heatmap
- Color hue e.g. newsmap
Some researchers are even studying the aesthetic qualities of visualization in an attempt to learn which forms people find most beautiful. A less formal, but no less actionable, form of visual perception research and analysis is taking place in books by Edward Tufte, and sites like chartjunk (and junkcharts) exist to critique charts in the media (and sometimes correct them quite handily). Some business journals regularly feature articles that suggest graphing strategies for particular types of data. The Extreme Presentation blog published this guide that suggests specific types of charts for certain types of data, or aspects of it to be visualized (or jump straight to the PDF):
Visualization as a Process
As we create our visualizations, it’s important to consider that process as a way to learn something new about the data: to derive new information from it. Try out as many of the forms as you can (within reason of course, and keeping in mind which ones are appropriate for different types and aspects of data), and see if you can draw any interesting conclusions from the distribution of particular values (remember to sort your values first!), or find potential correlations between two variables (by matching up two different sources of data with a common variable, or by using a scatter plot). Perhaps most importantly of all, save your work often (whether that means keeping a paper sketch or saving multiple versions of a file on your hard drive) and create artifacts along the way. Even experiments gone “wrong” can produce clues for how to visualize particular aspects of your data differently.
Homework UPDATED!
I’ll be posting a new entry with some specifics about your updated homework. Stay tuned!














The Value of Many Eyes
When we last left off, I was leading the class on a charting expedition. My intention was to do this on paper, under the assumption that, if we used a medium with which everyone is familiar, we could avoid getting hung up on the implementation details (namely, programming syntax). The class decided that this wasn’t the best use of their time, though—and I admit that charting with pens and using calculators to interpolate values on scales would have been tedious, but instructive nonetheless—so last week we took a look at Many Eyes, a project by the fine folks at the IBM Visual Communication Lab.
Many Eyes’ goal, according to its creators, is “to ‘democratize’ visualization and to enable a new social kind of data analysis,” the idea being that both the use of social visualization tools and the public release of the underlying data can lead to new insights. To test this theory, I played with a few of the data sets already uploaded to the site and sought out a few of my own to contribute. In just a few hours, I had:
The day’s experiment was successful, so last week I had the class use Many Eyes as a tool for visualizing their own data sets. There were some issues, particularly:
The biggest issue, however, appeared to be that students quickly ran into the limitations of Many Eyes visualizations. They wanted to change the colors, filter the data interactively, or cross-reference multiple data sets. We learned in this process that, while Many Eyes is a great tool for creating an initial picture of a data set, it doesn’t provide “all” of the tools one would need to really explore their data. There are many other sites and paid products which claim to do just that, but it should be obvious to anyone who’s used them that no generalized system (yet, anyway) is capable of adjusting itself to suit the needs of every possible data set.
As a remedy for this, I suggested that the students use Many Eyes (or another service) to do the heavy lifting of deriving scales and interpolating values, then use its visual output as input for a more manual, bespoke visualization process. Once you’ve fed your data set into a bar or bubble chart, you can use the calculated relative sizes of each mark as the basis for a new representation. E.g.:
Don’t assume that you can’t make something interesting without programming skills. Wield the tools that you already know how to use and do whatever it takes to bang the data—be they the original numbers or the normalized output sizes—into a form that you can work with.
And when you bump up against those limits, then you can consider taking up programming and using visualization libraries like protovis (with JavaScript) or Flare (with Flash). And if you get into visualizing large amounts of data you’ll quickly discover that neither Flash nor SVG is capable of moving more than a couple thousand points around on the screen at once. At that point you’ll have three options: work with aggregations of the data and create interactive interfaces to filter those points into manageable subsets; employ non-interactive tools to generate static representations of the data; or turn to hardware acceleration in more “serious” (and, unfortunately, less “web-friendly”) programming languages like Java and C.
My point isn’t that this stuff is particularly “hard”, but rather that it’s only worth really figuring out if it’s applicable to your goals and you have the time to learn it. Just as visualization can be seen as a process for educating yourself about data, visualization is also a useful programming exercise. Many of the Processing examples could be easily adapted to incorporate real data rather than generating a series of random shapes. Conversely, you may wish to use visualization libraries to generate artistic renditions of random data. I spend a lot of time trying to automate or prematurely optimize processes that would probably have taken less time to do manually. Whatever process helps you do what you need to do quickly and effectively, by all means use it.
Coming back to Many Eyes, though, let’s consider the advantages of its visualizations existing within an open structure on the web. The name itself is meaningful: The more eyes on this stuff the better. That’s a belief that we espouse at Stamen too, because the whole point of our work is to make data more relevant, accessible, and desirable. And how can you do that without showing it to lots of people? Above all else, your choices of medium and technology should be driven by your ability to make something with them that you can share. A series of static graphics on the web is more interesting and useful than the half-baked interactive interface that you started but never finished. And incremental improvements made in the public eye both help other people learn about your process and invite valuable input along the way. There’s nothing much worse than spending a lot of time on something in private, only to have it picked apart once you finally release it to the world.
Use the tools with which you’re comfortable, and experiment with new stuff if you’ve got the time. In the words of my esteemed peer Matt Jones, though, and above all else, just: