<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nick Diakopoulos</title>
	<atom:link href="http://www.nickdiakopoulos.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nickdiakopoulos.com</link>
	<description>Musings on Media</description>
	<lastBuildDate>Fri, 12 Apr 2013 14:38:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
		<item>
		<title>Storytelling with Data: What Are the Impacts on the Audience?</title>
		<link>http://www.nickdiakopoulos.com/2013/04/12/storytelling-with-data-what-are-the-impacts-on-the-audience/</link>
		<comments>http://www.nickdiakopoulos.com/2013/04/12/storytelling-with-data-what-are-the-impacts-on-the-audience/#comments</comments>
		<pubDate>Fri, 12 Apr 2013 14:38:52 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[Storytelling]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=957</guid>
		<description><![CDATA[Storytelling with data visualization is still very much in its &#8220;Wild West&#8221; phase, with journalism outlets blazing new paths in exploring the burgeoning craft of integrating the testimony of data together with compelling narrative. Leaders such as The New York Times create impressive data-driven presentations like 512 Paths to the White House (seen above) that weave complex <a href="http://www.nickdiakopoulos.com/2013/04/12/storytelling-with-data-what-are-the-impacts-on-the-audience/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p><img class="aligncenter" alt="" src="http://www.niemanlab.org/images/512-paths_crop-700x463.png" width="600" height="auto" /></p>
<p>Storytelling with data visualization is still very much in its &#8220;Wild West&#8221; phase, with journalism outlets blazing new paths in exploring the burgeoning craft of integrating the testimony of data together with compelling narrative. Leaders such as The New York Times create impressive data-driven presentations like <a href="http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html">512 Paths to the White House</a> (seen above) that weave complex information into a palatable presentation. But as I look out at the kinds of meetings where data visualizers converge, like <a href="http://eyeofestival.com/">Eyeo</a>, <a href="http://www.tapestryconference.com/">Tapestry</a>, <a href="http://openvisconf.com/">OpenVis</a>, and the infographics summit <a href="http://www.malofiej21.com/">Malofiej</a>, I realize there’s a whole lot of inspiration out there, and some damn fine examples of great work, but I still find it hard to get a sense of direction: which way is West, which way to the promised land?</p>
<p>And it occurred to me: We need a science of data-visualization storytelling. We need some direction. We need to know what makes a data story &#8220;work&#8221;. And what does a data story that &#8220;works&#8221; even mean?</p>
<p>Examples <a href="http://www.marijerooze.nl/thesis/graphics/">abound</a>, and while we have theories for color use, visual salience and perception, and graph design that suggest how to depict data efficiently, we still don&#8217;t know, with any particular scientific rigor, which are better stories. At the Tapestry conference, which I attended, journalists such as <a href="https://twitter.com/13pt">Jonathan Corum</a>, <a href="https://twitter.com/hfairfield">Hannah Fairfield</a>, and <a href="https://twitter.com/cephillips">Cheryl Phillips</a> whipped out a staggering variety of examples in their presentations. Jonathan, in his keynote, talked about &#8220;<a href="http://projects.nytimes.com/guantanamo">A History of the Detainee Population</a>&#8221;, an interactive NYT graphic (partially excerpted below) depicting how Guantanamo prisoners have, over time, slowly been moved back to their country of origin. I would say that the presentation is effective. I &#8220;got&#8221; the message. But I also realize that, because the visualization is animated, it&#8217;s difficult to see the overall trend over time, that is, to compare one year to the next. There are different ways to tell this story, some of which may be more effective than others for a range of storytelling goals.</p>
<p><img class="aligncenter" alt="guantanamo" src="http://www.niemanlab.org/images/guantanamo-700x307.jpg" width="600" height="auto" /></p>
<p>Critical blogs such as <a href="http://thewhyaxis.info/">The Why Axis</a> and <a href="http://thesocietypages.org/graphicsociology/">Graphic Sociology</a> have arisen to try to fill the gap of understanding what works and what doesn&#8217;t. And research on <a href="http://www.nickdiakopoulos.com/2011/08/13/unpacking-visualization-rhetoric/">visualization rhetoric</a> has tried to situate narrative data visualization in terms of the rhetorical techniques authors may use to convey their story. Useful as these efforts are in their thick description and critical analysis, and for increasing visual literacy, they don&#8217;t go far enough toward building predictive theories of how data-visualization stories are &#8220;read&#8221; by the audience at large.</p>
<p>Corum, a graphics editor at NYT, has a <a href="http://style.org/tapestry/">descriptive framework</a> to explain his design process and decisions. It describes the tensions between interactivity and story, between oversimplification and overwhelming detail, and between exploration and decoration. Other axes of design include elements such as focus versus depth and the author versus the audience. Author and educator <a href="https://twitter.com/albertocairo">Alberto Cairo</a> exhibits similar sets of design dimensions in his book, &#8220;<a href="http://www.nickdiakopoulos.com/2012/09/30/review-the-functional-art/">The Functional Art</a>&#8221;, which start to trace the features along which data-visualization stories can vary (recreated below).</p>
<p><img class="aligncenter" alt="vis wheel" src="http://www.niemanlab.org/images/vis-wheel.png" width="600" height="auto" /></p>
<p>Such descriptions are a great starting point, but to make further progress on interactive data storytelling we need to know which of the many experiments happening out in the wild are having their desired <i>effect</i> on readers. Design decisions like how and where annotations are placed on a visualization, how the story is <i>structured</i> across the canvas and over time, the graphical style including things like visual embellishments and novelties, as well as data mapping and aggregation can all have consequences on how the audience perceives the story. How does the effect on the audience change when modulating these various design dimensions? A science of data-visualization storytelling should seek to answer that question.</p>
<p>But still the question looms: What does a data story that &#8220;works&#8221; even mean? While efficiency and parsimony of visual representation may still be important in some contexts, I believe online storytelling demands something else. What effects on the audience should we measure? As data visualization researcher Robert Kosara writes in <a href="http://kosara.net/papers/2013/Kosara_Computer_2013.pdf">his forthcoming IEEE Computer article</a> on the subject, “there are no clearly defined metrics or evaluation methods … Developing these will require the definition of, and agreement on, goals: what do we expect stories to achieve, and how do we measure it?”</p>
<p>There are some hints in <a href="http://enrico.bertini.me/material/tvcg2011-seven-scenarios.pdf">recent research in information visualization</a> for how we might evaluate visualizations that communicate or present information. We might for instance ask questions about how effectively a message is acquired by the audience: Did they <b>learn</b> it faster or better? Was it <b>memorable</b>, or did they forget it 5 minutes, 5 hours, or 5 weeks later? We might ask whether the data story spurred any personal <b>insights</b> or questions, and to what degree users were &#8220;<b>engaged</b>&#8221; with the presentation. Engaged here could mean clicks and hovers of the mouse on the visualization, how often widgets and filters for the presentation were touched, or even whether users <b>shared</b> or <b>conversed</b> around the visualization. We might ask if users felt they understood the <b>context</b> of the data and if they felt confident in their interpretation of the story: Did they feel they could make an informed decision on some issue based on the presentation? <b>Credibility</b> being an important attribute for news outlets, we might wonder whether some data story presentations are more trustworthy than others. In some contexts a presentation that is <b>persuasive</b> is the most important factor. Finally, since some of the best stories are those that evoke <b>emotional</b> responses, we might ask <a href="http://www.peachpit.com/articles/article.aspx?p=2036558">how to do the same</a> with data stories.</p>
<p>Measuring some of these factors is as straightforward as instrumenting the presentations themselves to know where users moved their mouse, clicked, or shared. There are a variety of <a href="http://alistapart.com/article/quick-and-dirty-remote-user-testing">remote usability testing services</a> that can already help with that. Measuring other factors might require writing and attaching survey questions to ask users about their perceptions of the experience. While the best graphics departments do a fair bit of internal iteration and testing, it would be interesting to see what they could learn by setting up experiments that varied their designs minutely to see how that affected the audience along any of the dimensions delineated above. More collaboration between industry and academia could accelerate this process of building knowledge of the impact of data stories on the audience.</p>
<p>I&#8217;m not arguing that the creativity and boundary-pushing in data-visualization storytelling should cease. It&#8217;s inspiring looking at the <a href="http://13pt.com/graphics/">range of visual stories</a> that artists and illustrators produce. And sometimes all you really want is an <i>amuse yeux — </i>a little bit of visual amusement. Let’s not get rid of that. But I do think we&#8217;re at an inflection point where we know enough of the design dimensions to start building models of how to reliably know what story designs achieve certain goals for different kinds of story, audience, data, and context. We stand only to be able to further amplify the <a href="http://www.niemanlab.org/2012/08/metrics-metrics-everywhere-how-do-we-measure-the-impact-of-journalism/">impact of such stories</a> by studying them more systematically.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2013/04/12/storytelling-with-data-what-are-the-impacts-on-the-audience/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How does newspaper circulation relate to Twitter following?</title>
		<link>http://www.nickdiakopoulos.com/2013/04/03/how-does-newspaper-circulation-relate-to-twitter-following/</link>
		<comments>http://www.nickdiakopoulos.com/2013/04/03/how-does-newspaper-circulation-relate-to-twitter-following/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 18:59:42 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=950</guid>
		<description><![CDATA[I was recently looking at circulation numbers from the Audit Bureau of Circulation for the top twenty-five newspapers in the U.S. and wondered: How does circulation relate to Twitter following? So for each newspaper I found the Twitter account and recorded the number of followers (link to data). The graph below shows the ratio of <a href="http://www.nickdiakopoulos.com/2013/04/03/how-does-newspaper-circulation-relate-to-twitter-following/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I was recently looking at <a href="http://accessabc.wordpress.com/2012/10/30/the-top-u-s-newspapers-for-september-2012/">circulation numbers from the Audit Bureau of Circulation</a> for the top twenty-five newspapers in the U.S. and wondered: How does circulation relate to Twitter following? So for each newspaper I found the Twitter account and recorded the number of followers (<a href="https://docs.google.com/spreadsheet/pub?key=0AsKbLffq9fg5dGNpMldVS2Npblo2UDNFQzVJS1hXSXc&amp;output=html">link to data</a>). The graph below shows the ratio of Twitter followers to total circulation; you could say it’s some kind of measure of how well the newspaper has converted its circulation into a social media following.</p>
<p>You can clearly see national papers like the NYT and Washington Post rise above the rest, but for others like USA Today it’s surprising that with a circulation of about 1.7M, they have comparatively few — only 514k — Twitter followers. This may say something about the audience of that paper and whether that audience is online and using social media. For instance, Pew has <a href="http://pewinternet.org/Reports/2013/Social-media-users/Social-Networking-Site-Users/Demo-portrait.aspx">reported stats</a> that suggest that people over the age of 50 use Twitter at a much lower than average rate. Another possible explanation is that a lot of the USA Today circulation is vapor; I can’t remember how many times I’ve stayed at a hotel where USA Today was left for me by default, only to be left behind unread. Finally, maybe USA Today is just not leading an effective social strategy and they need to get better about reaching, and appealing to, the social media audience.</p>
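<p>For anyone who wants to redo the arithmetic, the ratio is just followers divided by circulation. Here is a minimal sketch using the rounded USA Today figures quoted above; the function name is my own invention, and the linked spreadsheet holds the actual numbers for all twenty-five papers.</p>

```python
def followers_per_copy(followers, circulation):
    """Ratio of Twitter followers to total circulation for one paper."""
    if circulation <= 0:
        raise ValueError("circulation must be positive")
    return followers / circulation


# Rounded figures from the post: about 1.7M circulation, 514k followers.
usa_today_ratio = followers_per_copy(514_000, 1_700_000)
print(f"USA Today: {usa_today_ratio:.2f} followers per copy circulated")
```

<p>A ratio well below 1 means the paper has far fewer followers than copies in circulation; sorting the papers by this value gives the ordering the graph is meant to show.</p>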
<p>There are some metro papers like NY Post and LA Times that also have decent ratios, indicating they’re addressing a fairly broad national or regional audience with respect to their circulation. But the real winners in the social world are NYT and WashPost, and maybe WSJ to some extent. And in this game of web-scale audiences, the big will only get bigger as they figure out how to transcend their own limited geographies and expand into the social landscape.</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2013/04/newspaper-graph.png"><img class="size-large wp-image-951 aligncenter" alt="newspaper graph" src="http://www.nickdiakopoulos.com/wp-content/uploads/2013/04/newspaper-graph-944x1024.png" width="620" height="672" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2013/04/03/how-does-newspaper-circulation-relate-to-twitter-following/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Neolithic Journalists? Influence Engines? Narrative Analytics? Some Thoughts on C+J</title>
		<link>http://www.nickdiakopoulos.com/2013/02/15/neolithic-journalists-influence-engines-narrative-analytics/</link>
		<comments>http://www.nickdiakopoulos.com/2013/02/15/neolithic-journalists-influence-engines-narrative-analytics/#comments</comments>
		<pubDate>Fri, 15 Feb 2013 14:21:33 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=882</guid>
		<description><![CDATA[A few weeks ago now was the 2nd Computation + Journalism Symposium at Georgia Tech, which I helped organize and program. I wrote up a few reflections on things that jumped out at me from the meeting. Check them out on Nieman Lab.]]></description>
				<content:encoded><![CDATA[<p>A few weeks ago now was the 2nd <a href="http://computation-and-journalism.com/symposium2013/">Computation + Journalism Symposium</a> at Georgia Tech, which I helped organize and program. I wrote up a few reflections on things that jumped out at me from the meeting. Check them out on <a href="http://www.niemanlab.org/2013/02/finding-tools-vs-making-tools-discovering-common-ground-between-computer-science-and-journalism/">Nieman Lab</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2013/02/15/neolithic-journalists-influence-engines-narrative-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Aha! Brainstorming App</title>
		<link>http://www.nickdiakopoulos.com/2013/01/19/aha-brainstorming-app/</link>
		<comments>http://www.nickdiakopoulos.com/2013/01/19/aha-brainstorming-app/#comments</comments>
		<pubDate>Sat, 19 Jan 2013 17:03:37 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=874</guid>
		<description><![CDATA[In April 2012 I published a whitepaper on Cultivating Innovation in Computational Journalism with the CUNY Tow-Knight Center for Entrepreneurial Journalism. Jeff Jarvis wrote about it on the Tow-Knight blog, and the Nieman Lab even covered it. Part of the paper developed a structured brainstorming activity called &#8220;Aha!&#8221; to help students and news industry professionals in thinking more about ways to combine <a href="http://www.nickdiakopoulos.com/2013/01/19/aha-brainstorming-app/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In April 2012 I published a whitepaper on <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/04/diakopoulos_whitepaper_systematicinnovation.pdf">Cultivating Innovation in Computational Journalism</a> with the CUNY <a href="http://towknight.org/">Tow-Knight Center</a> for Entrepreneurial Journalism. Jeff Jarvis wrote about it on the <a href="http://towknight.org/research/newopps/">Tow-Knight blog</a>, and the <a href="http://www.niemanlab.org/2012/04/a-new-framework-for-innovation-in-journalism-how-a-computer-scientist-would-do-it/">Nieman Lab even covered it</a>.</p>
<p>Part of the paper developed a structured brainstorming activity called &#8220;Aha!&#8221; to help students and news industry professionals in thinking more about ways to combine ideas from technology, information science, user needs, and journalistic goals into useful new news products and services. We produced a printed deck of cards with different concepts that people could re-combine, and you can still <a href="https://docs.google.com/spreadsheet/viewform?formkey=dHFHV2lkem1MYV9RM0x4T2Y0Q1FqZHc6MQ#gid=0">get these cards</a> from CUNY.</p>
<p>But really the Aha! Brainstorming activity was begging to be made into an app, which is <a href="https://itunes.apple.com/us/app/aha-brainstorming/id592869307#">now available on the Apple App Store</a>. The app has the advantages that you can augment the re-combinable concepts, you can audio record your brainstorming sessions, take and store photos of any notes you scribble down about your ideas, and share the whole thing via email with your colleagues. If you have an iDevice be sure to <a href="https://itunes.apple.com/us/app/aha-brainstorming/id592869307#">check it out</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2013/01/19/aha-brainstorming-app/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding bias in computational news media</title>
		<link>http://www.nickdiakopoulos.com/2012/12/10/understanding-bias-in-computational-news-media/</link>
		<comments>http://www.nickdiakopoulos.com/2012/12/10/understanding-bias-in-computational-news-media/#comments</comments>
		<pubDate>Tue, 11 Dec 2012 02:17:26 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=859</guid>
		<description><![CDATA[Just a quick pointer to an article I wrote for Nieman Lab exploring some of the ways in which algorithms serve to introduce bias into news media. Different kind of writing than my typical academic-ese, but fun.]]></description>
				<content:encoded><![CDATA[<p>Just a quick <a href="http://www.niemanlab.org/2012/12/nick-diakopoulos-understanding-bias-in-computational-news-media/">pointer to an article</a> I wrote for Nieman Lab exploring some of the ways in which algorithms serve to introduce bias into news media. Different kind of writing than my typical academic-ese, but fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/12/10/understanding-bias-in-computational-news-media/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mobile Gaming Summit 2012</title>
		<link>http://www.nickdiakopoulos.com/2012/10/19/mobile-gaming-summit-2012/</link>
		<comments>http://www.nickdiakopoulos.com/2012/10/19/mobile-gaming-summit-2012/#comments</comments>
		<pubDate>Fri, 19 Oct 2012 23:31:01 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[games]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=844</guid>
		<description><![CDATA[I have recently been getting more into mobile design and development and so was excited to attend the Mobile Gaming Summit in New York today. It was a well attended event, with what seemed like dozens of presenters from top mobile studios sharing tips on everything from user acquisition to design, mobile analytics, cross-platform development, <a href="http://www.nickdiakopoulos.com/2012/10/19/mobile-gaming-summit-2012/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I have recently been getting more into mobile design and development and so was excited to attend the <a href="http://alumni.mogasummit.com/conference-seminar-agenda/">Mobile Gaming Summit</a> in New York today. It was a well-attended event, with what seemed like dozens of presenters from top mobile studios sharing tips on everything from user acquisition to design, mobile analytics, cross-platform development, finance, and social. What I wanted to share here quickly were some of the resources that were mentioned at the summit because I think they would be useful to any mobile studio / developer who&#8217;s just starting out (noobs like me!). So, by topic, here are some services to check out:</p>
<ul>
<li><strong>Ad Platforms for user acquisition</strong>
<ul>
<li><a href="http://www.chartboost.com/web">ChartBoost</a> (geared towards app cross-promotions)</li>
<li><a href="http://www.millennialmedia.com/">Millennial Media</a> (banner ads)</li>
<li><a href="http://www.google.com/ads/admob/">AdMob</a> (Google&#8217;s mobile ad platform)</li>
<li>Facebook&#8217;s new <a href="https://developers.facebook.com/docs/tutorials/mobile-app-ads/">ads for apps</a></li>
<li><a href="http://www.tapjoy.com/">Tapjoy</a> (another mobile ad platform)</li>
</ul>
</li>
<li><strong>Analytics</strong>
<ul>
<li><a href="http://www.flurry.com/index.html">Flurry</a> (free analytics platform to help you understand how users are using your app)</li>
<li><a href="http://beesandpollen.com/Default.aspx">Bees and Pollen</a> (analytics to help optimize the user experience based on the user)</li>
<li><a href="http://apsalar.com/">Apsalar</a></li>
</ul>
</li>
<li><strong>Cross-Platform Technologies</strong>
<ul>
<li><a href="http://www.coronalabs.com/">Corona</a> (uses a language called Lua that I&#8217;ve never heard of)</li>
<li><a href="http://www.madewithmarmalade.com/">Marmalade</a> (program in C++, deploy to iOS, Android, Xbox, etc.)</li>
<li><a href="http://phonegap.com/">PhoneGap</a> (program in JavaScript, HTML, CSS)</li>
<li><a href="http://unity3d.com/unity/">Unity</a> (geared toward 3D games)</li>
</ul>
</li>
</ul>
<p>In general I was impressed with the amount of data-driven design going on in the mobile apps / games space and how the big studios are really optimizing for attention, retention, and monetization by constantly tweaking things.</p>
<p>Other tips that were shared included things like: use Canada as a test market to work out kinks in your apps before you launch in the larger U.S. market; concentrate marketing efforts / budget in a short period of time to attain the highest rank in the app store as this drives more organic growth; the industry is heavily moving towards a free-to-play model with monetization done with in-app purchases or advertising.</p>
<p>In the next few weeks I&#8217;ll be excited to try out some of these services with my new app, <strong><a href="http://bit.ly/SmfzAP">Many Faces</a></strong>, which launched a couple weeks ago. I think it&#8217;s all about the user-acquisition / marketing at this point &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/10/19/mobile-gaming-summit-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comment Readers Want Relevance!</title>
		<link>http://www.nickdiakopoulos.com/2012/10/09/comment-readers-want-relevance/</link>
		<comments>http://www.nickdiakopoulos.com/2012/10/09/comment-readers-want-relevance/#comments</comments>
		<pubDate>Tue, 09 Oct 2012 15:12:34 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[commenting]]></category>
		<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=824</guid>
		<description><![CDATA[A couple years ago now I wrote a paper about the quality of comments on online news stories. For the paper I surveyed a number of commenters on sacbee.com about their commenting experience on that site. One of the aspects of the experience that users complained about was that comments were often off-topic: that comments weren&#8217;t germane, <a href="http://www.nickdiakopoulos.com/2012/10/09/comment-readers-want-relevance/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;">A couple years ago now I wrote a <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2007/05/pr220-diakopoulos.pdf">paper</a> about the quality of comments on online news stories. For the paper I surveyed a number of commenters on <a href="http://www.sacbee.com/">sacbee.com</a> about their commenting experience on that site. One of the aspects of the experience that users complained about was that comments were often <strong>off-topic</strong>: that comments weren&#8217;t germane, or relevant, to the conversation or to the article to which they were attached. This isn&#8217;t surprising, right? If you&#8217;ve ever read into an online comment thread you know there are a lot of irrelevant things that people are posting.</p>
<p style="text-align: left;">It stands to reason then that if we can make news comments more <strong>relevant</strong> then people might come away more satisfied from the online commenting experience; that they might be more apt to read and find and learn new things if the signal-to-noise ratio was a bit higher. The point of my post here is to show you that there&#8217;s a straightforward and easy-to-implement way to provide this relevance that coincides with both users&#8217; and editors&#8217; notions of &#8220;quality comments&#8221;.</p>
<p style="text-align: left;">I collected data in July via the New York Times API, including 370 articles and 76,086 comments oriented around the topic of climate change. More specifically I searched for articles containing the phrase &#8220;climate change&#8221; and then collected all articles which had comments (since not all NYT articles have comments). For each comment I also had a number of pieces of metadata, including: (1) the number of times the comment was &#8220;recommended&#8221; by someone upvoting it, and (2) whether the comment was an &#8220;editor&#8217;s selection&#8221;. Both of these ratings indicate &#8220;quality&#8221;; one from the users&#8217; point of view and the other from the editors&#8217;. And <strong>both of these ratings in fact correlate with a simple measure of relevance</strong> as I&#8217;ll describe next.</p>
<p style="text-align: left;">In the dataset I collected I also had the full text of both the comments and the articles. Using some basic IR ninjitsu I then normalized the text, stop-worded it (using <a href="http://nltk.org/">NLTK</a>), and stemmed the words using the <a href="http://en.wikipedia.org/wiki/Stemming">Porter stemming</a> algorithm. This leaves us with cleaner, less noisy text to work with. I then computed relevance between each comment and its parent article by taking the cosine similarity (the normalized dot product) of unigram feature vectors of tf-idf scores. For the sake of the tf-idf scores, each comment was considered a document, and only unigrams that occurred at least 10 times in the dataset were considered in the feature vectors (again to reduce noise). The outcome of this process is that for each comment-article pair I now had a score (between 0 and 1) representing similarity in the words used in the comment and those used in the article. So a score of 1 would indicate that the comment and article were using identical vocabulary whereas a score of 0 would indicate that the comment and article used no words in common.</p>
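<p style="text-align: left;">To make the scoring step concrete, here is a minimal sketch in plain Python. It is not the code I ran: it assumes the text has already been normalized, stop-worded, and stemmed into token lists, it uses a plain log(N/df) idf (one of several common variants), and it skips the minimum-count filter. The function names and the toy tokens are made up for illustration.</p>

```python
import math
from collections import Counter


def idf_from_comments(comments):
    """Inverse document frequency, treating each comment as a document."""
    n_docs = len(comments)
    doc_freq = Counter(term for tokens in comments for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse {term: tf * idf} vector; terms never seen in a comment are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine_similarity(a, b):
    """Dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, already-stemmed tokens (hypothetical, not from the NYT dataset).
article = ["climat", "chang", "warm", "ocean"]
comments = [
    ["climat", "chang", "polici"],
    ["tax", "polit", "rant"],
    ["ocean", "warm", "climat"],
]

idf = idf_from_comments(comments)
article_vec = tfidf_vector(article, idf)
scores = [cosine_similarity(tfidf_vector(c, idf), article_vec) for c in comments]
```

<p style="text-align: left;">Because tf-idf weights are never negative, the score stays between 0 and 1. The second toy comment shares no terms with the article, so it scores exactly 0.</p>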
<p style="text-align: left;">So, what&#8217;s interesting is that this simple-to-compute metric for relevance is highly correlated to the recommendation score and editor&#8217;s selection ratings mentioned above. The following graph shows the average comment to article similarity score over each recommendation score up to 50 (red dots), and a moving average trend line (blue).</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/avg-sim-to-article-graph.png"><img class="aligncenter size-full wp-image-830" title="avg sim to article graph" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/avg-sim-to-article-graph.png" alt="" width="595" height="400" /></a></p>
<p style="text-align: left;">As you get into the higher recommendation scores there&#8217;s more variance because it&#8217;s averaging fewer values. But you can see a clear trend that as the number of recommendation ratings increases so too does the average comment to article similarity. In statistical terms, Pearson&#8217;s correlation is r=0.58 (p &lt; .001). There&#8217;s actually a fair amount of variance around each of those means though, and the next graph shows the distribution of similarity values for each recommendation score. If you turn your head sideways each column is a histogram of the similarity values.</p>
<p style="text-align: center;"><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/similarity-distribution.png"><img class="aligncenter  wp-image-832" title="similarity distribution" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/similarity-distribution.png" alt="" width="613" height="311" /></a></p>
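<p style="text-align: left;">For reference, Pearson&#8217;s r between recommendation counts and similarity scores can be computed with a few lines of plain Python. This is only a sketch of the textbook formula; the function name is my own, and it does not reproduce the exact pairing of values behind the r=0.58 reported above.</p>

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)
```

<p style="text-align: left;">Calling <code>pearson_r(recommendations, similarities)</code> on two parallel lists returns a value between -1 and 1; a significance test for r would need a separate step.</p>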
<p style="text-align: left;">We can also look at the relationship between comment to article similarity in terms of editors&#8217; selections, certain comments that have been elevated in the user interface by editors. The average similarity for comments that are not editors&#8217; selections is 0.091 (N=73,723) whereas for comments that are editors&#8217; selections the average is 0.118 (N=2,363). A t-test between these distributions indicates that the difference in means is statistically significant (p &lt; .0001). So what we learn from this is that <strong>editors&#8217; criteria for selecting comments also correlate with the similarity in language used between the comment and article</strong>.</p>
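<p style="text-align: left;">As a rough illustration of that comparison, the sketch below computes a t statistic for two groups of similarity scores. I&#8217;m assuming Welch&#8217;s unequal-variance form here, which is a reasonable default when the groups differ this much in size, but the post doesn&#8217;t depend on that choice. The function name is hypothetical.</p>

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard error of each sample mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

<p style="text-align: left;">Turning t and the degrees of freedom into a p-value requires the t distribution; if SciPy is available, <code>scipy.stats.ttest_ind(a, b, equal_var=False)</code> performs the same test and reports the p-value directly.</p>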
<p style="text-align: left;">The implications of these findings are relatively straightforward. A simple metric of similarity (or relevance) correlates well to notions of &#8220;recommendation&#8221; and editorial selection. This metric could be surfaced in a commenting system user interface to allow users to rank comments based on how similar they are to an article, without having to wait for recommendation scores or editorial selections. In the future I&#8217;d like to look into ways to assess how predictive such metrics are in terms of recommendation scores, as well as try out different metrics of similarity, like <a href="http://staff.science.uva.nl/~tsagias/?p=185">KL divergence</a>.</p>
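<p style="text-align: left;">As a rough sketch of how such a ranking might work, the Python below scores each comment against the article using cosine similarity over raw word counts and sorts by that score. The metric and term weighting used in the analysis are described earlier in the post; the tokenizer, the function names, and the sample text here are illustrative assumptions, not the code behind the numbers above.</p>

```python
import re
from collections import Counter
from math import sqrt


def word_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_comments(article, comments):
    """Return (score, comment) pairs, most similar to the article first."""
    article_counts = word_counts(article)
    scored = [(cosine_similarity(article_counts, word_counts(c)), c)
              for c in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up text, for illustration only.
article = "The city council voted to raise the transit fare next year."
comments = [
    "Great, another thing to complain about.",
    "Raising the transit fare will hurt riders; the council should reconsider.",
]
ranked = rank_comments(article, comments)
```

<p style="text-align: left;">Because the score depends only on the article and the comment text, it is available the moment a comment is posted, which is the appeal compared with waiting for votes to accumulate.</p>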
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/10/09/comment-readers-want-relevance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Many Faces Photo Collages</title>
		<link>http://www.nickdiakopoulos.com/2012/10/02/many-faces-photo-collages/</link>
		<comments>http://www.nickdiakopoulos.com/2012/10/02/many-faces-photo-collages/#comments</comments>
		<pubDate>Tue, 02 Oct 2012 22:09:00 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[app]]></category>
		<category><![CDATA[collage]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=812</guid>
		<description><![CDATA[I’ve been interested in photo collages for years. Those who know me well have likely seen my Many Faces from a few years ago (pictured above), which was inspired by some improv classes I was taking at the time. It was fun to put together, but also very time-consuming. A couple months ago I realized it <a href="http://www.nickdiakopoulos.com/2012/10/02/many-faces-photo-collages/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/many-faces-of-nick.jpeg"><img class="aligncenter  wp-image-813" title="many faces of nick" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/10/many-faces-of-nick.jpeg" alt="" width="461" height="614" /></a></p>
<p>I’ve been interested in photo collages for years. Those who know me well have likely seen my Many Faces from a few years ago (pictured above), which was inspired by some improv classes I was taking at the time. It was fun to put together, but also very time-consuming. A couple months ago I realized it would be fun to turn the concept into an app that could help quickly and easily make ManyFace-esque collages. I’m happy to say that the app has <a href="http://bit.ly/SmfzAP">launched in the app store today</a>. For a bit more info on the app you can also visit <a href="http://www.manyfacesapp.com/">the website</a>. Please check it out, and if you like it, share your ManyFaces on <a href="http://twitter.com/manyfacesapp">twitter</a> or <a href="http://www.facebook.com/manyfacesapp">facebook</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/10/02/many-faces-photo-collages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Review: The Functional Art</title>
		<link>http://www.nickdiakopoulos.com/2012/09/30/review-the-functional-art/</link>
		<comments>http://www.nickdiakopoulos.com/2012/09/30/review-the-functional-art/#comments</comments>
		<pubDate>Sun, 30 Sep 2012 19:05:27 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[journalism]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=801</guid>
		<description><![CDATA[I don&#8217;t often write reviews of books. But I can&#8217;t resist offering some thoughts on The Functional Art, a new book by Alberto Cairo aimed at teaching the basics of information graphics and visualization, mostly because I think it&#8217;s fantastic, but also because I think there are a few areas where I&#8217;d like to see <a href="http://www.nickdiakopoulos.com/2012/09/30/review-the-functional-art/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I don&#8217;t often write reviews of books. But I can&#8217;t resist offering some thoughts on <a href="http://www.amazon.com/Functional-Art-introduction-information-visualization/dp/0321834739">The Functional Art</a>, a new book by <a href="http://www.visualopolis.com/">Alberto Cairo</a> aimed at teaching the basics of information graphics and visualization, mostly because I think it&#8217;s fantastic, but also because I think there are a few areas where I&#8217;d like to see a future edition expound.</p>
<p>Basically I see this as the <strong>new</strong> <strong>default book for teaching journalists how to do infographics and visualization. </strong>If you&#8217;re a student of journalism, or just interested in developing better visual communication skills I think this book has a ton to offer and is very accessible. But what&#8217;s really amazing is that the book also offers a lot to people <em>already</em> in the field (e.g. designers or computer scientists) who want to learn more about the journalistic perspective on visual storytelling. There are nuggets of wisdom sprinkled throughout the book, informed by Cairo&#8217;s years of journalism experience. And the diagrams and models of thinking about things like the designer-user relationships or dimensions along which graphics vary add some much needed structure that forms a framework for thinking about and characterizing information graphics.</p>
<p>Probably the most interesting aspect of the book for someone already doing or studying visualization is the last set of chapters which detail, through a series of interviews with practitioners, how &#8220;the sausage is made.&#8221; Exposing <em>process</em> in this way is extremely valuable for learning how these things get put together. This exposition continues on the included DVD in which additional production artifacts, sketches, and mockups form a show-and-tell. And it&#8217;s not just about artifacts; the interviews also explore things like how teams are composed in order to facilitate collaborative production.</p>
<p>One of the things I appreciated most about the book is that, in light of its predominant focus on practice, Cairo fearlessly reads into and then translates research results into practical advice, offering an evidence-based rationale for design decisions. We need more of that kind of thinking, for all sorts of practices.</p>
<p>I have only a few critiques of the book. The first is straightforward: I wish that the book were printed in a larger format because some of the examples shown in the book are screaming for more breathing space. I would have also liked to see the computer science perspective represented a bit more thoroughly in the book &#8211; this can for instance serve to enhance and add depth to the discussion about interactivity with visualizations. My only other critique of the book is about critique itself. What I mean is that the idea of critique is sprinkled throughout the book, but I&#8217;d almost like to see it elevated to the status of having its own chapter. Learning the skills of critique and the thought process involved is an essential aspect of learning to be a graphics communication intellectual and thoughtful practitioner. And it can and should be taught in a way that students learn a systematic way of thinking about and analyzing benefits and tradeoffs. Cairo has the raw material to do this in the book, but I wish it were formalized in some way that lent it the attention it deserves. Such a method could even be illustrated using some of the interviewees&#8217; many examples.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/09/30/review-the-functional-art/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Does Local Journalism Need to Be Locally Sustainable?</title>
		<link>http://www.nickdiakopoulos.com/2012/09/12/does-local-journalism-need-to-be-locally-sustainable/</link>
		<comments>http://www.nickdiakopoulos.com/2012/09/12/does-local-journalism-need-to-be-locally-sustainable/#comments</comments>
		<pubDate>Thu, 13 Sep 2012 03:50:31 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[business models]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=755</guid>
		<description><![CDATA[The last couple of weeks have seen the rallying cries of journalists echo online as they call for support of the Homicide Watch Kickstarter campaign. The tweets &#8220;hit the fan&#8221; so to speak, Clay Shirky implored us to not let the project die, and David Carr may have finally tipped the campaign with his editorial questioning foundations&#8217; <a href="http://www.nickdiakopoulos.com/2012/09/12/does-local-journalism-need-to-be-locally-sustainable/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The last couple of weeks have seen the rallying cries of journalists echo online as they call for support of the <a href="http://homicidewatch.org/">Homicide Watch</a> <a href="http://www.kickstarter.com/projects/1368665357/a-one-year-student-reporting-lab-within-homicide-w">Kickstarter campaign</a>. The tweets &#8220;hit the fan&#8221; so to speak, Clay Shirky <a href="http://www.shirky.com/weblog/2012/09/save-homicide-watch/">implored</a> us to not let the project die, and David Carr may have finally tipped the campaign with his <a href="http://www.nytimes.com/2012/09/10/business/media/homicide-watch-web-site-venture-struggles-to-survive.html?_r=2&amp;pagewanted=all">editorial</a> questioning foundations&#8217; support for Big News at the expense of funding more nimble start-ups like Homicide Watch.</p>
<p>It seems like a good idea too &#8211; providing more coverage of a civically important issue &#8211; and one that&#8217;s underserved to boot. <strong>But</strong> <strong>is it sustainable?</strong> As Jeff Sonderman at Poynter <a href="http://www.poynter.org/latest-news/mediawire/187881/homicide-watch-will-live-on-as-fundraising-campaign-reaches-goal/">wrote</a> about the successful Kickstarter campaign, &#8220;The $40,000 is not a sustainable endowment, just a stopgap to fund intern staffing for one year.&#8221;</p>
<p>For Homicide Watch to be successful at franchising to other cities (i.e. by selling a platform) each of those franchises itself needs to be sustained. This implies that, on a local level, either enough advertising buy-in, local media support, or crowdfunding (a la Kickstarter) would need to be generated to pay those pesky labor costs, the most expensive cost in most content businesses.</p>
<p>Here&#8217;s the thing. Even though Homicide Watch was funded, it struggled to get there, mostly surviving on the good-natured altruism of the media elite. I doubt that local franchises will be able to repeat that trick. Here&#8217;s why: <strong>most of the donors who gave to Homicide Watch were from elsewhere in the U.S.</strong> <strong>(68%)</strong> or from other countries (10%). Only 22% of donors were from DC, Virginia, or Maryland (see below for details on where the numbers come from). This means that people local to Washington, DC, those who ostensibly would have the most to gain from a project like this, barely made up more than a fifth of the donors. Other local franchises probably couldn&#8217;t count on the kind of national attention that the media elite brought to the Homicide Watch funding campaign, nor could they count on the national interest afforded to the nation&#8217;s capital.</p>
<p>You might argue that for something like this to flourish it needs local support, from the people who would get the real utility of the innovation. At least Homicide Watch got a chance to prove itself out, but we&#8217;ll have to wait to see if it can make a sustainable business and provide real information utility at a local level. The numbers at this stage would seem to suggest it&#8217;s got an uphill battle ahead of it.</p>
<p><strong>Stats</strong><br />
Here&#8217;s how I got the stats I quoted above. I made a <a href="https://scraperwiki.com/scrapers/homicide_watch/">ScraperWiki script</a> to collect all of the donors on the Homicide Watch Kickstarter page (there were 1,102 as of about noon on 9/12). Of those 1,102, 270 donors had geographic information (city, state, country). The stats quoted above are based on those 270 geotagged donors. Of course, that&#8217;s only about 25% of the total donors, so an assumption that I make above is that the 75%, the non-geotagged donors, follow a similar geographic distribution (and donation magnitude distribution) as the geotagged ones. I can&#8217;t think of a reason that assumption might not be true. For kicks I put the data up on Google Fusion Tables (it&#8217;s so awful, please, someone fix that!) so <a target="blank" href="https://www.google.com/fusiontables/embedviz?viz=GVIZ&amp;t=MAP&amp;gco_region=US&amp;gco_dataMode=regions&amp;containerId=gviz_canvas&amp;q=select+gvizregion(col0)%2C+col1%2C+col0+from+1vW-UkVhLSOPnODauWNPSI7nJvlWe7dSnLCwLqRM&amp;qrs=+where+gvizregion(col0)+%3E%3D+&amp;qre=+and+gvizregion(col0)+%3C%3D+&amp;qe=+limit+32&amp;width=500&amp;height=300">here&#8217;s a map of what states donors come from</a>.</p>
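<p>The bucketing behind those percentages can be sketched in a few lines of Python. This is not the ScraperWiki script itself: the record layout (a state abbreviation plus a country code), the function names, and the sample donors are all assumptions made for illustration. Only the 270 and 1,102 figures come from the numbers above.</p>

```python
from collections import Counter

# States counted as "local" in the post: DC, Virginia, Maryland.
LOCAL_STATES = {"DC", "VA", "MD"}


def classify_donor(state, country):
    """Bucket a geotagged donor as local, elsewhere in the U.S., or abroad."""
    if country != "US":
        return "other countries"
    return "local" if state in LOCAL_STATES else "elsewhere in U.S."


def donor_shares(donors):
    """Return each bucket's share of donors as a percentage."""
    buckets = Counter(classify_donor(state, country) for state, country in donors)
    total = sum(buckets.values())
    return {bucket: 100 * count / total for bucket, count in buckets.items()}


# Hypothetical (state, country) records, not the scraped data.
sample_donors = [("DC", "US"), ("NY", "US"), ("CA", "US"), ("MD", "US"),
                 ("", "UK"), ("TX", "US"), ("IL", "US"), ("WA", "US")]
shares = donor_shares(sample_donors)

# Share of all donors that had geographic information at all.
geotagged_share = 100 * 270 / 1102
```

<p>The last line is the source of the &#8220;about 25%&#8221; figure: 270 of 1,102 donors is roughly 24.5%.</p>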
<p><a target="blank" href="https://www.google.com/fusiontables/embedviz?viz=GVIZ&amp;t=MAP&amp;gco_region=US&amp;gco_dataMode=regions&amp;containerId=gviz_canvas&amp;q=select+gvizregion(col0)%2C+col1%2C+col0+from+1vW-UkVhLSOPnODauWNPSI7nJvlWe7dSnLCwLqRM&amp;qrs=+where+gvizregion(col0)+%3E%3D+&amp;qre=+and+gvizregion(col0)+%3C%3D+&amp;qe=+limit+32&amp;width=500&amp;height=300"><img class="aligncenter size-full wp-image-792" title="homicidewatchdonors" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/09/homicidewatchdonors.png" alt="" width="552" height="345" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/09/12/does-local-journalism-need-to-be-locally-sustainable/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>User Expectations, Prototypes, and Professionals</title>
		<link>http://www.nickdiakopoulos.com/2012/06/28/user-expectations-prototypes-and-professionals/</link>
		<comments>http://www.nickdiakopoulos.com/2012/06/28/user-expectations-prototypes-and-professionals/#comments</comments>
		<pubDate>Thu, 28 Jun 2012 16:12:41 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[organizational structure]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=735</guid>
		<description><![CDATA[Back in the 1980s a Japanese professor by the name of Noriaki Kano developed a model of product development and what we today might call user experience. This model is now known as the Kano model after the eponymous professor. I’m not going to go into details on the model but one of the interesting <a href="http://www.nickdiakopoulos.com/2012/06/28/user-expectations-prototypes-and-professionals/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Back in the 1980s a Japanese professor by the name of Noriaki Kano developed a model of product development and what we today might call user experience. This model is now known as the <a href="http://en.wikipedia.org/wiki/Kano_model">Kano model</a> after the eponymous professor. I’m not going to go into details on the model but one of the interesting aspects that it describes is the notion of “<strong>threshold attributes</strong>” which are basically the “must-have” features that a product needs to incorporate in order to meet customer needs. If a product hits the market and doesn’t have these features then users will not adopt the product because it provides a lousy or dissatisfying experience. Moreover, the model predicts that over time, more exciting and newer features become expected in subsequent generations of products.</p>
<p>I suspect that many of the high-profile usable products on the market (e.g. from the likes of Apple and Google) are raising the bar for what users expect in a new product. Can you imagine going from a retina display back to a low-res screen? And isn’t it annoying, just a little bit, when search suggestions don’t populate as you type? I was using <a href="http://www-958.ibm.com/software/data/cognos/manyeyes/">Many Eyes</a> earlier this week to do some visualization and was frustrated that the conventions and technologies with which that UI was built are now about 5 or 6 years old.</p>
<p>Essentially the bar is always going up in terms of <a href="http://www.usereffect.com/topic/expectations-and-usability-introduction">what users expect</a>. These features in turn put pressure on any prototype builder (including those at start-ups, or in academia) to meet those expectations. Basic usability (e.g. undo/redo, error handling, clear labeling), an aesthetic design, and solid information architecture and navigation seem to be givens. And Google has trained us all to expect low latency in a web-app, which can demand a lot of engineering and systems-building time. According to Kano, if a product doesn’t have these kinds of things we’re bound to notice and have a less awesome user experience.</p>
<p>If you assume that threshold attributes are indeed important (and a moving target), what does this mean in terms of getting prototypes and products built? I’ll address the academic space since that’s the realm I’m most familiar with. I think it’s particularly hard to provision threshold attributes in an academic setting because of (1) limited human resources (i.e. possibly just 1 graduate student for a year), (2) limited student expertise (i.e. in visual or interaction design or user research &#8211; they’re there to learn!), and (3) different incentives (i.e. novelty tends to get the emphasis which makes building new features more important than implementing threshold attributes to support a baseline user experience).</p>
<p>For a certain type of research I think it makes sense to organize the work such that a graduate student and faculty work more closely with professionals such as programmers or interaction designers and UX researchers. The extra polish and experience that professionals bring to the table would enable a prototype to reach those threshold attributes, while allowing researchers to focus on identifying / implementing new features to evaluate, or deciding on what data to gather and analyze during a deployment so that new knowledge is produced. You could imagine interaction designers or UX researchers being part-time on a number of projects, essentially becoming internal consultants on university projects. The types of prototypes produced might also have the added advantage of being further along the path to products and less &#8220;<a href="http://matt-welsh.blogspot.com/2012/06/startup-university.html">throw-away</a>&#8221; thus helping the university-incubator. I think if universities really want to incubate prototypes and see those prototypes turning into products they need to reorganize work to include more professionals in the process. These professionals should not be paid for out of raw research budgets, but from university overhead.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/06/28/user-expectations-prototypes-and-professionals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Systems Papers at CHI &#8211; Some Data</title>
		<link>http://www.nickdiakopoulos.com/2012/06/27/systems-papers-at-chi-some-data/</link>
		<comments>http://www.nickdiakopoulos.com/2012/06/27/systems-papers-at-chi-some-data/#comments</comments>
		<pubDate>Wed, 27 Jun 2012 16:11:11 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[CHI]]></category>
		<category><![CDATA[conference]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=723</guid>
		<description><![CDATA[Back in 2009 James Landay wrote a thoughtful piece on some of the challenges associated with publishing systems research at a venue like CHI (or UIST). He concluded that the incentive structure just isn&#8217;t there to support the greater degree of time and effort needed to build and evaluate systems, especially when compared to other <a href="http://www.nickdiakopoulos.com/2012/06/27/systems-papers-at-chi-some-data/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Back in 2009 James Landay wrote a <a href="http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html">thoughtful piece</a> on some of the challenges associated with publishing systems research at a venue like CHI (or UIST). He concluded that the incentive structure just isn&#8217;t there to support the greater degree of time and effort needed to build and evaluate systems, especially when compared to other types of research which require less time but still get you the line-item on the CV.</p>
<p>I wanted to try to back up some of this thinking with data, so I wrote a <a href="https://scraperwiki.com/">ScraperWiki</a> script to go out and harvest a corpus of previous CHI proceedings (you can edit the script or access the data I collected <a href="https://scraperwiki.com/scrapers/chi_abstracts/">here</a>). I scraped all paper titles, authors, and abstracts going back to 1999 (the ACM DL changes its page format before then which is why I didn&#8217;t go back further). The dataset ended up being 2,498 papers over 14 years (1999-2012).</p>
<p>For the sake of the rest of the analysis I define &#8220;systems papers&#8221; as the subset of papers with an abstract that uses the word &#8220;system&#8221;. I know it&#8217;s not perfect (most likely some false positives in there), but it&#8217;s a reasonable proxy and I didn&#8217;t have time to go through all 2.5k papers by hand.</p>
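<p>In code, that proxy is about as simple as a classifier gets. Here is a minimal Python sketch; whether the original script matched whole words, plurals, or any substring is not stated, so the whole-word, singular-or-plural, case-insensitive match below is an assumption, and the sample abstracts are invented.</p>

```python
import re

# Matches "system" or "systems" as a whole word, ignoring case.
SYSTEM_WORD = re.compile(r"\bsystems?\b", re.IGNORECASE)


def is_systems_paper(abstract):
    """Return True if the abstract uses the word 'system' (or 'systems')."""
    return SYSTEM_WORD.search(abstract) is not None


# Invented abstracts, for illustration only.
examples = [
    "We present a system for collaborative sketching on tabletops.",
    "A systematic review of evaluation methods in HCI.",
    "Two recommender systems were compared in a field study.",
]
flags = [is_systems_paper(text) for text in examples]
```

<p>The second example shows why word boundaries matter: a plain substring search would count &#8220;systematic&#8221; as a hit and add to the false positives.</p>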
<p>One question we might ask is: <strong>Do systems papers really require more effort than other papers at CHI?</strong> If they take too much effort, a rational researcher might choose to spend time on other types of contributions. In the following graph we can see that, in the last 5 years, systems papers have indeed averaged more authors per paper than other papers at CHI (and an assumption is that more authors implies more overall work, though this of course doesn&#8217;t always hold). There have also been years in the past when non-systems papers have had more authors on average (e.g. 2001 or 2002). Overall the number of authors for systems papers over the period (M=3.61, SD=0.37) is slightly higher than that for non-systems papers (M=3.43, SD=0.21), and the standard deviation is also a bit higher indicating there is more variance in the number of authors of systems papers. The difference in means isn&#8217;t statistically significant (p=.15). So it seems there is some (weak) evidence that systems papers do have more authors on average.</p>
<p><img class="aligncenter  wp-image-725" title="System Authors" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/06/System-Authors.png" alt="" width="586" height="342" /></p>
<p style="text-align: left;">Another question we might ask is: <strong>Is the relative amount of systems work published at CHI declining?</strong> To see this we can look at the graph below which shows the fraction of systems papers out of the total for each year. The average fraction of systems papers over the time period (1999-2012) is 0.36 (SD = 0.07). There&#8217;s a fair bit of variance with a low in 2007 and a high in 2003. In the last couple years the fraction of systems papers has been a tad below the mean, but still within one standard deviation. There&#8217;s no correlation between fraction and year. From this I think we can conclude that there&#8217;s no clear trend in fraction of systems papers being published at CHI. Moreover, the absolute number of systems papers has gone from 15 in 1999 to 60 in 2012, indicating fair growth in this segment of CHI papers. (It would be really interesting to analyze abstracts from all papers both accepted and rejected to see if there is a bias).</p>
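<p style="text-align: left;">For the curious, the per-year fraction and the check for a trend over time could be computed along these lines. This is a minimal Python sketch under assumptions: the (year, flag) record layout and the function names are hypothetical, and the sample records are invented rather than drawn from the scraped corpus.</p>

```python
from collections import defaultdict
from math import sqrt


def fraction_by_year(papers):
    """Map year -> fraction of that year's papers flagged as systems papers."""
    totals = defaultdict(int)
    systems = defaultdict(int)
    for year, is_systems in papers:
        totals[year] += 1
        systems[year] += int(is_systems)
    return {year: systems[year] / totals[year] for year in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (year, is_systems_paper) records, not the scraped corpus.
papers = [(2010, True), (2010, False), (2010, False), (2010, False),
          (2011, True), (2011, False),
          (2012, True), (2012, False), (2012, False), (2012, False)]
fractions = fraction_by_year(papers)
years = list(fractions)
trend = pearson_r(years, [fractions[y] for y in years])
```

<p style="text-align: left;">A correlation near zero between year and fraction, as in this toy data, is what &#8220;no clear trend&#8221; looks like numerically, even when the absolute counts are growing.</p>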
<p style="text-align: left;"><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/06/fraction-of-systems1.png"><img class="aligncenter  wp-image-729" title="fraction of systems" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/06/fraction-of-systems1-1024x591.png" alt="" width="558" height="321" /></a></p>
<p style="text-align: left;">While the cost of doing systems work in HCI may be higher (i.e. more co-authors needed), the fraction of systems work at CHI doesn&#8217;t seem to have been substantially affected over the course of the last 14 years. But it&#8217;s still easy to feel like all the action is happening in industry: new products are constantly hitting the market and start-ups and entrepreneurship are heavily covered by the tech press. The reality is that systems publishing is trucking along and also growing, but, I think, over time will represent a smaller and smaller fraction of the pie as prototyping becomes &#8220;mainstream&#8221; and knowledge of HCI continues to diffuse. That may be ok, as long as the research prototypes produced by the academy are sufficiently differentiated from what&#8217;s available and possible in the market.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/06/27/systems-papers-at-chi-some-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fact-Checking at Scale</title>
		<link>http://www.nickdiakopoulos.com/2012/06/12/fact-checking-at-scale/</link>
		<comments>http://www.nickdiakopoulos.com/2012/06/12/fact-checking-at-scale/#comments</comments>
		<pubDate>Tue, 12 Jun 2012 06:55:25 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[fact-checking]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[information quality]]></category>
		<category><![CDATA[journalism]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=717</guid>
		<description><![CDATA[Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site.  Over the last decade there’s been a substantial growth in the use of Fact-Checking to correct misinformation in the public sphere. Outlets like Factcheck.org and Politifact tirelessly research and assess the accuracy of all kinds of information and statements from politicians or think-tanks. But <a href="http://www.nickdiakopoulos.com/2012/06/12/fact-checking-at-scale/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p><em>Note: this is <a href="http://towknight.org/research/thinking/scaling-fact-checking/">cross-posted</a> on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. </em></p>
<p>Over the last decade there’s been a <a href="http://newamerica.net/sites/newamerica.net/files/policydocs/The_Fact-checking_Universe_in_2012.pdf">substantial growth in the use</a> of Fact-Checking to correct misinformation in the public sphere. Outlets like <a href="http://www.factcheck.org/">Factcheck.org</a> and <a href="http://www.politifact.com/">Politifact</a> tirelessly research and assess the accuracy of all kinds of information and statements from politicians or think-tanks. But a casual perusal of these sites shows that there are usually only 1 or 2 fact-checks per day from any given outlet. Fact-Checking is an intensive research process that demands considerable skilled labor and careful consideration of potentially conflicting evidence. <strong>In a task that’s so labor intensive, how can we scale it so that the truth is spread far and wide?</strong></p>
<p>Of late, Politifact has expanded by <a href="http://www.niemanlab.org/2011/09/a-truth-o-meter-franchised-politifact-places-its-bets-on-expanding-to-states/">franchising its operations to states</a> &#8211; essentially increasing the pool of trained professionals participating in fact-checking. It’s a good strategy, but I can think of at least a few others that would also grow the fact-checking pie: (1) sharpen the scope of what’s fact-checked so that attention is where it’s most impactful, (2) make use of volunteer, non-professional labor via crowdsourcing, and (3) automate certain aspects of the task so that professionals can work more quickly. In the rest of this post, I’ll flesh out each of these approaches in a bit more detail.</p>
<p><strong>Reduce Fact-Checking Scope </strong><br />
“I don’t get to decide which facts are stupid … although it would certainly save me a lot of time with this essay if I were allowed to make that distinction.” argues Jim Fingal in his epic fact-check struggle with artist-writer John D’Agata in <a href="http://www.amazon.com/The-Lifespan-Fact-John-DAgata/dp/0393340732/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1338472579&amp;sr=1-1">The Lifespan of a Fact</a>. Indeed, some of the things Jim checks are really absurd: did the subject take the stairs or the elevator, did he eat “potatoes” or “french fries”; these things don’t matter to the point of that essay, nor, frankly, to me as the reader.</p>
<p>Fact-checkers, particularly the über-thorough kind employed by magazines, are tasked with assessing the accuracy of <em>every</em> claim or factoid written in an article (See the <a href="http://www.amazon.com/Checkers-Bible-Sarah-Harrison-Smith/dp/0385721064/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1338473092&amp;sr=1-1">Fact Checker’s Bible</a> for more). This includes hard facts like names, stats, geography, and physical properties as well as what sources claim via a quotation, or what the author writes from notes. Depending on the nature of the claim some of it may be subjective, opinion-based, or anecdotal. All of this checking is meant to protect the reputation of the publication and of the writers. To maintain trust with the public. But it’s a lot to check and the imbalance between content volume and critical attention will only grow.</p>
<p>To economize their attention fact-checkers might better focus on overall <em>quality</em>; who cares if they’re “potatoes” or “french fries”? In information science <a href="http://onlinelibrary.wiley.com/doi/10.1002/asi.21447/abstract">studies</a>, the notion of <em>quality</em> can be defined as the “value or ‘fitness’ of the information <strong>to a specific purpose or use</strong>.” If quality is really what we’re after then fact-checking would be well-served and more efficacious if it focused the precious attention of fact-checkers on claims that have some utility. These are the claims that, if they were false, could impact the outcome of some event or an important decision. I’m not saying accuracy doesn’t matter, it does, but fact-checkers might focus more energy on information that impacts <em>decisions</em>. For health information this might involve spending more time researching claims that impact health-care options and choices; for finance it would involve checking information informing decisions about portfolios and investments. And for politics this involves checking information that is important for people’s voting decisions &#8211; something that the likes of Politifact already focus on.</p>
<p><strong>Increased Use of Volunteer Labor</strong><br />
Another approach to scaling fact-checking is to incorporate more non-professionals, the crowd, in the truth-seeking endeavor. This is something often championed by social media journalists like Andy Carvin, who see truth-seeking as an open process that can involve asking for (and then vetting) information from social media participants. Mathew Ingram <a href="http://gigaom.com/2012/05/16/twitter-and-reddit-as-crowdsourced-fact-checking-engines/">has written about</a> how platforms like Twitter and Reddit can act as crowdsourced fact-checking platforms. And there have been several efforts toward systematizing this, notably the <a href="http://www.pbs.org/mediashift/2010/11/crowdsourced-fact-checking-what-we-learned-from-truthsquad320.html">TruthSquad</a>, which invited readers to post links to factual evidence that supports or opposes a single statement. A professional journalist would then write an in-depth report based on their own research plus whatever research the crowd contributed. I will say I’m impressed with <a href="http://newstrust.net/quotes/33">the kind of engagement they got</a>, though sadly it’s not being actively run anymore.</p>
<p>But it’s important to step back and think about what the limitations of the crowd in this (or any) context really are. Graves and Glaisyer <a href="http://newamerica.net/sites/newamerica.net/files/policydocs/The_Fact-checking_Universe_in_2012.pdf">remind us</a> that we still don’t really know how much an audience can contribute via crowdsourced fact-checking. Recent information quality <a href="http://onlinelibrary.wiley.com/doi/10.1002/asi.21447/abstract">research by Arazy and Kopak</a> gives us some clues about what dimensions of quality may be more amenable to crowd contributions. In their study they looked at how consistent ratings of various Wikipedia articles were along dimensions of accuracy, completeness, clarity, and objectivity. They found that, while none of these dimensions had particularly consistent ratings, completeness and clarity were more reliable than objectivity or accuracy. This is probably because it’s easier to use a heuristic or shortcut to assess completeness, whereas rating accuracy requires specialized knowledge or research skill. So, if we’re thinking about scaling fact-checking with a pro-am model, <strong>we might have the crowd focus on aspects of completeness and clarity, but leave the difficult accuracy work to the professionals</strong>.</p>
<p><strong>#Winning with Automation</strong><br />
I’m not going to fool anyone by claiming that automation or aggregation will fully solve the fact-checking scalability problem. But there may be bits of it that can be automated, at least to a degree where it would make the life of a professional fact-checker easier or make their work go faster. An automated system could allow any page online to be quickly checked for misinformation. Violations could be flagged and highlighted, either for lack of corroboration or for controversy, or the algorithm could be run before publication so that a professional fact-checker could take a further crack at it.</p>
<p>Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may be too hairy for computers to deal with. But we should be able to both identify and check automatically the <em>hard facts</em> and other things that are easily found in reference materials. The basic mechanic would be one of <em>corroboration</em>, a method often used by journalists and social scientists in truth-seeking. If we can find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.</p>
<p>There have already been a <a href="http://dl.acm.org/citation.cfm?doid=1281192.1281309">handful</a> <a href="http://dl.acm.org/citation.cfm?doid=1871985.1872002">of</a> <a href="http://dl.acm.org/citation.cfm?doid=1148170.1148316">efforts</a> in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can do this effectively. First of all, we need to define, identify, and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context, the tiniest change in which can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources that agree with statement X, the more we should believe it), but this doesn’t take into account the <em>credibility</em> of the sources or their other relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many challenges here!</p>
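<p>To make the aggregation step a bit more concrete, here is a minimal Python sketch contrasting the naive frequency proxy with a score that weighs credibility and independence. Everything in it is an assumption for illustration: the <code>Assertion</code> record, the <code>owner</code> field as a stand-in for independence, the credibility numbers, and the noisy-OR rule for combining sources are my own choices, not the method of any of the systems linked above. It also assumes the hard parts (claim extraction and matching) have already been done.</p>
<pre>
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    claim_id: str       # hypothetical id of an already-matched claim
    source: str         # who made the statement
    owner: str          # hypothetical: sources sharing an owner are not independent
    credibility: float  # hypothetical score between 0 and 1


def frequency_score(assertions, claim_id):
    """Naive proxy: number of distinct sources asserting the claim."""
    return len({a.source for a in assertions if a.claim_id == claim_id})


def corroboration_score(assertions, claim_id):
    """Credibility-weighted score that counts one source per owner."""
    best_per_owner = defaultdict(float)
    for a in assertions:
        if a.claim_id == claim_id:
            best_per_owner[a.owner] = max(best_per_owner[a.owner], a.credibility)
    # Noisy-OR (an assumed rule): chance that at least one
    # independent source is right, given each one's credibility.
    doubt = 1.0
    for cred in best_per_owner.values():
        doubt *= 1.0 - cred
    return 1.0 - doubt


# Made-up data: c1 is repeated by three blogs in one network,
# c2 is stated by two unrelated, more credible papers.
ASSERTIONS = [
    Assertion("c1", "blogA", "netX", 0.3),
    Assertion("c1", "blogB", "netX", 0.3),
    Assertion("c1", "blogC", "netX", 0.3),
    Assertion("c2", "paperA", "ownA", 0.8),
    Assertion("c2", "paperB", "ownB", 0.7),
]

if __name__ == "__main__":
    for cid in ("c1", "c2"):
        print(cid, frequency_score(ASSERTIONS, cid),
              round(corroboration_score(ASSERTIONS, cid), 2))
```
</pre>
<p>On this made-up data, frequency alone favors claim c1 (three sources versus two), while the weighted score favors c2 (roughly 0.94 versus 0.30), because the three blogs share an owner and so count as a single, not very credible, voice. The other dimensions mentioned above (expertise, originality, prominence, proximity) could enter as further weights, though how to set them is exactly the open question.</p>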
<p>Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like <a href="http://confront.intel-research.net/Dispute_Finder.html">DisputeFinder</a> has looked at scraping or accessing known repositories such as Politifact or Snopes to jump-start a claims database, and other work like <a href="http://www.videolyzer.com/">Videolyzer</a> has tried to leverage engaged people to provide structured annotations of claims. Others have proceeded by <a href="http://www.cs.rutgers.edu/~amelie/papers/2011/webCorrob_is11.pdf">using the internet</a> as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text (e.g. rigorously fact-checked magazines), to provide a corroboration service based on all of the claims embedded in those texts. Could news organizations even make money by syndicating their archives like this?</p>
<p>There are of course other challenges to fact-checking that also need to be surmounted, such as the user-interface for presentation or how to effectively syndicate fact-checks across different media. In this essay I’ve argued that scale is one of the key challenges to fact-checking. How can we balance scope with professional, non-professional, and computerized labor to get closer to the truth that really matters?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/06/12/fact-checking-at-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Automated Story Production</title>
		<link>http://www.nickdiakopoulos.com/2012/05/18/the-future-of-automated-story-production/</link>
		<comments>http://www.nickdiakopoulos.com/2012/05/18/the-future-of-automated-story-production/#comments</comments>
		<pubDate>Fri, 18 May 2012 13:48:12 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Media Synthesis]]></category>
		<category><![CDATA[media synthesis]]></category>
		<category><![CDATA[storytelling]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=711</guid>
		<description><![CDATA[Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site.  Recently there’s been a surge of interest in automatically generating news stories. The poster child is a start-up called Narrative Science which has earned coverage by the likes of the New York Times, Wired, and numerous blogs for its ability to automatically produce <a href="http://www.nickdiakopoulos.com/2012/05/18/the-future-of-automated-story-production/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p><em>Note: this is <a href="http://towknight.org/research/thinking/automated/">cross-posted</a> on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. </em></p>
<p>Recently there’s been a surge of interest in automatically generating news stories. The poster child is a start-up called <a href="http://www.narrativescience.com/">Narrative Science</a>, which has earned coverage by the likes of the <a href="http://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html?pagewanted=all">New York Times</a>, <a href="http://www.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1">Wired</a>, and numerous <a href="http://structureofnews.wordpress.com/2012/04/29/the-cybernetic-newsroom/">blogs</a> for its ability to automatically produce actual, readable stories about things like sports games or companies’ financial reports based on nothing more than numeric data. It’s impressive stuff, but it doesn’t stop me from thinking: What’s next? In the rest of this post I’ll talk about some challenges, such as story <em>schema and modality</em>, <em>data context</em>, and <em>text transparency</em>, that could improve future story generation engines.</p>
<p>Without inside information we can’t say for sure exactly how Narrative Science (NS) works, though there are some <a href="http://www.dia.fi.upm.es/grupos/I&amp;K/11-using-journalistic-metaphor.pdf">academic systems out there</a> that provide a suitable analogue for description. There are two main phases that have to be automated in order to produce a story this way: the analysis phase and the generative phase. In the <strong>analysis</strong> phase, numeric data is statistically analyzed for things like trends, clusters, patterns, and outliers or exceptions. The analysis phase also includes the challenging aspect of condensing or <strong>selecting</strong> the most interesting things to include in the story (see Ramesh Jain’s “<a href="http://ngs.ics.uci.edu/whitepapers/Extreme_Stories.pdf">Extreme Stories</a>” for more on this).</p>
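<p>As a rough illustration of the analysis and selection steps, here is a minimal Python sketch that scores each value in a small data set by how far it sits from the mean and keeps the most unusual ones. To be clear, this is not how Narrative Science works (we can’t know that from the outside); the function name, the z-score heuristic, and the box-score numbers are all assumptions of mine.</p>
<pre>
```python
from statistics import mean, pstdev


def select_highlights(stats, top_k=2):
    """Rank named values by absolute z-score and keep the top_k."""
    values = list(stats.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # nothing stands out when all values are identical
    scored = [(abs(v - mu) / sigma, name, v) for name, v in stats.items()]
    scored.sort(reverse=True)  # most unusual values first
    return [(name, v) for _, name, v in scored[:top_k]]


# Hypothetical points scored per player in one game.
POINTS = {"Adams": 12, "Baker": 10, "Cruz": 41, "Diaz": 11, "Evans": 2}

if __name__ == "__main__":
    print(select_highlights(POINTS))
```
</pre>
<p>With these made-up numbers the selection keeps Cruz (41 points) and Evans (2 points), the two values furthest from the team average. A real engine would presumably look for trends and clusters as well, and would need some notion of what is newsworthy beyond what is merely statistically unusual.</p>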
<p>Following analysis and selection comes the task of figuring out an interesting structure to order the information in the story, a <strong>schema</strong>. Narrative Science differentiates itself primarily, I think, by paying close attention to the structure of the stories it generates. Many of the precursors to NS were stuck in the mode of presenting generated text in a chronological schema, which, as we know, is quite boring for most stories. Storytelling is really all about structure: providing the connections between aspects of the story, its actors and setting, using some rhetorical ordering that makes sense for and engages the reader. There are <a href="http://www.amazon.com/Story-Structure-Architect-Situations-Compelling/dp/1582973253/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1336928953&amp;sr=1-1">whole books</a> written on how to effectively structure stories to explore different dramatic arcs or genres. Many of these different story structures have yet to be encoded in algorithms that generate text from data, so there’s <strong>lots of room for future story generation engines to explore diverse text styles, genres, and dramatic arcs</strong>.</p>
<p>It’s also important to remember that text has limitations on the structures and the schema it supports well. A textual narrative schema might draw readers in, but, depending on the data, a network schema or a temporal schema might expose different aspects of a story that aren’t apparent, easy, or engaging to represent in text. This leads us to another opportunity for advancement in media synthesis: better integration of textual schema with visualization schemas (e.g. temporal, hierarchical, network). For instance, there may be complementary stories (e.g. change over time, comparison of entities) that are more effectively conveyed through <a href="http://www.ii.uib.no/vis/publications/publication/2012/pdfs/Ma12ScientificStorytelling.pdf">dynamic visualizations</a> than through text. Combining these two modalities has been explored in some <a href="http://www.dia.fi.upm.es/grupos/I&amp;K/11-using-journalistic-metaphor.pdf">research</a> but there is much work to do in thinking about <strong>how best to combine textual schema with different visual schema to effectively convey a story</strong>.</p>
<p>There has also been recent work looking into how data can be used to <a href="http://ilps.science.uva.nl/biblio/why-did-prime-minister-resign-generation-event-explanations-large-news-repositories">generate stories in the medium of video</a>. This brings with it a whole slew of challenges different from those of text generation, such as the role of audio, and how to crop and edit existing video into a coherent presentation. So, in addition to better incorporating visualization into data-driven stories, I think there are opportunities to think about automatically composing stories from such varied modalities as video, photos, 3D, <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2007/05/paper1257-diakopoulos.pdf">games</a>, or even data-based simulations. If you have the necessary data for it, why not include an automatically produced simulation to help communicate the story?</p>
<p>It may be surprising to know that text generation from data has actually been around for some time now. The earliest reference that I found goes back 26 years to a <a href="http://acl.ldc.upenn.edu/C/C86/C86-1132.pdf">paper</a> that describes how to automatically create written weather reports based on data. And then ten years ago, in 2002, we saw the launch of <a href="http://newsblaster.cs.columbia.edu/">Newsblaster</a>, a complex news summarization engine developed at Columbia University that took articles as a data source and produced new text-based summaries using articles clustered around news events. It worked all right, though starting from text <em>as the data</em> has its own challenges (e.g. text understanding) that you don’t run into if you’re just using numeric data. The downside of using <em>just</em> numeric data is that it is largely bereft of context. One way to enhance future story generation engines could be to <strong>better integrate text generated by numeric data together with text (collected from clusters of human-written articles) that provides additional context</strong>.</p>
<p>The last opportunity I’d like to touch on here relates to the journalistic ideal of transparency. I think we have a chance to embed this ideal into algorithms that produce news stories, which <a href="http://dl.acm.org/citation.cfm?id=1378832">often articulate a communicative intent</a> combined with rules or templates that help achieve that intent. It is largely feasible to link any bit of generated text back to the data that gave rise to that statement – in fact it’s already done by Narrative Science in order to debug their algorithms. But this <strong>linking of data to statement should be exposed publicly</strong>. In much the same way that <a href="http://www.nickdiakopoulos.com/Documents/visRhetoric_final_preprint.pdf">journalists often label their graphics and visualizations with the source of their data</a>, text generated from data should source each statement. Another dimension of transparency practiced by journalists is to be up-front about the journalist’s relationship to the story (e.g. if they’re reporting on a company that they’re involved with). This raises an interesting and challenging question of <em>self-awareness</em> for algorithms that produce stories. Take for instance this <a href="http://www.forbes.com/sites/narrativescience/2012/04/17/forbes-earnings-preview-new-york-times-company-3/">Forbes article</a> produced by Narrative Science about New York Times Co. earnings. The article contains a section on “competitors”, but the NS algorithm isn’t smart enough or self-aware enough to know that it itself is an obvious competitor. <strong>How can algorithms be taught to be transparent about their own relationships to stories?</strong></p>
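<p>Here is a minimal Python sketch of what statement-level sourcing could look like for template-based generation: each generated sentence carries along the data fields and the source record that produced it. The function name, the record layout, and the earnings numbers are hypothetical; this illustrates the idea and is not a description of any existing system.</p>
<pre>
```python
def generate_statement(template, record, fields):
    """Fill a template and keep the data behind the sentence."""
    used = {f: record[f] for f in fields}
    return {
        "text": template.format(**used),
        "source": record["source"],  # where the data came from
        "fields": used,              # which values produced the text
    }


# Hypothetical record; the source label is made up.
RECORD = {
    "company": "Example Corp",
    "quarter": "Q1",
    "eps": 0.25,
    "source": "hypothetical-filing-2012-Q1",
}

STATEMENT = generate_statement(
    "{company} reported earnings of ${eps:.2f} per share in {quarter}.",
    RECORD,
    ["company", "eps", "quarter"],
)

if __name__ == "__main__":
    print(STATEMENT["text"])
    print("based on:", STATEMENT["source"], STATEMENT["fields"])
```
</pre>
<p>A publishing system could then render the <code>source</code> and <code>fields</code> entries as a footnote or hover note next to each sentence, much like a source line under a chart.</p>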
<p>There are tons of exciting opportunities in the space of media synthesis. Challenges like exploring different story structures and schemas, providing and integrating context, and embedding journalistic ideals such as transparency will keep us more than busy in the years and, likely, decades to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/05/18/the-future-of-automated-story-production/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualization Performance in the Browser</title>
		<link>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/</link>
		<comments>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/#comments</comments>
		<pubDate>Tue, 15 May 2012 02:28:45 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=698</guid>
		<description><![CDATA[I&#8217;ve recently embarked on a new project that involves visualizing and animating some potentially large networks as part of a browser-based information tool. So, I wanted to compare some of the different javascript visualization libraries out there to see how their performance scales. There are tons of options for doing advanced graphics in the browser <a href="http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve recently embarked on a new project that involves visualizing and animating some potentially large networks as part of a browser-based information tool. So, I wanted to compare some of the different javascript visualization libraries out there to see how their performance scales. There are tons of options for doing advanced graphics in the browser nowadays including SVG-based solutions like <a href="http://d3js.org/">D3</a>, and <a href="http://raphaeljs.com/">Raphael</a>, as well as HTML5 canvas solutions like <a href="http://processingjs.org/">processing.js</a>, the<a href="http://thejit.org/"> javascript infovis toolkit</a>, <a href="http://sigmajs.org/">sigma.js</a> and <a href="http://fabricjs.com/">fabric.js</a>.</p>
<p>There are certain <a href="http://dev.opera.com/articles/view/svg-or-canvas-choosing-between-the-two/">benefits and trade-offs between SVG and Canvas</a>. For instance, canvas has performance <a href="http://smus.com/canvas-vs-svg-performance/">that scales with the size of the image area</a>. SVG performance instead scales with the complexity and size of the scenegraph. SVG also allows for control of elements via the DOM and CSS and has much better support for interactivity (i.e. every visual object can have event listeners). This <a href="http://bl.ocks.org/2647924">sketch</a> from D3 creator Mike Bostock shows that D3 can render 500 animated circles in SVG at a resolution of 960&#215;500 at about 40 FPS in Chrome, whereas rendering the <a href="http://bl.ocks.org/2647922">same via the Canvas</a> element was closer to 30 FPS. Knowing what we know about how canvas scales, if the image area were smaller than 960&#215;500, then canvas performance would increase, whereas SVG performance would not change. Of course, your mileage may vary depending on your browser and system &#8211; for instance, <a href="http://www.trevorbedford.com/archive/may_07_2012.html">this post</a> found that processing.js (using canvas) outperformed D3 (using SVG) by 20-1000%.</p>
<p>To get a better feel for some of the performance trade-offs (and to take some of the different libraries for a test spin) I developed <a href="http://nad.webfactional.com/ntap/graphscale/">a quick comparison tool</a> which lets you see performance for D3 (SVG), Sigma.js, Processing.js, and D3 (rendering to canvas) for different graph sizes (500-5,000 nodes, and 1,000-10,000 edges) on an image area of 600&#215;600 pixels. On my system (MBP 2.4GHz, Chrome v.18) D3 (SVG) choked down to about 7 FPS with 1000 nodes and 2000 edges when 20% of nodes&#8217; colors were gradually animated. For the same rig sigma.js could do 19 FPS and processing.js could do 11 FPS. Using D3 but then rendering to canvas did the best though: 23 FPS.</p>
<p>D3 seems like a great option given the rich set of utilities and functions available, as well as the option to efficiently render directly to canvas if you really need to scale up the number of objects in your scene. Of course this does undo some of the nice interactivity and manipulability features of using SVG &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
