<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nick Diakopoulos</title>
	<atom:link href="http://www.nickdiakopoulos.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nickdiakopoulos.com</link>
	<description>Musings on Media</description>
	<lastBuildDate>Tue, 15 May 2012 02:28:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Visualization Performance in the Browser</title>
		<link>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/</link>
		<comments>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/#comments</comments>
		<pubDate>Tue, 15 May 2012 02:28:45 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=698</guid>
		<description><![CDATA[I&#8217;ve recently embarked on a new project that involves visualizing and animating some potentially large networks as part of a browser-based information tool. So, I wanted to compare some of the different javascript visualization libraries out there to see how their performance scales. There are tons of options for doing advanced graphics in the browser <a href="http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently embarked on a new project that involves visualizing and animating some potentially large networks as part of a browser-based information tool. So, I wanted to compare some of the different JavaScript visualization libraries out there to see how their performance scales. There are tons of options for doing advanced graphics in the browser nowadays, including SVG-based solutions like <a href="http://d3js.org/">D3</a> and <a href="http://raphaeljs.com/">Raphael</a>, as well as HTML5 canvas solutions like <a href="http://processingjs.org/">processing.js</a>, the <a href="http://thejit.org/">JavaScript InfoVis Toolkit</a>, <a href="http://sigmajs.org/">sigma.js</a> and <a href="http://fabricjs.com/">fabric.js</a>.</p>
<p>There are certain <a href="http://dev.opera.com/articles/view/svg-or-canvas-choosing-between-the-two/">benefits and trade-offs between SVG and Canvas</a>. For instance, canvas has performance <a href="http://smus.com/canvas-vs-svg-performance/">that scales with the size of the image area</a>. SVG performance instead scales with the complexity and size of the scenegraph. SVG also allows for control of elements via the DOM and CSS and has much better support for interactivity (i.e. every visual object can have event listeners). This <a href="http://bl.ocks.org/2647924">sketch</a> from D3 creator Mike Bostock shows that D3 can render 500 animated circles in SVG at a resolution of 960&#215;500 at about 40 FPS in Chrome, whereas rendering the <a href="http://bl.ocks.org/2647922">same via the Canvas</a> element was closer to 30 FPS. Knowing what we know about how canvas scales, if the image area were smaller than 960&#215;500, then canvas performance would increase, whereas SVG performance would not change. Of course, your mileage may vary depending on your browser and system &#8211; for instance <a href="http://www.trevorbedford.com/archive/may_07_2012.html">this post</a> found that processing.js (using canvas) outperformed D3 (using SVG) by 20-1000%.</p>
<p>To get a better feel for some of the performance trade-offs (and to take some of the different libraries for a test spin) I developed <a href="http://nad.webfactional.com/ntap/graphscale/">a quick comparison tool</a> which lets you see performance for D3 (SVG), Sigma.js, Processing.js, and D3 (rendering to canvas) for different graph sizes (500-5,000 nodes, and 1,000-10,000 edges) on an image area of 600&#215;600 pixels. On my system (MBP 2.4GHz, Chrome v.18) D3 (SVG) choked down to about 7 FPS with 1000 nodes and 2000 edges when 20% of nodes&#8217; colors were gradually animated. For the same rig sigma.js could do 19 FPS and processing.js could do 11 FPS. Using D3 but then rendering to canvas did the best though: 23 FPS.</p>
<p>D3 seems like a great option given the rich set of utilities and functions available, as well as the option to efficiently render directly to canvas if you really need to scale up the number of objects in your scene. Of course this does undo some of the nice interactivity and manipulability features of using SVG &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/05/14/visualization-performance-in-the-browser/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Cultivating the Landscape of Innovation in Computational Journalism</title>
		<link>http://www.nickdiakopoulos.com/2012/04/05/cultivating-the-landscape-of-innovation-in-computational-journalism/</link>
		<comments>http://www.nickdiakopoulos.com/2012/04/05/cultivating-the-landscape-of-innovation-in-computational-journalism/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 16:42:00 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[innovation]]></category>
		<category><![CDATA[systematic innovation]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=678</guid>
		<description><![CDATA[For the last several months I&#8217;ve been working on a whitepaper for the CUNY Tow-Knight Center for Entrepreneurial Journalism. It&#8217;s about cultivating more technical innovation in journalism and involves systematically mapping out what&#8217;s been done (in terms of research) as well as outlining a method for people to generate new ideas in computational journalism. I&#8217;m <a href="http://www.nickdiakopoulos.com/2012/04/05/cultivating-the-landscape-of-innovation-in-computational-journalism/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the last several months I&#8217;ve been working on a whitepaper for the CUNY <a href="http://towknight.org/">Tow-Knight Center</a> for Entrepreneurial Journalism. It&#8217;s about cultivating more technical innovation in journalism and involves systematically mapping out what&#8217;s been done (in terms of research) as well as outlining a method for people to <em>generate</em> new ideas in computational journalism. I&#8217;m happy to say that the paper was published by the Tow-Knight Center today. You can get Jeff Jarvis&#8217; take on it on the <a href="http://towknight.org/research/newopps/">Tow-Knight blog</a>, or for more coverage you can see the <a href="http://www.niemanlab.org/2012/04/a-new-framework-for-innovation-in-journalism-how-a-computer-scientist-would-do-it/">Nieman Lab write-up</a>. Or go straight for the <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/04/diakopoulos_whitepaper_systematicinnovation.pdf">paper</a> itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/04/05/cultivating-the-landscape-of-innovation-in-computational-journalism/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Moving Towards Algorithmic Corroboration</title>
		<link>http://www.nickdiakopoulos.com/2012/03/05/moving-towards-algorithmic-corroboration/</link>
		<comments>http://www.nickdiakopoulos.com/2012/03/05/moving-towards-algorithmic-corroboration/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 14:41:27 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[fact-checking]]></category>
		<category><![CDATA[information quality]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=650</guid>
		<description><![CDATA[Note: this is cross-posted on the Berkman/MIT &#8220;Truthiness in Digital Media&#8221; blog.  One of the methods that truth seekers like journalists or social scientists often employ is corroboration. If we find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is <a href="http://www.nickdiakopoulos.com/2012/03/05/moving-towards-algorithmic-corroboration/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>Note: this is cross-posted on the Berkman/MIT &#8220;Truthiness in Digital Media&#8221; <a href="http://blogs.law.harvard.edu/truthiness/blog/">blog</a>. </em></p>
<p>One of the methods that truth seekers like journalists or social scientists often employ is corroboration. If we find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.</p>
<p>How can we scale this idea to the web by teaching computers to effectively corroborate information claims online? An automated system could allow any page online to be quickly checked for misinformation. Violations could be flagged and highlighted, either for lack of corroboration or for a multi-faceted corroboration (i.e. a controversy).</p>
<p>There have already been a <a href="http://dl.acm.org/citation.cfm?doid=1281192.1281309">handful</a> <a href="http://dl.acm.org/citation.cfm?doid=1871985.1872002">of</a> <a href="http://dl.acm.org/citation.cfm?doid=1148170.1148316">efforts</a> in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can be effective corroborators. First of all, we need to define and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context, the tiniest change in which can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources that agree with statement X, the more we should believe it), but this doesn’t take into account the <em>credibility</em> of the source or their other relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many research challenges here!</p>
<p>Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like <a href="http://confront.intel-research.net/Dispute_Finder.html">DisputeFinder</a> has looked at scraping known repositories such as Politifact or Snopes to jump-start a claims database, and other work like <a href="http://www.videolyzer.com/">Videolyzer</a> has tried to leverage engaged people to provide structured annotations of claims, though it’s difficult to get enough coverage and scale through manual efforts. Others have proceeded by <a href="http://www.cs.rutgers.edu/~amelie/papers/2011/webCorrob_is11.pdf">using the internet</a> as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text, to provide a corroboration service based on all of the claims embedded in those texts. A browser plugin could detect and highlight claims that are not corroborated by e.g. the NYT or Washington Post corpora. Could news organizations even make money off their archives like this?</p>
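<p>To make the difference between counting sources and weighting them concrete, here is a minimal Python sketch. It assumes that claims have already been extracted and normalized to matching strings (which is the hard part), and the source names, the credibility weights, and the default weight of 0.5 for unknown sources are all made up for illustration, not taken from any of the systems cited above.</p>

```python
from collections import defaultdict

def corroboration_scores(assertions, credibility):
    """Score each claim by the summed credibility of the distinct sources asserting it.

    assertions: iterable of (source, claim) pairs, with claims already normalized
    credibility: dict mapping source to a weight between 0 and 1
    Unknown sources get a default weight of 0.5 (an arbitrary assumption).
    """
    supporters = defaultdict(set)
    for source, claim in assertions:
        supporters[claim].add(source)  # a set, so a repeated source counts once
    return {
        claim: sum(credibility.get(s, 0.5) for s in sources)
        for claim, sources in supporters.items()
    }

# Hypothetical sources and weights, for illustration only.
assertions = [
    ("wire_service", "bridge closed"),
    ("local_blog", "bridge closed"),
    ("local_blog", "bridge closed"),
    ("anonymous_account", "bridge collapsed"),
]
credibility = {"wire_service": 0.9, "local_blog": 0.6, "anonymous_account": 0.2}
scores = corroboration_scores(assertions, credibility)
```

<p>A pure frequency count would treat the blog&#8217;s repeated post as extra evidence; here each source counts once and contributes only as much as its weight. Independence between sources and the other dimensions mentioned above are still left out of this sketch.</p>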
<p>It’s important not to forget that there are limits to corroboration too, both practical and philosophical. Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may not benefit at all from a corroborative search for truth. Moreover, systemic bias can still go unnoticed, and a collective social mirage can guide us toward fonts of hollow belief when we drop our critical gaze. We’ll still need smart people around, but, I would argue, finding effective ways to automate corroboration would be a huge advance and a boon in the fight against a misinformed public.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/03/05/moving-towards-algorithmic-corroboration/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Systematic Technical Innovation in Journalism</title>
		<link>http://www.nickdiakopoulos.com/2012/02/08/systematic-technical-innovation-in-journalism/</link>
		<comments>http://www.nickdiakopoulos.com/2012/02/08/systematic-technical-innovation-in-journalism/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 15:56:46 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[innovation]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=637</guid>
		<description><![CDATA[The idea that innovation can be an organized, systematic search for change is not new — Peter Drucker wrote about it over 25 years ago in his book Innovation and Entrepreneurship — and I’m fairly certain he wasn’t the first. Systematic innovation is about methodically surveying a landscape of potential innovation while also analyzing the potential <a href="http://www.nickdiakopoulos.com/2012/02/08/systematic-technical-innovation-in-journalism/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The idea that innovation can be an organized, systematic search for change is not new — Peter Drucker wrote about it over 25 years ago in his book <a href="http://www.amazon.com/Innovation-Entrepreneurship-Peter-F-Drucker/dp/0060851139/ref=sr_1_1?ie=UTF8&amp;qid=1328640129&amp;sr=8-1">Innovation and Entrepreneurship</a> — and I’m fairly certain he wasn’t the first. Systematic innovation is about methodically surveying a landscape of potential innovation while also analyzing the potential economic or social value of innovations. For the last several months I’ve been working with the CUNY Graduate School for Journalism on <strong>developing a process to systematically explore the potential for technical innovation in journalism</strong>. My hope is that this can spur new ideas and growth in <a href="http://en.wikipedia.org/wiki/Computational_journalism">Computational Journalism</a>. In the rest of this post I’ll describe how the process is developing and provide some initial feedback we’ve gotten on how it’s working.</p>
<p>One way to look at innovation is in terms of problem solving: (1) what’s the problem or what’s needed, and (2) how do you reify the solution. Sure, technical innovation is not the <em>only</em> kind of innovation, but here my focus of &#8220;how to make it happen&#8221; will be computing. The problems and needs that I’m focused on are further constrained by the domain, journalism, and include aspects of <a href="http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/">what news consumers need and want</a>, what <a href="http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/">news producers (e.g. professional journalists, but also others) need and want</a>, and how <a href="http://www.nickdiakopoulos.com/2012/01/02/journalism-as-information-science/">value is added to information</a> during the production process.</p>
<p>My basic premise is that if we can identify and enumerate concrete concepts related to needs/wants and technical solutions, then we can systematically combine different concepts to arrive at new ideas for innovation. This is the core idea of <a href="http://en.wikipedia.org/wiki/Computational_creativity#Combinatorial_creativity">combinatorial creativity</a>:  mashing up concepts in novel juxtapositions often sparks new ideas. Drawing on lots of research and, when possible, theory, I developed a concept space which includes 27 computing and technology concepts (e.g. natural user interfaces, computer vision, data mining, etc.), 15 needs and goals that journalists or news consumers typically have with information / media (e.g. storytelling, sensemaking, staying informed, etc.), and 14 information processes that are used to increase the value of information (e.g. filtering, ordering, summarization, etc.). That amounts to 56 concepts across four main categories (computing and technology, news consumer needs, journalism goals, and information processes).</p>
<p>To make the creative combination of ideas more engaging I produced and printed concept cards using <a href="http://us.moo.com/">Moo</a>, which were color-coded based on their main category. Each card has a concept and brief description; here’s what they look like:</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/02/cards.jpg"><img class="aligncenter size-medium wp-image-640" title="cards" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/02/cards-e1328715040159-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Brainstorming could happen in a lot of different ways, but for a start I decided to have groups of three people with each person randomly picking a card, one card from computing and technology and two cards from the other main categories. Then the goal is to generate as many different ideas as possible for products or services that combine those three concepts in some time-frame (say 5 minutes). A recorder in the group keeps track of the concept cards drawn and all of the ideas generated so that they can be discussed later.</p>
<p>The process seems to be working. Earlier this week in <a href="http://www.buzzmachine.com/">Jeff Jarvis</a>’ entrepreneurial journalism class I spent some time lecturing on the different concepts and then had students break into 5 groups of 3 to play the brainstorming “game”, which looked something like this:</p>
<p style="text-align: center;"><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/02/brainstorming.jpg"><img class="aligncenter size-full wp-image-643" title="brainstorming" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/02/brainstorming.jpg" alt="" width="384" height="288" /></a></p>
<p>The reaction was largely positive, with at least one student exclaiming that she really liked the exercise, and another acknowledging that there were some good ideas coming out of having to think about (and apply) combinations of concepts that they hadn’t necessarily thought of before.</p>
<p>In a series of three 5-minute rounds of brainstorming, the five groups generated 54 ideas in total, for an average of 3.6 ideas per group per round. Of course there was some variability between groups and most groups needed a round to warm up, but there were definitely some 5-star ideas generated. Some of the ideas were for general products or services, but some were also about how technologies could enable new kinds of stories to be told &#8211; editorial creativity. For instance, an idea for a general platform was to produce 3D virtual recreations of accident spots to help viewers get a better experience of why that spot could be dangerous. Another idea was to develop an app where citizen journalists could sign up and be automatically alerted when an incident occurs near their location. On the editorial creativity side of things, some ideas included using motion capture technology to recreate crime scenes or analyses, or to illustrate workplace injuries from repetitive stress. Not all of these things would make tons of money or generate millions of clicks, but that&#8217;s not the point &#8211; for now the point is to get people thinking in new directions.</p>
<p>We&#8217;re still thinking about ways to improve the process, like adding pressure, constraints, or context. And generating lots of ideas is good, but step two is to think about winnowing and how to assess feasibility and quality of ideas. Stay tuned as this continues to evolve&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/02/08/systematic-technical-innovation-in-journalism/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Finding News Sources in Social Media</title>
		<link>http://www.nickdiakopoulos.com/2012/01/24/finding-news-sources-in-social-media/</link>
		<comments>http://www.nickdiakopoulos.com/2012/01/24/finding-news-sources-in-social-media/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 15:18:34 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[HCI]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=616</guid>
		<description><![CDATA[Whether it&#8217;s terrorist attacks in Mumbai, a plane crash landing on the Hudson River, or videos and reactions from a recently capsized cruise ship in Italy, social media has proven itself again and again to be a huge boon to journalists covering breaking news events. But at the same time, the prodigious amount of social <a href="http://www.nickdiakopoulos.com/2012/01/24/finding-news-sources-in-social-media/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Whether it&#8217;s terrorist attacks in Mumbai, a plane crash landing on the Hudson River, or videos and reactions from a recently <a href="http://storyful.com/stories/1000018704">capsized cruise ship in Italy</a>, social media has proven itself again and again to be a huge boon to journalists covering breaking news events. But at the same time, the prodigious amount of social media content posted around news events creates a challenge for journalists trying to find <em>interesting</em> and <em>trustworthy</em> sources in the din. A few recent efforts have looked at <a href="http://www.aclweb.org/anthology-new/D/D11/D11-1147.pdf">automatically identifying misinformation</a> on Twitter, or <a href="http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf">automatically assessing credibility</a>, though pure automation carries the risk of cutting human decision makers completely out of the loop. There aren&#8217;t many general purpose (or accessible) solutions out there for this problem either; services like <a href="http://www.klout.com">Klout</a> help identify topical authorities, and <a href="http://www.storify.com">Storify</a> and <a href="http://www.storyful.com">Storyful</a> help in assembling social media content, but don&#8217;t offer additional cues for assessing credibility or trustworthiness.</p>
<p>Some research I&#8217;ve been doing (with collaborators at <a href="http://research.microsoft.com/en-us/um/people/munmund/">Microsoft</a> and <a href="http://comminfo.rutgers.edu/~mor/">Rutgers</a>) has been looking into this problem of developing cues and filters to enable journalists to better tap into social media. In the rest of this post I&#8217;ll preview this forthcoming research, but for all the details you&#8217;ll want to see the CHI <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/SRSR-diakopoulos.pdf">paper</a> appearing in May and the CSCW <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/dechoudhury-cscw2012.pdf">paper</a> appearing next month.</p>
<p>With my collaborators I built an application called SRSR (standing for &#8220;Seriously Rapid Source Review&#8221;) which incorporates a number of advanced aggregations, computations, and cues that we thought would be helpful for journalists to find and assess sources in Twitter around breaking news events. And we didn&#8217;t just build the system, we also evaluated it on two breaking news scenarios with seven super-star social media editors at leading local, national, and international news outlets.</p>
<p>The features we built into SRSR were informed by talking with many journalists and include facilities to filter and find eyewitnesses and archetypical user-types, as well as to characterize sources according to their implicit location, network, and past content. The SRSR interface allows the user to quickly scan through potential sources and get a feeling for whether they&#8217;re more or less credible and if they might make good sources for a story. Here&#8217;s a snapshot showing some content we collected and processed around the Tottenham riots.</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/01/srsr-application-2.png"><img class="aligncenter size-large wp-image-624" title="srsr application 2" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/01/srsr-application-2-1024x556.png" alt="" width="620" height="336" /></a></p>
<p><strong>Automatically Identifying Eyewitnesses</strong><br />
A core feature we built into SRSR was the ability to filter sources based on whether or not they were likely to be eyewitnesses. To determine if someone was an eyewitness we built an automatic classifier that looks at the text content shared by a user and compares it to a dictionary of over 700 key terms relating to perception, seeing, hearing, and feeling &#8211; the kind of language you would expect from eyewitnesses. If a source uses one of the key terms then we label them as a likely eyewitness. Even using this relatively simple classifier we got fairly accurate results: precision was 0.89 and recall was 0.32. This means that if a source uses one of these words it&#8217;s highly likely they are really an eyewitness to the event, but that there were also a number of eyewitnesses who <em>didn&#8217;t</em> use any of these key words (thus the lower recall score). Being able to rapidly find eyewitnesses with firsthand information was one of the most liked features in our evaluation. In the future there&#8217;s lots we want to do to make the eyewitness classifier even more accurate.</p>
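<p>For readers who want to see the shape of a dictionary-based classifier like this, here is a rough Python sketch. It is not the SRSR code: the handful of perception terms stands in for the real dictionary of over 700, and the function names, example messages, and labels are hypothetical.</p>

```python
import re

# Tiny stand-in for the real dictionary of 700+ perception terms (assumed words).
PERCEPTION_TERMS = {"saw", "see", "heard", "hear", "felt", "watching", "smell"}

def is_likely_eyewitness(messages):
    """Label a source as a likely eyewitness if any message contains a key term."""
    for text in messages:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words.intersection(PERCEPTION_TERMS):
            return True
    return False

def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    flagged = sum(1 for p in predicted if p)
    real = sum(1 for a in actual if a)
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / real if real else 0.0
    return precision, recall

# Hypothetical sources; the second list holds hand-labeled ground truth.
sources = {
    "a": ["I just saw smoke over the high street"],
    "b": ["Police say the road is closed"],
    "c": ["Stuck at home, this is awful"],
}
actual = [True, False, True]
predicted = [is_likely_eyewitness(msgs) for msgs in sources.values()]
precision, recall = precision_recall(predicted, actual)
```

<p>In this toy data the one flagged source really is an eyewitness (high precision), while source "c" is an eyewitness who never uses a perception word and so is missed (lower recall), which is the same pattern described above.</p>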
<p><strong>Automatically Identifying User Archetypes</strong><br />
Since different types of users on Twitter may produce different <em>kinds</em> of information we also sought to segment users according to some sensible archetypes: journalists/bloggers, organizations, and &#8220;ordinary&#8221; people. For instance, around a natural hazard news event, organizations might share information about marshaling public resources or have links to humanitarian efforts, whereas &#8220;ordinary&#8221; people are more likely to have more eyewitness information. We thought it could be helpful to journalists to be able to rapidly classify sources according to these information archetypes and so we built an automatic classifier for these categories. All of the details are in the CSCW <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/dechoudhury-cscw2012.pdf">paper</a>, but we basically got quite good accuracy with the classifier across these three categories: 90-95%. Feedback in our evaluation indicated that rapidly identifying organizations and journalists was quite helpful.</p>
<p><strong>Visually Cueing Location, Network, Entities</strong><br />
We also developed visual cues that were designed to help journalists assess the potential verity and credibility of a source based on their profile. In addition to showing the location of the source, we normalized and aggregated locations <em>within a source&#8217;s network</em>. In particular we looked at the &#8220;friends&#8221; of a source (i.e. people that I follow and that follow me back) and showed the top three most frequent locations in that network. This gives a sense of where this source knows people and has their social network. So even if I don&#8217;t live in London, if I know 50 people there it suggests I have a stake in that location or may have friends or other connections to that area that make me knowledgeable about it. Participants in our evaluation really liked this cue as it gives a sense of <em>implicit or social location</em>.</p>
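<p>The aggregation itself is simple enough to sketch in a few lines of Python. This is an illustration under assumptions rather than our implementation: the account names and profile locations are invented, and "normalizing" here just means trimming and lower-casing the location string, which is much cruder than real location normalization.</p>

```python
from collections import Counter

def mutual_friends(following, followers):
    """Accounts the source follows that also follow the source back."""
    return set(following).intersection(followers)

def top_friend_locations(following, followers, locations, n=3):
    """Return the n most frequent normalized locations among mutual friends."""
    counts = Counter(
        locations[user].strip().lower()
        for user in mutual_friends(following, followers)
        if locations.get(user)  # skip friends with no profile location
    )
    return counts.most_common(n)

# Hypothetical accounts and profile locations, for illustration only.
following = ["ann", "bob", "cat", "dan", "eve"]
followers = ["ann", "bob", "cat", "eve", "zed"]
locations = {"ann": "London", "bob": "london ", "cat": "Leeds", "eve": "", "zed": "Paris"}
top = top_friend_locations(following, followers, locations)
```

<p>Only accounts in both lists count as friends, so "dan" and "zed" are ignored, and "eve" drops out because her profile has no location. What remains is a short ranked list that hints at where the source has social ties.</p>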
<p>We also show a small sketch of the network of a source indicating who has shared relevant event content and is also following the source. This gives a sense of whether many people talking about the news event are related to the source. Journalists in our evaluation indicated that this was a nice credibility cue. For instance, if the Red Cross is following a source that&#8217;s a nice positive indicator.</p>
<p>Finally, we aggregated the top five most frequent entities (i.e. references to corporations, people, or places) that a source mentioned in their Twitter history (we were able to capture about 1000 historical messages for each person). The idea was that this could be useful to show what a source talks about, but in reality our participants didn&#8217;t find this feature that useful for the breaking news scenarios they were presented with. Perhaps in other scenarios it could still be useful?</p>
<p><strong>What&#8217;s Next</strong><br />
While SRSR is a nice step forward there&#8217;s still plenty to do. For one, our prototype was not built for real-time events and was tested with pre-collected and processed data due to limitations of the Twitter API (hey Twitter, give me a call!!). And there&#8217;s plenty more to think about in terms of enhancing the eyewitness classifier, thinking about different ways to use network information to spider out in search of sources, and to experiment with how such a tool can be used to cover different kinds of events.</p>
<p>Again, for all the gory details on how these features were built and tested you can read our research papers. Here are the full references:</p>
<ul>
<li>N. Diakopoulos, M. De Choudhury, M. Naaman. Finding and Assessing Social Media Information Sources in the Context of Journalism. <em>Conference on Human Factors in Computing Systems (CHI)</em>. May, 2012. [<a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/SRSR-diakopoulos.pdf">PDF</a>]</li>
<li>M. De Choudhury, N. Diakopoulos, M. Naaman. Unfolding the Event Landscape on Twitter: Classification and Exploration of User Categories. <em>Proc. Conference on Computer Supported Cooperative Work (CSCW)</em>. February, 2012. [<a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/dechoudhury-cscw2012.pdf">PDF</a>]</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/01/24/finding-news-sources-in-social-media/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>News Headlines and Retweets</title>
		<link>http://www.nickdiakopoulos.com/2012/01/17/news-headlines-and-retweets/</link>
		<comments>http://www.nickdiakopoulos.com/2012/01/17/news-headlines-and-retweets/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 15:29:14 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[headlines]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[news headlines]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=567</guid>
		<description><![CDATA[How do you maximize the reach and engagement of your tweets? This is a hugely important question for companies who want to maximize the value of their content. There are even start-ups, like Social Flow, that specialize in optimizing the &#8220;engagement&#8221; of tweets by helping to time them appropriately. A growing body of research is also looking <a href="http://www.nickdiakopoulos.com/2012/01/17/news-headlines-and-retweets/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>How do you maximize the reach and engagement of your tweets? This is a hugely important question for companies who want to maximize the value of their content. There are even start-ups, like <a href="http://www.socialflow.com/">Social Flow</a>, that specialize in optimizing the &#8220;engagement&#8221; of tweets by helping to time them appropriately. A <a href="http://www.springerlink.com/content/n037277255u2065h/">growing</a> <a href="http://www.websci11.org/fileadmin/websci/Papers/50_paper.pdf">body</a> <a href="http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2754/3209">of </a><a href="http://www2.parc.com/isl/members/hong/publications/socialcomputing2010.pdf">research</a> is also looking at what factors, both of the social network and of the content of tweets, impact how often tweets get retweeted. For instance, some of this research has indicated that tweets are more retweeted when they contain URLs and hashtags, when they contain negative or exciting and intense sentiments, and when the user has more followers. Clearly time is important too and different times of day or days of week can also impact the amount of attention people are paying to social media (and hence the likelihood that something will get retweeted).</p>
<p>But aside from the obvious thing of growing their follower base, what can content creators like news organizations do to increase the retweetability of their tweets? Most news organizations basically tweet out headlines and links to their stories. And that delicate choice of words in writing a headline has always been a bit of a skill and an art. But with lots of data now we can start being a bit more scientific by looking at what textual and linguistic features of headlines tend to be associated with higher levels of retweets. In the rest of this post I&#8217;ll present some data that starts to scratch at the surface of this.</p>
<p>I collected all tweets from the <a href="https://twitter.com/#!/nytimes">@nytimes</a> twitter account between July 1st, 2011 and Sept. 30th, 2011 using the <a href="http://topsy.com/">Topsy</a> API. I wanted to analyze somewhat older tweets to make sure that retweeting had run its natural course and that I wasn&#8217;t truncating the retweeting behavior. Using data from only one news account has the advantage that it controls for the network and audience and allows me to focus purely on textual features. In all I collected 5101 tweets, including how many times each tweet was retweeted (1) using the built-in retweet button and (2) using the old syntax of &#8220;RT @username&#8221;. Of these tweets, 93.7% contained links to NYT content, 1.0% contained links to other content (e.g. yfrog, instagram, or government information), and 0.7% were retweets themselves. The remaining 4.6% of tweets in my sample had no link.</p>
<p>The first thing I looked at was what the average number of retweets was for the tweets in each group (links to NYT content, links to other content, and no links).</p>
<ul>
<li>Average # of RTs for tweets with links to NYT content: 48.0</li>
<li>Average # of RTs for tweets with links to other content: 48.1</li>
<li>Average # of RTs for tweets with no links: 83.8</li>
</ul>
<p>This is interesting because some of the <a href="http://www2.parc.com/isl/members/hong/publications/socialcomputing2010.pdf">best research out there</a> suggests that tweets WITH links get more RTs. But I found just the opposite: <strong>tweets with NO LINKS got more RTs</strong> (1.74 times as many on average). I read through the tweets with no links (there are only 234) and they were mostly breaking news alerts like &#8220;<em>Qaddafi Son Arrested&#8230;</em>&#8221;, &#8220;<em>Dow drops more than 400 points&#8230;</em>&#8221;, or &#8220;<em>Obama and Boehner Close to Major Budget Deal&#8230;</em>&#8221;. So from the prior research we know that for any old tweet source, URLs are a signal that is correlated with RTs, but <strong>for news organizations, the most &#8220;newsy&#8221; or retweetable information comes in a brief snippet, without a link</strong>. The implication is not that news organizations should stop linking their content to get more RTs, but rather that the kind of information shared without links from news organizations (the NYT in particular) is highly retweetable.</p>
<p>To really get into the textual analysis I wanted to look just at tweets with links back to NYT content though. So the rest of the analysis was done on the 4780 tweets with links back to NYT content. If you look at these tweets they basically take the form: &lt;<em>story headline</em>&gt; + &lt;<em>link</em>&gt;. I broke the dataset up into the top and bottom 10% of tweets (deciles)<em> </em>as ranked by their total number of RTs, which includes RTs using the built-in RT button as well as the old style RTs. The overall average # of RTs was 48.3, but in the top 10% of tweets it was 173 and in the bottom 10% it was 7.4. Here&#8217;s part of the distribution:</p>
<p style="text-align: left;"><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2012/01/RThistogram.png"><img class="aligncenter size-full wp-image-606" title="RThistogram" src="http://www.nickdiakopoulos.com/wp-content/uploads/2012/01/RThistogram.png" alt="" width="655" height="371" /></a><br />
Is the length of a tweet related to how often it gets retweeted? I looked at the average length of the tweets (in characters) in the top and bottom 10%.</p>
<ul>
<li>Top 10%: 75.8 characters</li>
<li>Bottom 10%: 82.8 characters</li>
</ul>
<p>This difference is statistically significant using a t-test (t=5.23, p &lt; .0001). So <strong>tweets that are in the top decile of RTs are shorter, on average, by about 7 characters</strong>. This isn&#8217;t prescriptive, but it does suggest an interesting correlation that headline / tweet writers for news organizations might consider exploring.</p>
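<p>As a rough sketch of this kind of comparison (not the exact analysis behind the numbers above): the code below splits a list of tweets into top and bottom deciles by retweet count and computes a t statistic on tweet length. It assumes Welch&#8217;s unequal-variance form of the test, since the variant isn&#8217;t stated here, and the names <code>decile_groups</code> and <code>welch_t</code>, the dictionary keys, and the toy data are all invented for illustration.</p>

```python
from math import sqrt
from statistics import mean, variance


def decile_groups(tweets):
    """Return the top and bottom 10% of tweets, ranked by retweet count."""
    ranked = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    k = max(1, len(ranked) // 10)
    return ranked[:k], ranked[-k:]


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy data, invented for illustration: 40 fake tweets in which the more
# retweeted ones happen to be shorter. This is not the @nytimes sample.
tweets = [{"text": "x" * (70 + i // 2), "retweets": 400 - 10 * i} for i in range(40)]

top, bottom = decile_groups(tweets)
top_lengths = [len(t["text"]) for t in top]
bottom_lengths = [len(t["text"]) for t in bottom]

print("mean length, top decile:", mean(top_lengths))
print("mean length, bottom decile:", mean(bottom_lengths))
print("Welch t statistic:", welch_t(top_lengths, bottom_lengths))
```

<p>The sign of the statistic only reflects the order of the two arguments; turning it into a p-value needs the t distribution, which a statistics package would supply.</p>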
<p>I also wanted to get a feel for what <em>words</em> were used more frequently in either the top or bottom deciles. To do this I computed the frequency distribution of words for each dataset (i.e. how many times each unique word was used across all the tweets in that decile). Then for each word I computed a ratio of its frequency in one decile to its frequency in the other. A ratio above 1 indicates that the word is more likely to occur in the first decile than in the second. I&#8217;ve embedded the data at the end of this post in case you want to see the top 50 words ranked by their ratio for both the top and bottom deciles.</p>
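<p>A minimal sketch of that word-ratio computation, under a few assumptions of mine: words are lowercased and split with a simple regular expression, counts are normalized to relative frequencies, and a small smoothing constant avoids dividing by zero for words that appear in only one group. The function names and the example headlines are hypothetical.</p>

```python
import re
from collections import Counter


def word_freqs(texts):
    """Relative frequency of each lowercase word across a list of tweet texts."""
    counts = Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def frequency_ratios(group_a, group_b, smoothing=1e-4):
    """Ratio of each word's frequency in group_a to its frequency in group_b.

    The smoothing constant is an assumption added here so that words missing
    from one group do not cause a division by zero.
    """
    fa, fb = word_freqs(group_a), word_freqs(group_b)
    vocab = set(fa) | set(fb)
    return {w: (fa.get(w, 0) + smoothing) / (fb.get(w, 0) + smoothing) for w in vocab}


# Made-up headlines standing in for the two deciles
top_decile = ["Hurricane Irene hits coast", "Police arrest suspect after earthquake"]
bottom_decile = ["How I learned to love my garden", "His guide to her favorite recipes"]

ratios = frequency_ratios(top_decile, bottom_decile)
top_words = sorted(ratios, key=ratios.get, reverse=True)[:50]
print(top_words[:5])
```

<p>Swapping the two arguments gives the list for the other decile.</p>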
<p>From scanning the word lists you can see that pronouns (e.g. &#8220;I, you, my, her, his, he&#8221; etc.) are used more frequently in tweets from the bottom decile of RTs. Tweets that were in the top decile of RTs were more likely to use words relating to <strong>crime</strong> (e.g. &#8220;police&#8221;, &#8220;dead&#8221;, &#8220;arrest&#8221;), natural <strong>hazards</strong> (&#8220;irene&#8221;, &#8220;hurricane&#8221;, &#8220;earthquake&#8221;), <strong>sports</strong> (&#8220;soccer&#8221;, &#8220;sox&#8221;), or politically contentious <strong>issues</strong> (e.g. &#8220;marriage&#8221; likely referring to the legalization of gay marriage in NY). I thought it was particularly interesting that &#8220;China&#8221; was much more frequent in highly RTed tweets. To be clear, this is just scratching the surface and I think there&#8217;s a lot more interesting research to do around this, especially relating to theories of attention and newsworthiness.</p>
<p>The last bit of data analysis I did was to look at whether certain parts of speech (e.g. nouns, verbs, adjectives) were used differently in the top and bottom RT deciles. More specifically I wanted to know: Are different parts of speech used more frequently in one group than the other? To do this, I used a natural language processing toolkit (<a href="http://www.nltk.org/">NLTK</a>) and computed the parts of speech (POS) of all of the words in the tweets. Of course this isn&#8217;t a perfect procedure and sometimes the POS tagger makes mistakes, but I consider this analysis preliminary. I calculated the Chi-Square test to see if there was a statistical difference in the frequency of adjectives, nouns, adverbs, conjunctions (e.g. &#8220;and&#8221;, &#8220;but&#8221;, etc.), determiners (e.g. &#8220;a&#8221;, &#8220;some&#8221;, &#8220;the&#8221;, etc.), pronouns, and verbs used in either the top or bottom 10% of RTs. What I found is that there is a strong statistically significant difference for <strong>adverbs</strong> (p &lt; .02), <strong>determiners</strong> (p &lt; .001), and <strong>verbs</strong> (p &lt; .003), and a marginal difference for <strong>conjunctions</strong> (p = .06). There was no difference in usage for adjectives, nouns, or pronouns. Basically what this boils down to is that, in tweets that get lots of RTs, adverbs, determiners (and conjunctions somewhat) are used substantially less, while verbs are used substantially more. Perhaps it&#8217;s the less frequent use of determiners and adverbs that (as described above) makes these tweets shorter on average. Again, this isn&#8217;t prescriptive, but there may be something here in terms of how headlines are written. <strong>More use of verbs, and less use of &#8220;empty&#8221; determiners and conjunctions in tweets is correlated with higher levels of retweeting</strong>. Could it be the case that action words (i.e. verbs) somehow spur people to retweet the headline?
Pinning down the causality of this is something I&#8217;ll be working on next!</p>
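<p>For readers who want to try a test like this, here is a hedged sketch for a single part of speech. It assumes a 2x2 table (words tagged as verbs vs. all other words, in the top vs. bottom decile), which may not match the exact table layout used above, and the counts are invented. The tagging step itself is assumed to have been done already, for example with NLTK&#8217;s <code>pos_tag</code>.</p>

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts, invented for illustration only
verbs_top, other_top = 900, 4100
verbs_bottom, other_bottom = 780, 4720

chi2, p = chi_square_2x2(verbs_top, other_top, verbs_bottom, other_bottom)
print("chi-square:", round(chi2, 2), "p-value:", p)
```

<p>Running the same function once per part of speech gives one p-value per tag; with several tags tested, a correction for multiple comparisons would be worth considering.</p>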
<p>Here are the lists of words I promised. If you find anything else notable, please leave a comment!<br />
<iframe width='620' height='400' frameborder='0' src='https://docs.google.com/spreadsheet/pub?hl=en_US&#038;hl=en_US&#038;key=0AsKbLffq9fg5dDYtdEFpT0piUzRWbmlHeVp2QWNoVFE&#038;output=html&#038;widget=true'></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/01/17/news-headlines-and-retweets/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Journalism as Information Science</title>
		<link>http://www.nickdiakopoulos.com/2012/01/02/journalism-as-information-science/</link>
		<comments>http://www.nickdiakopoulos.com/2012/01/02/journalism-as-information-science/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 15:49:36 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[information science]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=559</guid>
		<description><![CDATA[The core activity of journalism basically boils down to this: knowledge production. It’s presented in various guises: stories, maps, graphics, interviews, and more recently even things like newsgames, but it all essentially entails the same basic components of information gathering, organizing, synthesizing, and publishing of new (sometimes just new to you) knowledge. To be sure, <a href="http://www.nickdiakopoulos.com/2012/01/02/journalism-as-information-science/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The core activity of journalism basically boils down to this: <strong>knowledge production</strong>. It’s presented in various guises: stories, maps, graphics, interviews, and more recently even things like newsgames, but it all essentially entails the same basic components of information gathering, organizing, synthesizing, and publishing of new (sometimes just new to you) knowledge. To be sure, the particular <em>flavor</em> of knowledge is colored by the <a href="http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/">cultural milieu</a>, ethics, and temporal constraints through which journalism extrudes information into knowledge. Journalists add value to information and news by making sense of it, making it more accessible and memorable, and putting it in context.</p>
<p>Many of the practices followed by journalists in the process of knowledge production can be mapped quite neatly to corresponding ideas in information science. Thankfully, information science studies knowledge production in a much more structured fashion, and in the rest of this post I’d like to surface some of that structure as a way for reflecting on what journalists do, and for thinking about how technology could enhance such processes.</p>
<p>Much of what journalists are engaged with on a day-to-day basis is in <strong>adding value to information</strong>. Raw data and information are harvested from the world, and as the journalist gathers it and makes sense of it, puts it in context, increases its quality, and frames it for decision making, it gets more and more valuable to the end-user. And by “value” I don’t necessarily mean monetary, but rather <strong>usefulness</strong> in meeting a <a href="http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/">user need</a>. This point is important because it implies that the value of information is <em>perceived</em> and driven by <strong>user-needs in context</strong>. And the process is cyclical or recursive. The output of someone else, be it an article, tweet, or comment can be fed into the process for the next output.</p>
<p>Robert S. Taylor, one of the fathers of information studies at Syracuse University, wrote <a href="http://www.amazon.com/Value-Added-Processes-Information-Systems-Communication/dp/0893912735/ref=sr_1_1?ie=UTF8&amp;qid=1324051852&amp;sr=8-1">an entire book</a> on value-added processes in information systems. Below I examine the processes that he described. There may be some information processes that journalists could learn to do more effectively, with or without new tools. Taylor organized the processes into four broad categories:</p>
<ul>
<li><strong>Ease of Use:</strong> This includes information usability such as information architecture (i.e. how to order information), design (i.e. how to format and present information), and browseability. When journalists take a table of numbers and present them as a map or graph they are making that data far more accessible and usable; when they write a compelling story which incorporates those numbers they are also increasing value through usability. <em>Physical accessibility</em> is also important to ease of use, and there’s no doubt that the physical accessibility of information on a mobile or tablet is <a href="http://www.niemanlab.org/2011/11/more-evidence-that-different-devices-fuel-news-consumption-at-different-times/">different than on a desktop</a>.</li>
<li><strong>Noise Reduction</strong>: This involves the processes of inclusion and exclusion with an understanding of relevance that may be informed by context or end-user needs. Journalists are constantly engaging as noise reducers as they assemble a story and decide what is relevant to include and what is not, and even by their very judgement of what is considered newsworthy. <em>Summarization</em> is another dimension of this, as is <em>linking</em> which provides access to other relevant information.</li>
<li><strong>Quality</strong>: A lot of value is added to information by enhancing its quality. Quality decisions depend on quality information: garbage in, garbage out. Quality includes aspects of <em>accuracy</em>, <em>comprehensiveness</em> (i.e. completeness of coverage), <em>currency</em>, <em>reliability</em> (i.e. consistent and dependable), and <em>validity</em>. Journalists engage (sometimes) in factchecking to enhance accuracy, as well as corroboration of sources as a method to increase validity. Different end-user contexts and needs have different demands on quality: non-breaking news doesn’t have the same demands on currency for instance. Seeing as quality (i.e. a commitment to truth) is <a href="http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/">a central value of journalism</a>, it stands to reason that tools built for journalism might consider new ways of enhancing quality.</li>
<li><strong>Adaptability</strong>: The idea of adaptability is that information is most valuable when it meets specific needs of a person with a particular problem. This involves knowing what users’ information needs are. Another dimension is that of <em>flexibility</em>, providing a variety of ways to work with information. Oftentimes I think adaptability is addressed in journalism through nichification &#8211; that is, one outlet specializes in a particular information need, such as Consumer Reports.</li>
</ul>
<p>You can’t really argue that any of these processes <em>aren’t</em> important to the knowledge produced by journalists, and many (all?) of them are also important to others who produce knowledge. There are people out there specialized in some of these activities. For instance, my alma mater, Georgia Tech, pumps out <a href="http://mshci.gatech.edu/">master’s degrees in Human Computer Interaction</a>, which teach you a whole lot about that first category above &#8211; ease of use. Journalism could benefit from more cross-functional teams with such specialists.</p>
<p>The question moving forward is: How can technology inform the design of new tools that enable journalists to add the above values to information? Quality seems like a likely target since it is so important in journalism. But aspects of noise reduction (summarization), and adaptability may also be well-suited to developing augmenting technologies. Moreover, newer forms of information (e.g. social media) are in need of new processes that can add value.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2012/01/02/journalism-as-information-science/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Tweaking Your Credibility on Twitter</title>
		<link>http://www.nickdiakopoulos.com/2011/12/29/tweaking-your-credibility-on-twitter/</link>
		<comments>http://www.nickdiakopoulos.com/2011/12/29/tweaking-your-credibility-on-twitter/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 16:57:33 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[credibility]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=552</guid>
		<description><![CDATA[You want to be credible on social media, right? Well, a paper to be published at the Conference on Computer Supported Cooperative Work (CSCW) in early 2012 from researchers at Microsoft and Carnegie Mellon suggests at least a few actionable methods to help you do so. The basic motivation for the research is that when <a href="http://www.nickdiakopoulos.com/2011/12/29/tweaking-your-credibility-on-twitter/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>You want to be credible on social media, right? Well, a <a href="http://research.microsoft.com/pubs/155374/tweet_credibility_cscw2012.pdf" target="_blank">paper</a> to be published at the Conference on Computer Supported Cooperative Work (CSCW) in early 2012 from researchers at Microsoft and Carnegie Mellon suggests at least a few actionable methods to help you do so. The basic motivation for the research is that when people see your tweet via a search (rather than following you) they have fewer cues to assess credibility. With a better understanding of what factors influence tweet credibility, new search interfaces can be designed to highlight the most relevant credibility cues (now you see why Microsoft is interested).</p>
<p>First off, five people were interviewed by the researchers to collect a range of issues that might be relevant to credibility perception. They came up with a list of 26 possible credibility cues and then ran a survey with 256 respondents in which they asked how much each feature impacted credibility perception. You can see <a href="http://research.microsoft.com/pubs/155374/tweet_credibility_cscw2012.pdf" target="_blank">the paper for the full results</a>, but, for instance, things like keeping your tweets on a similar topic, using a personal photo, having a username related to the topic, having a location near a topic, having a bio that suggests relevant topical expertise, and frequent tweeting were all perceived by participants to positively impact credibility to some extent. Things like using non-standard grammar and punctuation or using the default user image were seen to detract from credibility.</p>
<p>Based on their first survey, the researchers then focused on three specific credibility cues for a follow-on study: (1) topic of tweets (politics, science, or entertainment), (2) user name style (first_last, internet &#8211; &#8220;tenacious27&#8221;, and topical &#8211; &#8220;AllPolitics&#8221;), and finally (3) user image (male / female photo, topical icon, generic icon, and default). For the study, each participant (there were 266) saw some combination of the above cues for a tweet, and rated both tweet credibility and author credibility. Unsurprisingly, tweets about the science topic were rated as more credible than those on politics or entertainment. The most surprising result to me was that topically relevant user names were more credible than traditional names (or internet style names, though that&#8217;s not surprising). In a final follow-up experiment the researchers found that the user image doesn&#8217;t impact credibility perceptions, except for when the image is the default image in which case it significantly (in the statistical sense) lowers perceptions of tweet credibility.</p>
<p>So here are the main actionable take-aways:</p>
<ul>
<li>Don&#8217;t use non-standard grammar and punctuation (no &#8220;lol speak&#8221;).</li>
<li>Don&#8217;t use the default image.</li>
<li>Tweet about topics like science, which seem to carry an aura of credibility.</li>
<li>Find a user name that is topically aligned with those you want to reach.</li>
</ul>
<div>That last point of finding a topically aligned user name might be an excellent strategy for large news organizations to build a more credible presence across a range of topics. For instance, right now the <a href="http://www.nytimes.com/twitter">NY Times has a mix of accounts</a> that have topical user names, as well as reporters using their real names. In addition to each reporter having their own &#8220;real name&#8221; account, individual tweets of theirs that were topically relevant could be routed to the appropriate topically named account. So for instance, let&#8217;s say <a href="http://www.twitter.com/revkin">Andy Revkin</a> tweets something about the environment. That tweet should also show up via the <a href="http://www.twitter.com/nytenvironment">Environment</a> account, since the tweet may be perceived as having higher credibility from a topically-related user name. For people who search and find that tweet, of course if they know who Andy Revkin is, then they&#8217;ll find his tweet quite credible since he&#8217;s known for having that topical expertise. But for someone else who <em>doesn&#8217;t know</em> who Andy Revkin is, the results of the above study suggest that that person would find the same content more credible coming from the topically related Environment account. Maybe the Times or others are already doing this. But if not, it seems like there&#8217;s an opportunity to systematically increase credibility by adopting such an approach.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/12/29/tweaking-your-credibility-on-twitter/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data, Information, Knowledge Visualization</title>
		<link>http://www.nickdiakopoulos.com/2011/12/16/data-information-knowledge-visualization/</link>
		<comments>http://www.nickdiakopoulos.com/2011/12/16/data-information-knowledge-visualization/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 18:04:52 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[diagram]]></category>
		<category><![CDATA[DIKW]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=544</guid>
		<description><![CDATA[Recently I&#8217;ve been reading up on the Data, Information, Knowledge, (Wisdom) or DIKW typology as a way for thinking about how journalists produce journalism. I&#8217;m not going to touch &#8220;wisdom&#8221; with a 10 foot pole, as there&#8217;s a fair bit of wrangling even over the data, information, and knowledge facets. Oftentimes it&#8217;s discussed as a <a href="http://www.nickdiakopoulos.com/2011/12/16/data-information-knowledge-visualization/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I&#8217;ve been <a href="http://blogs.hbr.org/cs/2010/02/data_is_to_info_as_info_is_not.html">reading</a> <a href="http://www.systems-thinking.org/dikw/dikw.htm">up</a> <a href="http://researchrepository.murdoch.edu.au/1884/1/The_Noetic_Prism.pdf">on</a> the Data, Information, Knowledge, (Wisdom) or DIKW typology as a way for thinking about how journalists produce journalism. I&#8217;m not going to touch &#8220;wisdom&#8221; with a 10 foot pole, as there&#8217;s a fair bit of wrangling even over the data, information, and knowledge facets. Oftentimes it&#8217;s discussed as a &#8220;hierarchy&#8221; and depicted as a pyramid; David McCandless even produced such a <a href="http://www.informationisbeautiful.net/2010/data-information-knowledge-wisdom/">diagram</a>. But a pyramid is the wrong visual (what is the width of the pyramid even mapped to? is there less knowledge than data?).</p>
<p>Data are numerical entities or readily verifiable facts. Information is about adding relationships between elements of data. Knowledge emerges when humans <em>interpret, analyze</em>, and <em>judge</em> information, and can be used to inform or to help drive decision making (<a href="http://www.amazon.com/Value-Added-Processes-Information-Systems-Communication/dp/0893912735/ref=sr_1_1?ie=UTF8&amp;qid=1324051852&amp;sr=8-1">Taylor, 1986</a>). As Kovach and Rosenstiel persuasively <a href="http://www.amazon.com/Blur-Know-Whats-Information-Overload/dp/1608193012/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1324052149&amp;sr=1-1">argue</a>, news gathering organizations are but venues for the accumulation and synthesis of knowledge (usually about a particular community) that make that knowledge available to the public; journalists are the agents of this knowledge generation.</p>
<p>Here&#8217;s my attempt at visualizing this concept:</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/12/dik1.png"><img class="aligncenter size-full wp-image-548" title="dik" src="http://www.nickdiakopoulos.com/wp-content/uploads/2011/12/dik1.png" alt="" width="615" height="205" /></a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/12/16/data-information-knowledge-visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing Tools for Journalism</title>
		<link>http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/</link>
		<comments>http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 16:23:05 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[design]]></category>
		<category><![CDATA[information quality]]></category>
		<category><![CDATA[journalism]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=528</guid>
		<description><![CDATA[Whether you’re designing for professionals or amateurs, for people seeking to reinvigorate institutions or to invent new ones, there are still core cultural values ensconced in journalism that can inspire and guide the design of new tools, technologies, and algorithms for committing acts of journalism. How can we preserve the best of such values in new technologies? <a href="http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Whether you’re designing for professionals or amateurs, for people seeking to <a href="http://www.cjr.org/essay/confidence_game.php" target="_blank">reinvigorate institutions</a> or to <a href="http://www.shirky.com/weblog/2011/12/institutions-confidence-and-the-news-crisis" target="_blank">invent new ones</a>, there are still core cultural values ensconced in journalism that can inspire and guide the design of new tools, technologies, and <a href="http://giladlotan.com/blog/2011/12/the-algorithmic-newsroom/" target="_blank">algorithms</a> for committing acts of journalism. How can we preserve the best of such values in new technologies? One approach is known as <a href="http://vsdesign.org/publications/pdf/non-scan-vsd-and-information-systems.pdf" target="_blank"><em>value sensitive design</em></a> and attempts to account for human values in a comprehensive manner throughout the design process by identifying stakeholders, benefits, values, and value conflicts to help designers prioritize features and capabilities.</p>
<p>“Value” is defined as “what a person or group of people consider important in life”. Values could include things like privacy, property rights, autonomy, and accountability among other things. What does journalism value? If we can answer that question, then we should be able to design tools for professional journalists that are more easily adopted (“This tool makes it easy to do the things I find important and worthwhile!”), and we should be able to design tools that more easily facilitate acts of journalism by non-professionals (“This tool makes it easy to participate in a meaningful and valuable way with a larger news process!”). Value sensitive design espouses consideration of all stakeholders (both direct and indirect) when designing technology. I’ve covered some of those stakeholders in a previous post on <a href="http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/" target="_blank">what news consumers want</a>, but another set of stakeholders would be those relating to the business model (e.g. advertisers). In any case, mismatches between the values and needs of different stakeholders will lead to conflicts that need to be resolved by identifying benefits and prioritizing features.</p>
<p>When we turn to normative descriptions of journalism, such as Kovach and Rosenstiel’s <a href="http://www.amazon.com/Elements-Journalism-Newspeople-Completely-Updated/dp/0307346706/" target="_blank">The Elements of Journalism</a> and <a href="http://www.amazon.com/Blur-Know-Whats-Information-Overload/dp/1608193012/" target="_blank">Blur</a>, Schudson’s <a href="http://www.amazon.com/Sociology-News-Second-Contemporary-Societies/dp/0393912876/" target="_blank">The Sociology of News</a>, or descriptions of ethics principles from the <a href="http://www.apme.com/?page=EthicsStatement" target="_blank">AP</a> or <a href="http://asne.org/kiosk/archive/principl.htm" target="_blank">ASNE</a>, we find both core values, as well as valued activities. It’s easiest to understand these as <em>ideals</em> which are not always met in practice. Some core values include:</p>
<ul>
<li><strong>Truth</strong>: including a commitment to accuracy, verification, transparency, and putting things in context</li>
<li><strong>Independence</strong>: from influence by those they cover, from politics, from corporations, or from others they seek to monitor</li>
<li><strong>Citizen-first</strong>: on the side of the citizen rather than for corporations or political factions</li>
<li><strong>Impartiality</strong>: except when opinion has been clearly marked</li>
<li><strong>Relevance</strong>: to provide engaging and enlightening information</li>
</ul>
<p>Core values also inform valued activities or roles, such as:</p>
<ul>
<li><strong>Informer</strong>: giving people the information they need or want about contemporary affairs of public interest</li>
<li><strong>Watchdog</strong>: making sure powerful institutions or individuals are held to account (also called “accountability journalism”)</li>
<li><strong>Authenticator</strong>: assessing the truth-value of claims (“fact-checking”); also relates to watchdogging</li>
<li><strong>Forum Organizer</strong>: orchestrating a public conversation, identifying and consolidating community</li>
<li><strong>Aggregator</strong>: collecting and curating information to make it accessible</li>
<li><strong>Sensemaker</strong>: connecting the dots and making relationships salient</li>
</ul>
<p>Many of these values and valued activities can be seen from an information science perspective as contributing to <strong>information quality</strong>, or the degree of excellence in communicating knowledge. I’ll revisit the parallels to information science in a future post.</p>
<p>Besides core values and valued activities, there are other, perhaps more abstract, processes which are essential to <em>producing</em> journalism, like <a href="http://www.nickdiakopoulos.com/2011/04/22/a-functional-roadmap-for-innovation-in-computational-journalism/" target="_blank">information gathering, organization and sensemaking, communication and presentation, and dissemination</a>. Because they’re more abstract, these processes have a fair amount of variability as they are adapted for different milieus (e.g. information gathering on social media) or media (e.g. text, image, video, games). Often valued activities are themselves compositions of several of these underlying information processes that have been infused with core values. We should be on the lookout for “new” valued activities waiting for products to emerge around them, for instance, by considering more specific value-added information processes in conjunction with core values.</p>
<p>There’s a lot of potential for technology to re-invent and re-imagine valued activities and abstract information processes in light of core values: to make them more effective, efficient, satisfying, productive, and usable. Knowing the core values also helps designers understand what would <em>not</em> be acceptable to design for professionals (e.g. a platform to facilitate the acquisition of paid sources would probably not be adopted in the U.S.). I would argue that it’s the <em>function</em> that is served by the above valued activities, and not the institutionalized practices that are currently used to accomplish them, that is fundamentally important to consider for designers. While we should by all means consider designs that adhere to core values and to an understanding of the outputs of valued activities, we should also be open to allowing technology to enhance the processes and methods which get us there. Depending on whether you’re innovating in an institutional setting or in an unencumbered non-institutional environment, you have different constraints, but regardless, I maintain that value sensitive design is a good way forward to ensure that future tools for journalism will be more trustworthy, have more impact, and resonate more with the public.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/12/12/designing-tools-for-journalism/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>People Scopes, Platforms, and Research</title>
		<link>http://www.nickdiakopoulos.com/2011/12/05/people-scopes-platforms-and-research/</link>
		<comments>http://www.nickdiakopoulos.com/2011/12/05/people-scopes-platforms-and-research/#comments</comments>
		<pubDate>Mon, 05 Dec 2011 19:17:43 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[platforms]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[people-scopes]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=521</guid>
		<description><![CDATA[I recently finished reading Pasteur’s Quadrant. The gist of the book is that the author, Donald Stokes, argues that the traditional (back to the Greeks) distinction between basic research and applied research is misguided.  Louis Pasteur didn’t make that distinction. He in fact was very much driven to solve real-world problems whilst also pursuing basic <a href="http://www.nickdiakopoulos.com/2011/12/05/people-scopes-platforms-and-research/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently finished reading <a href="http://www.amazon.com/Pasteurs-Quadrant-Science-Technological-Innovation/dp/0815781776/ref=sr_1_1?ie=UTF8&amp;qid=1323029273&amp;sr=8-1">Pasteur’s Quadrant</a>. The gist of the book is that the author, Donald Stokes, argues that the traditional (back to the Greeks) distinction between <em>basic research</em> and <em>applied research</em> is misguided.  Louis Pasteur didn’t make that distinction. He in fact was very much driven to solve real-world problems whilst also pursuing basic scientific understanding of the phenomena that he observed. The book does a great job of explaining the historical antecedents of the basic-applied distinction in the modern research-industrial complex, and I would highly recommend it to other researchers.</p>
<p>Stokes defines basic research as “experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundation of phenomena and observable facts” whereas applied research is concerned with “the elaboration and application of the known &#8230; to convert the possible into the actual, to demonstrate the feasibility of scientific or engineering development, to explore alternative routes and methods for achieving practical ends.” But he argues that this one-dimensional dichotomy is too simple and that it should be expanded to a two-dimensional typology with <em>considerations of use</em> on one axis, and <em>fundamental understanding</em> on the other. The quadrant of this typology that is concerned with fundamental understanding AND considerations of use is termed Pasteur’s quadrant, or alternately “use-inspired basic research.”</p>
<p>I reproduced a diagram from the book that illustrates the typology:</p>
<p><a href="http://www.nickdiakopoulos.com/wp-content/uploads/2011/12/Screen-Shot-2011-12-05-at-1.17.01-PM.png"><img class="aligncenter size-full wp-image-522" title="Screen Shot 2011-12-05 at 1.17.01 PM" src="http://www.nickdiakopoulos.com/wp-content/uploads/2011/12/Screen-Shot-2011-12-05-at-1.17.01-PM.png" alt="" width="523" height="393" /></a></p>
<p>Use-inspired basic research can advance both fundamental knowledge and technology, which is a good thing because new (or better) technology enables new scientific questions to be asked, and the answers to those scientific questions can often lead to better technology designs. The scanning electron microscope (SEM) is a good example.</p>
<p>I think it’s likely that many interdisciplinary fields thrive at this intersection of applied and basic research and Human Computer Interaction (HCI) is no exception. A lot of HCI research seeks to harness fundamental knowledge for the design of interactive systems but at the same time use new technologies and interfaces to ask fundamental questions about people and interfaces (though these two phases do not always occur simultaneously, nor must they). Basic findings can trickle back to core disciplines (e.g. psychology, sociology), and other findings from other core disciplines can inform the designs and the engineering that goes into building the next generation of interactive systems.</p>
<p>Take a simple new technology that has had a huge impact on computational social science research: Twitter. Twitter is <em>the</em> computational social science “scope” that lets researchers ask all kinds of interesting questions about social psychology. Refining such knowledge could lead to a newer social scope (Twitter 2.0?) that is even better. Another example is Digg.com, which a few years ago was a technology that helped advance our <a href="http://www.hpl.hp.com/research/idl/papers/novelty/novelty.pdf">understanding of information novelty and decay</a>.</p>
<p>Real people-scopes working in naturalistic settings are essential for basic research as well as for driving technology forward. Academia (not just in HCI, but in other interdisciplinary social sciences) needs to get more strategic about building people-scopes, basically <strong><em>platforms</em></strong> that enable new human-centered questions to be asked, at scale. Unfortunately academia is not traditionally good at platforms. Right now I can only think of a few academic projects that have done this successfully: <a href="http://www.movielens.org/login">MovieLens</a> at University of Minnesota, <a href="http://scratch.mit.edu/">Scratch</a> at MIT, and maybe IBM has also had some semi-successful ones.</p>
<p>There are likely a number of reasons why academia is not that good at platforms: (1) grad students may not be around long enough to grow and maintain the system, (2) the risk of failure is immense and too high for a pre-tenure faculty member to bear, (3) there are not enough sustained resources to maintain the systems, and (4) there are few to no marketing resources to support the acquisition of users. So there are incentive as well as resource issues here.</p>
<p>It may be that start-ups are simply a better source of new social platforms, since the market can quickly winnow out the unsuccessful ones, and the risk is externalized. But I think it may also warrant thinking about how funding agencies like the NSF might better support (e.g. through sustained resources and incentives) the construction of the next generation of people-scopes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/12/05/people-scopes-platforms-and-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What a News Consumer Wants</title>
		<link>http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/</link>
		<comments>http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 20:22:36 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[consumption]]></category>
		<category><![CDATA[user modeling]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=499</guid>
		<description><![CDATA[What exactly is it that drives people to consume news information? If we can answer that, I would argue, then we open a new space of possibility for creating new media products, and for optimizing existing ones. As Google’s first commandment states: “Focus on the user and all else will follow.” I adopt this point <a href="http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p><span>What exactly is it that drives people to consume news information? If we can answer that, I would argue, then we open a new space of possibility for creating new media products, and for optimizing existing ones. As Google’s first commandment </span><span><a  href="http://www.google.com/about/corporate/company/tenthings.html" target="_blank">states</a></span><span>: “Focus on the user and all else will follow.” I adopt this point of view here and will consider other perspectives (e.g. business or content producers) in future posts. In this post I really want to get at the </span><span style="font-style: italic;">underlying</span><span> needs, motivations, or habits that drive news consumption.</span></p>
<p><span>First, I think it’s important to draw a distinction between the “How” of news consumption and the “Why” of news consumption. </span><span style="font-style: italic;">How</span><span> news is consumed is largely attributable to the medium and technology of presentation (e.g. paper, radio, TV, internet). The context and form-factor of the technology also matters: the way that people consume news across different devices has been shown to </span><span><a href="http://www.niemanlab.org/2011/11/more-evidence-that-different-devices-fuel-news-consumption-at-different-times" target="_blank">vary over the course of the day</a></span><span>, and </span><span ><a href="http://www.journalism.org/node/27060" target="_blank">consumption of news on tablets</a></span><span> exhibits different patterns than consumption on other devices</span><span>. Certainly online social networks such as Twitter and Facebook have changed how people are exposed to and consume news. These are all technologies that facilitate news consumption, and bias it in their own ways as their unique affordances differentially enable, place constraints on, and influence behavior. The </span><span style="font-style: italic;">why</span><span> of news consumption is more fundamental though, since understanding the underlying needs and motivations for consuming news can drive new mechanisms for the </span><span style="font-style: italic;">how</span><span> of consumption. Going back to a user-centered design philosophy, ideally, the how amplifies the why, and the why informs the how. </span></p>
<p><span>Of course, </span><span style="font-style: italic;">why</span><span> people consume news or media is not invariant across people or contexts. So there is unlikely to be a single user model that describes all people at once. For starters, demographic factors such as age and gender have been linked to different patterns of consumption (e.g. younger people tend to consume news more for the sake of </span><span><a href="http://www.tandfonline.com/doi/abs/10.1207/s15506878jobem5002_2" target="_blank">escapism or passing time</a></span><span>, women </span><span><a  href="http://www.amazon.com/All-News-Thats-Fit-Sell/dp/0691123675" target="_blank">tend to be less interested in news on science and technology</a></span><span>). This necessitates thinking about information niches and recognizing that needs and motives may vary over time and context. For instance, social context (e.g. co-viewing) </span><span style=""><a  href="http://www.tandfonline.com/doi/abs/10.1080/08838151.2011.597466?journalCode=hbem20#preview" target="_blank">can influence people</a></span><span> to watch television news for longer. Individual differences also exist between people: personality traits such as extraversion and openness <a href="http://apr.sagepub.com/content/39/1/32.short" target="_blank">have been linked</a> to both interest in politics and public affairs and exposure to related news. Considering all of the moderating factors that influence why someone might consume news (i.e. demographics, context, personality, &#8230;), how could products be designed to appeal to any of these niches? What does a news product for introverts look like? How should it work differently? </span></p>
<p><span style="background-color: #ffffff;">Since the 1940s, communications and journalism scholars have been developing a theoretical framework that came to be known as Uses and Gratifications (U&amp;G), which attempts to explain why people seek out and consume media. </span><span style="background-color: #ffffff; font-weight: bold;">What are the gratifications that people receive from various kinds of media or types of content which help to satisfy their underlying social and psychological needs?</span><span style="background-color: #ffffff;"> Some of the earliest studies looked at why people consumed radio news, and some of the most recent look at internet technologies (e.g. I have </span><span><a  href="http://www.nickdiakopoulos.com/wp-content/uploads/2007/05/pr220-diakopoulos.pdf" target="_blank">looked at news commenting</a></span><span style="background-color: #ffffff;"> through this lens). U&amp;G theory attempts to explain how/why people select their media, as well as how much attention they allocate to it (e.g. casually attending to a report for entertainment or to pass time is different than goal-oriented information seeking). Some limitations of the theory are (1) that it assumes an </span><span style="font-style: italic; background-color: #ffffff;">active</span><span style="background-color: #ffffff;"> user that is making selection decisions (though sometimes these calcify into habits), and (2) that the typologies of needs and motivations are built on self-reported information, instead of observational data. This second limitation is perhaps quite important, as </span><span style=""><a href="http://www.mendeley.com/research/americans-really-want-know-tracking-behavior-news-readers-internet-1/" target="_blank">research has shown</a></span><span style="background-color: #ffffff;"> that people over-report their interest in international news by a factor of 3 as compared to their actual news browsing behavior. So, a quick caveat: ideally, user needs and motivations should be triangulated and validated based on </span><span style="font-style: italic; background-color: #ffffff;">observations</span><span style="background-color: #ffffff;"> of behavior in addition to self-reports. </span></p>
<p><span>U&amp;G proffers a typology of gratifications which help explain </span><span style="font-style: italic;">why</span><span> people consume news. Those listed below are taken from </span><span style=""><a  href="http://www.amazon.com/Communication-Theories-Perspectives-Processes-Contexts/dp/0072937947/ref=sr_1_1?ie=UTF8&amp;qid=1322853715&amp;sr=8-1" target="_blank">Miller</a></span><span> and </span><span style=""><a  href="http://www4.ncsu.edu/~amgutsch/Ruggiero.pdf" target="_blank">Ruggiero</a></span><span> and include: </span></p>
<ul>
<li><strong>Informational/Surveillance</strong>: finding out about relevant events and conditions in immediate surrounding, society, and the world; seeking advice on practical matters, or opinion and decision choices; satisfying curiosity and general interest; learning, self-education</li>
<li><strong>Personal Identity</strong>: finding reinforcement for personal values; finding models of behavior; identifying with media actors; gaining insight into one’s self</li>
<li><strong>Integration and Social Interaction</strong>: insight into circumstances of others including social empathy; identifying with others and gaining a sense of belonging; finding a basis for conversation and social interaction; enabling connection with family, friends, society</li>
<li><strong>Entertainment/Diversion</strong>: escaping, relaxing, cultural or aesthetic enjoyment, filling time, emotional release, sexual arousal</li>
</ul>
<p><span>You might ask yourself which news products address any of these motives better or worse? For instance, getting news on Facebook makes integration and social interaction motives very salient and easy for the user; watching Jon Stewart ties together entertainment and news effectively. </span></p>
<p><span>But still there is the underlying question of </span><span style="font-weight: bold;">what are the driving psychological needs</span><span> that lead to these categories of gratifications being sought through the media. For this we can turn to a theory of motivation developed over the last 40 years called </span><span><a href="http://www.selfdeterminationtheory.org/theory" target="_blank">Self-Determination Theory</a></span><span>; here’s a </span><span><a href="http://www.google.com/url?q=http%3A%2F%2Fwww.amazon.com%2FWhy-We-What-Understanding-Self-Motivation%2Fdp%2F0140255265%2F&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHtzkRN4yEp_FJmR4ipclQyy9eFrQ" target="_blank">nice book</a></span><span> on the subject. </span><span> The theory postulates that there are three main drivers of intrinsic motivation: (1) autonomy, (2) competence, and (3) social-relatedness. Autonomy is about providing people with choices &#8211; the more choices people have the more in control they feel. Competence is about helping people to see the relationship between their behavior and some desired outcome; feeling competent is about taking on a challenge and meeting it. How could news products better help people feel autonomous or competent? Those products would be hits. The last driver is social-relatedness which is about people feeling connected to other people; social networks are already doing a pretty good job of satisfying that underlying psychological need. </span></p>
<p><span>Beyond psychological needs though, there may even be a biological driver for news consumption. In 1996 Pamela Shoemaker argued in a </span><span><a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1460-2466.1996.tb01487.x/abstract?systemMessage=Wiley+Online+Library+will+be+disrupted+3+Dec+from+10-12+GMT+for+monthly+maintenance" target="_blank">Journal of Communication paper</a></span><span> that the human desire to surveil is evolutionarily adapted to help detect deviances or threats in the environment; humans that could surveil better were more likely to survive because they could avoid threats and thus reproduce. However, this hypothesis still needs to be tested empirically to see if people attend more to news that is more deviant (though it does seem plausible). What </span><span style="font-style: italic;">has</span><span> been tested empirically, via a </span><span><a href="http://arxiv.org/pdf/0802.0483" target="_blank">big-data analysis</a></span><span> by information scientists Fang Wu and Bernardo Huberman, is how human attention orients to novel information and how that attention naturally decays over time according to a mathematical function. Indeed, for the </span><span><a href="http://digg.com" target="_blank">digg.com</a></span><span> site they found that the half-life for an item was, on average, 69 minutes, which suggests a natural time-scale (though site dependent) at which human attention fades. </span></p>
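<p>To make the half-life figure concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that attention decays as a simple exponential with the 69-minute half-life quoted above; the name <code>remaining_attention</code> is hypothetical, and the decay function actually fitted by Wu and Huberman may well take a different form.</p>

```python
import math

HALF_LIFE_MINUTES = 69.0  # average half-life quoted above for digg.com items


def remaining_attention(minutes_elapsed, half_life=HALF_LIFE_MINUTES):
    """Fraction of initial attention left after minutes_elapsed.

    Assumes simple exponential decay (an illustrative simplification).
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    return 0.5 ** (minutes_elapsed / half_life)


# Equivalent decay constant per minute: lambda = ln(2) / half-life.
DECAY_RATE_PER_MINUTE = math.log(2) / HALF_LIFE_MINUTES

for minutes in (0, 69, 138, 240):
    share = remaining_attention(minutes)
    print(f"{minutes:>3} min: {share:.1%} of initial attention")
```

<p>Under that assumption an item keeps half of its initial attention after 69 minutes and a quarter after 138, which gives a rough sense of how quickly a story would need to be surfaced to catch readers while interest is still high.</p>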
<p><span>There is a wide palette of options for thinking about new ways of engaging people in news information: context, demographics, personality, uses &amp; gratifications, psychological needs, and biological drivers for novel information. There are likely many new (or existing) news products that can leverage this typology to personalize and make sure people are getting what they came for out of their media experience. And, to make the job even easier, </span><span><a  href="http://onlinelibrary.wiley.com/doi/10.1002/meet.14504701237/full" target="_blank">research has also shown</a></span><span> that people enjoy </span><span style="font-style: italic;">incidental</span><span> exposure to news information. So, even if you initially motivate people to engage the media in one way (e.g. social relatedness), they will likely still enjoy incidental exposure to other news information. </span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/12/02/what-a-news-consumer-wants-modeling-users/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Authoring Data-Driven Documents</title>
		<link>http://www.nickdiakopoulos.com/2011/11/23/authoring-data-driven-documents/</link>
		<comments>http://www.nickdiakopoulos.com/2011/11/23/authoring-data-driven-documents/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 17:25:11 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=471</guid>
		<description><![CDATA[Over the last few months I&#8217;ve been learning D3 (Data-Driven Documents), which is a really powerful data visualization library built for javascript. The InfoVis paper gets to the gritty details of how it supports data transformations, immediate evaluation of attributes, and a native SVG representation. These features can be more or less helpful depending on what <a href="http://www.nickdiakopoulos.com/2011/11/23/authoring-data-driven-documents/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Over the last few months I&#8217;ve been learning <a href="http://mbostock.github.com/d3/">D3</a> (Data-Driven Documents), which is a really powerful data visualization library built for JavaScript. The <a href="http://vis.stanford.edu/files/2011-D3-InfoVis.pdf">InfoVis paper</a> gets into the nitty-gritty details of how it supports data transformations, immediate evaluation of attributes, and a native SVG representation. These features can be more or less helpful depending on what kind of visualization you&#8217;re working on. For instance, transformations don&#8217;t really matter if you&#8217;re just building static graphs. But being able to inspect the SVG representation of your visualization (and edit it in the console) is really quite helpful and powerful.</p>
<p>But for all the power that D3 affords, is programming really how we should be (want to be?) authoring visualizations?</p>
<p>Here&#8217;s something that I recently made with D3. It&#8217;s a story about U.S. manufacturing productivity, employment, and automation told across a series of panels programmed using D3.</p>
<p><iframe src="http://nad.webfactional.com/eiu/te/index.html" width="610px" height="605" style="padding: 0px;border: 0px solid #eee "></iframe></p>
<p>Now, of course, the exploratory data analysis, storyboarding, and research needed to tell this story were time-consuming. But after all that, using D3 to render the graphs I wanted was substantially more tedious and time-consuming than I would have liked. I think this was because (1) my knowledge of SVG is not fantastic and I&#8217;m still learning that, but more importantly (2) D3 supports very low-level operations that make high-level activities for basic data storytelling time-consuming to implement. And yes, D3 does provide a number of helper modules and layouts, but these aren&#8217;t documented with clear examples using <em>concrete</em> data that would make it obvious how to easily utilize them. Having support for the library on <a href="http://jsfiddle.net/">jsFiddle</a>, together with some very <em>simple</em> examples, would go a long way towards helping noobs (like me!) ramp up.</p>
<p>But, really, where&#8217;s the Flash-like authoring tool of data visualization? Such a tool could be used to <em>interactively</em> manipulate a D3 visualization and, when you&#8217;re done, output HTML + CSS + D3 code to generate your graphs (including animation, transitions, etc.). The tool would also include basic graph templates that could be populated with your data and customized. Basic storytelling functions for highlighting important aspects or comparisons of the data (e.g. through animation, color, juxtaposition, etc.), or using text to annotate and explain the data could also be supported. D3 suffers from a bit of a usability problem right now, and powerful as it is, authoring stories with visualization doesn&#8217;t need to be, nor should it be, bound up in programming.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/11/23/authoring-data-driven-documents/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Modeling Computing and Journalism (Part I)</title>
		<link>http://www.nickdiakopoulos.com/2011/11/16/modeling-computing-and-journalism-part-i/</link>
		<comments>http://www.nickdiakopoulos.com/2011/11/16/modeling-computing-and-journalism-part-i/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 21:00:48 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=442</guid>
		<description><![CDATA[Recently I’ve been thinking more about modeling the intersection of computing and journalism, and in particular thinking about ways that aspects of computing might impact or allow for innovation in journalism. It struck me that I needed a more precise definition of computing and its purview (I’ll come back to the journalism side of the <a href="http://www.nickdiakopoulos.com/2011/11/16/modeling-computing-and-journalism-part-i/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I’ve been thinking more about modeling the intersection of computing and journalism, and in particular thinking about ways that aspects of computing might impact or allow for innovation in journalism. It struck me that I needed a more precise definition of computing and its purview (I’ll come back to the journalism side of the equation in a later post). What, exactly, is computing? I’ll try to answer that in this post&#8230;</p>
<p>Definitions of computing and computer science <a href="http://www.cs.mtu.edu/~john/whatiscs.html">abound</a> <a href="http://www.cs.bu.edu/AboutCS/WhatIsCS.pdf">online</a>, but the most canonical comes perhaps from <a href="http://en.wikipedia.org/wiki/Peter_J._Denning">Peter Denning</a>, an elder in the field of Computer Science. In a CACM <a href="http://www.cs.gmu.edu/cne/pjd/PUBS/CACMcols/cacmApr05.pdf">article</a> from 2005 he writes, “<strong>Computing is the systematic study of algorithmic processes that describe and transform information</strong>”. Two key words there: “algorithmic” and “information”. Computing is about information, about describing and transforming it, but also about acquiring, representing, structuring, storing, accessing, managing, processing, manipulating, communicating, and presenting it. And computing is about algorithms: their theory, feasibility, analysis, structure, expression, and implementation. The fundamental question of computing concerns what information processes can be effectively automated.</p>
<p>In modern CS there is a huge body of knowledge that stems from this core notion of computing. For instance, the <a href="http://www.acm.org/education/curricula/ComputerScience2008.pdf">Computer Science Curriculum</a> published in 2008 defines 14 different areas of knowledge (see list below). The Georgia Tech <a href="http://www.cc.gatech.edu/">College of Computing</a> delineates some of these areas as belonging to <em>core computer science</em>, and others as belonging to <em>interactive computing</em>. Roughly, core computer science deals with the conceptual (i.e. mathematical), and operational (i.e. nuts and bolts of how a modern computer works) aspects of computing. Interactive computing on the other hand mostly deals with information input, modeling, and output. There are aspects of professional practice, engineering, and design that apply in both.</p>
<p><strong>Core Computer Science</strong></p>
<ul style="margin-top: -15px">
<li>Discrete Structures, Programming Fundamentals, Software Engineering, Algorithms and Complexity, Architecture and Organization, Operating Systems, Programming Languages, Net Centric Computing, Information Management, Computational Science</li>
</ul>
<p><strong>Interactive Computing</strong></p>
<ul style="margin-top: -15px">
<li>Human Computer Interaction, Graphics and Visual Computing, Intelligent Systems</li>
</ul>
<p>In terms of modeling the intersection of computing and journalism, it’s the interactive side of things that’s most interesting. How information is moved around inside a computer is less important for journalists to understand than the interactive capabilities of information input, modeling, and output afforded by computing. That is, how does computing interface with the rest of the world? Of course, many of the capabilities of computers studied in interactive computing rest on solid foundations of core computer science (e.g. you couldn’t get much done without an operating system to schedule processes and manage data). The core areas with particular relevance to interactive computing are networking and communications (net centric computing), information management, and to a lesser extent computational science. Below I list more detailed sub-areas for each of the interactive computing areas and these related core areas.</p>
<ul>
<li><strong>Human Computer Interaction</strong> (HCI) includes sub-areas such as interaction design, user-centered design, multimedia systems, collaboration, online communities, human-robot interaction, natural interaction, tangible interaction, mobile and ubiquitous computing, wearable computing, and information visualization.</li>
<li><strong>Graphics and Visual Computing</strong> includes sub-areas such as geometric modeling, materials modeling and simulation, rendering, image synthesis, non-photorealistic rendering, volumetric rendering, animation, motion capture, scientific visualization, virtual environments, computer vision, image processing and editing, game engines, and computational photography.</li>
<li><strong>Intelligent Systems</strong> includes sub-areas such as general AI including search and planning, cognitive science, knowledge-based reasoning, agents, autonomous robotics, computational perception, machine learning, natural language processing and understanding, machine translation, speech recognition, and activity recognition.</li>
<li><strong>Net Centric Computing</strong> includes aspects of networking, web architecture, compression, and mobile computing.</li>
<li><strong>Information Management</strong> includes aspects of database systems, information architecture, query languages, distributed data, data mining, information storage and retrieval, hypermedia, and multimedia databases.</li>
<li><strong>Computational Science</strong> includes aspects of modeling, simulation, optimization, and parallel computing often oriented towards big data sets.</li>
</ul>
<p>So what can we do with this detailed typology of interactive computing technology?</p>
<p>In a 2004 <a href="http://cs.usc.edu/~rosenblo/Pubs/Framework.pdf">CACM article</a>, Paul Rosenbloom developed a notation for describing how computing interacts with other fields. In his typology, he articulated ways in which computing could <em>implement</em>, <em>interact with</em>, and <em>embed with</em> other disciplines, namely the physical, life, and social sciences. These different relationships between fields lead to different kinds of ideas for technology (e.g. an embedding relationship of computing in the life sciences would be the notion of cyborgs; an interaction between computing and the physical sciences would be robotics). In this spirit, later on in this blog series I’ll look more specifically at how some of the computing technologies articulated above can map to aspects of journalism practice, with an eye toward innovation in journalism by applying computing in new or under-explored ways.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/11/16/modeling-computing-and-journalism-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google+ and Commenting</title>
		<link>http://www.nickdiakopoulos.com/2011/10/14/google-and-commenting/</link>
		<comments>http://www.nickdiakopoulos.com/2011/10/14/google-and-commenting/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 19:28:05 +0000</pubDate>
		<dc:creator>Nick</dc:creator>
				<category><![CDATA[commenting]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://www.nickdiakopoulos.com/?p=430</guid>
		<description><![CDATA[Twitter isn’t built for conversation; the interface just doesn’t support it &#8211; snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ is (to some extent), and I’ve recently begun pondering what this means for the future of commenting online, especially around news media where I’ve done research. One difference I <a href="http://www.nickdiakopoulos.com/2011/10/14/google-and-commenting/"> read more <span class="meta-nav">&#187;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Twitter isn’t built for conversation; the interface just doesn’t support it &#8211; snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ is (to some extent), and I’ve recently begun pondering what this means for the future of commenting online, especially around news media where <a href="http://www.nickdiakopoulos.com/news-commenting-systems/">I’ve done research.</a></p>
<p>One difference I see moving forward is a transition from commenting dictated primarily by the content to a world where online comment threads are heavily influenced by <strong>both the content and the person sharing the content</strong>. How does the same content posted by different people lead to different conversations evolving around that content? If a conservative blogger and a liberal blogger share the same link to a news article on Google+, how do their circles react differently to that article and how does that affect the conversation? And if we aggregate these conversations back together somehow, does this lead to a more interesting, engaging, or insightful experience for users? How can online publishers harness this as an opportunity?</p>
<p>On Google+ people post in all kinds of different ways: status updates, entire blog posts (e.g. <a href="https://plus.google.com/u/0/112374836634096795698/posts">Guy Kawasaki</a>), or just news and rich media links. Here I’ll focus on commenting around links to media since that’s most relevant to online publishers. The diversion of commenting attention and activity to platforms other than the publisher’s (e.g. Google+ or Facebook) could be seen as a threat, but it could also be an opportunity. Publishers can use platform <a href="https://developers.google.com/+/api/">APIs</a> to harvest this activity and aggregate it back to their own sites. The opportunity is in harnessing the activity on social platforms to provide new, stickier interfaces for keeping users engaged on the publisher’s content page. For the designers out there: what are novel ways of organizing and presenting online conversations that are enabled by new features on social networks like Google+?</p>
<p>One idea, for opinion-oriented news articles, would be for a publisher to aggregate threads of Google+ comments from two or more well-known bloggers who have attracted a lot of commentary. These could be selected by the users, editors, or, eventually, by algorithms that identify “interesting” Google+ threads. These algorithms could, for instance, identify threads with people from diverse backgrounds, from particular geographies, with particular relevant occupations, or with a pro/con stance. These threads would help tell the story from different conversational perspectives anchored around the particular people sharing the original content. The threads could be embedded directly on the publisher’s site as a way to keep users there longer, perhaps getting them more interested in the debates that are happening out on social media.</p>
<p>Another idea would be to organize commentary by network distance, providing a view of the commentary that is personalized to an individual. Let’s say I share a link on Google+ and 20 people comment (generous to myself, I know), but then 2 of those people re-share it to their circles and 50 more people comment, and from those 50, 5 of them share it and 100 people comment, and so on. Each step of re-sharing moves the conversation a bit further away from me (the originator) in the network. Other people in the network can also share the link (as originators), and it will diffuse outward from them too. All of this activity can be aggregated and presented to me based on how many hops away in the network a comment falls. I may be interested in comments that are 1 hop away, and maybe 2 (friends of friends), but maybe not further than that. Network distance from the user could end up being a powerful social filter.</p>
<p>There’s lots to try here, and while I think it’s great that new platforms for commenting are emerging, it’s time for publishers to think about how to tap into these to improve the user experience, either by enabling new ways of seeing discussion or new ways to learn from and socialize with others around content.</p>
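<p>To make the network distance idea a bit more concrete, here is a rough sketch in Python of how comments might be grouped by hops. It does not use the Google+ API; the <code>reshares</code> and <code>comments</code> structures, the names in them, and the <code>comments_by_hops</code> function are all made up for illustration. I’m assuming comments on my own share count as 1 hop away, and each re-share adds one more hop.</p>

```python
from collections import defaultdict, deque


def comments_by_hops(reshares, comments, origin, max_hops=2):
    """Group comments by how many hops they sit from the origin share.

    reshares: list of (parent_share, child_share) pairs (hypothetical data)
    comments: dict mapping a share id to the list of comments on it
    origin:   the share the viewer posted
    """
    # Build a parent -> children lookup of who re-shared from whom.
    children = defaultdict(list)
    for parent, child in reshares:
        children[parent].append(child)

    grouped = defaultdict(list)
    # Assumption: comments on the viewer's own share are 1 hop away.
    queue = deque([(origin, 1)])
    seen = {origin}
    while queue:
        share, hops = queue.popleft()
        if hops > max_hops:
            continue  # too far from the viewer to be of interest
        grouped[hops].extend(comments.get(share, []))
        for child in children[share]:
            if child not in seen:
                seen.add(child)
                queue.append((child, hops + 1))
    return dict(grouped)


# Hypothetical example: I share a link, two people re-share it,
# and one of their followers re-shares it again.
reshares = [("nick", "alice"), ("nick", "bob"), ("alice", "carol")]
comments = {
    "nick": ["Great link", "Not sure I agree"],
    "alice": ["Via Nick, interesting"],
    "bob": ["Sharing with my circle"],
    "carol": ["Three hops out"],
}
view = comments_by_hops(reshares, comments, "nick", max_hops=2)
```

<p>With <code>max_hops=2</code> the view keeps the comments on my own share and on the two direct re-shares, and drops the comment on the third-level re-share. A real version would also need to merge threads from multiple originators, which this sketch ignores.</p>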
]]></content:encoded>
			<wfw:commentRss>http://www.nickdiakopoulos.com/2011/10/14/google-and-commenting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

