YouTube Finally Does Deep Linking

YouTube announced yesterday that you can finally form URLs that navigate to specific parts of videos. Also, if you’re writing a comment, you can just write a time in the form MM:SS, and it will be detected and turned into a link to that part of the video.

This is an interesting and, I would argue, non-intuitive way to specify links in videos. At the same time, it keeps the overall interface simple unless you’re an advanced user who knows the format for creating these links. It’s the expert-interface way of doing things rather than the GUI approach. When someone hits “add comment,” why not augment the UI to let the user specify more visually where that anchor should be placed, perhaps even specifying an interval?
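To make the MM:SS convention concrete, here is a minimal sketch of how timestamps in a comment might be detected and turned into links. It is not YouTube's implementation: the regular expression, the function names, and the `#t=<seconds>s` fragment are all assumptions made for illustration.

```python
import re

# Matches M:SS or MM:SS (seconds 00-59) that is not part of a longer
# digit/colon run such as 12:34:56.
TIMESTAMP = re.compile(r"(?<![\d:])(\d{1,2}):([0-5]\d)(?![\d:])")


def to_seconds(minutes: str, seconds: str) -> int:
    """Convert the two captured groups into an offset in seconds."""
    return int(minutes) * 60 + int(seconds)


def link_timestamps(comment: str, video_url: str) -> str:
    """Wrap every MM:SS occurrence in an HTML link into the video.

    The '#t=<seconds>s' fragment is a hypothetical deep-link format.
    """
    def replace(match: re.Match) -> str:
        offset = to_seconds(match.group(1), match.group(2))
        return f'<a href="{video_url}#t={offset}s">{match.group(0)}</a>'

    return TIMESTAMP.sub(replace, comment)


if __name__ == "__main__":
    print(link_timestamps("Great point at 1:30!", "https://example.com/watch?v=abc"))
```

A GUI alternative would collect the same offset (or a start and end pair) from a timeline widget instead of parsing it out of free text.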

Videolyzer Alpha Online

An alpha version of Videolyzer is now online! Videolyzer is a tool designed for journalists and bloggers to collaboratively assess the information quality of a video, including its transcript. Information quality involves things like credibility, validity, and comprehensiveness, among other things. Videolyzer was designed to support the analysis, collection, and sharing of criticisms of online videos and is initially geared toward politics. To try it out with some of the recent presidential debate content go to

NYT Interactive Presidential Debates

The New York Times recently published an interactive application for exploring the video and transcripts from the presidential and vice-presidential debates. Actual debate content aside, the application is quite a usable foray into the realm of multimedia (video + transcript) interfaces. Seen here is a screenshot of the application from the second presidential debate.

Overall the interface has a good “flow.” At the top is the ability to search for keywords and see where they show up in the transcript. You can compare each word’s usage by Obama, McCain, and the moderator. Below this are two timelines. The problem is that while the timelines are all intuitive, they are in the wrong hierarchical order: the topmost timeline is the most “zoomed out,” but the next one down is the most “zoomed in.” Really they need to be reordered so that the middle timeline is at the bottom, giving a more intuitive layout from least detailed to most detailed. What IS really nice about all of the timelines, and what really helps navigation, is the textual information that pops up when hovering. There’s also some segmentation showing the parts of the video where each debater is speaking, and I found it really helpful to be able to click any of these segments and jump the video to that point. There is some navigational integration with the transcript, which is interesting too: you can click on a block of the transcript to navigate to that section of the video. But we’re still dealing with blocks of text rather than individual words being linked into the video.

The other fantastic aspect of this tool is that it provides some level of integrated fact-checking. The fact-checking is produced professionally by the Times and is presented aligned with the different question segments. It’s difficult to follow, though, because it’s in a tab that competes with the transcript itself, so you can’t see the context or the anchor that the fact-checking refers to. It would be a lot more helpful, for comparison’s sake, to be able to see the transcript and the fact-checking at the same time. The other problem with the presentation of the fact-checking is that it’s really dense and hard to read through. Again, better contextualization with the video and the transcript would really help here.

Video Transcription on Google

Yesterday Google announced that they were applying some of their speech transcription research to political videos on YouTube. The philosophy, pushing research into the market to see its value and how it’s used, is great. The implementation, however, is rather shallow. While searching for keywords within video may be valuable for some users, several other features (such as closed captioning) have been left out of the interface. Also, the feature has not been integrated into YouTube itself and only functions within the Google gadget, which makes it less likely to be seen and used by many people.

Speech recognition is a hard problem. In a recent test I did with the Sphinx 3 engine from CMU, I was lucky to get a 60% correct transcription for a YouTube video, and this was cleanly spoken audio. Studies at the University of Toronto by Cosmin Munteanu suggest that a word error rate (WER) of 25% or lower is needed for the benefits of a transcribed video to be realized. And there’s a LONG way to go until automatically transcribed video achieves that WER on arbitrary internet content. The problems with automatic transcription are manifold but include (1) noisy audio, (2) different speakers with varying accents, (3) poor support for named entities, and (4) high errors in audio-to-transcript alignment.
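For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer’s output, divided by the number of reference words. The following is a minimal sketch of that standard calculation; the function name and the simple lowercase, whitespace-split tokenization are my own assumptions, not what Sphinx or the Toronto studies use.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Note that because insertions count as errors, WER can exceed 100%, so “percent correct” and WER are related but not simple complements.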

It’s hard to evaluate the Google transcription effort, but I will mention that in several keyword searches I have done, the markers on the timeline are off by several seconds from where the words are actually spoken in the video. This speaks to difficulty (4) above. To my knowledge there is no research on how this type of misalignment error affects the interactive experience, so it will be interesting to see whether Google users find it annoying.

I’ve been developing a new technology which addresses the video transcription problem. Check out my post on it here.

Online Video Consumption Habits

I found another really interesting survey from the Pew Internet and American Life Project about online video consumption habits. The survey looks at some of the trends surrounding the consumption of online video in the US, including the types of content and the behaviors that arise around online video. Some of the takeaway points from the report:

  • News is the most popular genre of online video (except with younger viewers aged 18-29); 10% of internet users say they watched news video yesterday (~16 million Americans)
    • News watching correlates with higher income and education
  • 19% of video viewers either rate video or post comments on video
    • If we intersect this with the prior number, then about 1.9% of internet users (~3 million people) rate or post comments on news video each day. This is only an estimate since we can’t say for sure how much these two groups actually overlap; e.g. people who watch news may be more likely to rate or post comments
    • Younger people are twice as likely as older adults to post comments on video or rate video
  • 15% of internet users (~23 million) have sought political video content online; 2% (~3 million) do this on a typical day
    • Political content is popular among viewers who also rate or comment on video [This content is appealing to people who also want to have their opinion heard?]
  • Professional videos are preferred to amateur productions
  • Few people will pay to access online video
  • YouTube dominates online video, with about a third of the online audience for video; news websites account for only 6% of traffic to online video.
  • 3% of internet users (~ 5 million) watch educational video on a typical day
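The back-of-the-envelope intersection above (10% of internet users watching news video, 19% of video viewers rating or commenting) can be written out as a short sketch. It assumes the two behaviors are independent, which the survey figures do not establish, and the variable and function names are mine.

```python
# Figures quoted in the list above.
NEWS_WATCHER_SHARE = 0.10      # share of internet users who watched news video yesterday
NEWS_WATCHERS_MILLIONS = 16.0  # the same group expressed in millions of people
RATER_SHARE = 0.19             # share of video viewers who rate or comment


def estimated_joint_share() -> float:
    """Share of internet users doing both, ASSUMING independence."""
    return NEWS_WATCHER_SHARE * RATER_SHARE


def estimated_joint_millions() -> float:
    """Scale the joint share by the internet population implied by the
    10% = ~16 million figure."""
    implied_population = NEWS_WATCHERS_MILLIONS / NEWS_WATCHER_SHARE
    return estimated_joint_share() * implied_population


if __name__ == "__main__":
    print(f"{estimated_joint_share():.1%} of internet users")
    print(f"about {estimated_joint_millions():.0f} million people")
```

If news watchers are in fact more likely than average to rate or comment, the true figure would be higher than this estimate.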

Misinformation on YouTube

Recently I read an article that I first found on Slashdot and later tracked down to the original in JAMA (The Journal of the American Medical Association) about a content analysis study of YouTube videos concerning vaccination. Researchers at the University of Toronto took a sample of 153 videos on YouTube by searching for videos containing the keywords vaccination or immunization. They then coded the videos based on whether they conveyed a positive or negative message about vaccination. They also looked at the specific scientific claims being made and coded them as either substantiated or unsubstantiated, using the Canadian Immunization Guide as a reference. 48% of the videos were found to have a positive message, 32% were negative, and 20% were ambiguous. Troublingly, the negative videos had a higher mean user rating (i.e. 1-5 stars) than the positive videos. 14.4% of the sample (22 videos) conveyed messages that were not substantiated with reference to the Canadian Immunization Guide.

The way this was reported in some news outlets (including the press release) was itself misleading. It was reported that 45% “of those videos” contradicted the Canadian reference guide. However, it wasn’t 45% of the total sample but rather 45% of only the negative videos that contradicted the reference guide. So the percentage of the entire sample (22 / 153) is 14.4%, considerably less alarming than what was reported. Misinformation is still an issue on YouTube, but the magnitude of the reported effect wasn’t stated clearly; yet another reason to go back to the original source.
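The difference between the two denominators is easy to check. In the sketch below, the count of negative videos is not quoted directly above; it is derived by rounding 32% of 153, which is an assumption on my part.

```python
TOTAL_VIDEOS = 153
UNSUBSTANTIATED = 22   # videos contradicting the reference guide
NEGATIVE_SHARE = 0.32  # share of the sample coded as negative

# Assumption: recover the negative-video count by rounding 32% of 153.
negative_videos = round(TOTAL_VIDEOS * NEGATIVE_SHARE)

# Same numerator, two different denominators.
share_of_negative = UNSUBSTANTIATED / negative_videos
share_of_total = UNSUBSTANTIATED / TOTAL_VIDEOS

if __name__ == "__main__":
    print(f"negative videos: {negative_videos}")
    print(f"share of negative videos: {share_of_negative:.1%}")
    print(f"share of whole sample:    {share_of_total:.1%}")
```

The headline 45% figure only appears when the denominator is the negative videos alone.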

Timed Comments in Video

There’s a lot of interest from new video startups in making video into a first-class web 2.0 citizen by bringing tagging, commenting, and responses to videos at a sub-video level of granularity. While the old skool video sites like YouTube, Revver, Metacafe, Magnify, etc. let you add tags and comments to a video as a whole, the new breed of video services such as Viddler, YouTube Streams, and The Click facilitate timed tags and comments within a video. There has also been some recent academic interest in how people can interact with one another through commenting and chatting around video. This CHI 2007 paper from CMU is a good first step in understanding chatting behavior around videos.

Today I took a closer look at one of the newer attempts at highly granular commentable video, Viddler. Here’s a screenshot of their interface, which despite some great features also suffers from some usability issues. All in all the timeline is pretty good, though: you can clearly see where people have left comments, and an easy-to-understand “+” graphic lets the user open a menu and add a comment, tag, or video response. The downside is that there is no time extent associated with the comments or tags; they are simply added as point anchors. At the same time, this does simplify the interface by not requiring the user to select an in and out point. Ideally, I think the point anchor should be extendable to cover a time period if the user so chooses.

One of the problems the interface suffers from is that comments sometimes obscure the video. Although they can be expanded and contracted as necessary, this just seems tedious. Furthermore, to see all of the comments that have been added to the video (as a list), almost half of the video is obscured. Perhaps the goal is to make the comment track and the video mutually exclusive, since they distract from one another anyway? What is nice is the voting mechanism for comments, which determines which comment shows up while someone is watching if there are several comments or responses at that point in the video.
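As a rough illustration of the two ideas discussed here (point anchors that can optionally be extended to an interval, and votes deciding which comment is displayed), here is a small sketch. It is not Viddler’s data model; the class, the field names, and the two-second display window for point anchors are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedComment:
    text: str
    start: float                 # anchor position in seconds
    end: Optional[float] = None  # None means a plain point anchor
    votes: int = 0

    def covers(self, t: float, window: float = 2.0) -> bool:
        """Is this comment relevant at playback time t?

        Point anchors are treated as active within `window` seconds of
        their position (an arbitrary choice); interval comments are
        active for their whole extent.
        """
        if self.end is None:
            return abs(t - self.start) <= window
        return self.start <= t <= self.end


def comment_to_show(comments: List[TimedComment], t: float) -> Optional[TimedComment]:
    """Pick the highest-voted comment active at time t, or None."""
    active = [c for c in comments if c.covers(t)]
    return max(active, key=lambda c: c.votes, default=None)


if __name__ == "__main__":
    comments = [
        TimedComment("Nice shot", start=10.0, votes=3),
        TimedComment("This whole answer dodges the question", start=8.0, end=30.0, votes=7),
        TimedComment("Intro music?", start=1.0, votes=1),
    ]
    chosen = comment_to_show(comments, 10.5)
    print(chosen.text if chosen else "(no comment)")
```

Keeping `end` optional means the simple point-anchor interaction stays the default, with the interval available only to users who want it.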

