
Story-focused reading in online news

I worked for several years with Janette Lehmann as part of her PhD looking at user engagement across sites. This blog post describes our work on inter-site engagement in the context of online news reading. The work was done in collaboration with Carlos Castillo and Ricardo Baeza-Yates [1].

Online news reading is a common activity of Internet users. Users may have different motivations to visit a news site: some users want to remain informed about a specific news story they are following, such as an important sports tournament or a contentious political issue; others visit news portals to read about breaking news and remain informed about current events in general.

While reading news, users sometimes become interested in a particular news item they just read, and want to find out more about it. They may want to obtain various angles on the story, for example, to overcome media bias or to confirm the veracity of what they are reading. News sites often provide information on different aspects or components of a story they are covering. They also link to other articles published by them, and sometimes even to articles published by other news sites or sources. An example of an article having links to others is shown on the right, at the bottom of the article.

We performed a large-scale analysis of this type of online news consumption: when users focus on a story while reading news. We referred to this as story-focused reading. Our study is based on a large sample of user interaction data on 65 popular news sites publishing articles in English. We show that:

  • Story-focused reading exists, and is not a trivial phenomenon. This type of news reading differs from a user’s daily consumption of news.
  • Story-focused reading is not simply a consequence of the fact that some stories are more popular, have more articles written about them, or are covered by more news providers.
  • Story-focused reading is driven by the interest of the users. Even users who can be considered casual news readers (they read only a few articles) engage in story-focused reading.
  • When engaged in story-focused reading, users spend more time reading and visit more news providers. Only when users read many articles about a story does the reading time decrease. Our analysis suggests that this could be due to news articles containing mostly the same information.

The strategies that readers employ to find articles related to a story depend on how deep they want to delve into the story. If users are only reading a few articles about a story, they tend to gather all information from a single news site. In the case of deeper story-focused reading, where users are interested in the story details or specific information, they often use search and social media sites to reach news sites. Furthermore, many users come from less popular news sites and blogs. This makes sense: blogs frequently link their posts to mainstream news sites when discussing an event, and users follow these links, likely to gather further information or to confirm the veracity of what they are reading.

Strategies that keep users engaged with a news site include recommending news articles to users or integrating interactive features (e.g., multimedia content, social features, hyperlinks) into news articles. News providers can promote story-focused reading and increase engagement by linking their articles to other related content. Links to related content embedded in news articles, and hyperlinks in general, are an important factor influencing the stages of engagement (period of engagement, disengagement, and re-engagement). Having internal links within the article text promotes story-focused reading and as a result keeps users engaged:

It leads to a longer period of engagement (reading sessions are longer) and earlier re-engagement (shorter absence time). Providing links to external content does not have a negative effect on user engagement; the period of engagement remains the same (reading sessions are the same), and the re-engagement begins even sooner (shorter absence time).

This does not mean that news providers should just provide links; they should provide the right ones in terms of quantity and quality. The type, the position, and the number of links play an important role. Users tend to click on links that bring them to other news articles within the same news site, or to articles published by lesser-known sources, probably because they provide new or less mainstream information. However, it is not a good strategy to offer too many such links, as this is likely to confuse or annoy users. Too many inline links can have a detrimental effect on users’ reading experience. Finally, when engaged in story-focused reading, users tend to click on links that are close to the end of the article text.

The linking strategies of news providers affect the way users engage with their news sites, which by itself is not new. However, our results are in contradiction with the linking strategy that aims at keeping users as long as possible on a site by linking to other content on the site.

Instead, it can be beneficial (long-term) to entice users to leave the site (e.g., by offering them interesting content on other sites) in a way that users will want to return to it.

News providers could adapt their sites when they identify a user engaging in story-focused reading in various ways:

  • Such information could be integrated in the personalised news recommender of the news site. Story-related articles in the news feed could be highlighted or content frames containing information and links related to the story could be presented on the front page.
  • It might also be beneficial to provide and link to topic pages containing the latest updates, background information, blog entries, eyewitness reports, etc. related to the story.

Story-focused reading also brings new opportunities for news providers to drive traffic to their sites by collecting the most interesting articles and statements around a story, i.e., becoming a news story curator, and publishing them via social media channels or email newsletters.

How engaged are Wikipedia users?

Recently, we were asked: “How engaged are Wikipedia users?” To answer this question, we visited Alexa, a web analytics site, and learned that Wikipedia is one of the most visited sites in the world (ranked 6th), that users spend on average around 4:35 minutes per day on Wikipedia, and that many visits to Wikipedia come from search engines (43%). We also found studies about readers’ preferences, Wikipedia growth, and Wikipedia editors. There is, however, little about how users engage with Wikipedia, in particular about those not contributing content to Wikipedia.

Can we do more?

Besides reading and editing articles, users perform many other actions: they look at the revision history, search for specific content, browse through Wikipedia categories, visit portal sites to learn about specific topics, or visit the community portal. Although discussing an article is a sign of a highly engaged user, performing several actions within the same visit to Wikipedia is also a sign of a highly engaged user. It is this latter type of engagement we looked into.

Action networks

We collected 13 months (September 2011 to September 2012) of browsing data from an anonymized sample of approximately 1.3M users. We identified 48 actions such as reading an article, editing, opening an account, donating, visiting a special page. We then built a weighted action network: nodes are the actions and two nodes are connected by an edge if the two corresponding actions were performed during the same visit to Wikipedia. Each node has a weight representing the number of users performing the corresponding action (the node traffic). Each edge has a weight representing the number of users that performed the two corresponding actions (the traffic between the two nodes).

Engagement over time

We use the following metrics to measure engagement on Wikipedia based on actions:

  • TotalNodeTraffic: total number of actions (sum of all node weights)
  • TotalEdgeTraffic: total number of pairwise actions (sum of all edge weights)
  • TotalTrafficRecirculation: actual network traffic with respect to maximum possible traffic (TotalEdgeTraffic/TotalNodeTraffic).
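To make these definitions concrete, here is a minimal Python sketch of how the weighted action network and the three metrics could be computed. The input format (one `(user_id, actions)` pair per visit) and the function names are assumptions made for illustration; they are not the code used in the study.

```python
from collections import defaultdict
from itertools import combinations


def build_action_network(visits):
    """Build node and edge weights from visits.

    visits: iterable of (user_id, actions) pairs, one pair per visit
    (a hypothetical input format chosen for this sketch).
    """
    node_users = defaultdict(set)  # action -> users who performed it
    edge_users = defaultdict(set)  # (action, action) -> users who did both in one visit
    for user_id, actions in visits:
        distinct = sorted(set(actions))
        for action in distinct:
            node_users[action].add(user_id)
        # Two actions are connected if they occur in the same visit.
        for pair in combinations(distinct, 2):
            edge_users[pair].add(user_id)
    node_weights = {a: len(users) for a, users in node_users.items()}
    edge_weights = {e: len(users) for e, users in edge_users.items()}
    return node_weights, edge_weights


def engagement_metrics(node_weights, edge_weights):
    """Compute the three metrics defined in the list above."""
    total_node = sum(node_weights.values())
    total_edge = sum(edge_weights.values())
    recirculation = total_edge / total_node if total_node else 0.0
    return {
        "TotalNodeTraffic": total_node,
        "TotalEdgeTraffic": total_edge,
        "TotalTrafficRecirculation": recirculation,
    }
```

With this sketch, a month in which most visits consist of a single action yields few edges and therefore a low TotalTrafficRecirculation, even if TotalNodeTraffic is high.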

We calculated these metrics for the 13 months under consideration and plotted their variations over time. An increase in TotalNodeTraffic means that more users visited Wikipedia. An increase in TotalTrafficRecirculation means that more users performed at least two actions while on Wikipedia, our chosen indicator of high engagement in Wikipedia. We observe that TotalNodeTraffic first increased and then became more or less stable. By contrast, TotalTrafficRecirculation mostly decreased, but we see a small peak in January 2012.

Two important events happened in our 13-month period. During the donation campaign (November to December 2011) more users visited Wikipedia (higher TotalNodeTraffic value). We speculate that many users became interested in Wikipedia during the campaign. However, because TotalTrafficRecirculation actually decreased for the same period, although more users visited Wikipedia, they did not perform two (or more) actions while visiting Wikipedia; they did not become more engaged with Wikipedia. By contrast, during the SOPA/PIPA protest (January 2012), we see a peak in both TotalNodeTraffic and TotalTrafficRecirculation. More users visited Wikipedia and many users became more engaged with Wikipedia; they also read articles, gathered information about the protest, and donated money while visiting Wikipedia.

We detected different engagement patterns on weekdays and weekends. Whereas more users visited Wikipedia during weekdays (high value of TotalNodeTraffic), users that visited Wikipedia during the weekend were more engaged (high value of TotalTrafficRecirculation). On weekends, users performed more actions during their visits.

People behave differently on weekdays compared to weekends. The same happens with Wikipedia.

Did the donation campaign make Wikipedia more engaging?

So which actions became more frequent as a result of the donation campaign? As expected, we observed a significant traffic increase on the “donate” node during the two months; many users made a donation. In addition, the traffic from some nodes to other nodes increased, but only slightly. Additional actions were performed; for instance, more users created a user account and visited community-related pages, all within the same session. However, overall, users mostly performed individual actions since TotalTrafficRecirculation decreased during that time period.

So the campaign was successful in terms of donation, but less in terms of making Wikipedia more engaging.

This is a write-up of the presentation given by Janette Lehmann at TNETS Satellite, ECCS, Barcelona, September 2013.

Hey Twitter crowd … What else is there?

Original and shorter post at

Twitter is a powerful tool for journalists at multiple stages of the news production process: to detect newsworthy events, interpret them, or verify their factual veracity. In 2011, a poll of 478 journalists from 15 countries found that 47% of them used Twitter as a source of information. Journalists and news editors also use Twitter to contextualize and enrich their articles by examining the responses to them, including comments and opinions as well as pointers to other related news. This is possible because some users on Twitter devote a substantial amount of time and effort to news curation: carefully selecting and filtering news stories highly relevant to specific audiences.

We developed an automatic method that groups together all the users who tweet a particular news item, and later detects new content posted by them that is related to the original news item. We call each group a transient news crowd. The beauty of this, in addition to being fully automatic, is that there is no need to pre-define topics and the crowd becomes available immediately, allowing journalists to cover news beats incorporating the shifts of interest of their audiences.

Figure 1. Detection of follow-up stories related to a published article using the crowd of users that tweeted the article.

Transient news crowds
In our experiments, we define the crowd of a news article as the set of users that tweeted the article within the first 6 hours after it was published. We followed the users in each crowd for one week, recording every public tweet they posted during this period. We used Twitter data around news stories published by two prominent international news portals: BBC News and Al Jazeera English.
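The crowd definition above can be sketched in a few lines of Python. This is only an illustration: the tuple-based input format and the function names are hypothetical, and the sketch assumes that the one-week follow-up period is counted from the article's publication time, which the text does not specify.

```python
from datetime import datetime, timedelta

CROWD_WINDOW = timedelta(hours=6)  # from the text: first 6 hours after publication
FOLLOW_PERIOD = timedelta(days=7)  # from the text: users are followed for one week


def news_crowd(published_at, shares, window=CROWD_WINDOW):
    """Return the set of users who tweeted the article within the window.

    shares: iterable of (user_id, tweeted_at) pairs for tweets linking
    to the article (hypothetical input format).
    """
    deadline = published_at + window
    return {user for user, ts in shares if published_at <= ts <= deadline}


def follow_up_tweets(crowd, published_at, timeline, period=FOLLOW_PERIOD):
    """Keep tweets posted by crowd members during the follow-up period.

    timeline: iterable of (user_id, tweeted_at, text) triples.
    Assumption: the period starts at the article's publication time.
    """
    end = published_at + period
    return [
        (user, ts, text)
        for user, ts, text in timeline
        if user in crowd and published_at <= ts <= end
    ]
```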

What did we find?

  • After they tweet a news article, people’s subsequent tweets are correlated with that article for a brief period of time.
  • The correlation is weak but significant, in terms of reflecting the similarity between the articles that originate a crowd.
  • While the majority of crowds simply disperse over time, parts of some crowds come together again around new newsworthy events.

What can we do with the crowd?
Given a news crowd and a time slice, we want to find the articles in a given time slice that are related to the article that created the crowd. To accomplish this, we used a machine learning approach, which we trained on data annotated using crowd sourcing. We experimented with three types of features:

  • frequency-based: how often an article is posted by the crowd compared to other articles?
  • text-based: how similar are the two articles, considering the tweets that posted them?
  • user-based: is the crowd focussed on the topic of the article? does it contain influential members?

We find that the features largely complement each other. Some features are always valuable, while others contribute only in some cases. The most important features include the similarity to the original story, as well as measures of how unique the association of the candidate article and its contributing users is to the specific story’s crowd.
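To give a feel for what a frequency-based and a text-based feature might look like, here is a small Python sketch. The two functions are simplified stand-ins chosen for illustration (a share of posted URLs and a bag-of-words cosine similarity); they are assumptions, not the actual features defined in our papers.

```python
import math
from collections import Counter


def frequency_feature(candidate_url, slice_urls):
    """Share of the URLs posted by the crowd in a time slice that point
    to the candidate article (illustrative frequency-based feature)."""
    if not slice_urls:
        return 0.0
    return slice_urls.count(candidate_url) / len(slice_urls)


def _bag_of_words(texts):
    # Naive whitespace tokenisation; a real system would do more.
    return Counter(word for text in texts for word in text.lower().split())


def text_similarity(original_tweets, candidate_tweets):
    """Cosine similarity between the words of the tweets that posted the
    original article and those that posted the candidate article
    (illustrative text-based feature)."""
    bag_a = _bag_of_words(original_tweets)
    bag_b = _bag_of_words(candidate_tweets)
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Feature values of this kind would then be fed, together with user-based features, into a classifier trained on the crowdsourced annotations.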

Crowd summarisation
We illustrate the outcome of our automatic method with the article Central African rebels advance on capital, posted on Al Jazeera on 28 December, 2012.

Figure 2. Word clouds generated for the crowd on the article “Central African rebels advance on capital”, by considering the terms appearing in stories filtered by our system (top) and on the top stories by frequency (bottom).

Without using our method (in the figure, bottom), we obtain frequently-posted articles which are weakly related or not related at all to the original news article. Using our method (in the figure, top), we observe several follow-up articles to the original one. Four days after the news article was published, several members of the crowd tweeted an article about the fact that the rebels were considering a coalition offer. Seven days after the news article was published, crowd members posted that rebels had stopped advancing towards Bangui, the capital of the Central African Republic.

News crowds allow journalists to automatically track the development of stories. For more details you can check our papers:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), 8-10 July 2013, Cambridge, Massachusetts.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.

Janette Lehmann, Universitat Pompeu Fabra
Carlos Castillo, Qatar Computing Research Institute
Mounia Lalmas, Yahoo! Labs Barcelona
Ethan Zuckerman, MIT Center for Civic Media

Today I am giving a keynote at the 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), which is held at MediaCityUK, Salford.

I have started to think about which questions to ask when evaluating user engagement. In the talk, I discuss these questions through five studies we did. Also included are questions asked when

  • evaluating serendipitous experience in the context of entity-driven search using social media such as Wikipedia and Yahoo! Answers.
  • evaluating the news reading experience when links to related articles are automatically generated using “light weight” understanding techniques.

The slides are available on Slideshare.

Relevant published papers include:

I will write about these two works in later posts.

Your news audience in Twitter – Discover your curators

@SNOW/WWW, 2013, by Janette Lehmann, Carlos Castillo, Mounia Lalmas, and Ethan Zuckerman

Original post at WWW Workshop on Social News on the Web


Information between journalists and their audience in social media flows both ways. A recent study from the Oriella PR Network showed that over 54% of journalists use online social media platforms (Twitter, Facebook, and others) and 44% use blogs to find new story angles or verify the stories they work on. There are now platforms, such as Storyful, that provide user lists of high quality, developed by journalists for journalists.

Our starting point is the community of engaged readers of a news story — those who share a particular news article through Twitter. We refer to them as a transient news crowd, in analogy with the group of passers-by that gathers around unexpected events such as accidents in a busy street. The question is whether the users of such a crowd can provide further valuable information related to the story of the news article.

Many members of news crowds are far from being passive in their consumption of news content. They are news curators, because they filter, select, and disseminate carefully selected news stories about a topic.

A famous example of this type of news curator is Andy Carvin (@acarvin), who mostly collects news related to the Arab world. He became famous for his curatorial work during the Arab Spring, when he aggregated reports in real time and posted up to thousands of tweets per day. We expect that among the users who share an article on Twitter there are other curators like Andy Carvin who may follow up with further tweets.

We have observed that basically all news stories have a set of Twitter users who may be potential news curators for the topics of the story. For instance, among the people who tweeted the Al Jazeera article “Syria allows UN to step up food aid” (posted January 2013), there are at least two news curators: @RevolutionSyria and @KenanFreeSyria.

However, not everybody can be considered a news curator. Some people tweet one piece of news that was interesting to them and move on. Others tweet a wide range of news stories. Curators are individuals who carefully follow a story or related set of stories. In our SNOW 2013 work “Finding News Curators in Twitter”, we defined a set of features for each user and demonstrated that they can be used to automatically find relevant curators among the audience. The features describe the visibility, tweeting activity and the topical focus of a user. We collected news articles published in early 2013 by BBC World Service (BBC) and Al Jazeera English (AJE). Then, we followed the users that posted a specific article and analyzed their tweeting behavior. Our results reveal that 13% of the users from AJE and 1.8% of the users from BBC are possible news curators.

The roles of curators in a crowd…

Some news curators are more focused than others. For instance, @KeriJSmith, a self-defined “internet marketer” tweets about various interesting news on a large variety of topics, while others are more selective. A famous example is Chan’ad Bahraini (@chanadbh) who tweets about Bahrain. Whether a user is topic-focused or not can be determined, for instance, by the number of different sections of a news web site s/he is tweeting about. If these sections differ (e.g. from finance to celebrities), we can assume that the user is less focused.
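As a rough illustration of this section-counting idea, the Python sketch below counts the distinct site sections among the article URLs a user has tweeted. It assumes, purely for the example, that the section is the first path segment of the URL; real news sites use different URL layouts, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment of a URL, taken here as the section.

    Assumption for this sketch: URLs look like
    https://news.example.com/<section>/<article>.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else ""


def distinct_sections(tweeted_urls):
    """Number of different sections among the URLs a user tweeted."""
    sections = {section_of(url) for url in tweeted_urls}
    sections.discard("")  # ignore URLs without a recognisable section
    return len(sections)
```

A user whose tweeted articles all fall in one or two sections would be treated as more topic-focused than one whose articles span many sections; where to draw that line is a design choice this sketch leaves open.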

Considering only the topical focus of a user is not sufficient when identifying story curators. A significant number of Twitter accounts operate as news aggregators, collecting news articles automatically (e.g. from RSS feeds) and posting the corresponding headlines and URLs to Twitter (45% in Al Jazeera English, 65% in BBC world). They can be identified easily, as most or all of their tweets contain URLs and they do not tend to interact much via messages with other users.
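The two signals just mentioned (almost every tweet contains a URL, and there is little interaction with other users) can be turned into a simple heuristic. The Python sketch below is an illustration only: the thresholds are arbitrary assumptions, and counting @-mentions is used here as a crude proxy for interaction.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def looks_like_aggregator(tweets, min_url_share=0.9, max_mention_share=0.1):
    """Heuristic check on a list of tweet texts from one account.

    The default thresholds are hypothetical values picked for this
    sketch, not values taken from our study.
    """
    if not tweets:
        return False
    total = len(tweets)
    with_url = sum(1 for text in tweets if URL_PATTERN.search(text))
    with_mention = sum(1 for text in tweets if MENTION_PATTERN.search(text))
    return (with_url / total >= min_url_share
            and with_mention / total <= max_mention_share)
```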

The majority of news aggregators post many tweets per day related to breaking news and top stories, e.g. @BreakingNews. Only a minority are focused on more specific topics, and thus constitute topic-focused aggregators. The user @RevolutionSyria, for instance, automatically distributes news articles about the civil war in Syria. Whether automatically generated tweets provide interesting content on a topic is questionable. Nonetheless, some news aggregators seem to be considered valuable by users, as in the case of @RevolutionSyria, which has around 100,000 followers at the time of this writing.

In short, our current research deals with identifying crowds, curators, and aggregators. For more details you can check our articles and presentations:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. To be presented at the WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. To be presented at the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), 8-10 July 2013, Cambridge, Massachusetts.

Photo credits: Hobvias Sudoneighm (Creative Commons BY).