Today I am giving a keynote at the 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), which is held at MediaCityUK, Salford.

I have now started to think about what questions to ask when evaluating user engagement. In the talk, I discuss these questions through five studies we did. Also included are the questions asked when

  • evaluating serendipitous experience in the context of entity-driven search using social media such as Wikipedia and Yahoo! Answers.
  • evaluating the news reading experience when links to related articles are automatically generated using “lightweight” understanding techniques.

The slides are available on Slideshare.

I will write about these two works, and the relevant published papers, in later posts.

What can absence time tell us about user engagement?

Two widely employed engagement metrics are click-through rate and dwell time. They are particularly used for services where user engagement is about clicking, for example in search, where users presumably click on relevant results, or about spending time on a site, for example consuming content on a news portal.

In search, both have been used as indicators of relevance, and have been exploited to infer user satisfaction with search results and to improve ranking functions. However, how to properly interpret the relations between these metrics, retrieval quality and long-term user engagement with the search application is not straightforward. Also, relying solely on clicks and time spent can lead to contradictory, if not erroneous, conclusions. Indeed, with the current trend of displaying rich information on web pages, for instance restaurant phone numbers or weather data in search results, users do not need to click to access the information, and the time spent on a website is shorter.

Measure: Absence time
The absence time measures the time it takes a user to decide to return to a site to accomplish a new task. Taking a news site as an example, a good experience associated with quality articles might motivate the user to come back to that news site on a regular basis. On the other hand, if the user is disappointed, for example because the articles were not interesting or the site was confusing, he or she may return less often and even switch to an alternative news provider. Another example is a visit to a community question-answering website. If the questions of a user are well and promptly answered, the odds are that he or she will be enticed to raise new questions and return to the site soon.

Our assumption is that if users find a site interesting, engaging or useful, they will return to it sooner.

This assumption has the advantage of being simple, intuitive and applicable to a large number of settings.
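
To make the measure concrete, absence time can be read directly off a visit log as the gap between consecutive visits by the same user. Below is a minimal sketch in Python with pandas; the log, its columns and the data are made up for illustration, and consecutive visit start times are used as an approximation.

```python
import pandas as pd

# Hypothetical visit log: one row per visit, with the user and the visit start time.
log = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2"],
    "visit_start": pd.to_datetime([
        "2013-05-01 09:00", "2013-05-02 08:30", "2013-05-05 10:00",
        "2013-05-01 12:00", "2013-05-01 20:00",
    ]),
})

log = log.sort_values(["user", "visit_start"])
# Absence time: how long the user stayed away before this visit
# (approximated here as the gap between consecutive visit starts).
log["absence"] = log.groupby("user")["visit_start"].diff()
print(log)
```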

Case study: Yahoo! Answers Japan
We used a popular community question-answering website hosted by Yahoo! Japan, where users can ask questions about any topic of interest to them. Other users may respond by writing an answer. These answers are recorded and can be searched by any user through a standard search interface. We studied the actions of approximately one million users during two weeks. A user action happens every time a user interacts with Yahoo! Answers: every time he or she issues a query or clicks on a link, be it an answer, an ad or a navigation button. We compared the behaviour of users exposed to six functions used to rank past answers, both in terms of traditional metrics and of absence time.

Methodology: Survival analysis
We use Survival Analysis to study absence time. Survival Analysis has been used in applications concerned with the death of biological organisms, each receiving different treatments. An example is throat cancer treatment, where patients are administered one of several drugs and the practitioner is interested in how effective the different treatments are. The analogy with our analysis of absence time is unfortunate but nevertheless useful. We treat a user’s exposure to one of the ranking functions as a “treatment” and his or her survival time as the absence time. In other words, a Yahoo! Answers user dies each time he or she visits the site … but hopefully resuscitates instantly as soon as his or her visit ends.

Survival analysis makes use of a hazard rate, which reflects the probability that a user dies at a given time. It can be very loosely understood as the speed of death of a population of patients at that time. Returning to our example, if the hazard rate of throat cancer patients administered, say, drug A is higher than the hazard rate of patients under drug B treatment, then drug B patients have a higher probability of surviving until that time. A higher hazard rate implies a lower survival rate.
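
As a concrete sketch of how hazard rates over absence times could be estimated, the snippet below fits a Cox proportional hazards model with the lifelines Python library. The data, column names and the single “ranking_B” covariate are invented for illustration; the actual study used millions of user actions.

```python
# pip install lifelines pandas
import pandas as pd
from lifelines import CoxPHFitter

# Tiny illustrative sample: one row per visit, with the absence time that followed it.
# "returned" marks whether the user came back within the observation window
# (0 = censored), and "ranking_B" is the "treatment": 1 if the user was exposed
# to ranking function B, 0 for the baseline function A.
visits = pd.DataFrame({
    "absence_hours": [5.0, 20.0, 12.0, 72.0, 36.0, 48.0, 8.0, 60.0],
    "returned":      [1,   1,    1,    0,    1,    1,    1,   1],
    "ranking_B":     [1,   0,    1,    0,    1,    0,    1,   0],
})

cph = CoxPHFitter()
cph.fit(visits, duration_col="absence_hours", event_col="returned")
cph.print_summary()
# A hazard ratio above 1 for ranking_B means users exposed to function B "die"
# (i.e. return to the site) sooner: shorter absence times, higher engagement.
```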

We use hazard rates to compare the different ranking functions for Yahoo! Answers: a higher hazard rate translates into a short absence time and a prompter return to Yahoo! Answers, which is a sign of higher engagement. What did we find?

A better ranking does not imply more engaged users
Ranking algorithms are compared with a number of measures; a widely used one is DCG, which rewards ranking algorithms that retrieve relevant results at high ranks. The higher the DCG, the better the ranking algorithm. We saw that, for the six ranking functions we compared, a higher DCG did not always translate into a higher hazard rate, or in other words, into users returning to Yahoo! Answers sooner.
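
As a reminder of how DCG is computed (using the common 2^rel − 1 gain and a logarithmic rank discount), here is a minimal sketch; the relevance grades are made up:

```python
import math

def dcg(relevances, k=None):
    """Discounted Cumulative Gain: graded relevance discounted by the log of the rank."""
    rels = relevances[:k] if k is not None else relevances
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(rels))

# Relevance grades (0-3) of the top five results returned by two ranking functions.
print(dcg([3, 2, 0, 1, 0]))  # function A
print(dcg([2, 3, 1, 0, 0]))  # function B
```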

Returning relevant results is important, but is not the only criterion to keep users engaged with the search application.

More clicks is not always good, but no click is bad
A common assumption is that a higher number of clicks reflects higher user satisfaction with the search results. We observed that, up to five clicks, each new click is associated with a higher hazard rate, but the increases from the third click onwards are small; a fourth or fifth click has a very similar hazard rate. From the sixth click, the hazard rate decreases slowly.

This suggests that on average, clicks after the fifth one reflect a poorer user experience; users cannot find the information they are looking for.

We also observed that the hazard rate, even with five clicks or more, is always higher than with no click at all; when users search on Yahoo! Answers, no click means a bad user experience.

A click at rank 3 is better than a click at rank 1
Compared with a click at rank 1, the hazard rate is larger for clicks at ranks 2, 3 and 4, with the maximum at rank 3. For lower ranks, the trend is toward a decreasing hazard rate. Only a click at rank 10 was found to be clearly less valuable than a click at rank 1. It seems that users unhappy with results at earlier ranks simply click on the last displayed result, for no apparent reason apart from it being the last one on the search result page.

Clicking lower in the ranking suggests a more careful choice from the user, while clicking at the bottom is a sign that the overall ranking is of low quality.

Clicking fast on a result is a good sign
We found that the shorter the time between the display of a query’s search results and the first click, the higher the hazard rate.

Users who find their answers quickly return sooner to the search application.

More views is worse than more queries
When users are returned search results, they may click on a result, go back to the search result page, and then click on another result. Each display of search results generates a view. At any time, the user may submit a new query. Both returning to the search result page several times and a higher number of query reformulations are signs that the user is not satisfied with the current search results. Which one is worse? We could see that having more views than queries was associated, on average, with a lower hazard rate, meaning a longer absence time.

This suggests that returning to the same search result page is a worse user experience  than reformulating the query.

Without the absence time, it would have been harder to observe this, unless we had explicitly asked users to tell us what was going on.

A small warning
A user might decide to return sooner or later to a website due to reasons unrelated with the previous visits (being on holidays for example). It is important to have a large sample of interaction data to detect coherent signals and to take systematic effects into account.

Take away message

Absence time, as a measure of user engagement, is easy to interpret and less ambiguous than many of the commonly employed metrics. Use it and get new insights with it.

This work was done in collaboration with Georges Dupret. More details about the study can be found in  Absence time and user engagement: Evaluating Ranking Functions, which was published at the 6th ACM International Conference on Web Search and Data Mining in Rome, 2013.

Photo credits: tanfelisa and kaniths (Creative Commons BY).

We need a taxonomy of web user engagement

There are lots and lots of metrics that can be used to assess how users engage with a website. Ones widely used by the web-analytics community include click-through rate, number of page views, time spent on a website, how often users return to a site, and number of users.

[Word cloud of user engagement metrics]

Although these metrics cannot explicitly explain why users engage with a site, they can act as proxies for online user engagement: two million users accessing a website daily is a strong indication of high engagement with that site.

Metrics, metrics and metrics

There are three main types of web-analytics metrics:

  • Popularity metrics measure how much a website is used (for example, by counting the total number of users on the site in a week). The higher the number, the more popular the website.
  • Activity metrics measure how a website is used when visited, for example, the average number of clicks per visit across all users.
  • Loyalty metrics are concerned with how often users return to a website. An example is the return rate, calculated as the average number of times users visited a website within a month.

Loyalty and popularity metrics can be calculated on a daily, weekly or monthly basis. Activity metrics are calculated at visit level.
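
As an illustration of the three families of metrics, here is a minimal sketch that computes one-month popularity, activity and loyalty figures from a toy visit log with pandas; the log, its columns and the specific aggregations are assumptions made for the example.

```python
import pandas as pd

# Hypothetical visit log: one row per visit (user, site, date, page views, dwell time).
visits = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "site": ["news", "news", "news", "news", "weather", "weather"],
    "date": pd.to_datetime(["2013-05-01", "2013-05-03", "2013-05-01",
                            "2013-05-02", "2013-05-02", "2013-05-20"]),
    "page_views": [4, 2, 1, 6, 1, 1],
    "dwell_secs": [180, 60, 30, 400, 20, 25],
})

per_site = visits.groupby("site")

# Popularity: how much the site is used over the month.
popularity = per_site.agg(users=("user", "nunique"),
                          visits=("user", "size"),
                          clicks=("page_views", "sum"))

# Activity: how the site is used within a visit.
activity = per_site.agg(views_per_visit=("page_views", "mean"),
                        dwell_per_visit=("dwell_secs", "mean"))

# Loyalty: how often individual users come back during the month (averaged over users).
loyalty = (visits.groupby(["site", "user"])
                 .agg(days_active=("date", "nunique"),
                      visits_made=("user", "size"),
                      total_time=("dwell_secs", "sum"))
                 .groupby("site").mean())

print(popularity.join(activity).join(loyalty))
```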

So one would think that a highly engaging website is one with a high number of visits (very popular), where users spend lots of time and click often (lots of activity), and return frequently (high loyalty). But not all websites, whether popular or not, have both active and loyal users.

This does not mean that user engagement on such websites is lower; it is simply different.

What did we do?

We collected one-month browsing data from an anonymized sample of approximately 2M users. For 80 websites, encompassing a diverse set of services such as news, weather, movies, mail, we calculated the average values of the following eight metrics:

  • Popularity metrics: number of distinct users, number of visits, and number of clicks (also called page views) for that month.
  • Activity metrics: average number of page views per visit and average time per visit (also called dwell time).
  • Loyalty metrics: number of days a user visited the site, number of times a user visited the site, and average time a user spent on the site, for that month.

Websites differ widely in terms of their engagement

Some websites are very popular (for example, news portals) whereas others are visited by small groups of users (many specific-interest websites were this way). Visit activity also depends on the website. For instance, search sites tend to have a much shorter dwell time than sites related to entertainment (where people play games). Loyalty per website differed as well. Media (news, magazines) and communication (messenger, mail) sites have many users returning to them much more regularly than sites containing information of temporary interest (for example, an e-commerce site selling cars). Loyalty is also influenced by the frequency with which new content is published; indeed, some sites produce new content only once per week.

High popularity did not entail high activity. Many sites have many users spending little time on them. A good example is a search site, where users come, submit a query, get the result and, if satisfied, leave the site.

This results in a low dwell time even though user expectations were entirely met.

The same holds for a Q&A site or a weather site. What matters for such sites is their popularity.

Any patterns? Yes … 

To identify engagement patterns, we grouped the 80 sites by applying clustering to the eight engagement metrics (a minimal sketch of one way such a clustering could be done appears after the list below). We also extracted, for each group, which metrics, and whether their values were high or low, were specific to that group. This process generated five groups with clear engagement patterns, and a sixth group with none:

  • Sites where the main factor was their high popularity (for example, as measured by a high number of users). Examples of sites following this pattern include media sites providing daily news and search sites. Users interact with these sites in various ways; what the sites have in common is that they are used by many users.
  • Sites with low popularity, for instance having a low number of visits. Many interest-specific sites followed this pattern. Those sites center around niche topics or services, which do not attract a large number of users.
  • Sites with a high number of clicks per visit. This pattern was followed by e-commerce and configuration (accessed by users to update their profiles for example) sites, where the main activity is to click.
  • Sites with high dwell time, low clicks per visit and low loyalty. This pattern was followed by domain-specific media sites of a periodic nature (new content published on a weekly basis), which are therefore not often accessed. However, when accessed, users spend more time consuming their content. The design of such sites (compared to mainstream media sites) leads to this type of engagement, since new content was typically published on their homepage; thus users are not enticed to reach additional content (if any).
  • Sites with high loyalty, small dwell time and few clicks. This pattern was followed by navigational sites (the front page of an Internet company), whose role is to direct users to interesting content or services on other sites (of that same company); what matters is that users come to them regularly.
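
A minimal sketch of how such a grouping could be done is below, using k-means on the standardized metrics; this is only one plausible choice of clustering method, and the data are random placeholders rather than the metrics used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder data: 80 sites x 8 engagement metrics (random, for illustration only).
rng = np.random.default_rng(0)
metrics = rng.lognormal(size=(80, 8))

# Standardize (after a log transform, since engagement metrics are heavily skewed)
# so that no single metric dominates the distance computation.
X = StandardScaler().fit_transform(np.log(metrics))
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Characterize each group by which metrics sit clearly above or below the average.
for c in range(6):
    profile = X[labels == c].mean(axis=0)
    print(c, np.round(profile, 2))
```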

This simple study (80 sites and 8 metrics) identified several patterns of user engagement.

However, sites of the same type do not necessarily follow the same engagement pattern.

For instance, not all mainstream media sites followed the first pattern (high popularity). It is likely that, among others, the structure of the site has an effect.

… So what now?

We must study many more sites and include many more engagement metrics. This is the only way to build, if we want to (and we should), a taxonomy of web user engagement. With such a taxonomy, we will know the best metrics to measure engagement on a given site.

Counting clicks may be totally useless for some sites. But if not, and the number of clicks is, for instance, way too low, knowing which engagement pattern a site follows helps in making the appropriate changes to the site.

This work was done in collaboration with Janette Lehmann, Elad Yom-Tov and Georges Dupret. More details about the study can be found in  Models of User Engagement, a paper presented at the 20th conference on User Modeling, Adaptation, and Personalization (UMAP), 2012.

Photo credits: Denis Vrublevski and matt hutchinson (Creative Commons BY).

Together with Heather O’Brien and Elad Yom-Tov, we will be giving a tutorial at the International World-Wide Web Conference (WWW), 13-17 May 2013, Rio de Janeiro.

The slides are now available on Slideshare.
You can also access the two-slides per page format (PDF) here: MeasuringUserEngagement or one-slide per page format (PDF) here.
The references can be found here: References_Tutorial.

We will continue updating the slides, correcting any errors, and so on. Feedback is very welcome.

Measuring User Engagement

Together with Heather O’Brien and Elad Yom-Tov, we will be giving a tutorial at the International World-Wide Web Conference (WWW), 13-17 May 2013, Rio de Janeiro. Here is a description of our tutorial. We will add slides and a bibliography soon.

Introduction and Motivations
In the online world, user engagement refers to the quality of the user experience that emphasizes the positive aspects of the interaction with a web application and, in particular, the phenomena associated with wanting to use that application longer and frequently. User engagement is a key concept in the design of web applications, motivated by the observation that successful applications are not just used, but are engaged with. Users invest time, attention, and emotion in their use of technology, and it must satisfy both their pragmatic and hedonic needs and expectations. Measurement is key for evaluating the success of information technologies, and is particularly critical to any web applications, from media to e-commerce sites, as it informs our understanding of user needs and expectations, system design and functionality. For instance, news portals have become a very popular destination for web users who read news online. As there is great potential for online news consumption but also serious competition among news portals, online news providers strive to develop effective and efficient strategies to engage users longer in their sites. Measuring how users engage with a news portal can inform the portal if there are areas that need to be enhanced, if current optimization techniques are still effective, if the published material triggers user behavior that causes engagement with the portal, etc. Understanding the above is dependent upon the ability to measure user engagement. The focus of this tutorial is how user engagement is currently being measured and future considerations for its measurement.

User engagement is a multifaceted, complex phenomenon; this gives rise to a number of potential approaches for its measurement, both objective and subjective. Common ways of measuring user engagement include: self-reporting, e.g., questionnaires; observational methods, such as facial expression analysis, speech analysis, desktop actions, etc.; neuro-physiological signal processing methods, e.g., respiratory and cardiovascular accelerations and decelerations, muscle spasms, etc.; and web analytics, online behavior metrics that assess users’ depth of engagement with a site. These methods represent various tradeoffs between the scale of data analyzed and the depth of understanding. For instance, surveys are small-scale but deep, whereas clicks can be collected on a large-scale but provide shallow understanding.

The tutorial will start with a definition of user engagement and discuss the challenges associated with its measurement. The tutorial will then have two main parts. Part I will describe self-report measures, physiological measures, and web analytics. We aim to provide a full understanding of each type of approach, including methodological aspects, concrete findings, and advantages and disadvantages. Part II will concentrate on advanced aspects of user engagement measurement, and is comprised of three sub-sections. We will look at (1) how current metrics may or may not apply to the mobile environment; (2) the relationship between user engagement on-site with other sites in terms of user traffic or stylistics; and finally (3) the integration of various approaches for measuring engagement as a means of providing a deeper and more coherent understanding of engagement success. The tutorial will end with some conclusions, open research problems, and suggestions for future research and development.

Part I – Foundations

Approaches based on Self-Report Measures
Questionnaires are one of the most common ways of gathering information about the user experience. Although self-report measures are subjective in nature, they have several advantages, including being convenient and easy to administer, and capturing users’ perceptions of an experience at a particular point in time. The fundamental problem is that questionnaires are seldom subjected to rigorous evaluation. The User Engagement Scale (UES), a self-report measure developed by O’Brien and colleagues in 2010 will be used to discuss issues of reliability and validity with self-report measures. The UES consists of six underlying dimensions: Aesthetic Appeal, Perceived Usability, Focused Attention, Felt Involvement, Novelty, and Endurability (i.e., users’ overall evaluation). It has been used in online web surveys and user studies to assess engagement with e-commerce, wiki search, multimedia presentations, academic reading environments, and online news. Data analysis has focused on statistically analyzing the reliability and component structure of the UES, and on examining the relationship between the UES and other self-report measures, performance, and physiological measures. These findings will be shared, and the benefits and drawbacks of the UES for measuring engagement will be explored.

Approaches based on physiological measures
Physiological data can be captured by a broad range of sensors related to different cognitive states. Examples of sensors are eye trackers (e.g., difficulty, attention, fatigue, mental activity, strong emotion), mouse pressure (stress, certainty of response), biosensors (e.g., temperature for negative affect and relaxation, electrodermal activity for arousal, blood flow for stress and emotion intensity), oximeters (e.g., pulse), and cameras (e.g., face tracking for general emotion detection). Such sensors have several advantages over questionnaires or online behaviour, since they are more directly connected to the emotional state of the user, are more objective (measuring involuntary body responses), and are measured continuously. They are, however, more invasive and, apart from mouse tracking, cannot be used on a large scale. They can nonetheless be highly indicative of immersive states through their links with attention, affect, the perception of aesthetics and novelty – all of which are important characteristics of user engagement. A particular focus in this tutorial will be the usage of mouse pressure, so-called mouse tracking, because of its potential for large-scale measurement. The use of eye-tracking to measure engagement will also be discussed, because of its relationship to mouse movement.

Approaches based on web analytics
The most common way that engagement is measured, especially in production websites, is through various proxy measures of user engagement. Standard metrics include the number of page views, number of unique users, dwell time, bounce rate, and click-through rate. In addition, with the explosion of user-generated content, the number of comments and social network “like” buttons are also becoming widely used measures of web service performance. In this part we will review these measures, and discuss what they measure vis-à-vis user engagement, and consequently their advantages and drawbacks. We will provide extensive details on the appropriateness of these metrics to various websites. Finally, we will discuss recent work on combining these measures to form single measures of user engagement.
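
For concreteness, two of these standard metrics, bounce rate and click-through rate, can be computed as follows; the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data: page views per visit, plus impressions and clicks for one link.
visits = pd.DataFrame({"page_views": [1, 3, 1, 7, 2, 1]})
impressions, clicks = 1200, 54

bounce_rate = (visits["page_views"] == 1).mean()    # fraction of single-page visits
click_through_rate = clicks / impressions           # clicks per impression
print(f"bounce rate: {bounce_rate:.0%}, CTR: {click_through_rate:.1%}")
```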

Part II – Advanced Aspects

Measuring User Engagement in Mobile Information Searching
Mobile use is a growing area of interest with respect to user engagement. Mobile devices are utilized in dynamic and shifting contexts that form the fabric of everyday life, their portability and functionality make them more suited to some tasks than others, and they are often used in the presence of other people. All of these considerations – context, task, and social situatedness – have implications for user engagement. The Engagement Lab at the University of British Columbia is exploring user engagement with mobile devices in a series of studies. In this section of the tutorial, we will explore the ways in which mobile engagement may differ from engagement with other devices and what the implications of this are for measurement. We will describe both lab and field-based work that we are undertaking, and the measures that we are selecting to capture mobile engagement.

Networked User Engagement
Nowadays, many providers operate multiple content sites, which are very different from each other. For example, Yahoo! operates sites on finance, sports, celebrities, and shopping. Due to the varied content served by these sites, it is difficult to treat them as a single entity. For this reason, they are usually studied and optimized separately. However, user engagement should be examined not only within individual sites, but also across sites, that is, across the entire content provider network. Such engagement was recently defined by Lalmas et al. as “Networked User Engagement”. In this part of the tutorial we will present recent findings on the optimization of networked user engagement. We will demonstrate the effect of the network on user engagement, and show how changes in elements of websites can increase networked user engagement.

Combining different approaches
Little work has been done to integrate these various measures. It is important to combine insights from big data with deep analyses of human behavior in the lab, or through crowd-sourcing experiments, to obtain a coherent understanding of engagement success. However, a number of initiatives are emerging that aim to combine techniques from web analytics with existing work on user engagement from information science, multimodal human-computer interaction and cognitive psychology. We will discuss work emerging in these directions, in particular studies relating mouse tracking to qualitative measurements of user engagement, and the challenges in designing experiments, and in interpreting and generalizing results.

Your news audience in Twitter – Discover your curators

@SNOW/WWW, 2013, by Janette Lehmann, Carlos Castillo, Mounia Lalmas, and Ethan Zuckerman

Original post at WWW Workshop on Social News on the Web


Information flows both ways between journalists and their audience in social media. A recent study from the Oriella PR Network showed that over 54% of journalists use online social media platforms (Twitter, Facebook, and others) and 44% use blogs to find new story angles or to verify the stories they work on. There are now platforms, such as Storyful, that provide high-quality user lists, developed by journalists for journalists.

Our starting point is the community of engaged readers of a news story — those who share a particular news article through Twitter. We refer to them as a transient news crowd, in analogy with the group of passers-by that gathers around unexpected events such as accidents in a busy street. The question is whether the users of such a crowd can provide further valuable information related to the story of the news article.

Many members of news crowds are far from being passive in their consumption of news content. They are news curators, because they filter, select, and disseminate carefully selected news stories about a topic.

A famous example of this type of news curator is Andy Carvin (@acarvin), who mostly collects news related to the Arab world. He became famous for his curatorial work during the Arab Spring, when he aggregated reports in real time and posted up to thousands of tweets per day. We expect that among the users who share an article on Twitter there are also other curators like Andy Carvin who may follow up with further tweets.

We have observed that basically all news stories have a set of Twitter users who may be potential news curators for the topics of the story. For instance, among the people who tweeted Al Jazeera’s article “Syria allows UN to step up food aid” (posted January 2013), there are at least two news curators: @RevolutionSyria and @KenanFreeSyria.

However, not everybody can be considered a news curator. Some people tweet one piece of news that was interesting to them and move on. Others tweet a wide range of news stories. Curators are individuals who carefully follow a story or a related set of stories. In our SNOW 2013 work “Finding News Curators in Twitter”, we defined a set of features for each user and demonstrated that they can be used to automatically find relevant curators among the audience. The features describe the visibility, tweeting activity and topical focus of a user. We collected news articles published in early 2013 by BBC World Service (BBC) and Al Jazeera English (AJE). Then, we followed the users who posted a specific article and analyzed their tweeting behavior. Our results reveal that 13% of the users from AJE and 1.8% of the users from BBC World are possible news curators.

The roles of curators in a crowd…

Some news curators are more focused than others. For instance, @KeriJSmith, a self-defined “internet marketer”, tweets about interesting news on a large variety of topics, while others are more selective. A famous example is Chan’ad Bahraini (@chanadbh), who tweets about Bahrain. Whether a user is topic-focused or not can be determined, for instance, by the number of different sections of a news website he or she tweets about. If these sections differ widely (e.g., from finance to celebrities), we can assume that the user is less focused.

Considering only the topical focus of a user is not sufficient when identifying story curators. A significant number of Twitter accounts operate as news aggregators – collecting news articles automatically (e.g., from RSS feeds) and posting the corresponding headlines and URLs to Twitter (45% in Al Jazeera English, 65% in BBC World). They can be identified easily, as most or all of their tweets contain URLs and they do not tend to interact much via messages with other users.
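
As an illustration of how such per-user features might be computed, here is a minimal sketch; the column names, the feature set and the thresholds are assumptions made for this example, not the exact ones used in the paper.

```python
import pandas as pd

def user_features(tweets: pd.DataFrame) -> pd.Series:
    """Features for one user's tweets; expects columns 'day', 'has_url', 'is_reply', 'section'."""
    return pd.Series({
        "tweets_per_day": len(tweets) / tweets["day"].nunique(),   # tweeting activity
        "frac_with_url": tweets["has_url"].mean(),                 # link-heavy accounts
        "frac_replies": tweets["is_reply"].mean(),                 # interaction with other users
        "distinct_sections": tweets["section"].nunique(),          # proxy for topical focus
    })

def looks_like_aggregator(features: pd.Series) -> bool:
    # Accounts that almost always post links and rarely interact with others
    # behave like automatic news aggregators (thresholds are illustrative only).
    return bool(features["frac_with_url"] > 0.9 and features["frac_replies"] < 0.05)

# Example with made-up tweets from a single user:
example = pd.DataFrame({
    "day": ["d1", "d1", "d2", "d2", "d2"],
    "has_url": [True, True, True, False, True],
    "is_reply": [False, False, False, True, False],
    "section": ["world", "world", "world", "middleeast", "world"],
})
feats = user_features(example)
print(feats, looks_like_aggregator(feats))
```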

The majority of news aggregators post many tweets per day related to breaking news and top stories, e.g. @BreakingNews. Only a minority focuses on more specific topics, and thus constitutes topic-focused aggregators. The user @RevolutionSyria, for instance, automatically distributes news articles about the civil war in Syria. Whether the automatically generated tweets provide interesting content on a topic is questionable. Nonetheless, some news aggregators seem to be considered valuable by users, as in the case of @RevolutionSyria, which has around 100,000 followers at the time of this writing.

In short, our current research deals with identifying crowds, curators, and aggregators. For more details you can check our articles and presentations:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. To be presented at the WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. To be presented at the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), 8-10 July 2013, Cambridge, Massachusetts.

Photo credits: Hobvias Sudoneighm (Creative Commons BY).