Tag Archives: user engagement

Viewport time: From user engagement to user attention in online news reading

This is the first blog post on a paper that will be presented at WSDM 2016 [1], on metrics of user engagement using viewport time. This work is in collaboration with Dmitry Lagun, and was carried out while Dmitry was a student at Emory University, and as part of a Yahoo Faculty Research and Engagement Program.

Figure 1 (a): Example page showing the most common pattern of user attention, where the reader's attention decays monotonically towards the bottom of the article.

Figure 1 (b): Example page showing an unusual distribution of attention, indicating that content positioned closer to the end of the article attracts a significant portion of user attention.

Online content providers such as news portals constantly seek to attract large shares of online attention by keeping their users engaged. A common challenge is to identify which aspects of the online interaction influence user engagement the most, so that users spend time on the content provider's site. This component of engagement can be described as a combination of cognitive processes such as focused attention, affect and interest, traditionally measured using surveys. It is also measured through large-scale analytical metrics that assess users' depth of interaction with the site. Dwell time, the time spent on a resource (for example a webpage or a web site), is one such metric, and has proven to be a meaningful and robust metric of user engagement in many contexts.

However, dwell time has limitations. Consider Figure 1 above, which shows examples of two webpages (news articles) of a major news portal, with associated distribution of time users spend at each vertical position of the article. The blue densities on the right side indicate the average amount of time users spent viewing a particular part of the article. We see two patterns:

In (a) users spend most of their time towards the top of the page, whereas in (b) users spend a significant amount of time further down the page, likely reading and contributing comments to the news articles. Although the dwell time for (b) is likely to be higher (the data indeed shows this), it does not tell us much about user attention on the page, nor does it allow us to differentiate between consumption patterns with similar dwell time values.

Many works have looked at the relationship between dwell time and properties of webpages, leading to the following results:

  • A strong tendency to spend more time on interesting articles rather than on uninteresting ones.
  • A very weak correlation between article length and associated reading times, indicating that most articles are only read in parts, not in their entirety. When these two correlate, they do so only to some extent, suggesting that users have a maximum time-budget to consume an article.
  • The presence of videos and photos, the layout and textual features, and the readability of the webpage can influence the time users spend on a webpage.

However, dwell time does not capture where on the page users are focusing, namely the user attention. This motivates the use of other measurements to study user attention.

Studies of user attention using eye-tracking provided numerous insights about typical content examination strategies, such as top to bottom scanning of web search results. In the context of news reading, gaze is a reliable indicator of interestingness and correlates with self-reported engagement metrics, such as focused attention and affect. However, due to the high cost of eye-tracking studies, a considerable amount of research was devoted to finding more scalable methods of attention measurement, which would allow monitoring attention of online users at large scale. Mouse cursor tracking was proposed as a cheap alternative to eye-tracking. Mouse cursor position was shown to align with gaze position, when users perform a click or a pointing action in many search contexts, and to infer user interests in webpages. The ratio of mouse cursor movement to time spent on a webpage is also a good indicator of how interested users are in the webpage content, and cursor tracking can inform about whether users are attentive to certain content when reading it, and what their experience was.

However, despite promising results, the extent of coordination between gaze and mouse cursor depends on the user task (e.g., text highlighting, pointing or clicking). Moreover, eye and cursor are poorly coordinated during cursor inactivity, which limits the utility of the mouse cursor as an attention measurement tool in a news reading task, where minimal pointing is required. Thus, we propose instead to use viewport time to study user attention.

The viewport is defined as the portion of the webpage that is visible to the user at any given time. Viewport time is the time a user spends viewing an article at a given viewport position.
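To make the definition concrete, here is a minimal sketch of how viewport time could be accumulated from scroll events. The event format, the function name `viewport_time_profile` and the 100 px bucket size are assumptions made for illustration, not the instrumentation used in the paper; in particular, the sketch credits the full elapsed time to every part of the page that is inside the viewport.

```python
from collections import defaultdict


def viewport_time_profile(scroll_events, viewport_height, end_time, bucket_px=100):
    """Accumulate viewing time (seconds) per vertical bucket of the page.

    scroll_events: list of (timestamp_seconds, scroll_top_px), sorted by time.
    Each event means: from this timestamp on, the top edge of the viewport
    is at scroll_top_px (hypothetical logging format).
    end_time: timestamp at which the page view ended.
    """
    profile = defaultdict(float)
    for i, (t, top) in enumerate(scroll_events):
        # The viewport stays at this position until the next event (or page end).
        t_next = scroll_events[i + 1][0] if i + 1 < len(scroll_events) else end_time
        duration = t_next - t
        if duration <= 0:
            continue
        first = top // bucket_px
        last = (top + viewport_height - 1) // bucket_px
        # Assumption: every bucket visible in the viewport gets the full duration.
        for bucket in range(first, last + 1):
            profile[bucket * bucket_px] += duration
    return dict(profile)


# Hypothetical page view: 10 s at the top, then 5 s after scrolling down 200 px.
profile = viewport_time_profile([(0, 0), (10, 200)], viewport_height=200, end_time=15)
for position in sorted(profile):
    print(position, profile[position])
```

Averaging such profiles over many page views would give a curve of the kind plotted next to the articles in Figure 1.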

Viewport time has been used as implicit feedback to improve search result ranking for subsequent search queries, to help eliminate position bias in search result examination, and to detect bad snippets and improve search result ranking in document summarization. Viewport time was also successfully used to infer user interest at sub-document level on mobile devices, and was helpful in evaluating rich informational results that may lack active user interaction, such as clicks.

Our work adds to this body of work, and explores viewport time as a coarse but more robust instrument to measure user attention during news reading.

Figure 2. Distribution of viewport time averaged across all page views.

Figure 2 shows the viewport time distribution computed from all page views on a large sample of news articles. It has a bimodal shape, with the first peak occurring at approximately 1000 px and the second, less pronounced peak at 5000 px, suggesting that most page views have a viewport profile that falls between cases (a) and (b) of Figure 1. This also shows that, on average, users spend significantly less time at lower scroll positions: the viewport time decays towards the bottom of the page. The fact that users spend substantially less time reading a seemingly equivalent amount of text (top versus bottom of the article) may also explain the weak correlation between article length and dwell time reported in several works.

Although users often remain in the upper part of an article, some users do find the article interesting enough to spend a significant amount of time at the lower part of the article, and even to interact with the comments. Thus, some articles entice users to deeply engage with their content.

In this paper, we build upon this observation and employ viewport data to develop user engagement metrics that can measure to what extent the user interaction with a news article follows the signature of positive user engagement, i.e., users read most of the article and read, post or reply to a comment. We then develop a probabilistic model that accounts for both the extent of the engagement and the textual topic of the article. Through our experiments we demonstrate that such a model is able to predict the future level of user engagement with a news article significantly better than currently available methods.

Online, users multitask

We often access several sites within an online session. We may perform one main task (when we plan a holiday, we often compare offers from different travel sites, go to a review site to check hotels), or several totally unrelated tasks in parallel (responding to an email while reading news). Both are what we call online multitasking. We are interested in the extent to which multitasking occurs, and whether we can identify patterns.

Our dataset

Our dataset consists of one month of anonymised interaction data from a sample of 2.5 million users who gave their consent to provide browsing data through a toolbar. We selected 760 sites, which we categorised according to the type of services they offer. Examples of services include mail, news, social networking, shopping and search; some services cater to different audiences (for example, news about sport, tech and finance). Our dataset contains 41 million sessions, where a session ends if more than 30 minutes have elapsed between two successive page views. Finally, consecutive page views of the same site are merged to form a site visit.
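The two rules above (a 30-minute inactivity cut-off, and merging consecutive page views of the same site) can be sketched as follows. This is an illustrative sketch only: the input format, the function names `split_sessions` and `merge_site_visits`, and the site labels are assumptions, not the pipeline used in the study.

```python
SESSION_TIMEOUT = 30 * 60  # 30 minutes, in seconds


def split_sessions(page_views, timeout=SESSION_TIMEOUT):
    """Split one user's page views into sessions.

    page_views: list of (timestamp_seconds, site), sorted by time.
    A new session starts when more than `timeout` seconds separate
    two successive page views.
    """
    sessions = []
    current = []
    last_time = None
    for ts, site in page_views:
        if last_time is not None and ts - last_time > timeout:
            sessions.append(current)
            current = []
        current.append((ts, site))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions


def merge_site_visits(session):
    """Merge consecutive page views of the same site into (site, start, end) visits."""
    visits = []
    for ts, site in session:
        if visits and visits[-1][0] == site:
            # Same site as the previous page view: extend the current visit.
            visits[-1] = (site, visits[-1][1], ts)
        else:
            visits.append((site, ts, ts))
    return visits


# Hypothetical browsing log for one user.
views = [(0, "mail"), (60, "mail"), (120, "news"), (180, "mail"), (1981, "news")]
for session in split_sessions(views):
    print(merge_site_visits(session))
```

In this toy log the last page view comes 1801 seconds after the previous one, so it opens a second session.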

How much multitasking in a session?

On average, 10.20 distinct sites are visited within a session, and for 22% of the visits the site was accessed previously during the session. More sites are visited and revisited as the session length increases. Short sessions have on average 3.01 distinct sites with a revisitation rate of 0.10. By contrast, long sessions have on average 9.62 different visited sites with a revisitation rate of 0.22.
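As a rough illustration of these two quantities, the sketch below counts the distinct sites in a session and the share of visits that go to a site already seen earlier in the same session. We assume here that this share is what the revisitation rate denotes; the function name `session_stats` and the example session are made up.

```python
def session_stats(visited_sites):
    """Return (number of distinct sites, revisitation rate) for one session.

    visited_sites: list of site names, one entry per site visit, in order.
    Revisitation rate (assumed definition): fraction of visits whose site
    was already visited earlier in the same session.
    """
    seen = set()
    revisits = 0
    for site in visited_sites:
        if site in seen:
            revisits += 1
        seen.add(site)
    rate = revisits / len(visited_sites) if visited_sites else 0.0
    return len(seen), rate


# Hypothetical session with five site visits.
distinct, rate = session_stats(["mail", "news", "mail", "social", "mail"])
print(distinct, rate)
```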

We focus on four categories of sites: news (finance), news (tech), social media, and mail. We extract for each category a random sample of 10,000 sessions. As shown in Figure 1 below, the sites with the highest number of visits within a session belong to the social media category (average of 2.28), whereas news (tech) sites are the least revisited sites (average of 1.76). The other two categories have on average 2.09 visits per session.

Visits and absence time
Figure 1: Site visit characteristics for four categories of sites: (Left) Distribution of time between visits; and (Right) Average and standard deviation of number of visits and time between visits.

What happens between the visits to a site?

We call the time between visits to a site within the session the absence time. We see three main patterns with the four categories of sites, as shown in Figure 1 above (right):

  • social media sites and news (tech) sites have an average absence time of 4.47 minutes and 3.95 minutes, respectively, although the distributions are similar;
  • news (finance) sites have a more skewed distribution, indicating a higher proportion of short absence times for sites in this category;
  • mail sites have the highest absence time, 6.86 minutes on average.

However, the median of the absence time distribution is less than 1 minute for all categories of sites. That is, many sites are revisited after a short break. We speculate that a short break corresponds to an interruption of the task being performed by the user (on the site), whereas a longer break indicates that the user is returning to the site to perform a new task.
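The gap between the mean and the median is easy to reproduce on toy data. The sketch below computes within-session absence times from a list of site visits and compares the two statistics. The `(site, start, end)` visit format and the numbers are hypothetical, and we assume the absence time runs from the end of one visit to the start of the next visit to the same site.

```python
from statistics import mean, median


def absence_times(visits):
    """Return {site: [absence times in seconds]} for one session.

    visits: list of (site, start_seconds, end_seconds) in chronological order.
    Assumed definition: absence time = start of a visit minus the end of the
    previous visit to the same site within the session.
    """
    last_end = {}
    gaps = {}
    for site, start, end in visits:
        if site in last_end:
            gaps.setdefault(site, []).append(start - last_end[site])
        last_end[site] = end
    return gaps


# Hypothetical session: mail is revisited twice, once quickly and once much later.
visits = [
    ("mail", 0, 60),
    ("news", 70, 90),
    ("mail", 100, 200),
    ("social", 210, 900),
    ("mail", 1400, 1410),
]
mail_gaps = absence_times(visits)["mail"]
print(mail_gaps, mean(mail_gaps), median(mail_gaps))
```

With many short breaks and a few long ones, the mean is pulled up by the long absences while the median stays small, which is the situation described above.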

How do users switch between sites?

Users can switch between sites in several ways:

  1. hyperlinking: clicking on a link,
  2. teleporting: jumping to a page using bookmarks or typing a URL, or
  3. backpaging: using the back button on the browser, or returning to one of several open tabs or windows.

The way users revisit sites varies depending on the session length. Teleporting and hyperlinking are the most important mechanisms to re-access a site during short sessions (30% teleporting and 52% hyperlinking for short sessions), whereas backpaging becomes more predominant in longer sessions. Tabs or the back button are often used to revisit a site.

Patterns of multitasking
Figure 2: (Top) Visit patterns described by the average time spent on the site at the ith visit in a session. (Bottom) Usage of navigation types described by the proportion of each navigation type at the ith visit in a session.

We also look at how users access a site at each revisit, for the four categories of sites. This is shown in Figure 2 (bottom).

  • For all four categories of sites, the first visit is often through teleportation. Accessing a site in this manner indicates a high level of engagement, in particular in terms of loyalty, with the site, since users are likely to have bookmarked the site at some previous interaction with it. In our dataset, teleportation is more frequently used to access news (tech) sites than news (finance) sites.
  • After the first visit, backpaging is increasingly used to access a site. This is an indication that users leave the site by opening a new tab or window, and then return to the site later to continue whatever they were doing on the site.
  • However, in general, users still revisit a site mostly through hyperlinking, suggesting that links still have an important role in directing users to a site. In our dataset, news (finance) sites are mostly accessed through links; users are directed to sites of this category via a link.

Time spent at each revisit

For each site, we select all sessions where the site was visited at least four times. We see four main patterns, which are shown in Figure 2 (top):

  • The time spent on social media sites increases at each revisit (a case of increased attention). The opposite is observed for mail sites (a case of decreased attention). A possible explanation is that, for mail sites, there are fewer messages to read in subsequent visits, whereas for social media sites, users eventually have more time to spend on them because their other tasks are being completed.
  • News (finance) is an example of a category for which neither a lower nor a higher dwell time is observed at each subsequent revisit (a case of constant attention). We hypothesise that each visit corresponds either to a new task or to a user following some evolving piece of information, such as checking the latest stock price figures.
  • The time spent on news (tech) sites fluctuates from one revisit to the next. Either no pattern exists or the pattern is complex and cannot easily be described (a case of complex attention). However, when looking at the first two visits or the last two visits, in both cases more time is spent in the second visit of the pair. This may indicate that the visits belong to two different tasks, and that each task is performed in two distinct visits to the site. Teleportation is more frequent at the 1st and 3rd visits, which is consistent with this hypothesis (Figure 2, bottom).

Take away message

Multitasking exists, as many sites are visited and revisited during a session. Multitasking influences the way users access sites, and this depends on the type of site.

This work was done in collaboration with Janette Lehmann, Georges Dupret and Ricardo Baeza-Yates. More details about the study can be found in Online Multitasking and User Engagement, ACM International Conference on Information and Knowledge Management (CIKM 2013), 27 October – 1 November 2013, San Francisco, USA.

Photo credits: D&D (Creative Commons BY).

How engaged are Wikipedia users?

Recently, we were asked: “How engaged are Wikipedia users?” To answer this question, we visited Alexa, a Web Analytics site, and learned that Wikipedia is one of the most visited sites in the world (ranked 6th), that users spend on average around 4:35 minutes per day on Wikipedia, and that many visits to Wikipedia come from search engines (43%). We also found studies about readers’ preferences, Wikipedia growth, and Wikipedia editors. There is however little about how users engage with Wikipedia, in particular about those not contributing content to Wikipedia.

Can we do more?

Besides reading and editing articles, users perform many other actions: they look at the revision history, search for specific content, browse through Wikipedia categories, visit portal sites to learn about specific topics, or visit the community portal. Although discussing an article is a sign of a highly engaged user, performing several actions within the same visit to Wikipedia is also a sign of a highly engaged user. It is this latter type of engagement we looked into.

Action networks

We collected 13 months (September 2011 to September 2012) of browsing data from an anonymized sample of approximately 1.3M users. We identified 48 actions such as reading an article, editing, opening an account, donating, visiting a special page. We then built a weighted action network: nodes are the actions and two nodes are connected by an edge if the two corresponding actions were performed during the same visit to Wikipedia. Each node has a weight representing the number of users performing the corresponding action (the node traffic). Each edge has a weight representing the number of users that performed the two corresponding actions (the traffic between the two nodes).

Engagement over time

We use the following metrics to measure engagement on Wikipedia based on actions:

  • TotalNodeTraffic: total number of actions (sum of all node weights)
  • TotalEdgeTraffic: total number of pairwise actions (sum of all edge weights)
  • TotalTrafficRecirculation: actual network traffic with respect to maximum possible traffic (TotalEdgeTraffic/TotalNodeTraffic).
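The three metrics above can be sketched on a toy action network. The input format (one `(user, set of actions)` pair per visit), the function names and the action labels are assumptions for illustration; the weights follow the definitions given earlier, i.e. the number of users per action and per pair of actions performed in the same visit.

```python
from itertools import combinations


def action_network(visits):
    """Build node and edge weights from (user_id, set_of_actions) visit records.

    Node weight: number of distinct users who performed the action.
    Edge weight: number of distinct users who performed both actions
    within the same visit.
    """
    node_users = {}
    edge_users = {}
    for user, actions in visits:
        for action in actions:
            node_users.setdefault(action, set()).add(user)
        # Sort so that each unordered pair maps to a single edge key.
        for a, b in combinations(sorted(actions), 2):
            edge_users.setdefault((a, b), set()).add(user)
    node_weight = {a: len(users) for a, users in node_users.items()}
    edge_weight = {e: len(users) for e, users in edge_users.items()}
    return node_weight, edge_weight


def traffic_metrics(node_weight, edge_weight):
    """Return (TotalNodeTraffic, TotalEdgeTraffic, TotalTrafficRecirculation)."""
    total_node = sum(node_weight.values())
    total_edge = sum(edge_weight.values())
    recirculation = total_edge / total_node if total_node else 0.0
    return total_node, total_edge, recirculation


# Hypothetical visits: two users combine actions, one only reads.
visits = [
    ("u1", {"read", "edit"}),
    ("u2", {"read"}),
    ("u3", {"read", "donate"}),
    ("u1", {"read"}),
]
nodes, edges = action_network(visits)
print(traffic_metrics(nodes, edges))
```

If more users arrive but each performs a single action per visit, TotalNodeTraffic grows while the ratio drops.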

We calculated these metrics for the 13 months under consideration and plotted their variations over time. An increase in TotalNodeTraffic means that more users visited Wikipedia. An increase in TotalTrafficRecirculation means that more users performed at least two actions while on Wikipedia, our chosen indicator of high engagement in Wikipedia. We observe that TotalNodeTraffic increased first then became more or less stable. By contrast, TotalTrafficRecirculation mostly decreased, but we see a small peak in January 2012.

Two important events happened in our 13-month period. During the donation campaign (November to December 2011) more users visited Wikipedia (higher TotalNodeTraffic value). We speculate that many users became interested in Wikipedia during the campaign. However, TotalTrafficRecirculation actually decreased for the same period: although more users visited Wikipedia, they did not perform two (or more) actions while visiting Wikipedia; they did not become more engaged with Wikipedia. By contrast, during the SOPA/PIPA protest (January 2012), we see a peak in both TotalNodeTraffic and TotalTrafficRecirculation. More users visited Wikipedia and many users became more engaged with Wikipedia; they also read articles, gathered information about the protest, and donated money while visiting Wikipedia.

We detected different engagement patterns on weekdays and weekends. Whereas more users visited Wikipedia during weekdays (high value of TotalNodeTraffic), users that visited Wikipedia during the weekend were more engaged (high value of TotalTrafficRecirculation). On weekends, users performed more actions during their visits.

People behave differently on weekdays compared to weekends. The same happens with Wikipedia.

Did the donation campaign make Wikipedia more engaging?

So which actions became more frequent as a result of the donation campaign? As expected, we observed a significant traffic increase on the “donate” node during the two months; many users made a donation. In addition, the traffic from some nodes to other nodes increased but only slightly. Additional actions were performed; for instance, more users created a user account or visited community-related pages, all within the same session. However, overall, users mostly performed individual actions since TotalTrafficRecirculation decreased during that time period.

So the campaign was successful in terms of donations, but less so in terms of making Wikipedia more engaging.

This is a write-up of the presentation given by Janette Lehmann at TNETS Satellite, ECCS, Barcelona, September 2013.

Today I am giving a keynote at the 18th International Conference on Application of Natural Language to Information Systems (NLDB2013), which is held at MediaCityUK, Salford.

I have started to think about which questions to ask when evaluating user engagement. In the talk, I discuss these questions through five studies we did. Also included are questions asked when

  • evaluating serendipitous experience in the context of entity-driven search using social media such as Wikipedia and Yahoo! Answers.
  • evaluating the news reading experience when links to related articles are automatically generated using “light weight” understanding techniques.

The slides are available on Slideshare.

Relevant published papers include:

I will write about these two works in later posts.

What can absence time tell about user engagement?

Two widely employed engagement metrics are click-through rate and dwell time. These are particularly used for services where user engagement is about clicking, for example in the context of search where presumably users click on relevant results, and/or spending time on a site, for example consuming content in the context of a news portal.

In search, both have been used as indicators of relevance, and have been exploited to infer user satisfaction with search results and to improve ranking functions. However, how to properly interpret the relations between these metrics, retrieval quality and the long-term user engagement with the search application is not straightforward. Also, relying solely on clicks and time spent can lead to contradictory if not erroneous conclusions. Indeed, with the current trend of displaying rich information on web pages, for instance the phone number of restaurants or weather data in search results, users do not need to click to access the information and the time spent on a website is shorter.

Measure: Absence time
The absence time measures the time it takes a user to decide to return to a site to accomplish a new task. Taking a news site as an example, a good experience associated with quality articles might motivate the user to come back to that news site on a regular basis. On the other hand, if the user is disappointed, for example, the articles were not interesting, the site was confusing, he or she may return less often and even switch to an alternative news provider. Another example is a visit to a community questions and answers website. If the questions of a user are well and promptly answered, the odds are that he or she will be enticed to raise new questions and return to the site soon.

Our assumption is that if users find a site interesting, engaging or useful, they will return to it sooner.

This assumption has the advantage of being simple, intuitive and applicable to a large number of settings.

Case study: Yahoo! Answers Japan
We used a popular community question and answer website hosted by Yahoo! Japan, where users can ask questions about any topic of interest to them. Other users may respond by writing an answer. These answers are recorded and can be searched by any user through a standard search interface. We studied the actions of approximately one million users during two weeks. A user action happens every time a user interacts with Yahoo! Answers: every time he or she issues a query or clicks on a link, be it an answer, an ad or a navigation button. We compare the behaviour of users exposed to six functions used to rank past answers, both in terms of traditional metrics and of absence time.

Methodology: Survival analysis
We use Survival Analysis to study absence time. Survival Analysis has been used in applications concerned with the death of biological organisms, each receiving different treatments. An example is throat cancer treatment where patients are administered one of several drugs and the practitioner is interested in seeing how effective the different treatments are. The analogy with our analysis of absence time is unfortunate but nevertheless useful. We treat a user's exposure to one of the ranking functions as a “treatment” and the absence time as his or her survival time. In other words, a Yahoo! Answers user dies each time he or she visits the site … but hopefully resuscitates instantly as soon as his or her visit ends.

Survival analysis makes use of a hazard rate, which reflects the probability that a user dies at a given time. It can be very loosely understood as the speed of death of a population of patients at that time. Returning to our example, if the hazard rate of throat cancer patients administered with, say, drug A is higher than the hazard rate of patients under drug B treatment, then drug B patients have a higher probability of surviving until that time. A higher hazard rate implies a lower survival rate.
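To give a feel for these quantities, here is a minimal sketch of an empirical, discrete-time hazard and survival estimate from a list of absence times. It is only an illustration: it assumes every user eventually returns (no censoring), the binning and the function name `discrete_hazard` are our own choices, and it is not the estimation procedure used in the study.

```python
def discrete_hazard(absence_times, bin_width, n_bins):
    """Empirical hazard and survival per time bin.

    hazard[k]   = users returning during bin k / users who had not yet
                  returned at the start of bin k
    survival[k] = fraction of users who have still not returned by the
                  end of bin k
    Assumption: all absence times are fully observed (no censoring).
    """
    hazard, survival = [], []
    at_risk = len(absence_times)
    surviving = 1.0
    for k in range(n_bins):
        low, high = k * bin_width, (k + 1) * bin_width
        events = sum(1 for t in absence_times if low <= t < high)
        h = events / at_risk if at_risk else 0.0
        surviving *= 1.0 - h
        hazard.append(h)
        survival.append(surviving)
        at_risk -= events
    return hazard, survival


# Hypothetical absence times (hours) for users exposed to two ranking functions.
ranking_a = [1, 2, 5, 12, 30]
ranking_b = [8, 15, 22, 28, 40]
for name, times in (("A", ranking_a), ("B", ranking_b)):
    hazard, survival = discrete_hazard(times, bin_width=10, n_bins=3)
    print(name, hazard, survival)
```

In this toy data, users of ranking A return earlier: their hazard is higher in the first bin and their survival curve drops faster.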

We use hazard rates to compare the different ranking functions for Yahoo! Answers: a higher hazard rate translates into a shorter absence time and a prompter return to Yahoo! Answers, which is a sign of higher engagement. What did we find?

A better ranking does not imply more engaged users
Ranking algorithms are compared with a number of measures; a widely used one is DCG, which rewards ranking algorithms retrieving relevant results at high ranks. The higher the DCG, the better the ranking algorithm. We saw that, for the six ranking functions we compared, a higher DCG did not always translate to a higher hazard rate, or in other words, users returning to Yahoo! Answers sooner.

Returning relevant results is important, but is not the only criterion to keep users engaged with the search application.
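For readers unfamiliar with DCG, the sketch below uses one common formulation, in which the relevance grade at each rank is discounted by the base-2 logarithm of the rank plus one. Whether the study used this exact variant is an assumption on our part, and the relevance grades in the example are made up.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Uses the variant sum(rel_i / log2(i + 1)) with ranks i starting at 1
    (an assumed formulation; other variants exist).
    """
    ranked = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ranked, start=1))


# The same three results in two orders: relevant-first scores higher.
good_order = [3, 1, 0]
bad_order = [0, 1, 3]
print(dcg(good_order), dcg(bad_order))
```

A ranking function can therefore score well on DCG by placing relevant answers first, without this telling us how soon users come back.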

More clicks is not always good, but no click is bad
A common assumption is that a higher number of clicks reflects a higher user satisfaction with the search results. We observe that up to 5 clicks, each new click is associated with a higher hazard rate, but the increases from the third click are small. A fourth or fifth click has a very similar hazard rate. From the sixth click, the hazard rate decreases slowly.

This suggests that on average, clicks after the fifth one reflect a poorer user experience; users cannot find the information they are looking for.

We also observed that the hazard rate with five clicks or more is always higher compared with no click at all; when users search on Yahoo! Answers, no click means a bad user experience.

A click at rank 3 is better than a click at rank 1
The hazard rate is larger for clicks at ranks 2, 3 and 4, with the maximum at rank 3, than for a click at rank 1. For lower ranks, the trend is toward a decreasing hazard rate. Only the click at rank 10 was found to be clearly less valuable than a click at rank 1. It seems that users unhappy with results at earlier ranks simply click on the last displayed result, for no apparent reason apart from it being the last one on the search result page.

Clicking lower in the ranking suggests a more careful choice from the user, while clicking at the bottom is a sign that the overall ranking is of low quality.

Clicking fast on a result is a good sign
We found that the shorter the time between the search results of a query being displayed and the first click, the higher the hazard rate.

Users who find their answers quickly return sooner to the search application.

More views is worse than more queries
When users are returned search results, they may click on a result, then return to the search result page, and then click on another result. Each display of search results generates a view. At any time, the user may submit a new query. Both returning to the search result page several times and a higher number of query reformulations are signs that the user is not satisfied with the current search results. Which one is worse? We could see that having more views than queries was associated on average with a lower hazard rate, meaning a longer absence time.

This suggests that returning to the same search result page is a worse user experience than reformulating the query.

Without the absence time, it would have been harder to observe this, unless we explicitly asked users to tell us what was going on.

A small warning
A user might decide to return sooner or later to a website for reasons unrelated to the previous visits (being on holiday, for example). It is important to have a large sample of interaction data to detect coherent signals and to take systematic effects into account.

Take away message

Using absence time to measure user engagement is easy to interpret and less ambiguous than many of the commonly employed metrics. Use it and get new insights with it.

This work was done in collaboration with Georges Dupret. More details about the study can be found in Absence time and user engagement: Evaluating Ranking Functions, which was published at the 6th ACM International Conference on Web Search and Data Mining in Rome, 2013.

Photo credits: tanfelisa and kaniths (Creative Commons BY).

We need a taxonomy of web user engagement

There are lots and lots of metrics that can be used to assess how users engage with a website. Those widely used by the web-analytics community include click-through rates, number of page views, time spent on a website, how often users return to a site, and number of users.

Although these metrics cannot explicitly explain why users engage with a site, they can act as a proxy for online user engagement: two million users accessing a website daily is a strong indication of high engagement with that site.

Metrics, metrics and metrics

There are three main types of web-analytics metrics:

  • Popularity metrics measure how much a website is used (for example, by counting the total number of users on the site in a week). The higher the number, the more popular the website.
  • How a website is used when visited is measured with activity metrics, for example, the average number of clicks per visit across all users.
  • Loyalty metrics are concerned with how often users return to a website. An example is the return rate, calculated as the average number of times users visited a website within a month.

Loyalty and popularity metrics can be calculated on a daily, weekly or monthly basis. Activity metrics are calculated at visit level.

So one would think that a highly engaging website is one with a high number of visits (very popular), where users spend lots of time and click often (lots of activity), and return frequently (high loyalty). But not all websites, whether popular or not, have both active and loyal users.

This does not mean that user engagement on such websites is lower; it is simply different.

What did we do?

We collected one month of browsing data from an anonymized sample of approximately 2M users. For 80 websites, encompassing a diverse set of services such as news, weather, movies and mail, we calculated the average values of the following eight metrics:

  • Popularity metrics: number of distinct users, number of visits, and number of clicks (also called page views) for that month.
  • Activity metrics: average number of page views per visit and average time per visit (also called dwell time).
  • Loyalty metrics: number of days a user visited the site, number of times a user visited the site, and average time a user spent on the site, for that month.
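As an illustration, the eight metrics listed above can be computed from a month of visit records for one site along the following lines. The record fields (`user`, `day`, `page_views`, `dwell_seconds`), the metric names and the choice to average the loyalty metrics per user are assumptions made for this sketch, not the exact definitions used in the study.

```python
def engagement_metrics(visits):
    """Compute eight engagement metrics for one site over one month.

    visits: list of dicts with hypothetical keys
            'user', 'day', 'page_views', 'dwell_seconds' (one dict per visit).
    """
    if not visits:
        raise ValueError("need at least one visit")
    users = {v["user"] for v in visits}
    n_users = len(users)
    n_visits = len(visits)
    total_page_views = sum(v["page_views"] for v in visits)
    total_dwell = sum(v["dwell_seconds"] for v in visits)
    # A (user, day) pair counts once, however many visits happened that day.
    active_days = {(v["user"], v["day"]) for v in visits}
    return {
        # Popularity
        "users": n_users,
        "visits": n_visits,
        "page_views": total_page_views,
        # Activity (per visit)
        "page_views_per_visit": total_page_views / n_visits,
        "dwell_time_per_visit": total_dwell / n_visits,
        # Loyalty (averaged per user)
        "active_days_per_user": len(active_days) / n_users,
        "visits_per_user": n_visits / n_users,
        "time_per_user": total_dwell / n_users,
    }


# Hypothetical month of visits to one site.
visits = [
    {"user": "a", "day": 1, "page_views": 4, "dwell_seconds": 120},
    {"user": "a", "day": 1, "page_views": 2, "dwell_seconds": 60},
    {"user": "a", "day": 3, "page_views": 6, "dwell_seconds": 300},
    {"user": "b", "day": 2, "page_views": 4, "dwell_seconds": 120},
]
for name, value in engagement_metrics(visits).items():
    print(name, value)
```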

Websites differ widely in terms of their engagement

Some websites are very popular (for example, news portals) whereas others are visited by small groups of users (many specific-interest websites fall into this group). Visit activity also depends on the website. For instance, search sites tend to have a much shorter dwell time than sites related to entertainment (where people play games). Loyalty per website differed as well. Media (news, magazines) and communication (messenger, mail) sites have many users returning to them much more regularly than sites containing information of temporary interest (such as an e-commerce site selling cars). Loyalty is also influenced by the frequency with which new content is published. Indeed, some sites produce new content once per week.

High popularity did not entail high activity. Many sites have many users who spend little time on them. A good example is a search site, where users come, submit a query, get the result, and if satisfied, leave the site.

This results in a low dwell time even though user expectations were entirely met.

The same holds for a site on Q&A, or a weather site. What matters for such sites is their popularity.

Any patterns? Yes … 

To identify engagement patterns, we grouped the 80 sites using clustering approaches applied to the eight engagement metrics. We also extracted for each group which metrics and their values (whether high or low) were specific to that group. This process generated five groups with clear engagement patterns, and a sixth group with none:

  • Sites where the main factor was their high popularity (for example as measured by the high number of users). Examples of sites following this pattern include media sites providing daily news and search sites. Users interact with these sites in various ways; what the sites have in common is that they are used by many users.
  • Sites with low popularity, for instance having a low number of visits. Many interest-specific sites followed this pattern. Those sites center around niche topics or services, which do not attract a large number of users.
  • Sites with a high number of clicks per visit. This pattern was followed by e-commerce sites and configuration sites (accessed by users to update their profiles, for example), where the main activity is to click.
  • Sites with high dwell time, few clicks per visit, and low loyalty. This pattern was followed by domain-specific media sites of a periodic nature (new content published on a weekly basis), which are therefore not accessed often. However, when they are accessed, users spend more time consuming their content. The design of such sites (compared to mainstream media sites) leads to this type of engagement, since new content was typically published on their homepage. Thus users are not enticed to reach additional content (if any).
  • Sites with high loyalty, small dwell time and few clicks. This pattern was followed by navigational sites (the front page of an Internet company), whose role is to direct users to interesting content or services on other sites (of that same company); what matters is that users come to them regularly.
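The grouping step can be sketched as follows. The post only says "clustering approaches", so the plain k-means used here is an assumption standing in for whatever was actually applied, and the site names, metric names and values are invented. Each metric is z-scored, the sites are clustered, and each cluster is then described by the metrics whose mean is clearly above or below the overall mean.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each metric (column) so that no metric dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # a constant column maps to all zeros
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def kmeans(points, k, iterations=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move every centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                [mean(c) for c in zip(*members)] if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def describe(centroid, names, threshold=0.5):
    """Name the metrics whose cluster mean is clearly high or low (in z-scores)."""
    return {
        n: ("high" if z > 0 else "low")
        for n, z in zip(names, centroid)
        if abs(z) > threshold
    }


# Invented toy data: three of the eight metrics for six hypothetical sites.
names = ["users", "clicks_per_visit", "dwell_time"]
sites = {
    "search_a": [900, 2, 30],
    "search_b": [950, 2, 35],
    "shop_a": [200, 12, 120],
    "shop_b": [220, 11, 110],
    "weekly_a": [100, 2, 400],
    "weekly_b": [120, 3, 380],
}
labels, centroids = kmeans(standardize(list(sites.values())), k=3)
patterns = [describe(c, names) for c in centroids]
for site, lab in zip(sites, labels):
    print(site, lab, patterns[lab])
```

The threshold of 0.5 standard deviations is arbitrary; in practice the number of clusters and the cut-off for "high" or "low" would both need to be chosen with care.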

This simple study (80 sites and 8 metrics) identified several patterns of user engagement.

However, sites of the same type do not necessarily follow the same engagement pattern.

For instance, not all mainstream media sites followed the first pattern (high popularity). It is likely that, among others, the structure of the site has an effect.

So what now?

We must study way more sites and include lots more engagement metrics. This is the only way to build a taxonomy of web user engagement, if we want one, and we should. With a taxonomy, we will know the best metrics to measure engagement on a site.

Counting clicks may be totally useless for some sites. But if not, and the number of clicks is for instance way too low, knowing which engagement pattern a site follows helps in making the appropriate changes to the site.

This work was done in collaboration with Janette Lehmann, Elad Yom-Tov and Georges Dupret. More details about the study can be found in Models of User Engagement, a paper presented at the 20th conference on User Modeling, Adaptation, and Personalization (UMAP), 2012.

Photo credits: Denis Vrublevski and matt hutchinson (Creative Commons BY).

Together with Heather O’Brien and Elad Yom-Tov, we will be giving a tutorial at the International World-Wide Web Conference (WWW), 13-17 May 2013, Rio de Janeiro.

The slides are now available on Slideshare.
You can also access the two-slides-per-page format (PDF) here: MeasuringUserEngagement, or the one-slide-per-page format (PDF) here.
The references can be found here: References_Tutorial.

We will continue updating the slides, correct any errors and so on. Feedback very welcome.

Measuring User Engagement

Together with Heather O’Brien and Elad Yom-Tov, we will be giving a tutorial at the International World-Wide Web Conference (WWW), 13-17 May 2013, Rio de Janeiro. Here is a description of our tutorial. We will add slides and a bibliography soon.

Introduction and Motivations
In the online world, user engagement refers to the quality of the user experience that emphasizes the positive aspects of the interaction with a web application and, in particular, the phenomena associated with wanting to use that application longer and frequently. User engagement is a key concept in the design of web applications, motivated by the observation that successful applications are not just used, but are engaged with. Users invest time, attention, and emotion in their use of technology, and it must satisfy both their pragmatic and hedonic needs and expectations. Measurement is key for evaluating the success of information technologies, and is particularly critical to any web applications, from media to e-commerce sites, as it informs our understanding of user needs and expectations, system design and functionality. For instance, news portals have become a very popular destination for web users who read news online. As there is great potential for online news consumption but also serious competition among news portals, online news providers strive to develop effective and efficient strategies to engage users longer in their sites. Measuring how users engage with a news portal can inform the portal if there are areas that need to be enhanced, if current optimization techniques are still effective, if the published material triggers user behavior that causes engagement with the portal, etc. Understanding the above is dependent upon the ability to measure user engagement. The focus of this tutorial is how user engagement is currently being measured and future considerations for its measurement.

User engagement is a multifaceted, complex phenomenon; this gives rise to a number of potential approaches for its measurement, both objective and subjective. Common ways of measuring user engagement include: self-reporting, e.g., questionnaires; observational methods, such as facial expression analysis, speech analysis, desktop actions, etc.; neuro-physiological signal processing methods, e.g., respiratory and cardiovascular accelerations and decelerations, muscle spasms, etc.; and web analytics, online behavior metrics that assess users’ depth of engagement with a site. These methods represent various tradeoffs between the scale of data analyzed and the depth of understanding. For instance, surveys are small-scale but deep, whereas clicks can be collected on a large scale but provide shallow understanding.

The tutorial will start with a definition of user engagement and discuss the challenges associated with its measurement. The tutorial will then have two main parts. Part I will describe self-report measures, physiological measures, and web analytics. We aim to provide a full understanding of each type of approach, including methodological aspects, concrete findings, and advantages and disadvantages. Part II will concentrate on advanced aspects of user engagement measurement, and is comprised of three sub-sections. We will look at (1) how current metrics may or may not apply to the mobile environment; (2) the relationship between user engagement on-site with other sites in terms of user traffic or stylistics; and finally (3) the integration of various approaches for measuring engagement as a means of providing a deeper and more coherent understanding of engagement success. The tutorial will end with some conclusions, open research problems, and suggestions for future research and development.

Part I – Foundations

Approaches based on Self-Report Measures
Questionnaires are one of the most common ways of gathering information about the user experience. Although self-report measures are subjective in nature, they have several advantages, including being convenient and easy to administer, and capturing users’ perceptions of an experience at a particular point in time. The fundamental problem is that questionnaires are seldom subjected to rigorous evaluation. The User Engagement Scale (UES), a self-report measure developed by O’Brien and colleagues in 2010, will be used to discuss issues of reliability and validity with self-report measures. The UES consists of six underlying dimensions: Aesthetic Appeal, Perceived Usability, Focused Attention, Felt Involvement, Novelty, and Endurability (i.e., users’ overall evaluation). It has been used in online web surveys and user studies to assess engagement with e-commerce, wiki search, multimedia presentations, academic reading environments, and online news. Data analysis has focused on statistically analyzing the reliability and component structure of the UES, and on examining the relationship between the UES and other self-report measures, performance, and physiological measures. These findings will be shared, and the benefits and drawbacks of the UES for measuring engagement will be explored.

Approaches based on physiological measures
Physiological data can be captured by a broad range of sensors related to different cognitive states. Examples of sensors are eye trackers (e.g., difficulty, attention, fatigue, mental activity, strong emotion), mouse pressure (stress, certainty of response), biosensors (e.g., temperature for negative affect and relaxation, electrodermal for arousal, blood flow for stress and emotion intensity), oximeters (e.g., pulse), and cameras (e.g., face tracking for general emotion detection). Such sensors have several advantages over questionnaires or online behaviour, since they are more directly connected to the emotional state of the user, are more objective (measuring involuntary body responses), and are measured continuously. They are, however, more invasive and, apart from mouse tracking, cannot be used on a large scale. They can nonetheless be highly indicative of immersive states through their links with attention, affect, the perception of aesthetics and novelty – all of which are important characteristics of user engagement. A particular focus in this tutorial will be the usage of mouse pressure, so-called mouse tracking, because of its potential for large-scale measurement. The use of eye-tracking to measure engagement will also be discussed, because of its relationship to mouse movement.

Approaches based on web analytics
The most common way that engagement is measured, especially in production websites, is through various proxy measures of user engagement. Standard metrics include the number of page views, number of unique users, dwell time, bounce rate, and click-through rate. In addition, with the explosion of user-generated content, the number of comments and social network “like” buttons are also becoming widely used measures of web service performance. In this part we will review these measures, and discuss what they measure vis-à-vis user engagement, and consequently their advantages and drawbacks. We will provide extensive details on the appropriateness of these metrics to various websites. Finally, we will discuss recent work on combining these measures to form single measures of user engagement.
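To make two of these proxy measures concrete, here is a minimal Python sketch of bounce rate and click-through rate under their usual definitions: the share of visits with exactly one page view, and clicks divided by impressions. The function names and the numbers are hypothetical, and real analytics tools may define a bounce differently (for example, using a time threshold).

```python
def bounce_rate(page_views_per_visit):
    """Share of visits in which exactly one page was viewed."""
    visits = list(page_views_per_visit)
    if not visits:
        raise ValueError("at least one visit is required")
    return sum(1 for pv in visits if pv == 1) / len(visits)


def click_through_rate(clicks, impressions):
    """Clicks on an item divided by the number of times it was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Invented numbers: four visits with 1, 3, 1 and 5 page views,
# and a link shown 200 times and clicked 5 times.
site_bounce = bounce_rate([1, 3, 1, 5])
link_ctr = click_through_rate(5, 200)
```

Neither number says anything on its own about whether users were satisfied, which is why their appropriateness depends on the type of site.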

Part II – Advanced Aspects

Measuring User Engagement in Mobile Information Searching
Mobile use is a growing area of interest with respect to user engagement. Mobile devices are utilized in dynamic and shifting contexts that form the fabric of everyday life, their portability and functionality make them more suited to some tasks than others, and they are often used in the presence of other people. All of these considerations – context, task, and social situatedness – have implications for user engagement. The Engagement Lab at the University of British Columbia is exploring user engagement with mobile devices in a series of studies. In this section of the tutorial, we will explore the ways in which mobile engagement may differ from engagement with other devices and what the implications of this are for measurement. We will describe both lab and field-based work that we are undertaking, and the measures that we are selecting to capture mobile engagement.

Networked User Engagement
Nowadays, many providers operate multiple content sites, which are very different from each other. For example, Yahoo! operates sites on finance, sports, celebrities, and shopping. Due to the varied content served by these sites, it is difficult to treat them as a single entity. For this reason, they are usually studied and optimized separately. However, user engagement should be examined not only within individual sites, but also across sites, that is the entire content provider network. Such engagement was recently defined by Lalmas et al. as “Networked User Engagement”. In this part of the tutorial we will present recent findings on the optimization of networked user engagement. We will demonstrate the effect of the network on user engagement, and show how changes in elements of websites can increase networked user engagement.

Combining different approaches
Little work has been done to integrate these various measures. It is important to combine insights from big data with deep analysis of human behavior in the lab, or through crowd-sourcing experiments, to obtain a coherent understanding of engagement success. However, a number of initiatives aiming to combine techniques from web analytics, existing works on user engagement coming from the domains of information science, multimodal human computer interaction and cognitive psychology, are emerging. We will discuss work emerging in these directions, and in particular studies related to mapping mouse tracking and qualitative measurement of user engagement, and the challenges in designing experiments, and interpreting and generalizing results.