Promoting Positive Post-Click Experience for Native Advertising

Since September 2013, I have been working on user engagement in the context of native advertising. This blog post describes our first paper on this work, published in the Industry Track of the ACM Knowledge Discovery & Data Mining (KDD) conference in 2015 [1]. This work was done in collaboration with Janette Lehmann, Guy Shaked, Fabrizio Silvestri and Gabriele Tolomei.

Feed-based layouts, or streams, are becoming increasingly common in many applications, and are the predominant interface in mobile applications. In-stream advertising has emerged as a popular form of online advertising because it offers a user experience that fits nicely with that of the stream, and is often referred to as native advertising. In-stream or native ads have an appearance similar to that of the items in the stream, but are clearly marked with a “Sponsored” label or a currency symbol, e.g. “$”, to indicate that they are in fact adverts.

A user decides if he or she is interested in the ad content by looking at its creative. If the user clicks on the creative, he or she is redirected to the ad landing page, which is either a web page specifically created for that ad, or the advertiser homepage. The way a user experiences the landing page, the ad post-click experience, is particularly important in the context of native ads because the creatives mostly have the same look and feel, and what mostly differs is their landing pages. The quality of the landing page will affect the ad post-click experience.

A positive experience increases the probability of users “converting” (e.g., purchasing an item, registering to a mailing list, or simply spending time on the site building an affinity with the brand). A positive post-click experience does not necessarily lead to a conversion, as there may be many reasons why a conversion does not happen, independent of the quality of the ad landing page. A more appropriate proxy of the post-click experience is the time a user spends on the ad site before returning to the publisher site:

“the longer the time, the more likely the experience was positive”

The two most common measures used to quantify time spent on a site are dwell time and bounce rate. Dwell time is the time between a user clicking on an ad creative and returning to the stream; bounce rate is the percentage of “short clicks” (clicks with a dwell time less than a given threshold). On a random sample of native ads served on a mobile stream, we showed that these measures were indeed good proxies of post-click experience.
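
As a rough illustration of these two definitions, here is a minimal Python sketch. The `AdClick` record, its field names and the 5-second default threshold are all assumptions made for the example; the post only says "a given threshold" and does not describe how the click logs are structured.

```python
from dataclasses import dataclass


@dataclass
class AdClick:
    """Hypothetical click record; field names are illustrative only."""
    ad_id: str
    click_ts: float   # time (seconds) the user clicked on the ad creative
    return_ts: float  # time (seconds) the user returned to the stream


def dwell_time(click: AdClick) -> float:
    """Time between clicking on the creative and returning to the stream."""
    return click.return_ts - click.click_ts


def bounce_rate(clicks: list[AdClick], threshold_s: float = 5.0) -> float:
    """Percentage of "short clicks": clicks whose dwell time is below the threshold.

    The default threshold is an assumption, not a value from the paper.
    """
    if not clicks:
        return 0.0
    short_clicks = sum(1 for c in clicks if dwell_time(c) < threshold_s)
    return 100.0 * short_clicks / len(clicks)
```

In practice these would be aggregated per ad (or per landing page) before being compared across ads.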

We also saw that users clicking on ads promoting a positive post-click experience, i.e. ads with a low bounce rate, were more likely to click on ads in the future, and their long-term engagement was positively affected.

Focusing on mobile, we found that a positive ad post-click experience was not just about serving ads with mobile-optimised landing pages; other aspects of a landing page affect the post-click experience. We therefore put forward a learning approach that analyses ad landing pages, and showed how these can predict dwell time and bounce rate. We experimented with three types of landing page features, related to the actual content and organization of the ad landing page, the similarity between the creative and the landing page, and ad past performance. The latter type was best at predicting dwell time and bounce rate, but content and organization features performed well, and have the advantage of being applicable to all ads, not only to those that have already been served.

Finally, we deployed our prediction model for ad quality based on dwell time on Yahoo Gemini, a unified ad marketplace for mobile search and native advertising, and validated its performance on the mobile news stream app running on iOS. Analyzing one month of data through A/B testing, we found that returning high-quality ads, as measured in terms of the ad post-click experience, not only increases click-through rates by 18% but also has a positive effect on users: an increase in dwell time (+30%) and a decrease in bounce rate (-6.7%).

This work has progressed in two ways. We have improved the prediction model using survival random forests and considered new landing page features, such as text readability and the page structure [2]. We are also working with advertisers to help improve the quality of their landing pages. More about this in the near future.

Cursor movement and user engagement measurement

Many researchers have argued that cursor tracking data can provide enhanced ways to learn about website visitors. One of the most difficult website performance metrics to measure accurately is user engagement, generally defined as the amount of attention and time visitors are willing to spend on a given website and how likely they are to return. Engagement is usually described as a combination of various characteristics. Many of these are difficult to measure, for example, focused attention and affect. These would traditionally be measured using physiological sensors (e.g. gaze tracking) or surveys. However, it may be possible to gather this information through an analysis of cursor data.

This work [1] presents a study that asked participants to complete tasks on live websites using their own hardware in their natural environment. For each website two interfaces were created: one that would appear as normal and one that was intended to be aesthetically unappealing, as shown below. The participants, who were recruited through a crowd-sourcing platform, were tracked as they used modified variants of the Wikipedia and BBC News websites. They were asked to complete reading and information-finding tasks.

[Figure: the normal (wiki_normal) and aesthetically unappealing (wiki_ugly) variants of the Wikipedia interface]

The aim of the study was to explore how cursor tracking data might tell us more about the user than could be measured using traditional means. The study explored several metrics that might be used when carrying out cursor tracking analyses. The results showed that it was possible to differentiate between users reading content and users looking for information based on cursor data. They also showed that the user’s hardware could be predicted from cursor movements alone. However, no relationship between cursor data and engagement was found. The implications of these results, from the impact on web analytics to the design of experiments to assess user engagement, are discussed.

This study demonstrates that designing experiments to obtain reliable insights about user engagement and its measurement remains challenging. Not finding a signal does not necessarily mean that the signal does not exist; it may be that some of the metrics used were not the correct ones. In hindsight, this is what we believe happened. The cursor metrics were not the right ones to differentiate between the levels of engagement experience examined in this work. Indeed, recent work [2] showed that more complex mouse movement metrics did correlate with some engagement metrics.

  1. David Warnock and Mounia Lalmas. An Exploration of Cursor tracking Data. ArXiv e-prints, February 2015.
  2. Ioannis Arapakis, Mounia Lalmas and George Valkanas. Understanding Within-Content Engagement through Pattern Analysis of Mouse Gestures, 23rd International Conference on Information and Knowledge Management (CIKM), November 2014.

A small note about being a woman in computer science

Today I attended an event about women in Computer Science (CS). I have attended many of them before. I was surprised that the same issues were being discussed again and again after more than 15 years. I also heard stories of women being advised against a career in CS or specific CS subjects. I also heard about sexual harassment at technical conferences.

I don’t know why (maybe I was very lucky), but I can’t remember experiencing much of this. Maybe my worst experience was when I attended a formal dinner for professors in Science & Engineering (I had just become a professor): there were 120 attendees and about 10 women. I didn’t understand the dress code (lounge suits) and I arrived in black jeans, a black top and boots: one professor asked me for the “wine”.

One recommendation I often hear is to find a role model. I didn’t have any role model. Also why do I want to be like somebody? This does not mean that I was not inspired by people (men and women).

I was lucky to work with people that have been very supportive, in particular in helping me with my lack of confidence (which is still there). Those are the people who made a big difference to me, and have helped me to reach where I am.

Some people tried to help me to become more “successful” (whatever this means!): be more aggressive, speak more, speak louder, be more up-front, … , all things that I have always struggled with. I did try to follow their advice as I wanted to be “more successful”. I even attended a voice course. I learned that to get a deeper voice, I should push my tummy out!!! No way 🙂

I really appreciate people trying to make me “more successful” … but after a while … I said “this is not me, it does not make me happy, and I don’t want it”.

What I am trying to say: listen to advice and recommendations, and decide what is RIGHT for you. Change what YOU think should change while remaining you. Take responsibility. And enjoy being you.

Online, users multitask

We often access several sites within an online session. We may perform one main task (when we plan a holiday, we often compare offers from different travel sites, go to a review site to check hotels), or several totally unrelated tasks in parallel (responding to an email while reading news). Both are what we call online multitasking. We are interested in the extent to which multitasking occurs, and whether we can identify patterns.

Our dataset

Our dataset consists of one month of anonymised interaction data from a sample of 2.5 million users who consented to provide browsing data through a toolbar. We selected 760 sites, which we categorised according to the type of service they offer. Examples of services include mail, news, social network, shopping and search; sites offering the same service sometimes cater to different audiences (for example, news about sport, tech and finance). Our dataset contains 41 million sessions, where a session ends if more than 30 minutes have elapsed between two successive page views. Finally, consecutive page views of the same site are merged to form a site visit.
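
The session and site-visit definitions above can be sketched in a few lines of Python. This is only an illustration: the `(timestamp, site)` tuple format and the function names are assumptions for the example, not the actual processing pipeline used in the study.

```python
from itertools import groupby

SESSION_GAP_S = 30 * 60  # a session ends after more than 30 minutes of inactivity


def split_sessions(page_views, gap_s=SESSION_GAP_S):
    """Split one user's page views into sessions.

    page_views: list of (timestamp_s, site) tuples sorted by timestamp
    (a hypothetical log format chosen for this sketch).
    """
    sessions, current = [], []
    for ts, site in page_views:
        # Start a new session when the gap to the previous page view is too long.
        if current and ts - current[-1][0] > gap_s:
            sessions.append(current)
            current = []
        current.append((ts, site))
    if current:
        sessions.append(current)
    return sessions


def merge_site_visits(session):
    """Merge consecutive page views of the same site into one site visit.

    Returns a list of (site, first_view_ts, last_view_ts) tuples.
    """
    visits = []
    for site, group in groupby(session, key=lambda view: view[1]):
        views = list(group)
        visits.append((site, views[0][0], views[-1][0]))
    return visits
```

Note that a site can appear in several visits of the same session; those repeated visits are what the rest of this post looks at.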

How much multitasking in a session?

On average, 10.20 distinct sites are visited within a session, and for 22% of the visits the site was accessed previously during the session. More sites are visited and revisited as the session length increases. Short sessions have on average 3.01 distinct sites with a revisitation rate of 0.10. By contrast, long sessions have on average 9.62 different visited sites with a revisitation rate of 0.22.
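
One way to read the revisitation rate is as the share of site visits in a session that go to a site already visited earlier in that session. The sketch below follows that reading; the function name and the input format (an ordered list of site names, one per visit) are assumptions for the example.

```python
def revisitation_rate(visited_sites):
    """Fraction of visits in a session that return to an already-visited site.

    visited_sites: ordered list of site names, one entry per site visit
    (hypothetical input format).
    """
    if not visited_sites:
        return 0.0
    seen = set()
    revisits = 0
    for site in visited_sites:
        if site in seen:
            revisits += 1
        seen.add(site)
    return revisits / len(visited_sites)
```

For a session with visits to mail, news, mail, social, news, two of the five visits are revisits, which gives a rate of 0.4.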

We focus on four categories of sites: news (finance), news (tech), social media, and mail. We extract for each category a random sample of 10,000 sessions. As shown in Figure 1 below, the sites with the highest number of visits within a session belong to the social media category (average of 2.28), whereas news (tech) sites are the least revisited sites (average of 1.76). The other two categories have on average 2.09 visits per session.

Visits and absence time
Figure 1: Site visit characteristics for four categories of sites: (Left) Distribution of time between visits; and (Right) Average and standard deviation of number of visits and time between visits.

What happens between the visits to a site?

We call the time between visits to a site within the session absence time. We see three main patterns with the four categories of sites, as shown in Figure 1 above (right):

  • social media sites and news (tech) sites have an average absence time of 4.47 minutes and 3.95 minutes, respectively, although the distributions are similar;
  • news (finance) sites have a more skewed distribution, indicating a higher proportion of short absence times for sites in this category;
  • mail sites have the highest absence time, 6.86 minutes on average.

However, the median of the absence time distribution is less than 1 minute for all categories of sites. That is, many sites are revisited after a short break. We speculate that a short break corresponds to an interruption of the task being performed by the user (on the site), whereas a longer break indicates that the user is returning to the site to perform a new task.
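
To make the notion of absence time concrete, here is a small Python sketch. It assumes each site visit is a `(site, start_ts, end_ts)` tuple and measures the absence from the end of one visit to the start of the next visit to the same site; both the format and that choice of endpoints are assumptions for the example.

```python
from collections import defaultdict


def absence_times(visits):
    """Absence times per site within one session.

    visits: list of (site, start_ts, end_ts) tuples ordered by time
    (hypothetical format). The absence time is taken here as the gap between
    the end of one visit and the start of the next visit to the same site.
    """
    last_end = {}
    gaps = defaultdict(list)
    for site, start_ts, end_ts in visits:
        if site in last_end:
            gaps[site].append(start_ts - last_end[site])
        last_end[site] = end_ts
    return dict(gaps)
```

Collecting these gaps over many sessions gives the per-category distributions, from which averages and medians like the ones quoted above can be computed.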

How do users switch between sites?

Users can switch between sites in several ways:

  1. hyperlinking: clicking on a link,
  2. teleporting: jumping to a page using bookmarks or typing a URL, or
  3. backpaging: using the back button on the browser, or returning to a tab or window when several are open.

The way users revisit sites varies depending on the session length. Teleporting and hyperlinking are the most important mechanisms to re-access a site during short sessions (30% teleporting and 52% hyperlinking), whereas backpaging becomes more predominant in longer sessions. Tabs or the back button are often used to revisit a site.

Patterns of multitasking
Figure 2: (Top) Visit patterns described by the average time spent on the site at the ith visit in a session. (Bottom) Usage of navigation types described by the proportion of each navigation type at the ith visit in a session.

We also look at how users access a site at each revisit, for the four categories of sites. This is shown in Figure 2 (bottom).

  • For all four categories of sites, the first visit is often through teleportation. Accessing a site in this manner indicates a high level of engagement, in particular in terms of loyalty, with the site, since users are likely to have bookmarked the site at some previous interaction with it. In our dataset, teleportation is more frequently used to access news (tech) sites than news (finance) sites.
  • After the first visit, backpaging is increasingly used to access a site. This is an indication that users leave the site by opening a new tab or window, and then return to the site later to continue whatever they were doing on the site.
  • However, in general, users still revisit a site mostly through hyperlinking, suggesting that links still have an important role in directing users to a site. In our dataset, news (finance) sites are mostly accessed through links; users are directed to sites of this category via a link.

Time spent at each revisit

For each site, we select all sessions where the site was visited at least four times. We see four main patterns, which are shown in Figure 2 (top):

  • The time spent on social media sites increases at each revisit (a case of increased attention). The opposite is observed for mail sites (a case of decreased attention). A possible explanation is that, for mail sites, there are fewer messages to read in subsequent visits, whereas for social media sites, users eventually have more time to spend on them because the other tasks they were doing are being completed.
  • News (finance) is an example of a category for which neither a lower nor a higher dwell time is observed at each subsequent revisit (a case of constant attention). We hypothesise that each visit corresponds either to a new task or to a user following some evolving piece of information, such as checking the latest stock price figures.
  • The time spent on news (tech) sites fluctuates from one revisit to the next. Either no pattern exists or the pattern is complex and cannot easily be described (a case of complex attention). However, when looking at the first two visits or the last two visits, in both cases more time is spent in the second visit of the pair. This may indicate that the visits belong to two different tasks, and each task is performed in two distinct visits to the site. Teleportation is more frequent at the 1st and 3rd visits, which supports this hypothesis (Figure 2, bottom).

Take away message

Multitasking exists, as many sites are visited and revisited during a session. Multitasking influences the way users access sites, and this depends on the type of site.

This work was done in collaboration with Janette Lehmann, Georges Dupret and Ricardo Baeza-Yates. More details about the study can be found in Online Multitasking and User Engagement, ACM International Conference on Information and Knowledge Management (CIKM 2013), 27 October – 1 November 2013, San Francisco, USA.

Photo credits: D&D (Creative Commons BY).

How engaged are Wikipedia users?

Recently, we were asked: “How engaged are Wikipedia users?” To answer this question, we visited Alexa, a web analytics site, and learned that Wikipedia is one of the most visited sites in the world (ranked 6th), that users spend on average around 4:35 minutes per day on Wikipedia, and that many visits to Wikipedia come from search engines (43%). We also found studies about readers’ preferences, Wikipedia growth, and Wikipedia editors. There is, however, little about how users engage with Wikipedia, in particular those not contributing content to Wikipedia.

Can we do more?

Besides reading and editing articles, users perform many other actions: they look at the revision history, search for specific content, browse through Wikipedia categories, visit portal sites to learn about specific topics, or visit the community portal. Although discussing an article is a sign of a highly engaged user, performing several actions within the same visit to Wikipedia is also a sign of a highly engaged user. It is this latter type of engagement we looked into.

Action networks

We collected 13 months (September 2011 to September 2012) of browsing data from an anonymized sample of approximately 1.3M users. We identified 48 actions such as reading an article, editing, opening an account, donating, visiting a special page. We then built a weighted action network: nodes are the actions and two nodes are connected by an edge if the two corresponding actions were performed during the same visit to Wikipedia. Each node has a weight representing the number of users performing the corresponding action (the node traffic). Each edge has a weight representing the number of users that performed the two corresponding actions (the traffic between the two nodes).

Engagement over time

We use the following metrics to measure engagement on Wikipedia based on actions:

  • TotalNodeTraffic: total number of actions (sum of all node weights)
  • TotalEdgeTraffic: total number of pairwise actions (sum of all edge weights)
  • TotalTrafficRecirculation: actual network traffic with respect to maximum possible traffic (TotalEdgeTraffic/TotalNodeTraffic).
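
The three metrics above can be sketched as follows. This is a toy Python illustration, not the code used in the study: the `(user_id, actions)` input format and the function names are assumptions, and node and edge weights are counted as numbers of distinct users, following the description of the action network.

```python
from collections import defaultdict
from itertools import combinations


def build_action_network(visits):
    """Build node and edge weights from visits.

    visits: iterable of (user_id, actions), where actions is the set of
    actions performed during one visit (hypothetical input format).
    """
    node_users = defaultdict(set)
    edge_users = defaultdict(set)
    for user_id, actions in visits:
        unique_actions = sorted(set(actions))
        for action in unique_actions:
            node_users[action].add(user_id)
        # Two actions are connected if they occur in the same visit.
        for pair in combinations(unique_actions, 2):
            edge_users[pair].add(user_id)
    node_weight = {action: len(users) for action, users in node_users.items()}
    edge_weight = {pair: len(users) for pair, users in edge_users.items()}
    return node_weight, edge_weight


def engagement_metrics(node_weight, edge_weight):
    """Compute the three network-level metrics from the weights."""
    total_node = sum(node_weight.values())
    total_edge = sum(edge_weight.values())
    recirculation = total_edge / total_node if total_node else 0.0
    return {
        "TotalNodeTraffic": total_node,
        "TotalEdgeTraffic": total_edge,
        "TotalTrafficRecirculation": recirculation,
    }
```

With three visits, one reading and editing, one only reading, and one reading, editing and donating, TotalNodeTraffic is 6, TotalEdgeTraffic is 4, and TotalTrafficRecirculation is 4/6.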

We calculated these metrics for the 13 months under consideration and plotted their variations over time. An increase in TotalNodeTraffic means that more users visited Wikipedia. An increase in TotalTrafficRecirculation means that more users performed at least two actions while on Wikipedia, our chosen indicator of high engagement in Wikipedia. We observe that TotalNodeTraffic increased first then became more or less stable. By contrast, TotalTrafficRecirculation mostly decreased, but we see a small peak in January 2012.

Two important events happened in our 13-month period. During the donation campaign (November to December 2011), more users visited Wikipedia (higher TotalNodeTraffic value). We speculate that many users became interested in Wikipedia during the campaign. However, TotalTrafficRecirculation actually decreased over the same period: although more users visited Wikipedia, they did not perform two (or more) actions while visiting Wikipedia; they did not become more engaged with Wikipedia. By contrast, during the SOPA/PIPA protest (January 2012), we see a peak in both TotalNodeTraffic and TotalTrafficRecirculation. More users visited Wikipedia and many users became more engaged with Wikipedia; they read articles, gathered information about the protest, and donated money while visiting Wikipedia.

We detected different engagement patterns on weekdays and weekends. Whereas more users visited Wikipedia during weekdays (high value of TotalNodeTraffic), users that visited Wikipedia during the weekend were more engaged (high value of TotalTrafficRecirculation). On weekends, users performed more actions during their visits.

People behave differently on weekdays compared to weekends. The same happens with Wikipedia.

Did the donation campaign make Wikipedia more engaging?

So which actions became more frequent as a result of the donation campaign? As expected, we observed a significant traffic increase on the “donate” node during the two months; many users made a donation. In addition, the traffic from some nodes to other nodes increased, but only slightly. Additional actions were performed; for instance, more users created a user account or visited community-related pages, all within the same session. However, overall, users mostly performed individual actions since TotalTrafficRecirculation decreased during that time period.

So the campaign was successful in terms of donation, but less in terms of making Wikipedia more engaging.

This is a write-up of the presentation given by Janette Lehmann at TNETS Satellite, ECCS, Barcelona, September 2013.

Measuring user engagement for the “average” users and experiences: Can psychophysiological measurement help?

I recently attended the Input-Output conference in Brighton, UK. The theme of the conference was “Interdisciplinary approaches to Causality in Engagement, Immersion, and Presence in Performance and Human-Computer Interaction”. I wanted to learn about psychophysiological measurement.

I am myself on a quest: to understand what user engagement is and how to measure it, with a focus on web applications with thousands to millions of users. To this end, I am looking at three measurement approaches: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, mouse tracking); and of course web analytics (dwell time, page views, absence time).

Observational methods include measurement from psychophysiology, a branch of physiology that studies the relationship between physiological processes and thoughts, emotions, and behaviours. Indeed, the body responds to physiological processes: when we exercise, we sweat; when we get embarrassed, our cheeks get red and warm.

Common measurements include:

  • Event-related potentials – the electroencephalogram (EEG) is based on recordings of electrical brain activity measured at the surface of the scalp.
  • Functional magnetic resonance imaging (fMRI) – this technique involves imaging blood oxygenation using an MRI machine.
  • Cardiovascular measures – heart rate (HR); beats per minute (BPM); heart rate variability (HRV).
  • Respiratory sensors – monitor oxygen intake and carbon dioxide output.
  • Electromyographic (EMG) sensors – measure electrical activity in muscles.
  • Pupillometry – measures variations in the diameter of the pupillary aperture of the eye in response to psychophysical and/or psychological stimuli.
  • Galvanic skin response (GSR) – measures perspiration/sweat gland activity, also called Skin Conductance Level (SCL).
  • Temperature sensors – measure changes in blood flow and body temperature.

I learned how these measures are used, why, and some of the outcomes. But I started to wonder. Yes, these measures can help in understanding engagement (and other related phenomena) in extreme cases, for example:

  • patient with a psychiatric disorder (such as depersonalisation disorder),
  • strong emotion caused by an intense experience (a play where the audience is part of the stage, or when on a roller coaster ride), or
  • total immersion (while playing a computer game), which actually goes beyond engagement.

In my work, I am measuring user engagement for the “average” users and experiences; millions of users who visit a news site on a daily basis to consume the latest news. Can these measures tell me something?

Some recent work published in the journal Cyberpsychology, Behavior, and Social Networking explored many of the above measures to study the body responses of 30 healthy subjects during a 3-minute exposure to a slide show of natural panoramas (relaxation condition), their personal social network account (Facebook), and a mathematical task (stress condition). They found differences in the measures depending on the condition. Neither the subjects nor the experiences were “extreme”. However, the experiences were different enough. Can a news portal experiment with three comparably distinct conditions?

Psychophysiological measurement can help in understanding user engagement and other phenomena. But to be able to do so for the average users or experiences, we are likely to need to conduct “large-ish scale” studies to obtain significant insights.

How large-ish? I do not know.

This is in itself an interesting and important question to ask, a question to keep in mind when exploring these types of measurement, as they are still expensive to conduct, cumbersome, and obtrusive. This is a fascinating area to dive into.

Image/photo credits: The Cognitive Neuroimaging Laboratory, and Image Editor and benarent (Creative Commons BY).

Hey Twitter crowd … What else is there?

Original and shorter post at

Twitter is a powerful tool for journalists at multiple stages of the news production process: to detect newsworthy events, interpret them, or verify their factual veracity. In 2011, a poll of 478 journalists from 15 countries found that 47% of them used Twitter as a source of information. Journalists and news editors also use Twitter to contextualize and enrich their articles by examining the responses to them, including comments and opinions as well as pointers to other related news. This is possible because some users on Twitter devote a substantial amount of time and effort to news curation: carefully selecting and filtering news stories highly relevant to specific audiences.

We developed an automatic method that groups together all the users who tweet a particular news item, and later detects new content posted by them that is related to the original news item. We call each group a transient news crowd. The beauty of this, in addition to being fully automatic, is that there is no need to pre-define topics and the crowd becomes available immediately, allowing journalists to cover news beats while incorporating the shifts of interest of their audiences.

Transient news crowds
Figure 1. Detection of follow-up stories related to a published article using the crowd of users that tweeted the article.

Transient news crowds
In our experiments, we define the crowd of a news article as the set of users that tweeted the article within the first 6 hours after it was published. We followed the users in each crowd for one week, recording every public tweet they posted during this period. We used Twitter data around news stories published by two prominent international news portals: BBC News and Al Jazeera English.
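
The crowd definition above can be illustrated with a short Python sketch. The `(user_id, timestamp, url)` tweet format and the function names are assumptions for the example, as is the choice to start the one-week observation period at the article's publication time; the actual data collection may have differed.

```python
CROWD_WINDOW_S = 6 * 3600   # first 6 hours after publication
FOLLOW_PERIOD_S = 7 * 86400  # crowd members are followed for one week


def news_crowd(article_url, published_ts, tweets, window_s=CROWD_WINDOW_S):
    """Users who tweeted the article within the window after publication.

    tweets: iterable of (user_id, timestamp_s, url_or_None) tuples
    (hypothetical format).
    """
    return {
        user_id
        for user_id, ts, url in tweets
        if url == article_url and 0 <= ts - published_ts <= window_s
    }


def crowd_tweets(crowd, tweets, start_ts, period_s=FOLLOW_PERIOD_S):
    """All tweets posted by crowd members during the follow-up period.

    Assumes the period starts at start_ts (e.g. the publication time).
    """
    return [
        (user_id, ts, url)
        for user_id, ts, url in tweets
        if user_id in crowd and start_ts <= ts <= start_ts + period_s
    ]
```

The tweets returned by `crowd_tweets` are the pool of candidate content in which follow-up stories related to the original article would then be searched.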

What did we find?

  • After they tweet a news article, people’s subsequent tweets are correlated with that article for a brief period of time.
  • The correlation is weak but significant, in terms of reflecting the similarity between the articles that originate a crowd.
  • While the majority of crowds simply disperse over time, parts of some crowds come together again around new newsworthy events.

What can we do with the crowd?
Given a news crowd and a time slice, we want to find the articles in a given time slice that are related to the article that created the crowd. To accomplish this, we used a machine learning approach, which we trained on data annotated using crowd sourcing. We experimented with three types of features:

  • frequency-based: how often is an article posted by the crowd compared to other articles?
  • text-based: how similar are the two articles considering the tweets posted about them?
  • user-based: is the crowd focussed on the topic of the article? does it contain influential members?

We find that the features largely complement each other. Some features are always valuable, while others contribute only in some cases. The most important features include the similarity to the original story, as well as measures of how unique the association of the candidate article and its contributing users is to the specific story’s crowd.

Crowd summarisation
We illustrate the outcome of our automatic method with the article Central African rebels advance on capital, posted on Al Jazeera on 28 December, 2012.

Figure 2. Word clouds generated for the crowd on the article “Central African rebels advance on capital”, by considering the terms appearing in stories filtered by our system (top) and on the top stories by frequency (bottom).

Without using our method (in the figure, bottom), we obtain frequently-posted articles which are weakly related or not related at all to the original news article. Using our method (in the figure, top), we observe several follow-up articles to the original one. Four days after the news article was published, several members of the crowd tweeted an article about the fact that the rebels were considering a coalition offer. Seven days after the news article was published, crowd members posted that rebels had stopped advancing towards Bangui, the capital of the Central African Republic.

News crowds allow journalists to automatically track the development of stories. For more details you can check our papers:

  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Transient News Crowds in Social Media. Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), 8-10 July 2013, Cambridge, Massachusetts.
  • Janette Lehmann, Carlos Castillo, Mounia Lalmas and Ethan Zuckerman: Finding News Curators in Twitter. WWW Workshop on Social News On the Web (SNOW), Rio de Janeiro, Brazil.

Janette Lehmann, Universitat Pompeu Fabra
Carlos Castillo, Qatar Computing Research Institute
Mounia Lalmas, Yahoo! Labs Barcelona
Ethan Zuckerman, MIT Center for Civic Media