Wharton Customer Analytics
(Closed for Submission)
Collaborative Research Opportunity with Electronic Arts
Unlike past WCA programs, which focused primarily on providing access to data, this program will put researchers in direct collaboration with professionals at the company. This close working relationship will allow researchers to benefit from the prior experience of the company’s data science team, work with them to identify the ideal data assets to support the project, and quickly resolve issues as they arise. It may also offer the opportunity to prospectively test new strategies and algorithms live in EA’s systems. While these projects will involve regular meetings with EA, they are not intended to be consulting projects. Both WCA and EA are interested in developing innovative, implementable methods and publishing papers co-authored by the researchers and EA data scientists.
The corporate partner seeks proposals in two areas: multi-touch attribution and product recommendations. The attribution project will focus on determining advertising response from the type of clickstream data typically available to an advertiser. The product recommendation project will focus on a subscription service where users have unlimited access to games. The goal is to develop a tool that will suggest new games to users to keep them engaged in the platform. More details for both projects are included in the project briefs below. EA seeks novel solutions and is open to innovative approaches to these problems that are scalable and will improve business outcomes.
Multi-Touch Attribution for Marketing Campaigns
As data scientists, we help marketing teams understand which online media (e.g. paid social media, YouTube, programmatic and fixed display) drive more sales and help them decide how to spend their advertising budget. The last-touch attribution model is still prevalent in gaming, despite the fact that it inappropriately credits media channels that tend to occur later in the purchase funnel. These are not necessarily the media that influence a consumer’s decision to purchase a game. To determine the marginal and causal relationships between ads and conversion, we have developed a multi-touch attribution model to quantify each channel’s impact. We are particularly interested in how ad exposures prior to releasing a game affect conversions just after the release. We have a model that is providing answers for us today, yet several challenges remain, and we seek scalable, implementable approaches to these (and other) problems.
Causality. While we may be tired of hearing that “correlation does not indicate causality,” this remains a critical issue in attribution. What is the best way to determine the causal effect of advertising from our historical data on media exposures and conversions? We seek solutions that will work within our data environment, where the data available for propensity matching is somewhat limited. Information about how users were targeted may be difficult to obtain. We also recognize that the best way to measure the lift due to advertising is a randomized A/B test. Are there suggestions on how to design a test to determine the marginal contributions of ads in a multi-media environment?
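To make the observational side of this question concrete, the sketch below simulates confounded exposure data and compares a naive lift estimate with a 1-nearest-neighbor covariate-matching estimate of the effect on the treated. Everything here is a synthetic assumption (the covariates, the exposure mechanism, and the 0.10 true lift), not EA data or EA's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated media data: covariates x, ad exposure t (confounded by x[:, 0]),
# and conversion y with a true incremental lift of about 0.10.
n = 2000
x = rng.normal(size=(n, 3))
t = (x[:, 0] + rng.normal(size=n) > 0).astype(int)
base_rate = 1 / (1 + np.exp(-(x[:, 0] - 1)))
y = (rng.random(n) < np.clip(base_rate + 0.10 * t, 0, 1)).astype(int)

# Naive difference in conversion rates is inflated by the confounder.
naive = y[t == 1].mean() - y[t == 0].mean()

# 1-nearest-neighbor covariate matching: pair each exposed user with the
# most similar unexposed user and average the outcome differences (ATT).
exposed = np.where(t == 1)[0]
control = np.where(t == 0)[0]
d = ((x[exposed][:, None, :] - x[control][None, :, :]) ** 2).sum(axis=2)
match = control[d.argmin(axis=1)]
att = (y[exposed] - y[match]).mean()
print(f"naive lift:   {naive:.3f}")
print(f"matched lift: {att:.3f}")
```

The gap between the two numbers is exactly the confounding that the brief warns about; with fewer covariates available for matching, the matched estimate itself would retain more of that bias.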
Unrecognized players. Like most advertisers, we track online media exposures with cookies, but it is difficult to determine conversions for these cookies. Many of our customers purchase games in a store or through their game console, and these conversions are not linked to the cookies that track the online ads. We can only link a cookie to a conversion if the user is observed logging into an EA website on the device where they see the ads. When this happens, the user’s cookie ID can be linked to their EA user ID, and the conversion can be tracked when the user registers the game (regardless of where they purchased it). For the users where we can make this link, we have a rich set of covariates to use in our multi-touch attribution model, such as past playing history and past response to ads served within games. Our inference about ad response is largely based on these “recognized” customers. For the remaining “unrecognized” players, we can only track media impressions but not conversion. Is there a good way to predict an unrecognized user’s conversion and estimate the impact of ads on those players? A possible solution is to identify a group of recognized players who can represent the unrecognized population through propensity matching on media consumption. However, assuming that recognized players with the same media consumption distribution as the unrecognized players also share the same pattern of conversion may be unreasonable and introduce bias.
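A minimal version of the matching idea described above, on entirely hypothetical impression data: impute each unrecognized cookie's conversion probability from its nearest recognized players, where similarity is measured only on observed media consumption. The channel mix, conversion mechanism, and neighbor count are all illustrative assumptions, and the sketch inherits the bias the paragraph warns about whenever conversion depends on anything beyond the impression vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated impression counts per channel (social, video, display).
recognized = rng.poisson([3.0, 2.0, 1.0], size=(500, 3))
conv = (rng.random(500) < 0.05 + 0.03 * recognized[:, 0]).astype(int)
unrecognized = rng.poisson([3.0, 2.0, 1.0], size=(200, 3))

# Impute each unrecognized cookie's conversion probability as the mean
# conversion rate of its k most similar recognized players, where
# similarity is squared distance between impression vectors.
k = 25
d = ((unrecognized[:, None, :] - recognized[None, :, :]) ** 2).sum(axis=2)
neighbors = d.argsort(axis=1)[:, :k]
imputed = conv[neighbors].mean(axis=1)
print(f"mean imputed conversion rate: {imputed.mean():.3f}")
```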
Zero impression players. The media platforms we purchase ads from provide us with data on those cookies that have seen one of our ads at least once. However, we want to calculate the conversion lift relative to no impressions, so we need to estimate conversions for the users who did not see any ads at all. Our approach has been to estimate a model using only the players who have seen at least one ad on at least one medium, and then draw counterfactual inferences by varying the number of impressions while holding players’ other features constant in the model. Is there a more accurate way to get the baseline for converters who didn’t see any ads, when we don’t actually observe those users? We do have data from our internal systems on players who converted organically (without any recorded ad exposures), but we do not currently use them in estimating the model because we are not completely confident that they have not seen any ads (e.g. due to cookie deletion and the unrecognized player problem above). We have no way to identify a set of potential players who were not exposed to ads and who did not convert.
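The counterfactual step described here can be sketched as follows: fit a conversion model on an exposed-only sample, then re-score the same players with impressions forced to zero. The logistic specification, the single player feature, and all coefficients are synthetic assumptions standing in for the production model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated exposed-only sample: impression counts k >= 1 and one
# player feature f; conversion probability rises with both.
n = 5000
k = rng.poisson(3, n) + 1
f = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-2.0 + 0.3 * k + 0.5 * f)))
y = (rng.random(n) < p_true).astype(int)

# Fit a logistic model by plain gradient descent (no external fitter).
X = np.column_stack([np.ones(n), k, f])
w = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

# Counterfactual baseline: score the same players with impressions set
# to zero, holding their other features constant.
X0 = X.copy()
X0[:, 1] = 0
baseline = (1 / (1 + np.exp(-X0 @ w))).mean()
lift = y.mean() - baseline
print(f"observed conversion rate:   {y.mean():.3f}")
print(f"zero-impression baseline:   {baseline:.3f}")
print(f"estimated incremental lift: {lift:.3f}")
```

The weakness the brief identifies shows up here directly: the zero-impression prediction is an extrapolation outside the support of the training data, since no player with k = 0 is ever observed.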
Sequence of the media campaign. We are very interested in how the timing and sequence of media exposures affect conversion. In our preliminary analysis we have not observed a dominant sequence of ads among those players that convert. Are there clever ways to incorporate sequence or path into our attribution model? If sequence does matter, can we quantify the incremental impact of every ad on an individual level (based on their prior exposures) and use that to decide which cookies to target on each media? Can we optimize the timing of ad spending across media? As described above, the true media consumption journey of a player may only be partially tracked, due to users who delete cookies or use multiple devices.
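One widely used way to fold path information into attribution is a first-order Markov model with removal effects: estimate the chain's conversion probability from observed journeys, then re-estimate it with one channel removed and credit that channel with the drop. A minimal sketch on six hypothetical journeys (the paths, channels, and first-order assumption are all illustrative):

```python
# Hypothetical (path, converted) journeys across three channels.
paths = [
    (["social", "display"], True),
    (["display"], False),
    (["video", "social", "display"], True),
    (["video"], False),
    (["social"], True),
    (["display", "social"], False),
]

def transitions(paths, drop=None):
    """First-order transition counts; dropping a channel reroutes any
    journey that touches it to the non-converting 'null' state."""
    counts = {}
    for steps, converted in paths:
        if drop is not None and drop in steps:
            seq = ["start"] + steps[: steps.index(drop)] + ["null"]
        else:
            seq = ["start"] + list(steps) + ["conv" if converted else "null"]
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def conv_prob(counts):
    """Absorption probability of 'conv' from 'start', solved by iterating
    p(s) = sum_t P(s -> t) * p(t) with p(conv) = 1 and p(null) = 0."""
    states = {a for a, _ in counts} | {b for _, b in counts}
    totals = {}
    for (a, _), c in counts.items():
        totals[a] = totals.get(a, 0) + c
    p = {s: 0.0 for s in states}
    p["conv"] = 1.0
    for _ in range(200):
        for s in states - {"conv", "null"}:
            p[s] = sum(c / totals[s] * p.get(b, 0.0)
                       for (a, b), c in counts.items() if a == s)
    return p["start"]

base = conv_prob(transitions(paths))
for channel in ["social", "display", "video"]:
    effect = base - conv_prob(transitions(paths, drop=channel))
    print(f"removal effect of {channel}: {effect:.3f}")
```

Because the chain is first-order, it captures which channel follows which but not longer-range timing; the partial-tracking problem noted above would enter as missing links in the observed paths.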
External factors. Besides player game history or game features, the purchase of a game can also be influenced by news, social media trends and big events. If these external factors are not accounted for when predicting conversions, lift will be attributed solely to media consumption; in effect, we are assuming that media campaigns never have a negative impact on player conversions. One interesting question is how we can quantify the impact of external news and events on conversions.
Subscription Game Recommendation
Game companies are moving towards providing subscription services. As with Netflix, subscribers gain access to a suite of games to play as much as they want. As content on these services increases, it becomes more important to match players with games that they will like. This not only increases the value of the service, but also provides a personalized experience for subscribers. We are looking for ways to achieve better recommendations using innovative methods that are powerful, yet efficient enough to scale to tens of millions of users.
The data available to drive these recommendations includes a rich set of game attributes (e.g. genre, franchise) as well as records of which games each subscriber has played in the past and for how long. There may also be records on which games the player has played outside the subscription service. While we are looking for an approach that is useful in our specific gaming context, we fully expect researchers to develop methods that would be valuable across many industries.
Researchers can propose any tools and methods, but since the work will be implemented, computational efficiency and scalability are important. We can’t implement models where the model complexity and compute time explode with an increasing user base or game catalog. Approaches that partition the user base into smaller sets and then use more computationally intensive methods are acceptable, but we must be able to provide personalization at scale.
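One baseline with the scaling profile described here is matrix factorization fit by alternating least squares: each update reduces to a small k x k solve per user or game, so cost grows linearly with the user base and catalog. A toy sketch on a hypothetical 4 x 4 play-time matrix (the hours, latent dimension, and penalty are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical play-time matrix: rows = subscribers, cols = games,
# entries = hours played (0 = not yet played).
R = np.array([
    [12., 1., 0., 0.],
    [10., 0., 6., 0.],
    [ 0., 8., 0., 7.],
    [ 1., 9., 0., 6.],
])
n_users, n_games = R.shape
k, lam = 2, 0.1  # latent dimension and ridge penalty

# Alternating least squares: each sweep is a k x k solve, so compute
# scales linearly in users and games rather than exploding.
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_games, k))
for _ in range(20):
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

scores = U @ V.T
user = 0
unplayed = np.where(R[user] == 0)[0]
rec = unplayed[scores[user, unplayed].argmax()]
print("recommended game index for user 0:", rec)
```

The partition-then-personalize idea mentioned above composes naturally with this: cluster users first, then run a heavier model within each partition.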
We list a few of our ideas for innovation below, but we are open to other ideas and suggestions for innovative approaches to this problem.
Better game attributes. Beyond just a simple user-item dataset, we would like to explore feature engineering extra variables to describe perception of games over time from public sources of data. Are there novel ways to extract variables describing games from publicly available spaces such as Wikipedia, Metacritic, Twitter, Instagram, etc. to enrich our data about the games? Do perceptions of games change over time (e.g. before and after launch) and should this be included in our recommendation models? The features may be textual (topic modeling, doc2vec, sentiment), temporal (Twitter volume, game’s subreddit subscribers, sentiment over time), or any other useful signal. Our hope is that this will allow the recommendation system to react to major events or changes in opinions over time.
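As a small illustration of the temporal-feature idea, the sketch below turns hypothetical public posts about a game into per-window volume and net lexicon sentiment, split around a launch date. The posts, word lists, and dates are all invented; a real pipeline would substitute a proper sentiment model and actual public data:

```python
import re
from datetime import date

# Hypothetical public posts about a game (date, text) -- illustrative only.
posts = [
    (date(2024, 3, 1), "trailer looks amazing, hyped"),
    (date(2024, 3, 2), "worried about the bugs in the beta"),
    (date(2024, 3, 15), "launch build is great, servers stable"),
    (date(2024, 3, 16), "great gameplay but matchmaking is broken"),
]
POSITIVE = {"amazing", "hyped", "great", "stable"}
NEGATIVE = {"worried", "bugs", "broken"}
LAUNCH = date(2024, 3, 14)

def launch_window_signal(posts):
    """Post volume and mean net lexicon sentiment, split pre/post launch."""
    totals = {}
    for day, text in posts:
        window = "post" if day >= LAUNCH else "pre"
        words = re.findall(r"[a-z']+", text.lower())
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        vol, sent = totals.get(window, (0, 0))
        totals[window] = (vol + 1, sent + score)
    return {w: (vol, sent / vol) for w, (vol, sent) in totals.items()}

signal = launch_window_signal(posts)
print(signal)
```

The resulting (volume, sentiment) pairs per window are exactly the kind of time-varying game attribute that could sit alongside genre and franchise in the recommendation model.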
New versus experienced players. We are looking for solutions that are applicable to the entire player base. Experienced players provide us with rich data, making it easier to predict their preferences, while the data is always “thin” for new players. This means that there very well may be multiple models – one for newcomers to the service and another for veterans that leverages their history.
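The two-model idea can be sketched as a simple router: users with thin histories fall back to global popularity, while users with rich histories get a model that leverages what they have played. The game names, genres, popularity scores, and history threshold below are all hypothetical placeholders:

```python
# Hypothetical play histories and catalog metadata (all values illustrative).
histories = {
    "u1": ["fifa", "madden", "nhl", "pga", "ufc"],  # veteran sports fan
    "u2": ["apex"],                                  # newcomer
}
CATALOG = ["fifa", "madden", "apex", "bf2042", "sims", "nba"]
POPULARITY = {"fifa": 90, "apex": 85, "nba": 70, "bf2042": 60,
              "madden": 55, "sims": 40}
GENRE = {"fifa": "sports", "madden": "sports", "nhl": "sports",
         "pga": "sports", "ufc": "sports", "nba": "sports",
         "apex": "shooter", "bf2042": "shooter", "sims": "sim"}
MIN_HISTORY = 3  # threshold separating newcomers from veterans

def recommend(user):
    played = set(histories.get(user, []))
    candidates = [g for g in CATALOG if g not in played]
    if len(played) < MIN_HISTORY:
        # Thin history: fall back to a global-popularity model.
        return max(candidates, key=lambda g: POPULARITY[g])
    # Rich history: prefer unplayed games in genres the user already
    # plays, breaking ties by popularity.
    liked = {GENRE[g] for g in played}
    return max(candidates, key=lambda g: (GENRE[g] in liked, POPULARITY[g]))

print(recommend("u1"))  # veteran path
print(recommend("u2"))  # newcomer path
```

A hard threshold is the simplest router; a production system might instead blend the two models with a weight that grows with history length, so recommendations shift smoothly as a newcomer becomes a veteran.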