Data Science and the Unintended Consequences of Communication

Social scientists have long struggled to solve the puzzle of unintended effects – situations in which intentional actions bring about consequences not foreseen or anticipated by any of the actors involved. This puzzle is intriguing in itself, but it also has important policy implications: the success of our interventions in the world depends, in the end, on our ability to anticipate the chain reactions those interventions might trigger. In this talk, I will give an overview of the sociological intuitions around the nature of unintended effects, discuss why they matter, and explain why digital data make those intuitions amenable to empirical testing for the first time.

Sandra González-Bailón
Assistant Professor of Communication
University of Pennsylvania

Large-scale human activity traces for human well-being

The popularity of wearable and mobile devices, including smartphones and smartwatches, has generated an explosion of detailed behavioral data. These massive digital traces provide us with an unparalleled opportunity to realize new types of scientific approaches that provide novel insights about our lives, health, and happiness. However, gaining valuable insights from these data requires new computational approaches that turn weak observational data into strong scientific results and can computationally test domain theories at scale. In this talk, I will describe novel computational methods that leverage digital activity traces at the scale of billions of actions taken by millions of people. These methods combine insights from data mining, social network analysis, and natural language processing to generate actionable insights about our physical and mental well-being.

Jure Leskovec
Associate Professor of Computer Science
Stanford University

The Promise of Algorithmic Policy: Judges, Doctors and Teachers

Over the last few years, my collaborators and I have been applying machine learning and other algorithmic techniques to various social science domains, largely with an eye towards improving policy. Sometimes we build algorithms with an eye towards scalability, attentive to policy constraints. Sometimes we build them with an eye towards answering fundamental scientific questions, so as to reshape how policy makers think. I will review this work and highlight a few points. First, predictive tools (rather than causal ones) are valuable by themselves. Second, new technical challenges arise in these applications, and existing work largely ignores these challenges, thereby producing misleading results. Finally, and most importantly, algorithmic approaches, applied carefully, have the chance both to provide understanding and to improve the lives of many people. I hope this talk serves as an invitation: there are many scientific and policy breakthroughs to be had in this nascent area.

Sendhil Mullainathan
Robert C. Waggoner Professor of Economics
Harvard University

An Artificial and Human Intelligence Approach to the Replication Problem in Science

“The replicability crisis” arose after a supermajority of papers, randomly sampled from top journals, failed replication, and it catalyzed calls for new methodologies for evaluating scientific replicability. Using data on 100 published papers that passed or failed manual replication tests, we trained an AI model using only the papers’ text – the data least attended to by human reviewers. The AI model’s accuracy was subsequently tested on data from four diverse disciplines. The results indicate that our AI model is significantly better at estimating replicability than chance, scientific experts, and prediction markets. Two analyses suggest that the AI model uses data in reviewers’ “blind spots.” AI predictions are closer to the predictions of human expert raters than to those of novices. AI classification patterns are not explained by obvious stylistic disciplinary differences, words such as “remarkable” or “surprising,” or journal formatting. Finally, extending our original AI model by training it on text and features used by expert reviewers, we achieve an AI model with the highest predictive accuracy. These findings suggest that a mind-plus-machine partnership can potentially help address intractable scientific problems.

Brian Uzzi
Richard L. Thomas Professor at the Kellogg School of Management
Northwestern University

Using Advertising Audience Estimates to Improve Global Development Statistics

All online advertising platforms, including Facebook, LinkedIn and Twitter, provide potential advertisers with “audience estimates” of how many of their users match certain targeting criteria. These anonymous, aggregate estimates, which can be obtained free of charge, serve as a real-time digital census across their user inventory. In our research, we use this type of data to fill data gaps in statistics of importance for monitoring global development. Concretely, I’ll show how this type of data can be used to (i) improve statistics on international migration, (ii) track global gender gaps in internet access, and (iii) measure skill-related gender gaps in the US. Apart from showing the promise that these data sources hold, I’ll also discuss the limitations that come with using data generated by proprietary black boxes.

Ingmar Weber
Research Director
Qatar Computing Research Institute