Focusing the macroscope:
how we can use data to understand behavior
Joana Gonçalves de Sá
Individual decisions can have a large impact on society as a whole. This is obvious for political decisions, but still true for small, daily decisions made by common citizens. Individuals decide how to vote, whether to stay at home when they feel sick, and whether to drive or take the bus. In isolation, these individual decisions have a negligible social outcome, but collectively they determine the results of an election and the start of an epidemic. For many years, studying these processes was limited to observing the outcomes or to analyzing small samples. New data sources and data analysis tools have created a “macroscope” and made it possible to start studying the behavior of large numbers of individuals, enabling the emergence of large-scale quantitative social research. At the Data Science and Policy (DS&P) research group we are interested in understanding these decision-making events, expecting that this deeper knowledge will lead to a better understanding of human nature, and to improved public decisions. During the talk I will offer some examples of how we can use this macroscope to study psychology and human behavior. At the end, recognizing that these tools might also have a very negative impact on society, I will present new ideas in distributed computing and how they can help us protect privacy.
Joana Gonçalves de Sá is an Associate Professor at Nova School of Business and Economics, Universidade Nova de Lisboa and the leader of the Data Science and Policy research group. Before that, she was a Principal Investigator at the Instituto Gulbenkian de Ciência (IGC), Portugal, where she remains as the Coordinator of the Science for Society Initiative and as the Director of the Graduate Program Science for Development (PGCD), aiming at improving science in Africa.
Her current research uses data analytics and machine learning to study complex problems at the interface between Biomedicine, Computation, Policy, Social Sciences, and Mathematics. These include epidemiology, critical thinking, network dynamics, political discourse, and their applications to human behavior, with a strong ethical and societal focus. She is also the President of the General Assembly of the Citizens Forum, an NGO that aims to improve the quality of democratic discussion through citizen assemblies.
Joana has a degree in Physics Engineering from Instituto Superior Técnico – University of Lisbon, and a PhD in Systems Biology from NOVA – ITQB, having developed her thesis at Harvard University, USA. In 2019, she was the recipient of an ERC Starting Grant to study human behavior using the online spread of “fake news” as a model system.
Task-Based Intelligent Retrieval and Recommendation
(Karen Spärck Jones Award Keynote)
While the act of looking for information happens within the context of a task on the user's side, most search and recommendation systems focus on user actions (‘what’), ignoring the nature of the task that covers the process (‘how’) and user intent (‘why’). Scholars have long argued that IR systems should help users accomplish their tasks and not just fulfill a search request. But just as keywords have been good enough approximators for information need, satisfying a set of search requests has been deemed good enough to address the task. However, with changing user behaviors and search modalities, particularly in conversational interfaces, the challenge and opportunity to focus on task have become critically important and central to IR. In this talk, I will discuss some of the key ideas and recent works, both theoretical and empirical, that study and support aspects of task. I will show how we could derive a user’s search path or strategy and intentions, and how they could be instrumental in not only creating more personalized search and recommendation solutions, but also solving problems not possible otherwise. Finally, I will extend this to the realm of intelligent assistants with our recent work in a new area called Information Fostering, where our knowledge of the user and the task can help us address another classical problem in IR: people don’t know what they don’t know.
Chirag Shah is an Associate Professor in the Information School (iSchool) at the University of Washington (UW) in Seattle. Before UW, he was a faculty member at Rutgers University. His research interests include studies of interactive information retrieval/seeking, trying to understand the task a person is doing and providing proactive recommendations.
Dr. Shah received his MS in Computer Science from the University of Massachusetts (UMass) at Amherst, and his PhD in Information Science from the University of North Carolina (UNC) at Chapel Hill.
He directs the InfoSeeking Lab, where he investigates issues related to information seeking, human-computer interaction (HCI), and fairness in machine learning, supported by grants from the National Science Foundation (NSF), the National Institutes of Health (NIH), the Institute of Museum and Library Services (IMLS), Amazon, Google, and Yahoo.
Better Representations for Search Tasks
Neural models are having a major impact on Information Retrieval, much as they have had recently on other language technologies. Neural language models and continuous term representations provide new and more effective paths to overcoming vocabulary mismatch, probability estimation, and other core problems in Information Retrieval. Some classic Natural Language Processing tasks are now treated as text similarity problems, and techniques developed for NLP are being applied to classic IR problems, which reduces some of the past differences between IR and NLP. Everything uses machine learning. This technology shift is a good time to think about what is unique and distinct about Information Retrieval as a field compared to neighboring fields.
From its earliest days, Information Retrieval has studied document collections, information seekers, and information seeking tasks. These topics are embedded deeply in our experimental methodology and how we think about research problems. Neighboring fields focus more attention and computational effort on understanding individual documents, and less on how individual documents should be understood in the context of specific people, tasks, and collections.
This talk describes several recent research activities at CMU’s Language Technologies Institute. Although each has a different focus, the unifying theme is using knowledge of the search task, context, or corpus to develop more effective representations and models. We find that neural techniques offer new tools for understanding and modeling these core elements of search, in some cases reinvigorating research in stable areas and challenging old assumptions, but do not reduce their importance.
Jamie Callan is a Professor at the Language Technologies Institute, a graduate department in Carnegie Mellon’s School of Computer Science. He has a joint appointment in the School of Information Systems and Management within CMU’s Heinz College.
His research and teaching focus on text-based information retrieval and analysis. His recent work develops advanced search engine architectures, neural search algorithms, use of semi-structured knowledge for open domain search, conversational information seeking, and large-scale distributed search. Jamie has published more than 200 research papers on these and related subjects.
Jamie’s scientific service includes annual participation on the program committees of the major information retrieval conferences. He is a past Treasurer and past Chair of SIGIR, the international professional society for Information Retrieval research, a co-founding Editor-in-Chief of Foundations and Trends in Information Retrieval, and a past Editor-in-Chief of ACM’s Transactions on Information Systems (TOIS).