Tutorial: Crowdsourced Data Processing: Industry & Academic Perspectives

  • Presentation Slides

Part 1: Overview & Academic Work (Aditya Parameswaran)

Part 2: Industry & Marketplace Surveys (Adam Marcus)

  • Tutorial Overview

Crowdsourcing and human computation enable organizations to accomplish tasks that are currently not possible through fully automated techniques, or that require more flexibility and scalability than traditional employment relationships can facilitate. In the area of data processing, companies have benefited from crowd workers on platforms such as Amazon’s Mechanical Turk or Upwork to complete tasks as varied as content moderation, web content extraction, entity resolution, and video, audio, or image processing. Simultaneously, academic researchers from diverse areas ranging from the social sciences to computer science have embraced crowdsourcing as a research area, resulting in algorithms and systems that improve crowd work quality, latency, and cost. Given the relative nascence of the field, the academic and practitioner communities have largely operated independently of each other for the past decade, rarely exchanging techniques and experiences.

This tutorial will synthesize and summarize insights from our recently published book, "Crowdsourced Data Management: Industry and Academic Perspectives" (Amazon link). The aim of both the book and the tutorial is to narrow the gap between academics and practitioners.

The tutorial will be presented in two parts:

  • Research. We will summarize the state of the art in crowd-powered algorithms and system design tailored to large-scale data processing. We will provide pointers to best known solutions that practitioners can follow up on in order to implement common data processing tasks.
  • Practice. We will present the results of surveys of 13 industry users (e.g., Google, Facebook, Microsoft) and 4 marketplace providers of crowd work (e.g., CrowdFlower, Upwork) to identify how hundreds of engineers and tens of millions of dollars are invested in various crowdsourcing solutions. We will provide pointers to specific "pain points" that academics can follow up on in order to develop solutions that directly benefit crowdsourcing practice.

Through the tutorial, we will simultaneously introduce academics to real problems that practitioners encounter every day, and provide a survey of the state of the art for practitioners to incorporate into their designs.

  • About the Presenters

Adam Marcus is the Co-Founder and CTO of B12 (previously Unlimited Labs), a company dedicated to the future of how creative and analytical people do work. Prior to that, Adam led the data team at Locu, a startup that was acquired by GoDaddy. He completed his Ph.D. in computer science at MIT in 2012, where his dissertation was on database systems and human computation. Adam is a recipient of the NSF and NDSEG fellowships, and has previously worked at ITA, Google, IBM, and FactSet. In his free time, Adam builds course content to get people excited about data and programming.

Aditya Parameswaran has been an assistant professor at the University of Illinois at Urbana-Champaign since August 2014. He has published extensively on crowd-powered algorithms and systems, with over 20 published papers, including two "best of conference" papers, and has received three outstanding dissertation awards (from the data management and data mining communities, and from Stanford University), all on crowdsourcing. More broadly, he works on interactive data analytics, combining techniques from data management and data mining, with close to 50 published papers. His group is supported with funding from NSF, NIH, Google, and the Siebel Energy Institute.

  • Important Dates

This tutorial will take place on Sunday, October 30, 2016, from 3 to 6 pm.

  • Location

The tutorial will be held at the primary conference venue, the School of Information (see location & directions to the iSchool).


Aditya Parameswaran

University of Illinois at Urbana-Champaign