The Seventh Annual Tom Tom Founders Festival is a week-long celebration of innovators, visionaries, and artists who are shaping small cities. The Festival occurs at dozens of venues throughout downtown Charlottesville.
National statistics on police-involved deaths have been hampered by databases that rely on voluntary reporting by a decentralized system of more than 18,000 independent law enforcement agencies. As part of a redesign of the Arrest-Related Deaths (ARD) program by the Bureau of Justice Statistics (BJS), RTI developed a coding and classification pipeline that applies machine learning techniques to identify deaths from open information sources, including news articles and official reports from state and local law enforcement agencies. This hybrid approach led to an annual estimate of 1,900 arrest-related deaths in the U.S., including 1,200 law enforcement homicides, in line with estimates from the 2015 capture-recapture analysis and other independent media sources (Banks, Ruddle, Kennedy & Planty, 2016). The pipeline built to identify news articles describing arrest-related deaths consists of several sequential computational processes. The first step collects news articles from media monitoring services, keeping those whose title or text contains a word from a list of domain-relevant keywords. Next, articles whose title or text overlaps significantly with that of another article are removed as duplicates. The remaining articles are run through a relevancy classifier, a machine learning model that predicts whether an article describes an arrest-related death. A human coder then makes the final determination of whether an article meets our definition of an arrest-related death and, if necessary, links multiple articles to a specific decedent through a web interface. All told, this machine learning based system typically results in an 87% reduction in the total volume of articles. This project, reported by BJS, has been featured in numerous media outlets, including The Guardian, fivethirtyeight.com (which also included it among the Best Data Stories of 2016), and The Measure of Everyday Life podcast.
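The first two automated steps described above (keyword collection and deduplication) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the keyword list, the use of Jaccard similarity over word n-grams to measure overlap, and the 0.8 threshold are hypothetical and are not taken from the RTI pipeline. The relevancy classifier and the human coding step are left out.

```python
import re

# Hypothetical keyword list; the actual domain keywords are not given in the abstract.
KEYWORDS = {"police", "officer", "deputy", "custody", "arrest"}


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches_keywords(article):
    """Keep an article if its title or text contains any keyword."""
    tokens = set(tokenize(article["title"] + " " + article["text"]))
    return bool(tokens & KEYWORDS)


def shingles(text, size):
    """Return the set of overlapping word n-grams of the given size."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; empty sets never count as a match."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.8):
    """Drop an article if its title OR text overlaps heavily with a kept one.

    Assumed approach: Jaccard similarity on word bigrams (titles) and
    trigrams (text). The threshold is an illustrative value.
    """
    kept = []  # list of (article, title_shingles, text_shingles)
    for article in articles:
        title_s = shingles(article["title"], 2)
        text_s = shingles(article["text"], 3)
        is_duplicate = any(
            jaccard(title_s, kept_title) >= threshold
            or jaccard(text_s, kept_text) >= threshold
            for _, kept_title, kept_text in kept
        )
        if not is_duplicate:
            kept.append((article, title_s, text_s))
    return [article for article, _, _ in kept]


def run_pipeline(articles):
    """Keyword filter followed by deduplication; classifier step omitted."""
    return deduplicate([a for a in articles if matches_keywords(a)])


# Made-up sample articles, for illustration only.
sample = [
    {"title": "Man dies after police shooting in Ohio",
     "text": "A man died Tuesday after an officer shot him during a traffic stop."},
    {"title": "Man dies after police shooting in Ohio",
     "text": "Authorities said a man was killed during a traffic stop on Tuesday."},
    {"title": "City council approves new park budget",
     "text": "The council voted to fund three new parks."},
    {"title": "Deputy charged in jail custody death",
     "text": "A sheriff's deputy was charged after an inmate died last month."},
]

candidates = run_pipeline(sample)
for article in candidates:
    print(article["title"])
```

In this sketch the third sample article is dropped by the keyword filter and the second is dropped as a duplicate of the first because their titles are identical. The surviving articles are what a relevancy classifier, and then a human coder, would go on to review.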
Peter Baumgartner is a data scientist at RTI International, a non-profit research institute. He applies natural language processing, machine learning, and design thinking to build things that help people.