The Seventh Annual Tom Tom Founders Festival is a week-long celebration of innovators, visionaries, and artists who are shaping small cities. The Festival occurs at dozens of venues throughout downtown Charlottesville. 

Thursday, April 12 • 12:00pm - 12:15pm
Rapid NLP Annotation Through Binary Decisions, Pattern Bootstrapping and Active Learning Using Prodigy and spaCy

In this talk, I'll present a fast, flexible and even somewhat fun approach to named entity annotation. Using our approach, a model can be trained for a new entity type in only a few hours, starting from only a feed of unannotated text and a handful of seed terms.

Given the seed terms, we first perform an interactive lexical learning phase, using a semantic similarity model that can be trained from raw text via an algorithm such as word2vec. The similarity model can be made to learn vectors for longer phrases by pre-processing the text, and abstract patterns can be created that reference attributes such as part-of-speech tags. The resulting patterns file is then used to present the annotator with a sequence of candidate phrases, so that annotation can be conducted as a binary choice.

The annotator's eyes remain fixed near the centre of the screen, decisions can be made with a click, swipe or single keypress, and tasks are buffered to prevent delays. Using this interface, annotation rates of 10-30 decisions per minute are common. If the decisions are especially easy (e.g. confirming that instances of a phrase are all valid entities), the rate may be several times faster.

As the annotator accepts or rejects the suggested phrases, the responses are used to start training a statistical model. Predictions from the statistical model are then mixed into the annotation queue. Despite the sparsity of the signal (binary answers on one phrase per sentence), the model begins to learn surprisingly quickly. A global neural network model is used, with beam search to allow a form of noise-contrastive estimation training.

The pattern matcher and entity recognition model are available in our open-source library spaCy, while the interface, task queue and workflow management are implemented in our annotation tool Prodigy.
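
The workflow described in the abstract (seed patterns, binary accept/reject decisions, and a model that learns from those answers and reorders the queue) can be illustrated with a small, self-contained sketch. This is not the Prodigy or spaCy API: it is plain Python with hypothetical names (`find_candidates`, `BinaryModel`, `annotate`, `answer`). Exact token matching stands in for the pattern matcher, a tiny online logistic regression stands in for the neural network with beam search, and the `answer` function simulates the human annotator.

```python
import math
from collections import defaultdict

def find_candidates(sentences, patterns):
    """Yield (sentence, phrase) pairs where a pattern's tokens occur in order."""
    for sent in sentences:
        tokens = sent.lower().split()
        for pat in patterns:
            n = len(pat)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == pat:
                    yield sent, " ".join(pat)

def features(sentence, phrase):
    """Features: the candidate phrase plus the other words in the sentence."""
    phrase_words = phrase.split()
    ctx = [w for w in sentence.lower().split() if w not in phrase_words]
    return ["phrase=" + phrase] + ["ctx=" + w for w in ctx]

class BinaryModel:
    """Tiny online logistic regression (an assumed stand-in for the real model)."""
    def __init__(self, lr=0.5):
        self.w = defaultdict(float)
        self.lr = lr

    def score(self, feats):
        z = sum(self.w[f] for f in feats)
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, accepted):
        # Gradient step on the log-loss for a single accept/reject answer.
        err = (1.0 if accepted else 0.0) - self.score(feats)
        for f in feats:
            self.w[f] += self.lr * err

def annotate(sentences, patterns, answer, model):
    """Ask one binary question per candidate, most uncertain candidate first."""
    queue = list(find_candidates(sentences, patterns))
    decisions = []
    while queue:
        # Active-learning step: prefer candidates with a score closest to 0.5.
        queue.sort(key=lambda c: abs(model.score(features(*c)) - 0.5))
        sent, phrase = queue.pop(0)
        accepted = answer(sent, phrase)
        model.update(features(sent, phrase), accepted)
        decisions.append((sent, phrase, accepted))
    return decisions

# Illustrative data: two seed terms for a hypothetical DRUG entity type.
sentences = [
    "She was prescribed aspirin for the pain",
    "The band Aspirin played last night",
    "He took ibuprofen after the run",
]
patterns = [["aspirin"], ["ibuprofen"]]

def answer(sent, phrase):
    """Simulated annotator: rejects the mention that refers to a band."""
    return "band" not in sent.lower()

model = BinaryModel()
decisions = annotate(sentences, patterns, answer, model)
for sent, phrase, accepted in decisions:
    print("ACCEPT" if accepted else "REJECT", phrase, "|", sent)
```

Even with three answers, the toy model scores the medical use of "aspirin" above the band name, because the context words carry the signal. In a real setup the patterns could also match on token attributes such as part-of-speech tags, which this sketch does not attempt.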

Speakers
Ines Montani

Founder, Explosion AI
Ines is a developer specializing in applications for AI technology. She's the co-founder of Explosion AI and a core developer of spaCy, the leading open-source library for Natural Language Processing in Python, and Prodigy, an annotation tool for radically efficient machine teachi...

Sponsors
Capital One

Capital One is a diversified bank that offers a broad array of financial products and services to consumers, small businesses and commercial clients.
S&P Global

S&P Global Inc. (prior to April 2016 McGraw Hill Financial, Inc., and prior to 2013 McGraw Hill Companies) is an American publicly traded corporation headquartered in New York City. Its primary areas of business are financial information and analytics. It is the parent company of...



