Natural Language Processing in NJ: September 2012

Monday, September 24, 2012

Amanda Stent: InteractiveX: Generating multimedia summaries of spatio-temporal data sets

Amanda Stent will give a talk entitled "InteractiveX: Generating multimedia summaries of spatio-temporal data sets".

Date: October 1, 2012

Time: 2:30pm

Location: ETS, Anrig Hall, Room P-016 (directions | campus map)

VISITORS TO ETS: Please contact Joel Tetreault (jtetreault at ETS dot org) for security and arrival information.

ABSTRACT:

Organizations and individuals increasingly have to deal with large to very large data sets that include spatio-temporal information, such as network traffic data, credit card records, wildlife tracking data, exam scores by time and place, and even data from social media such as Twitter. We have access to sophisticated statistical and visualization tools for analyzing these data sets. However, the output from these tools is frequently understandable only by experts -- and even experts can start to suffer from information overload. We have designed a system, InteractiveX, that guides users to understand the meaning of large spatio-temporal data sets through automatic creation of interactive multimedia explanations that combine text, graphics and the results of data analysis. In this talk, we will first outline the challenges of this task. We will then present the architecture of our system, demonstrate some user interfaces to our system, and describe some recent research results from this work.

BIO:

Dr. Amanda Stent works on spoken dialog, natural language generation and assistive technology. She is currently a Principal Member of Technical Staff at AT&T Labs - Research in Florham Park, NJ and was previously an associate professor in the Computer Science Department at Stony Brook University in Stony Brook, NY. She holds a PhD in computer science from the University of Rochester. She has authored over 70 papers on natural language processing and holds several patents. She is VP of the ACL/ISCA Special Interest Group on Discourse and Dialog and one of the rotating editors of the journal Dialogue and Discourse.

Friday, September 21, 2012

Biemann Slides and Site Features

The slides from the Chris Biemann talk at ETS on September 20, titled "Text: Now in 2D -- Lexical Expansion Using Contextual Similarity", are now available to view and download.

We will host an archive of materials from past talks here. They will be found to the right under the heading "Links to Materials". Other recent additions to the site include a calendar and a list of upcoming events. The calendar can be found at the bottom of the page, and will be updated with any events posted here. For a quick view of upcoming events, check under the heading "Upcoming NLP Events" at the top of the column to the right. Any suggestions for further improvements and/or announcements are welcome!

Thursday, September 6, 2012

Chris Biemann: Text: Now in 2D — Lexical Expansion using Contextual Similarity

Chris Biemann will present a talk titled “Text: Now in 2D — Lexical Expansion using Contextual Similarity”.

Date: September 20, 2012
Time: 11:00am
Location: ETS, Conant Hall, Lounge A (directions | campus map).

ABSTRACT:

This talk introduces the metaphor of two-dimensional text. Starting from very basic concepts of structural linguistics, we define lexical expansion mechanisms that generate, for each term in context, a weighted list of possible expansions. While the mechanism is left unspecified by the metaphor, we use distributional similarity as a source for all-words unsupervised lexical expansion. Handling word sense ambiguity in the expansion mechanism will be discussed from two angles: either a contextualized method can rank similar terms of the correct sense higher, or we can use word sense induction clustering to aggregate over common features of the potential expansions.

This new representation has been successfully used in tasks like semantic text similarity and knowledge-based all-words word sense disambiguation. The key element of this representation and the method that computes it is that it can bridge lexical gaps and align passages that bear the same meaning without using the same words. Thus, it can be used as a foundational technology for passage scoring, answer scoring, and essay grading.
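As a rough illustration of the lexical expansion idea in the abstract above, here is a minimal Python sketch that builds co-occurrence vectors from a toy corpus and returns a weighted list of expansions for a term, ranked by cosine similarity. It is not the speaker's method: the corpus, the window size, and the function names (context_vectors, cosine, expand) are all assumptions made for this example, and the sketch is not contextualized and does no word sense induction.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus invented for this example; a real system would use far more text.
TOY_CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(term, vectors, top_n=3):
    """Return up to `top_n` (word, weight) pairs most similar to `term`."""
    if term not in vectors:
        return []
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored = [(word, weight) for word, weight in scored if weight > 0]
    # Highest weight first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    vectors = context_vectors(TOY_CORPUS)
    for word, weight in expand("cat", vectors):
        print(f"{word}\t{weight:.3f}")
```

On this toy corpus "dog" ranks first for "cat" because the two words share most of their contexts. The lower-ranked candidates show how noisy raw co-occurrence counts are, which is one reason a practical system needs better weighting and sense handling.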

BIO:

Chris Biemann holds an MA and a doctorate from the University of Leipzig, Germany. After a postdoc at the semantic search start-up Powerset and subsequently at the Microsoft Bing search engine, Chris became an assistant professor for language technology at the Technische Universität Darmstadt last year. His main research interests span unsupervised, knowledge-free acquisition, graph-based representations and algorithms, crowdsourcing, and big data for NLP applications. Currently, Chris is a visiting researcher at the IBM Watson Research Lab in Hawthorne, NY, working with the Watson DeepQA team.

David Kaufer: Tools for Building Social Communities Around Texts and Text Analysis

David Kaufer will present a talk titled "Tools for Building Social Communities Around Texts and Text Analysis".

Date: September 13, 2012
Time: 1:30pm
Location: ETS, Conant Hall, Lounge A (directions | campus map).

ABSTRACT:

This talk will present two technologies used at Carnegie Mellon in the English department for creating social communities around texts (Classroom Salon) and text analysis (DocuScope). Classroom Salon (www.classroomsalon.org) is a web-based tool used to support writing and content classrooms. Classroom Salon supports both anchored and global annotation of text and, in humanities classrooms, allows teachers and students to create classroom discussions around text before the class meets. This allows teachers to monitor each student's participation and depth of reading. In science classrooms, Classroom Salon has been used to assess students' comprehension of difficult concepts. It is being tested at the University of Wisconsin-Milwaukee with a Gates Foundation Grant. Preliminary results show science teachers like it because it helps them gauge students' understanding of the material and adapt the material accordingly.

DocuScope (http://www.cmu.edu/hss/english/research/docuscope.html) is a stand-alone Java application that consists of large string-based dictionaries of English rhetorical patterns developed over a decade of inspection of texts. These patterns have been shown to accurately classify major genres of written English. They have been used to understand precisely how one genre of English differs from another (letters vs. reminiscences) or the variation that takes place within a single genre (the different rhetorical strategies that can define a reminiscence). In this talk, I'll review the main breakdown of the dictionaries. The dictionaries were first developed to support a course in comparative genres of English, and I will also discuss some educational applications.
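To make the idea of a string-based pattern dictionary concrete, here is a small Python sketch that counts how often the phrases in each category occur in a text. It does not reproduce DocuScope, which is a Java application with its own dictionaries: the categories, the phrases, and the function name count_categories are hypothetical and invented purely for illustration.

```python
import re
from collections import Counter

# Hypothetical categories and phrases, invented for this example.
# They are NOT taken from the actual DocuScope dictionaries.
TOY_DICTIONARY = {
    "first_person": ["i remember", "i think", "my own"],
    "reporting": ["according to", "said that"],
}


def count_categories(text, dictionary):
    """Count whole-phrase matches per category in `text` (case-insensitive)."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    counts = Counter()
    for category, phrases in dictionary.items():
        for phrase in phrases:
            # Word boundaries keep "i think" from matching inside "hi thinker".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[category] += len(re.findall(pattern, normalized))
    return counts


if __name__ == "__main__":
    sample = "I remember the old house. According to my sister, I think it was blue."
    for category, count in count_categories(sample, TOY_DICTIONARY).items():
        print(category, count)
```

Per-category counts like these could serve as features for comparing genres, although the sketch makes no claim about how DocuScope itself performs classification.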

BIO:

David Kaufer is Professor of English at Carnegie Mellon. From 1994 to 2009, he was the Head of the Department. He serves on the Executive Board of the Rhetoric Society of America. He is the lead author of five books and co-author of two more. He is the author of over 100 refereed articles in the fields of text-based rhetorical analysis, rhetorical theory, and written composition.

Welcome!

Welcome to the NJ-NLP blog! This site will be used as a hub for sharing information about Natural Language Processing (NLP) events occurring in New Jersey. We will announce events and talks related to NLP, including location, time, topic/title, abstract and any other pertinent information.

The goal of this site is twofold. First, it will serve as a sort of calendar of NLP events for New Jersey. Second, it will provide an overview of the types of work being done at the various institutions where NLP research is undertaken throughout the state. The blog is an open-ended idea at this time, so it may evolve to include a broader range of posts. Anything that encourages a community of collaboration and the exchange of ideas is welcome.