Shay Cohen will present a talk at Rutgers titled "Consistent and Efficient Algorithms for Latent-Variable PCFGs"
Date: Friday, November 2
Time: 11:00am
Location: Rutgers School of Communication and Information, 4 Huntington St., New Brunswick, NJ (Faculty Lounge, Room 323) (map)
ABSTRACT:
In the past few years, there has been increased interest in the machine
learning community in spectral algorithms for estimating models with latent variables. Examples include
algorithms for estimating mixtures of Gaussians or the parameters of a hidden Markov model.
Until the introduction of spectral algorithms, the EM algorithm had been the
mainstay for estimation with latent variables. Still, with EM there is no guarantee of convergence to the
global maximum of the likelihood function, and therefore EM generally does not provide consistent estimates for
the model parameters. Spectral algorithms, on the other hand, are often shown to be consistent.
In this talk, I will present a spectral algorithm for
latent-variable PCFGs, a model widely used for parsing in the NLP community. This model augments the nonterminals in
a PCFG with latent states. These latent states refine the nonterminal category in order to capture subtle
syntactic nuances in the data. This model has been successfully implemented in state-of-the-art parsers such
as the Berkeley parser.
The algorithm developed is considerably faster than EM, as it makes only one
pass over the data. Statistics are collected from the data in this pass, and singular value decomposition is
performed on matrices containing these statistics. Our algorithm is also provably consistent in the sense that, given
enough samples, it will estimate probabilities for test trees close to their true probabilities under the
latent-variable PCFG model.
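For readers new to the spectral approach, the general pattern described above (one pass over the data to collect statistics, then a singular value decomposition of the resulting matrix) can be sketched roughly as follows. This is not the algorithm from the talk: the (label, word) pairs are made-up toy data, the function names are hypothetical, and power iteration recovers only the largest singular triplet rather than a full SVD.

```python
import math
from collections import Counter


def cooccurrence_matrix(pairs, rows, cols):
    """Collect relative co-occurrence frequencies in a single pass over the data."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return [[counts[(r, c)] / total for c in cols] for r in rows]


def top_singular_triplet(matrix, iterations=200):
    """Largest singular value and vectors via power iteration on M^T M.

    Assumes the matrix is non-empty and not all zeros.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = M v
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = M^T u, then renormalise to get the next estimate of v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical toy data: (nonterminal label, word) observations.
toy_pairs = [("NP", "the"), ("NP", "the"), ("NP", "a"),
             ("VP", "the"), ("VP", "the"), ("VP", "a")]
stats = cooccurrence_matrix(toy_pairs, rows=["NP", "VP"], cols=["the", "a"])
sigma, u, v = top_singular_triplet(stats)
print(round(sigma, 4), [round(x, 4) for x in v])
```

In practice one would call a library SVD routine and keep several singular vectors; the pure-Python version here only keeps the sketch self-contained.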
If time permits, I will also present a method aimed at improving the efficiency
of parsing with latent-variable PCFGs. This method relies on tensor decomposition of the latent-variable PCFG. This
tensor decomposition is approximate, and therefore the new parser is an approximate parser as well. Still, the
quality of approximation can be theoretically guaranteed by inspecting how errors from the approximation
propagate in the parse trees.
BIO:
Shay Cohen is a postdoctoral research scientist in the Department of Computer Science at Columbia University. He is currently a Computing Innovation Fellow (NSF/CRA). He received his B.Sc. and M.Sc. from Tel Aviv University in 2000 and 2004, and his Ph.D. from Carnegie Mellon University in 2011. His research interests span a range of topics in natural language processing and machine learning. He is especially interested in developing algorithms and methods for the use of probabilistic grammars.
Natural Language Processing in NJ
A source for information about NLP events in New Jersey
Thursday, October 25, 2012
Monday, September 24, 2012
Amanda Stent: InteractiveX: Generating multimedia summaries of spatio-temporal data sets
Amanda Stent will give a talk entitled "InteractiveX:
Generating multimedia summaries of spatio-temporal data sets"
Date: October 1, 2012
Time: 2:30pm
Location: ETS, Anrig Hall, Room P-016 (directions | campus map)
VISITORS TO ETS: Please contact Joel Tetreault (jtetreault at ETS dot org) for security and arrival information.
ABSTRACT:
Organizations and individuals increasingly have to deal with
large to very large data sets that include spatio-temporal information,
such as network traffic data, credit card records, wildlife tracking
data, exam scores by time and place, and even data from social media
such as Twitter. We have access to sophisticated statistical and
visualization tools for analyzing these data sets. However, the
output from these tools is frequently only understandable by experts --
and even experts can start to suffer from information overload. We
have designed a system, InteractiveX, that guides users to understand the
meaning of large spatio-temporal data sets through automatic creation of
interactive multimedia explanations that combine text, graphics and the
results of data analysis. In this talk, we will first outline the
challenges of this task. We will then present the architecture of
our system, demonstrate some user interfaces to our system, and describe
some recent research results from this work.
BIO:
Dr. Amanda Stent works on spoken dialog, natural
language generation and assistive technology. She is currently a
Principal Member of Technical Staff at AT&T Labs - Research in Florham Park, NJ and was
previously an associate professor in the Computer Science Department at
Stony Brook University in Stony Brook, NY. She holds a PhD in
computer science from the University of Rochester. She has authored
over 70 papers on natural language processing and holds several patents.
She is VP of the ACL/ISCA Special Interest Group on Discourse and
Dialog and one of the rotating editors of the journal Dialogue and
Discourse.
Friday, September 21, 2012
Biemann Slides and Site Features
The slides from the Chris Biemann talk at ETS on September 20, titled "Text: Now in 2D -- Lexical Expansion Using Contextual Similarity", are now available to view and download.
We will host an archive of materials from past talks here. They will be found to the right under the heading "Links to Materials". Other recent additions to the site include a calendar and a list of upcoming events. The calendar can be found at the bottom of the page, and will be updated with any events posted here. For a quick view of upcoming events, check under the heading "Upcoming NLP Events" at the top of the column to the right. Any suggestions for further improvements and/or announcements are welcome!
Thursday, September 6, 2012
Chris Biemann: Text: Now in 2D — Lexical Expansion using Contextual Similarity
Chris Biemann will present a talk
titled “Text: Now in 2D — Lexical Expansion using Contextual Similarity”.
Date: September 20, 2012
Time: 11:00am
Location: ETS, Conant Hall, Lounge A (directions | campus map).
This talk introduces the metaphor of two-dimensional text. Starting from very basic
concepts of structural linguistics, we define lexical expansion mechanisms that
generate, for each term in context, a weighted list of possible expansions.
While the mechanism is left unspecified by the metaphor, we use distributional
similarity as a source for all-words unsupervised lexical expansion. Handling
word sense ambiguity in the expansion mechanism will be discussed from two
angles: Either a contextualized method can rank similar terms of the correct
sense higher, or we can use a word sense induction clustering in order to
aggregate over common features of the potential expansions.
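As a rough illustration of lexical expansion by distributional similarity, the sketch below builds context vectors from a tiny made-up corpus and returns, for a given term, a weighted list of similar terms. It is only an assumption-laden toy, not the speaker's system: the window-based features, the cosine measure, and all function names are choices made here for illustration, and the expansion is not contextualized.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each term to counts of (neighbouring word, relative position) features."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][(tokens[j], j - i)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(term, vectors, top_n=3):
    """Weighted list of expansions: other terms ranked by similarity to `term`."""
    target = vectors.get(term, Counter())
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != term]
    scored = [(other, score) for other, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical toy corpus.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
          ["the", "cat", "eats"], ["a", "dog", "eats"]]
vectors = context_vectors(corpus)
print(expand("cat", vectors))
```

On this toy corpus, "dog" is the only term sharing context features with "cat", so it is the sole expansion returned.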
This new representation has been successfully used in tasks like semantic text similarity and knowledge-based all-words word sense disambiguation. The key element of this representation and the method that computes it is that it can bridge lexical gaps and align passages that bear the same meaning without using the same words. Thus, it can be used as a basis technology for passage and answer scoring, and essay grading.
BIO:
Chris Biemann holds an MA and a doctorate from the University of Leipzig, Germany. After his postdoc at the semantic search start-up Powerset and subsequently the Microsoft Bing Search Engine, Chris became an assistant professor for language technology at the Technische Universität Darmstadt last year. His main research interests span unsupervised, knowledge-free acquisition, graph-based representations and algorithms, crowdsourcing, and big data for NLP applications. Currently, Chris is a visiting researcher at the IBM Watson Research Lab in Hawthorne, NY, working with the Watson DeepQA team.
David Kaufer: Tools for Building Social Communities Around Texts and Text Analysis
David Kaufer will present a talk titled "Tools for Building Social Communities Around Texts and Text Analysis"
Date: September 13, 2012
Time: 1:30pm
Location: ETS, Conant Hall, Lounge A (directions | campus map).
ABSTRACT:
This talk will present two technologies used at Carnegie Mellon in the English
department for creating social communities around texts (Classroom Salon) and
text analysis (DocuScope). Classroom Salon (www.classroomsalon.org) is a
web-based tool used to support writing and content classrooms. Classroom
Salon supports both anchored and global annotation of text and in humanities
classrooms allows teachers and students to create classroom discussions around
text before the class meets.
This allows teachers to monitor each student's participation and depth of reading. In science classrooms, Classroom Salon has been used to assess students' comprehension of difficult concepts. It is being tested at the University of Wisconsin -- Milwaukee with a Gates Foundation Grant. Preliminary results show science teachers like it because it helps them gauge students' understanding of the material and adapt it accordingly.
DocuScope
(http://www.cmu.edu/hss/english/research/docuscope.html)
is a stand-alone Java application that consists of large string-based
dictionaries of English rhetorical patterns developed over a decade of
inspection of texts. These patterns have been shown to accurately
classify major genres of written English. They have been used to
understand precisely how one genre of English differs from another (letters vs.
reminiscences) or the variation that takes place within a single genre (the
different rhetorical strategies that can define a reminiscence). In this
talk, I'll review the main breakdown of the dictionaries. The dictionaries were
first developed to support a course in comparative genres of English and I will
also discuss some educational applications.
BIO:
David Kaufer is Professor of English at Carnegie Mellon. From 1994 to 2009, he was
the Head of the Department. He serves on the Executive Board of the Rhetoric
Society of America. He is the lead author of five books and co-author of two
more. He is the author of over 100 refereed articles in the fields of
text-based rhetorical analysis, rhetorical theory, and written composition.
Welcome!
Welcome to the NJ-NLP blog! This site will be used as a hub for sharing information about Natural Language Processing (NLP) events occurring in New Jersey. We will announce events and talks related to NLP, including location, time, topic/title, abstract and any other pertinent information.
The goal of this site is two-fold. First, it will serve as a sort of calendar of NLP events for New Jersey. Second, it will provide an overview of the types of work being done at the various institutions where NLP research is undertaken throughout the state. The blog is an open-ended idea, at this time, so it may evolve to include a broader range of posts. Anything that encourages a community of collaboration and the exchange of ideas is welcome.