COMPOSES

Compositional Operations in Semantic Space is a 5-year European Research Council project (nr. 283554) that started on November 1st, 2011. COMPOSES is funded within the 7th Framework Program by an ERC 2011 Starting Independent Research Grant (SH4: The Human Mind and Its Complexity panel).


Project outline

"Pink dogs are rare." You understood this sentence even if you had never read it before, because you know the meanings of thousands of words (including "pink", "dogs" and "rare") and how to construct the meaning of a novel sentence from the meanings of its parts. The ability to construct new meanings by combining words into larger constituents is one of the fundamental and peculiarly human characteristics of language. For decades, scientists in different fields have tried to develop computational systems that understand sentences as humans do. So far, however, these systems have failed to meet either the challenge of coverage (acquiring the meaning of thousands of words) or that of compositionality (putting together the parts to reconstruct the meaning of new sentences).

COMPOSES tackles the meaning induction and composition problem from a new perspective that brings together two traditions. Corpus-based distributional semantics is very successful at inducing the meaning of single content words, but ignores functional elements and compositionality. Formal semantics focuses on functional elements and composition, but largely ignores lexical aspects of meaning and lacks methods to learn the proposed structures from data. As in distributional semantics, we represent some content words (such as nouns) by vectors recording their corpus contexts. Implementing ideas from formal semantics, we represent functional elements (such as determiners) by functions that map expressions of one type onto composite expressions of the same or other types. These composition functions are induced from corpus data by statistical learning of mappings from the observed context vectors of input arguments to the observed context vectors of composite structures. We model a number of compositional processes in this way, developing a coherent fragment of the semantics of English in a large-scale, data-driven fashion.
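To make the idea of inducing a composition function concrete, here is a minimal sketch in Python. It rests on assumptions that the outline above does not state: the composition function is taken to be linear (a matrix), it is fitted by ridge regression, and the vectors are synthetic random data standing in for corpus-derived context vectors. The names `learn_composition_matrix`, `compose` and `W_pink` are hypothetical and are not part of any COMPOSES release.

```python
import numpy as np

def learn_composition_matrix(arg_vectors, phrase_vectors, ridge=1.0):
    """Fit a matrix W such that phrase_vector ~ W @ arg_vector.

    arg_vectors:    (n, d_in) array, one observed argument vector per row
    phrase_vectors: (n, d_out) array, the observed vector of each composite
    ridge:          L2 regularization strength (an assumption of this sketch)
    """
    X = np.asarray(arg_vectors, dtype=float)
    Y = np.asarray(phrase_vectors, dtype=float)
    d_in = X.shape[1]
    # Closed-form ridge regression: B = (X^T X + ridge * I)^-1 X^T Y, Y ~ X B
    B = np.linalg.solve(X.T @ X + ridge * np.eye(d_in), X.T @ Y)
    return B.T  # transpose so that the map applies as W @ arg_vector

def compose(W, arg_vector):
    """Apply a learned composition function to a new argument vector."""
    return W @ np.asarray(arg_vector, dtype=float)

# Synthetic stand-in for corpus data (assumption): 200 "noun" vectors and
# the vectors of the corresponding "pink + noun" phrases, generated from a
# hidden linear map plus a little noise.
rng = np.random.default_rng(0)
dim = 5
true_map = rng.normal(size=(dim, dim))
nouns = rng.normal(size=(200, dim))
phrases = nouns @ true_map.T + 0.01 * rng.normal(size=(200, dim))

# Induce the function for "pink" from (noun, phrase) training pairs ...
W_pink = learn_composition_matrix(nouns, phrases, ridge=1e-3)

# ... and apply it to a noun vector that was not seen during training.
new_noun = rng.normal(size=dim)
predicted = compose(W_pink, new_noun)
```

In this toy setting the fitted matrix recovers the hidden map closely. With real corpus vectors the choice of function class and learning method is an open modeling decision, and the linear form used here is only one possibility.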

Given the novelty of the approach, we also propose several new evaluation frameworks. On the one hand, we take inspiration from cognitive science and psycholinguistics in designing elicitation methods to measure the perceived similarity and plausibility of sentences (such data will be elicited on a large scale by crowdsourcing). On the other hand, specialized entailment tests will assess the semantic inference properties of our corpus-induced system.

Team

COMPOSES is carried out at the CLIC lab, a unit of the University of Trento's Center for Mind/Brain Sciences (CIMeC), in collaboration with the Departments of Computer Science (DISI) and Cognitive Science (DISCoF).

Senior researchers

Post docs

PhD Students

If you are interested in pursuing your doctoral studies with us, please contact us and keep an eye on the CIMeC doctoral school admission page. See a copy of the call, with a description of the profiles we are seeking, here.

Project manager

Milestones

  1. April 2014: First global evaluation of COMPOSES system
  2. January 2015: Release of semantic space models
  3. October 2015: Semantic norm data set release
  4. July 2016: COMPOSES code toolkit release
  5. October 2016: Second global evaluation of COMPOSES system

Publications, reports and presentations

Public code and resources

All the code and resources developed by COMPOSES will be publicly and freely released. Code will be available via the COMPOSES GitHub repository.

Contact information

Write to marco baroni AT unitn it.