COMPOSES

Compositional Operations in Semantic Space (COMPOSES) is a 5-year European Research Council project (nr. 283554) that started on November 1st, 2011. It is funded within the 7th Framework Program by an ERC 2011 Starting Independent Research Grant (SH4: The Human Mind and Its Complexity panel).



Project outline | Team | Milestones | Publications, reports, presentations and associated data | Software and resources | Contact information | Acknowledgments

Project outline

Pink dogs are rare. You understood this sentence even if you had never read it before, because you know the meanings of thousands of words (including pink, dogs and rare) and you know how to construct the meaning of a novel sentence from the meanings of its parts. The ability to construct new meanings by combining words into larger constituents is one of the fundamental and peculiarly human characteristics of language. For decades, scientists in different fields have tried to develop computational systems that understand sentences as humans do. These systems have, however, failed to meet either the challenge of coverage (acquiring the meaning of thousands of words) or that of compositionality (putting together the parts to reconstruct the meaning of new sentences).

COMPOSES tackles the meaning induction and composition problem from a new perspective that brings together two traditions: corpus-based distributional semantics, which is very successful at inducing the meaning of single content words but ignores functional elements and compositionality, and formal semantics, which focuses on functional elements and composition but largely ignores lexical aspects of meaning and lacks methods to learn the proposed structures from data. As in distributional semantics, we represent some content words (such as nouns) by vectors recording their corpus contexts. Implementing ideas from formal semantics, we represent functional elements (such as determiners) by functions that map expressions of one type onto composite expressions of the same or other types. These composition functions are induced from corpus data by statistical learning of mappings from the observed context vectors of input arguments to the observed context vectors of composite structures. We model a number of compositional processes in this way, developing a coherent fragment of the semantics of English in a large-scale, data-driven fashion.
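The idea of inducing a composition function from observed vectors can be illustrated with a minimal sketch. It assumes, purely for illustration, that the adjective "pink" is modeled as a linear map (a matrix) learned from pairs of noun vectors and phrase vectors. The two-dimensional vectors, the names `learn_linear_map` and `matvec`, and the use of plain gradient descent are all assumptions made for this example; they are not the project's data, code, or actual learning method.

```python
# Toy sketch: learn the adjective "pink" as a linear map from noun vectors
# to adjective-noun phrase vectors. All vectors below are invented; real
# distributional vectors have many more dimensions.

def matvec(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def learn_linear_map(inputs, outputs, steps=500, rate=0.5):
    """Fit W so that W applied to each input approximates its output,
    minimizing squared error with plain batch gradient descent
    (an illustrative stand-in for any regression method)."""
    n_out, n_in = len(outputs[0]), len(inputs[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(steps):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for u, p in zip(inputs, outputs):
            pred = matvec(W, u)
            for i in range(n_out):
                err = pred[i] - p[i]
                for j in range(n_in):
                    grad[i][j] += err * u[j]
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] -= rate * grad[i][j] / len(inputs)
    return W

# Hypothetical "observed" context vectors for nouns ...
nouns = {"dog": [1.0, 0.2], "car": [0.1, 1.0], "flower": [0.6, 0.5]}
# ... and for the corresponding corpus-attested phrases ("pink dog", etc.).
pink_phrases = {"dog": [0.52, 0.38], "car": [0.15, 0.92], "flower": [0.35, 0.57]}

keys = sorted(nouns)
pink = learn_linear_map([nouns[k] for k in keys],
                        [pink_phrases[k] for k in keys])

# Compose a phrase that was never observed: "pink house".
house = [0.3, 0.8]
pink_house = matvec(pink, house)
print([round(x, 3) for x in pink_house])
```

Here the noun stays a vector while the adjective becomes a function over vectors, so the learned map can be applied to nouns that never occurred with "pink" in the training pairs. A realistic setting would use high-dimensional corpus-derived vectors and a regularized regression method.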

Given the novelty of the approach, we also propose several new evaluation frameworks. On the one hand, we take inspiration from cognitive science and psycholinguistics in designing elicitation methods to measure the perceived similarity and plausibility of sentences (such data will be elicited on a large scale by crowdsourcing). On the other hand, specialized entailment tests assess the semantic inference properties of our corpus-induced system.

The following article sketches the approach we are implementing in COMPOSES in some detail:

  • M. Baroni, R. Bernardi and R. Zamparelli. 2014. Frege in space: A program for compositional distributional semantics. Linguistic Issues in Language Technology 9(6): 5-110.

Team

    COMPOSES is carried out at the CLIC lab, a unit of the University of Trento's Center for Mind/Brain Sciences (CIMeC), in collaboration with the Departments of Computer Science (DISI) and Cognitive Science (DiPSCo).

    Senior researchers

    Post docs

    PhD Students

    Project manager

    Milestones

    1. April 2014: First global evaluation of COMPOSES system
    2. January 2015: Release of semantic space models
    3. October 2015: Semantic norm data set release
    4. July 2016: COMPOSES code toolkit release
    5. October 2016: Second global evaluation of COMPOSES system

    Publications, reports, presentations and associated data

    Software and resources

    We are developing the DISSECT toolkit to construct and compose distributional semantic representations.

    We also developed the SICK data set for large-scale evaluation of compositional semantic models. The data set is the basis of SemEval-2014 Task 1.

    We collected a large data set of subject ratings of semantic plausibility of adjective-noun phrases.

    See the publications above for links to other data sets that we make publicly available and the corresponding reference papers.

    Contact information

    Write to marco baroni AT unitn it.

    Acknowledgments

    We gratefully acknowledge the European Commission and European Research Council for the COMPOSES Starting Independent Research Grant funded under the 7th Framework Program.