COMPOSES
Project outline |
Team |
Milestones |
Publications, reports, presentations and associated data |
Software |
Contact information
Project outline
Pink dogs are rare. You understood this sentence even if
you've never read it before, because you know the meanings of
thousands of words (including pink, dogs
and rare) and how to construct the meaning of a novel sentence
from the meanings of its parts. The ability to construct new meanings
by combining words into larger constituents is one of the fundamental
and peculiarly human characteristics of language. For decades,
scientists in different fields have tried to develop computational
systems that understand sentences as humans do. They have, however,
failed to meet either the challenge of coverage (acquiring the meaning
of thousands of words) or that of compositionality (putting the parts
together to reconstruct the meaning of new sentences).
COMPOSES tackles the meaning induction and composition problem from
a new perspective that brings together corpus-based distributional
semantics (which is very successful at inducing the meaning of single
content words, but ignores functional elements and compositionality)
and formal semantics (which focuses on functional elements and
composition, but largely ignores lexical aspects of meaning and lacks
methods to learn the proposed structures from data). As in
distributional semantics, we represent some content words (such as
nouns) by vectors recording their corpus contexts. Following ideas
from formal semantics, we represent functional elements (such as
determiners) by functions mapping from expressions of one type onto
composite expressions of the same or other types. These composition
functions are induced from corpus data by statistical learning of
mappings from observed context vectors of input arguments to observed
context vectors of composite structures. We model a number of
compositional processes in this way, developing a coherent fragment of
the semantics of English in a large-scale data-driven fashion.
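The learning step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that one composition function (here a modifier such as pink) is a linear map fitted by ordinary least squares from noun vectors to the vectors observed for the corresponding phrases. All vectors are invented toy values, and the function names (`learn_composition`, `compose`) are hypothetical, not part of any COMPOSES software.

```python
# Toy sketch: every vector below is invented for illustration, and the linear
# least-squares model is an assumption, not the project's documented method.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def learn_composition(arg_vectors, phrase_vectors):
    """Least-squares matrix W such that arg @ W approximates phrase."""
    xt = transpose(arg_vectors)
    # Normal equations: (X^T X) W = X^T Y
    return solve(matmul(xt, arg_vectors), matmul(xt, phrase_vectors))

def compose(arg_vector, w):
    """Apply the learned function to a new argument vector."""
    return matmul([arg_vector], w)[0]

# Hypothetical 3-dimensional context vectors for four nouns.
noun_vectors = [
    [1.0, 0.0, 2.0],  # dog
    [0.0, 1.0, 1.0],  # cat
    [2.0, 1.0, 0.0],  # car
    [1.0, 1.0, 1.0],  # house
]
# Hypothetical corpus vectors for "pink dog", "pink cat", "pink car",
# "pink house" (constructed to be exactly linear in the noun vectors).
phrase_vectors = [
    [1.0, 0.0, 4.0],
    [0.0, 1.0, 3.0],
    [2.0, 1.0, 1.0],
    [1.0, 1.0, 3.0],
]

pink = learn_composition(noun_vectors, phrase_vectors)
# Compose a vector for a phrase never seen in training, e.g. "pink bird".
print(compose([0.0, 2.0, 1.0], pink))
```

The sketch solves the normal equations directly to stay dependency-free. With realistic, high-dimensional corpus vectors one would more likely use a numerical library routine (for example NumPy's `numpy.linalg.lstsq`) and some form of regularization.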
Given the novelty of the approach, we also propose several new
evaluation frameworks. On the one hand, we take inspiration from
cognitive science and psycholinguistics in designing elicitation
methods to measure the perceived similarity and plausibility of
sentences (such data will be elicited on a large scale by
crowdsourcing). On the other hand, specialized entailment tests will assess
the semantic inference properties of our corpus-induced system.
The following article describes in some detail the approach we intend
to implement in COMPOSES:
M. Baroni, R. Bernardi and
R. Zamparelli. Submitted. Frege in
space: A program for compositional distributional semantics.
Since the article is currently under review, please contact us
before citing it (and we are very happy to get feedback about
it!).
Team
COMPOSES is carried out at the CLIC lab, a unit of the
University of
Trento's Center for Mind/Brain Sciences
(CIMeC), in
collaboration with the Departments of Computer Science
(DISI) and
Cognitive Science (DiPSCo).
Senior researchers
Post docs
PhD Students
Project manager
Milestones
- April 2014: First global evaluation of COMPOSES system
- January 2015: Release of semantic space models
- October 2015: Semantic norm data set release
- July 2016: COMPOSES code toolkit release
- October 2016: Second global evaluation of COMPOSES system
Publications, reports, presentations and associated data
- M. Baroni, R. Bernardi and
R. Zamparelli. Submitted. Frege
in space: A program for compositional distributional semantics.
(Currently under review; please do not cite without our permission,
and do send us any feedback you might have. Thank you!)
- A. Lazaridou, M. Marelli, R. Zamparelli and M. Baroni. To
appear. Compositional-ly
derived representations of morphologically complex words in
distributional semantics. Proceedings of ACL 2013 (51st Annual
Meeting of the Association for Computational Linguistics), East
Stroudsburg PA:
ACL. The data
set from this study.
- R. Bernardi, G. Dinu, M. Marelli and M. Baroni. To
appear. A
relatedness benchmark to test the role of determiners in compositional
distributional semantics. Proceedings of ACL 2013 (51st Annual
Meeting of the Association for Computational Linguistics), East
Stroudsburg PA:
ACL. The data
set from this study.
- E. Grefenstette, G. Dinu, Y.-Z. Zhang, M. Sadrzadeh and
M. Baroni. 2013. Multi-step regression learning for compositional
distributional semantics. Proceedings of IWCS 2013 (10th
International Conference on Computational Semantics), East Stroudsburg
PA: ACL, 131-142.
- G. Boleda, M. Baroni, L. McNally and N. Pham. 2013. Intensionality was only
alleged: On adjective-noun composition in distributional
semantics. Proceedings of IWCS 2013 (10th International Conference
on Computational Semantics), East Stroudsburg PA: ACL, 35-46.
- N. Pham, R. Bernardi, Y.-Z. Zhang and M. Baroni. 2013. Sentence
paraphrase detection: When determiners and word order make the
difference. Proceedings of the Towards a Formal Distributional
Semantics Workshop at IWCS
2013, East Stroudsburg PA: ACL, 21-29. The data
sets from this study.
- M. Baroni. 2012. Compositionality in distributional
semantics. Slides for
the EACL 2012
tutorial: one-slide-per-page
or four-slides-per-page
format.
- M. Baroni, R. Bernardi, N. Do and C. Shan. 2012. Entailment above the
word level in distributional semantics. Proceedings of EACL 2012
(13th Conference of the European Chapter of the Association for
Computational Linguistics), East Stroudsburg PA:
ACL, 23-32. The data
sets from this study.
- E. Vecchi, M. Baroni and
R. Zamparelli. 2011. (Linear)
maps of the impossible: Capturing semantic anomalies in distributional
space. Proceedings of the DISCO (Distributional Semantics and
Compositionality) Workshop at ACL 2011, East Stroudsburg PA: ACL,
1-9.
- M. Baroni and R. Zamparelli. 2010. Nouns are vectors,
adjectives are matrices: Representing adjective-noun constructions in
semantic space. Proceedings of the Conference on Empirical Methods
in Natural Language Processing (EMNLP 2010), East Stroudsburg PA: ACL,
1183-1193.
Software
We are developing the DISSECT toolkit to construct and compose distributional semantic representations.
Contact information
Write to marco baroni AT unitn it.