Code and data for the monolingual generation experiments in:
Georgiana Dinu and Marco Baroni. 2014.
How to make words with vectors: Phrase generation in distributional semantics
- Requires: Python 2.7, NumPy and DISSECT
- Download: code+data
- Run: python example.py data/phrases.txt
It processes the lines in data/phrases.txt.
Each line contains the phrase to be composed followed by the syntactic pattern to be generated. E.g.:
- [american-JJ man-NN AN] [[IN [JJ N AN] PN] NN AN]
Composes american-JJ and man-NN using the AN (adjective-noun) composition function, then decomposes the result into a noun with a prepositional phrase modifier. In this case it generates "man in american country". More generally, a triple [C1 C2 COMPFUNC] means compose or decompose C1 and C2, where COMPFUNC is the composition function to be used. For composition, C1 and C2 are lexical items joined with their POS tags; for decomposition, they are POS tags (or * for any POS tag in the lexicon). As the example shows, an element of a triple can itself be a nested triple.
- Two previously learned COMPFUNCs are available: AN for noun phrase modification and PN for preposition noun phrase combinations.
- Vectors are learned with word2vec on BNC and WackyPedia/ukWaC
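
The bracketed [C1 C2 COMPFUNC] notation described above can be read into nested tuples with a few lines of Python. This is only an illustrative sketch, not the parser used by example.py or DISSECT: the function names tokenize and parse_line are hypothetical, and the sketch assumes that every bracketed group holds exactly three elements, as in the example line.

```python
import re


def tokenize(line):
    # Brackets become their own tokens; everything else splits on whitespace.
    return re.findall(r"\[|\]|[^\s\[\]]+", line)


def _parse_node(tokens, pos):
    # Returns (node, next_position). A node is either a plain string
    # (a POS tag, "*", or word-POS item) or a 3-tuple (C1, C2, COMPFUNC).
    if tokens[pos] == "[":
        children = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != "]":
            child, pos = _parse_node(tokens, pos)
            children.append(child)
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        if len(children) != 3:
            # Assumption of this sketch: groups are always triples.
            raise ValueError("expected [C1 C2 COMPFUNC], got %d items" % len(children))
        return tuple(children), pos + 1
    if tokens[pos] == "]":
        raise ValueError("unexpected ']'")
    return tokens[pos], pos + 1


def parse_line(line):
    # Parse every top-level bracketed group on the line, in order.
    tokens = tokenize(line)
    nodes = []
    pos = 0
    while pos < len(tokens):
        node, pos = _parse_node(tokens, pos)
        nodes.append(node)
    return nodes


if __name__ == "__main__":
    example = "[american-JJ man-NN AN] [[IN [JJ N AN] PN] NN AN]"
    compose, decompose = parse_line(example)
    print(compose)
    print(decompose)
```

For the example line, the first parsed element is the composition triple and the second is the nested decomposition pattern, whose outer COMPFUNC is AN and whose first element is itself a PN triple.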