
In this tutorial, we first guide you through the creation of some toy semantic spaces. Next, we generate composed forms and show how to estimate the parameters of the composition models from training examples. You will then use the semantic spaces to measure the similarity of word/phrase pairs and to retrieve their nearest neighbours. The final tutorial page shows how to evaluate similarity scores against a gold standard. (NB: For the purposes of similarity and evaluation computations, it makes no difference whether the input objects are words or phrases.)
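To make these steps concrete before turning to DISSECT itself, here is a minimal sketch in plain Python. It does not use the DISSECT API: the toy vectors, the function names (`cosine`, `compose_additive`, `nearest_neighbours`) and the choice of simple vector addition as the composition function are all assumptions made for illustration only.

```python
import math

# Toy semantic space: each word maps to a made-up co-occurrence vector
# (hypothetical values, not taken from the DISSECT example data).
space = {
    "car": [5.0, 1.0, 0.0],
    "truck": [4.0, 2.0, 0.0],
    "book": [0.0, 1.0, 6.0],
    "red": [1.0, 3.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compose_additive(u, v):
    """Compose two vectors by element-wise addition (one simple composition choice)."""
    return [a + b for a, b in zip(u, v)]


def nearest_neighbours(target, space, k=2):
    """Return the k (word, similarity) pairs in `space` most similar to `target`."""
    scored = [(word, cosine(target, vec)) for word, vec in space.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# A composed phrase vector is compared exactly like a word vector.
red_car = compose_additive(space["red"], space["car"])
print(cosine(space["car"], space["truck"]))
print(nearest_neighbours(red_car, space, k=2))
```

Because the phrase vector `red_car` lives in the same space as the word vectors, the similarity and neighbour functions accept it unchanged, which is the point made in the note above.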

All steps are illustrated first in Python code, then with the command-line tools (which, not surprisingly, are themselves Python scripts). If you have basic familiarity with Python or a similar language, we recommend looking at the Python code even if you intend to use DISSECT mainly via the command-line tools: the code snippets give a clearer idea of how DISSECT works, and the Python functions provide more flexibility than the canned tools.

Assuming your DISSECT root directory is at $DISSECT, you can find all the (standalone) Python code snippets presented in the tutorial in $DISSECT/src/examples (relative paths in the sample Python code start from this directory). The same directory contains working examples of how the command-line tools can be used (cmd_* files). The example data are in $DISSECT/src/examples/data, and the command-line tools in $DISSECT/dissect/src/pipelines (the relative paths of the command-line examples in the tutorial start from this directory).