# Evaluation of similarity scores

One common use of the similarity scores that distributional semantic models produce for words and phrases is to compare them against gold standard similarity values, e.g., those elicited from humans in experiments. DISSECT currently supports three standard measures for comparing model scores against other numerical values: Pearson correlation, Spearman correlation, and AUC.
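The correlation measures themselves are standard statistics and can be sketched independently of DISSECT. The following plain-Python sketch (with made-up toy scores; ties in the ranking are ignored for brevity) illustrates the difference between the two correlations: Pearson correlates the raw scores, while Spearman correlates their ranks.

```python
from math import sqrt

def pearson(x, y):
    # Pearson r: covariance of x and y over the product of their std devs
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman rho: Pearson correlation computed on the ranks of the values
    # (ties are not handled here, for brevity)
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

gold = [9.0, 7.5, 6.0, 3.0, 1.0]            # hypothetical human ratings
predicted = [0.91, 0.80, 0.55, 0.40, 0.10]  # hypothetical model cosines

# The two rankings agree perfectly, so Spearman is 1.0 even though the
# raw scores are not linearly related and Pearson is slightly below 1.
print("spearman: %f" % spearman(gold, predicted))
print("pearson:  %f" % pearson(gold, predicted))
```

Because the gold and predicted scores here happen to induce the same ordering, Spearman is exactly 1.0 while Pearson is slightly lower.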

## Python code

The toy input file word_sims.txt contains word pairs with “gold standard” scores.

#ex20.py
#-------
from composes.utils import io_utils
from composes.utils import scoring_utils
from composes.similarity.cos import CosSimilarity

#load a semantic space saved in a previous example
#(adjust the path to point to your own space file)
my_space = io_utils.load("data/out/ex01.pkl")

#read the word pairs (fields 1 and 2) and the gold scores (field 3)
fname = "data/in/word_sims.txt"
word_pairs = io_utils.read_tuple_list(fname, fields=[0,1])
gold = io_utils.read_list(fname, field=2)

#compute cosine similarities for the list of word pairs
predicted = my_space.get_sims(word_pairs, CosSimilarity())

#compute correlations between gold and predicted scores
print "Spearman"
print scoring_utils.score(gold, predicted, "spearman")
print "Pearson"
print scoring_utils.score(gold, predicted, "pearson")


## On the command line

The following script can be used to evaluate similarity scores against some gold standard.

Usage:

python2.7 evaluate_similarities.py [options] [config_file]

Options:

-i, --input input_file

Input file containing the gold and predicted similarity scores (as produced, for example, by running the compute_similarities.py script on a list of pairs already annotated with gold scores). One of -i or --in_dir has to be provided.

-c, --columns columns_in_the_input_file

Columns in the input file containing the gold and the predicted similarity scores. For example, -c 3,4 if the gold score is in field 3 and the model-generated similarity score is in field 4 (the relative order of the gold and predicted scores does not matter, but it must be consistent across lines).

--in_dir input_directory

When provided, all the files in this directory are treated as input files (they should be in the same format as for -i) and evaluated. One of -i or --in_dir has to be provided.

-m, --correlation_measure correlation_measures

List of comma-separated correlation measures, for example: pearson,spearman. Each measure must be one of auc, pearson, or spearman.

--filter filter_string

If --in_dir is provided, this acts as a filter on the files in that directory: file names not containing the filter string are ignored. Optional; by default no filter is used.

-l, --log file

Logger output file. Optional, by default no logging output is produced.

-h, --help

Displays help message.
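To make the -c option concrete, the following small sketch shows how 1-based column numbers map onto the fields of a whitespace-separated input line (the example line and the helper function are illustrative, not the script's actual code):

```python
def extract_scores(line, columns):
    """Return the scores named by 1-based column indices.
    Hypothetical helper mirroring the semantics of the -c option."""
    fields = line.split()
    return tuple(float(fields[c - 1]) for c in columns)

# Example line: a word pair, the gold score in field 3,
# and the model-generated score in field 4.
line = "car automobile 8.94 0.87"
gold, predicted = extract_scores(line, (3, 4))
print(gold, predicted)  # 8.94 0.87
```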

Examples:

python2.7 evaluate_similarities.py -i ../examples/data/in/sim_data.txt -c 3,5 -m pearson,spearman
python2.7 evaluate_similarities.py --in_dir ../examples/data/in/ --filter sim_data -c 3,5 -m pearson,spearman


Here is what the output of the second command (sent to standard output) looks like:

sim_data.txt
CORRELATION:pearson
-0.988618
CORRELATION:spearman
-0.866025
sim_data2.txt
CORRELATION:pearson
-0.150445
CORRELATION:spearman
-0.500000
sim_data3.txt
CORRELATION:pearson
-0.988618
CORRELATION:spearman
-0.866025