EEG Dataset for Workshop on Computational Neurolinguistics
NAACL HLT 2010 Workshops
To encourage submissions from researchers who do not
have access to brain imaging equipment, the organisers of the Workshop on Computational Neurolinguistics
are distributing neural recordings from semantic processing experiments, together with corresponding corpus
models. This page describes the Trento EEG dataset, recorded from
speakers of Italian during an image naming task. It was originally
reported in the following paper, presented at EMNLP 2009:
Data Collection
EEG data was gathered from native speakers of Italian during a simple behavioural experiment at the CIMeC/DiSCoF laboratories at Trento University. Seven participants (five male and two female; age range 25-33; all with college education) performed a silent naming task. Each participant was shown a series of photographs of tools and land mammals on screen, and had to think of the most appropriate name for each. They were not explicitly asked to group the entities into superordinate categories or to concentrate on their semantic properties, but completing the task involved resolving each picture to its corresponding concept. Each image remained on screen until the participant gave a keyboard response to indicate that a suitable label had been found, and presentations were interleaved with three-second rest periods. Each of the thirty stimuli in each of the two classes was presented six times, in random order, giving a total of 360 image presentations in the session. Response rates were over 95%, and a post-session questionnaire showed that participants agreed on image labels in approximately 90% of cases.

English terms for the concepts used are listed below (the Italian terms are supplied with the corpus models; see the concept list file).

Mammals: anteater, armadillo, badger, beaver, bison, boar, camel, chamois, chimpanzee, deer, elephant, fox, giraffe, gorilla, hare, hedgehog, hippopotamus, ibex, kangaroo, koala, llama, mole, monkey, mouse, otter, panda, rhinoceros, skunk, squirrel, zebra

Tools: Allen key, axe, chainsaw, craft knife, crowbar, file, garden fork, garden trowel, hacksaw, hammer, mallet, nail, paint brush, paint roller, pen knife, pick axe, plaster trowel, pliers, plunger, pneumatic drill, power drill, rake, saw, scissors, scraper, screw, screwdriver, sickle, spanner, tape measure

Corpus Models
We supply the three highest-performing corpus models described in the EMNLP paper:
EEG Data
We are supplying seven datasets of pre-extracted signal power measures, one for each of the seven experimental participants. The terms of our ethical approval prevent us from allowing uncontrolled distribution of the datasets. They can therefore be obtained by emailing Brian Murphy to arrange a direct download, after undertaking to use the data only in a manner consistent with the informed consent that participants gave. The datasets are supplied as Matlab .mat data files, which can be opened directly with Matlab and Octave or imported into a range of other programming platforms.

The EEG signals were recorded from 64 scalp locations based on the standard 10-20 montage. Preprocessing involved applying a 1-50Hz band-pass filter and down-sampling to a 120Hz sampling rate. Independent components related to eye-movement artefacts were manually identified and removed. All signal channels were z-score normalised, and a Laplacian sharpening was applied. The features were extracted for each participant session. Each feature measures signal power at a particular scalp location, in a particular frequency band, and at a particular time latency relative to the presentation of each image stimulus. For each stimulus presentation, 14,400 signal power features are extracted: 64 electrode channels by 15 frequency bands (each 3.3Hz wide, between 1 and 50Hz) by 15 time intervals (each 67ms long, covering the first second after image presentation). The data from each participant is stored in a Matlab file of about 23 MB.
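For readers working outside Matlab, the feature layout described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the distributed data:

- The 15 frequency bands are taken to be of equal width over the range recorded in each file's freqLimits field ([0 50]Hz).
- The 15 time intervals are taken to be of equal length over the first 1000ms.
- The optional loader relies on SciPy's scipy.io.loadmat, which this page does not mention. It would fail if the files were saved in the HDF5-based v7.3 format.
- All helper names (band_edges, interval_edges, flat_index, load_session) are hypothetical.

```python
"""Sketch of the per-trial feature layout of the Trento EEG data.

Assumptions (not stated by the data providers): equal-width frequency bands
over freqLimits [0, 50] Hz and equal-length intervals over 0-1000 ms.
"""

N_CHANNELS = 64
N_BANDS = 15
N_INTERVALS = 15
FREQ_LIMITS_HZ = (0.0, 50.0)  # assumed to match the freqLimits field
EPOCH_MS = 1000.0             # first second after image presentation


def band_edges(band):
    """Return (low, high) in Hz for a 0-based frequency band index."""
    width = (FREQ_LIMITS_HZ[1] - FREQ_LIMITS_HZ[0]) / N_BANDS
    low = FREQ_LIMITS_HZ[0] + band * width
    return low, low + width


def interval_edges(interval):
    """Return (start, end) in ms for a 0-based time interval index."""
    length = EPOCH_MS / N_INTERVALS
    return interval * length, (interval + 1) * length


def flat_index(channel, band, interval):
    """Position of a (channel, band, interval) feature in a flattened trial.

    Uses C (row-major) order, i.e. the layout numpy produces with
    epochPower.reshape(n_trials, -1). Matlab's own reshape is column-major,
    so a vector flattened in Matlab would be ordered differently.
    """
    return (channel * N_BANDS + band) * N_INTERVALS + interval


def load_session(path):
    """Hypothetical loader for one participant's .mat file (needs SciPy)."""
    from scipy.io import loadmat  # lazy import: helpers above work without SciPy
    data = loadmat(path, squeeze_me=True, struct_as_record=False)
    power = data["epochPower"]                     # trials x channels x bands x intervals
    features = power.reshape(power.shape[0], -1)   # trials x 14,400 features
    return features, data["selectedEpochOrderToTriggers"]


if __name__ == "__main__":
    print(N_CHANNELS * N_BANDS * N_INTERVALS)  # 64 x 15 x 15 features per trial
```

Flattening each trial this way produces a trial-by-feature matrix of the kind most machine learning toolkits expect. flat_index can then map a learned feature weight back to its channel, band and interval.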
Each dataset has the following structure:

    >> data = open(filename)
    data =
        freqResolution: 15
        timeResolution: 15
        freqLimits: [0 50]
        timeLimits: [0 120]
        numChannels: 64
        channelLocations: [1x64 struct]
        epochTriggersSamples: [360x2 double]
        selectedEpochOrderToTriggers: {1x360 cell}
        epochPower: [4-D double]
    >> size(data.epochPower)
    ans =
        360    64    15    15
    >> data.epochPower(1,2,3,4)
    ans =
        1.9174e-07

The EEG power features are stored in the epochPower field, as a trial x channel x frequency x time array. (Terminological note: a single experimental presentation of a stimulus is termed a trial, and the corresponding signal an epoch.) For example, the signal power estimate stored at data.epochPower(1,2,3,4) belongs to the first trial epoch of the experiment. It was recorded on the second channel, in the third frequency band (6.7 to 10Hz) and the fourth time interval (200 to 267ms). Since the signals have been z-score normalised, the power estimates are not expressed in physical units. [Note that the first three participant sessions used an expanded set of 87 stimuli (522 trials in total). This set was later whittled down to the 60 stimuli that were most reliable in behavioural terms (ease and uniformity of naming). These extra trials can be ignored.]

The presentation order of image stimuli was randomised for each experimental session. It is stored in code form in the selectedEpochOrderToTriggers field, a cell array of strings. The meaning of these 'trigger' codes is given in the concept list file (updated: an earlier version of this file was incomplete). So the codes shown below indicate that, in the session in question, the first three image stimuli presented were screw, axe, and monkey. Trigger codes in the 1-50 range correspond to animals, and those in the 65-115 range to tools.
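The indexing and trigger conventions above can be sketched in Python as follows. This is an illustrative sketch only, with these assumptions:

- The helper names are hypothetical.
- Trigger strings are assumed to look like the example codes ('S' followed by a number).
- Band and interval ranges assume equal divisions of 0-50Hz and 0-1000ms.
- Class labels are derived solely from the documented 1-50 (animal) and 65-115 (tool) ranges. Mapping a code to an individual concept still requires the concept list file.

```python
import re

# Documented trigger ranges: 1-50 are animals (mammals), 65-115 are tools.
ANIMAL_CODES = range(1, 51)
TOOL_CODES = range(65, 116)


def trigger_number(code):
    """Parse a trigger string such as 'S 98' into the integer 98 (assumed format)."""
    match = re.fullmatch(r"S\s*(\d+)", code.strip())
    if match is None:
        raise ValueError(f"unrecognised trigger code: {code!r}")
    return int(match.group(1))


def trigger_class(code):
    """Map a trigger string to 'animal' or 'tool' using the documented ranges."""
    number = trigger_number(code)
    if number in ANIMAL_CODES:
        return "animal"
    if number in TOOL_CODES:
        return "tool"
    raise ValueError(f"trigger {number} is outside the documented ranges")


def describe_matlab_index(trial, channel, band, interval):
    """Describe epochPower(trial, channel, band, interval) given Matlab's
    1-based indices, assuming 15 equal bands over 0-50 Hz and 15 equal
    intervals over 0-1000 ms."""
    width_hz = 50.0 / 15
    length_ms = 1000.0 / 15
    f_lo, f_hi = (band - 1) * width_hz, band * width_hz
    t_lo, t_hi = (interval - 1) * length_ms, interval * length_ms
    return (f"trial {trial}, channel {channel}, "
            f"{f_lo:.1f}-{f_hi:.1f} Hz, {t_lo:.0f}-{t_hi:.0f} ms")


codes = ["S 98", "S 67", "S 29"]  # the first three codes from the example session
print([trigger_class(c) for c in codes])  # ['tool', 'tool', 'animal']
print(describe_matlab_index(1, 2, 3, 4))  # trial 1, channel 2, 6.7-10.0 Hz, 200-267 ms
```

Raising an error for codes outside the documented ranges, rather than silently labelling them, makes unexpected trigger values visible early.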
    >> data.selectedEpochOrderToTriggers(1:3)
    ans =
        'S 98'    'S 67'    'S 29'

Though the identities and scalp locations of the channels are usually not needed for machine learning purposes, they are supplied in data.channelLocations, using data structures adopted from the EEGLAB package. For example, the first channel recorded activity at scalp location 'Fpz' (frontopolar, midline), located at the Euclidean coordinates (85, 0, -2)mm relative to the centre of the head:

    >> data.channelLocations(1)
    ans =
        labels: 'Fpz'
        ...
        X: 84.9814
        Y: 0
        Z: -1.7801
        ...

Contact
Brian Murphy, Language, Interaction and Computation Lab, Centre for Mind/Brain Studies, University of Trento