gokhanercan.com - Gokhan Ercan Personal

AnlamVer Dataset

Word Similarity and Relatedness Dataset for Turkish.
See the paper "AnlamVer: Semantic Model Evaluation Dataset for Turkish - Word Similarity and Relatedness" for details.

Downloads

Download the final annotated dataset: anlamver-final.cvs

Download the individual scores of each annotator: anlamver-participants.cvs

This dataset was annotated using the open-source software WSQuest.

Column Names

Column Abbr.	Column Name	Note
QID	QuestionID
W1	Word1
W2	Word2
Sim	Similarity	Participants' average
Rel	Relatedness	Participants' average
S	Similar	Word pair lies in the similar sub-space of the Sim-Rel vector space.
D	Dissimilar	""
R	Related	""
U	Unrelated	""
SR	SimilarRelated	""
DR	DissimilarRelated	""
SU	SimilarUnrelated	""
DU	DissimilarUnrelated	""
AVG-C	Average concreteness	Individual concreteness values are taken from the TKN dataset
W1F	Word1 frequency	Frequency values based on Boun Corpus
W2F	Word2 frequency	Frequency values based on Boun Corpus
AnyOOV	Any out-of-vocabulary (OOV) word exists	OOV values are based on Boun Corpus
Two	Both words are OOV	OOV values are based on Boun Corpus
EstSyn	EstimatedSynonym	Word pair was estimated to have the synonym relation type before annotation
EstAny	EstimatedAntonym	""
EstRHigh	EstimatedHighRelatedness	""
EstRMed	EstimatedMediumRelatedness	""
EstRLow	EstimatedLowRelatedness	""
EstHyp	EstimatedHyponym	""
EstMer	EstimatedMeronym	""
W1-RWG	Rare-word (RW) group of word1	RW groups are assigned by word frequency; see the paper for the group definitions.
W2-RWG	Rare-word (RW) group of word2	""
RWMin	Minimum RW group of the two words in the word pair	""
W1-DG	Derivational group of word1	Value represents how many derivations the word has
W2-DG	Derivational group of word2	""
DGMax	Max of derivational groups	Max(W1-DG,W2-DG)
W1-IG	Inflectional group of word1	Value represents how many inflections the word has
W2-IG	Inflectional group of word2	""
IGMax	Max of inflectional groups	Max(W1-IG,W2-IG)
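
The column definitions above can be illustrated with a short Python sketch. It reads rows with the standard csv module, checks that DGMax and IGMax equal the maximum of their two source columns, and filters out pairs flagged by AnyOOV. The sample rows are made up for illustration and are not taken from AnlamVer. The comma delimiter, the header names matching the abbreviations above, and the 0/1 encoding of AnyOOV are all assumptions that should be checked against the downloaded file. The function names are hypothetical.

```python
import csv
import io

# Made-up rows for illustration only; these are NOT values from AnlamVer.
# Assumptions: comma delimiter, a header row using the abbreviations from the
# table, and 0/1 encoding for AnyOOV.
SAMPLE = """\
QID,W1,W2,Sim,Rel,AnyOOV,W1-DG,W2-DG,DGMax,W1-IG,W2-IG,IGMax
1,kedi,köpek,4.5,8.0,0,0,0,0,0,0,0
2,kitap,kitaplık,3.0,9.0,0,0,1,1,0,0,0
3,evlerimizde,gözlükçü,0.5,1.0,1,0,2,2,3,0,3
"""

INT_COLUMNS = ("AnyOOV", "W1-DG", "W2-DG", "DGMax", "W1-IG", "W2-IG", "IGMax")
FLOAT_COLUMNS = ("Sim", "Rel")


def load_pairs(handle):
    """Read word pairs from an open text handle, converting numeric columns."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for name in INT_COLUMNS:
            row[name] = int(raw[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


def inconsistent_max_rows(rows):
    """Return QIDs where DGMax or IGMax differs from the max of its two inputs."""
    bad = []
    for row in rows:
        dg_ok = row["DGMax"] == max(row["W1-DG"], row["W2-DG"])
        ig_ok = row["IGMax"] == max(row["W1-IG"], row["W2-IG"])
        if not (dg_ok and ig_ok):
            bad.append(row["QID"])
    return bad


def in_vocabulary(rows):
    """Keep only pairs in which neither word is flagged as out-of-vocabulary."""
    return [row for row in rows if row["AnyOOV"] == 0]


rows = load_pairs(io.StringIO(SAMPLE))
print(inconsistent_max_rows(rows))
print([(r["W1"], r["W2"]) for r in in_vocabulary(rows)])
```

To try this on the real data, pass an opened file to load_pairs instead of the in-memory sample, for example open("anlamver-final.cvs", encoding="utf-8", newline=""). The encoding is a guess, and the delimiter or column list may need adjusting to match the actual file.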

Cite

If you use this resource in your research, please cite the following paper:

Ercan, G. and Yıldız, O.T., 2018. AnlamVer: Semantic Model Evaluation Dataset for Turkish - Word Similarity and Relatedness. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 3819-3836).