Abstract: |
We apply real-valued word vectors combined with two types of classifiers (linear discriminant analysis
and a feed-forward neural network) to examine whether basic nominal categories can be captured by simple
word embedding models. We also provide a linguistic analysis of the errors generated by the classifiers.
The target language is Swedish, in which we investigate three nominal distinctions: uter/neuter,
common/proper, and count/mass. These represent grammatical, semantic, and mixed types of nominal
classification, respectively. Our results show that word embeddings can capture typical grammatical and
semantic features such as the uter/neuter and common/proper distinctions. Nevertheless, the models have
difficulty identifying classes such as count/mass, which not only combine grammatical and semantic
properties but are also subject to conversion and shift. We thereby answer the call of the Special Session
on Natural Language Processing in Artificial Intelligence by approaching the interfaces between morphology,
lexicon, semantics, and syntax through interdisciplinary methods that combine machine learning of language
and general linguistics. |