RusVectōrēs 2.0: Christmas Edition

The holidays are coming, and we are ready to release a new version of our service: RusVectōrēs 2.0. Consider it a gift to all our users and to everyone interested in distributional semantics.

For those who are not yet familiar with our service: RusVectōrēs computes semantic relations between words in Russian.

[Figure: a semantic map of word relations]

How is that done? In distributional semantics, words are usually represented as vectors in a multi-dimensional space of their contexts. The semantic similarity between two words is then trivially calculated as the cosine similarity between their corresponding vectors; it takes values between -1 and 1. A value of 0 means the words lack similar contexts, and thus their meanings are unrelated to each other. A value of 1 means the words' contexts are absolutely identical, and thus their meanings are very similar.
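For the curious, here is a minimal sketch of this computation in Python; the 3-dimensional vectors are invented for illustration, while real embedding models use hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Cosine of the angle between two word vectors: ranges from -1 to 1."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy "context" vectors, invented for illustration.
cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.4])
print(cosine_similarity(cat, dog))  # close to 1: the contexts largely overlap
```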

RusVectōrēs lets users work with word vectors from the neural embedding models we trained on the Russian National Corpus, a news corpus, and a web corpus. Users can compute the semantic associates of a given word, find the cosine similarity coefficient between a pair of words, and perform simple algebraic operations on vectors. The models are trained with the Skip-Gram and CBOW algorithms introduced in the well-known word2vec tool.
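Since the models follow the standard word2vec format, these operations can also be reproduced offline, for instance with the gensim library. A minimal sketch follows; the file name is a placeholder, and the exact query-word format (lemmas, possible POS tags) depends on how each model's corpus was preprocessed:

```python
from gensim.models import KeyedVectors

# Placeholder file name: any model in the word2vec binary format will do.
model = KeyedVectors.load_word2vec_format('ruscorpora.model.bin', binary=True)

# Ten nearest semantic associates of a query word.
for word, sim in model.most_similar('день', topn=10):
    print(f'{word}\t{sim:.3f}')

# Cosine similarity between a pair of words.
print(model.similarity('день', 'ночь'))
```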

We previously presented our service at the workshop "Quantitative Approaches to the Russian Language" in Helsinki in August and at the AINL-FRUCT tutorial on distributional semantics in Saint Petersburg in November. Since then, we have significantly improved the RusVectōrēs services, and you now have even more possibilities for research! The main changes in the new release are the following:

  1. We provide a simple API to query the service automatically! The API returns the list of semantic associates for a given word in a given model. Perform GET requests to URLs following the pattern https://rusvectores.org/MODEL/WORD/api, where MODEL is the identifier of the chosen model and WORD is the query word. The service returns a tab-separated text file with the first 10 associates (see the first code sketch after this list).
  2. Our web service now features visualizations of semantic relations between words. A user enters several words, the service builds a map of their interrelations in the chosen model, and then returns a 2-dimensional version of this map (projected down from the high-dimensional vector space).
  3. Visualizations of vectors for particular words in particular models are available at their unique URIs.
  4. The semantic calculator is now capable of two kinds of operations. First, it solves proportions of the form "find a word D related to word C in the same way as word A is related to word B" (analogical inference). Second, it performs algebraic operations on vectors: addition, subtraction, and finding the center of a lexical cluster (see the second code sketch after this list).
  5. As a reminder, users can train their own neural embedding models with predefined settings on their own corpora, using our server.
  6. Finally, you can follow our RSS feed and always be aware of the recent changes!
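As promised in item 1, here is a minimal sketch of querying the API from Python. The model identifier and the query word are placeholders; substitute any model identifier listed on the site:

```python
import requests

# Placeholder model identifier and query word.
model = 'ruscorpora'
word = 'день'

response = requests.get(f'https://rusvectores.org/{model}/{word}/api')
response.raise_for_status()

# The service returns a tab-separated text file with the first 10 associates.
for line in response.text.strip().splitlines():
    print(line)
```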
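And here is a sketch of what item 4's operations look like in raw vector terms when done offline with gensim, under the same assumptions as above (placeholder model file; the example words are assumed to be in the model's vocabulary):

```python
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format('ruscorpora.model.bin', binary=True)

# Analogical inference: find D related to C as A is related to B,
# i.e. D ≈ B - A + C (here: "король" - "мужчина" + "женщина" ≈ "королева").
print(model.most_similar(positive=['король', 'женщина'],
                         negative=['мужчина'], topn=1))

# Center of a lexical cluster: most_similar with several positive words
# averages their vectors before the nearest-neighbour search.
print(model.most_similar(positive=['зима', 'весна', 'лето', 'осень'], topn=5))
```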

We hope your research will never be limited by the complexity of computations! Happy holidays!

RusVectōrēs Team:
Andrey Kutuzov (University of Oslo, Higher School of Economics)
Elizaveta Kuzmenko (Higher School of Economics)