This service computes semantic relations between Russian words. It is named after RusCorpora, the website of the Russian National Corpus. They provide access to corpora; we provide access to semantic vectors (vectōrēs in Latin). These vectors reflect word meaning based on the distribution of word co-occurrences in the training corpus (huge amounts of raw linguistic data).

In distributional semantics, words are usually represented as vectors in a multi-dimensional space of their contexts. Semantic similarity between two words is then calculated as the cosine similarity between their corresponding vectors; it takes values between -1 and 1 (in practical tasks, usually only values above 0 are used). A value of 0 roughly means that the words share no similar contexts, and thus their meanings are unrelated. A value of 1 means that the words' contexts are identical, and thus their meanings are very similar.
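The cosine similarity described above is the dot product of two vectors divided by the product of their lengths. A minimal plain-Python sketch follows; the three toy vectors and the words attached to them are invented for illustration and do not come from any RusVectōrēs model, and `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|), in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, invented for illustration (not from a trained model).
toy = {
    "кошка": [0.9, 0.1, 0.3],    # 'cat'
    "собака": [0.8, 0.2, 0.35],  # 'dog'
    "стол": [0.0, 0.9, -0.4],    # 'table'
}

print(round(cosine_similarity(toy["кошка"], toy["собака"]), 3))  # close to 1: similar contexts
print(round(cosine_similarity(toy["кошка"], toy["стол"]), 3))    # close to 0: unrelated contexts
```

Cosine similarity depends only on the direction of the vectors, not on their length: `[1, 0]` and `[2, 0]` score exactly 1.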

Recently, distributional semantics has received rapidly growing attention. The main reason is the very promising approach of employing so-called predictive models to learn high-quality dense vectors (embeddings). These models are often trained using shallow artificial neural networks. Possibly the best-known tool in this field is word2vec, which allows much faster training than previous approaches.

Word2vec's Continuous Bag-of-Words and Continuous Skip-gram algorithms (and other similar methods) are being extensively studied and tested on English. However, the number of relevant publications for Russian is still low. Thus, it is important to provide the Russian linguistic community with access to relevant tools and models.

Unfortunately, training word embedding models on large corpora and querying them can be computationally expensive. Thus, we provide ready-made models trained on several Russian corpora, together with a convenient web interface to query them. You can also download the models and process them on your own. Moreover, our web service features a number of (hopefully) useful visualizations of semantic relations between words. In general, RusVectōrēs aims to lower the entry barrier for those who want to work in this new and exciting field.

What can RusVectōrēs do?

RusVectōrēs is basically a tool for exploring relations between words in distributional models. You can think of it as a kind of 'semantic calculator'. A user can choose one or several models to work with: we currently provide several models trained on different corpora (some of them achieved top-ranking positions in the RUSSE evaluation track). Each model contains from 120K to 400K lemmas.

After choosing a model, it is possible to:

  1. calculate semantic similarity between pairs of words;
  2. find words semantically closest to the query word (optionally with part-of-speech filters);
  3. perform analogical inference: find a word X which is related to the word Y in the same way as the word A is related to the word B;
  4. apply simple algebraic operations to word vectors (addition, subtraction, finding the average vector for a group of words and the distances from each word to this average);
  5. draw semantic maps of relations between input words (it is useful to explore clusters and oppositions, or to test your hypotheses about them);
  6. get the raw vectors (arrays of real values) and their visualizations for words in the chosen model: just click on any word anywhere, or use a direct URI to the word of interest, as described below.
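Items 2 to 4 above can be sketched in plain Python over a toy vocabulary. This is not RusVectōrēs' own code: the service queries trained models with hundreds of thousands of lemmas, while the six 3-dimensional vectors below are invented so that the arithmetic is easy to follow. The analogy uses the common vector-offset method (X ≈ A - B + Y); whether the service uses exactly this formula is an assumption, and all function names are hypothetical.

```python
import math

# Toy vectors invented for illustration (dimensions loosely: royalty, maleness, furniture).
# They are NOT taken from any RusVectōrēs model.
VECTORS = {
    "король": [0.9, 0.8, 0.1],     # 'king'
    "королева": [0.9, -0.8, 0.1],  # 'queen'
    "мужчина": [0.1, 0.8, 0.0],    # 'man'
    "женщина": [0.1, -0.8, 0.0],   # 'woman'
    "принц": [0.7, 0.7, 0.1],      # 'prince'
    "стол": [0.0, 0.0, 1.0],       # 'table'
}


def cosine(u, v):
    """Cosine similarity; raises ValueError for a zero vector."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / norm


def combine(vectors, positive=(), negative=()):
    """Add the vectors of the positive words and subtract those of the negative words."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for word in positive:
        result = [r + x for r, x in zip(result, vectors[word])]
    for word in negative:
        result = [r - x for r, x in zip(result, vectors[word])]
    return result


def average(vectors, words):
    """Component-wise mean vector of a group of words."""
    if not words:
        raise ValueError("need at least one word")
    total = combine(vectors, positive=words)
    return [x / len(words) for x in total]


def most_similar(vectors, query, exclude=(), topn=3):
    """Vocabulary words ranked by cosine similarity to the query vector."""
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def analogy(vectors, a, b, y, topn=1):
    """X is related to Y as A is related to B, so X ≈ A - B + Y."""
    query = combine(vectors, positive=[a, y], negative=[b])
    return most_similar(vectors, query, exclude={a, b, y}, topn=topn)


print(most_similar(VECTORS, VECTORS["король"], exclude={"король"}, topn=2))
print(analogy(VECTORS, "король", "мужчина", "женщина"))  # top candidate: королева
centre = average(VECTORS, ["король", "королева", "принц"])
print([(w, round(cosine(centre, VECTORS[w]), 3)) for w in ("король", "королева", "принц")])
```

The query words themselves are excluded from the candidates, since A - B + Y usually stays closest to one of its own inputs.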

In the spirit of the Semantic Web, each word in each model has its own unique URI explicitly stating the lemma, model and part of speech (for example, http://rusvectores.org/en/ruwikiruscorpora/алгоритм_NOUN/). The web pages at these URIs list the nearest semantic associates of the corresponding word that belong to the same part of speech, along with other information about the word.

We also provide a simple API to get the list of semantic associates for a given word in a given model. Two output formats are available: json and csv. Send GET requests to URLs following the pattern http://rusvectores.org/MODEL/WORD/api/FORMAT, where MODEL is the identifier of the chosen model, WORD is the query word, and FORMAT is "csv" or "json", depending on the output format you need. The service returns either a JSON file or a tab-separated text file with the first 10 associates.
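A request to this API can be sketched with the Python standard library. Several details are assumptions rather than documented facts: the model identifier `ruwikiruscorpora` is borrowed from the example word URI and may not be current; WORD is assumed to carry a part-of-speech suffix such as `_NOUN`, as in the word URIs; and the response is assumed to be UTF-8. The JSON structure is not described here, so the sketch returns the raw text, and `split_tsv` only splits the "csv" (tab-separated) output into fields without assigning meaning to the columns. All function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE = "http://rusvectores.org"
FORMATS = ("csv", "json")


def associates_url(model, word, fmt="json"):
    """Build http://rusvectores.org/MODEL/WORD/api/FORMAT, percent-encoding the word."""
    if fmt not in FORMATS:
        raise ValueError(f"fmt must be one of {FORMATS}, got {fmt!r}")
    return f"{BASE}/{quote(model)}/{quote(word)}/api/{fmt}"


def fetch_associates(model, word, fmt="json", timeout=10):
    """Perform the GET request and return the raw response body as text (assumed UTF-8)."""
    with urlopen(associates_url(model, word, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


def split_tsv(text):
    """Split tab-separated text into rows of fields, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


print(associates_url("ruwikiruscorpora", "алгоритм_NOUN", "csv"))
# Uncomment to query the live service (requires network access):
# rows = split_tsv(fetch_associates("ruwikiruscorpora", "алгоритм_NOUN", "csv"))
```

No request is made when the block runs; only the URL is printed, so it can be checked before hitting the service.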

Additionally, you can get semantic similarities for word pairs in any of the provided models via queries of the following format: http://rusvectores.org/MODEL/WORD1__WORD2/api/similarity/ (note the two underscores between the words).
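Building a similarity query mostly comes down to joining the two words with a double underscore. A plausible reason for two underscores rather than one is that words may already contain a single underscore before a part-of-speech tag (as in `алгоритм_NOUN`); that reading, and the use of tagged words in such queries, are assumptions. `similarity_url` is a hypothetical helper, and the sketch only builds the URL because the response format is not described here.

```python
from urllib.parse import quote

BASE = "http://rusvectores.org"


def similarity_url(model, word1, word2):
    """Build http://rusvectores.org/MODEL/WORD1__WORD2/api/similarity/."""
    for w in (word1, word2):
        if "__" in w:
            raise ValueError(f"word must not contain a double underscore: {w!r}")
    pair = f"{word1}__{word2}"  # two underscores separate the words
    return f"{BASE}/{quote(model)}/{quote(pair)}/api/similarity/"


print(similarity_url("ruwikiruscorpora", "кошка_NOUN", "собака_NOUN"))
# Splitting on "__" recovers both words even though each contains a single "_".
print("кошка_NOUN__собака_NOUN".split("__"))
```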

We recommend experimenting with algebraic operations on vectors, as they return interesting results. For example, the model trained on the Russian National Corpus returns существование ('existence') if we subtract любовь ('love') from жизнь ('life'). This may not sound very practical, but existing research on English models has already shown that such relationships can be useful for many applications, including machine translation.

Naturally, one can compare results from different models on one screen.

We would like RusVectōrēs to become a hub of scholarly knowledge about word embedding models for Russian, which is why there is a section with published academic papers and links to other relevant resources. At the same time, we hope that RusVectōrēs will also popularize distributional semantics and computational linguistics, making them more understandable and attractive to the Russian-speaking public.


This service runs on WebVectors, a free and open-source toolkit for serving distributional semantic models over the web.

Paper about WebVectors

You can also check out a sister service for English and Norwegian.


If you are interested in distributional semantic models, we strongly recommend these publications (in chronological order):

Articles on distributional semantics

  1. Bybee, J. (2006). Frequency of use and the organization of language.
  2. Turney, P. D., and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), 141-188.
  3. Řehůřek, R., and Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.
  4. Panchenko, A., et al. (2013). Serelex: Search and visualization of semantically related words. In Proceedings of the 35th European Conference on Information Retrieval (ECIR 2013), Moscow, Russia. Springer Lecture Notes in Computer Science.
  5. Mikolov, T., et al. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  6. Mikolov, T., et al. (2013). Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
  7. Baroni, M., et al. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1.
  8. Pennington, J., et al. (2014). GloVe: Global vectors for word representation. In Proceedings of EMNLP 2014.
  9. Kutuzov, A., and Kuzmenko, E. (2015). Comparing neural lexical models of a classic national corpus and a web corpus: The case for Russian. In A. Gelbukh (Ed.), CICLing 2015, Part I, Springer LNCS 9041, pp. 47–58. DOI: 10.1007/978-3-319-18111-0_4
  10. Bartunov, S., et al. (2015). Breaking sticks and ambiguities with adaptive skip-gram. arXiv preprint arXiv:1502.07257.
  11. Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics (TACL).
  12. Rong, X. (2015). word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
  13. Kutuzov, A., and Andreev, I. (2015). Texts in, meaning out: Neural language models in semantic similarity task for Russian. In Proceedings of the Dialogue 2015 Conference, Moscow, Russia.
  14. Panchenko, A., et al. (2015). RUSSE: The first workshop on Russian semantic similarity. In Proceedings of the Dialogue 2015 Conference, Moscow, Russia.
  15. Arefyev, N. V., et al. (2015). Evaluating three corpus-based semantic similarity systems for Russian. In Proceedings of the Dialogue 2015 Conference, Moscow, Russia.
  16. Lopukhin, K. A., et al. (2015). The impact of different vector space models and supplementary techniques in Russian semantic similarity task. In Proceedings of the Dialogue 2015 Conference, Moscow, Russia.
  17. Hamilton, W. L., et al. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096.
  18. Hamilton, W. L., et al. (2016). Cultural shift or linguistic drift? Comparing two computational measures of semantic change. arXiv preprint arXiv:1606.02821.

Andrey Kutuzov's talk "Distributional semantic models and their applications" (workshop at the Institute for Systems Analysis of the Russian Academy of Sciences, 3 March 2017), in Russian.

Papers mentioning our service

  1. Kirillov, A.N., Krizhanovsky, A.A. The model of geometrical structure of a synset. Series "Mathematical modeling and information technologies", V. 08, pp. 45-54, 2016 (in Russian)
  2. Kuznetsov, I. O. Automatic semantic role labeling for Russian. PhD thesis, MSU, 2016 (in Russian).
  3. Kalimoldayev, M. N., Koibagarov, K. C., Pak, A. A., & Zharmagambetov, A. S. The application of the connectionist method of semantic similarity for Kazakh language. In Electronics Computer and Computation (ICECCO), 2015 Twelve International Conference on (pp. 1-3). IEEE.
  4. Kopotev, M., Pivovarova, L., & Kormacheva, D. Constructional generalization over Russian collocations. Mémoires de la Société néophilologique de Helsinki, 2016

Citing us

If you use RusVectōrēs, please cite this paper:

Kutuzov A., Kuzmenko E. (2017) WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models. In: Ignatov D. et al. (eds) Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham.

We acknowledge the support of Mail.ru Group in providing hosting facilities for this service.