Why use Synerise Cleora?

Key technical features of Cleora embeddings

The embeddings produced by Cleora differ from those produced by Node2vec, Word2vec, DeepWalk or other systems in this class in a number of key properties:

  • efficiency - Cleora is two orders of magnitude faster than Node2Vec or DeepWalk

  • inductivity - as Cleora embeddings of an entity are defined only by interactions with other entities, vectors for new entities can be computed on-the-fly

  • updatability - refreshing a Cleora embedding for an entity is a very fast operation allowing for real-time updates without retraining

  • stability - all starting vectors for entities are deterministic, which means that Cleora embeddings of similar datasets will end up being similar. Methods like Word2vec, Node2vec or DeepWalk, by contrast, return different results with every run.

  • cross-dataset compositionality - thanks to the stability of Cleora embeddings, embeddings of the same entity from multiple datasets can be combined by averaging, yielding meaningful vectors

  • dim-wise independence - thanks to the process producing Cleora embeddings, every dimension is independent of the others. This property allows for an efficient, low-parameter method of combining multi-view embeddings with Conv1d layers.

  • extreme parallelism and performance - Cleora is written in Rust and uses thread-level parallelism for all calculations except input file loading. In practice this means that the embedding process is often faster than loading the input data.
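
The stability and cross-dataset compositionality properties above can be illustrated with a small sketch. It is not Cleora's actual implementation: the hashing scheme, the `initial_vector` and `average_embeddings` helpers, and the `seed` and `dim` parameters are all assumptions made for illustration. It only shows the general idea that a starting vector derived from the entity identifier is reproducible across runs, and that vectors for the same entity can be combined by averaging.

```python
import hashlib
import math


def initial_vector(entity_id: str, dim: int = 8, seed: int = 0) -> list[float]:
    """Hypothetical deterministic init: the same (seed, entity_id) always
    gives the same vector, with every component in [-1, 1]."""
    vec = []
    for i in range(dim):
        digest = hashlib.sha256(f"{seed}:{entity_id}:{i}".encode()).digest()
        # Interpret the first 8 bytes as an unsigned integer, scale to [0, 1].
        value = int.from_bytes(digest[:8], "big") / 2**64
        vec.append(2.0 * value - 1.0)
    return vec


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_embeddings(vectors: list[list[float]]) -> list[float]:
    """Combine embeddings of one entity from several datasets by averaging,
    then re-normalise (illustrative only)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return l2_normalize(mean)
```

Calling `initial_vector("user_42")` twice returns identical vectors, which is what removes the run-to-run variation seen in randomly initialised methods.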

Key usability features of Cleora embeddings

The technical properties described above imply good production-readiness of Cleora, which from the end-user perspective can be summarized as follows:

  • heterogeneous relational tables can be embedded without any artificial data pre-processing

  • mixed interaction + text datasets can be embedded with ease

  • the cold-start problem for new entities is non-existent

  • real-time updates of the embeddings do not require any separate solutions

  • multi-view embeddings work out of the box

  • temporal, incremental embeddings are stable out of the box, with no need for re-alignment, rotations or other methods

  • extremely large datasets are supported and can be embedded within seconds or minutes
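
The cold-start and real-time update points follow from the inductivity property: an entity's vector depends only on the entities it interacts with. A minimal sketch of that idea, assuming the new vector is the L2-normalised average of its neighbours' vectors, is shown below. The function name `embed_new_entity` and the dictionary layout are hypothetical and are not part of Cleora's API.

```python
import math


def embed_new_entity(
    interactions: list[str],
    embeddings: dict[str, list[float]],
) -> list[float]:
    """Build a vector for an unseen entity from entities it interacted with.

    Assumption for illustration: the vector is the L2-normalised mean of the
    known neighbours' vectors. Unknown neighbours are skipped.
    """
    known = [embeddings[e] for e in interactions if e in embeddings]
    if not known:
        raise ValueError("no interaction with an already embedded entity")
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Existing product embeddings (toy 2-dimensional values).
product_vectors = {
    "product_a": [1.0, 0.0],
    "product_b": [0.0, 1.0],
}

# A brand-new user who interacted with both products gets a vector
# immediately, without retraining anything.
new_user = embed_new_entity(["product_a", "product_b"], product_vectors)
```

Refreshing an existing entity after new interactions arrive would be the same call with the updated interaction list, which is why no separate update mechanism is needed in this sketch.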