eta.embedding

Tools for creating embeddings of Eta documents and scoring embeddings using cosine similarity.

These classes serve as interfaces for invoking various embedding models or APIs.

Functions

sim

Compute the cosine similarity between vectors.

Classes

DummyEmbedder

An embedder that simply computes empty embeddings.

Embedder

Defines an abstract embedder class.

HFEmbedder

An embedder that uses Hugging Face's API to compute embeddings.

STEmbedder

An embedder that uses a native SentenceTransformer model to compute embeddings.

class Embedder

Bases: object

Defines an abstract embedder class.

At minimum, an embedder provides a method for embedding a text or list of texts, and a method for scoring a set of documents (optionally with precomputed embeddings) against a query text.

embed(texts)

Embed a text or list of texts.

Parameters:

texts (str or list[str]) – Either a single text string or a list of text strings to embed.

Returns:

The embedding or embeddings computed from the input.

Return type:

list[float] or list[list[float]]

score(text, documents, embeddings=[])

Score a set of documents relative to a text.

Parameters:
  • text (str) – A query text to use in computing scores for each document.

  • documents (list[str]) – A list of documents to score.

  • embeddings (list[list[float]], optional) – Precomputed embeddings for the documents. If provided, they are used directly and no new embeddings are computed for the documents.

Returns:

Scores for each document.

Return type:

list[float]
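The contract above can be illustrated with a minimal sketch. This is not the library's code: the names SketchEmbedder, LetterCountEmbedder, and cosine are hypothetical, and the sketch assumes that score compares the query embedding to each document embedding by cosine similarity, as the module summary suggests. It uses None rather than the documented [] default to avoid a shared mutable default argument.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


class SketchEmbedder:
    """Hypothetical stand-in for the abstract Embedder interface."""

    def embed(self, texts):
        # Subclasses supply the actual embedding model.
        raise NotImplementedError

    def score(self, text, documents, embeddings=None):
        # Reuse precomputed document embeddings when given; otherwise embed now.
        if not embeddings:
            embeddings = self.embed(documents)
        query = self.embed(text)
        return [cosine(query, emb) for emb in embeddings]


class LetterCountEmbedder(SketchEmbedder):
    """Toy embedder for illustration: counts of the letters a, b, c in each text."""

    def embed(self, texts):
        # A single string yields one vector; a list yields a list of vectors.
        if isinstance(texts, str):
            return [float(texts.count(ch)) for ch in "abc"]
        return [self.embed(t) for t in texts]
```

Calling LetterCountEmbedder().score("aab", ["aab", "ccc"]) embeds both documents and the query, then returns one score per document. Passing embeddings= skips the document embedding step.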

class STEmbedder(model='sentence-transformers/all-distilroberta-v1', parallelism=False)

Bases: Embedder

An embedder that uses a native SentenceTransformer model to compute embeddings.

Parameters:
  • model (str) – The name of a SentenceTransformer model to use.

  • parallelism (bool, default=False) – Whether to enable model parallelism.

model
Type:

SentenceTransformer

embed(texts)

Embed a text or list of texts.

Parameters:

texts (str or list[str]) – Either a single text string or a list of text strings to embed.

Returns:

The embedding or embeddings computed from the input.

Return type:

list[float] or list[list[float]]

class HFEmbedder(host='https://api-inference.huggingface.co/pipeline/feature-extraction/', model='sentence-transformers/all-distilroberta-v1')

Bases: Embedder

An embedder that uses Hugging Face’s API to compute embeddings.

Parameters:
  • host (str) – The URL of the embedding API to use.

  • model (str) – The name of the specific model to use.

host
Type:

str

model
Type:

str

url
Type:

str

header
Type:

dict

embed(texts)

Embed a text or list of texts.

Parameters:

texts (str or list[str]) – Either a single text string or a list of text strings to embed.

Returns:

The embedding or embeddings computed from the input.

Return type:

list[float] or list[list[float]]
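The host, model, url, and header attributes suggest how a request to the embedding API might be assembled. The sketch below is an assumption rather than the class's actual implementation: it guesses that url is host joined with model, that header carries a bearer token, and that the payload uses an "inputs" field. The function name build_embedding_request and the token argument are hypothetical. The request is only constructed, never sent.

```python
import json
import urllib.request


def build_embedding_request(host, model, token, texts):
    """Assemble (but do not send) a feature-extraction request.

    Assumed layout: url = host + model, bearer-token header, JSON body
    with the text or list of texts under "inputs".
    """
    url = host.rstrip("/") + "/" + model
    header = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(url, data=payload, headers=header, method="POST")
```

Sending the request (for example with urllib.request.urlopen) and decoding the JSON response would yield the embedding or embeddings, provided the service follows the assumed layout.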

class DummyEmbedder

Bases: Embedder

An embedder that simply computes empty embeddings.

sim(x, y)

Compute the cosine similarity between two vectors, x and y.
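For reference, the standard definition of cosine similarity for two nonzero vectors of equal length is given below. How sim treats zero-length or zero-norm inputs is not stated here, so that case is left out.

```latex
\[
\operatorname{sim}(x, y)
  = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
  = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^{2}} \, \sqrt{\sum_{i} y_i^{2}}}
\]
```

As a worked case, x = (1, 0) and y = (1, 1) give a dot product of 1 and norms of 1 and √2, so the similarity is 1/√2 ≈ 0.707. Parallel vectors score 1 and orthogonal vectors score 0.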