ragnar_retrieve_vss {ragnar}    R Documentation

Vector Similarity Search Retrieval

Description

Computes a similarity measure between the query embedding and the document chunk embeddings, then uses it to rank and retrieve document chunks.

Usage

ragnar_retrieve_vss(
  store,
  query,
  top_k = 3L,
  ...,
  method = "cosine_distance",
  query_vector = store@embed(query),
  filter
)

Arguments

store

A RagnarStore object returned by ragnar_store_connect() or ragnar_store_create().

query

Character. The query string to embed and use for similarity search.

top_k

Integer. Maximum number of document chunks to retrieve. Defaults to 3.

...

Additional arguments passed to methods.

method

Character. Similarity method to use: "cosine_distance", "euclidean_distance", or "negative_inner_product". Defaults to "cosine_distance".

query_vector

Numeric vector. The embedding for query. Defaults to store@embed(query).

filter

Optional. A filter expression evaluated with dplyr::filter().

Details

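Supported methods:

"cosine_distance": one minus the cosine of the angle between the two vectors.

"euclidean_distance": the L2 (straight-line) distance between the two vectors.

"negative_inner_product": the negative of the sum of element-wise products.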
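
As a rough illustration of what the three method values measure, the following base R sketch computes each one for a pair of toy vectors. The helper functions and the vectors are made up for this example and are not part of ragnar, which evaluates the metric inside the store's SQL query rather than in R.

```r
# Hypothetical helpers, for illustration only (not ragnar functions).
cosine_distance <- function(a, b) {
  # One minus the cosine similarity of a and b.
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
euclidean_distance <- function(a, b) sqrt(sum((a - b)^2))
negative_inner_product <- function(a, b) -sum(a * b)

# Toy 2-d vectors standing in for a query embedding and a chunk embedding.
query <- c(1, 0)
chunk <- c(3, 4)

cosine_distance(query, chunk)         # 1 - 3/5 = 0.4
euclidean_distance(query, chunk)      # sqrt(4 + 16) = sqrt(20)
negative_inner_product(query, chunk)  # -3
```

For all three measures a smaller value indicates a closer match, so sorting in ascending order puts the most similar chunks first.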

If filter is supplied, the function first performs the similarity search, then applies the filter in an outer SQL query. It uses the HNSW index when possible and falls back to a sequential scan for large result sets or filtered queries.
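
To make the order of operations concrete, here is a brute-force sketch in base R that uses a toy in-memory table in place of a store. It computes the metric for every chunk, applies the filter to that result, then sorts and keeps at most top_k rows. The data, the 2-d embeddings, and the helper retrieve_sketch() are all hypothetical. In particular, this page does not say whether the top_k limit is applied before or after the filter; the sketch assumes it is applied after.

```r
# Toy stand-in for a store: the 2-d "embeddings" are invented for this example.
chunks <- data.frame(
  category = c("pets", "pets", "dessert", "dessert"),
  text     = c("playful puppy", "sleepy kitten",
               "chocolate cake", "vanilla ice cream"),
  stringsAsFactors = FALSE
)
embeddings <- rbind(c(0.9, 0.1), c(0.8, 0.3), c(0.1, 0.9), c(0.2, 0.8))

# Hypothetical helper, not a ragnar function.
retrieve_sketch <- function(chunks, embeddings, query_vector, top_k = 3L,
                            keep = rep(TRUE, nrow(chunks))) {
  # "Inner query": cosine distance from the query to every chunk.
  sims <- as.vector(embeddings %*% query_vector) /
    (sqrt(rowSums(embeddings^2)) * sqrt(sum(query_vector^2)))
  out <- chunks
  out$metric_value <- 1 - sims
  # "Outer query": apply the filter, sort ascending by distance,
  # then keep at most top_k rows (assumed order, see the note above).
  out <- out[keep, , drop = FALSE]
  out <- out[order(out$metric_value), , drop = FALSE]
  head(out, top_k)
}

query_vector <- c(0, 1)  # pretend embedding of the query "sweet"

# Without a filter: the three closest chunks, closest first.
retrieve_sketch(chunks, embeddings, query_vector)

# With a filter: only "dessert" rows are kept, so fewer than top_k may remain.
retrieve_sketch(chunks, embeddings, query_vector,
                keep = chunks$category == "dessert")
```

This is a full scan over every row, comparable in spirit to the sequential-scan fallback described above; an HNSW index avoids comparing the query against every chunk.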

Value

A tibble with up to top_k retrieved chunks, ordered by metric_value.

Note

The results are not re-ranked after identifying the unique values.

See Also

Other ragnar_retrieve: ragnar_retrieve(), ragnar_retrieve_bm25(), ragnar_retrieve_vss_and_bm25()

Examples


## Build a small store with categories
store <- ragnar_store_create(
  embed = \(x) ragnar::embed_openai(x, model = "text-embedding-3-small"),
  extra_cols = data.frame(category = character()),
  version = 1 # store text chunks directly
)

ragnar_store_insert(
  store,
  data.frame(
    category = c(rep("pets", 3), rep("dessert", 3)),
    text     = c("playful puppy", "sleepy kitten", "curious hamster",
                 "chocolate cake", "strawberry tart", "vanilla ice cream")
  )
)
ragnar_store_build_index(store)

# Top 3 chunks without filtering
ragnar_retrieve(store, "sweet")

# Combine filter with similarity search
ragnar_retrieve(store, "sweet", filter = category == "dessert")


[Package ragnar version 0.2.0 Index]