Milvus - Vector Store

Use Milvus as a vector store for RAG.

Quick Start

You need three things:

  1. A Milvus instance (cloud or self-hosted)
  2. An embedding model (to convert your queries to vectors)
  3. A Milvus collection with vector fields

Usage

```python
from litellm import vector_stores
import os

# Set your credentials
os.environ["MILVUS_API_KEY"] = "your-milvus-api-key"
os.environ["MILVUS_API_BASE"] = "https://your-milvus-instance.milvus.io"

# Search the vector store
response = vector_stores.search(
    vector_store_id="my-collection-name",  # Your Milvus collection name
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
        "api_version": "2025-09-01",
    },
    milvus_text_field="book_intro",  # Field name that contains text content
    api_key=os.getenv("MILVUS_API_KEY"),
)

print(response)
```
For async usage, call `vector_stores.asearch` from within an async function:

```python
from litellm import vector_stores
import os

response = await vector_stores.asearch(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
        "api_version": "2025-09-01",
    },
    milvus_text_field="book_intro",
    api_key=os.getenv("MILVUS_API_KEY"),
)

print(response)
```

Advanced Options

```python
from litellm import vector_stores
import os

response = vector_stores.search(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
    },
    milvus_text_field="book_intro",
    api_key=os.getenv("MILVUS_API_KEY"),
    # Milvus-specific parameters
    limit=10,  # Number of results to return
    offset=0,  # Pagination offset
    dbName="default",  # Database name
    annsField="book_intro_vector",  # Vector field to search
    outputFields=["id", "book_intro", "title"],  # Fields to return
    filter="book_id > 0",  # Metadata filter expression
    searchParams={"metric_type": "L2", "params": {"nprobe": 10}},  # Search parameters
)

print(response)
```

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| `vector_store_id` | string | Your Milvus collection name |
| `custom_llm_provider` | string | Set to `"milvus"` |
| `litellm_embedding_model` | string | Model to generate query embeddings (e.g., `"azure/text-embedding-3-large"`) |
| `litellm_embedding_config` | dict | Config for the embedding model (`api_base`, `api_key`, `api_version`) |
| `milvus_text_field` | string | Field name in your collection that contains text content |
| `api_key` | string | Your Milvus API key (or set the `MILVUS_API_KEY` env var) |
| `api_base` | string | Your Milvus API base URL (or set the `MILVUS_API_BASE` env var) |

Optional Parameters

| Parameter | Type | Description |
|---|---|---|
| `dbName` | string | Database name (default: `"default"`) |
| `annsField` | string | Vector field name to search (default: `"book_intro_vector"`) |
| `limit` | integer | Maximum number of results to return |
| `offset` | integer | Pagination offset |
| `filter` | string | Filter expression for metadata filtering |
| `groupingField` | string | Field to group results by |
| `outputFields` | list | Fields to return in results |
| `searchParams` | dict | Search parameters such as the metric type and `nprobe` |
| `partitionNames` | list | Partition names to search |
| `consistencyLevel` | string | Consistency level for the search |
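The `filter` parameter is a plain Milvus boolean expression string, so you can write it by hand or compose it programmatically. A minimal sketch of a hypothetical helper (not part of LiteLLM or Milvus) that builds a filter expression from a dict of conditions:

```python
def build_milvus_filter(conditions: dict) -> str:
    """Join simple equality/membership conditions into a Milvus filter expression.

    Hypothetical helper for illustration only; Milvus filters are plain
    strings, so hand-writing them works just as well.
    """
    clauses = []
    for field, value in conditions.items():
        if isinstance(value, str):
            clauses.append(f'{field} == "{value}"')
        elif isinstance(value, (list, tuple)):
            clauses.append(f"{field} in {list(value)}")
        else:
            clauses.append(f"{field} == {value}")
    return " and ".join(clauses)

expr = build_milvus_filter({"category": "geography", "book_id": [1, 2, 3]})
# pass expr as filter=... in vector_stores.search(...)
```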

Supported Features

| Feature | Status | Notes |
|---|---|---|
| Logging | ✅ Supported | Full logging support available |
| Guardrails | ❌ Not Yet Supported | Guardrails are not currently supported for vector stores |
| Cost Tracking | ✅ Supported | Cost is $0 for Milvus searches |
| Unified API | ✅ Supported | Call via the OpenAI-compatible /v1/vector_stores/search endpoint |
| Passthrough | ❌ Not Yet Supported | |
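Because the unified route follows the OpenAI Vector Stores API shape, any HTTP client can call it. A sketch of building such a request against a LiteLLM proxy; the proxy URL (`localhost:4000`) and key (`sk-1234`) are assumptions for illustration, and the exact route shape follows the OpenAI-style `/v1/vector_stores/{id}/search` convention:

```python
import json
import urllib.request

# Assumed proxy base URL and virtual key -- replace with your own.
url = "http://localhost:4000/v1/vector_stores/my-collection-name/search"
payload = {"query": "What is the capital of France?"}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-1234",
        "Content-Type": "application/json",
    },
)
# Sending the request requires a running proxy:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```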

Response Format

The response follows the standard LiteLLM vector store format:

```json
{
  "object": "vector_store.search_results.page",
  "search_query": "What is the capital of France?",
  "data": [
    {
      "score": 0.95,
      "content": [
        {
          "text": "Paris is the capital of France...",
          "type": "text"
        }
      ],
      "file_id": null,
      "filename": null,
      "attributes": {
        "id": "123",
        "title": "France Geography"
      }
    }
  ]
}
```
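A response in this shape can be consumed like any nested dict. A small sketch that collects the text snippets with their scores, best match first (the sample data mirrors the example above):

```python
# Sample response matching the documented format.
response = {
    "object": "vector_store.search_results.page",
    "search_query": "What is the capital of France?",
    "data": [
        {
            "score": 0.95,
            "content": [{"text": "Paris is the capital of France...", "type": "text"}],
            "file_id": None,
            "filename": None,
            "attributes": {"id": "123", "title": "France Geography"},
        }
    ],
}

# Collect (score, text) pairs from every result, highest score first.
hits = sorted(
    (
        (item["score"], part["text"])
        for item in response["data"]
        for part in item["content"]
        if part["type"] == "text"
    ),
    reverse=True,
)
top_score, top_text = hits[0]
```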

How It Works

When you search:

  1. LiteLLM converts your query to a vector using the embedding model you specified
  2. It sends the vector to your Milvus instance via the /v2/vectordb/entities/search endpoint
  3. Milvus finds the most similar documents in your collection using vector similarity search
  4. Results come back with distance scores
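Step 2 can be sketched as the JSON body that a Milvus v2 REST search expects. The vector below is a three-element stand-in; real query vectors come from step 1 and have the embedding model's dimension:

```python
# Placeholder embedding -- in practice this is the output of step 1.
query_vector = [0.1, 0.2, 0.3]

# Body for POST {MILVUS_API_BASE}/v2/vectordb/entities/search,
# sent with an "Authorization: Bearer {MILVUS_API_KEY}" header.
body = {
    "collectionName": "my-collection-name",
    "data": [query_vector],            # one search vector per query
    "annsField": "book_intro_vector",  # vector field to search
    "limit": 10,                       # max results to return
    "outputFields": ["book_intro"],    # scalar fields to include in hits
}
```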

The embedding model can be any model supported by LiteLLM, including Azure OpenAI, OpenAI, and Bedrock models.