The pgvector extension
Use the pgvector extension for vector similarity search in Postgres
The `pgvector` extension enables you to store vector embeddings and perform vector similarity search in Postgres. It is particularly useful for applications involving natural language processing, such as those built on top of OpenAI's GPT models.

`pgvector` supports:
- Exact and approximate nearest neighbor search
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- Any language with a Postgres client
- ACID compliance, point-in-time recovery, JOINs, and all other features of Postgres
This topic describes how to enable the `pgvector` extension in Neon and how to create, store, and query vectors.
Enable the pgvector extension
You can enable the `pgvector` extension by running the following `CREATE EXTENSION` statement in the Neon SQL Editor or from a client such as `psql` that is connected to Neon.
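The statement is pgvector's standard installation command:

```sql
CREATE EXTENSION vector;
```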
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the `psql` client with Neon, see Connect with psql.
Create a table to store vectors
To create a table for storing vectors, use the following SQL command, adjusting the dimensions as needed.
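For example, a table with a 3-dimensional vector column (the `items` and `embedding` names are used throughout the examples below):

```sql
CREATE TABLE items (id BIGSERIAL PRIMARY KEY, embedding VECTOR(3));
```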
The command generates a table named `items` with an `embedding` column capable of storing vectors with 3 dimensions. OpenAI's `text-embedding-ada-002` model, for example, outputs embeddings with 1536 dimensions for each piece of text. For more information about embeddings, see Embeddings, in the OpenAI documentation.
Storing vectors and embeddings
After you have generated an embedding using a service like the OpenAI API, you can store the resulting vector in your database. Using a Postgres client library in your preferred programming language, you can execute an `INSERT` statement similar to the following to store embeddings.
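For example, using the 3-dimensional `items` table created above:

```sql
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
```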
This command inserts two new rows into the `items` table with the provided embeddings.
This command loads vectors in bulk using the `COPY` command:
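A sketch, using the binary `COPY` format that pgvector supports for bulk loading:

```sql
COPY items (embedding) FROM STDIN WITH (FORMAT BINARY);
```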
For a Python script example, see bulk_loading.py.
This command shows how to upsert vectors:
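For example, assuming `id` is the table's primary key:

```sql
INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]')
ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;
```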
This command shows how to update vectors:
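For example, updating a row by primary key:

```sql
UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
```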
This command shows how to delete vectors:
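For example, deleting a row by primary key:

```sql
DELETE FROM items WHERE id = 1;
```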
Querying vectors
To retrieve vectors and calculate similarity, use `SELECT` statements and the built-in vector operators. For instance, you can find the top 5 most similar items to a given embedding using the following query:
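A typical form of this query, using the `<->` (L2 distance) operator:

```sql
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```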
This query computes the Euclidean distance (L2 distance) between the given vector and the vectors stored in the items table, sorts the results by the calculated distance, and returns the top 5 most similar items.
Supported distance functions include:
- `<->` - L2 distance
- `<#>` - (negative) inner product
- `<=>` - cosine distance
- `<+>` - L1 distance (added in 0.7.0)
note
The inner product operator (`<#>`) returns the negative inner product since Postgres only supports `ASC` order index scans on operators.
Get the nearest neighbors to a row:
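For example, finding the nearest neighbors to the row with `id = 1`:

```sql
SELECT * FROM items WHERE id != 1
ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
```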
Get rows within a certain distance:
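For example, rows within an L2 distance of 5:

```sql
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
```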
note
Combine with ORDER BY and LIMIT to use an index
Get the distance:
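For example, computing the L2 distance to a given vector:

```sql
SELECT embedding <-> '[3,1,2]' AS distance FROM items;
```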
For inner product, multiply by `-1` (since `<#>` returns the negative inner product):
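For example:

```sql
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
```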
For cosine similarity, use 1 - cosine distance:
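For example:

```sql
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
```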
To average vectors:
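For example:

```sql
SELECT AVG(embedding) FROM items;
```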
To average groups of vectors:
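For example, assuming a hypothetical `category_id` column on the table:

```sql
SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;
```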
Indexing vectors
By default, `pgvector` performs exact nearest neighbor search, providing perfect recall. Adding an index on the vector column can improve query performance with a minor cost in recall.
Supported index types include:
HNSW
An HNSW index creates a multilayer graph. It has better query performance than IVFFlat (in terms of speed-recall tradeoff), but has slower build times and uses more memory. Also, an index can be created without any data in the table since there isn’t a training step like IVFFlat.
The following examples show how to add an HNSW index for the supported distance functions. The supported types include:
- `vector` - up to 2,000 dimensions
- `halfvec` - up to 4,000 dimensions (added in 0.7.0)
- `bit` - up to 64,000 dimensions (added in 0.7.0)
- `sparsevec` - up to 1,000 non-zero elements (added in 0.7.0)
L2 distance
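For the `vector` type:

```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```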
note
Use `halfvec_l2_ops` for `halfvec` and `sparsevec_l2_ops` for `sparsevec` (and similar for the other distance functions).
Inner product
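For the `vector` type:

```sql
CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
```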
Cosine distance
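For the `vector` type:

```sql
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
```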
L1 distance - added in 0.7.0
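For the `vector` type:

```sql
CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
```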
Hamming distance - added in 0.7.0
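Hamming distance applies to binary (`bit`) vectors:

```sql
CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
```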
Jaccard distance - added in 0.7.0
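Jaccard distance also applies to binary (`bit`) vectors:

```sql
CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);
```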
HNSW index options
- `m` - the max number of connections per layer (16 by default)
- `ef_construction` - the size of the dynamic candidate list for constructing the graph (64 by default)
This example demonstrates how to set the parameters:
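For example, setting both options explicitly (the values shown are the defaults):

```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);
```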
A higher value of `ef_construction` provides better recall at the cost of index build time and insert speed.
HNSW query options
You can specify the size of the dynamic candidate list for search. The size is 40 by default.
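The size is controlled by the `hnsw.ef_search` parameter:

```sql
SET hnsw.ef_search = 100;
```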
A higher value provides better recall at the cost of speed.
This query shows how to use `SET LOCAL` inside a transaction to set `ef_search` for a single query:
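For example (the `SELECT` reuses the similarity query from earlier):

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```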
IVFFlat
An IVFFlat index divides vectors into lists and searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance with respect to the speed-recall tradeoff.
Keys to achieving good recall include:
- Creating the index after the table has some data
- Choosing an appropriate number of lists. A good starting point is `rows/1000` for up to 1M rows and `sqrt(rows)` for over 1M rows.
- Specifying an appropriate number of probes when querying. A higher number is better for recall, and a lower number is better for speed. A good starting point is `sqrt(lists)`.
Supported types include:
- `vector` - up to 2,000 dimensions
- `halfvec` - up to 4,000 dimensions (added in 0.7.0)
- `bit` - up to 64,000 dimensions (added in 0.7.0)
The following examples show how to add an index for each distance function:
L2 distance
```sql
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```
note
Use `halfvec_l2_ops` for `halfvec` (and similar for the other distance functions).
Inner product
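For the `vector` type:

```sql
CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
```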
Cosine distance
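For the `vector` type:

```sql
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```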
Hamming distance - added in 0.7.0
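Hamming distance applies to binary (`bit`) vectors:

```sql
CREATE INDEX ON items USING ivfflat (embedding bit_hamming_ops) WITH (lists = 100);
```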
IVFFlat query options
You can specify the number of probes, which is 1 by default.
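The number of probes is controlled by the `ivfflat.probes` parameter:

```sql
SET ivfflat.probes = 10;
```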
A higher value provides better recall at the cost of speed. You can set the value to the number of lists for exact nearest neighbor search, at which point the planner won’t use the index.
You can also use `SET LOCAL` inside a transaction to set the number of probes for a single query:
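For example (the `SELECT` reuses the similarity query from earlier):

```sql
BEGIN;
SET LOCAL ivfflat.probes = 10;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```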
Optimizing index build time
To optimize index build time, consider configuring the following session variables prior to building an index:
maintenance_work_mem
In Postgres, the `maintenance_work_mem` setting determines the maximum memory allocation for tasks such as `CREATE INDEX`. The default `maintenance_work_mem` value in Neon is set according to your Neon compute size:
| Compute Units (CU) | vCPU | RAM | maintenance_work_mem |
|---|---|---|---|
| 0.25 | 0.25 | 1 GB | 64 MB |
| 0.50 | 0.50 | 2 GB | 64 MB |
| 1 | 1 | 4 GB | 67 MB |
| 2 | 2 | 8 GB | 134 MB |
| 3 | 3 | 12 GB | 201 MB |
| 4 | 4 | 16 GB | 268 MB |
| 5 | 5 | 20 GB | 335 MB |
| 6 | 6 | 24 GB | 402 MB |
| 7 | 7 | 28 GB | 470 MB |
| 8 | 8 | 32 GB | 537 MB |
To optimize `pgvector` index build time, you can increase the `maintenance_work_mem` setting for the current session with a command similar to the following:
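For example:

```sql
SET maintenance_work_mem='10 GB';
```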
The recommended setting is your working set size (the size of your tuples for vector index creation). However, your `maintenance_work_mem` setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, the `maintenance_work_mem='10 GB'` setting shown above has been successfully tested on a 7 CU compute, which has 28 GB of RAM, as 10 GB is less than 50% of the RAM available for that compute size.
max_parallel_maintenance_workers
The `max_parallel_maintenance_workers` setting determines the maximum number of parallel workers that can be started by a single utility command such as `CREATE INDEX`. By default, the `max_parallel_maintenance_workers` setting is `2`. For efficient parallel index creation, you can increase this setting. Parallel workers are taken from the pool of processes established by `max_worker_processes` (`10`), limited by `max_parallel_workers` (`8`).
You can increase the `max_parallel_maintenance_workers` setting for the current session with a command similar to the following:
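For example:

```sql
SET max_parallel_maintenance_workers = 7;
```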
For example, if you have a 7 CU compute size, you could set `max_parallel_maintenance_workers` to 7 before index creation to make use of all of the vCPUs available.
Differences in behavior between pgvector 0.5.1 and 0.7.0
Differences in behavior in the following corner cases were found during our testing of `pgvector` 0.7.0:
Distance between a valid and NULL vector
The distance between a valid and `NULL` vector (`NULL::vector`) with `pgvector` 0.7.0 differs from `pgvector` 0.5.1 when using an HNSW or IVFFlat index, as shown in the following examples:
HNSW
For the following script, which compares `NULL::vector` to non-null vectors, the resulting output changes:
`pgvector` 0.7.0 output:
`pgvector` 0.5.1 output:
IVFFlat
For the following script, which compares `NULL::vector` to non-null vectors, the resulting output changes:
`pgvector` 0.7.0 output:
`pgvector` 0.5.1 output:
Error messages improvement for invalid literals
If you use an invalid literal value for the `vector` data type, you will now see the following error message:
Resources
`pgvector` source code: https://github.com/pgvector/pgvector
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more detail, see Getting Support.