The pgvector extension
Enable Postgres as a vector store with the pgvector extension
The `pgvector` extension enables you to store vector embeddings and perform vector similarity search in Postgres. It is particularly useful for applications involving natural language processing, such as those built on top of OpenAI's GPT models.
`pgvector` supports:
- Exact and approximate nearest neighbor search
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- Any language with a Postgres client
- ACID compliance, point-in-time recovery, JOINs, and all other Postgres features
This topic describes how to enable the `pgvector` extension in Neon and how to create, store, and query vectors.
Enable the pgvector extension
You can enable the `pgvector` extension by running the following `CREATE EXTENSION` statement in the Neon SQL Editor or from a client such as `psql` that is connected to Neon.
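```sql
CREATE EXTENSION vector;
```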
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the `psql` client with Neon, see Connect with psql.
Create a table to store vectors
To create a table for storing vectors, you would use an SQL command similar to the following. Embeddings are stored in the `VECTOR` type column. You can adjust the number of dimensions as needed.
note
The `pgvector` extension supports some specialized types other than `VECTOR` for storing embeddings. See HNSW vector types and IVFFlat vector types.
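```sql
-- The id column is included so later examples (such as upserts) can reference rows by key
CREATE TABLE items (id BIGSERIAL PRIMARY KEY, embedding VECTOR(3));
```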
This command generates a table named `items` with an `embedding` column capable of storing vectors with 3 dimensions. OpenAI's `text-embedding-3-small` model supports 1536 dimensions by default for each piece of text, which creates more accurate embeddings for natural language processing tasks. However, using larger embeddings generally costs more and consumes more compute, memory, and storage than using smaller embeddings. To learn more about embeddings and the cost-performance tradeoff, see Embeddings in the OpenAI documentation.
Storing embeddings
After generating embeddings using a service like OpenAI's Embeddings API, you can store them in your database. Using a Postgres client library in your preferred programming language, you can execute an `INSERT` statement similar to the following to store embeddings.
- Insert two new rows into the `items` table with the provided embeddings.
- Load vectors in bulk using the `COPY` command.

  tip
  For a Python script that loads embeddings in bulk, refer to this Bulk loading with COPY example provided in the `pgvector` GitHub repository.

- Upsert vectors.
- Update vectors.
- Delete vectors.

Example statements for each of these operations are shown below.
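These statements follow the standard `pgvector` syntax and assume the three-dimensional `items` table created above, with `id` as its primary key:

```sql
-- Insert two new rows with embeddings
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Load vectors in bulk (typically driven from a client library or script)
COPY items (embedding) FROM STDIN WITH (FORMAT BINARY);

-- Upsert a vector, using the primary key to detect conflicts
INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]')
ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;

-- Update a vector
UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;

-- Delete a vector
DELETE FROM items WHERE id = 1;
```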
Querying vectors
To retrieve vectors and calculate similarity, use `SELECT` statements and the distance function operators supported by `pgvector`.
- Get the nearest neighbor to a vector by L2 distance.
- Get the nearest neighbor to a row by L2 distance.
- Get rows within a certain distance by L2 distance.

Example queries for each case are shown below.
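These queries follow the standard `pgvector` syntax; the sample vector `'[3,1,2]'` and `id` values are illustrative:

```sql
-- Nearest neighbors to a vector
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

-- Nearest neighbors to a row
SELECT * FROM items WHERE id != 1
ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1)
LIMIT 5;

-- Rows within a certain distance
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
```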
note
To use an index with a query, include `ORDER BY` and `LIMIT` clauses, as shown in the second query example above.
Distance function operators
- `<->` - L2 distance
- `<#>` - (negative) inner product
- `<=>` - cosine distance
- `<+>` - L1 distance
note
The inner product operator (`<#>`) returns the negative inner product since Postgres only supports `ASC` order index scans on operators.
Distance queries
- Get the distances.
- For inner product, multiply by `-1` (since `<#>` returns the negative inner product).
- For cosine similarity, use 1 - cosine distance.

Example queries are shown below.
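Sketches of each distance query, using the standard `pgvector` syntax and an illustrative query vector:

```sql
-- Get the distance to a query vector
SELECT embedding <-> '[3,1,2]' AS distance FROM items;

-- Inner product (multiply by -1 because <#> returns the negative inner product)
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;

-- Cosine similarity (1 - cosine distance)
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
```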
Aggregate queries
- To average vectors.
- To average groups of vectors.

Example queries are shown below.
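For example (the `category_id` grouping column is illustrative):

```sql
-- Average all vectors
SELECT AVG(embedding) FROM items;

-- Average vectors by group
SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;
```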
Indexing vectors
By default, `pgvector` performs exact nearest neighbor search, providing perfect recall. Adding an index on the vector column can improve query performance with a minor cost in recall. Unlike typical indexes, you will see different results for queries after adding an approximate index.
Supported index types include:
HNSW
An HNSW index creates a multilayer graph. It has better query performance than an IVFFlat index (in terms of speed-recall tradeoff), but has slower build times and uses more memory. Also, an HNSW index can be created without any data in the table since there isn’t a training step like there is for an IVFFlat index.
HNSW vector types
HNSW indexes are supported with the following vector types:
- `vector` - up to 2,000 dimensions
- `halfvec` - up to 4,000 dimensions
- `bit` - up to 64,000 dimensions
- `sparsevec` - up to 1,000 non-zero elements
note
Notice how indexes are defined differently depending on the distance function being used. For example, `vector_l2_ops` is specified for L2 distance, `vector_ip_ops` for inner product, and so on. Make sure you define your index according to the distance function you intend to use.
- L2 distance
- Inner product
- Cosine distance
- L1 distance
- Hamming distance
- Jaccard distance

Example `CREATE INDEX` statements for each distance function are shown below.
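These statements follow the standard `pgvector` syntax; the Hamming and Jaccard examples assume a `bit` column rather than a `vector` column:

```sql
-- L2 distance
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Inner product
CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);

-- Cosine distance
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- L1 distance
CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);

-- Hamming distance (bit column)
CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);

-- Jaccard distance (bit column)
CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);
```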
HNSW index build options
- `m` - the max number of connections per layer (`16` by default)
- `ef_construction` - the size of the dynamic candidate list for constructing the graph (`64` by default)
This example demonstrates how to set the parameters:
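```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);
```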
A higher value of ef_construction
provides better recall at the cost of index build time and insert speed.
HNSW index query options
You can specify the size of the candidate list for search. The size is `40` by default:
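```sql
-- hnsw.ef_search controls the size of the dynamic candidate list for search
SET hnsw.ef_search = 100;
```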
A higher value provides better recall at the cost of speed.
This query shows how to use `SET LOCAL` inside a transaction to set `ef_search` for a single query:
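```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```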
HNSW index build time
To optimize index build time, consider configuring the `maintenance_work_mem` and `max_parallel_maintenance_workers` session variables before building an index.
note
Like other index types, it's faster to create an index after loading your initial data.
- `maintenance_work_mem`

  Indexes build significantly faster when the graph fits into Postgres `maintenance_work_mem`. A notice similar to the one shown after this list appears when the graph no longer fits.

  In Postgres, the `maintenance_work_mem` setting determines the maximum memory allocation for tasks such as `CREATE INDEX`. The default `maintenance_work_mem` value in Neon is set according to your Neon compute size:

  | Compute Units (CU) | vCPU | RAM   | maintenance_work_mem |
  |--------------------|------|-------|----------------------|
  | 0.25               | 0.25 | 1 GB  | 64 MB                |
  | 0.50               | 0.50 | 2 GB  | 64 MB                |
  | 1                  | 1    | 4 GB  | 67 MB                |
  | 2                  | 2    | 8 GB  | 134 MB               |
  | 3                  | 3    | 12 GB | 201 MB               |
  | 4                  | 4    | 16 GB | 268 MB               |
  | 5                  | 5    | 20 GB | 335 MB               |
  | 6                  | 6    | 24 GB | 402 MB               |
  | 7                  | 7    | 28 GB | 470 MB               |
  | 8                  | 8    | 32 GB | 537 MB               |

  To optimize `pgvector` index build time, you can increase the `maintenance_work_mem` setting for the current session, as shown in the sketch after this list. The recommended setting is your working set size (the size of your tuples for vector index creation). However, your `maintenance_work_mem` setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, a `maintenance_work_mem='10 GB'` setting has been successfully tested on a 7 CU compute, which has 28 GB of RAM, as 10 GiB is less than 50% of the RAM available for that compute size.

- `max_parallel_maintenance_workers`

  You can also speed up index creation by increasing the number of parallel workers. The `max_parallel_maintenance_workers` setting determines the maximum number of parallel workers that can be started by a single utility command such as `CREATE INDEX`. By default, it is set to `2`. For efficient parallel index creation, you can increase this setting. Parallel workers are taken from the pool of processes established by `max_worker_processes` (`10`), limited by `max_parallel_workers` (`8`).

  You can increase the `max_parallel_maintenance_workers` setting for the current session, as shown in the sketch after this list. For example, if you have a 7 CU compute size, you could set `max_parallel_maintenance_workers` to `7` before index creation to make use of all of the vCPUs available.

  For a large number of workers, you may also need to increase the Postgres `max_parallel_workers` setting, which is `8` by default.
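When the graph no longer fits into `maintenance_work_mem`, a notice similar to the following (per the `pgvector` documentation) is emitted:

```text
NOTICE:  hnsw graph no longer fits into maintenance_work_mem after 100000 tuples
DETAIL:  Building will take significantly more time.
HINT:  Increase maintenance_work_mem to speed up builds.
```

A minimal sketch of the session settings discussed above, using the example values cited in the list:

```sql
-- Increase memory available to CREATE INDEX for this session
-- (keep it under 50 to 60 percent of your compute's RAM)
SET maintenance_work_mem='10 GB';

-- Use more parallel workers for index builds (example for a 7 CU compute)
SET max_parallel_maintenance_workers = 7;
```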
Check indexing progress
You can check indexing progress with the following query:
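```sql
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
```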
The phases for HNSW are:
- initializing
- loading tuples
For related information, see CREATE INDEX Progress Reporting, in the PostgreSQL documentation.
IVFFlat
An IVFFlat index divides vectors into lists and searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance with respect to the speed-recall tradeoff.
Keys to achieving good recall include:
- Creating the index after the table has some data
- Choosing an appropriate number of lists. A good starting point is `rows / 1000` for up to 1M rows and `sqrt(rows)` for over 1M rows.
- Specifying an appropriate number of probes when querying. A higher number is better for recall, and a lower number is better for speed. A good starting point is `sqrt(lists)`.
IVFFlat vector types
IVFFlat indexes are supported with the following vector types:
- `vector` - up to 2,000 dimensions
- `halfvec` - up to 4,000 dimensions (added in 0.7.0)
- `bit` - up to 64,000 dimensions (added in 0.7.0)
The following examples show how to add an index for each distance function:
note
Notice how indexes are defined differently depending on the distance function being used. For example, `vector_l2_ops` is specified for L2 distance, `vector_cosine_ops` for cosine distance, and so on.
- L2 distance

  note
  Use `halfvec_l2_ops` for `halfvec` (and similar with the other distance functions).

- Inner product
- Cosine distance
- Hamming distance

Example `CREATE INDEX` statements for each distance function are shown below.
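These statements follow the standard `pgvector` syntax; choose the `lists` value per the guidance above (100 is illustrative), and note that the Hamming example assumes a `bit` column:

```sql
-- L2 distance
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Inner product
CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);

-- Cosine distance
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Hamming distance (bit column)
CREATE INDEX ON items USING ivfflat (embedding bit_hamming_ops) WITH (lists = 100);
```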
IVFFlat query options
You can specify the number of probes, which is `1` by default:
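```sql
SET ivfflat.probes = 10;
```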
A higher value provides better recall at the cost of speed. You can set the value to the number of lists for exact nearest neighbor search, at which point the planner won’t use the index.
You can also use `SET LOCAL` inside a transaction to set the number of probes for a single query:
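```sql
BEGIN;
SET LOCAL ivfflat.probes = 10;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
COMMIT;
```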
IVFFlat index build time
To optimize index build time, consider configuring the `maintenance_work_mem` and `max_parallel_maintenance_workers` session variables before building an index.
note
Like other index types, it's faster to create an index after loading your initial data.
- `maintenance_work_mem`

  In Postgres, the `maintenance_work_mem` setting determines the maximum memory allocation for tasks such as `CREATE INDEX`. The default `maintenance_work_mem` value in Neon is set according to your Neon compute size:

  | Compute Units (CU) | vCPU | RAM   | maintenance_work_mem |
  |--------------------|------|-------|----------------------|
  | 0.25               | 0.25 | 1 GB  | 64 MB                |
  | 0.50               | 0.50 | 2 GB  | 64 MB                |
  | 1                  | 1    | 4 GB  | 67 MB                |
  | 2                  | 2    | 8 GB  | 134 MB               |
  | 3                  | 3    | 12 GB | 201 MB               |
  | 4                  | 4    | 16 GB | 268 MB               |
  | 5                  | 5    | 20 GB | 335 MB               |
  | 6                  | 6    | 24 GB | 402 MB               |
  | 7                  | 7    | 28 GB | 470 MB               |
  | 8                  | 8    | 32 GB | 537 MB               |
  | 9                  | 9    | 36 GB | 604 MB               |
  | 10                 | 10   | 40 GB | 671 MB               |

  To optimize `pgvector` index build time, you can increase the `maintenance_work_mem` setting for the current session, as shown in the sketch after this list. The recommended setting is your working set size (the size of your tuples for vector index creation). However, your `maintenance_work_mem` setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, a `maintenance_work_mem='10 GB'` setting has been successfully tested on a 7 CU compute, which has 28 GB of RAM, as 10 GiB is less than 50% of the RAM available for that compute size.

- `max_parallel_maintenance_workers`

  You can also speed up index creation by increasing the number of parallel workers. The `max_parallel_maintenance_workers` setting determines the maximum number of parallel workers that can be started by a single utility command such as `CREATE INDEX`. By default, it is set to `2`. For efficient parallel index creation, you can increase this setting. Parallel workers are taken from the pool of processes established by `max_worker_processes` (`10`), limited by `max_parallel_workers` (`8`).

  You can increase the `max_parallel_maintenance_workers` setting for the current session, as shown in the sketch after this list. For example, if you have a 7 CU compute size, you could set `max_parallel_maintenance_workers` to `7` before index creation to make use of all of the vCPUs available.

  For a large number of workers, you may also need to increase the Postgres `max_parallel_workers` setting, which is `8` by default.
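As with HNSW, a minimal sketch using the example values cited in the list:

```sql
-- Increase memory available to CREATE INDEX for this session
SET maintenance_work_mem='10 GB';

-- Use more parallel workers for index builds (example for a 7 CU compute)
SET max_parallel_maintenance_workers = 7;
```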
Check indexing progress
You can check indexing progress with the following query:
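```sql
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
```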
The phases for IVFFlat are:
- initializing
- performing k-means
- assigning tuples
- loading tuples
For related information, see CREATE INDEX Progress Reporting, in the PostgreSQL documentation.
Filtering
There are a few ways to index nearest neighbor queries with a `WHERE` clause:
- Create an index on one or more of the `WHERE` columns for exact search.
- Create a partial index on the vector column for approximate search.
- Use partitioning for approximate search on many different values of the `WHERE` columns.

Example statements for each approach are shown below.
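Sketches of each approach, following the standard `pgvector` patterns; the `category_id` column and the value `123` are illustrative:

```sql
-- Index a WHERE column for exact search
CREATE INDEX ON items (category_id);

-- Partial index on the vector column for approximate search
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
WHERE (category_id = 123);

-- Partition the table for approximate search on many different values
CREATE TABLE items (embedding vector(3), category_id int)
PARTITION BY LIST (category_id);
```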
Half-precision vectors
Half-precision vectors enable the storage of vector embeddings using 16-bit (half-precision) floating-point numbers, which reduces both storage size and memory usage by nearly half compared to 32-bit floats. This efficiency comes with minimal loss in precision, making half-precision vectors beneficial for applications dealing with large datasets or facing memory constraints.
When integrating OpenAI's embeddings, you can take advantage of half-precision vectors by storing embeddings in a compressed format. For instance, OpenAI's high-dimensional embeddings can be stored effectively as half-precision vectors while retaining high accuracy, on the order of 98%. This approach optimizes memory usage while maintaining performance.
You can use the `halfvec` type to store half-precision vectors, as shown here:
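```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3));
```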
Binary vectors
Binary vector embeddings are a form of vector representation where each component is encoded as a binary digit, typically 0 or 1. For example, the word "cat" might be represented as `[0, 1, 0, 1, 1, 0, 0, 1, ...]`, with each position in the vector being binary.
These embeddings are advantageous for their efficiency in both storage and computation. Because they use only one bit per dimension, binary embeddings require less memory compared to traditional embeddings that use floating-point numbers. This makes them useful when there is limited memory or when dealing with large datasets. Additionally, operations with binary values are generally quicker than those involving real numbers, leading to faster computations.
However, the trade-off with binary vector embeddings is a potential loss in accuracy. Unlike denser embeddings, which have real-valued entries and can represent subtleties in the data, binary embeddings simplify the representation. This can result in a loss of information and may not fully capture the intricacies of the data they represent.
Use the `bit` type to store binary vector embeddings:
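```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding bit(3));
INSERT INTO items (embedding) VALUES ('000'), ('111');
```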
Get the nearest neighbors by Hamming distance (added in 0.7.0):
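```sql
SELECT * FROM items ORDER BY embedding <~> '101' LIMIT 5;
```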
Or (before 0.7.0):
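```sql
SELECT * FROM items ORDER BY bit_count(embedding # '101') LIMIT 5;
```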
Jaccard distance (`<%>`) is also supported with binary vector embeddings.
Binary quantization
Binary quantization is a process that transforms dense or sparse embeddings into binary representations by thresholding vector dimensions to either 0 or 1.
Use expression indexing for binary quantization:
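```sql
-- Assumes a 3-dimensional vector column, as in the earlier examples
CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(3)) bit_hamming_ops);
```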
Get the nearest neighbors by Hamming distance:
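```sql
SELECT * FROM items
ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]')
LIMIT 5;
```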
Re-rank by the original vectors for better recall:
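```sql
SELECT * FROM (
  SELECT * FROM items
  ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]')
  LIMIT 20
) ORDER BY embedding <=> '[1,-2,3]' LIMIT 5;
```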
Sparse vectors
Sparse vectors have a large number of dimensions, where only a small proportion are non-zero.
Use the `sparsevec` type to store sparse vectors:
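```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding sparsevec(5));
```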
Insert vectors:
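```sql
INSERT INTO items (embedding) VALUES ('{1:1,3:2,5:3}/5'), ('{1:4,3:5,5:6}/5');
```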
The format is `{index1:value1,index2:value2}/dimensions`, and indices start at 1, like SQL arrays.
Get the nearest neighbors by L2 distance:
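```sql
SELECT * FROM items ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5;
```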
Differences in behavior between pgvector 0.5.1 and 0.7.0
Differences in behavior in the following corner cases were found during our testing of `pgvector` 0.7.0:
Distance between a valid and NULL vector
The distance between a valid and `NULL` vector (`NULL::vector`) with `pgvector` 0.7.0 differs from `pgvector` 0.5.1 when using an HNSW or IVFFlat index, as shown in the following examples:
HNSW
For the following script, which compares the `NULL::vector` to non-null vectors, the resulting output changes:
`pgvector` 0.7.0 output:
`pgvector` 0.5.1 output:
IVFFlat
For the following script, which compares the `NULL::vector` to non-null vectors, the resulting output changes:
`pgvector` 0.7.0 output:
`pgvector` 0.5.1 output:
Improved error messages for invalid literals
If you use an invalid literal value for the `vector` data type, you will now see an error message similar to the following:
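The exact wording depends on the literal; for a truncated literal such as `'[4,5,6'`, the error resembles:

```text
ERROR:  malformed vector literal: "[4,5,6"
```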
Resources
`pgvector` source code: https://github.com/pgvector/pgvector
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.