pgvector vs dedicated vector DBs (Pinecone / Qdrant / Weaviate / Milvus): an in-depth comparison and tech-selection guide

When you start building RAG or semantic search, the first big tech-selection question is "which vector-search foundation should I pick?" Do you get by with pgvector (a PostgreSQL extension), or do you bring in a dedicated vector DB like Pinecone, Qdrant, Weaviate, or Milvus?

This article is a tech-selection guide for making that call with explicit criteria, not gut feel. The bottom line up front: there is no meaningful answer to "which is the most powerful." The only real question is "which fits your constraints (scale, consistency requirements, ops capacity, budget)." This piece organizes each product's characteristics and the decision axes, grounded in official sources, so buyers and architects can make the decision without regret.

Rules for this article: pgvector's specs are based on its official documentation. Each dedicated DB's characteristics are based on each vendor's official sources, but all performance numbers (QPS, latency, cost) are treated as "vendor claims" and are not presented as neutral benchmarks (performance swings wildly with data, dimensionality, recall target, and hardware). Product specs change fast, so always confirm against the latest official sources before adopting.

1. The bottom line first: a decision flowchart

Before the fine-grained comparison, note that the large majority of projects (roughly 80%, as a rule of thumb) are decided by the following few questions.

Q1. Do you already run PostgreSQL?
   ├─ No  → Adding Postgres "just for vectors" is backwards.
   │        For a prototype consider Chroma; for production, Qdrant / Pinecone.
   └─ Yes ↓

Q2. Do embeddings need to stay consistent with business data (users, orders, documents)?
   │   (= when you delete a document, its vector should be deleted at the same time, etc.)
   ├─ Yes → pgvector is extremely favorable (same transaction, same backup).
   └─ Either way ↓

Q3. How large will the vectors be for the foreseeable future?
   ├─ Millions to tens of millions of rows  → pgvector can compete fine. First candidate.
   ├─ ~50 million to 100M+                   → consider pgvector + pgvectorscale (StreamingDiskANN).
   └─ Hundreds of millions to billions, high concurrency → a dedicated DB
       (Milvus distributed / Pinecone serverless) has the edge.

Q4. Is there a strict latency SLA at high-concurrency QPS, "independent of the app DB"?
   ├─ Yes → lean to a dedicated DB (in-memory HNSW / Pinecone DRN / Qdrant).
   └─ No  → unlikely to be a problem with pgvector.

The shortest conclusion: if you already have Postgres, expect up to tens of millions of rows, and care about consistency, pgvector is the first candidate. This is the majority case for B2B SaaS and internal tools, and you can start without adding a single new piece of infrastructure (KISS / YAGNI).
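The flowchart above can be written down as a small function, which is handy when you want to run the same questions over several projects. This is only a sketch: the `Project` fields, the `recommend` name, and especially the two numeric cutoffs are my own assumptions for turning "tens of millions" and "hundreds of millions" into numbers, not thresholds from any vendor.

```python
from dataclasses import dataclass

# Assumed numeric cutoffs for the flowchart's prose bands (illustrative only).
PGVECTORSCALE_FROM = 50_000_000
DEDICATED_FROM = 200_000_000


@dataclass
class Project:
    runs_postgres: bool           # Q1
    needs_consistency: bool       # Q2
    expected_vectors: int         # Q3
    strict_independent_sla: bool  # Q4


def recommend(p: Project) -> tuple[str, list[str]]:
    """Walk Q1 to Q4 and return (first candidate, notes)."""
    notes: list[str] = []
    if not p.runs_postgres:
        notes.append("Chroma for a prototype; Qdrant / Pinecone for production")
        return "dedicated vector DB", notes
    if p.needs_consistency:
        notes.append("consistency favors pgvector: same transaction, same backup")
    if p.expected_vectors >= DEDICATED_FROM:
        notes.append("Milvus distributed / Pinecone serverless have the edge")
        return "dedicated vector DB", notes
    if p.strict_independent_sla:
        notes.append("strict SLA independent of the app DB: lean dedicated")
        return "dedicated vector DB", notes
    if p.expected_vectors >= PGVECTORSCALE_FROM:
        return "pgvector + pgvectorscale", notes
    return "pgvector", notes


# Typical B2B SaaS case: Postgres already in place, 5M vectors, no separate SLA.
choice, notes = recommend(Project(True, True, 5_000_000, False))
print(choice, notes)
```

Treat the return value as a first candidate to validate, not a verdict; the cutoffs in particular should be replaced with numbers from your own measurements.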

2. The seven decision axes for tech selection

"Which fits" comes into view once you evaluate the following axes with your own project's weighting.

① Operational simplicity

If you already run Postgres, pgvector adds zero new infrastructure, monitoring, backups, or on-call surface. That's the biggest lever. A dedicated DB (self-hosted) means committing to operating "one more system." Managed offerings (Pinecone / Zilliz / Qdrant Cloud, etc.) carry the ops for you, but add a vendor and a monthly bill.

② Transactional consistency with business data

This is pgvector's decisive strength. You can update embeddings and business rows in the same ACID transaction, JOIN in SQL, and keep backups consistent. A dedicated DB is a separate system, so you need a sync pipeline between business DB ↔ vector DB, and that's the breeding ground for staleness incidents where "information you thought you deleted still shows up in search." If consistency is a requirement, pgvector has an overwhelming edge.
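To make "same transaction" concrete, here is a minimal sketch of deleting a document and its embedding atomically. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; it is not pgvector, and the table names, the text-encoded vector, and the `fail_midway` switch are all hypothetical. The point it illustrates carries over: when both rows live in one transactional database, a failure between the two deletes rolls back both.

```python
import sqlite3

# Stand-in database (assumption: sqlite3 instead of PostgreSQL + pgvector).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE embeddings (document_id INTEGER, vec TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 'pricing.pdf')")
conn.execute("INSERT INTO embeddings VALUES (1, '[0.1, 0.2, 0.3]')")
conn.commit()


def delete_document(conn, doc_id, fail_midway=False):
    """Delete a document and its embedding in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("DELETE FROM embeddings WHERE document_id = ?", (doc_id,))
            if fail_midway:
                raise RuntimeError("simulated crash between the two deletes")
            conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    except RuntimeError:
        pass  # the rollback has already restored the embedding row


def counts(conn):
    """Return (number of documents, number of embeddings)."""
    docs = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    embs = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
    return docs, embs


delete_document(conn, 1, fail_midway=True)
after_failure = counts(conn)   # both rows survive the failed attempt
delete_document(conn, 1)
after_success = counts(conn)   # both rows are gone together
print(after_failure, after_success)
```

With a separate vector DB, the equivalent of the failed attempt leaves the two systems disagreeing until a sync job catches up, which is exactly the staleness window described above.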

③ Scale ceiling

These figures indicate a direction, not hard thresholds; the real limit depends on the workload.

pgvector / + pgvectorscale: millions to tens of millions with room to spare. With pgvectorscale's disk-resident index and tuning, some cases push to 50 million to over 100 million.
Dedicated DBs (Milvus distributed, Pinecone serverless, Qdrant clusters): handling hundreds of millions to billions via horizontal sharding is their core job. Here, dedicated DBs are clearly ahead.
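A quick back-of-the-envelope calculation helps put these row counts in context. The sketch below uses the storage formula from the pgvector documentation (a `vector` takes 4 bytes per dimension plus 8 bytes, a `halfvec` 2 bytes per dimension plus 8). The 1536-dimension figure and the function name are my assumptions, and the result covers raw vector data only: index size, row overhead, and replicas come on top.

```python
def raw_vector_bytes(rows: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for `rows` vectors: bytes_per_dim * dims + 8 bytes each."""
    return rows * (bytes_per_dim * dims + 8)


ROWS = 10_000_000  # "tens of millions" band from the text
DIMS = 1536        # assumed embedding size; substitute your model's

full = raw_vector_bytes(ROWS, DIMS)                   # vector (float32)
half = raw_vector_bytes(ROWS, DIMS, bytes_per_dim=2)  # halfvec (float16)

print(f"vector:  {full / 1e9:.2f} GB")   # 61.52 GB
print(f"halfvec: {half / 1e9:.2f} GB")   # 30.80 GB
```

Comparing this figure with the RAM on your Postgres instance is a rough first check on whether an in-memory index is realistic or a disk-resident one deserves a look.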

④ Latency SLA

If you must hold tight p95/p99 at high-concurrency QPS, and do so independently of the app DB's load, an in-memory HNSW dedicated engine (Pinecone DRN / Qdrant, etc.) has the edge. pgvector is plenty competitive at mid scale, but you should be conscious that the same instance also handles OLTP.
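Since the SLA is stated in p95/p99 terms, it is worth being precise about what those numbers are. Below is a minimal sketch of the nearest-rank percentile over a list of measured query latencies. The sample values are invented for illustration, and `percentile` is my own helper, not a library function.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds; two slow outliers among ten queries.
latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 250]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))

print(f"mean={mean_ms} p50={p50} p95={p95} p99={p99}")
# mean=44.6 p50=13 p95=250 p99=250
```

The median looks healthy while the tail is dominated by the outliers, which is why contention with OLTP traffic on a shared instance shows up in p95/p99 long before it shows up in averages.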

⑤ Metadata filtering capability

This is where the products differ. How fast can you run "semantic search over only this tenant's published documents"? Qdrant (filterable HNSW) and pgvectorscale (filtered StreamingDiskANN) apply the filter during the ANN search itself (a pre-filter or streaming filter), which avoids the "not enough results once filtered" problem of filtering after the fact. Plain pgvector has the most expressive power thanks to arbitrary SQL WHERE clauses and JOINs, but how well highly selective filters perform depends on index design. If complex, high-cardinality filters are central, weigh Qdrant / pgvectorscale heavily.
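The "not enough results once filtered" problem is easy to reproduce with a toy example. The sketch below uses exact brute-force search over six made-up 2D vectors; it is not how Qdrant or pgvectorscale implement filtering internally, and every name and value in it is hypothetical. It only shows why the order of "filter" and "take the top k" matters.

```python
def top_k(query, items, k):
    """Exact nearest neighbours by squared Euclidean distance."""
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(query, item["vec"]))
    return sorted(items, key=dist)[:k]


def post_filter(query, items, k, tenant):
    """Take the global top k first, then drop rows from other tenants."""
    return [it["id"] for it in top_k(query, items, k) if it["tenant"] == tenant]


def filter_first(query, items, k, tenant):
    """Restrict to the tenant first, then take the top k among what is left."""
    allowed = [it for it in items if it["tenant"] == tenant]
    return [it["id"] for it in top_k(query, allowed, k)]


# Hypothetical data: tenant "b" owns most of the vectors near the query.
items = [
    {"id": 1, "tenant": "a", "vec": (0.0, 0.1)},
    {"id": 2, "tenant": "b", "vec": (0.1, 0.05)},
    {"id": 3, "tenant": "b", "vec": (0.1, 0.1)},
    {"id": 4, "tenant": "b", "vec": (0.2, 0.1)},
    {"id": 5, "tenant": "a", "vec": (0.9, 0.9)},
    {"id": 6, "tenant": "a", "vec": (1.0, 0.8)},
]
query = (0.0, 0.0)

print(post_filter(query, items, 3, "a"))   # [1]
print(filter_first(query, items, 3, "a"))  # [1, 5, 6]
```

Asked for three results, the post-filter variant returns one, because two of the global top three belong to another tenant. Filter-aware indexes exist to get the second behavior without falling back to a full scan.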

⑥ Cost model

pgvector: the marginal cost inside your existing Postgres (no new invoice; you pay in RAM/CPU/disk and DBA effort).
Managed dedicated DBs: a predictable monthly or usage-based fee, but it gets pricey at scale, and egress charges and lock-in come on top.
Self-hosted OSS (Qdrant/Milvus/Weaviate/Chroma): zero license fee; you pay in infrastructure + ops labor.

⑦ Migration cost / lock-in

Pinecone is proprietary (closed-source), and moving hundreds of millions of vectors is non-trivial, so it carries the largest exit cost. By contrast, pgvector (PostgreSQL License), Qdrant/Milvus/Chroma (Apache 2.0), and Weaviate (BSD-3) are OSS, so self-hosting and migration remain possible. Lock-in is an axis you should evaluate on day one.
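One way to apply "your own project's weighting" to the seven axes is a plain weighted score. The sketch below is deliberately generic: the weights, the 1 to 5 scores, and the `candidate_a` / `candidate_b` names are placeholders I made up, not an assessment of any real product. Fill in your own numbers after measuring.

```python
# Hypothetical weights: how much each axis matters to this project.
weights = {
    "ops": 3, "consistency": 3, "scale": 1, "latency": 1,
    "filtering": 2, "cost": 2, "lock_in": 1,
}

# Placeholder scores from 1 (poor fit) to 5 (great fit) for two candidates.
scores = {
    "candidate_a": {"ops": 5, "consistency": 5, "scale": 3, "latency": 3,
                    "filtering": 3, "cost": 4, "lock_in": 5},
    "candidate_b": {"ops": 3, "consistency": 2, "scale": 5, "latency": 5,
                    "filtering": 4, "cost": 3, "lock_in": 2},
}


def weighted_score(axis_scores, weights):
    """Weighted average of the axis scores, on the same 1 to 5 scale."""
    total_weight = sum(weights.values())
    return sum(weights[axis] * axis_scores[axis] for axis in weights) / total_weight


results = {name: weighted_score(s, weights) for name, s in scores.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: {value:.2f}")
print("first candidate:", best)
```

The number itself matters less than the exercise: writing the weights down forces the team to agree on which axes actually dominate before comparing products.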

3. Comparison table: pgvector and the major dedicated vector DBs

Each product's profile on one page. Read it as a difference in "design philosophy and arena," not in performance superiority.

Product	Deployment form	Main indexes	License	Standout strength	Main trade-off	Sweet-spot scale / target
pgvector	Postgres extension (anywhere)	HNSW / IVFFlat	PostgreSQL License	Consolidates into existing Postgres; same transaction as business data; plain SQL	Trails dedicated DBs at very large scale or under a high-concurrency SLA	Millions to tens of millions (up to ~100M with pgvectorscale). Teams already operating Postgres
Pinecone	Managed dedicated (cloud only)	Proprietary serverless (storage/compute separation)	Proprietary (non-OSS)	Zero-ops; scales to billions serverlessly	Largest lock-in; no self-hosting; internals undisclosed	Want zero ops; large scale; SaaS acceptable
Qdrant	OSS self-host + Cloud	Its own HNSW implementation (filterable, written in Rust)	Apache 2.0	Best-in-class filtered ANN; high efficiency; no license traps	Horizontal scaling at large scale is ops work	Filter-heavy RAG/search/recommendation. OSS-minded
Weaviate	OSS self-host + Cloud	HNSW / flat / dynamic	BSD-3-Clause	Integrated object + vector; built-in embedding/hybrid-search modules	Heavier and more opinionated than pure ANN	Teams who want an integrated "AI-native DB" all in one
Milvus	OSS self-host (Lite/standalone/distributed) + Zilliz Cloud	Widest selection (HNSW / IVF / DiskANN / SCANN / GPU indexes)	Apache 2.0	Largest scale; widest index choices; GPU support	Distributed mode is heavy to operate (K8s, many components)	Hundreds of millions to billions, true large scale. Ops-mature teams
Chroma	Embedded/standalone/distributed + Cloud	HNSW-based	Apache 2.0	Fastest to prototype (`import chromadb` and go)	Track record at very large scale is newer	PoC, small-to-mid RAG, local development

The "deployment form," "license," and "indexes" columns are facts sourced from each vendor's official material. "Strength/trade-off" is a summary of design philosophy; confirm the superiority on your workload by measuring.

4. pgvector's "scale ceiling" is not fixed

The assumption that "pgvector is for small scale only" is outdated in 2026. The ceiling is pushed up by the Postgres ecosystem's extensions.

pgvectorscale (Timescale / TigerData)

It adds the following to plain pgvector (OSS under the PostgreSQL License).

StreamingDiskANN index: derived from Microsoft's DiskANN. Because it keeps part of the index on disk, it scales more cost-effectively as the number of vectors grows than HNSW, which works best when the whole index fits in memory.
Label-based / streaming filtered search: filtering that keeps fetching until enough results are gathered, avoiding the "not enough results once filtered" problem.
Statistical Binary Quantization: an improved version of standard BQ.
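For readers new to the term, here is a minimal sketch of standard binary quantization (BQ), the baseline that the last item refers to. It is not Timescale's Statistical Binary Quantization, whose details are not covered here; the function names and sample vectors are my own. Each dimension is reduced to one bit (positive or not), and vectors are compared by Hamming distance.

```python
def binary_quantize(vec):
    """Pack one bit per dimension into an int: 1 if the value is positive."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which the two codes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4-dimensional embeddings.
v1 = [0.3, -0.2, 0.8, -0.5]   # -> 0b1010
v2 = [0.1, -0.4, 0.6, 0.2]    # -> 0b1011, similar sign pattern to v1
v3 = [-0.3, 0.2, -0.8, 0.5]   # -> 0b0101, opposite of v1

q1, q2, q3 = (binary_quantize(v) for v in (v1, v2, v3))

print(hamming(q1, q2))  # 1
print(hamming(q1, q3))  # 4
```

Going from a 32-bit float to a single bit per dimension cuts raw vector size by a factor of 32, at the price of accuracy. That loss is why quantized search is usually paired with re-ranking on the original vectors.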

Read performance as a vendor claim: Timescale publishes figures like "50 million Cohere embeddings, 99% recall, with 28× lower p95 latency, 16× higher throughput, and 75% lower cost vs Pinecone's storage-optimized (s1) (self-hosted on EC2)," but this is a first-party comparison on their own conditions and a single dataset. Take it only in the form "Timescale reports this," and always verify on your own data.
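"Verify on your own data" in practice starts with measuring recall: compare the IDs an approximate index returns against the IDs from an exact (brute-force) search for the same queries. A minimal sketch follows; the function names and the ID lists are hypothetical, and in a real test the two lists would come from your index and from an exact scan.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids[:k])) / len(exact)


def mean_recall(pairs, k):
    """Average recall@k over (approx_ids, exact_ids) pairs, one per query."""
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


# Hypothetical results for two queries, k = 5.
pairs = [
    ([1, 2, 3, 9, 5], [1, 2, 3, 4, 5]),       # missed id 4
    ([10, 11, 12, 13, 14], [10, 11, 12, 13, 14]),  # perfect
]

print(recall_at_k(*pairs[0], k=5))  # 0.8
print(mean_recall(pairs, k=5))      # 0.9
```

Latency and throughput numbers only mean something at a stated recall, so record this figure next to every timing you collect.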

VectorChord and others

VectorChord (TensorChord, the successor to pgvecto.rs) is a Postgres extension that aims for "disk-friendly, low-cost large scale" with IVF + RaBitQ quantization. It makes claims like "index builds 100× faster than pgvector," but treat this too as a vendor claim.

The point: choosing pgvector does not mean being bound to small scale forever. You can start with plain pgvector and move to pgvectorscale / VectorChord when scale arrives, all while staying inside Postgres. That is the reassurance on the scaling side.

5. Honestly: cases where pgvector is "not a fit"

I recommend consolidating into pgvector for many projects, but it's not a silver bullet. In the following situations, I seriously consider a dedicated DB from the start.

Hundreds of millions to billions of vectors × strict low-latency SLA × high concurrency: the arena of Milvus (distributed) / Pinecone (serverless), where horizontal sharding is a first-class feature.
GPU-accelerated search or specialized indexes (PQ/SCANN/DiskANN tuning) are essential: Milvus's breadth of indexes pays off.
You don't run Postgres in the first place: the consolidation benefit (unified ops) doesn't materialize. Adding Postgres just for vectors is backwards, and in that case Qdrant (OSS, strong filtering) or Pinecone (zero-ops) is more natural.
High-cardinality, complex filters are at the heart of performance: weigh Qdrant or pgvectorscale heavily.

The most honest thing in tech selection is to look squarely at where your product is heading. Splitting data across two stores today for billions of vectors that never arrive is a failure; so is pretending not to see large scale that is clearly coming.

6. Summary: a selection cheat sheet

How to frame the question: not "which is the most powerful" but "which fits my constraints." The axes are operational load, consistency, scale ceiling, latency, filtering, cost, and lock-in.
Already on Postgres + up to tens of millions of rows + consistency matters → pgvector (zero added infra, same transaction). Most B2B SaaS is here.
Hundreds of millions to billions + high-concurrency SLA + GPU/special index → Milvus / Pinecone.
Filtered search is the star, OSS-minded → Qdrant. Integrated AI-native DB → Weaviate. Fastest PoC → Chroma.
pgvector's scale can be pushed up with pgvectorscale (StreamingDiskANN). Performance numbers are vendor claims, so measure on your own data.
Lock-in: Pinecone (proprietary) is the largest; the various OSS options have low exit costs. Evaluate on day one.

In a generative-AI voice chatbot project, I made the call to consolidate business data and embeddings into PostgreSQL + pgvector rather than add a dedicated vector DB. That's because I prioritized the operational simplicity of handling semantic search over product documents (PDF/Excel/image/video) in the same DB and same transaction as the business data. On the other hand, had the requirement been billions of vectors or an independent low-latency SLA, I'd have proposed a dedicated DB without hesitation, because selection depends on requirements.

"Is pgvector enough, or should you bring in a dedicated vector DB?" Let's determine that first move together, from your scale, consistency requirements, budget, and team setup. Feel free to reach out even at the requirements-gathering stage. When you move into implementation, start with "Getting started with pgvector"; for serious RAG, see the production RAG design article; for speed/cost optimization, see "The Complete Guide to pgvector Tuning."

References (official / primary sources)

pgvector (GitHub) (PostgreSQL License, HNSW/IVFFlat) / pgvectorscale (Timescale) (StreamingDiskANN, label filtering; performance is a vendor claim)
Pinecone / Qdrant (GitHub, Apache 2.0) / Weaviate (GitHub, BSD-3) / Milvus (GitHub, Apache 2.0) / Chroma (GitHub, Apache 2.0)
Comparative performance/cost claims are based on each vendor's published figures (e.g. Timescale's pgvector vs Pinecone article); note these are not neutral benchmarks.

Related articles

Building Production LLM Apps with Vercel AI SDK v6: Streaming, Tool Calling, Structured Output, and RAG in Real Code

Getting started with pgvector: from installation to your first vector search (Docker, Supabase, AWS RDS/Aurora, Neon, Cloud SQL, Azure)

The Complete Guide to pgvector Tuning: Optimizing HNSW/IVFFlat Recall × Latency, and Quantization (halfvec, Binary Quantization) for Fast, Cheap, and Accurate

The reliability of structured output: why constrained decoding still doesn't give you 'correct output,' and production design

Also worth reading

Vercel storage implementation guide: choosing Blob, Edge Config, and Marketplace (Neon/Upstash) correctly by use

Echo × database production design: choosing pgx / sqlc / GORM, connection pools, transaction boundaries, and context propagation

The complete Echo production-deployment guide: zero-downtime operation with multi-stage Docker, distroless, server timeouts, and graceful shutdown