友田 陽大

pgvector vs dedicated vector DBs (Pinecone / Qdrant / Weaviate / Milvus): an in-depth comparison and tech-selection guide

Which vector-search foundation should you pick? This guide compares pgvector (a PostgreSQL extension) against dedicated vector DBs (Pinecone, Qdrant, Weaviate, Milvus, Chroma) across seven axes: operational load, transactional consistency, scale ceiling, latency, metadata filtering, cost, and lock-in. It also covers a scaling strategy with pgvectorscale (StreamingDiskANN), and is written to support the decisions of buyers and architects.


When you start building RAG or semantic search, the first big tech-selection question is "which vector-search foundation should I pick?" Do you get by with pgvector (a PostgreSQL extension), or do you bring in a dedicated vector DB like Pinecone, Qdrant, Weaviate, or Milvus?

This article is a tech-selection guide for making that call with explicit axes rather than gut feel. The bottom line up front: "which is the most powerful?" is the wrong question, because it has no answer. The only meaningful question is "which fits your constraints (scale, consistency requirements, ops capacity, budget)?" This piece organizes each product's characteristics and the decision axes, grounded in official sources, so that buyers and architects can make this decision without regret.

Rules for this article: pgvector's specs are based on its official documentation. Each dedicated DB's characteristics are based on each vendor's official sources, but all performance numbers (QPS, latency, cost) are treated as "vendor claims" and are not presented as neutral benchmarks (performance swings wildly with data, dimensionality, recall target, and hardware). Product specs change fast, so always confirm against the latest official sources before adopting.


1. The bottom line first: a decision flowchart

Before getting into the fine-grained comparison: roughly 80% of projects can be decided with the following few questions.

Q1. Do you already run PostgreSQL?
   ├─ No  → Adding Postgres "just for vectors" is backwards.
   │        For a prototype consider Chroma; for production, Qdrant / Pinecone.
   └─ Yes ↓

Q2. Do embeddings need to stay consistent with business data (users, orders, documents)?
   │   (= when you delete a document, its vector should be deleted at the same time, etc.)
   ├─ Yes → pgvector is extremely favorable (same transaction, same backup).
   └─ Either way ↓

Q3. How large will the vectors be for the foreseeable future?
   ├─ Millions to tens of millions of rows  → pgvector can compete fine. First candidate.
   ├─ ~50 million to 100M+                   → consider pgvector + pgvectorscale (StreamingDiskANN).
   └─ Hundreds of millions to billions, high concurrency → a dedicated DB
       (Milvus distributed / Pinecone serverless) has the edge.

Q4. Is there a strict latency SLA at high-concurrency QPS, "independent of the app DB"?
   ├─ Yes → lean to a dedicated DB (in-memory HNSW / Pinecone DRN / Qdrant).
   └─ No  → unlikely to be a problem with pgvector.
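The flowchart above can be written down as a small function, which makes it easy to run against a list of candidate projects. This is a minimal sketch, not a definitive rule: the `Project` fields, the `recommend` name, and the two numeric cutoffs are assumptions introduced here (the flowchart gives ranges such as "~50 million to 100M+", not hard thresholds), and Q4 is modeled as a hard switch even though the flowchart only says "lean to."

```python
from dataclasses import dataclass

# Assumed cutoffs for illustration only; the article gives ranges, not hard limits.
PGVECTOR_CEILING = 50_000_000
PGVECTORSCALE_CEILING = 200_000_000


@dataclass
class Project:
    runs_postgres: bool           # Q1
    needs_consistency: bool       # Q2
    expected_vectors: int         # Q3
    strict_independent_sla: bool  # Q4


def recommend(p: Project) -> str:
    # Q1: without an existing Postgres there is no consolidation benefit.
    if not p.runs_postgres:
        return "dedicated DB"
    # Q3: hundreds of millions and up is dedicated-DB territory.
    if p.expected_vectors >= PGVECTORSCALE_CEILING:
        return "dedicated DB"
    # Q4: a strict SLA that must be independent of the app DB.
    if p.strict_independent_sla:
        return "dedicated DB"
    if p.expected_vectors < PGVECTOR_CEILING:
        choice = "pgvector"
    else:
        choice = "pgvector + pgvectorscale"
    # Q2: consistency does not change the branch, it strengthens the case.
    if p.needs_consistency:
        choice += " (consistency requirement makes this a strong fit)"
    return choice


print(recommend(Project(True, True, 5_000_000, False)))
print(recommend(Project(True, False, 80_000_000, False)))
print(recommend(Project(False, False, 1_000_000, False)))
```

Treat the output as a starting hypothesis for the seven-axis evaluation, not as the final answer.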

The shortest conclusion: if you already have Postgres, expect up to tens of millions of rows, and care about consistency, pgvector is the first candidate. That is the majority case for B2B SaaS and internal tools, and you can start without adding a single new piece of infrastructure (KISS / YAGNI).


2. The seven decision axes for tech selection

"Which fits" comes into view once you evaluate the following axes with your own project's weighting.

① Operational simplicity

If you already run Postgres, pgvector adds zero new infrastructure, monitoring, backups, or on-call surface. That's the biggest lever. A dedicated DB (self-hosted) means committing to operating "one more system." Managed offerings (Pinecone / Zilliz / Qdrant Cloud, etc.) carry the ops for you, but add a vendor and a monthly bill.

② Transactional consistency with business data

This is pgvector's decisive strength. You can update embeddings and business rows in the same ACID transaction, JOIN in SQL, and keep backups consistent. A dedicated DB is a separate system, so you need a sync pipeline between business DB ↔ vector DB, and that's the breeding ground for staleness incidents where "information you thought you deleted still shows up in search." If consistency is a requirement, pgvector has an overwhelming edge.
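To make "same transaction" concrete, here is a minimal sketch of deleting a document and its embeddings atomically. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server; the table names, the column names, and the `delete_document` helper are assumptions for illustration. With Postgres and pgvector the `embedding` column would be a `vector` column and the same pattern would apply through your Postgres driver.

```python
import sqlite3

# Stand-in database: sqlite3 instead of Postgres + pgvector (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE chunks ("
    " id INTEGER PRIMARY KEY,"
    " document_id INTEGER NOT NULL,"
    " embedding TEXT NOT NULL)"  # a pgvector `vector` column in Postgres
)
conn.execute("INSERT INTO documents VALUES (1, 'pricing.pdf')")
conn.execute("INSERT INTO chunks VALUES (1, 1, '[0.1, 0.2, 0.3]')")
conn.execute("INSERT INTO chunks VALUES (2, 1, '[0.4, 0.5, 0.6]')")
conn.commit()


def delete_document(conn: sqlite3.Connection, doc_id: int) -> None:
    # One transaction: either both deletes are committed or neither is.
    with conn:
        conn.execute("DELETE FROM chunks WHERE document_id = ?", (doc_id,))
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))


delete_document(conn, 1)
docs_left = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
chunks_left = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
print(docs_left, chunks_left)
```

With a separate vector DB, the second delete becomes a network call to another system, and the window between the two calls is where stale search results come from.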

③ Scale ceiling

This is a direction, not a hard threshold (it's workload-dependent).

  • pgvector / + pgvectorscale: millions to tens of millions with room to spare. With pgvectorscale's disk-resident index and tuning, some cases push to 50 million to over 100 million.
  • Dedicated DBs (Milvus distributed, Pinecone serverless, Qdrant clusters): handling hundreds of millions to billions via horizontal sharding is their core job. Here, dedicated DBs are clearly ahead.
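A quick back-of-the-envelope calculation shows why these ranges fall where they do. The sketch below uses the per-vector storage figure from the pgvector documentation (4 bytes per dimension plus 8 bytes) and counts raw vector data only; index size and row overhead come on top. The 1536-dimension figure and the function names are assumptions chosen for illustration.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    # pgvector stores a single-precision vector in 4 * dimensions + 8 bytes.
    return num_vectors * (4 * dimensions + 8)


def to_gib(num_bytes: int) -> float:
    return num_bytes / 2**30


# Assumed embedding size; substitute your own model's dimensionality.
DIMENSIONS = 1536

for n in (10_000_000, 100_000_000, 1_000_000_000):
    size = to_gib(raw_vector_bytes(n, DIMENSIONS))
    print(f"{n:>13,} vectors x {DIMENSIONS} dims: {size:,.1f} GiB of raw vectors")
```

At ten million vectors the raw data is on the order of tens of GiB, which a single well-sized Postgres instance can hold; at a billion it is in the TiB range, which is where disk-resident indexes or horizontal sharding become the natural tools.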

④ Latency SLA

If you must hold tight p95/p99 latency at high-concurrency QPS, and do so independently of the app DB's load, a dedicated engine that serves from memory (in-memory HNSW, Pinecone DRN, Qdrant, etc.) has the edge. pgvector is plenty competitive at mid scale, but keep in mind that the same instance is also serving OLTP traffic, so vector queries and business queries compete for the same CPU, memory, and I/O.
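Since the SLA is stated in p95/p99 terms, it helps to be precise about how those are computed from measured latencies. The following is a minimal sketch using the nearest-rank definition; the sample values are made up for illustration and the `percentile` helper is not taken from any of the products discussed.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds, including two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 250]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50={p50} ms, p95={p95} ms")
```

The median looks healthy while the tail is dominated by the outliers, which is exactly the pattern to watch for when vector queries share an instance with OLTP load.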

⑤ Metadata filtering capability

This is where the products differ. The question is how fast you can run "semantic search over only this tenant's published documents." Qdrant (filterable HNSW) and pgvectorscale (filtered StreamingDiskANN) apply filters during the ANN search itself (a pre-filter or streaming filter), which avoids the "not enough results once filtered" problem. Plain pgvector has the most expressive power, through arbitrary SQL WHERE clauses and JOINs, but how well highly selective filters perform depends on index design. If complex, high-cardinality filters are central to your workload, weigh Qdrant and pgvectorscale heavily.
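The "not enough results once filtered" problem is easiest to see on a toy example. The sketch below contrasts filtering after taking the k nearest rows with filtering while walking the neighbours in distance order. It is a brute-force stand-in over a pre-sorted list, not how Qdrant or pgvectorscale implement it internally; the data, tenant names, and function names are assumptions for illustration.

```python
from typing import Callable

Row = tuple[str, float, str]  # (doc_id, distance to the query, tenant)


def post_filter(ranked: list[Row], k: int, keep: Callable[[Row], bool]) -> list[Row]:
    """Take the k nearest rows first, then filter: may return fewer than k."""
    return [row for row in ranked[:k] if keep(row)]


def streaming_filter(ranked: list[Row], k: int, keep: Callable[[Row], bool]) -> list[Row]:
    """Walk neighbours in distance order and stop once k rows pass the filter."""
    results: list[Row] = []
    for row in ranked:
        if keep(row):
            results.append(row)
            if len(results) == k:
                break
    return results


rows: list[Row] = [
    ("d1", 0.10, "acme"), ("d2", 0.12, "globex"), ("d3", 0.15, "globex"),
    ("d4", 0.20, "acme"), ("d5", 0.31, "globex"), ("d6", 0.40, "acme"),
]
ranked = sorted(rows, key=lambda row: row[1])


def only_acme(row: Row) -> bool:
    return row[2] == "acme"


post_ids = [row[0] for row in post_filter(ranked, 3, only_acme)]
stream_ids = [row[0] for row in streaming_filter(ranked, 3, only_acme)]
print(post_ids, stream_ids)
```

Asking for three results, the post-filter returns only one because two of the three nearest rows belong to another tenant, while the streaming variant keeps going until it has three matches.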

⑥ Cost model

  • pgvector: the marginal cost inside your existing Postgres (no new invoice; you pay in RAM/CPU/disk and DBA effort).
  • Managed dedicated DBs: a predictable monthly or usage-based fee, but it gets expensive at scale, and egress charges and lock-in come on top.
  • Self-hosted OSS (Qdrant/Milvus/Weaviate/Chroma): zero license fee; you pay in infrastructure + ops labor.

⑦ Migration cost / lock-in

Pinecone is proprietary (closed-source), and moving hundreds of millions of vectors out of it is non-trivial, so it carries the largest exit cost. By contrast, pgvector (PostgreSQL License), Qdrant, Milvus, and Chroma (Apache 2.0), and Weaviate (BSD-3-Clause) are open source, so self-hosting and migration remain possible. Lock-in is an axis you should evaluate on day one.
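Evaluating the seven axes "with your own project's weighting" can be made explicit with a weighted score. The sketch below is one simple way to do it, under loud assumptions: the axis keys, the 1-to-5 scale, the weights, and both candidates' scores are hypothetical placeholders, not ratings of any real product. Fill them in from your own measurements and requirements.

```python
AXES = ("ops", "consistency", "scale", "latency", "filtering", "cost", "lock_in")


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-axis scores; higher means a better fit."""
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("at least one axis needs a positive weight")
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


# Hypothetical weights for a project where consistency matters most.
weights = {"ops": 3, "consistency": 5, "scale": 1, "latency": 1,
           "filtering": 2, "cost": 3, "lock_in": 2}

# Hypothetical 1-5 scores for two unnamed candidates (placeholders, not product ratings).
candidate_a = {"ops": 5, "consistency": 5, "scale": 3, "latency": 3,
               "filtering": 3, "cost": 4, "lock_in": 4}
candidate_b = {"ops": 2, "consistency": 2, "scale": 5, "latency": 5,
               "filtering": 4, "cost": 3, "lock_in": 3}

for name, scores in (("candidate_a", candidate_a), ("candidate_b", candidate_b)):
    print(f"{name}: {weighted_score(scores, weights):.2f}")
```

The value of the exercise is less the final number than being forced to write the weights down, which makes disagreements within the team visible early.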


3. Comparison table: pgvector and the major dedicated vector DBs

Each product's profile on one page. Read it as a difference in "design philosophy and arena," not in performance superiority.

| Product | Deployment form | Main indexes | License | Standout strength | Main trade-off | Sweet-spot scale / target |
| --- | --- | --- | --- | --- | --- | --- |
| pgvector | Postgres extension (anywhere) | HNSW / IVFFlat | PostgreSQL License | Consolidates into existing Postgres; same transaction as business data; plain SQL | Inferior to dedicated DBs at very large scale / high-concurrency SLA | Millions to tens of millions (up to ~100M with pgvectorscale). Postgres ops teams |
| Pinecone | Managed dedicated (cloud only) | Proprietary serverless (storage/compute separation) | Proprietary (non-OSS) | Zero-ops; scales to billions serverlessly | Largest lock-in; no self-hosting; internals undisclosed | Want zero ops; large scale; SaaS acceptable |
| Qdrant | OSS self-host + Cloud | Custom HNSW implementation (filterable, written in Rust) | Apache 2.0 | Best-in-class filtered ANN; high efficiency; no license traps | Horizontal scaling at large scale is ops work | Filter-heavy RAG/search/recommendation. OSS-minded |
| Weaviate | OSS self-host + Cloud | HNSW / flat / dynamic | BSD-3-Clause | Integrated object + vector; built-in embedding/hybrid-search modules | Heavier and more opinionated than pure ANN | Teams who want an integrated "AI-native DB" all in one |
| Milvus | OSS self-host (Lite/standalone/distributed) + Zilliz Cloud | Widest selection (HNSW / IVF / DiskANN / SCANN / GPU) | Apache 2.0 | Largest scale; widest index choices; GPU support | Distributed mode is heavy to operate (K8s, many components) | Hundreds of millions to billions, true large scale. Ops-mature teams |
| Chroma | Embedded/standalone/distributed + Cloud | HNSW-based | Apache 2.0 | Fastest to prototype (import chromadb and go) | Track record at very large scale is newer | PoC, small-to-mid RAG, local development |

The "deployment form," "license," and "indexes" columns are facts sourced from each vendor's official material. "Strength/trade-off" is a summary of design philosophy; confirm the superiority on your workload by measuring.


4. pgvector's "scale ceiling" is not fixed

The assumption that "pgvector is for small scale only" is outdated in 2026. The ceiling is pushed up by the Postgres ecosystem's extensions.

pgvectorscale (Timescale / TigerData)

pgvectorscale, itself open source under the PostgreSQL License, adds the following on top of plain pgvector.

  • StreamingDiskANN index: derived from Microsoft's DiskANN. Because it puts part of the index on disk, it scales more cost-effectively as the number of vectors grows than HNSW's fully in-memory approach.
  • Label-based / streaming filtered search: filtering that keeps fetching until enough results are gathered, avoiding the "not enough results once filtered" problem.
  • Statistical Binary Quantization (SBQ): an improved version of standard binary quantization (BQ).
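To give a feel for the quantization item in the list above, here is a simplified one-bit sketch. Standard binary quantization cuts every dimension at zero; the statistical variant shown here cuts each dimension at its mean over the data, which is my reading of the idea behind SBQ and is an assumption, not pgvectorscale's actual implementation. The vectors and function names are made up for illustration.

```python
def binary_quantize(vec: list[float], thresholds: list[float]) -> list[int]:
    """One bit per dimension: 1 if the value is above that dimension's threshold."""
    return [1 if x > t else 0 for x, t in zip(vec, thresholds)]


def dimension_means(vectors: list[list[float]]) -> list[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hamming(a: list[int], b: list[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


# Toy embeddings whose values are all positive, as is common for some dimensions.
vectors = [[0.9, 0.2, 0.7], [0.7, 0.1, 0.3], [0.2, 0.6, 0.8]]

zero_cut = [0.0] * 3
standard = [binary_quantize(v, zero_cut) for v in vectors]

mean_cut = dimension_means(vectors)
statistical = [binary_quantize(v, mean_cut) for v in vectors]

print(standard)
print(statistical)
print(hamming(statistical[0], statistical[1]), hamming(statistical[0], statistical[2]))
```

With the zero cutoff all three vectors collapse to the same bit pattern and become indistinguishable; with per-dimension cutoffs the bits carry information again, so Hamming distance can rank candidates before the exact distance is computed.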

Read performance as a vendor claim: Timescale publishes figures like "50 million Cohere embeddings, 99% recall, with 28× lower p95 latency, 16× higher throughput, and 75% lower cost vs Pinecone's storage-optimized (s1) (self-hosted on EC2)," but this is a first-party comparison on their own conditions and a single dataset. Take it only in the form "Timescale reports this," and always verify on your own data.
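"Verify on your own data" starts with measuring recall, because latency and throughput figures mean little without the recall they were achieved at. Below is a minimal sketch of recall@k: compare the IDs returned by the approximate index against the IDs from an exact (brute-force) search for the same query. The function name and sample IDs are assumptions for illustration.

```python
def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the true top-k neighbours that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must not be empty")
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return hits / len(truth)


# Hypothetical results for one query: exact search vs. an ANN index.
exact = ["a", "b", "c", "d", "e"]
approx = ["a", "c", "b", "x", "e"]

recall = recall_at_k(approx, exact, 5)
print(f"recall@5 = {recall:.2f}")
```

Average this over a representative sample of real queries, and only then compare latency and cost between candidates at the same recall target.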

VectorChord and others

VectorChord (TensorChord, the successor to pgvecto.rs) is a Postgres extension that aims for "disk-friendly, low-cost large scale" with IVF + RaBitQ quantization. It makes claims like "index builds 100× faster than pgvector," but treat this too as a vendor claim.

The point: choosing pgvector does not mean being bound to small scale forever. You can start with plain pgvector and move to pgvectorscale or VectorChord when scale arrives, all while staying inside Postgres. That is the reassurance on the scaling side.


5. Honestly: cases where pgvector is "not a fit"

I recommend consolidating into pgvector for many projects, but it's not a silver bullet. In the following situations, I consider a dedicated DB head-on.

  • Hundreds of millions to billions of vectors × sub-millisecond SLA × high concurrency: the arena of Milvus (distributed) / Pinecone (serverless), where horizontal sharding is a first-class feature.
  • GPU inference or specialized indexes (PQ/SCANN/DiskANN tuning) are essential: Milvus's breadth of indexes pays off.
  • You don't run Postgres in the first place: the consolidation benefit (unified ops) doesn't materialize. Adding Postgres just for vectors is backwards, and in that case Qdrant (OSS, strong filtering) or Pinecone (zero-ops) is more natural.
  • High-cardinality, complex filters are at the heart of performance: weigh Qdrant or pgvectorscale heavily.

The most honest thing in tech selection is to look squarely at where your product is heading. Splitting data across two stores today for billions of vectors that never arrive is a failure; so is ignoring large scale that is clearly coming.


6. Summary: a selection cheat sheet

  • How to frame the question: not "which is the most powerful" but "which fits my constraints." The axes are operational load, consistency, scale ceiling, latency, filtering, cost, and lock-in.
  • Already on Postgres + up to tens of millions of rows + consistency matters → pgvector (zero added infra, same transaction). Most B2B SaaS is here.
  • Hundreds of millions to billions + high-concurrency SLA + GPU/special indexes → Milvus / Pinecone.
  • Filtered search is the star, OSS-minded → Qdrant. Integrated AI-native DB → Weaviate. Fastest PoC → Chroma.
  • pgvector's scale can be pushed up with pgvectorscale (StreamingDiskANN). Performance numbers are vendor claims, so measure on your own data.
  • Lock-in: Pinecone (proprietary) is the largest; the various OSS options have low exit costs. Evaluate on day one.

In a generative-AI voice chatbot project, I made the call to consolidate business data and embeddings into PostgreSQL + pgvector rather than add a dedicated vector DB. I prioritized the operational simplicity of running semantic search over product documents (PDF/Excel/image/video) in the same DB and the same transaction as the business data. Had the requirement been billions of vectors or an independent low-latency SLA, I would have proposed a dedicated DB without hesitation, because selection depends on requirements.

"Is pgvector enough, or should you bring in a dedicated vector DB?" Let's determine that first move together, from your scale, consistency requirements, budget, and team setup. Feel free to reach out even at the requirements-gathering stage. When you move into implementation, the companion articles cover the next steps: getting started with pgvector for first setup, production RAG design for serious RAG, and the complete tuning guide for speed and cost optimization.

