The Complete Guide to Generating PDFs Programmatically in 2026
If you need to generate PDFs from your application — whether it's invoices, contracts, reports, or certificates — you're facing one of the most surprisingly complex problems in web development. The PDF specification itself is over 1,000 pages long. The ecosystem of tools is fragmented. And every approach comes with trade-offs that aren't obvious until you've already committed.
This guide is a comprehensive, honest look at every major approach to programmatic PDF generation. We'll compare low-level libraries, HTML-to-PDF converters, headless browsers, and dedicated APIs, with real code, ballpark performance numbers, and a clear framework for deciding which approach fits your use case.
Understanding the PDF Format
Before diving into tools, it helps to understand what a PDF actually is. Unlike HTML, which describes content structure and lets the renderer decide layout, PDF describes exact visual placement. Every character, every line, every image has precise coordinates on the page.
This fundamental difference is why converting HTML to PDF is so hard. HTML says "put this paragraph after that heading." PDF says "draw the letter 'H' at coordinates (72, 740) in 12pt Helvetica."
A PDF file consists of four main sections:
- Header — declares the PDF version
- Body — contains objects (pages, fonts, images, text streams)
- Cross-reference table — an index of all objects for random access
- Trailer — points to the cross-reference table and the root object
Even a "simple" PDF with just the text "Hello World" requires roughly 30 lines of raw PDF code, including font dictionaries, page trees, and content streams. This is why nobody writes raw PDF — you always use a library or tool.
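To make the four sections concrete, here is a minimal sketch in Python that assembles a one-page PDF by hand. The function name build_minimal_pdf is made up for this example, the font is the built-in Helvetica (not embedded), and the text is limited to Latin-1 characters. Treat it as an illustration of the file layout, not as a substitute for a library.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF by hand: header, body, xref table, trailer."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 12pt, move to (72, 740), draw the text.
    stream = f"BT /F1 12 Tf 72 740 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")  # header: declares the PDF version
    offsets = []
    for number, body in enumerate(objects, start=1):  # body: numbered objects
        offsets.append(len(out))  # remember where each object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)  # cross-reference table: byte offset of every object
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes

    # trailer: object count, root object, and where the xref table begins
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf("Hello World"))
```

Even this stripped-down version has to track byte offsets for every object, which is exactly the bookkeeping a library does for you.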
Approach 1: Low-Level PDF Libraries
These libraries give you a drawing canvas. You position every element manually. There's no concept of "flowing text" or "CSS" — you're placing things at exact coordinates.
TCPDF / FPDF (PHP)
TCPDF and FPDF are the veterans of PHP PDF generation. FPDF is minimal (under 100KB), while TCPDF adds Unicode support, barcodes, and more.
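// FPDF example
require('fpdf.php'); // or load it through Composer's autoloader
$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial', 'B', 16);
$pdf->Cell(40, 10, 'Hello World');
$pdf->Output('F', 'hello.pdf'); // FPDF 1.8+ order: destination, then filename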
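// TCPDF with Unicode
$pdf = new TCPDF('P', 'mm', 'A4');
$pdf->AddPage();
// DejaVu Sans has no Japanese glyphs; cid0jp is a CJK font bundled with TCPDF
$pdf->SetFont('cid0jp', '', 12);
$pdf->Cell(0, 10, 'こんにちは世界', 0, 1);
// TCPDF order: filename, then destination; an absolute path avoids write errors
$pdf->Output(__DIR__ . '/hello.pdf', 'F');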
Strengths:
- Zero external dependencies — pure PHP
- Extremely fast (< 50ms for simple documents)
- Tiny memory footprint
- Full control over every pixel
Weaknesses:
- Manual positioning is tedious for complex layouts
- No CSS support at all
- Tables require manual column width calculations
- Maintaining templates is painful — every layout change means recalculating coordinates
- Limited font support in FPDF (TCPDF is better)
Best for: Simple, structured documents where the layout rarely changes — receipts, shipping labels, simple invoices.
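To make the column-width point concrete, here is a small sketch of the arithmetic you end up writing yourself. It is in Python for brevity, but the same calculation applies in PHP. The function name column_layout and the weight-based interface are assumptions for this example, not part of FPDF or TCPDF.

```python
def column_layout(page_width, margin, weights):
    """Return (x, width) pairs for table columns.

    Units match page_width (for example millimetres on an A4 page).
    Each column gets a share of the usable width proportional to its weight.
    """
    usable = page_width - 2 * margin
    total = sum(weights)
    columns = []
    x = margin
    for weight in weights:
        width = usable * weight / total
        columns.append((x, width))
        x += width  # the next column starts where this one ends
    return columns


if __name__ == "__main__":
    # A4 is 210mm wide; a wide description column and two narrow numeric ones.
    for x, width in column_layout(210, 10, [3, 1, 1]):
        print(f"x={x:.1f}mm width={width:.1f}mm")
```

Every time a column is added or a margin changes, these numbers have to be recomputed and fed back into each Cell call, which is where the maintenance cost comes from.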
PDFlib (C with PHP/Java/.NET bindings)
PDFlib is the industrial-strength option. It's a C library with bindings for most languages, used by enterprises that need maximum performance and compliance (PDF/A, PDF/X, PDF/UA).
$pdf = new PDFlib();
$pdf->begin_document("", "");
$pdf->begin_page_ext(595, 842, "");
$font = $pdf->load_font("Helvetica", "unicode", "");
$pdf->setfont($font, 12);
$pdf->fit_textline("Hello World", 50, 700, "");
$pdf->end_page_ext("");
$pdf->end_document("");
Strengths:
- Blazing fast — can generate thousands of pages per second
- Full PDF standard compliance (PDF/A-1b through PDF/A-3, PDF/X, PDF/UA)
- Outstanding typography (kerning, ligatures, OpenType features)
- Table and text flow objects for semi-automatic layout
- Excellent documentation
Weaknesses:
- Commercial license ($$$); the free PDFlib Lite edition has been discontinued, and unlicensed builds stamp a demo watermark on every page
- Steep learning curve
- Still coordinate-based, just with better abstractions
Best for: High-volume enterprise scenarios — financial documents, pharmaceutical labeling, publishing.
go-fpdf / gofpdf (Go)
For Go developers, go-fpdf provides FPDF-like functionality:
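pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddPage()
pdf.SetFont("Arial", "B", 16)
pdf.Cell(40, 10, "Hello World")
// OutputFileAndClose returns an error; check it (needs the "log" import)
if err := pdf.OutputFileAndClose("hello.pdf"); err != nil {
	log.Fatal(err)
}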
Strengths:
- Native Go, compiles to a single binary
- Good performance characteristics from Go's runtime
- Familiar API if you've used FPDF
Weaknesses:
- Same manual positioning limitations as FPDF
- Smaller ecosystem of extensions
ReportLab (Python)
ReportLab is the go-to for Python. It has both a low-level canvas API and a higher-level "Platypus" layout engine.
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=A4)
c.setFont("Helvetica", 12)
c.drawString(72, 700, "Hello World")
c.save()
The Platypus engine adds flowing content:
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf")
styles = getSampleStyleSheet()
story = [
Paragraph("Chapter 1: Introduction", styles['Heading1']),
Paragraph("This is a flowing paragraph that wraps automatically...", styles['Normal']),
]
doc.build(story)
Strengths:
- Platypus handles page breaks, flowing text, and tables automatically
- Good documentation and active community
- Supports charts via the graphics module
- Open source (BSD license)
Weaknesses:
- Python's speed can be a bottleneck at high volumes
- Platypus has a learning curve
- Complex layouts still require significant code
Best for: Data-heavy reports, scientific documents, Python-based applications.
PDFKit (Node.js)
PDFKit is a pure JavaScript library for creating PDF documents:
const PDFDocument = require('pdfkit');
const fs = require('fs');
const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('output.pdf'));
doc.fontSize(16).text('Hello World', 100, 100);
doc.end();
Strengths:
- Pure Node.js, no native dependencies
- Streaming API — can pipe directly to HTTP responses
- Good image and vector graphics support
Weaknesses:
- Manual layout like all low-level libraries
- Font embedding can be tricky
When to Choose Low-Level Libraries
Pick a low-level library when:
- Your document structure is simple and predictable
- Performance is critical (thousands of PDFs per minute)
- You don't want any external dependencies or services
- The layout rarely changes
- You have developers who will maintain the template code
Avoid them when your templates need to be editable by non-developers, or when the layout is complex and changes frequently.
Approach 2: HTML-to-PDF Converters
This is where most developers end up. The idea is compelling: write your template in HTML/CSS (which you already know), and convert it to PDF.
wkhtmltopdf
For years, wkhtmltopdf was the default choice. It uses an old WebKit rendering engine to convert HTML to PDF:
# From the command line
wkhtmltopdf https://example.com output.pdf
// In PHP with snappy
use Knp\Snappy\Pdf;
$snappy = new Pdf('/usr/local/bin/wkhtmltopdf');
$snappy->generateFromHtml('<h1>Hello</h1>', 'output.pdf');
The harsh truth about wkhtmltopdf: The project is effectively dead. It was archived in 2023, uses a 2012-era WebKit engine, and doesn't support modern CSS features like Flexbox, Grid, or CSS variables. Many Linux distributions have dropped it from their package managers.
If you're starting a new project, don't use wkhtmltopdf. If you're already using it, plan a migration.
What it doesn't support:
- CSS Flexbox and Grid
- CSS custom properties (variables)
- Modern JavaScript (ES6+)
- SVG <foreignObject>
- @media print queries (partially)
WeasyPrint (Python)
WeasyPrint is an underrated gem. It's a Python library purpose-built for converting HTML/CSS to PDF, with excellent CSS support including CSS Paged Media (the W3C standard for print layouts):
from weasyprint import HTML
HTML('https://example.com').write_pdf('output.pdf')
# Or from a string
HTML(string='<h1>Hello World</h1>').write_pdf('output.pdf')
# With CSS Paged Media for headers/footers
html = """
<style>
@page {
size: A4;
margin: 2cm;
@top-center { content: "My Report"; }
@bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
h1 { break-before: page; }
</style>
<h1>Chapter 1</h1>
<p>Content...</p>
"""
HTML(string=html).write_pdf('report.pdf')
Strengths:
- Excellent CSS support (Flexbox, multi-column, CSS Paged Media)
- CSS @page rules for margins, headers, footers, page numbers
- Proper break-before, break-after, break-inside handling
- Pure Python: no headless browser needed
- Active development and responsive maintainers
- Much lighter than Puppeteer/Playwright
Weaknesses:
- No JavaScript execution (it's a CSS renderer, not a browser)
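- CSS Grid support is recent (added in version 62) and less battle-tested than browser implementations
- Slower than low-level libraries (but faster than headless browsers)
- Requires native system libraries (Pango and its dependencies; releases before version 53 also needed cairo and gdk-pixbuf)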
Best for: Python applications that need good CSS support without the overhead of a full browser. Great for reports, invoices, and any document where CSS Paged Media features are useful.
Prince (Commercial)
Prince is the gold standard for HTML-to-PDF conversion. If you need pixel-perfect output with full CSS support, Prince is unmatched:
prince input.html -o output.pdf
Strengths:
- Best CSS support of any HTML-to-PDF converter
- CSS Grid, Flexbox, multi-column, CSS Paged Media — all work
- Outstanding typography (kerning, ligatures, hyphenation)
- Can generate PDF/A for archival
- Headers, footers, page numbers, cross-references via CSS
- Very fast
Weaknesses:
- Expensive license ($3,800+ for a server license)
- Limited JavaScript: scripts run only when enabled (--javascript), and there is no full browser DOM
- Proprietary
Best for: Publishing, legal documents, government agencies — anywhere the license cost is justified by the output quality.
Approach 3: Headless Browsers
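The nuclear option: launch an actual browser, render your HTML, and "print" it to PDF. This gets you as close as possible to what users see on screen because you're using the same engine that renders web pages, although the page is laid out with print rules, so it can still differ from the screen version.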
Puppeteer (Chrome/Chromium)
Puppeteer controls Chrome via the DevTools Protocol:
const puppeteer = require('puppeteer');
// require() means CommonJS, where top-level await is unavailable
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  await page.pdf({
    path: 'output.pdf',
    format: 'A4',
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
    printBackground: true,
  });
  await browser.close();
})();
Playwright (Chrome, Firefox, WebKit)
Playwright supports multiple browsers and has a cleaner API, but page.pdf() works only in Chromium:
const { chromium } = require('playwright');
// require() means CommonJS, where top-level await is unavailable
(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.pdf({
    path: 'output.pdf',
    format: 'A4',
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  });
  await browser.close();
})();
# Playwright with Python
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.pdf(path="output.pdf", format="A4")
    browser.close()
Gotenberg (Docker-based Service)
Gotenberg wraps Chromium and LibreOffice in a Docker container with a REST API:
curl --request POST \
--url http://localhost:3000/forms/chromium/convert/url \
--form url=https://example.com \
--form marginTop=1 \
--form marginBottom=1 \
-o output.pdf
Strengths of headless browsers:
- Near-complete CSS fidelity: if Chrome's print preview renders it, the PDF will look the same
- JavaScript execution — dynamic charts, client-side rendering
- Can screenshot or PDF any existing web page
- Familiar HTML/CSS/JS development experience
Weaknesses of headless browsers:
- Resource-heavy: Chromium uses 200-500MB RAM per instance
- Slow: 1-5 seconds per PDF (vs. milliseconds for low-level libraries)
- Operational complexity: managing browser processes, handling crashes, memory leaks
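- Limited print control: Chrome's @page CSS support trails Prince and WeasyPrint. Page margin boxes arrived only in recent releases (Chrome 131 and later), and running elements and CSS cross-references are still missing, so many setups fall back to the headerTemplate/footerTemplate options, which are awkward to style
- Concurrent requests are hard: you need browser pools, process isolation, and crash recovery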
- Not ideal for high-volume: generating 10,000 PDFs/hour requires significant infrastructure
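To put a number on "significant infrastructure", here is a back-of-the-envelope sketch using Little's law (average work in flight = arrival rate × time per job). The function name estimate_capacity is hypothetical, and the model assumes each browser instance renders one PDF at a time, with evenly spread load and no headroom for spikes.

```python
import math


def estimate_capacity(pdfs_per_hour, seconds_per_pdf, mb_per_instance):
    """Estimate concurrent browser instances and RAM for a steady load.

    Assumes one render per instance at a time and no burst headroom,
    so treat the result as a lower bound.
    """
    arrivals_per_second = pdfs_per_hour / 3600
    in_flight = arrivals_per_second * seconds_per_pdf  # Little's law: L = lambda * W
    instances = math.ceil(in_flight)
    return instances, instances * mb_per_instance


if __name__ == "__main__":
    instances, ram_mb = estimate_capacity(10_000, seconds_per_pdf=3, mb_per_instance=350)
    print(f"{instances} instances, about {ram_mb} MB of RAM")
```

With the figures quoted in this section (10,000 PDFs/hour, 3 seconds per PDF, 350MB per instance), that works out to 9 concurrent instances and roughly 3GB of RAM before you add any safety margin.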
The Hidden Costs of Headless Browsers
Most tutorials make Puppeteer/Playwright look easy. And for a single PDF, they are. But in production, you'll encounter:
- Memory leaks: Long-running browser instances accumulate memory. You need to restart them periodically.
- Zombie processes: If your application crashes mid-generation, you'll have orphaned Chromium processes eating your RAM.
- Font rendering differences: Chromium renders fonts differently on Linux vs. macOS vs. Windows. Your local PDF might not match your server's PDF.
- Timeout management: pages with slow-loading assets can hang. You need aggressive timeouts and retry logic.
- Docker image size: A Chromium Docker image is 1-2GB. This affects deployment times and costs.
- Security: a browser engine that renders untrusted HTML can be steered into server-side request forgery (SSRF), local file reads, or browser exploits. Sandboxing is critical.
Here's what production-ready Puppeteer code actually looks like:
const puppeteer = require('puppeteer');

// Recycle the browser periodically to contain memory growth
const MAX_REQUESTS_PER_BROWSER = 100;
// Resource types the page may load; everything else (XHR, media, iframes) is aborted
const ALLOWED_RESOURCES = new Set(['image', 'stylesheet', 'font', 'script']);

let browserPromise = null;
let requestCount = 0;

function launchBrowser() {
  return puppeteer.launch({
    headless: true, // the old 'new' value is deprecated in recent Puppeteer releases
    args: [
      // Keep Chromium's sandbox on whenever the HTML is untrusted. Add
      // '--no-sandbox' only if the container cannot run it and is itself isolated.
      '--disable-dev-shm-usage', // work around the small /dev/shm in Docker
      '--disable-gpu',
    ],
  });
}

async function getBrowser() {
  if (!browserPromise || requestCount >= MAX_REQUESTS_PER_BROWSER) {
    const old = browserPromise;
    // Store the promise, not the instance, so concurrent callers share one launch
    browserPromise = launchBrowser();
    requestCount = 0;
    if (old) {
      // Simplification: the old browser is closed right away. A real pool
      // should wait for its in-flight pages to finish first.
      old.then((b) => b.close()).catch(() => {});
    }
  }
  requestCount++;
  return browserPromise;
}

async function generatePdf(html, options = {}) {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    // Fail fast instead of hanging on slow assets
    page.setDefaultTimeout(30000);
    // Block unnecessary resources to speed up rendering
    await page.setRequestInterception(true);
    page.on('request', (req) => {
      if (ALLOWED_RESOURCES.has(req.resourceType())) {
        req.continue();
      } else {
        req.abort();
      }
    });
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf({
      format: 'A4',
      margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
      printBackground: true,
      ...options,
    });
  } finally {
    // Always release the page, even when rendering throws
    await page.close().catch(() => {});
  }
}
That's a lot more code than the 5-line tutorial examples.
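And it still leaves the retry logic from the list above to the caller. One way to layer it on, sketched in Python so it stays independent of any browser library: render_with_retry and the render(html, timeout_s) callable are hypothetical names, and a real version would catch only the renderer's own timeout and crash errors rather than every exception.

```python
import time


def render_with_retry(render, html, attempts=3, timeout_s=30.0,
                      backoff_s=0.5, sleep=time.sleep):
    """Call render(html, timeout_s) until it succeeds or attempts run out.

    The timeout is enforced by the renderer itself; this wrapper only
    decides whether and when to try again.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return render(html, timeout_s)
        except Exception as exc:  # narrow this to the renderer's errors in real code
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: 0.5s, 1s, 2s, ...
                sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"PDF rendering failed after {attempts} attempts") from last_error
```

The sleep parameter exists only so tests can replace the real delay. Keeping the attempt count low matters: retries multiply load on a browser pool that is already struggling.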
Approach 4: Template-Based API Services
API services abstract away the complexity. You design a template (either visually or in HTML/CSS), send your data, and receive a PDF. Services like PDF-API.io, DocSpring, Anvil, and PSPDFKit (since rebranded as Nutrient) each take a slightly different approach: some focus on visual template editors, others on HTML-to-PDF via API, and some offer both.
The typical integration looks like this:
curl -X POST https://api.pdf-api.io/v1/generate \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{ "template_id": "inv_abc123", "data": { "client": "Acme Corp" } }' \
-o invoice.pdf
What makes this approach compelling is that the complexity we discussed earlier — browser pools, font embedding, memory management, zombie processes — becomes someone else's problem.
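From application code, the same call might look like the following Python sketch. The endpoint URL and the template_id/data fields are copied from the curl example above rather than from vendor documentation, and the helper names are made up, so check your provider's API reference before relying on any of it.

```python
import json
import urllib.request

# Copied from the curl example; treat it as a placeholder for your provider's URL.
API_URL = "https://api.pdf-api.io/v1/generate"


def build_generate_request(api_key, template_id, data):
    """Build the POST request without sending it, which keeps it easy to test."""
    body = json.dumps({"template_id": template_id, "data": data}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def generate_pdf(api_key, template_id, data, timeout_s=30):
    """Send the request and return the raw PDF bytes from the response body."""
    request = build_generate_request(api_key, template_id, data)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()
```

Splitting request construction from sending is a deliberate choice: the first half can be unit-tested offline, and the second half is the only part that needs network access and error handling.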
Strengths:
- No infrastructure to manage
- Visual template editors for non-developers
- Consistent output regardless of your server's OS/fonts
- Handles concurrency, scaling, and error recovery for you
- Usually faster than self-hosted headless browsers
Weaknesses:
- Ongoing cost per PDF (though often cheaper than the engineering time to self-host)
- Your data leaves your infrastructure (unless the API offers on-premise)
- Vendor lock-in on the template format
- Dependent on the API's uptime
Best for: Teams that want to focus on their core product rather than PDF infrastructure. Especially valuable when non-developers need to create/edit templates.
Performance Comparison
Here are ballpark numbers for generating a typical one-page invoice on modern hardware:
| Approach | Time per PDF | Memory | Concurrency |
|---|---|---|---|
| FPDF / go-fpdf | 5-20ms | ~5MB | Excellent |
| TCPDF | 20-80ms | ~15MB | Very good |
| ReportLab | 30-100ms | ~20MB | Good |
| PDFKit (Node) | 20-60ms | ~30MB | Good |
| WeasyPrint | 200-500ms | ~50MB | Moderate |
| wkhtmltopdf | 500ms-2s | ~100MB | Poor |
| Puppeteer/Playwright | 1-5s | ~200MB+ | Poor (without pooling) |
| Prince | 50-200ms | ~30MB | Very good |
These numbers vary significantly based on document complexity, font count, image size, and whether you're reusing browser instances.
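Because the numbers vary this much, it is worth measuring on your own documents and hardware. Here is a minimal timing harness in Python; the function name benchmark and the render callable are placeholders for whichever generator you are evaluating, and the p95 figure uses a simple nearest-rank estimate.

```python
import statistics
import time


def benchmark(render, runs=20, warmup=3):
    """Time a zero-argument render callable and summarise the results in ms."""
    for _ in range(warmup):
        render()  # warm caches, font loading, and any lazy imports
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        render()
        samples_ms.append((time.perf_counter() - start) * 1000)
    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; swap in a call to your real PDF generator.
    print(benchmark(lambda: sum(range(100_000))))
```

Report the median and the tail together. A headless browser with a respectable median can still have a painful p95 once cold starts and slow assets show up.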
Decision Framework
Here's how to think about which approach to use:
Start with your constraints:
"We need maximum performance and minimal dependencies" → Low-level library (FPDF, PDFKit, ReportLab)
"Our templates change frequently and non-developers need to edit them" → Template-based API service
"We need pixel-perfect HTML/CSS rendering with JavaScript" → Headless browser (Puppeteer/Playwright)
"We need excellent CSS support without the browser overhead" → WeasyPrint or Prince
"We generate 100K+ PDFs per day" → Low-level library or Prince (or a distributed API service)
"We generate fewer than 1,000 PDFs per day and want simplicity" → API service or WeasyPrint
Consider your team:
- If your team is mostly backend developers comfortable with coordinate-based layouts → low-level library
- If your team is mostly full-stack developers comfortable with HTML/CSS → WeasyPrint, headless browser, or API
- If non-technical people need to create templates → API service with a visual editor
Consider your timeline:
- Need it working today: API service
- Can spend a week: WeasyPrint or headless browser
- Can spend a month: Low-level library with proper templating abstraction
Common Pitfalls
1. Font rendering inconsistencies
This is the #1 source of "it looks different in production" bugs. Always:
- Embed fonts explicitly rather than relying on system fonts
- Test on the same OS your production server runs
- Use font files (TTF/OTF/WOFF2) that you bundle with your application
2. Character encoding issues
Non-Latin characters (CJK, Arabic, Cyrillic) are a minefield:
- Ensure your library supports Unicode
- Some libraries require specific font files for CJK characters
- Test with real multilingual content early, not just ASCII
3. Page break handling
Automatic page breaks that cut through tables, images, or important content:
- Use break-inside: avoid on elements that shouldn't split
- Test with realistic data volumes: a 3-row table in development might be 300 rows in production
- Always test with edge cases: empty data, very long text, oversized images
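One cheap way to test with realistic volumes is to generate the stress data instead of waiting for production to provide it. The Python sketch below builds a long HTML table whose rows are marked break-inside: avoid; the function name stress_table_html and the synthetic row content are assumptions for illustration.

```python
from html import escape

PRINT_CSS = """
table { border-collapse: collapse; width: 100%; }
tr { break-inside: avoid; }  /* never split a single row across two pages */
"""


def stress_table_html(row_count):
    """Build an HTML table with row_count synthetic rows for page-break testing."""
    rows = []
    for i in range(1, row_count + 1):
        # Include characters that need escaping, since real data will too.
        label = escape(f"Item {i} <long description> & notes")
        rows.append(f"<tr><td>{label}</td><td>{i * 1.5:.2f}</td></tr>")
    return (
        f"<style>{PRINT_CSS}</style>"
        "<table><thead><tr><th>Description</th><th>Amount</th></tr></thead>"
        f"<tbody>{''.join(rows)}</tbody></table>"
    )


if __name__ == "__main__":
    with open("stress.html", "w", encoding="utf-8") as handle:
        handle.write(stress_table_html(300))
```

Feed the resulting HTML to whichever converter you use and page through the output, paying attention to the rows that land at page boundaries.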
4. Image resolution
Images that look fine on screen may be blurry when printed:
- Use at least 150 DPI for print, 300 DPI for high-quality documents
- SVG is ideal for logos, charts, and icons — it scales perfectly
- Compress images before embedding to keep file sizes reasonable
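The DPI guidance translates into a simple calculation: printed size in inches multiplied by the target DPI gives the pixel dimensions the source image needs. A small Python sketch, with required_pixels as a hypothetical helper name:

```python
import math

MM_PER_INCH = 25.4


def required_pixels(width_mm, height_mm, dpi=300):
    """Pixel dimensions an image needs to print at the given size and DPI."""
    width_px = math.ceil(width_mm / MM_PER_INCH * dpi)
    height_px = math.ceil(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    # A logo printed 50.8mm x 25.4mm (2in x 1in) at both quality levels.
    print(required_pixels(50.8, 25.4, dpi=150))
    print(required_pixels(50.8, 25.4, dpi=300))
```

A 2in × 1in logo therefore needs 600 × 300 pixels at 300 DPI. A 200-pixel-wide web asset stretched to that size is where the blur comes from.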
5. CSS @media print neglect
If you're using HTML-to-PDF conversion, give the document a dedicated print stylesheet:
@media print {
.no-print { display: none; }
body { font-size: 11pt; }
a[href]::after { content: " (" attr(href) ")"; }
}
Most developers forget that PDF generation is essentially a print operation.
Hybrid Architectures
In practice, many applications use multiple approaches:
- Simple receipts → low-level library (fast, no dependencies)
- Complex reports with charts → headless browser (renders Chart.js/D3)
- Customer-editable templates → API service with visual editor
- Regulatory documents → Prince or PDFlib (compliance)
This isn't over-engineering — it's matching the tool to the job. A receipt that takes 5ms to generate shouldn't go through a headless browser, and a complex financial report shouldn't be built with manual coordinate positioning.
Conclusion
There's no single "best" way to generate PDFs. The right choice depends on your layout complexity, performance requirements, team skills, and budget. Start by understanding your constraints, prototype with two or three options, and measure what matters — generation speed, output quality, developer experience, and maintainability.
The PDF format isn't going away. Despite predictions about HTML replacing print, PDFs remain the universal format for documents that need to look the same everywhere. Whatever approach you choose, invest in a proper abstraction layer so you can swap implementations later without rewriting your entire template system.
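What such an abstraction layer could look like, as a minimal Python sketch: every name here (PdfRenderer, FakeRenderer, InvoiceService) is hypothetical, and the fake backend only stands in for a real one that would wrap WeasyPrint, Playwright, or an API client.

```python
from typing import Protocol


class PdfRenderer(Protocol):
    """The only contract application code depends on."""

    def render(self, template: str, data: dict) -> bytes: ...


class FakeRenderer:
    """Stand-in backend for tests; it returns marked-up text, not a real PDF."""

    def render(self, template: str, data: dict) -> bytes:
        return ("%PDF-fake " + template.format(**data)).encode("utf-8")


class InvoiceService:
    """Application code talks to the protocol, never to a specific library."""

    def __init__(self, renderer: PdfRenderer):
        self._renderer = renderer

    def invoice_pdf(self, client: str, total: str) -> bytes:
        return self._renderer.render(
            "Invoice for {client}: {total}",
            {"client": client, "total": total},
        )


if __name__ == "__main__":
    service = InvoiceService(FakeRenderer())
    print(service.invoice_pdf("Acme Corp", "42.00"))
```

Swapping backends then means writing one new class with a render method. The harder part in practice is the template format, which is why keeping templates as portable as possible (plain HTML/CSS where you can) pays off.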
Building a product that needs PDF generation? PDF-API.io provides a template-based API with a visual editor — so your team can design, iterate, and generate documents without managing PDF infrastructure. Try it free.