The Complete Guide to Generating PDFs Programmatically in 2026

If you need to generate PDFs from your application — whether it's invoices, contracts, reports, or certificates — you're facing one of the most surprisingly complex problems in web development. The PDF specification itself is over 1,000 pages long. The ecosystem of tools is fragmented. And every approach comes with trade-offs that aren't obvious until you've already committed.

This guide is a comprehensive, honest look at every major approach to programmatic PDF generation. We'll compare low-level libraries, HTML-to-PDF converters, headless browsers, and dedicated APIs, with real code, ballpark performance numbers, and a clear framework for deciding which approach fits your use case.

Understanding the PDF Format

Before diving into tools, it helps to understand what a PDF actually is. Unlike HTML, which describes content structure and lets the renderer decide layout, PDF describes exact visual placement. Every character, every line, every image has precise coordinates on the page.

This fundamental difference is why converting HTML to PDF is so hard. HTML says "put this paragraph after that heading." PDF says "draw the letter 'H' at coordinates (72, 740) in 12pt Helvetica."

A PDF file consists of four main sections:

Header — declares the PDF version
Body — contains objects (pages, fonts, images, text streams)
Cross-reference table — an index of all objects for random access
Trailer — points to the cross-reference table and the root object

Even a "simple" PDF with just the text "Hello World" requires roughly 30 lines of raw PDF code, including font dictionaries, page trees, and content streams. This is why nobody writes raw PDF — you always use a library or tool.
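To make the four sections concrete, here is a Python sketch that uses only the standard library. It assembles the "Hello World" page from the coordinate example above by hand. It is for illustration, not production. It assumes ASCII text and the built-in Helvetica font, one of the standard fonts viewers supply, so nothing is embedded. `build_hello_pdf` is our own name.

```python
def build_hello_pdf(text: str = "Hello World") -> bytes:
    """Assemble a one-page PDF by hand: header, body objects, xref table, trailer.

    Assumes ASCII text; the built-in Helvetica font needs no embedding.
    """
    # Escape the characters that are special inside a PDF literal string
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, select font F1 at 12pt, move to (72, 740), show text
    stream = f"BT /F1 12 Tf 72 740 Td ({escaped}) Tj ET".encode("ascii")

    # Body: 1 = catalog (root), 2 = page tree, 3 = page, 4 = content stream, 5 = font
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")  # header: declares the PDF version
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # remember the byte offset where each object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: object count, root object, and the byte offset of the xref table
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode (`"wb"`) gives a one-page PDF. Every library below exists so you never have to track these byte offsets yourself.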

Approach 1: Low-Level PDF Libraries

These libraries give you a drawing canvas. You position every element manually. There's no concept of "flowing text" or "CSS" — you're placing things at exact coordinates.

TCPDF / FPDF (PHP)

TCPDF and FPDF are the veterans of PHP PDF generation. FPDF is minimal (under 100KB), while TCPDF adds Unicode support, barcodes, and more.

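// FPDF example
require 'fpdf.php';

$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial', 'B', 16);
$pdf->Cell(40, 10, 'Hello World');
$pdf->Output('F', 'hello.pdf');

// TCPDF with Unicode (installed via Composer: tecnickcom/tcpdf)
require 'vendor/autoload.php';

$pdf = new TCPDF('P', 'mm', 'A4');
$pdf->AddPage();
// dejavusans covers Latin, Greek and Cyrillic but has no CJK glyphs;
// cid0jp is a bundled Japanese CID font (not embedded: the viewer supplies the glyphs)
$pdf->SetFont('cid0jp', '', 12);
$pdf->Cell(0, 10, 'こんにちは世界', 0, 1);
$pdf->Output(__DIR__ . '/hello.pdf', 'F');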

Strengths:

Zero external dependencies — pure PHP
Extremely fast (< 50ms for simple documents)
Tiny memory footprint
Full control over every pixel

Weaknesses:

Manual positioning is tedious for complex layouts
No CSS support at all
Tables require manual column width calculations
Maintaining templates is painful — every layout change means recalculating coordinates
Limited font support in FPDF (TCPDF is better)

Best for: Simple, structured documents where the layout rarely changes — receipts, shipping labels, simple invoices.
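The column-width bookkeeping mentioned under Weaknesses usually turns into a helper like the one below. This is a hypothetical Python sketch: `column_layout` is our name and is not part of FPDF or TCPDF. It splits the printable width into columns by relative weight and returns each column's x position and width. Those values would then feed calls such as FPDF's `SetX()` and `Cell()`.

```python
def column_layout(total_width, weights, left_margin=0.0):
    """Split total_width into columns proportional to weights.

    Returns one (x, width) pair per column, in the same unit as the inputs
    (for example millimetres, FPDF's default unit).
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    total_weight = sum(weights)
    columns = []
    x = left_margin
    for weight in weights:
        width = total_width * weight / total_weight
        columns.append((x, width))
        x += width  # the next column starts where this one ends
    return columns


# Hypothetical invoice table on A4 (210 mm wide) with 15 mm side margins:
# description, quantity, unit price, total
invoice_columns = column_layout(180, [4, 1, 1, 2], left_margin=15.0)
```

Changing a single weight moves every column to its right, which is exactly the recalculation that makes coordinate-based templates tedious to maintain.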

PDFlib (C with PHP/Java/.NET bindings)

PDFlib is the industrial-strength option. It's a C library with bindings for most languages, used by enterprises that need maximum performance and compliance (PDF/A, PDF/X, PDF/UA).

$pdf = new PDFlib();
// Write to hello.pdf; an empty filename builds the PDF in memory instead (fetch it with get_buffer())
$pdf->begin_document("hello.pdf", "");
$pdf->begin_page_ext(595, 842, "");  // A4 in points
$font = $pdf->load_font("Helvetica", "unicode", "");
$pdf->setfont($font, 12);
$pdf->fit_textline("Hello World", 50, 700, "");
$pdf->end_page_ext("");
$pdf->end_document("");
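The 595 by 842 passed to `begin_page_ext()` is A4 expressed in PDF points (1/72 inch). PDF also measures y upward from the bottom-left corner, so y = 700 lies near the top of the page. Here is a small Python sketch of both conversions; the helper names are ours.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # PDF's default user-space unit is 1/72 inch


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


A4_WIDTH_PT = mm_to_pt(210)   # about 595.28
A4_HEIGHT_PT = mm_to_pt(297)  # about 841.89


def from_top_left(x_mm: float, y_mm_from_top: float, page_height_pt: float = A4_HEIGHT_PT):
    """Convert a position measured from the top-left corner (how most designers
    think about a page) to PDF coordinates, whose origin is the bottom-left corner."""
    return mm_to_pt(x_mm), page_height_pt - mm_to_pt(y_mm_from_top)
```

Libraries such as FPDF and gofpdf take a unit argument (`"mm"` in the examples above) and do this conversion internally. PDFlib works in points by default.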

Strengths:

Blazing fast — can generate thousands of pages per second
Full PDF standard compliance (PDF/A-1b through PDF/A-3, PDF/X, PDF/UA)
Outstanding typography (kerning, ligatures, OpenType features)
Table and text flow objects for semi-automatic layout
Excellent documentation

Weaknesses:

Commercial license ($$$) — the free "lite" version has limitations
Steep learning curve
Still coordinate-based, just with better abstractions

Best for: High-volume enterprise scenarios — financial documents, pharmaceutical labeling, publishing.

go-fpdf / gofpdf (Go)

For Go developers, go-fpdf provides FPDF-like functionality:

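// imports: "log", "github.com/jung-kurt/gofpdf" (the github.com/go-pdf/fpdf fork offers the same calls as fpdf.New(...))
pdf := gofpdf.New("P", "mm", "A4", "")
pdf.AddPage()
pdf.SetFont("Arial", "B", 16)
pdf.Cell(40, 10, "Hello World")
if err := pdf.OutputFileAndClose("hello.pdf"); err != nil {
    log.Fatal(err)
}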

Strengths:

Native Go, compiles to a single binary
Good performance characteristics from Go's runtime
Familiar API if you've used FPDF

Weaknesses:

Same manual positioning limitations as FPDF
Smaller ecosystem of extensions

ReportLab (Python)

ReportLab is the go-to for Python. It has both a low-level canvas API and a higher-level "Platypus" layout engine.

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=A4)
c.setFont("Helvetica", 12)
c.drawString(72, 700, "Hello World")
c.save()

The Platypus engine adds flowing content:

from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf")
styles = getSampleStyleSheet()
story = [
    Paragraph("Chapter 1: Introduction", styles['Heading1']),
    Paragraph("This is a flowing paragraph that wraps automatically...", styles['Normal']),
]
doc.build(story)
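For contrast, the standard-library sketch below shows a simplified version of the work a flowing-layout engine takes off your hands: wrapping paragraphs into lines and cutting lines into pages. It counts width in characters, which is a simplifying assumption. ReportLab measures real glyph widths (see `reportlab.pdfbase.pdfmetrics.stringWidth`) and also handles styles, tables, and keep-together rules. `paginate` is our own name.

```python
import textwrap


def paginate(paragraphs, chars_per_line=80, lines_per_page=50):
    """Wrap paragraphs into lines, then cut the lines into pages.

    Simplification: width is counted in characters, as if the font were
    monospaced. Real layout engines measure each glyph in the chosen font.
    """
    pages, current = [], []
    for paragraph in paragraphs:
        # textwrap breaks on whitespace; an empty paragraph still takes one line
        for line in textwrap.wrap(paragraph, width=chars_per_line) or [""]:
            if len(current) == lines_per_page:  # page is full: start a new one
                pages.append(current)
                current = []
            current.append(line)
    if current:
        pages.append(current)
    return pages
```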

Strengths:

Platypus handles page breaks, flowing text, and tables automatically
Good documentation and active community
Supports charts via the graphics module
Open source (BSD license)

Weaknesses:

Python's speed can be a bottleneck at high volumes
Platypus has a learning curve
Complex layouts still require significant code

Best for: Data-heavy reports, scientific documents, Python-based applications.

PDFKit (Node.js)

PDFKit is a pure JavaScript library for creating PDF documents:

const PDFDocument = require('pdfkit');
const fs = require('fs');

const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('output.pdf'));
doc.fontSize(16).text('Hello World', 100, 100);
doc.end();

Strengths:

Pure Node.js, no native dependencies
Streaming API — can pipe directly to HTTP responses
Good image and vector graphics support

Weaknesses:

Manual layout like all low-level libraries
Font embedding can be tricky
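PDFKit's `doc.pipe(res)` pattern sends bytes to the client while the document is still being written. Python's WSGI interface has a direct counterpart. The sketch below is minimal: `pdf_response_app` is our name, and `make_chunks` is an assumed stand-in for whatever library produces the PDF incrementally.

```python
from typing import Callable, Iterable


def pdf_response_app(make_chunks: Callable[[], Iterable[bytes]]):
    """Build a WSGI app that streams PDF bytes to the client as they are produced."""

    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "application/pdf"),
            ("Content-Disposition", 'inline; filename="output.pdf"'),
        ])
        # Return the iterator itself: the server sends each chunk as it is yielded
        # instead of buffering the whole document in memory first.
        return iter(make_chunks())

    return app
```

For a quick local test, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` can serve it.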

When to Choose Low-Level Libraries

Pick a low-level library when:

Your document structure is simple and predictable
Performance is critical (thousands of PDFs per minute)
You don't want any external dependencies or services
The layout rarely changes
You have developers who will maintain the template code

Avoid them when your templates need to be editable by non-developers, or when the layout is complex and changes frequently.

Approach 2: HTML-to-PDF Converters

This is where most developers end up. The idea is compelling: write your template in HTML/CSS (which you already know), and convert it to PDF.
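In practice the HTML is produced by filling a template with data before it reaches the converter. Every value that comes from users or a database should be escaped so it renders as text rather than markup. Below is a standard-library sketch of that step; the template and `render_invoice` are illustrative, not taken from any particular tool. The resulting string can be passed to any converter in this section, for example WeasyPrint's `HTML(string=...)`.

```python
from html import escape
from string import Template

# Illustrative template; $-placeholders are filled by string.Template
INVOICE_TEMPLATE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><style>
  @page { size: A4; margin: 2cm; }
  td.amount { text-align: right; }
</style></head>
<body>
  <h1>Invoice $number</h1>
  <p>Bill to: $client</p>
  <table>$rows</table>
</body></html>""")


def render_invoice(number, client, items):
    """Fill the template, escaping every data value so it cannot inject markup.

    items is a list of (description, amount) pairs. Amounts are floats here for
    brevity; real invoices should use decimal.Decimal.
    """
    rows = "".join(
        f'<tr><td>{escape(description)}</td><td class="amount">{amount:.2f}</td></tr>'
        for description, amount in items
    )
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)), client=escape(client), rows=rows
    )
```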

wkhtmltopdf

For years, wkhtmltopdf was the default choice. It uses an old WebKit rendering engine to convert HTML to PDF:

wkhtmltopdf https://example.com output.pdf

// In PHP with snappy (Composer package knplabs/knp-snappy)
use Knp\Snappy\Pdf;

$snappy = new Pdf('/usr/local/bin/wkhtmltopdf');
$snappy->generateFromHtml('<h1>Hello</h1>', 'output.pdf');

The harsh truth about wkhtmltopdf: The project is effectively dead. It was archived in 2023, uses a 2012-era WebKit engine, and doesn't support modern CSS features like Flexbox, Grid, or CSS variables. Many Linux distributions have dropped it from their package managers.

If you're starting a new project, don't use wkhtmltopdf. If you're already using it, plan a migration.

What it doesn't support:

CSS Flexbox and Grid
CSS custom properties (variables)
Modern JavaScript (ES6+)
SVG <foreignObject>
@media print rules by default (it renders with screen media unless you pass --print-media-type)

WeasyPrint (Python)

WeasyPrint is an underrated gem. It's a Python library purpose-built for converting HTML/CSS to PDF, with excellent CSS support including CSS Paged Media (the W3C standard for print layouts):

from weasyprint import HTML

HTML('https://example.com').write_pdf('output.pdf')

# Or from a string
HTML(string='<h1>Hello World</h1>').write_pdf('output.pdf')

# With CSS Paged Media for headers/footers
html = """
<style>
    @page {
        size: A4;
        margin: 2cm;
        @top-center { content: "My Report"; }
        @bottom-right { content: "Page " counter(page) " of " counter(pages); }
    }
    h1 { break-before: page; }
</style>
<h1>Chapter 1</h1>
<p>Content...</p>
"""
HTML(string=html).write_pdf('report.pdf')

Strengths:

Excellent CSS support (Flexbox, multi-column, CSS Paged Media)
CSS @page rules for margins, headers, footers, page numbers
Proper break-before, break-after, break-inside handling
No headless browser needed (WeasyPrint has its own layout engine)
Active development and responsive maintainers
Much lighter than Puppeteer/Playwright

Weaknesses:

No JavaScript execution (it's a CSS renderer, not a browser)
CSS Grid support is recent (added in v62) and less complete than in browsers
Slower than low-level libraries (but faster than headless browsers)
Requires the Pango system library (since v53 it no longer depends on cairo or GDK-PixBuf)

Best for: Python applications that need good CSS support without the overhead of a full browser. Great for reports, invoices, and any document where CSS Paged Media features are useful.

Prince (Commercial)

Prince is the gold standard for HTML-to-PDF conversion. If you need pixel-perfect output with full CSS support, Prince is unmatched:

prince input.html -o output.pdf

Strengths:

Best CSS support of any HTML-to-PDF converter
CSS Grid, Flexbox, multi-column, CSS Paged Media — all work
Outstanding typography (kerning, ligatures, hyphenation)
Can generate PDF/A for archival
Headers, footers, page numbers, cross-references via CSS
Very fast

Weaknesses:

Expensive license ($3,800+ for a server license)
Limited JavaScript support (off by default, enabled with --javascript; it is not a full browser engine)
Proprietary

Best for: Publishing, legal documents, government agencies — anywhere the license cost is justified by the output quality.

Approach 3: Headless Browsers

The nuclear option: launch an actual browser, render your HTML, and "print" it to PDF. This guarantees pixel-perfect fidelity because you're using the same engine that renders web pages.

Puppeteer (Chrome/Chromium)

Puppeteer controls Chrome via the DevTools Protocol:

const puppeteer = require('puppeteer');

// CommonJS has no top-level await, so run the steps inside an async function
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    await page.pdf({
        path: 'output.pdf',
        format: 'A4',
        margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
        printBackground: true,
    });
    await browser.close();
})();

Playwright (Chrome, Firefox, WebKit)

Playwright supports multiple browsers and has a cleaner API, though page.pdf() only works with Chromium:

const { chromium } = require('playwright');

// CommonJS has no top-level await, so run the steps inside an async function
(async () => {
    const browser = await chromium.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await page.pdf({
        path: 'output.pdf',
        format: 'A4',
        margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
    });
    await browser.close();
})();

# Playwright with Python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.pdf(path="output.pdf", format="A4")
    browser.close()

Gotenberg (Docker-based Service)

Gotenberg wraps Chromium and LibreOffice in a Docker container with a REST API:

curl --request POST \
    --url http://localhost:3000/forms/chromium/convert/url \
    --form url=https://example.com \
    --form marginTop=1 \
    --form marginBottom=1 \
    -o output.pdf

Strengths of headless browsers:

100% CSS fidelity — if it renders in Chrome, the PDF will look the same
JavaScript execution — dynamic charts, client-side rendering
Can screenshot or PDF any existing web page
Familiar HTML/CSS/JS development experience

Weaknesses of headless browsers:

Resource-heavy: Chromium uses 200-500MB RAM per instance
Slow: 1-5 seconds per PDF (vs. milliseconds for low-level libraries)
Operational complexity: managing browser processes, handling crashes, memory leaks
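Limited print control: Chrome's CSS Paged Media support lags behind dedicated engines. @page margin boxes (CSS running headers/footers such as @top-center) only arrived in Chrome 131; on older versions you have to fall back to the displayHeaderFooter option with headerTemplate/footerTemplate, which renders in a separate context with its own styling limitations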
Concurrent requests are hard: you need browser pools, process isolation, and crash recovery
Not ideal for high-volume: generating 10,000 PDFs/hour requires significant infrastructure

The Hidden Costs of Headless Browsers

Most tutorials make Puppeteer/Playwright look easy. And for a single PDF, they are. But in production, you'll encounter:

Memory leaks: Long-running browser instances accumulate memory. You need to restart them periodically.
Zombie processes: If your application crashes mid-generation, you'll have orphaned Chromium processes eating your RAM.
Font rendering differences: Chromium renders fonts differently on Linux vs. macOS vs. Windows. Your local PDF might not match your server's PDF.
Timeout management: pages with slow-loading assets can hang. You need aggressive timeouts and retry logic.
Docker image size: A Chromium Docker image is 1-2GB. This affects deployment times and costs.
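Security: a browser engine that renders untrusted HTML can be tricked into fetching internal URLs (server-side request forgery) or reading local files through file:// links, on top of the usual script-injection risks. Escaping input and sandboxing are critical.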
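Timeout management in particular is easy to get wrong. Below is a minimal asyncio sketch of a hard timeout with retries and exponential backoff. `with_timeout_and_retry` is our own helper. `make_attempt` is assumed to be a zero-argument function that returns a fresh coroutine each time, for example one that opens a page and awaits Playwright's async `page.pdf()`; that wiring is left out.

```python
import asyncio
import random


async def with_timeout_and_retry(make_attempt, *, timeout_s=30.0, attempts=3,
                                 base_delay_s=0.5, retry_on=(asyncio.TimeoutError,)):
    """Run make_attempt() under a hard timeout, retrying with exponential backoff.

    make_attempt must build a *new* coroutine on each call, because a coroutine
    object can only be awaited once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            # wait_for cancels the attempt if it runs past the timeout
            return await asyncio.wait_for(make_attempt(), timeout=timeout_s)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with jitter so retries from many workers don't align
            await asyncio.sleep(base_delay_s * 2 ** attempt * (1 + random.random()))
```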

Here's what production-ready Puppeteer code actually looks like:

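const puppeteer = require('puppeteer');

// Browser reuse: one shared instance, replaced after N requests to contain memory leaks
let browserPromise = null;
let requestCount = 0;
const MAX_REQUESTS_PER_BROWSER = 100;

// Resource types templates may load; every other request is aborted
const ALLOWED_RESOURCE_TYPES = new Set(['image', 'stylesheet', 'font', 'script']);

function launchBrowser() {
    return puppeteer.launch({
        headless: true, // Puppeteer v22+ maps `true` to the new headless mode
        args: [
            // Only disable Chrome's sandbox when the container itself isolates the process
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-dev-shm-usage', // /dev/shm is tiny in many Docker setups
            '--disable-gpu',
        ],
    });
}

async function getBrowser() {
    if (browserPromise && requestCount >= MAX_REQUESTS_PER_BROWSER) {
        const retired = browserPromise;
        browserPromise = null;
        // Caveat: this also kills pages that other requests still have open on the
        // retired browser; a full pool would wait for them before closing.
        retired.then((b) => b.close()).catch(() => {});
    }
    if (!browserPromise) {
        // Store the promise, not the browser, so concurrent callers share one launch
        const launching = launchBrowser();
        // If the launch fails, forget it so the next call retries
        launching.catch(() => {
            if (browserPromise === launching) browserPromise = null;
        });
        browserPromise = launching;
        requestCount = 0;
    }
    requestCount++;
    return browserPromise;
}

async function generatePdf(html, options = {}) {
    const browser = await getBrowser();
    const page = await browser.newPage();

    try {
        // Fail fast instead of hanging on slow assets
        page.setDefaultTimeout(30000);

        // Let through only the resource types templates need. Add 'xhr' and 'fetch'
        // if charts load their data at render time.
        await page.setRequestInterception(true);
        page.on('request', (req) => {
            if (ALLOWED_RESOURCE_TYPES.has(req.resourceType())) {
                req.continue();
            } else {
                req.abort();
            }
        });

        await page.setContent(html, { waitUntil: 'networkidle0' });

        const pdfBuffer = await page.pdf({
            format: 'A4',
            margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
            printBackground: true,
            ...options,
        });

        return pdfBuffer;
    } finally {
        await page.close().catch(() => {});
    }
}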

That's a lot more code than the 5-line tutorial examples.
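Recycling a browser after a fixed number of requests closes it even if another request still has a page open on it, and nothing caps how many pages run at once. Below is a Python sketch of a pool that handles both, using only asyncio. `RecyclingPool` is our own name. `create` and `close` are assumed async callables that would wrap, for example, Playwright's `chromium.launch()` and `browser.close()`.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class _Entry:
    resource: Any
    uses: int = 0    # acquisitions handed out so far
    active: int = 0  # acquisitions not yet released


class RecyclingPool:
    """Share one expensive resource (such as a browser) between tasks.

    At most max_concurrency tasks hold it at once. After max_uses acquisitions a
    fresh resource is created, and the retired one is closed only once its last
    user has released it.
    """

    def __init__(self, create: Callable[[], Awaitable[Any]],
                 close: Callable[[Any], Awaitable[None]], *,
                 max_uses: int = 100, max_concurrency: int = 4):
        self._create = create
        self._close = close
        self._max_uses = max_uses
        self._slots = asyncio.Semaphore(max_concurrency)
        self._lock = asyncio.Lock()
        self._current = None

    @asynccontextmanager
    async def acquire(self):
        async with self._slots:  # wait for a free concurrency slot
            async with self._lock:
                if self._current is None or self._current.uses >= self._max_uses:
                    retired = self._current
                    self._current = _Entry(await self._create())
                    if retired is not None and retired.active == 0:
                        await self._close(retired.resource)  # nobody is using it
                entry = self._current
                entry.uses += 1
                entry.active += 1
            try:
                yield entry.resource
            finally:
                async with self._lock:
                    entry.active -= 1
                    close_now = entry is not self._current and entry.active == 0
                if close_now:
                    await self._close(entry.resource)  # last user of a retired resource
```

Each request would then run `async with pool.acquire() as browser:` and open and close its own page inside that block.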

Approach 4: Template-Based API Services

API services abstract away the complexity. You design a template (either visually or in HTML/CSS), send your data, and receive a PDF. Services like PDF-API.io, DocSpring, Anvil, and PSPDFKit each take a slightly different approach — some focus on visual template editors, others on HTML-to-PDF via API, and some offer both.

The typical integration looks like this:

curl -X POST https://api.pdf-api.io/v1/generate \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d '{ "template_id": "inv_abc123", "data": { "client": "Acme Corp" } }' \
    -o invoice.pdf
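Here is the same call from Python's standard library. The endpoint, header, and payload shape are copied from the curl example. The function names are ours, and a real integration should follow the provider's own documentation.

```python
import json
import urllib.request

API_URL = "https://api.pdf-api.io/v1/generate"  # endpoint from the curl example


def build_generate_request(api_key, template_id, data):
    """Build the POST request: bearer token plus a JSON body of template_id and data."""
    body = json.dumps({"template_id": template_id, "data": data}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


def generate_pdf(api_key, template_id, data, timeout_s=30):
    """Send the request and return the PDF bytes (HTTP errors raise urllib.error.HTTPError)."""
    request = build_generate_request(api_key, template_id, data)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        pdf = response.read()
    # Guard against saving a non-PDF body (for example an error page from a proxy)
    if not pdf.startswith(b"%PDF-"):
        raise ValueError("response body is not a PDF")
    return pdf
```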

What makes this approach compelling is that the complexity we discussed earlier — browser pools, font embedding, memory management, zombie processes — becomes someone else's problem.

Strengths:

No infrastructure to manage
Visual template editors for non-developers
Consistent output regardless of your server's OS/fonts
Handles concurrency, scaling, and error recovery for you
Usually faster than self-hosted headless browsers

Weaknesses:

Ongoing cost per PDF (though often cheaper than the engineering time to self-host)
Your data leaves your infrastructure (unless the API offers on-premise)
Vendor lock-in on the template format
Dependent on the API's uptime

Best for: Teams that want to focus on their core product rather than PDF infrastructure. Especially valuable when non-developers need to create/edit templates.

Performance Comparison

Here are ballpark numbers for generating a typical one-page invoice on modern hardware:

Approach	Time per PDF	Memory	Concurrency
FPDF / go-fpdf	5-20ms	~5MB	Excellent
TCPDF	20-80ms	~15MB	Very good
ReportLab	30-100ms	~20MB	Good
PDFKit (Node)	20-60ms	~30MB	Good
WeasyPrint	200-500ms	~50MB	Moderate
wkhtmltopdf	500ms-2s	~100MB	Poor
Puppeteer/Playwright	1-5s	~200MB+	Poor (without pooling)
Prince	50-200ms	~30MB	Very good

These numbers vary significantly based on document complexity, font count, image size, and whether you're reusing browser instances.
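Because of that variance, it is worth measuring your own templates rather than relying on a table like this one. Below is a minimal standard-library harness. It assumes `generate` is any zero-argument callable that produces one PDF, for example a closure around one of the libraries above. `benchmark` is our own name.

```python
import statistics
import time
import tracemalloc


def benchmark(generate, runs=20, warmup=2):
    """Measure latency and peak Python heap for a zero-argument PDF generator.

    tracemalloc only sees allocations made by the Python interpreter, so memory
    used by a Chromium process or a C library is not included.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")
    for _ in range(warmup):  # warm caches, font loading, lazy imports
        generate()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings_ms.append((time.perf_counter() - start) * 1000)

    # Trace memory in a separate run so tracing overhead doesn't skew the timings
    tracemalloc.start()
    try:
        generate()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {
        "median_ms": statistics.median(timings_ms),
        # The last of 19 cut points splitting the data into 20 groups is the 95th percentile
        "p95_ms": statistics.quantiles(timings_ms, n=20, method="inclusive")[-1],
        "peak_heap_mb": peak_bytes / 1_000_000,
    }
```

For subprocess-based tools such as wkhtmltopdf or a headless Chromium, watch the child process's resident memory with OS tools instead, since tracemalloc cannot see it.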

Decision Framework

Here's how to think about which approach to use:

Start with your constraints:

"We need maximum performance and minimal dependencies" → Low-level library (FPDF, PDFKit, ReportLab)

"Our templates change frequently and non-developers need to edit them" → Template-based API service

"We need pixel-perfect HTML/CSS rendering with JavaScript" → Headless browser (Puppeteer/Playwright)

"We need excellent CSS support without the browser overhead" → WeasyPrint or Prince

"We generate 100K+ PDFs per day" → Low-level library or Prince (or a distributed API service)

"We generate fewer than 1,000 PDFs per day and want simplicity" → API service or WeasyPrint

Consider your team:

If your team is mostly backend developers comfortable with coordinate-based layouts → low-level library
If your team is mostly full-stack developers comfortable with HTML/CSS → WeasyPrint, headless browser, or API
If non-technical people need to create templates → API service with a visual editor

Consider your timeline:

Need it working today: API service
Can spend a week: WeasyPrint or headless browser
Can spend a month: Low-level library with proper templating abstraction

Common Pitfalls

1. Font rendering inconsistencies

This is the #1 source of "it looks different in production" bugs. Always:

Embed fonts explicitly rather than relying on system fonts
Test on the same OS your production server runs
Bundle the font files (TTF/OTF, or WOFF2 for browser-based renderers) with your application

2. Character encoding issues

Non-Latin characters (CJK, Arabic, Cyrillic) are a minefield:

Ensure your library supports Unicode
Some libraries require specific font files for CJK characters
Test with real multilingual content early, not just ASCII
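With single-byte core fonts, FPDF's standard fonts (such as the Arial used earlier) expect windows-1252 text. Anything outside that code page needs a Unicode-capable setup, like the TCPDF example. A quick pre-flight check can flag such input before it turns into empty boxes in the PDF. This is a sketch; `unsupported_chars` is our own name.

```python
from typing import List


def unsupported_chars(text: str, encoding: str = "cp1252") -> List[str]:
    """Return the distinct characters in text that the given single-byte
    encoding cannot represent, in order of first appearance."""
    bad = []
    for ch in dict.fromkeys(text):  # dict keeps insertion order and drops duplicates
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad
```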

3. Page break handling

Automatic page breaks that cut through tables, images, or important content:

Use break-inside: avoid on elements that shouldn't split
Test with realistic data volumes — a 3-row table in development might be 300 rows in production
Always test with edge cases: empty data, very long text, oversized images
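When you paginate a table yourself, as low-level libraries require, those cases translate into a small, testable function. They include realistic volumes, empty data, and a header that must repeat on every page. This is a sketch that assumes a fixed number of rows per page; real row heights vary with wrapped text. `paginate_rows` is our own name.

```python
def paginate_rows(rows, rows_per_page, header=None):
    """Split a list of table rows into pages, repeating the header row on every page.

    Always returns at least one page, so an empty table still renders its header.
    """
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    head = [header] if header is not None else []
    pages = [head + rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]
    return pages or [head]
```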

4. Image resolution

Images that look fine on screen may be blurry when printed:

Use at least 150 DPI for print, 300 DPI for high-quality documents
SVG is ideal for logos, charts, and icons — it scales perfectly
Compress images before embedding to keep file sizes reasonable
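The DPI guideline is simple arithmetic: pixels = print width in inches × DPI. Below is a sketch in both directions; the helper names are ours.

```python
import math

MM_PER_INCH = 25.4


def pixels_needed(print_width_mm: float, dpi: int = 300) -> int:
    """Minimum pixel width for an image printed print_width_mm wide at the given DPI."""
    return math.ceil(print_width_mm / MM_PER_INCH * dpi)


def effective_dpi(pixel_width: int, print_width_mm: float) -> float:
    """DPI an image actually gets when pixel_width pixels are stretched over print_width_mm."""
    return pixel_width / (print_width_mm / MM_PER_INCH)
```

For example, a logo printed 50 mm wide needs 591 pixels at 300 DPI. A 200-pixel-wide logo placed at that size prints at only about 102 DPI, below the 150 DPI floor suggested above.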

5. CSS `@media print` neglect

If you're using HTML-to-PDF conversion:

@media print {
    .no-print { display: none; }
    body { font-size: 11pt; }
    a[href]::after { content: " (" attr(href) ")"; }
}

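Most developers forget that PDF generation is essentially a print operation. Chromium's page.pdf() (via Puppeteer or Playwright) and WeasyPrint apply print styles by default, while wkhtmltopdf only does so when you pass --print-media-type.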

Hybrid Architectures

In practice, many applications use multiple approaches:

Simple receipts → low-level library (fast, no dependencies)
Complex reports with charts → headless browser (renders Chart.js/D3)
Customer-editable templates → API service with visual editor
Regulatory documents → Prince or PDFlib (compliance)

This isn't over-engineering — it's matching the tool to the job. A receipt that takes 5ms to generate shouldn't go through a headless browser, and a complex financial report shouldn't be built with manual coordinate positioning.

Conclusion

There's no single "best" way to generate PDFs. The right choice depends on your layout complexity, performance requirements, team skills, and budget. Start by understanding your constraints, prototype with two or three options, and measure what matters — generation speed, output quality, developer experience, and maintainability.

The PDF format isn't going away. Despite predictions about HTML replacing print, PDFs remain the universal format for documents that need to look the same everywhere. Whatever approach you choose, invest in a proper abstraction layer so you can swap implementations later without rewriting your entire template system.

Building a product that needs PDF generation? PDF-API.io provides a template-based API with a visual editor — so your team can design, iterate, and generate documents without managing PDF infrastructure. Try it free.
