HTML to PDF: Why It's Harder Than You Think (And How to Get It Right)

February 1, 2026 · 13 min read

"Just convert the HTML to PDF." It's one of the most commonly underestimated tasks in software engineering. On the surface, it seems straightforward — we already have tools that render HTML beautifully in browsers, so surely we can just "print" that to a PDF?

The reality is far more nuanced. The web was designed for screens — scrolling, resizing, interactive. PDF was designed for paper — fixed dimensions, precise layout, archival. Bridging these two worlds introduces a class of problems that most developers don't anticipate until they're knee-deep in production bugs.

This article walks through the problems you are most likely to hit, and how to get each one right.

The Fundamental Rendering Gap

Browsers and PDF engines solve fundamentally different problems:

Concept | Browser (HTML/CSS) | PDF
Canvas | Infinite vertical scroll | Fixed page dimensions
Layout | Content flows and reflows | Content is placed at exact coordinates
Text | System fonts, web fonts, font shaping | Embedded fonts, subset or full
Responsive | Adapts to viewport width | Fixed width (e.g., 210mm for A4)
Interactivity | Clickable, scrollable, animated | Static (links work, but limited)
Color | sRGB by default | CMYK for print, sRGB for screen

These differences are why a page that looks perfect in Chrome can still produce a broken PDF.

The Box Model Difference

In a browser, content that is too long simply makes the page scroll. In a PDF, content that is too long does one of three things, depending on the engine:

  1. Overflows the page (content is clipped or goes off the edge)
  2. Breaks to the next page (if the engine supports page breaks)
  3. Shrinks to fit (some engines do this, often ruining the layout)

This is the single biggest source of bugs in HTML-to-PDF conversion. Designs that look perfect with 5 rows of data break horribly with 50 rows.
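A back-of-the-envelope model makes the three outcomes concrete. The Python sketch below assumes a hypothetical table (a 20mm header plus 8mm per row) on an A4 page with 20mm top and 25mm bottom margins, which leaves 252mm of usable height. The numbers are illustrative, not measured from any engine.

```python
import math

# A4 height minus a 20mm top and 25mm bottom margin (assumed values)
PAGE_CONTENT_HEIGHT_MM = 297 - 20 - 25


def table_height_mm(rows: int, header_mm: float = 20.0, row_mm: float = 8.0) -> float:
    """Hypothetical table: a fixed header plus a constant height per row."""
    return header_mm + rows * row_mm


def overflow_outcomes(content_mm: float, page_mm: float = PAGE_CONTENT_HEIGHT_MM) -> dict:
    """What each overflow strategy does with content of a given height."""
    return {
        # 1. Clip: everything past the first page is lost
        "clipped_mm": max(0.0, content_mm - page_mm),
        # 2. Break: content continues on as many pages as it needs
        "pages": math.ceil(content_mm / page_mm),
        # 3. Shrink: everything is scaled down to fit on one page
        "scale": min(1.0, page_mm / content_mm),
    }


for rows in (5, 50):
    outcome = overflow_outcomes(table_height_mm(rows))
    print(rows, outcome, f"10pt text renders at {10 * outcome['scale']:.1f}pt")
```

With 5 rows the table is 60mm tall and fits under every strategy. With 50 rows it is 420mm tall: clipping loses 168mm, breaking needs 2 pages, and shrinking scales everything to 60%, so 10pt body text ends up at 6pt.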

CSS Paged Media: The Standard Nobody Knows

The W3C created a specification called CSS Paged Media specifically for controlling print and PDF layouts. It has existed in draft form for more than a decade, yet most web developers have never heard of it.

The @page Rule

The @page rule controls page dimensions and margins:

@page {
    size: A4;                    /* or: letter, legal, 210mm 297mm */
    margin: 20mm 15mm 25mm 15mm; /* top right bottom left */
}

@page :first {
    margin-top: 40mm;           /* Extra margin on the first page */
}

@page :left {
    margin-left: 25mm;          /* Wider left margin on left pages */
    margin-right: 15mm;
}

@page :right {
    margin-left: 15mm;
    margin-right: 25mm;         /* Wider right margin on right pages */
}
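Margins eat directly into the space your layout has to work with, so it is worth doing the arithmetic once. A small Python sketch of that calculation, with page sizes in millimetres; the helper names are our own, not part of any library. One subtlety: in a left-to-right document the first page is a right page, so it picks up both the :first and the :right margins.

```python
# Portrait page sizes in millimetres (letter = 8.5 x 11in, legal = 8.5 x 14in)
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "letter": (215.9, 279.4),
    "legal": (215.9, 355.6),
}


def expand_margin(*values: float) -> tuple:
    """Expand the CSS margin shorthand into (top, right, bottom, left)."""
    if len(values) == 1:
        top = right = bottom = left = values[0]
    elif len(values) == 2:
        top = bottom = values[0]
        right = left = values[1]
    elif len(values) == 3:
        top, right, bottom = values
        left = right
    elif len(values) == 4:
        top, right, bottom, left = values
    else:
        raise ValueError("margin takes 1 to 4 values")
    return top, right, bottom, left


def content_box(size: str, *margin: float) -> tuple:
    """Width and height (mm) left for content after subtracting the page margins."""
    width, height = PAGE_SIZES_MM[size]
    top, right, bottom, left = expand_margin(*margin)
    return width - left - right, height - top - bottom


# Base rule: margin 20mm 15mm 25mm 15mm
print(content_box("A4", 20, 15, 25, 15))  # (180.0, 252.0)
# Left pages: @page :left sets left 25mm, right 15mm
print(content_box("A4", 20, 15, 25, 25))  # (170.0, 252.0)
# First page: :first (top 40mm) combined with :right (left 15mm, right 25mm)
print(content_box("A4", 40, 25, 25, 15))  # (170.0, 232.0)
```

The mirrored margins keep the text block 170mm wide on every page; they only shift it toward the inside edge.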

Page Margin Boxes (Running Headers/Footers)

CSS Paged Media defines 16 margin boxes around each page where you can place running headers, footers, and page numbers:

@page {
    @top-center {
        content: "Confidential Report";
        font-size: 9pt;
        color: #666;
    }

    @bottom-left {
        content: "Generated on 2026-02-01";
        font-size: 8pt;
    }

    @bottom-right {
        content: "Page " counter(page) " of " counter(pages);
        font-size: 8pt;
    }
}
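The 16 boxes are the four page corners plus three boxes along each edge. The Python sketch below simply enumerates their names, and models why counter(pages) needs the whole document laid out first: an engine cannot write "Page 2 of 5" until it knows there are five pages. resolve_footers is a hypothetical helper, not an engine API.

```python
# Margin box names from CSS Paged Media: 4 corners + 3 boxes on each of the 4 edges
CORNERS = ["top-left-corner", "top-right-corner", "bottom-right-corner", "bottom-left-corner"]
EDGE_BOXES = {
    "top": ["left", "center", "right"],
    "bottom": ["left", "center", "right"],
    "left": ["top", "middle", "bottom"],
    "right": ["top", "middle", "bottom"],
}


def margin_box_names() -> list:
    names = [f"@{corner}" for corner in CORNERS]
    for edge, positions in EDGE_BOXES.items():
        names.extend(f"@{edge}-{position}" for position in positions)
    return names


def resolve_footers(page_count: int) -> list:
    """Hypothetical second pass: counter(pages) is only known after pagination finishes."""
    return [f"Page {page} of {page_count}" for page in range(1, page_count + 1)]


print(len(margin_box_names()))  # 16
print(resolve_footers(3))       # ['Page 1 of 3', 'Page 2 of 3', 'Page 3 of 3']
```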

Page Break Control

h1 {
    break-before: page;         /* Always start a new page before h1 */
}

h2 {
    break-after: avoid;         /* Don't break right after h2 */
}

table, figure {
    break-inside: avoid;        /* Don't split tables or figures */
}

tr {
    break-inside: avoid;        /* Don't split table rows */
}

The Support Problem

Here's the frustrating part — CSS Paged Media support varies wildly:

Feature Chrome/Puppeteer WeasyPrint Prince wkhtmltopdf
@page size
@page margin
@page :first
@page :left/:right
Margin boxes (@top-center)
counter(page) in margin boxes
break-inside: avoid ⚠️ Partial ⚠️ Partial
break-before: page
Named page types

Chrome (and by extension Puppeteer/Playwright) has historically had surprisingly poor @page support. You can set the size and margins, but before Chromium 131 (released in late 2024), running headers/footers and page counters defined in CSS margin boxes were ignored. On those versions you are limited to Chromium's built-in header/footer templates, which offer minimal customization.
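With Puppeteer or Playwright, Chromium's built-in templates are HTML strings passed to the PDF call, and Chromium fills in elements carrying the classes pageNumber, totalPages, date, title, and url. The Python sketch below only builds the keyword arguments for Playwright's page.pdf(); the footer markup and margin values are assumptions. Two things to watch: the templates do not see your page's stylesheets, so style them inline (including an explicit font size), and the top and bottom margins must leave room for the header and footer to appear.

```python
from html import escape


def chromium_pdf_options(title: str) -> dict:
    """Keyword arguments for Playwright's page.pdf() using Chromium's header/footer templates."""
    # Templates don't inherit the page's CSS, so all styling is inline
    style = "font-size: 8pt; margin: 0 15mm;"
    header = f'<div style="{style}">{escape(title)}</div>'
    # Chromium injects values into elements with these class names
    footer = (
        f'<div style="{style} text-align: right;">'
        'Page <span class="pageNumber"></span> of <span class="totalPages"></span>'
        "</div>"
    )
    return {
        "format": "A4",
        "print_background": True,
        "display_header_footer": True,
        "header_template": header,
        "footer_template": footer,
        # The header and footer are drawn inside these margins
        "margin": {"top": "20mm", "right": "15mm", "bottom": "25mm", "left": "15mm"},
    }
```

With Playwright installed and a Chromium page open, this would be used as page.pdf(path="report.pdf", **chromium_pdf_options("Confidential Report")).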

This is a major reason why many teams that start with Puppeteer end up migrating to WeasyPrint or Prince when they need proper print features.

Font Embedding: The Silent Saboteur

Fonts are responsible for more "it looks different on the server" bugs than any other factor. Here's why.

How Browsers Handle Fonts

When you specify font-family: 'Helvetica', 'Arial', sans-serif, the browser:

  1. Checks if the first font is available on the system
  2. Falls back to the next font if it isn't
  3. Falls back to the system's default sans-serif font

This means the same HTML renders with different fonts on different machines:

  • macOS has Helvetica → uses Helvetica
  • Windows doesn't have Helvetica → falls back to Arial
  • Linux might not have either → uses Liberation Sans or DejaVu Sans
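A minimal Python sketch of that lookup. The installed-font sets and each system's default sans-serif are hypothetical stand-ins: real systems vary by OS version and installed packages, Linux distributions often alias Helvetica and Arial to metric-compatible fonts through fontconfig, and browsers also fall back per character when a glyph is missing. The sketch ignores all of that.

```python
from typing import List, Optional

# Hypothetical installed fonts per system (real machines differ)
INSTALLED_FONTS = {
    "macOS": {"Helvetica", "Arial", "Times"},
    "Windows": {"Arial", "Times New Roman", "Segoe UI"},
    "Linux": {"DejaVu Sans", "Liberation Serif"},
}
# What each system maps the generic 'sans-serif' keyword to (also an assumption)
DEFAULT_SANS = {"macOS": "Helvetica", "Windows": "Arial", "Linux": "DejaVu Sans"}


def resolve_font_family(stack: List[str], system: str) -> Optional[str]:
    """Walk a font-family list the way a browser does: the first available entry wins."""
    for family in stack:
        if family == "sans-serif":
            return DEFAULT_SANS[system]
        if family in INSTALLED_FONTS[system]:
            return family
    return None  # no match and no generic family: the browser's default font is used


stack = ["Helvetica", "Arial", "sans-serif"]
for system in INSTALLED_FONTS:
    print(system, "->", resolve_font_family(stack, system))
```

Same CSS, three different fonts, and therefore three different line breaks.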

Consequences for PDF Generation

If your development machine is macOS and your server is Linux, expect font differences. Characters may have different widths, causing text to wrap differently, which can cascade into completely different page layouts.

The fix: Always use web fonts, and embed them explicitly.

@font-face {
    font-family: 'Inter';
    src: url('/fonts/Inter-Regular.woff2') format('woff2');
    font-weight: 400;
    font-style: normal;
}

@font-face {
    font-family: 'Inter';
    src: url('/fonts/Inter-Bold.woff2') format('woff2');
    font-weight: 700;
    font-style: normal;
}

body {
    font-family: 'Inter', sans-serif;
}

Ship the font files with your application. Don't rely on Google Fonts CDN — your server might not have internet access, or the CDN might be slow, causing timeouts. This is one area where API-based PDF services have an advantage: they manage font libraries on their infrastructure, so you don't have to worry about installing fonts on every server.
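If you want a template to carry its own fonts, with no dependency on the network or on a fonts directory the renderer must be able to reach, one option is to inline them into the CSS as base64 data: URIs. A Python sketch; the file paths in the comments are hypothetical, and you should confirm that your renderer accepts data: URIs for fonts before relying on this. The cost is a larger HTML string, which rarely matters for a document rendered once.

```python
import base64


def inline_font_face(family: str, woff2_bytes: bytes, weight: int = 400, style: str = "normal") -> str:
    """Return an @font-face rule with the font embedded as a base64 data: URI."""
    encoded = base64.b64encode(woff2_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"    font-family: '{family}';\n"
        f"    src: url(data:font/woff2;base64,{encoded}) format('woff2');\n"
        f"    font-weight: {weight};\n"
        f"    font-style: {style};\n"
        "}\n"
    )


# Usage with hypothetical paths shipped alongside the application:
# from pathlib import Path
# css = inline_font_face("Inter", Path("fonts/Inter-Regular.woff2").read_bytes())
# css += inline_font_face("Inter", Path("fonts/Inter-Bold.woff2").read_bytes(), weight=700)
```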

Font Subsetting

A complete font file with broad character coverage can be 500KB-2MB, and CJK fonts are often much larger. If every PDF embeds the full font, file sizes balloon quickly.

Font subsetting embeds only the characters actually used in the document. Prince and WeasyPrint do this automatically. For Puppeteer, the full font is embedded unless the font itself is pre-subsetted.

If PDF file size matters (and it often does for email attachments), consider:

  • Pre-subsetting fonts to only include Latin characters (if that's all you need)
  • Serving fonts as WOFF2 (this speeds up font loading, but engines convert fonts before embedding them, so it won't shrink the PDF itself)
  • Testing file sizes with realistic content
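Subsetting starts from the set of characters a document can contain. The Python sketch below collects the characters in a rendered HTML string (skipping style and script contents) and turns them into the --unicodes argument of pyftsubset, the subsetting command-line tool that ships with fontTools. The extra parameter is our own addition, to cover text the engine generates such as page numbers. Subsetting to one document's characters only makes sense if you subset per document; for a shared font file, subset to a fixed range instead (Basic Latin plus Latin-1 is U+0020-007E,U+00A0-00FF).

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, skipping <style> and <script> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def used_codepoints(html: str, extra: str = "0123456789") -> set:
    """Code points a subset font must keep; `extra` covers engine-generated text."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks) + extra
    # Drop control whitespace but always keep the ordinary space
    return {ord(ch) for ch in text if ch not in "\t\n\r\f"} | {0x20}


def unicode_ranges(codepoints: set) -> str:
    """Compress code points into pyftsubset's 'U+0041-0043,U+0061' notation."""
    ordered = sorted(codepoints)
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for cp in ordered[1:] + [None]:  # None flushes the final range
        if cp is not None and cp == prev + 1:
            prev = cp
            continue
        ranges.append(f"U+{start:04X}" if start == prev else f"U+{start:04X}-{prev:04X}")
        if cp is not None:
            start = prev = cp
    return ",".join(ranges)


def pyftsubset_command(font_path: str, out_path: str, html: str) -> list:
    """Argument list for fontTools' pyftsubset, e.g. for subprocess.run()."""
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicode_ranges(used_codepoints(html))}",
        f"--output-file={out_path}",
    ]
```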

Page Break Strategies

Getting page breaks right is the difference between a professional document and a broken one. Here are the patterns that work:

Pattern 1: Unbreakable Blocks

Wrap content that should stay together in a container with break-inside: avoid:

<div class="card" style="break-inside: avoid;">
    <h3>Employee: Jane Smith</h3>
    <p>Department: Engineering</p>
    <table>
        <tr><td>Metric</td><td>Score</td></tr>
        <tr><td>Performance</td><td>95%</td></tr>
    </table>
</div>

Caveat: If the block is taller than a full page, break-inside: avoid is ignored. Always test with worst-case content sizes.
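The caveat follows from how pagination has to work. Below is a simplified Python model of greedy pagination; the block heights are made up, and real engines also weigh widows, orphans, and other break rules. A block marked avoid_break moves to the next page when it doesn't fit, unless it couldn't fit on any page, in which case it is split anyway.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    height: float       # rendered height in mm
    avoid_break: bool   # models break-inside: avoid


def paginate(blocks: list, page_height: float) -> list:
    """Greedy pagination; returns the names of the fragments placed on each page."""
    pages = [[]]
    used = 0.0
    for block in blocks:
        fits_here = block.height <= page_height - used
        fits_on_empty_page = block.height <= page_height
        if block.avoid_break and not fits_here and fits_on_empty_page and used > 0:
            pages.append([])  # honour break-inside: avoid by starting a new page
            used = 0.0
        remaining = block.height
        while remaining > 0:
            if used >= page_height:  # current page is full
                pages.append([])
                used = 0.0
            piece = min(remaining, page_height - used)
            pages[-1].append(block.name)
            used += piece
            remaining -= piece
    return pages


blocks = [
    Block("A", 60, True),
    Block("B", 50, True),   # doesn't fit after A, so it moves to page 2
    Block("C", 250, True),  # taller than a page: avoid cannot be honoured
    Block("D", 10, False),
]
print(paginate(blocks, page_height=100))
```

Block C ends up spread over three pages despite its avoid_break flag, which is exactly the situation worth catching with worst-case test data.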

Pattern 2: Repeating Table Headers

One of the most-requested features is table headers that repeat on every page:

<table>
    <thead>
        <tr><th>Date</th><th>Description</th><th>Amount</th></tr>
    </thead>
    <tbody>
        <!-- 200 rows here -->
    </tbody>
</table>

thead {
    display: table-header-group;   /* Repeat on every page */
}

tfoot {
    display: table-footer-group;   /* Repeat at the bottom of every page */
}

This works in WeasyPrint and Prince. Puppeteer/Chrome support is inconsistent — the headers repeat, but styling sometimes breaks.
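Repeated headers also cost space on every page, which is easy to forget when estimating page counts. A simplified Python model that counts in rows, assuming every row (header rows included) has the same height, which real tables rarely do:

```python
import math
from typing import List, Optional


def paginate_table(body_rows: List[str], rows_per_page: int,
                   header: List[str], footer: Optional[List[str]] = None) -> List[List[str]]:
    """Split body rows across pages, repeating the header (thead) and footer (tfoot) rows."""
    footer = footer or []
    capacity = rows_per_page - len(header) - len(footer)  # body rows that fit per page
    if capacity < 1:
        raise ValueError("no room for body rows once header and footer are repeated")
    return [
        header + body_rows[start:start + capacity] + footer
        for start in range(0, len(body_rows), capacity)
    ]


rows = [f"row {i}" for i in range(1, 201)]  # a 200-row table
pages = paginate_table(rows, rows_per_page=15, header=["Date | Description | Amount"])
single = math.ceil((len(rows) + 1) / 15)  # header printed once, on the first page only
print(len(pages), single)
```

Here the repeated header pushes the table from 14 pages to 15.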

Pattern 3: Forced Page Breaks for Sections

.chapter {
    break-before: page;
}

.chapter:first-child {
    break-before: avoid;  /* Don't insert a blank first page */
}

Pattern 4: Orphan and Widow Control

p {
    orphans: 3;  /* At least 3 lines at the bottom of a page */
    widows: 3;   /* At least 3 lines at the top of a new page */
}

This prevents single lines from appearing alone at the top or bottom of a page. It's a typographic best practice that most developers don't know exists.
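The rule is easy to model. A Python sketch of the decision an engine makes when a paragraph reaches the bottom of a page, counting lines only; the function name is ours, and 2 is the CSS initial value for both properties.

```python
def lines_on_current_page(total_lines: int, available: int, orphans: int = 2, widows: int = 2) -> int:
    """How many lines of a paragraph stay on the current page; 0 means move it all to the next page."""
    if total_lines <= available:
        return total_lines  # the whole paragraph fits, no break needed
    split = min(available, total_lines - widows)  # leave at least `widows` lines for the next page
    if split < orphans:
        return 0  # too few lines would be stranded at the bottom: move the whole paragraph
    return split


# A 10-line paragraph with 9 lines of space left, using orphans: 3 and widows: 3
print(lines_on_current_page(10, 9, orphans=3, widows=3))  # 7 (3 lines carry over)
```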

Building a Production-Ready Pipeline

Here's what a robust HTML-to-PDF pipeline looks like, regardless of which tool you choose:

Step 1: Separate Data from Presentation

Never generate HTML by string concatenation:

// ❌ Don't do this
$html = '<h1>Invoice #' . $invoice->number . '</h1>';
$html .= '<p>Total: $' . $invoice->total . '</p>';

Use a proper template engine:

// ✅ Do this
$html = view('pdf.invoice', [
    'invoice' => $invoice,
    'items' => $invoice->items()->with('product')->get(),
    'company' => $invoice->company,
])->render();

This separation means designers can modify templates without touching business logic, and developers can change data structures without breaking layouts.
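The same separation in Python, as a minimal sketch: string.Template stands in for a real template engine (such as Blade in the Laravel example above, or Jinja2), and the important detail is that every value is HTML-escaped before it reaches the markup, which the string-concatenation version never does. The invoice fields are hypothetical.

```python
from html import escape
from string import Template

# The template lives apart from the data; in a real project it would be a file
INVOICE_TEMPLATE = Template(
    "<h1>Invoice #$number</h1>\n"
    "<p>Customer: $customer</p>\n"
    "<p>Total: $total</p>\n"
)


def render_invoice(invoice: dict) -> str:
    """Escape every value, then substitute it into the template."""
    values = {
        "number": escape(str(invoice["number"])),
        "customer": escape(invoice["customer"]),
        "total": escape(f"${invoice['total']:,.2f}"),
    }
    return INVOICE_TEMPLATE.substitute(values)


print(render_invoice({"number": 1042, "customer": "Smith & <Sons>", "total": 1234.5}))
```

A customer name containing & or < now renders as text instead of breaking the markup.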

Step 2: Create a Print-Specific Stylesheet

Don't try to use your web CSS for PDF generation. Create a dedicated print stylesheet:

/* pdf-base.css */
* {
    box-sizing: border-box;
    margin: 0;
    padding: 0;
}

body {
    font-family: 'Inter', sans-serif;
    font-size: 10pt;           /* Use pt for print, not px */
    line-height: 1.4;
    color: #1a1a1a;
}

/* Use mm/cm for spacing, not rem/em */
.header {
    padding-bottom: 5mm;
    border-bottom: 0.5pt solid #e5e5e5;
    margin-bottom: 5mm;
}

/* No hover effects, transitions, or animations */
a {
    color: #1a1a1a;
    text-decoration: none;
}

Key differences from web CSS:

  • Use pt for font sizes (standard print unit)
  • Use mm/cm for spacing (physical units, not relative)
  • Remove all interactive styles (hover, focus, transitions)
  • Avoid viewport-relative units (vw, vh)
  • Set explicit widths — no responsive breakpoints in PDF
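When porting an existing web design, it helps that CSS absolute units are fixed ratios of each other (1in = 96px = 72pt = 25.4mm), so px-based web styles convert mechanically. A tiny Python helper for the arithmetic; the 16px root font size used for rem is the common browser default and an assumption here.

```python
# CSS absolute units: 1in = 96px = 72pt = 25.4mm
PX_PER_IN, PT_PER_IN, MM_PER_IN = 96, 72, 25.4


def px_to_pt(px: float) -> float:
    return px * PT_PER_IN / PX_PER_IN


def px_to_mm(px: float) -> float:
    return px * MM_PER_IN / PX_PER_IN


def rem_to_pt(rem: float, root_font_px: float = 16) -> float:
    """rem depends on the root font size; 16px is assumed here."""
    return px_to_pt(rem * root_font_px)


for px in (12, 14, 16):
    print(f"{px}px = {px_to_pt(px):g}pt")  # 9pt, 10.5pt, 12pt
print(f"24px padding = {px_to_mm(24):.2f}mm")  # 6.35mm
```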

Step 3: Handle Edge Cases

// Sanitize HTML content to prevent XSS in PDF
$html = Purifier::clean($userContent);

// Set reasonable limits
if (mb_strlen($description) > 10000) {
    $description = mb_substr($description, 0, 10000) . '...';
}

// Handle missing images gracefully
$logoUrl = $company->logo_url ?? asset('images/default-logo.png');

Step 4: Test with Realistic Data

Create test fixtures that cover:

  • Empty state: zero line items, no logo, missing optional fields
  • Normal state: 5-10 line items, all fields populated
  • Extreme state: 200 line items, very long product names, maximum discounts
  • Unicode: non-Latin characters, RTL text, emoji
  • Large numbers: amounts in the millions with proper formatting

// Example test fixtures
$fixtures = [
    'minimal' => ['items' => [], 'notes' => null],
    'normal' => ['items' => $fiveItems, 'notes' => 'Thank you'],
    'extreme' => ['items' => $twoHundredItems, 'notes' => str_repeat('Long note. ', 100)],
    'unicode' => ['items' => $itemsWithCJK, 'company' => 'Ünïcödé GmbH'],
];

foreach ($fixtures as $name => $data) {
    $pdf = PdfGenerator::generate('invoice', $data);
    Storage::put("test-output/invoice-{$name}.pdf", $pdf);
}

Review the output PDFs visually. Automated tests can verify that generation doesn't crash, but layout issues usually need human eyes (or a visual regression setup that compares rendered pages against approved baselines).
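The "doesn't crash" half is easy to automate. A minimal Python sketch, assuming you already have the generated bytes for each fixture (the generate_pdf call in the comment is hypothetical): it checks for the %PDF- header and the %%EOF marker that a well-formed PDF ends with, and flags files over a size limit. The 5MB default is an arbitrary placeholder.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: a %PDF- header at the start and %%EOF at the end."""
    return data.startswith(b"%PDF-") and data.rstrip().endswith(b"%%EOF")


def check_outputs(outputs: dict, max_bytes: int = 5_000_000) -> list:
    """Return problems found in generated fixtures; an empty list means all passed."""
    problems = []
    for name, data in outputs.items():
        if not looks_like_pdf(data):
            problems.append(f"{name}: not a complete PDF")
        elif len(data) > max_bytes:
            problems.append(f"{name}: {len(data)} bytes exceeds {max_bytes}")
    return problems


# outputs = {name: generate_pdf("invoice", data) for name, data in fixtures.items()}
# assert check_outputs(outputs) == []
```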

Step 5: Monitor in Production

Track these metrics:

  • Generation time: p50, p95, p99 — catch performance regressions
  • Error rate: failed generations per hour
  • File size: unusually large PDFs might indicate font embedding issues
  • Memory usage: especially for headless browser approaches
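For the generation-time percentiles, a nearest-rank calculation is enough for a dashboard or a nightly report. A Python sketch; your monitoring system may interpolate between samples and report slightly different values.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(durations_ms: list) -> dict:
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}


print(summarize([320, 410, 290, 1800, 350, 300, 330, 5200, 310, 340]))
```

The outliers only show up in the tail: the median stays low while p95 and p99 surface the slow renders.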

Common Recipes

Recipe: Invoice with Page-Specific Running Footer

@page {
    size: A4;
    margin: 15mm 15mm 25mm 15mm;

    @bottom-center {
        content: "Page " counter(page) " of " counter(pages);
        font-size: 8pt;
        color: #999;
    }
}

@page :first {
    margin-top: 10mm;  /* Less top margin on first page (logo is there) */

    @bottom-center {
        content: none;  /* No page number on first page */
    }
}

Recipe: Two-Column Layout That Respects Page Breaks

.two-column {
    columns: 2;
    column-gap: 8mm;
    column-rule: 0.5pt solid #e5e5e5;
}

.two-column h3 {
    column-span: all;            /* Headings span both columns */
    break-after: avoid;
}

.two-column .entry {
    break-inside: avoid;         /* Don't split entries across columns */
}

Recipe: Landscape Pages in an Otherwise Portrait Document

@page landscape-page {
    size: A4 landscape;
}

.wide-table-section {
    page: landscape-page;       /* This section uses landscape pages */
}

Note: The page property for named pages only works in WeasyPrint and Prince, not in Chromium.

Choosing Your Approach

After understanding all these challenges, here's a practical decision tree:

  1. Do you need JavaScript rendering? (charts, dynamic content)

    • Yes → Headless browser (Puppeteer/Playwright)
    • No → Continue
  2. Do you need running headers/footers with page numbers?

    • Yes → WeasyPrint, Prince, or a dedicated PDF engine
    • No → Continue
  3. Is your layout mostly tables with some text?

    • Yes → Low-level library might be simpler than HTML
    • No → Continue
  4. Will non-developers edit templates?

    • Yes → Template-based API service with a visual editor (like PDF-API.io or DocSpring)
    • No → WeasyPrint (open source) or Prince (commercial)
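The tree is a series of early exits, so it maps directly onto code. A Python sketch; the parameter names are ours, and the returned labels simply echo the answers in the decision tree above.

```python
def choose_approach(needs_javascript: bool, needs_running_headers: bool,
                    mostly_tables: bool, non_developers_edit: bool) -> str:
    """Walk the decision tree top to bottom; the first 'yes' decides."""
    if needs_javascript:
        return "Headless browser (Puppeteer/Playwright)"
    if needs_running_headers:
        return "WeasyPrint, Prince, or a dedicated PDF engine"
    if mostly_tables:
        return "Low-level PDF library"
    if non_developers_edit:
        return "Template-based API service with a visual editor"
    return "WeasyPrint (open source) or Prince (commercial)"


print(choose_approach(needs_javascript=False, needs_running_headers=True,
                      mostly_tables=False, non_developers_edit=False))
```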

The best approach often becomes clear once you list your top 5 requirements in order of priority. The tool that satisfies the first 3 is usually the right choice, even if it's weaker on requirements 4 and 5.

Conclusion

HTML-to-PDF conversion is a solved problem — but it's solved differently depending on your requirements. The key lessons:

  1. Don't assume browser rendering = PDF rendering. Test early with your actual tool.
  2. Learn CSS Paged Media. It's the standard for print CSS and works with the best tools.
  3. Embed your fonts. Never rely on system fonts for consistent output.
  4. Test with extreme data. The happy path is easy; edge cases break layouts.
  5. Separate concerns. Data, template, and rendering should be independent layers.

The investment in understanding these fundamentals pays off quickly. A well-architected PDF pipeline can serve your application for years without major maintenance.


Don't want to deal with font embedding, page breaks, and rendering inconsistencies? PDF-API.io handles the hard parts — design templates visually and generate pixel-perfect PDFs via API. Get started free.
