Understanding PDF/A: When and Why You Need Archival PDFs

January 28, 2026 · 11 min read

A regular PDF can depend on the viewer and its environment to render correctly: it may reference system fonts, link to external resources, or rely on JavaScript. What happens when the font is no longer available? When the linked resource disappears? When the JavaScript engine changes?

For documents that need to be readable in 10, 50, or 100 years — tax records, legal contracts, medical records, government filings — this dependency is unacceptable. That's why PDF/A exists.

PDF/A (where "A" stands for "Archival") is an ISO standard (ISO 19005) designed to ensure that a PDF can be rendered identically regardless of when, where, or with what software it's opened. It's not a different file format — it's a constrained subset of the PDF format that eliminates anything that could prevent future rendering.

What PDF/A Prohibits (and Why)

The restrictions in PDF/A exist to ensure self-containment and deterministic rendering:

No External Dependencies

Prohibited                     | Reason
External fonts (not embedded)  | Font might not be available in 20 years
External images or resources   | URL might be dead
Embedded audio/video           | Codecs change over time
JavaScript                     | JS engines evolve, behavior changes
External color profiles        | Profile might not be available

No Ambiguous Rendering

Prohibited                     | Reason
Transparency (in PDF/A-1)      | Older renderers handle it differently
Encryption                     | Prevents access if password is lost
Non-embedded fonts             | Different fonts = different rendering
LZW compression                | Was patented (now expired, but still prohibited)

Required Elements

Required                             | Reason
All fonts embedded (subsets allowed) | Self-contained rendering
XMP metadata                         | Machine-readable document information
ICC color profiles                   | Consistent color reproduction
PDF/A identification                 | Declares conformance level
Document title in metadata           | Findability and cataloging
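
The PDF/A identification lives in the XMP packet as `pdfaid:part` and `pdfaid:conformance`. As a rough illustration, here is a minimal Python sketch that reads the level a file claims. The function name is hypothetical, and the sketch assumes the XMP packet is stored uncompressed in the file, which is common but not guaranteed. It only reports the claim; it does not validate anything.

```python
import re
from typing import Optional

# pdfaid:part / pdfaid:conformance may be written as XML elements
# (<pdfaid:part>2</pdfaid:part>) or as attributes (pdfaid:part="2").
_PART = re.compile(rb'pdfaid:part(?:>|\s*=\s*["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|\s*=\s*["\'])\s*([A-Za-z])')


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. "2b"), or None if absent.

    Assumption: the XMP packet is readable as plain bytes in the file.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conformance = _CONF.search(pdf_bytes)
    suffix = conformance.group(1).decode("ascii").lower() if conformance else ""
    return part.group(1).decode("ascii") + suffix
```

A file that claims a level can still fail validation, so treat this as a quick triage step rather than a compliance check.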

PDF/A Conformance Levels

PDF/A has evolved through four major versions, each adding capabilities:

PDF/A-1 (ISO 19005-1:2005)

Based on PDF 1.4. The original standard. Two conformance levels:

  • PDF/A-1b ("basic"): Ensures visual appearance is preserved. The document will look the same, but text extraction and accessibility might not work perfectly.
  • PDF/A-1a ("accessible"): Includes everything in 1b plus logical structure (tags), Unicode mapping for all text, and a language declaration, so text extraction and assistive technology work reliably.

Best for: Documents where visual preservation is the primary goal and you need the widest compatibility with older tools and archives.

PDF/A-2 (ISO 19005-2:2011)

Based on PDF 1.7. Adds significant features:

  • JPEG 2000 compression: Better image quality at smaller file sizes
  • Transparency: Finally allowed (PDF/A-1 prohibited it)
  • PDF/A-2 containers: A PDF/A-2 file can embed other PDF/A files (useful for archives containing multiple documents)
  • Digital signatures: Proper support for PAdES signatures
  • OpenType fonts: Support for CFF-based OpenType fonts

Conformance levels: PDF/A-2a, PDF/A-2b, and PDF/A-2u (adds Unicode mapping to 2b).

Best for: Most new implementations. It's the sweet spot between compatibility and features.

PDF/A-3 (ISO 19005-3:2012)

Identical to PDF/A-2 but with one major addition: you can embed ANY file type as an attachment. Excel spreadsheets, XML data, original source files — anything.

This made possible the ZUGFeRD standard in Germany (and its French counterpart, Factur-X, with which it is now technically aligned): a PDF/A-3 invoice that contains a machine-readable XML file with the structured invoice data. Humans see a beautiful invoice; automated systems extract the XML.

invoice.pdf (PDF/A-3)
├── Visual Invoice (what humans see)
└── Embedded: invoice.xml (what machines parse)
    ├── Invoice number: INV-2026-001
    ├── Line items [...]
    ├── Tax calculations [...]
    └── Payment terms [...]

Best for: E-invoicing (ZUGFeRD/Factur-X), archiving documents with their source data.

PDF/A-4 (ISO 19005-4:2020)

Based on PDF 2.0. The latest version simplifies the conformance levels:

  • PDF/A-4: Replaces the old "b" (basic) level
  • PDF/A-4e: For engineering documents (supports 3D content, rich media)
  • PDF/A-4f: Allows embedded files (like PDF/A-3)

Best for: Future-proofing new systems, engineering documents with 3D models.

Which Level Should You Choose?

Do you need to embed non-PDF files?
  → Yes → PDF/A-3 (or PDF/A-4f)
  → No ↓

Do you need accessibility (screen readers)?
  → Yes → PDF/A-2a
  → No ↓

Do you need transparency or JPEG 2000?
  → Yes → PDF/A-2b
  → No → PDF/A-1b (widest compatibility)

For most business applications (invoices, contracts, reports), PDF/A-2b is the recommended default. It supports modern features while having broad compatibility.
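
The decision tree above can be written down as a small function. This is only a sketch of the three questions in the order given; the function and parameter names are made up for illustration.

```python
def choose_pdfa_level(needs_attachments: bool,
                      needs_accessibility: bool,
                      needs_transparency_or_jpeg2000: bool) -> str:
    """Walk the three questions in order and return the suggested level."""
    if needs_attachments:
        return "PDF/A-3"   # PDF/A-4f is the PDF 2.0 alternative
    if needs_accessibility:
        return "PDF/A-2a"
    if needs_transparency_or_jpeg2000:
        return "PDF/A-2b"
    return "PDF/A-1b"      # widest compatibility
```

The order matters: attachments are checked first because only PDF/A-3 and PDF/A-4f allow arbitrary embedded files, whatever the other answers are.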

Who Requires PDF/A?

PDF/A isn't just a nice-to-have. It's legally required or strongly recommended in many contexts:

Government and Public Sector

  • US Federal Courts: Accept PDF/A for electronic case filings
  • EU Public Procurement: Many member states require PDF/A for procurement documents
  • Swiss Federal Archives: Require PDF/A for all archived documents
  • German Federal Archives: Mandate PDF/A-1 or PDF/A-2

Financial and Tax

  • ZUGFeRD / Factur-X: The German and French e-invoicing formats are built on PDF/A-3
  • Many EU countries: Moving toward mandatory e-invoicing with PDF/A-3 containers
  • FINRA (US): Recommends PDF/A for record retention

Healthcare

  • HIPAA (US): Doesn't specifically require PDF/A, but its long-term retention requirements make PDF/A a practical choice
  • Various EU health data regulations: PDF/A recommended for patient records

Legal

  • Many court systems: Accept or require PDF/A for electronic filings
  • Notarial acts: Some jurisdictions require PDF/A for digitized notarial documents

Creating PDF/A Documents

From HTML (Using WeasyPrint)

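WeasyPrint versions without a PDF/A output option don't generate PDF/A directly (recent releases document a PDF variant setting, so check your version first), but you can post-process any PDF with a tool like Ghostscript. Ghostscript's documented PDF/A workflow also passes a PDFA_def.ps file, shipped with Ghostscript, that you edit to point at your ICC profile:

gs -dPDFA=2 -dBATCH -dNOPAUSE -sColorConversionStrategy=RGB \
   -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 \
   -sOutputFile=output-pdfa.pdf PDFA_def.ps input.pdf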

Using PDFlib

PDFlib has native PDF/A support:

$pdf = new PDFlib();
$pdf->begin_document("", "pdfa=PDF/A-2b");

// All fonts must be embedded
$font = $pdf->load_font("Helvetica", "unicode", "embedding");

// ICC color profile is required
$icc = $pdf->load_iccprofile("sRGB", "usage=outputintent");

// XMP metadata is required
$pdf->set_info("Title", "Invoice INV-2026-001");
$pdf->set_info("Author", "Acme Corp");

Using API Services

The simplest path to PDF/A compliance is often an API service that handles the requirements automatically. When you generate a document through PDF-API.io, you can request PDF/A output — the service handles font embedding, ICC profiles, XMP metadata, and compliance validation internally.

curl -X POST https://api.pdf-api.io/v1/generate \
    -H "Authorization: Bearer YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "template_id": "inv_abc123",
        "data": { "client": "Acme Corp" },
        "options": { "pdf_standard": "pdf-a-2b" }
    }' \
    -o invoice.pdf

Validating PDF/A Compliance

Creating a PDF/A document isn't enough — you need to validate that it actually complies with the standard.

Validation Tools

  1. VeraPDF: The open-source reference validator. Used by the PDF Association and many national archives. It's the gold standard for validation.

     verapdf --flavour 2b document.pdf

  2. Adobe Acrobat Pro: Has a built-in "Preflight" tool that checks PDF/A compliance. Go to Tools > Print Production > Preflight > PDF/A compliance.

  3. PDF/A Pilot: Commercial tool from Callas Software with excellent fixing capabilities; it can often repair non-compliant PDFs automatically.

  4. JHOVE: Open-source file format identification and validation, used by many digital preservation libraries.

Common Compliance Failures

The most frequent reasons a PDF fails PDF/A validation:

  1. Fonts not embedded: By far the most common issue. Ensure ALL fonts are fully embedded, not just referenced.

  2. Missing XMP metadata: PDF/A requires XMP (Extensible Metadata Platform) metadata, including the PDF/A version identification.

  3. Transparency issues (PDF/A-1): If you're targeting PDF/A-1, all transparency must be flattened.

  4. Wrong color space: Images in CMYK without an ICC profile will fail. Either embed an ICC profile or convert images to sRGB.

  5. External references: Anything the renderer would have to fetch from outside the file (an external font reference, an externally stored stream or image) will fail. Ordinary hyperlinks are still allowed.

Integrating Validation into CI/CD

# GitHub Actions example
validate-pdfa:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install VeraPDF
      run: |
        wget https://software.verapdf.org/releases/verapdf-installer.zip
        unzip verapdf-installer.zip
        ./verapdf-installer/verapdf-install --console
    - name: Generate test PDFs
      run: php artisan pdf:generate-test-fixtures
    - name: Validate PDF/A compliance
      run: |
        for file in storage/test-pdfs/*.pdf; do
          verapdf --flavour 2b "$file"
        done
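
If you would rather collect every failing file than stop at the first one, the loop can be moved into a small script. The sketch below is one way to do that in Python. The function name is hypothetical, and it assumes the validator command exits with a non-zero status for a non-compliant file, which you should confirm for the veraPDF version you install.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def failing_files(pdf_dir: Path, command: Sequence[str]) -> List[Path]:
    """Run `command <file>` for each PDF in pdf_dir and return the failures.

    Assumption: the validator signals non-compliance with a non-zero exit code.
    """
    failures = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        result = subprocess.run([*command, str(pdf)], capture_output=True)
        if result.returncode != 0:
            failures.append(pdf)
    return failures
```

In the pipeline you would call `failing_files(Path("storage/test-pdfs"), ["verapdf", "--flavour", "2b"])`, print the returned paths, and fail the job if the list is not empty.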

Migration: Converting Existing PDFs to PDF/A

If you have a backlog of regular PDFs that need to be converted to PDF/A for compliance:

Ghostscript (Open Source)

# PDFA_def.ps ships with Ghostscript; edit it to reference your ICC profile.
gs -dPDFA=2 -dBATCH -dNOPAUSE \
   -sColorConversionStrategy=RGB \
   -sDEVICE=pdfwrite \
   -dPDFACompatibilityPolicy=1 \
   -dEmbedAllFonts=true \
   -sOutputFile=output-pdfa.pdf \
   PDFA_def.ps input.pdf

Caution: Automated conversion isn't always perfect. The converted PDF may have font substitutions, color shifts, or layout changes. Always validate the output.
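
For a backlog, you will want one Ghostscript call per file. The Python sketch below only builds the argument lists, so you can review them before running anything. The switches are similar to the command above. The `PDFA_def.ps` location, the color-conversion switch, the directory layout, and both function names are assumptions to adapt to your Ghostscript install.

```python
from pathlib import Path
from typing import List


def ghostscript_pdfa_command(source: Path, target: Path,
                             pdfa_def: Path = Path("PDFA_def.ps")) -> List[str]:
    """Build the Ghostscript argument list for one PDF/A-2 conversion."""
    return [
        "gs", "-dPDFA=2", "-dBATCH", "-dNOPAUSE",
        "-sColorConversionStrategy=RGB",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        "-dEmbedAllFonts=true",
        f"-sOutputFile={target}",
        str(pdfa_def),   # assumed path to your edited PDFA_def.ps
        str(source),
    ]


def plan_conversions(backlog: Path, out_dir: Path) -> List[List[str]]:
    """One command per PDF in backlog, writing <name>-pdfa.pdf into out_dir."""
    return [
        ghostscript_pdfa_command(pdf, out_dir / f"{pdf.stem}-pdfa.pdf")
        for pdf in sorted(backlog.glob("*.pdf"))
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, followed by a validation pass over the output directory.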

What Gets Lost in Conversion

When converting a regular PDF to PDF/A, some features are removed:

  • JavaScript (stripped entirely)
  • External links (may be converted to text)
  • Multimedia content (removed)
  • Encryption (removed — PDF/A cannot be encrypted)
  • Non-embedded fonts (substituted or rejected)
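
Before converting, it can help to know which files use features from this list. The following Python sketch is a crude heuristic, not a validator: it searches the raw bytes for a few PDF name tokens. The marker table and function name are illustrative assumptions, and tokens stored inside compressed object streams will be missed.

```python
import re
from typing import Dict, List

# Raw PDF name tokens loosely associated with features PDF/A disallows.
# Assumption: a heuristic list for triage, not a complete or authoritative one.
_MARKERS: Dict[str, "re.Pattern[bytes]"] = {
    "JavaScript": re.compile(rb"/(?:JavaScript|JS)\b"),
    "Encryption": re.compile(rb"/Encrypt\b"),
    "Multimedia": re.compile(rb"/(?:Movie|Sound|RichMedia)\b"),
    "LZW compression": re.compile(rb"/LZWDecode\b"),
}


def features_at_risk(pdf_bytes: bytes) -> List[str]:
    """Return labels of marker groups whose tokens appear in the raw bytes."""
    return [label for label, pattern in _MARKERS.items()
            if pattern.search(pdf_bytes)]
```

A non-empty result tells you to look closely at that file's converted output; an empty result proves nothing, so still validate.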

Conclusion

PDF/A is essential for any document that needs to survive longer than the software that created it. The standard is well-defined, the tools are mature, and the compliance requirements are increasingly clear.

For new systems, choose PDF/A-2b as your default. For e-invoicing in the EU, use PDF/A-3 with embedded XML. Validate with VeraPDF, and integrate validation into your build pipeline.

The small upfront investment in PDF/A compliance pays dividends in regulatory compliance, legal defensibility, and long-term document integrity.


Need to generate PDF/A-compliant documents? PDF-API.io supports PDF/A-2b and PDF/A-3 output with automatic font embedding and validation. Get started free.
