Understanding PDF/A: When and Why You Need Archival PDFs
A regular PDF can depend on its environment to render correctly: it may reference system fonts, link to external resources, or rely on JavaScript. What happens when the font is no longer available? When the linked resource disappears? When the JavaScript engine changes?
For documents that need to be readable in 10, 50, or 100 years — tax records, legal contracts, medical records, government filings — this dependency is unacceptable. That's why PDF/A exists.
PDF/A (where "A" stands for "Archival") is an ISO standard (ISO 19005) designed to ensure that a PDF can be rendered identically regardless of when, where, or with what software it's opened. It's not a different file format — it's a constrained subset of the PDF format that eliminates anything that could prevent future rendering.
What PDF/A Prohibits (and Why)
The restrictions in PDF/A exist to ensure self-containment and deterministic rendering:
No External Dependencies
| Prohibited | Reason |
|---|---|
| External fonts (not embedded) | Font might not be available in 20 years |
| External images or resources | URL might be dead |
| Embedded audio/video | Codecs change over time |
| JavaScript | JS engines evolve, behavior changes |
| External color profiles | Profile might not be available |
No Ambiguous Rendering
| Prohibited | Reason |
|---|---|
| Transparency (in PDF/A-1) | Older renderers handle it differently |
| Encryption | Prevents access if password is lost |
| Non-embedded fonts | Different fonts = different rendering |
| LZW compression | Was patented (now expired, but still prohibited) |
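As a rough illustration of the prohibitions above, the following sketch scans raw PDF bytes for a few of the listed features. The function name and marker table are hypothetical, and the approach is only a heuristic: names hidden inside compressed object streams are not visible to a byte scan, so a clean result does not demonstrate conformance. A real validator is still needed.

```python
# Heuristic pre-check only. Marker names are standard PDF name objects, but a
# byte scan cannot see inside compressed object streams, so "no findings"
# does NOT mean the file is PDF/A compliant.
PROHIBITED_MARKERS = {
    b"/JavaScript": "JavaScript action",
    b"/Encrypt": "encryption dictionary",
    b"/LZWDecode": "LZW-compressed stream",
}


def scan_for_prohibited(data: bytes) -> list[str]:
    """Return labels for prohibited markers that appear in the raw PDF bytes."""
    return [label for marker, label in PROHIBITED_MARKERS.items() if marker in data]


sample = b"%PDF-1.4\n1 0 obj << /S /JavaScript >> endobj\ntrailer << /Encrypt 5 0 R >>"
for finding in scan_for_prohibited(sample):
    print("possible issue:", finding)
```

A check like this can be useful as a cheap early warning before handing files to a full validator.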
Required Elements
| Required | Reason |
|---|---|
| All fonts embedded (subsets are allowed) | Self-contained rendering |
| XMP metadata | Machine-readable document information |
| ICC color profiles | Consistent color reproduction |
| PDF/A identification | Declares conformance level |
| Document title in metadata | Findability and cataloging |
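The PDF/A identification lives in the XMP packet as the properties pdfaid:part and pdfaid:conformance. The sketch below reads that claim from an XMP packet that has already been extracted from a PDF; the function name and the sample packet are illustrative assumptions, and extracting the packet itself is left to a PDF library. Note that this only reads what the file claims, which is not the same as verifying it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_claim(xmp_packet: str):
    """Return (part, conformance) declared in an XMP packet, or None if absent."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(f"{{{PDFAID_NS}}}part") or desc.findtext(f"{{{PDFAID_NS}}}part")
        conf = desc.get(f"{{{PDFAID_NS}}}conformance") or desc.findtext(
            f"{{{PDFAID_NS}}}conformance"
        )
        if part:
            return part.strip(), (conf or "").strip().upper()
    return None


# Hypothetical sample packet declaring PDF/A-2b.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_claim(SAMPLE_XMP))  # ('2', 'B')
```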
PDF/A Conformance Levels
PDF/A has evolved through four major versions, each adding capabilities:
PDF/A-1 (ISO 19005-1:2005)
Based on PDF 1.4. The original standard. Two conformance levels:
- PDF/A-1b ("basic"): Ensures visual appearance is preserved. The document will look the same, but text extraction and accessibility might not work perfectly.
- PDF/A-1a ("accessible"): Includes everything in 1b plus logical structure (tags), Unicode mapping for all text, and a language declaration.
Best for: Documents where visual preservation is the primary goal and compatibility with older tools matters most.
PDF/A-2 (ISO 19005-2:2011)
Based on PDF 1.7. Adds significant features:
- JPEG 2000 compression: Better image quality at smaller file sizes
- Transparency: Finally allowed (PDF/A-1 prohibited it)
- PDF/A-2 containers: A PDF/A-2 file can embed other PDF/A files (useful for archives containing multiple documents)
- Digital signatures: Proper support for PAdES signatures
- OpenType fonts: Support for CFF-based OpenType fonts
Conformance levels: PDF/A-2a, PDF/A-2b, and PDF/A-2u (adds Unicode mapping to 2b).
Best for: Most new implementations. It's the sweet spot between compatibility and features.
PDF/A-3 (ISO 19005-3:2012)
Identical to PDF/A-2 but with one major addition: you can embed ANY file type as an attachment. Excel spreadsheets, XML data, original source files — anything.
This capability underpins the ZUGFeRD standard in Germany and its Franco-German counterpart, Factur-X: the invoice is a PDF/A-3 file that contains a machine-readable XML file with the structured invoice data. Humans see a readable invoice; automated systems extract the XML.
invoice.pdf (PDF/A-3)
├── Visual Invoice (what humans see)
└── Embedded: invoice.xml (what machines parse)
    ├── Invoice number: INV-2026-001
    ├── Line items [...]
    ├── Tax calculations [...]
    └── Payment terms [...]
Best for: E-invoicing (ZUGFeRD/Factur-X), archiving documents with their source data.
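To make the "what machines parse" half concrete, here is a minimal sketch that reads the invoice number from an embedded invoice XML once it has been extracted from the PDF. The namespace URIs and the ExchangedDocument/ID path are assumptions based on the UN/CEFACT Cross Industry Invoice schema that Factur-X uses; check them against the profile you target. The sample document is heavily trimmed and hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed CII namespaces; verify against the schema version you receive.
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}

# Hypothetical, heavily trimmed invoice.xml payload.
SAMPLE_INVOICE_XML = """<rsm:CrossIndustryInvoice
    xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
    xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100">
  <rsm:ExchangedDocument>
    <ram:ID>INV-2026-001</ram:ID>
  </rsm:ExchangedDocument>
</rsm:CrossIndustryInvoice>"""


def invoice_number(xml_text: str):
    """Return the invoice number from a CII-style invoice, or None if missing."""
    root = ET.fromstring(xml_text)
    return root.findtext("rsm:ExchangedDocument/ram:ID", namespaces=NS)


print(invoice_number(SAMPLE_INVOICE_XML))  # INV-2026-001
```

Pulling the attachment out of the PDF is a separate step that depends on the PDF library in use and is not shown here.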
PDF/A-4 (ISO 19005-4:2020)
Based on PDF 2.0. The latest version simplifies the conformance levels:
- PDF/A-4: The base level, which replaces the separate a/b/u conformance levels of earlier parts
- PDF/A-4e: For engineering documents (supports 3D content, rich media)
- PDF/A-4f: Allows embedded files (like PDF/A-3)
Best for: Future-proofing new systems, engineering documents with 3D models.
Which Level Should You Choose?
Do you need to embed non-PDF files?
→ Yes → PDF/A-3 (or PDF/A-4f)
→ No ↓
Do you need accessibility (screen readers)?
→ Yes → PDF/A-2a
→ No ↓
Do you need transparency or JPEG 2000?
→ Yes → PDF/A-2b
→ No → PDF/A-1b (widest compatibility)
For most business applications (invoices, contracts, reports), PDF/A-2b is the recommended default. It supports modern features while having broad compatibility.
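The decision tree above can be written down as a small function, which is handy when the choice is made per document type in configuration. This is only a sketch: the function and parameter names are hypothetical, and it mirrors the tree literally, so with every answer "no" it returns PDF/A-1b rather than the PDF/A-2b default recommended for typical business documents.

```python
def choose_pdfa_level(
    embeds_non_pdf_files: bool = False,
    needs_accessibility: bool = False,
    needs_transparency_or_jpeg2000: bool = False,
) -> str:
    """Walk the decision tree top to bottom and return the suggested level."""
    if embeds_non_pdf_files:
        return "PDF/A-3"  # PDF/A-4f is the alternative named in the tree
    if needs_accessibility:
        return "PDF/A-2a"
    if needs_transparency_or_jpeg2000:
        return "PDF/A-2b"
    return "PDF/A-1b"  # widest compatibility


print(choose_pdfa_level(needs_transparency_or_jpeg2000=True))  # PDF/A-2b
```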
Who Requires PDF/A?
PDF/A isn't just a nice-to-have. It's legally required or strongly recommended in many contexts:
Government and Public Sector
- US Federal Courts: Accept PDF/A for electronic case filings
- EU Public Procurement: Many member states require PDF/A for procurement documents
- Swiss Federal Archives: Require PDF/A for all archived documents
- German Federal Archives: Mandate PDF/A-1 or PDF/A-2
Financial and Tax
- ZUGFeRD / Factur-X: The German and European e-invoicing standard is built on PDF/A-3
- Many EU countries: Moving toward mandatory e-invoicing with PDF/A-3 containers
- FINRA (US): Recommends PDF/A for record retention
Healthcare
- HIPAA (US): Doesn't specifically require PDF/A, but its long-term retention requirements make PDF/A a practical choice
- Various EU health data regulations: PDF/A recommended for patient records
Legal
- Many court systems: Accept or require PDF/A for electronic filings
- Notarial acts: Some jurisdictions require PDF/A for digitized notarial documents
Creating PDF/A Documents
From HTML (Using WeasyPrint)
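WeasyPrint historically didn't generate PDF/A directly (recent releases add a PDF variant option; check the documentation for your version), but you can always post-process its output with a tool like Ghostscript: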
gs -dPDFA=2 -dBATCH -dNOPAUSE -sProcessColorModel=DeviceRGB \
  -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 \
  -sOutputFile=output-pdfa.pdf input.pdf
Using PDFlib
PDFlib has native PDF/A support:
$pdf = new PDFlib();
$pdf->begin_document("", "pdfa=PDF/A-2b");
// All fonts must be embedded
$font = $pdf->load_font("Helvetica", "unicode", "embedding");
// ICC color profile is required
$icc = $pdf->load_iccprofile("sRGB", "usage=outputintent");
// XMP metadata is required
$pdf->set_info("Title", "Invoice INV-2026-001");
$pdf->set_info("Author", "Acme Corp");
Using API Services
The simplest path to PDF/A compliance is often an API service that handles the requirements automatically. When you generate a document through PDF-API.io, you can request PDF/A output — the service handles font embedding, ICC profiles, XMP metadata, and compliance validation internally.
curl -X POST https://api.pdf-api.io/v1/generate \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "inv_abc123",
    "data": { "client": "Acme Corp" },
    "options": { "pdf_standard": "pdf-a-2b" }
  }' \
  -o invoice.pdf
Validating PDF/A Compliance
Creating a PDF/A document isn't enough — you need to validate that it actually complies with the standard.
Validation Tools
- VeraPDF: The open-source reference validator. Used by the PDF Association and many national archives. It's the gold standard for validation.
  verapdf --flavour 2b document.pdf
- Adobe Acrobat Pro: Has a built-in "Preflight" tool that checks PDF/A compliance. Go to Tools > Print Production > Preflight > PDF/A compliance.
- pdfaPilot: Commercial tool from callas software with strong fixing capabilities; it can often repair non-compliant PDFs automatically.
- JHOVE: Open-source file format identification and validation, used by many digital preservation libraries.
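When a validator runs unattended, its result has to be read programmatically. The sketch below assumes the validator's XML report contains validationReport elements carrying an isCompliant attribute, which is how VeraPDF's machine-readable report is commonly laid out. Treat the element and attribute names, and the sample report, as assumptions to confirm against the output of your installed version.

```python
import xml.etree.ElementTree as ET


def compliance_from_report(report_xml: str) -> list[bool]:
    """Return one True/False per validationReport element found in the report."""
    root = ET.fromstring(report_xml)
    results = []
    for element in root.iter():
        # Drop any XML namespace prefix so the match works with or without one.
        if element.tag.rsplit("}", 1)[-1] == "validationReport":
            results.append(element.get("isCompliant", "").lower() == "true")
    return results


# Hypothetical, trimmed report covering two files.
SAMPLE_REPORT = """<report>
  <jobs>
    <job><validationReport profileName="PDF/A-2B validation profile" isCompliant="true"/></job>
    <job><validationReport profileName="PDF/A-2B validation profile" isCompliant="false"/></job>
  </jobs>
</report>"""

print(compliance_from_report(SAMPLE_REPORT))  # [True, False]
```

A caller would typically fail the job when the list is empty or contains any False value.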
Common Compliance Failures
The most frequent reasons a PDF fails PDF/A validation:
- Fonts not embedded: By far the most common issue. Ensure every font is embedded (a subset is fine), not just referenced.
- Missing XMP metadata: PDF/A requires XMP (Extensible Metadata Platform) metadata, including the PDF/A version identification.
- Transparency issues (PDF/A-1): If you're targeting PDF/A-1, all transparency must be flattened.
- Wrong color space: Images in CMYK without an ICC profile will fail. Either embed an ICC profile or convert images to sRGB.
- External references: Anything the file needs from outside itself in order to render (an external font reference, a stream stored in another file) will fail. Ordinary hyperlinks are generally permitted.
Integrating Validation into CI/CD
# GitHub Actions example
validate-pdfa:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install VeraPDF
      run: |
        wget https://software.verapdf.org/releases/verapdf-installer.zip
        unzip verapdf-installer.zip
        ./verapdf-installer/verapdf-install --console
    - name: Generate test PDFs
      run: php artisan pdf:generate-test-fixtures
    - name: Validate PDF/A compliance
      run: |
        for file in storage/test-pdfs/*.pdf; do
          verapdf --flavour 2b "$file"
        done
Migration: Converting Existing PDFs to PDF/A
If you have a backlog of regular PDFs that need to be converted to PDF/A for compliance:
Ghostscript (Open Source)
gs -dPDFA=2 -dBATCH -dNOPAUSE \
-sProcessColorModel=DeviceRGB \
-sDEVICE=pdfwrite \
-dPDFACompatibilityPolicy=1 \
-dEmbedAllFonts=true \
-sOutputFile=output-pdfa.pdf \
input.pdf
Caution: Automated conversion isn't always perfect. The converted PDF may have font substitutions, color shifts, or layout changes. Always validate the output.
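For a backlog of many files, the Ghostscript command above can be wrapped in a small batch script. This is a sketch under stated assumptions: the function names and directory layout are hypothetical, gs is expected to be on the PATH, and the flags are copied from the command shown in this section. A zero exit code from Ghostscript only means the conversion ran, not that the result is compliant, so every output still needs to go through a validator.

```python
import subprocess
from pathlib import Path


def build_gs_command(src: Path, dst: Path, part: int = 2) -> list[str]:
    """Build the Ghostscript argument list for one PDF-to-PDF/A conversion."""
    return [
        "gs",
        f"-dPDFA={part}",
        "-dBATCH",
        "-dNOPAUSE",
        "-sProcessColorModel=DeviceRGB",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        "-dEmbedAllFonts=true",
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_backlog(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf in src_dir; return the inputs Ghostscript rejected."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.glob("*.pdf")):
        result = subprocess.run(build_gs_command(src, out_dir / src.name))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling convert_backlog(Path("legacy"), Path("archive")) would process a hypothetical legacy directory and report which inputs need manual attention.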
What Gets Lost in Conversion
When converting a regular PDF to PDF/A, some features are removed:
- JavaScript (stripped entirely)
- External links (may be converted to text)
- Multimedia content (removed)
- Encryption (removed — PDF/A cannot be encrypted)
- Non-embedded fonts (substituted or rejected)
Conclusion
PDF/A is essential for any document that needs to survive longer than the software that created it. The standard is well-defined, the tools are mature, and the compliance requirements are increasingly clear.
For new systems, choose PDF/A-2b as your default. For e-invoicing in the EU, use PDF/A-3 with embedded XML. Validate with VeraPDF, and integrate validation into your build pipeline.
The small upfront investment in PDF/A compliance pays dividends in regulatory compliance, legal defensibility, and long-term document integrity.
Need to generate PDF/A-compliant documents? PDF-API.io supports PDF/A-2b and PDF/A-3 output with automatic font embedding and validation.