How PDF Compression Works

Updated 11 May 2026 | 6 min | Reviewed for accuracy

A PDF is a container: text, images, fonts, vector graphics, and metadata bundled into one file. PDF compression shrinks that container by encoding each component more efficiently. Most of the savings come from images, which often dominate file size, but well-optimized PDFs squeeze every part of the structure.

Key Takeaways

  • A PDF stores text, images, fonts, and metadata. Compression targets each component.
  • Image compression is the biggest lever, because images usually account for 70% or more of the file size.
  • Lossless compression preserves all original data and is the right choice for text and vectors.
  • Lossy compression sacrifices some image quality for smaller size.
  • Aggressive compression can degrade quality; conservative compression is usually invisible.

What's Inside a PDF

A typical PDF file contains:

  • Text streams. Encoded characters, paragraph layout instructions.
  • Embedded fonts. Often the largest non-image component.
  • Images. Photos, illustrations, scanned pages, usually the biggest contributor to size.
  • Vector graphics. Lines, shapes, diagrams, small in size.
  • Metadata. Title, author, creation date, hyperlinks.
  • Forms and annotations. Interactive elements.

Compression operates on each of these differently.

Lossless vs Lossy Compression

Lossless compression preserves every byte of the original. After decompression, the result is bit-for-bit identical to the input.

Examples:

  • ZIP and Flate compression for text and metadata streams
  • PNG image compression
  • LZW compression
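
The round-trip property can be checked directly with Python's standard `zlib` module, which implements the Deflate algorithm behind Flate. This is a minimal sketch: the sample bytes only imitate a PDF text stream and are an assumption made for illustration, not data from a real file.

```python
import zlib

# Repetitive sample bytes standing in for a PDF text stream (illustrative only).
original = (b"BT /F1 12 Tf 72 720 Td (Hello, PDF compression) Tj ET\n") * 200

compressed = zlib.compress(original, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("bit-for-bit identical after round trip:", restored == original)
```

The sample is far more repetitive than real text, so the ratio it prints is not representative of typical documents. The point is only that decompression returns exactly the input.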

Lossy compression discards information considered "less important," producing a smaller file at the cost of some quality.

Examples:

  • JPEG compression for photos
  • Removing embedded fonts, so viewers fall back to substitute fonts
  • Reducing image resolution

Text and vector content should always be lossless. Images are typically where lossy compression lives, especially for scanned documents and photo-heavy PDFs.

How Image Compression Dominates File Size

A 50-page document with 10 high-resolution images (5 MB each) carries roughly 50 MB of image data. The text and vectors might add 1–2 MB. Compressing the images by 80% (to 1 MB each) drops the whole file from about 52 MB to about 12 MB, a reduction of roughly 77%.
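
The arithmetic above can be reproduced in a few lines of Python. This sketch assumes the upper figure of 2 MB for text and vectors; the function name `total_mb` is made up for illustration.

```python
def total_mb(image_count, mb_per_image, other_mb):
    """File size in MB: all images plus everything else (text, fonts, vectors)."""
    return image_count * mb_per_image + other_mb

OTHER_MB = 2.0  # assumed size of text and vectors (upper end of the 1-2 MB range)

before = total_mb(10, 5.0, OTHER_MB)   # ten 5 MB images
after = total_mb(10, 1.0, OTHER_MB)    # same images compressed by 80%

reduction = (before - after) / before * 100
image_share = (10 * 5.0) / before * 100

print(f"before: {before:.0f} MB, after: {after:.0f} MB")
print(f"overall reduction: {reduction:.1f}%")
print(f"images as share of original file: {image_share:.1f}%")
```

Because the images make up nearly all of the original file, an 80% cut to the images alone is almost an 80% cut to the whole document.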

This is why "PDF compression" tools focus heavily on image optimization. Strategies include:

  1. Downsampling. Reduce image resolution. A 600 DPI scan often serves the same purpose at 150 DPI for screen viewing.
  2. Recompression. Re-encode images at lower JPEG quality.
  3. Color space reduction. Convert RGB to grayscale where appropriate.
  4. Format conversion. Convert lossless image formats to JPEG when acceptable.

A scanned PDF saved at 300 DPI in lossless format can often be reduced 90%+ without visible quality loss on a screen.
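
Downsampling is effective because pixel count grows with the square of the resolution. The sketch below estimates the uncompressed size of one scanned page at several DPI values. It assumes a US Letter page (8.5 × 11 inches) and 8-bit RGB at 3 bytes per pixel; both are illustrative assumptions, and real file sizes depend on the encoder.

```python
def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel count for one page scanned at the given DPI (US Letter by default)."""
    return round(width_in * dpi) * round(height_in * dpi)

def raw_rgb_mb(dpi):
    """Uncompressed size in MB (10**6 bytes), assuming 3 bytes per pixel."""
    return page_pixels(dpi) * 3 / 1_000_000

for dpi in (600, 300, 150):
    print(f"{dpi} DPI: {page_pixels(dpi):,} pixels, {raw_rgb_mb(dpi):.1f} MB raw RGB")

# Going from 600 DPI to 150 DPI divides each dimension by 4,
# so the pixel count shrinks by a factor of 16.
print("600 vs 150 DPI pixel ratio:", page_pixels(600) // page_pixels(150))
```

Halving the DPI quarters the pixel count, which is why a modest resolution change removes most of the raw image data before any JPEG recompression is applied.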

Text and Font Compression

Text in PDFs is compressed using lossless algorithms (typically Flate, PDF's name for the Deflate algorithm also used in ZIP). Compression ratios of 3:1 to 5:1 are typical for plain text.

Embedded fonts can be a surprising size contributor. A full font file is 200 KB–2 MB. PDFs can:

  • Subset fonts: include only the glyphs actually used in the document. A document using only 80 characters embeds 80 glyphs, not the full character set.
  • Reuse fonts: when multiple pages use the same font, embed it once.
  • Convert to outlines: for very short usages (a single title), convert text to vector paths instead of embedding the font.

Aggressive font subsetting can cut PDF size by 30–50% when fonts are heavily used.
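
The idea behind subsetting can be sketched by counting the distinct characters a document actually uses. This is a rough proxy only: real subsetters work on glyph IDs and must handle ligatures and composite glyphs, and the `full_font_glyphs` figure below is a hypothetical value chosen for illustration.

```python
def glyphs_needed(text):
    """Distinct non-whitespace characters, a rough proxy for the glyphs a subset must keep."""
    return sorted({ch for ch in text if not ch.isspace()})

sample = "How PDF Compression Works"
needed = glyphs_needed(sample)

full_font_glyphs = 2000  # hypothetical glyph count for a full font file

print("characters kept:", "".join(needed))
print("glyphs needed:", len(needed))
print(f"share of the full font: {len(needed) / full_font_glyphs:.1%}")
```

Upper- and lowercase letters count separately, since each has its own glyph in the font.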

Compression Settings: Quality vs Size

PDF compressors usually expose a quality vs size dial. Common levels:

| Setting | DPI | Image Quality | Typical Size Reduction |
|---|---|---|---|
| Original / Lossless | unchanged | unchanged | 5–20% (text/font only) |
| High quality | 300 DPI | 90% JPEG | 30–50% |
| Standard | 150 DPI | 75% JPEG | 60–75% |
| Web / email | 96 DPI | 60% JPEG | 80–90% |
| Minimum | 72 DPI | 40% JPEG | 90–95% |

For most screen viewing, files compressed at the "standard" or "web" level look nearly identical to the original while being far smaller. For print, stay at "high quality" or above.
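
The levels above map naturally onto a small lookup table. The sketch below is one way to encode them. The preset names, the `Preset` type, and the `pick_preset` helper are assumptions made for illustration and do not correspond to any particular tool's options.

```python
from typing import NamedTuple, Optional

class Preset(NamedTuple):
    dpi: Optional[int]           # None: leave images at their original resolution
    jpeg_quality: Optional[int]  # None: do not re-encode images

# Values taken from the settings table; the keys are hypothetical names.
PRESETS = {
    "lossless": Preset(None, None),
    "high": Preset(300, 90),
    "standard": Preset(150, 75),
    "web": Preset(96, 60),
    "minimum": Preset(72, 40),
}

def pick_preset(for_print: bool, size_critical: bool = False) -> str:
    """Choose a preset name: print needs 'high'; screen use can go lower."""
    if for_print:
        return "high"
    return "web" if size_critical else "standard"

for args in [(True, False), (False, False), (False, True)]:
    name = pick_preset(*args)
    print(args, "->", name, PRESETS[name])
```

Keeping the presets as data rather than code makes it easy to add a level or tune a value without touching the selection logic.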

When to Compress (and When Not To)

Compress when:

  • The file is for email or web distribution
  • Quality requirements are screen viewing only
  • Storage or bandwidth is constrained
  • The original was over-engineered (e.g., 600 DPI scans for a 1-page memo)

Don't compress (or compress lightly) when:

  • The file will be printed at high quality
  • Images have detail at small scales (technical diagrams, fine print)
  • The file is an archival master
  • Legal or regulatory requirements specify originals

A good practice: keep an uncompressed master, distribute a compressed working copy.

Worked Example: Compressing a 50 MB Scan

A 50-page scanned PDF (300 DPI color) totals 50 MB.

  1. Downsample images to 150 DPI. New size: ~25 MB.
  2. Convert color images to grayscale (assuming the original is mostly black text). New size: ~12 MB.
  3. Recompress images as JPEG quality 75%. New size: ~6 MB.
  4. Apply Flate compression to text and metadata. New size: ~5.7 MB.

Total reduction: roughly 89%. The compressed PDF is still readable on screen and easily emailed.
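
The running totals in this example can be tabulated to show how much each step contributes. The sizes are the approximate figures quoted above, not measurements.

```python
# (step description, approximate file size in MB after the step)
steps = [
    ("original 300 DPI color scan", 50.0),
    ("downsample to 150 DPI", 25.0),
    ("convert to grayscale", 12.0),
    ("recompress as JPEG quality 75", 6.0),
    ("Flate-compress text and metadata", 5.7),
]

original_mb = steps[0][1]

# Pair each step with the one before it to get per-step and cumulative savings.
for (_, prev_mb), (name, size_mb) in zip(steps, steps[1:]):
    step_saving = (prev_mb - size_mb) / prev_mb * 100
    cumulative = (original_mb - size_mb) / original_mb * 100
    print(f"{name:34s} {size_mb:5.1f} MB  step: {step_saving:4.1f}%  total: {cumulative:4.1f}%")

total_reduction = (original_mb - steps[-1][1]) / original_mb * 100
print(f"total reduction: {total_reduction:.1f}%")
```

The three image steps account for nearly all of the savings, while the final Flate pass on text and metadata removes only a fraction of a megabyte.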

For OCR-ready scans (where text-recognition software needs clean character edges), don't go below 200 DPI.

OCR and Compression

Scanned PDFs benefit from Optical Character Recognition (OCR), which adds a hidden text layer on top of the images. OCR doesn't reduce file size directly (it might add 1–2%), but it makes the document searchable and selectable, which is often more valuable than additional compression.

Some PDF compressors offer "OCR + compress" workflows that produce smaller searchable PDFs than running each step separately.

Common Mistakes

Compressing the same file repeatedly. Lossy compression compounds with each pass. Always compress from the original.
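
A toy model shows why repeated lossy passes compound. Here "compression" is simple rounding to a multiple of a step size. That is a deliberate simplification and an assumption of this sketch: real JPEG encoding is far more involved, but it also discards detail by quantizing values.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of step (a toy lossy encoder)."""
    return [round(v / step) * step for v in values]

def max_error(a, b):
    """Largest absolute difference between corresponding values."""
    return max(abs(x - y) for x, y in zip(a, b))

original = [11, 29, 43, 61]   # stand-in for pixel values

# One pass straight from the original.
one_pass = quantize(original, 6)

# Two passes: an earlier, coarser compression followed by the same final setting.
two_pass = quantize(quantize(original, 8), 6)

print("one pass :", one_pass, "max error", max_error(original, one_pass))
print("two pass :", two_pass, "max error", max_error(original, two_pass))
```

Both results use the same final setting, yet the twice-compressed values sit further from the original because the second pass rounds data that was already rounded. Nothing in the output can restore what the first pass discarded.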

Using web-quality settings for print. Web-resolution images look pixelated when printed at letter-paper size.

Over-compressing detail images. Maps, fine print, and small text need higher quality settings.

Removing useful metadata. Title, author, and structure tags help searchability and accessibility; strip them only when there's a reason.

Ignoring image format mismatch. A photo as PNG is larger than the same photo as JPEG. PDF compressors that convert between formats can help.

Compressing already-compressed files. Re-compressing a file that has already been through aggressive compression yields tiny savings at the cost of further quality loss.

File Size Targets

Approximate targets by use case:

| Use Case | Target Size |
|---|---|
| Email attachment | < 10 MB (often < 5) |
| Web download | < 2 MB per page |
| Print master | 5–20 MB per page acceptable |
| Archival | original quality, large size fine |
| Mobile viewing | < 1 MB per page |

A 10-page PDF should usually be 1–3 MB for general distribution. If it's 50 MB, compression will help dramatically.
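
The per-page targets are easy to check once the file size and page count are known. In this sketch the thresholds come from the targets above, while the dictionary and function names are hypothetical.

```python
# Per-page limits in MB, taken from the targets above.
TARGET_MB_PER_PAGE = {"web": 2.0, "mobile": 1.0}

def mb_per_page(total_mb, pages):
    """Average size per page in MB."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_mb / pages

def within_target(total_mb, pages, use_case):
    """True if the average page size is under the target for the use case."""
    return mb_per_page(total_mb, pages) < TARGET_MB_PER_PAGE[use_case]

# A 10-page PDF at 50 MB versus the same document compressed to 3 MB.
for size in (50.0, 3.0):
    print(f"{size:4.1f} MB / 10 pages = {mb_per_page(size, 10):.1f} MB per page,",
          "mobile OK:", within_target(size, 10, "mobile"))
```

An average hides outliers: a document can meet the per-page target overall while one image-heavy page is still slow to load.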

Practical Scenarios

Scenario 1: Email a contract. Source PDF is 18 MB from a high-DPI scan. Compress to 150 DPI grayscale → 2.5 MB. Easy to email.

Scenario 2: Upload a portfolio. A designer's portfolio PDF has full-resolution photos and is 80 MB. Compress at "high quality," which keeps photo detail while reducing to ~12 MB.

Scenario 3: Submit a tax document. Government portal limits uploads to 5 MB. Compress aggressively (web quality) to fit, but verify legibility.

Scenario 4: Archive vs distribute. Keep the 50 MB original on a hard drive; share the 5 MB compressed version with collaborators.

FAQ

Why is my PDF so large? Usually because of embedded images. A scanned PDF or photo-heavy document at high resolution can easily exceed 50 MB. Compression typically reduces this by 70–90%.

Does PDF compression lose quality? Lossless compression of text and vectors doesn't. Lossy compression of images does, though the loss is often imperceptible at moderate settings.

Can I compress a PDF without losing readability? Yes. Light to moderate compression (150–200 DPI, JPEG quality 75%) preserves screen readability while cutting size significantly.

How do I reduce PDF size for email? Use a PDF compressor at standard or web-quality settings. Many email services cap attachments at around 20–25 MB; aiming for under 10 MB is safer.

Can I un-compress a compressed PDF? Losslessly compressed streams can be decompressed back to their exact contents, but detail discarded by lossy compression cannot be recovered. Always keep the original master.

Is JPEG or PNG better for PDF images? JPEG for photos (better compression for natural images). PNG or lossless for diagrams, screenshots, and images with sharp edges.

Why is the same PDF a different size in different software? PDF generation tools make different decisions about compression, font embedding, and image handling. The PDF specification allows many valid representations.

Related Tools

The PDF Compressor handles file size reduction at multiple quality levels. For other PDF operations, see the PDF Merger, PDF Splitter, and PDF Converter.

Final Thoughts

PDF compression is mostly image compression with a few smaller wins from fonts and text. Understanding what's inside your PDF, and what's actually taking up space, lets you make smart trade-offs between quality and size. For most everyday distribution, modest compression is invisible to the reader and dramatic for the file size. Save the original at full quality, distribute the compressed copy, and most of the workflow takes care of itself.