How SanctumPDF Processes Your PDFs Without Ever Seeing Them
12 March 2026 · Lars Holmström · 6 min read
Every day, millions of people upload sensitive documents to free online PDF tools. Bank statements. Tax returns. Contracts. Medical records. The files travel to an unknown server, get processed by unknown code, and maybe get deleted afterwards. Maybe.
We built SanctumPDF because we thought there was a better way.
The problem with “free” PDF tools
Most online PDF tools follow the same architecture: you upload your file, their server processes it, and you download the result. It's simple, and it works. But it means a copy of your document exists on someone else's computer, even if only temporarily.
For a party invitation or a university assignment, that's probably fine. For a bank statement with your account number, transaction history, and spending habits? That's a different calculation entirely.
The usual response from these tools is a privacy policy promising they delete your files after processing. And most of them probably do. But a promise isn't a guarantee — and if a server is breached, your deleted file might not be as deleted as you thought.
We wanted to build something where the privacy guarantee isn't a policy. It's the architecture.
How SanctumPDF works
When you use a SanctumPDF tool — compress, merge, split, convert, fill a form — your PDF file never leaves your browser. Here's what actually happens:
1. You select your file. The file is loaded into your browser's memory. At this point, it's no different from opening a file in any desktop application. The data exists on your device and nowhere else.
2. WebAssembly does the heavy lifting. We use a combination of open-source libraries, either written in JavaScript or compiled to WebAssembly (Wasm), that run directly in your browser tab. These are the same kinds of engines that power desktop PDF software, but instead of running on your computer as installed software, they run inside your browser's sandboxed environment.
Our core engine stack includes pdf-lib for document manipulation (merge, split, form filling), pdf.js (Mozilla's PDF renderer, the same one that powers Firefox's built-in viewer) for previews and image conversion, and QPDF for stream compression and optimisation. All of these are permissively licensed: pdf-lib under MIT, pdf.js and QPDF under Apache 2.0.
3. Processing happens in a Web Worker. To keep the interface responsive while crunching your PDF, we offload heavy operations to a separate thread using Web Workers. You see a progress bar; your browser stays responsive.
4. You download the result. The processed file is generated entirely in your browser and saved to your device. At no point does any data leave your computer.
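Steps 2 to 4 can be pictured as a small message protocol between the page and a Web Worker. What follows is a minimal sketch under stated assumptions, not our production code: the `Job` and `Reply` shapes, `runJob`, and the toy `copyEngine` are hypothetical names invented for illustration. In a real worker, `post` would be `self.postMessage` and the engine would call into the PDF libraries.

```typescript
// Hypothetical message shapes exchanged between the page and the worker.
type Job = { id: number; op: "compress" | "merge" | "split"; bytes: Uint8Array };

type Reply =
  | { id: number; kind: "progress"; percent: number }
  | { id: number; kind: "done"; bytes: Uint8Array }
  | { id: number; kind: "error"; message: string };

// Stand-in for the PDF engine: takes a job plus a progress callback.
type Engine = (
  job: Job,
  onProgress: (percent: number) => void
) => Promise<Uint8Array>;

// Runs one job and reports progress, the result, or a failure through `post`.
// Inside a real Web Worker, `post` would be `self.postMessage`.
async function runJob(
  job: Job,
  engine: Engine,
  post: (reply: Reply) => void
): Promise<void> {
  try {
    const result = await engine(job, (percent) =>
      post({ id: job.id, kind: "progress", percent })
    );
    post({ id: job.id, kind: "done", bytes: result });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    post({ id: job.id, kind: "error", message });
  }
}

// Toy engine used only to exercise the protocol: it copies the input bytes.
const copyEngine: Engine = async (job, onProgress) => {
  onProgress(50);
  const out = job.bytes.slice();
  onProgress(100);
  return out;
};
```

The page side would drive its progress bar from the `progress` replies and build the download from the `done` reply. Passing the underlying `ArrayBuffer` in the transfer list of `postMessage` avoids copying large files between threads.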
What about the Bank Statement Converter?
This is the one feature where we need to be transparent about a nuance.
Our Bank Statement Converter uses AI to extract transaction data from PDF bank statements and structure it into CSV, Excel, JSON, or OFX format. AI models are computationally expensive — too expensive to run in a browser tab.
So here's how we handle it:
Step 1 (in your browser): We parse your PDF entirely client-side using pdf.js. The text content — dates, descriptions, amounts — is extracted as plain text. The PDF file itself, including its images, fonts, metadata, and binary structure, stays in your browser.
Step 2 (server-side): Only the extracted text is sent to our AI service for structured parsing. The AI reads the text, identifies the table structure, and returns a clean JSON array of transactions.
Step 3 (back in your browser): The structured data is returned to your browser where you can review it in an interactive table, edit any values, toggle columns on or off, and export in your preferred format.
The PDF file is never uploaded. The extracted text is not stored after processing. We're transparent about this because we think you deserve to know exactly what happens with your data — especially your financial data.
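To make the split between browser and server concrete, here is a hedged sketch of the first and last steps. It assumes text items shaped like the ones pdf.js returns from `page.getTextContent()` (the `str` and `hasEOL` fields are real; everything else is omitted). The `ExtractionRequest` and `Transaction` shapes and the function names are hypothetical and only illustrate the idea that the request carries text and nothing else.

```typescript
// Minimal structural view of a pdf.js text item. Only the two fields
// used here are declared; real items carry more (font, transform, ...).
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}

// Joins one page's items into plain text, breaking lines where pdf.js
// marks an end of line.
function pageToText(items: TextItemLike[]): string {
  let text = "";
  for (const item of items) {
    text += item.str;
    if (item.hasEOL) text += "\n";
  }
  return text.trim();
}

// Hypothetical request body: extracted text only, no file bytes.
interface ExtractionRequest {
  pages: string[];
}

function buildRequest(pages: TextItemLike[][]): ExtractionRequest {
  return { pages: pages.map(pageToText) };
}

// Hypothetical shape of one transaction returned by the AI service.
interface Transaction {
  date: string;
  description: string;
  amount: number;
}

// Validates the service response before it reaches the review table.
function parseTransactions(json: string): Transaction[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("expected a JSON array");
  return data.map((row: unknown, i: number) => {
    if (typeof row !== "object" || row === null) {
      throw new Error(`row ${i} is not an object`);
    }
    const r = row as Partial<Transaction>;
    if (
      typeof r.date !== "string" ||
      typeof r.description !== "string" ||
      typeof r.amount !== "number"
    ) {
      throw new Error(`row ${i} is not a valid transaction`);
    }
    return { date: r.date, description: r.description, amount: r.amount };
  });
}
```

In the browser, the result of `buildRequest` would be serialised with `JSON.stringify` and sent with `fetch`, and the response body would go through something like `parseTransactions` before being shown for review.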
Why not just use Ghostscript?
If you've worked with PDF processing before, you might wonder why we didn't just use Ghostscript, the industry-standard PDF engine. We started down that path, and it taught us an important lesson about open-source licensing.
Ghostscript is dual-licensed: AGPL-3.0 or commercial. Under the AGPL, if you distribute the software or let users interact with it over a network, you must release the source code of the application that incorporates it under the AGPL as well. For a commercial SaaS product, that means either open-sourcing everything or purchasing a commercial licence from Artifex.
We chose a different path. Our stack is built entirely on permissively licensed libraries. pdf-lib is MIT. pdf.js is Apache 2.0. QPDF is Apache 2.0. We use the browser's native Canvas API for image compression — no licence needed for that. The result is a stack with zero AGPL dependencies, zero licence fees, and a total compressed engine size of roughly 580KB.
The compression pipeline
Since people often ask how client-side PDF compression works without Ghostscript, here's a brief look at our approach.
The largest objects in most PDFs are embedded images. A “10MB PDF” is often 500KB of text and layout data plus 9.5MB of high-resolution images.
We take the same approach as Ghostscript, downsampling and recompressing those images, but using the browser's native capabilities:
- pdf-lib parses the PDF and identifies all embedded images
- Each image is drawn onto an OffscreenCanvas at a reduced resolution (configurable via our Light, Medium, and Heavy presets)
- The Canvas API recompresses the image as JPEG at the target quality level
- pdf-lib replaces the original image data with the compressed version
- QPDF performs a final pass of stream recompression and object deduplication
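The downscaling decision in the second and third steps above can be sketched as a pure function. This is an illustration only: the preset numbers below are made up (the presets are named in this post, their settings are not), and `targetFor` and `pickSmaller` are hypothetical helpers. The guard that keeps the original when recompression does not help is also an assumption, included because re-encoding an already small JPEG can make it larger.

```typescript
type Preset = "light" | "medium" | "heavy";

// Hypothetical settings: longest edge in pixels and JPEG quality (0 to 1).
const PRESETS: Record<Preset, { maxEdge: number; quality: number }> = {
  light: { maxEdge: 2400, quality: 0.85 },
  medium: { maxEdge: 1600, quality: 0.7 },
  heavy: { maxEdge: 1000, quality: 0.5 },
};

interface Target {
  width: number;
  height: number;
  quality: number;
}

// Scales an image so its longest edge fits the preset while keeping the
// aspect ratio. Images already within the limit are never upscaled.
function targetFor(width: number, height: number, preset: Preset): Target {
  const { maxEdge, quality } = PRESETS[preset];
  const scale = Math.min(1, maxEdge / Math.max(width, height));
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
    quality,
  };
}

// Assumed guard: keep the original bytes unless the recompressed
// version is actually smaller.
function pickSmaller(original: Uint8Array, recompressed: Uint8Array): Uint8Array {
  return recompressed.length < original.length ? recompressed : original;
}
```

In the browser, the resulting `width` and `height` would size a `new OffscreenCanvas(width, height)`, the image would be drawn with `drawImage`, and `convertToBlob({ type: "image/jpeg", quality })` would produce the recompressed bytes.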
The result is typically a 50–80% file size reduction for image-heavy PDFs — comparable to Ghostscript's output for the documents most people need to compress.
14 languages from day one
SanctumPDF launches with support for 14 languages: English, Spanish, Portuguese, French, German, Japanese, Norwegian, Swedish, Danish, Italian, Dutch, Hindi, Bahasa Indonesia, and Turkish.
We used next-intl with Next.js App Router for internationalisation, with subpath routing so each language gets its own URL structure for SEO. Every tool page, every description, every button — all translated and indexable.
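As a rough illustration of subpath routing, the sketch below builds one URL per language for a given page, which is the shape needed for hreflang alternate links. The locale codes, the `compress-pdf` style slug, and the helper names are assumptions made for this example; only the list of languages and the domain come from this post.

```typescript
// Assumed locale codes for the 14 launch languages.
const LOCALES = [
  "en", "es", "pt", "fr", "de", "ja", "no",
  "sv", "da", "it", "nl", "hi", "id", "tr",
] as const;

type Locale = (typeof LOCALES)[number];

const SITE = "https://sanctumpdf.com";

// Builds the subpath URL for one locale, e.g. /de/compress-pdf.
// Leading and trailing slashes on `path` are ignored.
function localizedUrl(locale: Locale, path: string): string {
  const clean = path.replace(/^\/+|\/+$/g, "");
  return clean ? `${SITE}/${locale}/${clean}` : `${SITE}/${locale}`;
}

// Maps every locale to its URL for the same page.
function alternates(path: string): Record<Locale, string> {
  const out = {} as Record<Locale, string>;
  for (const locale of LOCALES) {
    out[locale] = localizedUrl(locale, path);
  }
  return out;
}
```

A map like the one returned by `alternates` is what a page would expose as its language alternates, so search engines can index each translation under its own URL.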
Why bother with 14 languages for a launch? Because most of our competitors don't. The Wasm-first PDF tools that share our privacy-focused approach are almost all English-only. That's a lot of organic search traffic left on the table.
What's next
SanctumPDF is launching with 12 tools: Compress, Merge, Split, PDF to Image, Fill Forms, Bank Statement Converter, Rotate, Protect, Unlock, Page Numbers, Reorder Pages, and Watermark.
We're working on OCR (client-side text extraction from scanned documents), PDF-to-Word conversion, and an AI chat feature that lets you ask questions about your PDF documents.
If you deal with bank statements regularly — whether you're a bookkeeper, accountant, or just someone who wants their transaction data in a spreadsheet — we'd love for you to try the Bank Statement Converter. It's free for one extraction per day, and we think it's the fastest and most private way to get your financial data out of PDF and into a format you can actually work with.
And if you just need to compress a PDF before emailing it, we've got you covered there too. No upload required.
SanctumPDF is built and operated by Lars Holmström from Melbourne, Australia. Try it free at sanctumpdf.com.
Have questions about our architecture, privacy approach, or anything else? Get in touch at hello@sanctumpdf.com.