DocuLens: The AI Tool That Actually Understands What's in Your PDF

How to turn any document into structured JSON data in seconds without templates, regex, or coding.

The Problem with Traditional PDF Extraction

If you’ve ever tried to extract structured data from a batch of PDFs, you know the pain.

Most tools force you into one of two frustrating paths:

  1. Dumb Text Extraction: You get a wall of raw text with the tables and formatting stripped away, and you are left writing complex regular expressions to find what you need.
  2. Fragile Templates: You draw boxes on a screen and say, “The invoice number is always at coordinates X, Y.” The moment a vendor moves their logo or adds an extra line item, the template breaks entirely.

We faced this exact problem at Frism. To build our AI-first stock market research platform, we needed to read thousands of annual reports, financial filings, and regulatory disclosures daily. We couldn’t rely on brittle templates for documents that constantly change layouts.

So, we built our own solution. Today, we’re opening it up as a standalone product: DocuLens.


What is DocuLens?

DocuLens is a fully-managed SaaS platform that extracts structured data from documents using AI. Instead of drawing boxes or writing code, you describe what you want in plain English, and the AI figures out where it lives on the page.

You upload a PDF or image, define your schema, and within seconds you get clean, typed JSON. It is designed to handle scanned documents, handwritten forms, messy layouts, and multi-column tables, the kinds of input that often break traditional OCR pipelines.

It is a hosted SaaS application that you use from your browser. It is free to try and requires no setup other than a quick Google sign-in. For developers, it also offers API access.


How It Works: Step-by-Step

Getting started with DocuLens takes less than a minute. Here is the workflow:

1. Define Your Schema

Instead of templates, you define a list of fields, their types, and natural-language descriptions. For example, you might add:

  • invoice_number (string): “The unique identifier for the invoice”
  • total_amount (currency): “The total payable amount including all taxes”
  • due_date (date): “The date the payment is due”
  • line_items (list of objects): “The individual products or services billed”
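
To make that concrete, here is a rough sketch of how the four fields above could be written down as data. Treat it as an illustration only: the `Field` class, its attribute names, the sub-field types, and the JSON layout are assumptions made for this example, not DocuLens's actual schema format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Field:
    """Hypothetical schema field: a name, a type, and a plain-English description."""
    name: str
    type: str
    description: str
    # Sub-fields are only used for nested types such as a list of objects.
    fields: List["Field"] = field(default_factory=list)


invoice_schema = [
    Field("invoice_number", "string", "The unique identifier for the invoice"),
    Field("total_amount", "currency", "The total payable amount including all taxes"),
    Field("due_date", "date", "The date the payment is due"),
    Field(
        "line_items",
        "list<object>",
        "The individual products or services billed",
        # Sub-field names and types below are assumed for illustration.
        fields=[
            Field("product_name", "string", "Name of the product or service"),
            Field("quantity", "number", "Number of units billed"),
            Field("unit_price", "currency", "Price per unit"),
        ],
    ),
]

# Serialize the schema so it could be saved or sent along with a document.
schema_json = json.dumps([asdict(f) for f in invoice_schema], indent=2)
print(schema_json)
```

The point of the sketch is that the description does the work a template used to do: each field says what it means, not where it sits on the page.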

2. Upload Your Document

DocuLens supports a wide range of formats: PDFs, JPG, PNG, WebP, TIFF, HEIC, AVIF, GIF, and BMP. If you have a massive 200-page document, you can specify a page range to only extract from the pages that matter.

3. Choose Your Extraction Speed

We offer multiple speed modes depending on your needs:

  • Fast: Best for simple, clean documents.
  • Thinking: Balanced speed and reasoning.
  • Extreme: For highly dense, complex pages and intricate tables.
  • Auto: Let DocuLens decide the best approach.

4. Get Typed, Structured Results

Within seconds, you receive a clean JSON output. But it’s not just the raw value. Every extracted field comes with:

  • The Value: Correctly typed (e.g., currency comes back as a number, not a string with a symbol).
  • Confidence Level: High, medium, or low.
  • Source Page: Exactly which page the value was found on.
  • Reasoning: A brief explanation of why the AI chose that value.

If the AI can’t find a field, it explicitly tells you, rather than hallucinating a guess.
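
Here is a minimal sketch of how you might use that metadata downstream. The result shape (the `value`, `confidence`, `page`, and `reasoning` keys, and `None` for a field that was not found) is an assumption for illustration, and the sample values are made up; check the real output before relying on any of these names.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical extraction result with made-up sample values.
result: Dict[str, Dict[str, Any]] = {
    "invoice_number": {
        "value": "INV-1042", "confidence": "high", "page": 1,
        "reasoning": "Labelled 'Invoice No.' in the header.",
    },
    "total_amount": {
        "value": 1180.0, "confidence": "medium", "page": 2,
        "reasoning": "Last row of the totals table.",
    },
    "due_date": {
        "value": None, "confidence": "low", "page": None,
        "reasoning": "No due date found in the document.",
    },
}


def triage(fields: Dict[str, Dict[str, Any]]) -> Tuple[Dict[str, Any], List[str]]:
    """Accept high-confidence values; queue everything else for human review."""
    accepted: Dict[str, Any] = {}
    needs_review: List[str] = []
    for name, info in fields.items():
        if info["value"] is None or info["confidence"] != "high":
            needs_review.append(name)
        else:
            accepted[name] = info["value"]
    return accepted, needs_review


accepted, needs_review = triage(result)
print(accepted)
print(needs_review)
```

Routing on confidence like this lets a pipeline auto-accept the easy fields and send only the doubtful or missing ones to a person.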


Key Features Built for Real Workflows

DocuLens goes far beyond basic extraction:

  • Rich Type System: Fields aren’t just generic “text”. Declare them as currency, percentage, date, boolean, list, or even nested objects (like a list of line items).
  • Nested Objects (Tables): Need to reconstruct a table? Define a list<object> with sub-fields (product name, quantity, unit price) and watch the AI seamlessly rebuild the entire table structure.
  • Analyze Before You Extract (Dry Run): Have a massive PDF? Run an “Analyze” pass first. It gives you a page-by-page breakdown of content types (scanned, digital text, tables) and a credit cost estimate before you spend any credits.
  • Saved Fieldsets: Save your schema templates. Next time you process a similar document, just load your saved fieldset and go.
  • Developer API Keys: It’s not just a UI tool. DocuLens is a complete SaaS platform providing API keys for developers. You can bypass the web UI entirely and integrate the extraction engine directly into your own applications and automated pipelines.
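
For a feel of what an API integration might look like, here is a hedged sketch that builds an extraction request without sending it. Everything specific in it is an assumption: the endpoint URL is a placeholder, and the `Bearer` auth header, the `fields` and `mode` keys, and the JSON body layout are hypothetical. The document upload itself is left out because this post does not describe how files are sent.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholder endpoint: the real DocuLens API URL is not given in this post.
API_URL = "https://api.example.com/v1/extract"


def build_extract_request(
    api_key: str, fields: List[Dict[str, Any]], mode: str = "auto"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a schema and a speed mode."""
    body = json.dumps({"fields": fields, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


request = build_extract_request(
    api_key="YOUR_API_KEY",
    fields=[
        {
            "name": "invoice_number",
            "type": "string",
            "description": "The unique identifier for the invoice",
        }
    ],
)
print(request.get_method(), request.full_url)
# Sending it would be a call to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the integration easy to unit-test before it touches a real key or spends credits.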

What Can You Build With It?

If a human can read it, DocuLens can extract it. Common use cases include:

  • Invoice Processing: Pulling vendors, amounts, GST numbers, and line items.
  • Contract Review: Extracting parties involved, termination clauses, and payment terms.
  • Financial Filings: Reading revenue, EPS, and guidance numbers from dense annual reports.
  • Medical Forms: Capturing patient details, diagnosis codes, and prescriptions.
  • KYC Documents: Extracting ID numbers, dates of birth, and addresses from passports or IDs.

Try DocuLens Today

We stopped copying data out of PDFs manually, and you can too.

DocuLens is live now with a free tier and starting credits—no credit card required.

🌐 Start extracting with DocuLens

Practical Details:

  • Sign-in: Google OAuth only (takes 10 seconds)
  • Output: Typed JSON with confidence levels and source pages
  • Built By: The team at Frism