# Certificate Scanning Service
A proof-of-concept FastAPI microservice that uses AI vision models (via OpenRouter) to extract structured data from certificate documents — without OCR.
## Why not OCR?
Traditional OCR produces erratic results on documents with tabular layouts, merged cells, and complex formatting — exactly the kind of structure found in industrial certificates. This service takes a different approach:
- Find — An image-generation model receives a full page scan and redraws it with a blue rectangle painted over the target section. The rectangle is detected with HSV colour thresholding to obtain pixel coordinates.
- Read — The page is cropped to those coordinates and sent to a lightweight multimodal model that extracts key/value data as structured JSON.
Because the finder and reader are separate steps, temporal coherence can be exploited: when the document layout doesn't change between batches, the expensive finder step can be skipped entirely and previous scan coordinates reused. The reader step alone is fast and cheap, especially when the input crop is small and clear.
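The blue-rectangle detection in the find step can be sketched with Pillow and NumPy. This is a minimal illustration, not the service's actual code: the function name and the HSV thresholds are assumptions (pure blue has hue ≈ 170 in Pillow's 0–255 HSV encoding).

```python
import numpy as np
from PIL import Image


def find_blue_rectangle(img: Image.Image) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) pixel bounds of the blue region.

    Thresholds are illustrative: hue 150-190 brackets pure blue (~170 in
    Pillow's 0-255 HSV scale); the saturation/value floors reject white
    paper and faint artifacts.
    """
    hsv = np.asarray(img.convert("HSV"))
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mask = (h > 150) & (h < 190) & (s > 100) & (v > 100)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("no blue rectangle found")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A contour-based detector (e.g. OpenCV) would be more robust to partially drawn rectangles; the bounding-box-of-mask approach above is simply the shortest correct version of HSV thresholding.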
## Architecture
```
PDF upload
     │
     ▼
pdf_to_image ──► page PNGs (at configured DPI)
     │
     ▼
┌────────────────────────────────────────────────┐
│ For each reading zone defined in the           │
│ document class:                                │
│                                                │
│   1. Letterbox page to nearest supported       │
│      aspect ratio (black bars, no distort)     │
│   2. Send to FINDER model (image → image)      │
│   3. Detect blue rectangle (HSV threshold)     │
│   4. Map coordinates back to original space    │
│   5. Crop page to found region                 │
│   6. Send crop to READER model (image → JSON)  │
│   7. Validate output with dynamic Pydantic     │
│      model built from the zone definition      │
└────────────────────────────────────────────────┘
     │
     ▼
Structured JSON response
```
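Steps 1 and 4 of the pipeline — letterboxing without distortion, then undoing the padding when mapping finder coordinates back — can be sketched as follows. The function names and signatures are assumptions for illustration, not the actual helpers in `app/pipeline/util.py`:

```python
from PIL import Image


def letterbox(page: Image.Image, target_ratio: float) -> tuple[Image.Image, int, int]:
    """Pad `page` with black bars to `target_ratio` (width/height).

    Returns the padded image plus the (x, y) offset of the original page
    inside it, which is needed later to map coordinates back.
    """
    w, h = page.size
    if w / h < target_ratio:            # too tall -> pad left/right
        new_w, new_h = round(h * target_ratio), h
    else:                               # too wide -> pad top/bottom
        new_w, new_h = w, round(w / target_ratio)
    canvas = Image.new("RGB", (new_w, new_h), "black")
    off_x, off_y = (new_w - w) // 2, (new_h - h) // 2
    canvas.paste(page, (off_x, off_y))
    return canvas, off_x, off_y


def map_back(box, off_x, off_y, scale=1.0):
    """Undo any upscale factor, then the letterbox offset, on a finder box."""
    l, t, r, b = box
    return (round(l / scale) - off_x, round(t / scale) - off_y,
            round(r / scale) - off_x, round(b / scale) - off_y)
```

Centering the page inside the canvas keeps the offset arithmetic symmetric; padding on one side only would work equally well as long as `map_back` uses the matching offsets.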
## Project structure
```
app/
  main.py                FastAPI app, endpoints, startup seeding
  config.py              Settings loaded from config.json + .env
  models.py              Pydantic models (API schemas, document class definition)
  database.py            SQLAlchemy async engine + ORM tables
  pipeline/
    scan.py              Scanning pipeline (finder + reader orchestration)
    pdf_to_image.py      PDF → PIL images via pdf2image
    util.py              Aspect ratio / resolution helpers
config.json              App configuration (models, DPI, upscale factors)
document_classes.json    Seed data defining document types and reading zones
run.py                   Uvicorn entrypoint
```
## Setup
```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Create a `.env` file with your OpenRouter API key:

```
OPENROUTER_API_KEY=sk-or-...
```
## Configuration
`config.json` controls the AI models and image-processing parameters:
| Key | Description |
|---|---|
| `finder_model` | OpenRouter model for the finder step (image → image) |
| `reader_model` | OpenRouter model for the reader step (image → JSON) |
| `finder_upscale_factor` | Scale factor applied to images before sending to finder |
| `reader_upscale_factor` | Scale factor applied to crops before sending to reader |
| `default_dpi` | Resolution for PDF-to-image conversion |
| `db_path` | Path to the SQLite database file |
| `debug_output_dir` | Directory for debug image output |
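For reference, a `config.json` covering the keys above might look like the following. All values here are illustrative placeholders, not the project's shipped defaults:

```json
{
  "finder_model": "<openrouter-image-model-id>",
  "reader_model": "<openrouter-vision-model-id>",
  "finder_upscale_factor": 1.0,
  "reader_upscale_factor": 2.0,
  "default_dpi": 200,
  "db_path": "scans.db",
  "debug_output_dir": "debug"
}
```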
## Running

```shell
python run.py
```
The server starts on `http://127.0.0.1:8000`. On startup it creates the SQLite database and seeds document classes from `document_classes.json`.
## API
### GET /health

Returns `{"status": "ok"}`.
### GET /document-classes
Lists all registered document classes and their reading zone definitions.
### POST /read?document-class=&lt;name&gt;&amp;debug=&lt;bool&gt;
Upload a PDF and scan it against a document class.
- `document-class` (required) — name of a registered document class (e.g. `arcelor-gent`)
- `debug` (optional, default `false`) — when `true`, saves intermediate images (letterboxed input, finder output, reader crop) to `debug/scan-<id>/`
- Body — multipart file upload (`file` field)
Returns structured JSON with extracted data per page and zone.
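Purely as an illustration of the shape — the actual field names come from the Pydantic schemas in `app/models.py` and are not reproduced here — a response might look like:

```json
{
  "pages": [
    {
      "page": 1,
      "zones": {
        "<zone-name>": {
          "<read-line-key>": "extracted value"
        }
      }
    }
  ]
}
```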
## Document classes
A document class defines what to look for and read on each page. See `document_classes.json` for the seed format. Each class contains:
- `reading_zones` — sections to locate on the page, each with:
  - `finder_prompt` — natural language description of the target section
  - `read_lines` — keys to extract, each with a `prompt_snippet` describing how to find the value and an expected `type`
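Step 7 of the pipeline — the "dynamic Pydantic model built from the zone definition" — can be sketched with `pydantic.create_model`. The type mapping and the exact zone dictionary shape below are assumptions inferred from the description above, not the service's actual schema:

```python
from pydantic import create_model

# Assumed mapping from a read_line "type" string to a Python type.
TYPE_MAP = {"string": str, "number": float, "integer": int, "date": str}


def model_for_zone(zone: dict):
    """Build a validation model with one required field per read_line key."""
    fields = {
        key: (TYPE_MAP.get(line.get("type", "string"), str), ...)
        for key, line in zone["read_lines"].items()
    }
    return create_model(f"Zone_{zone['name']}", **fields)
```

Validating the reader model's JSON against such a model rejects missing keys and coerces values (e.g. a `"12.5"` string into a `float`), so malformed extractions fail loudly instead of propagating downstream.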
## Roadmap
This is a proof of concept. Planned improvements:
- Temporal coherence — skip the finder when layout hasn't changed; re-find only when reader confidence drops
- Confidence scoring — track reading reliability across scans
- Batch processing — handle multi-document uploads
- Pluggable models — easy switching between AI providers and models