How AI document extraction works in BrokerTools
Insurance documents arrive in every format imaginable: Excel spreadsheets with merged cells and hidden rows, PDFs with tables that do not line up, CSVs with inconsistent date formats, scanned policy wordings, and plain-text emails with data pasted in.
Every BrokerTools app — Blotter, Runner, Bordereaux, and Specimen — uses the same approach: traditional document parsing combined with AI-powered extraction. The first step is parsing: Excel files are read with their full structure (sheets, merged cells, formulas), PDFs are converted to text (or OCR for scanned documents), and CSVs are normalised.
The parsed content is then sent to Claude on AWS Bedrock for extraction. Depending on the app, the AI identifies claims data, submission fields, bordereaux columns, or policy wording sections — even when headers are non-standard or data is spread across multiple tables on the same sheet.
Each extraction includes a confidence assessment. The AI flags cases where it is uncertain — ambiguous date formats (is 03/04/2025 March 4th or April 3rd?), unclear currency, or duplicate references that might represent the same record reported differently.
We take the compliance requirements of London market brokers seriously. All processing happens in real-time via encrypted API calls to AWS Bedrock in the EU (Frankfurt). Documents are never stored — they are processed in memory and discarded. Bedrock does not log prompts or outputs, and your data is never used for model training. No document content leaves the EU.
The cost of each extraction depends on the document size and complexity, typically ranging from 2p for a simple spreadsheet to £1 for a large scanned PDF. You see the exact cost before and after each operation.