Murabau Kft. - AI Document Processing

Invoices and completion certificates processed in seconds

Automatic processing of construction industry documents with Mistral AI OCR. Upload, recognition, filing. What used to take hours is now done in minutes.

95%+ recognition accuracy
< 1 min processing time
0 manual data entry

The challenge

Murabau Kft. is a regional construction company specializing in the construction of residential buildings and industrial facilities. In a typical project, they work with 8-12 subcontractors simultaneously — electricians, tilers, painters, mechanical crews, crane operators, earthwork teams — each using their own invoicing system. On top of that come the suppliers (building material retailers, equipment rental companies, transport firms), who also send documents on a monthly basis. In total, the company handles 60-80 incoming documents per month: invoices, completion certificates, delivery notes, and partial invoice attachments.

The document formats are extremely varied. Some generate PDFs from Billingo, others use Szamlazz.hu, and there are subcontractors who send hand-filled, scanned invoices as email attachments. Completion certificates are even more unpredictable: they're often prepared in Word or by hand, their format differs by project and subcontractor, and they frequently include detailed itemized work logs.

Processing was previously entirely manual. The finance staff member would open the document, locate the data — issuer name, tax ID, line items, unit prices, total amount, payment deadline — then type it into an Excel spreadsheet. An average invoice took 5-10 minutes to process, but more complex completion certificates could take up to 15-20 minutes, because work phases, completion percentages, and previous partial invoices also needed to be reconciled.

The real problem wasn't just the time, but the errors. Mistyped tax IDs, misread amounts, documents filed in the wrong category, mixed-up partial invoice numbers. These errors later caused serious headaches in accounting: they surfaced during tax filings or a tax authority audit — at the worst possible moment. Construction industry documents are also particularly complex: specialized terminology, itemized work logs, a complicated hierarchy of partial and final invoices, and payment conditions tied to completion certification all increase the chance of error.

What is AI OCR?

OCR (Optical Character Recognition) is the technology of text recognition: the machine "reads" the document and converts it to text. Traditional OCR works well for simple documents, but frequently makes errors with construction industry invoices.

AI OCR goes a step further: it doesn't just recognize text, it understands context. It knows that the number after "Total" is the amount, the date after "Deadline" is the due date, and the name after "Issuer" is the company name. We used Mistral AI because it handles Hungarian-language documents exceptionally well.

Why this solution?

We evaluated three options together with Murabau, testing each on a sample set of 20 documents containing mixed-format invoices and completion certificates:

  1. Traditional OCR software: Cheap, but given the varied formats of construction documents, the uneven quality of scanned images, and Hungarian technical terminology, it achieved only 60-70% accuracy. Every third document had to be corrected manually, which barely reduced the workload. Not good enough.
  2. Major cloud OCR services (Google Document AI, AWS Textract): Technically accurate overall, but the per-document cost rose quickly at high volumes. Hungarian language support was also limited: these services performed particularly poorly on construction industry terms such as formwork, curing time, and reinforcement quantities, which fell short of what we needed.
  3. Mistral AI + n8n: The best value. Mistral handles Hungarian very well, and n8n makes the processing logic easy to customize. The Mistral model accepts PDFs natively, without prior image conversion, and outputs structured JSON that the n8n workflow can process immediately.

We chose the third option because it not only recognized the general fields (amount, date, partner) accurately, but we could also teach the system to recognize and categorize construction-specific line items such as concreting, formwork, insulation, and reinforcement. The Mistral API's per-document cost was also a fraction of that of the other options, which made the ROI clear.
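The case study does not say how accuracy was scored on the 20-document sample. One common approach is field-level exact match against hand-checked values; the sketch below assumes that approach, and the field names and sample values are hypothetical.

```python
from typing import Mapping, Sequence


def field_accuracy(
    expected: Sequence[Mapping[str, str]],
    extracted: Sequence[Mapping[str, str]],
) -> float:
    """Share of expected fields whose extracted value matches exactly."""
    if len(expected) != len(extracted):
        raise ValueError("expected and extracted must have the same length")
    total = 0
    correct = 0
    for truth, guess in zip(expected, extracted):
        for field, value in truth.items():
            total += 1
            # A missing field counts as a miss.
            if guess.get(field, "").strip() == value.strip():
                correct += 1
    return correct / total if total else 0.0


# Hypothetical two-document sample (the real test set had 20 documents).
expected = [
    {"tax_id": "12345678-2-13", "gross_total": "847200"},
    {"tax_id": "87654321-2-13", "gross_total": "456000"},
]
extracted = [
    {"tax_id": "12345678-2-13", "gross_total": "847200"},
    {"tax_id": "87654321-2-13", "gross_total": "458000"},  # one misread digit
]

print(f"{field_accuracy(expected, extracted):.0%}")  # prints 75%
```

Running the same scoring function over each candidate tool's output makes the three options directly comparable on the same sample.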

The solution in detail

1. Document upload: simple web interface

The user uploads documents through a clean, simple web form. The form contains just two elements: a file picker and an "Upload" button — nothing unnecessary, nothing to confuse the user. The system accepts PDFs, scanned images (JPG, PNG), and files copied directly from email attachments. Files don't need to be pre-renamed or converted; the system works with any filename.

Bulk upload is also possible: at month's end, you can drag and drop 20-30 documents at once, and the system processes them all sequentially and automatically. A progress indicator shows which document the system is currently working on, so the user can follow the status in real time. The entire upload process takes 10-15 seconds per document on the human side.

2. Mistral AI processing: context-based recognition

After upload, the document goes to the Mistral AI OCR model, which doesn't simply read characters — it understands the document's logical structure. The model visually analyzes the PDF, identifies the header, line item section, and summary block, then extracts the structured data:

  • The issuer's name, address, and tax ID — regardless of how they appear in the document
  • The invoice number, issue date, and fulfillment date
  • All line items with quantity, unit price, and net value
  • The total amount (net, gross, VAT) and the applicable VAT rate
  • The payment deadline and payment method (bank transfer, cash, compensation)
  • For completion certificates, the work phase, completion percentage, and reference partial invoice number

The model has also been prepared for construction industry terminology such as formwork, reinforcement, concreting, insulation, subfloor concrete, and waterproofing: it recognizes these terms accurately and assigns them to the appropriate cost category. The output is always structured JSON, so the next automation step can consume the data directly.
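To illustrate what "structured JSON" can look like in practice, here is a minimal sketch that parses one extracted invoice and runs two arithmetic sanity checks. The actual schema used in the project is not published, so every field name, the sample values, and the 1 HUF tolerance are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float
    net_value: float


@dataclass
class Invoice:
    issuer_name: str
    issuer_tax_id: str
    invoice_number: str
    net_total: float
    vat_rate: float
    gross_total: float
    line_items: list[LineItem]


def parse_invoice(raw: str) -> Invoice:
    """Parse the JSON text; a missing or unexpected key raises TypeError."""
    data = json.loads(raw)
    items = [LineItem(**item) for item in data.pop("line_items")]
    return Invoice(line_items=items, **data)


def check_totals(inv: Invoice, tolerance: float = 1.0) -> list[str]:
    """Return a list of arithmetic inconsistencies (empty if none)."""
    problems = []
    items_sum = sum(item.net_value for item in inv.line_items)
    if abs(items_sum - inv.net_total) > tolerance:
        problems.append("line items do not add up to net total")
    if abs(inv.net_total * (1 + inv.vat_rate) - inv.gross_total) > tolerance:
        problems.append("net total plus VAT does not match gross total")
    return problems


# Hypothetical model output with made-up values.
raw = """
{
  "issuer_name": "Example Kft.",
  "issuer_tax_id": "12345678-2-13",
  "invoice_number": "EX-2026/0001",
  "net_total": 100000,
  "vat_rate": 0.27,
  "gross_total": 127000,
  "line_items": [
    {"description": "formwork", "quantity": 10, "unit_price": 6000, "net_value": 60000},
    {"description": "reinforcement", "quantity": 4, "unit_price": 10000, "net_value": 40000}
  ]
}
"""

invoice = parse_invoice(raw)
print(check_totals(invoice))  # prints []
```

Checks like these catch a misread digit even when the model reports high confidence, because the totals stop adding up.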

3. Automatic filing: everything falls into place

The extracted JSON data automatically enters the Airtable registry through the n8n workflow, where the system performs several logical steps. First, it categorizes the document by type: material invoice, subcontractor fee, equipment rental fee, shipping cost, or completion certificate. Then, based on the issuer's tax ID, it identifies the partner — if documents have been received from them before, it's automatically assigned; if it's a new partner, a record is created.
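The partner-matching step described above can be sketched as a find-or-create lookup keyed on the tax ID. This is an illustration only: a plain dictionary stands in for the Airtable registry, and the function name and record shape are hypothetical rather than the project's actual workflow.

```python
def find_or_create_partner(
    registry: dict[str, dict], tax_id: str, name: str
) -> tuple[dict, bool]:
    """Return (partner_record, created) for the given tax ID."""
    # Normalize so that stray spaces from OCR do not create duplicates.
    key = tax_id.replace(" ", "").strip()
    if key in registry:
        return registry[key], False
    registry[key] = {"tax_id": key, "name": name}
    return registry[key], True


# In-memory stand-in for the partner table.
registry: dict[str, dict] = {}

first, created_first = find_or_create_partner(registry, "12345678-2-13", "Example Kft.")
again, created_again = find_or_create_partner(registry, "12345678 -2-13", "Example Kft.")

print(created_first, created_again, len(registry))  # prints True False 1
```

Keying on the tax ID rather than the company name avoids duplicates when the same issuer's name is spelled differently across documents.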

The system also analyzes the invoice line items and assigns costs to the appropriate project. If the system is uncertain about a piece of data — for example, it recognized the amount or tax ID with less than 80% confidence — it highlights it in yellow for review. The user then sees the flagged fields in a review view and approves or corrects the data with a single click. Experience shows this occurs in only about 5% of cases, and typically with scanned, lower-quality documents. Filed documents are immediately searchable by partner, amount, date, or project name.
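The review rule above amounts to a threshold check per field. The sketch below assumes each extracted field carries a confidence score between 0 and 1; how those scores are produced in the real system is not described, so the data shape and function names are hypothetical.

```python
REVIEW_THRESHOLD = 0.80  # fields below 80% confidence go to manual review


def fields_to_review(
    fields: dict[str, dict], threshold: float = REVIEW_THRESHOLD
) -> list[str]:
    """Names of fields whose confidence is below the threshold."""
    return [name for name, f in fields.items() if f["confidence"] < threshold]


def document_status(fields: dict[str, dict]) -> str:
    """'Review' if any field is flagged, otherwise 'Processed'."""
    return "Review" if fields_to_review(fields) else "Processed"


# Hypothetical extraction result for one scanned invoice.
fields = {
    "gross_total": {"value": "456000", "confidence": 0.97},
    "tax_id": {"value": "12345678-2-13", "confidence": 0.72},
}

print(fields_to_review(fields))  # prints ['tax_id']
print(document_status(fields))   # prints Review
```

Flagging individual fields rather than whole documents keeps the review step short: the user only confirms the values the system was unsure about.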

Before and after

Before
  • 5-10 minutes of manual data entry per document
  • Typos, wrong categories — 4-5 errors per month
  • Month-end peak: a full day of administration
  • Only one colleague knew the system
  • Paper-based filing, difficult retrieval
After
  • < 1 minute automatic processing
  • 95%+ accuracy, machine consistency
  • Month-end peak: 30 minutes of supervised processing
  • Anyone can use it — upload and done
  • Digital registry, instant search
murabau-docs.makeden.hu

Document upload and processing

Document | Type | Issuer | Amount | Status
INV-2026/0847 | Material invoice | Tuzep Kft. | 847,200 HUF | ✅ Processed
CC-2026/023 | Completion cert. | Beton Mix Kft. | 2,340,000 HUF | ✅ Processed
INV-2026/0848 | Subcontractor | Villszer Bt. | 456,000 HUF | ⚠️ Review
INV-2026/0849 | Rental fee | Daru Rent Kft. | 180,000 HUF | ✅ Processed

The yellow marking indicates the system is uncertain about a line item in the subcontractor invoice — it can be reviewed and approved with a single click.

Results in numbers

Metric | Before | After
Processing time/document | 5-10 min | < 1 min
Data entry errors/month | 4-5 | < 1 (with machine flagging)
Month-end administration | ~8 hours | ~30 min
Retrieval time | 5-15 min (folders) | < 10 seconds (search)
Recognition accuracy | - | 95%+

"We used to enter invoices by hand. Now I upload the PDF and it's done. Incredible how much simpler it's become."

How to apply this in your business

Not just for construction companies

AI OCR technology is applicable in any industry where documents need to be processed regularly. The key isn't the industry, but the pattern: if you repeatedly need to extract and record data from similarly structured documents, AI OCR almost certainly pays for itself.

Some concrete examples where our clients are already applying or planning this:

  • Accounting firms: Automatic processing and loading of incoming invoices into accounting systems — especially efficient when you handle invoices from multiple clients who all use different invoicing software
  • Logistics companies: Processing of waybills, CMRs, customs declarations — the multilingual nature of shipping documents is a particularly good example of where AI OCR outperforms traditional solutions
  • Healthcare: Digitizing and organizing lab results, discharge summaries — here accuracy is critical, and the AI flagging system ensures no data slips through unchecked
  • Retail: Automatic filing of supplier invoices, delivery notes — for webshops and wholesalers, the daily document count can easily reach 20-30

The key: it doesn't need to be perfect. If the system is accurate 95% of the time and flags the remaining 5% for review, you've already saved enormous time. The most important consideration is to take the routine data entry off your team's shoulders so your staff can spend their time on their actual work — whether that's construction management, client relations, or financial analysis.

How to get started? As a first step, gather a month's worth of documents — invoices, certificates, anything you record by hand — and count how many there are and how much time you spend on them. If you're above 10 documents per month, the ROI is practically guaranteed. If you're above 50 per month, the ROI shows up in the very first month.
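As a rough way to run that count, the sketch below converts a monthly document volume and a per-document handling time into hours. The inputs are the Murabau figures quoted earlier (60-80 documents, 5-10 minutes each, under 1 minute after automation); the function name is hypothetical, and the result is only an estimate of data entry time, not a full ROI calculation.

```python
def monthly_hours(docs_per_month: int, minutes_per_doc: float) -> float:
    """Hours per month spent on documents at the given handling time."""
    return docs_per_month * minutes_per_doc / 60


# Manual processing: low and high end of the quoted ranges.
manual_low = monthly_hours(60, 5)
manual_high = monthly_hours(80, 10)

# Automated processing: upper bound, assuming a full minute per document.
automated_max = monthly_hours(80, 1)

print(f"manual: {manual_low:.1f} to {manual_high:.1f} hours")  # manual: 5.0 to 13.3 hours
print(f"automated: at most {automated_max:.1f} hours")          # automated: at most 1.3 hours
```

Plugging in your own counts gives a first estimate of the time at stake before any tooling decision is made.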

If you face similar document processing challenges, book a free consultation — we'll show you what ROI to expect based on your specific case.

Tech stack

Tool | Role
n8n | Workflow automation, the engine of the processing pipeline
Mistral AI | OCR and document interpretation, with Hungarian language support
Airtable | Central registry and document database
Cloudflare | Web upload interface hosting