Troubleshooting
Common problems and how to fix them — failed uploads, missing pages, OCR errors, and slow processing.
What if Handwriting OCR misidentifies the language of my document?
Handwriting OCR auto-detects the language of each document. Most of the time this works invisibly, but a few situations can throw it off:
- Mixed-language pages — a French document with English place names, or notes that switch between two languages mid-page, can pull the detector toward the wrong language.
- Very short content — single words or fragments don't give the detector enough to work with.
- Languages that share characters — Spanish and Portuguese, or Norwegian and Danish, can be confused on short passages.
- Stylised cursive — older or unusual handwriting styles can register as a different language entirely.
What to do
- Set the expected language when uploading. Both the dashboard upload flow and the API support specifying a language explicitly — when set, this overrides auto-detection.
- Split mixed-language documents by language section if you can — separate uploads give cleaner results than mixed pages.
- Re-upload with a clearer scan if the source quality is borderline; better contrast helps the detector.
If you're consistently seeing misidentification on a specific document type, send us a sample and we can take a look.
Why am I getting a blank or empty output file?
Blank outputs occur when a scan is extremely faint or contains no legible text, or when the PDF uses an unsupported encoding with no extractable image layer. Re-exporting the PDF as images or rescanning the page usually resolves the issue.
Why am I getting an "invalid field name" error in my custom extractor?
Custom extractor field names must use letters, numbers, and underscores only — anything else triggers a validation error.
Common causes
- Spaces — "Customer Name" fails; use customer_name or CustomerName.
- Hyphens or punctuation — "date-of-birth" and "date.of.birth" both fail; use date_of_birth.
- Starting with a number — some integrations require the first character to be a letter; if so, prefix with a word like field_1 rather than 1.
- Non-ASCII characters — accented letters and other Unicode characters are rejected.
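The naming rule above can be checked locally before you save an extractor. The following is a minimal sketch, assuming the rule is exactly "ASCII letters, digits, and underscores"; the function names are hypothetical and are not part of the Handwriting OCR product or API.

```python
import re

# Assumption: valid names contain only ASCII letters, digits, and underscores.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")
# Stricter variant for integrations that require the first character to be a letter.
FIELD_NAME_LEADING_LETTER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_field_name(name: str, require_leading_letter: bool = False) -> bool:
    """Return True when the whole name matches the allowed character set."""
    pattern = FIELD_NAME_LEADING_LETTER if require_leading_letter else FIELD_NAME
    return pattern.fullmatch(name) is not None


def suggest_field_name(label: str) -> str:
    """Turn a human-readable label into a candidate field name (hypothetical helper)."""
    # Replace each run of disallowed characters with a single underscore,
    # then trim underscores from the ends and lowercase the result.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").lower()
    if not cleaned:
        return "field"
    # Prefix names that would otherwise start with a digit.
    if cleaned[0].isdigit():
        return f"field_{cleaned}"
    return cleaned
```

Note that suggest_field_name drops accented letters rather than transliterating them, so review its suggestions before using them.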
Fixing it
Open the extractor, rename the offending field, and save. The change is non-destructive — past extractions are unaffected, but future runs will use the new name.
If you need a human-readable label in your downstream system, keep the technical field name simple (customer_name) and map it to the display name in your own application.
Why are some pages in my PDF not showing in the output?
This usually happens when certain PDF pages are corrupted, contain incompatible encodings, or store content as non-image objects. Re-exporting the PDF from your scanner or splitting it into smaller files typically resolves the issue.
Why did my document fail to upload?
Upload failures usually occur when the file exceeds the 20 MB size limit, is corrupted, or is saved in an unsupported format. Re-exporting the PDF or image, reducing resolution slightly, or splitting very large documents normally resolves the issue. If uploads consistently fail, try a different browser or test a smaller representative sample.
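If you upload many files, a quick local size check can catch oversized files before they fail. This is a rough sketch that assumes "20 MB" means 20 × 1024 × 1024 bytes; the service may count megabytes differently, so treat files close to the limit with caution. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the 20 MB limit is counted in units of 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def exceeds_upload_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when a file of this size is over the limit."""
    return size_bytes > limit_bytes


def describe_upload(path: str) -> str:
    """Report a local file's size against the limit before uploading it."""
    size_bytes = Path(path).stat().st_size
    size_mb = size_bytes / (1024 * 1024)
    if exceeds_upload_limit(size_bytes):
        return f"{path}: {size_mb:.1f} MB, over the limit; split or re-export it"
    return f"{path}: {size_mb:.1f} MB, within the limit"
```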
Why did the system process my pages out of order?
The system preserves the original page order exactly as provided. If the output appears out of order, the source PDF was likely misarranged or contained hidden blank pages. Re-exporting or reordering pages in a PDF editor typically fixes the issue.
Why do I get poor results on handwritten forms?
Forms often mix printed text, boxes, labels, and handwritten fields, which can confuse standard OCR. Inconsistent handwriting across different fields also affects accuracy. For forms, Custom Extractors provide far more predictable results by mapping each field explicitly.
Why do my results show missing or incomplete text?
Missing text almost always comes from low-quality scans, faint ink, shadows, skewed pages, or handwriting that is very difficult to interpret. Mixed languages, overlapping text, and pencil markings can also reduce detection. Try rescanning at 300 DPI with strong contrast, or upload a better-quality sample using your trial credits to confirm expected performance.
Why does my browser freeze or behave oddly when uploading files?
Some browsers—especially Safari—struggle with large uploads or multi-file selections. If you encounter freezing or unexpected behavior, switch to Chrome, Firefox, or Edge and try again. This resolves most upload interface issues.
Why does table extraction sometimes fail to detect a table?
Hand-drawn borders, uneven grid lines, faint pencil marks, shadows, and inconsistent cell shapes can prevent the system from detecting a table. Higher-resolution scans with clearer contrast usually help. You can test a clean sample using your trial credits to check performance on your layout.
Why does table extraction sometimes miss rows or columns?
Table extraction works best when tables have clear visible structure. Several common factors can cause rows or columns to be missed or split incorrectly.
Source-side factors
- No visible borders — borderless tables are inferred from spacing alone; if column gaps are inconsistent, the model may merge or split columns.
- Merged or split cells — heavily merged headers or vertical writing across cells can confuse alignment.
- Inconsistent row heights — multi-line entries that wrap unevenly may be read as multiple rows.
- Faded or low-contrast scans — light borders may not register, leaving the model to guess at structure.
- Skewed or photographed pages — perspective distortion misaligns the rows and columns the model sees.
What to try first
- Re-scan with stronger contrast at 300 DPI or higher, with the page flat and squared.
- Add visible borders in the source if you control the form template.
- Check the JSON output — the structured output sometimes preserves cells that flatten poorly into the XLSX export.
When to use a custom extractor instead
If you're processing the same form repeatedly — invoices, lab results, tax forms — a custom extractor usually outperforms generic table extraction. Extractors target named fields directly, sidestepping table inference entirely. They're available on Pro and Business plans.
If you're seeing consistent issues on a specific document type, send us a sample and we can advise on the right approach.
Why does the OCR output contain random or 'nonsense' words?
Random words appear when the system struggles to interpret unclear handwriting, low-contrast scans, unusual letter shapes, or mixed languages. Historical documents and stylized cursive are especially challenging. Uploading a clean representative sample using your free trial credits is the best way to understand performance on your specific handwriting style.
Why does the output contain text that wasn't in the original document?
Occasionally the AI generates text that wasn't actually present on the page. This is called AI hallucination — a known limitation of generative models, where the system confidently produces plausible-sounding content that doesn't exist in the source.
It's most likely to happen when:
- The handwriting is extremely faint, smudged, or partially missing
- The page contains very low information density (mostly blank with a few words)
- The layout is unusual enough that the model has to "guess" at structure
How to work around it
- Use the plain-text export instead of the AI-enhanced format. The plain-text output is closer to a raw transcription and far less likely to include invented content.
- Compare side-by-side before trusting a result for an important document — Handwriting OCR shows the source image alongside the transcription so you can verify each section.
- Re-scan if possible — better source quality (300 DPI minimum, good contrast, flat page) reduces the model's need to fill in gaps.
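For longer documents, the comparison between the two exports can be partly automated. The sketch below assumes you have both the plain-text and the AI-enhanced output as strings, and uses Python's standard difflib module to list words that appear only in the enhanced version. The function name is hypothetical, and a flagged word is only a prompt to check the source image, not proof of hallucination: formatting changes and legitimate corrections are flagged too.

```python
import difflib


def words_only_in_enhanced(plain_text: str, enhanced_text: str) -> list[str]:
    """List runs of words in the enhanced text that the plain text lacks at that position."""
    plain_words = plain_text.split()
    enhanced_words = enhanced_text.split()
    # Compare word sequences; autojunk is disabled so that frequent words
    # in long documents are not silently ignored by the matcher.
    matcher = difflib.SequenceMatcher(a=plain_words, b=enhanced_words, autojunk=False)
    extra = []
    for tag, _a_start, _a_end, b_start, b_end in matcher.get_opcodes():
        # "insert" and "replace" both mean the enhanced text has words
        # that do not line up with the plain-text transcription.
        if tag in ("insert", "replace"):
            extra.append(" ".join(enhanced_words[b_start:b_end]))
    return extra
```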
If you've encountered a specific case where the output is clearly hallucinated, send it to us — investigations on real examples help us tune the model.
Why does the system misread numbers or certain characters?
Some handwriting styles make characters look nearly identical—such as 1 and 7, 0 and O, 5 and S, or looping cursive letters. This affects both humans and OCR. Higher DPI scans, darker ink, and more consistent handwriting improve character accuracy. Testing a clean sample using your trial credits will give you a realistic baseline.
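If you post-process fields that you know should contain only digits, a small check can flag the look-alike characters mentioned above. This is an illustrative sketch rather than a product feature: the function name is hypothetical, and treating lowercase o and s as look-alikes is an added assumption. Apply it only to fields you expect to be numeric, since it would corrupt ordinary text.

```python
# Letter look-alikes for digits, based on the 0/O and 5/S pairs above.
# Assumption: lowercase o and s are treated the same way.
LETTER_TO_DIGIT = {"O": "0", "o": "0", "S": "5", "s": "5"}
# 1 and 7 are both digits, so they cannot be corrected automatically;
# their positions are reported for a visual check against the scan.
AMBIGUOUS_DIGITS = {"1", "7"}


def review_numeric_field(value: str) -> dict:
    """Suggest a digits-only reading and list positions worth checking by eye."""
    suggested = "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in value)
    return {
        "original": value,
        "suggested": suggested,
        "changed": suggested != value,
        "check_positions": [i for i, ch in enumerate(value) if ch in AMBIGUOUS_DIGITS],
    }
```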
Why is the formatting of my output different from my original document?
Handwriting OCR is designed to extract text, not reproduce layout. Irregular spacing, annotations, columns, and freeform handwritten structures are simplified in the output. If you need structured results such as tables or keyed fields, use a Custom Extractor or the table extraction option.
Why is the system slow or taking longer than usual?
Temporary slowdowns can happen during peak load or when processing pages with very complex content. Most documents still finish within seconds. If a document seems stuck, refresh the page or check your connection—processing continues even if you navigate away.
Why is the transcription inaccurate?
Poor input quality is the number one cause of inaccurate results. If the scan is blurry, low-resolution, skewed, faint, noisy, shadowed, or taken at an angle, the system can't magically recover detail that isn't visible in the image. Even humans struggle with unclear handwriting. To get reliable output, the document must be clear: scan at 300 DPI or higher, ensure strong contrast, flatten the page, and avoid shadows or reflections. If you want to confirm performance for your specific documents, upload a clean representative sample using your free trial credits.
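To see what 300 DPI means for a particular image, divide its pixel dimensions by the physical page size in inches. The sketch below shows that arithmetic; the function names are hypothetical, and the page size is something you supply, since a photo or screenshot carries no reliable physical size of its own.

```python
import math


def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def pixels_needed(page_width_in: float, page_height_in: float,
                  dpi: int = 300) -> tuple[int, int]:
    """Return the minimum pixel dimensions for a page at the given DPI."""
    return (math.ceil(page_width_in * dpi), math.ceil(page_height_in * dpi))
```

For example, a US Letter page (8.5 × 11 inches) needs at least 2550 × 3300 pixels to reach 300 DPI, so a 1275 × 1650 image of the same page is only 150 DPI.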
Why isn't editing working in my browser?
The dashboard editor is a rich-text web component, and a few browser combinations can stop it working as expected.
Recommended browsers
- Chrome (current version) — fully supported
- Edge, Brave, Arc — Chromium-based, fully supported
- Firefox (current version) — fully supported
- Safari — supported on Safari 16+ on macOS / iPadOS; older Safari versions can have issues with inline editing and clipboard behaviour
Quick fixes to try
- Update your browser — most editing issues we see are on outdated versions.
- Disable browser extensions that modify pages — content blockers, password managers, and grammar plugins can intercept editing events.
- Try an incognito / private window — a clean profile rules out extensions and cached state.
- Try a different browser — Chrome on Mac or Windows is the safest fallback.
Mobile
The dashboard works on mobile browsers, but for heavy editing we recommend a desktop browser — small screens and touch keyboards make precise edits difficult.
If the editor still doesn't work after the above, contact us with your browser name and version and we'll investigate.