API documentation

The Handwriting OCR API provides a simple, reliable way to extract text and data from documents and images. Using state-of-the-art OCR technology, it can process handwritten text, printed documents, and structured data like tables. The API is RESTful, uses JSON for response data, and requires authentication via API tokens.

Key Features

Handwriting recognition and text extraction
Table structure detection and data extraction
Support for PDF and common image formats (JPG, PNG, TIFF, etc.)
Multiple export formats (TXT, DOCX, JSON, CSV, XLSX)

Basic Process

1. Upload Document: Start by uploading your document with a specified action:
   - transcribe: Extract all text from the document
   - tables: Extract data from tables
   - extractor: Extract structured data using a Custom Extractor
2. Check Status: After upload, your document enters the processing queue. Check its status using the document ID returned in step 1.
3. Download Results: Once processing is complete, download the results in your preferred format:
   - Transcription: TXT, DOCX, or JSON
   - Tables: XLSX or JSON
   - Extractor: XLSX, CSV, or JSON
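
As a rough illustration of how the action chosen at upload constrains the download format, the sketch below encodes the formats listed above and builds a document URL following the pattern of the Download result endpoint. The names `FORMATS_BY_ACTION` and `result_url` are hypothetical helpers, not part of the API.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://www.handwritingocr.com/api/v3/documents"

# Download formats per action, as listed in the Basic Process steps.
FORMATS_BY_ACTION = {
    "transcribe": {"txt", "docx", "json"},
    "tables": {"xlsx", "json"},
    "extractor": {"xlsx", "csv", "json"},
}


def result_url(document_id: str, action: str, fmt: Optional[str] = None) -> str:
    """Build the status URL (no format) or the download URL (with a format)."""
    if action not in FORMATS_BY_ACTION:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{BASE_URL}/{quote(document_id, safe='')}"
    if fmt is None:
        return url
    if fmt not in FORMATS_BY_ACTION[action]:
        raise ValueError(f"format {fmt!r} is not available for action {action!r}")
    return f"{url}.{fmt}"
```

Checking the format locally avoids spending a request on a combination the API would reject with a 400.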

Getting Started

Create an account at handwritingocr.com
Generate an API token in the dashboard
For Custom Extractors, create and test an Extractor first.
Test the API with a sample document
Monitor results in the dashboard

Authentication

The API uses token-based authentication. Each request must include a valid API token in the Authorization header. API tokens provide full access to the document management API for a specific user account.

Tokens are generated through the web interface at https://www.handwritingocr.com/settings/api.
Tokens never expire but can be revoked or replaced at any time.
Multiple active tokens are not supported.
Token permissions cannot be customized: each token has full access to all API endpoints.

Authentication Header

Include your API token in all requests using the Bearer authentication scheme:

Authorization: Bearer your-api-token
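
For readers calling the API from code, here is a minimal sketch of attaching the header using only Python's standard library. The helper name `authorized_request` is hypothetical; the function only builds the request object and does not send it.

```python
import urllib.request

API_BASE = "https://www.handwritingocr.com/api/v3"


def authorized_request(token: str, path: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request carrying the Bearer token; nothing is sent here."""
    if not token:
        raise ValueError("an API token is required")
    return urllib.request.Request(
        f"{API_BASE}{path}",
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` performs the call. Reading the token from an environment variable rather than hard-coding it keeps it out of source control.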

Webhooks

Webhooks provide a more efficient, real-time alternative to polling: the processed result is delivered in JSON format to a URL you specify as soon as the document is ready, which saves bandwidth and reduces latency. You can set a webhook URL in the user dashboard, or per request with the webhook_url parameter of the Upload Document endpoint.

All webhooks are protected by HMAC-SHA256 signature verification. Each webhook request includes an X-Signature header containing a cryptographic signature that proves the request came from our servers and hasn't been modified.

Go to the user dashboard to generate your webhook secret, or set a global webhook URL.
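
A receiving server might check the signature along the following lines. This is a sketch under an assumption the text above does not confirm: that X-Signature carries the hex-encoded HMAC-SHA256 digest of the raw request body, keyed with your webhook secret. The function name `is_valid_signature` is hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Compare an X-Signature value with an HMAC-SHA256 of the raw body.

    Assumption: the header holds the hex digest of the unmodified body bytes.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest runs in constant time, so timing does not leak the digest.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Compute the digest over the bytes exactly as received, before any JSON parsing, since re-serializing the payload can change whitespace and invalidate the signature.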

Rate Limits

To ensure fair usage, protect our services from abuse, and maintain high availability for all users, our API enforces rate limits. Familiarizing yourself with these limits will help you build robust and efficient integrations.

Standard Rate Limit

Limit: We enforce a global rate limit of 2 requests per second (RPS).
Scope: This limit is applied at the account level and is shared across all API endpoints. It is not a per-endpoint limit.
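
One way to stay under 2 RPS is to space requests on the client side. The sketch below is an illustrative throttle, not something the API provides; the class name `Throttle` and its parameters are hypothetical, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time


class Throttle:
    """Space calls at least 1/rate seconds apart (0.5 s for 2 RPS)."""

    def __init__(self, rate_per_second: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Because the limit is shared across all endpoints, a single `Throttle` instance should guard every request the account makes, with `wait()` called immediately before each one.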

Detecting Rate Limits

When your application exceeds the rate limit, the API responds with:

HTTP Status Code: 429 Too Many Requests

Rate Limit Headers

To help you manage your request volume and anticipate when limits might be reached, the API includes the following headers in its responses:

X-RateLimit-Limit: The maximum number of requests allowed within the current time window.
X-RateLimit-Remaining: The number of requests remaining in the current time window.
Retry-After: Sent with a 429 Too Many Requests response, this header indicates the number of seconds your application should wait before attempting another request. It is crucial to respect this header to allow your connection to recover.

Best Practices for Managing Rate Limits

To operate efficiently within these limits, especially when processing multiple documents, we recommend the following best practices:

Use the document list endpoint: To check the status of multiple documents, call the List documents endpoint (GET /api/v3/documents), optionally filtered by status. A single call returns the status of many documents, so you do not need to poll each one individually.
Process sequentially based on status: Only attempt to retrieve full results for a document (e.g., download) once its status has changed to "processed".
Implement exponential backoff: When you receive a 429 status code, use the Retry-After header value to pause before retrying. If Retry-After is not present, or as a general error handling strategy, implement an exponential backoff mechanism for retries. This helps reduce pressure on the API during busy periods.
Cache responses: Cache responses from the API where appropriate to avoid requesting the same data repeatedly.
Utilize Webhooks: By setting up a webhook, our service will proactively send your results to your specified URL as soon as they are ready. This eliminates the need for you to poll for status updates or use the API to download your results, significantly reducing your API call volume. You can set a webhook in your user dashboard's settings page.
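
The retry advice above can be sketched as a small function that picks the wait time. It assumes Retry-After holds a number of seconds, as described earlier; the name `retry_delay` and the `base`, `cap`, and `jitter` defaults are illustrative choices, not values published by the API.

```python
import random
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 0.5, cap: float = 30.0, jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's own instruction when it is usable.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    delay = min(cap, base * (2 ** attempt))
    # Optional random jitter spreads out clients that retry together.
    return delay + random.uniform(0.0, jitter)
```

With the defaults, successive retries without a Retry-After header wait 0.5, 1, 2, 4 seconds and so on, up to the 30-second cap.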

Increasing Rate Limits

For users with consistently higher throughput requirements, we offer increased rate limits for Enterprise subscribers. These limits are determined on a case-by-case basis by negotiation.

If you anticipate needing a higher rate limit than the standard offering, please contact our support team to discuss your specific needs.

Support

For technical support or questions, contact support@handwritingocr.com

List documents

Retrieves a paginated list of documents belonging to the authenticated user. Documents are sorted by creation date in descending order.

Endpoint

GET https://www.handwritingocr.com/api/v3/documents

Headers

Key	Value	Required	Notes
Authorization	Bearer your-api-token	Yes
Accept	application/json	Yes

Query Parameters

Name	Type	Required	Notes
per_page	integer	No	Number of items per page. Default is 50. Maximum 200.
page	integer	No	The page number for pagination. Defaults to 1.
action	string	No	Filter results by action. Options are `transcribe`, `tables`, `extractor`.
status	string	No	Filter results by status. Options are `new`, `processing`, `processed`, `failed`.

Response Codes

Code	Explanation
200	Success - Returns list of documents.
401	Unauthorized - Invalid or missing API token.
422	Validation Error - Invalid parameters.

Request

curl -X GET "https://www.handwritingocr.com/api/v3/documents?page=1&per_page=100" \
     -H "Authorization: Bearer your-api-token" \
     -H "Accept: application/json"

Response

{
    "documents": [
        {
            "id": "vD9ldm3D9p",
            "file_name": "example-document.pdf",
            "action": "transcribe",
            "page_count": 1,
            "status": "processed",
            "automatically_deleted_at": "2025-03-14 12:12:22",
            "created_at": "2025-02-28T12:12:22.000000Z",
            "updated_at": "2025-02-28T12:12:37.000000Z"
        },
        {
            "id": "1D8BAMl69J",
            "file_name": "invoice-document.pdf",
            "action": "tables",
            "page_count": 1,
            "status": "failed",
            "automatically_deleted_at": "2025-03-14 12:11:22",
            "created_at": "2025-02-28T12:11:22.000000Z",
            "updated_at": "2025-02-28T12:11:46.000000Z"
        },
        {
            "id": "NV8OOawW87",
            "file_name": "extraction-document.jpg",
            "action": "extractor",
            "page_count": 1,
            "status": "processing",
            "automatically_deleted_at": "2025-03-14 12:08:03",
            "created_at": "2025-02-28T12:08:03.000000Z",
            "updated_at": "2025-02-28T12:08:24.000000Z"
        }
    ],
    "current_page": 1,
    "per_page": 50,
    "total": 3,
    "last_page": 1,
    "next_page_url": null,
    "prev_page_url": null,
    "from": 1,
    "to": 3
}
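
To walk every page of results, a client can keep requesting pages until `current_page` reaches `last_page`. The sketch below shows that loop with the HTTP call abstracted away: `fetch_page` is a hypothetical callable that takes a page number and returns the decoded JSON body in the shape shown above.

```python
from typing import Callable, Dict, Iterator


def iter_documents(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every document by following `current_page` / `last_page`."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("documents", [])
        # Stop once the server reports that this was the final page.
        if body.get("current_page", page) >= body.get("last_page", page):
            break
        page += 1
```

Each page costs one request against the shared rate limit, so a larger `per_page` value (up to 200) reduces the number of calls needed.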

Upload document

Upload a new document for processing. Supports PDF files and various image formats. The API will automatically check the page count of the submitted document against your credit balance before queueing for processing.

Endpoint

POST https://www.handwritingocr.com/api/v3/documents

Headers

Key	Value	Required
Authorization	Bearer your-api-token	Yes
Accept	application/json	Yes
Content-Type	multipart/form-data	Yes

Form Parameters

Name	Type	Required	Notes
action	string	Yes	The processing action to perform. Options are `transcribe`, `tables`, `extractor`.
file	file	Yes	The document to process. Valid file types are PDF, JPG, PNG, TIFF, HEIC, GIF. Maximum file size is 20MB.
delete_after	integer	No	Seconds until auto-deletion. Overrides the auto-deletion period set in your user settings. Minimum is 300 seconds. Maximum is 1209600 seconds (14 days).
extractor_id	string	No	A 10-character alphanumeric string e.g. Ks08XVPyMd. Create and test an extractor in the dashboard to get the extractor ID. Required when `action` is `extractor`.
webhook_url	string	No	A webhook URL to send the results for this request. Overrides your global webhook URL.
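
The constraints in the table above can be checked before sending a request. The following sketch is one possible client-side validation; the helper name `upload_fields` is hypothetical, and it only assembles the text fields of the form (the file itself is attached separately).

```python
import re
from typing import Dict, Optional

ACTIONS = {"transcribe", "tables", "extractor"}
MIN_DELETE_AFTER = 300
MAX_DELETE_AFTER = 1_209_600  # 14 days


def upload_fields(action: str, delete_after: Optional[int] = None,
                  extractor_id: Optional[str] = None,
                  webhook_url: Optional[str] = None) -> Dict[str, str]:
    """Validate the non-file upload parameters and return them as form fields."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    fields = {"action": action}
    if action == "extractor" and extractor_id is None:
        raise ValueError("extractor_id is required when action is 'extractor'")
    if extractor_id is not None:
        # The table describes a 10-character alphanumeric string.
        if not re.fullmatch(r"[A-Za-z0-9]{10}", extractor_id):
            raise ValueError("extractor_id must be 10 alphanumeric characters")
        fields["extractor_id"] = extractor_id
    if delete_after is not None:
        if not MIN_DELETE_AFTER <= delete_after <= MAX_DELETE_AFTER:
            raise ValueError("delete_after must be between 300 and 1209600 seconds")
        fields["delete_after"] = str(delete_after)
    if webhook_url is not None:
        fields["webhook_url"] = webhook_url
    return fields
```

Validating locally avoids a round trip that would otherwise end in a 400 or 422 response and count against the rate limit.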

Response Codes

Code	Explanation
201	Success - Document created and queued for processing
400	Bad Request - Missing required fields.
401	Unauthorized - Invalid or missing API token.
403	Forbidden - Insufficient page credits.
415	Unsupported Media Type.
422	Validation Error - Invalid parameters.
429	Too many requests - Rate limited.
500	Server Error - File storage or processing failed.

Request

curl -X POST "https://www.handwritingocr.com/api/v3/documents" \
     -H "Authorization: Bearer your-api-token" \
     -H "Accept: application/json" \
     -F "file=@/path/to/document.pdf" \
     -F "action=transcribe" \
     -F "delete_after=604800"

Response

{
    "id": "abc123",
    "status": "queued"
}

Download result

Retrieve the status of a document or download the processed results. The format extension is optional: without it, the endpoint returns a JSON response; with it, the endpoint downloads the processed document in the specified format.

Image thumbnail URLs are provided for each page. Requests for these images must be authenticated with your API token.

Webhooks

We strongly encourage using a webhook instead of polling this endpoint repeatedly. See above for more details about webhooks and how to use them with the Handwriting OCR API.

Endpoint

GET https://www.handwritingocr.com/api/v3/documents/{id}[.{format}]

Headers

Key	Value	Required	Notes
Authorization	Bearer your-api-token	Yes
Accept	application/json	Yes

Path Parameters

Name	Type	Required	Notes
id	string	Yes	The document's unique identifier, for example `abcde12345`.
format	string	No	Output format. Valid values depend on the action: `txt`, `docx`, `json` for transcribe; `xlsx`, `json` for tables; `xlsx`, `csv`, `json` for extractor.

Response Codes

Code	Explanation
200	Success - Returns the document status and results as JSON, or the file in the requested format.
202	Accepted - Document is still being processed.
400	Bad Request - Invalid format for action type.
401	Unauthorized - Invalid or missing API token.
403	Forbidden - No permission to access document.
404	Not found - Document not found.
429	Too many requests - Rate limited.
500	Server Error - Error preparing file for download.
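
Where a webhook is not an option, the 200 and 202 codes above can drive a simple polling loop. The sketch below is one way to write it: `fetch` is a hypothetical callable that performs the GET request and returns the status code together with the decoded JSON body, and the interval and attempt limit are illustrative defaults. It assumes that any code other than 200 or 202 should stop the loop.

```python
import time
from typing import Callable, Dict, Tuple


def wait_for_result(fetch: Callable[[], Tuple[int, Dict]], interval: float = 2.0,
                    max_attempts: int = 30, sleep=time.sleep) -> Dict:
    """Poll until the endpoint answers 200, sleeping between 202 responses."""
    for attempt in range(max_attempts):
        code, body = fetch()
        if code == 200:
            return body
        if code != 202:
            raise RuntimeError(f"unexpected status code {code}")
        # 202 means the document is still being processed.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"document not ready after {max_attempts} attempts")
```

Every poll counts against the account-wide rate limit, so keep the interval well above 0.5 seconds and check the `status` field of the returned body before downloading.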

Request

curl -X GET "https://www.handwritingocr.com/api/v3/documents/abc123.txt" \
     -H "Authorization: Bearer your-api-token" \
     -H "Accept: application/json" \
     --output document.txt

Response

{
    "id": "3486EvMD9p",
    "file_name": "page-1.jpg",
    "action": "transcribe",
    "page_count": 2,
    "status": "processed",
    "results": [
        {
            "page_number": 1,
            "transcript": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
        },
        {
            "page_number": 2,
            "transcript": "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
        }
    ],
    "thumbnails": [
        {
            "page_number": 1,
            "url": "https://www.handwritingocr.com/api/v3/document/3486EvMD9p/image-1.jpg"
        },
        {
            "page_number": 2,
            "url": "https://www.handwritingocr.com/api/v3/document/3486EvMD9p/image-2.jpg"
        }
    ],
    "automatically_deleted_at": "2025-03-05 19:47:44",
    "created_at": "2025-02-19T19:47:44.000000Z",
    "updated_at": "2025-02-21T03:05:42.000000Z"
}
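
For a transcription result shaped like the response above, the per-page transcripts can be combined into a single string. This is a small sketch with a hypothetical helper name, `full_transcript`; it assumes each entry in `results` carries `page_number` and `transcript`, as in the example.

```python
from typing import Dict


def full_transcript(document: Dict, separator: str = "\n\n") -> str:
    """Join page transcripts in page order, regardless of their order in the list."""
    pages = sorted(document.get("results", []), key=lambda p: p["page_number"])
    return separator.join(p["transcript"] for p in pages)
```

Sorting by `page_number` is a defensive choice in case pages do not arrive in order.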

Delete document

Permanently delete a document and its associated files. This action cannot be undone.

Endpoint

DELETE https://www.handwritingocr.com/api/v3/documents/{id}

Headers

Key	Value	Required	Notes
Authorization	Bearer your-api-token	Yes
Accept	application/json	Yes

Path Parameters

Name	Type	Required	Notes
id	string	Yes	The document's unique identifier.

Response Codes

Code	Explanation
204	Success - Document deleted.
401	Unauthorized - Invalid or missing API token.
403	Forbidden - No permission to delete document.
404	Not Found - Document not found.
500	Server Error - Error deleting document.

Request

curl -X DELETE "https://www.handwritingocr.com/api/v3/documents/abc123" \
     -H "Authorization: Bearer your-api-token" \
     -H "Accept: application/json"