Extractor
API specifications and task parameters for the Extractor tools, which perform optical character recognition (OCR) on images, font detection on PPTX presentations, and text extraction from PPTX presentations.
API Endpoints
File extraction tasks are processed asynchronously. Use the two endpoints below to create a task and retrieve its result.
/tools/tasks
Creates an asynchronous extraction task. Pass the corresponding extractor tool identifier in type.
/tools/tasks/:id
Queries the execution status of an extraction task and retrieves the final output download link via result.url.
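The create-then-poll flow described above could be sketched as follows. This is a minimal illustration, not the service's reference client: the host in `BASE_URL`, the use of GET on the status endpoint, and the JSON response shape are assumptions. Only the `result.url` field is named in this document, so the loop simply waits for that field to appear instead of inspecting a status value.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # assumption: placeholder host, replace with the real one


def fetch_task(task_id: str, api_key: str) -> dict:
    """Request /tools/tasks/:id and decode the body (assumed to be JSON)."""
    req = urllib.request.Request(
        f"{BASE_URL}/tools/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result_url(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() repeatedly until the payload carries result.url."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        # "result" may be missing or null while the task is still running.
        url = (task.get("result") or {}).get("url")
        if url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not produce result.url in time")
        sleep(interval)
```

A caller would typically pass `lambda: fetch_task(task_id, api_key)` as `fetch`. Because the status values are not listed here, a failed task would only surface as a timeout in this sketch; a real client should also check whatever failure indicator the response provides.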
Request Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | String | Yes | Bearer <YOUR_API_KEY>, where <YOUR_API_KEY> is your API key. |
| Content-Type | String | Yes (POST) | Must be multipart/form-data. |
POST Request Body Parameters (Form Data)
| Parameter | Type | Required | Description |
|---|---|---|---|
| files | File | Yes | The source file to be processed. Supports uploading multiple files. |
| type | String | Yes | Task type identifier, for example image.ocr. |
| params | String (JSON) | No | JSON string of tool-specific parameters. Defaults to "{}" (an empty JSON object). |
| notifyURL | String | No | Webhook callback URL that receives task status update notifications. |
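As an illustration of how the headers and form fields fit together, the sketch below assembles a multipart/form-data request with the Python standard library. The function name `build_task_request`, the example host, the single-file upload, and the `application/octet-stream` part type are assumptions made for the example. Note that a multipart Content-Type header must carry the boundary, and that `params` is sent as a JSON string, not as a nested form structure.

```python
import json
import uuid
import urllib.request
from typing import Optional


def build_task_request(
    url: str,
    api_key: str,
    file_name: str,
    file_bytes: bytes,
    task_type: str,
    params: Optional[dict] = None,
    notify_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates one extraction task."""
    boundary = uuid.uuid4().hex
    fields = {"type": task_type, "params": json.dumps(params or {})}
    if notify_url:
        fields["notifyURL"] = notify_url

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The uploaded file goes in the "files" field (file name is not escaped here).
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned request can be sent with `urllib.request.urlopen(req)`. Uploading several files would mean appending one `files` part per file; whether the service expects the field name repeated in that way is not stated here.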
Task Parameter Specifications
Image OCR
Extracts text from images in any of the supported languages. Accepts files of any image/* type, up to 50 MB. The structure of params is as follows:
{
"language": "zh-hans" // string. Recognition language. Options: "zh-hans" (Simplified Chinese) | "zh-hant" (Traditional Chinese) | "en" (English) | "ja" (Japanese) | "ko" (Korean) | "fr" (French) | "de" (German) | "es" (Spanish)
}
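The `//` annotation in the sample above is documentation only: JSON has no comment syntax, so the string actually sent in `params` must omit it. A small helper along these lines can validate the language code and produce the string. The helper name `ocr_params` is hypothetical, and rejecting unknown codes on the client side is a design choice of this sketch, not documented service behavior.

```python
import json

# Language codes listed in the Image OCR parameter description.
OCR_LANGUAGES = {"zh-hans", "zh-hant", "en", "ja", "ko", "fr", "de", "es"}


def ocr_params(language: str) -> str:
    """Return the JSON string to send in the params form field for image.ocr."""
    if language not in OCR_LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {sorted(OCR_LANGUAGES)}"
        )
    return json.dumps({"language": language})
```

For example, `ocr_params("ja")` yields `{"language": "ja"}`, which can be passed as the `params` field alongside `type` set to image.ocr.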
PPTX Font Finder
Scans a presentation and lists all font families referenced in its slides. Accepts .pptx files up to 300 MB. params is an empty object {}.
PPTX Text Extractor
Extracts text shapes from all slide layers, with optional structural filtering. Accepts .pptx files up to 300 MB. The structure of params is as follows:
{
"hasChildren": false, // boolean. Whether to include child layers (recursively extract text from child shapes)
"pickPages": [1, 2, 5], // array. Optional. List of slide page numbers to extract (1-based index)
"imageFilter": 0.2, // number. Optional. Filter elements based on area ratio relative to slide (0~1)
"includeLayout": false, // boolean. Whether to extract text from layout slides
"includeMaster": false, // boolean. Whether to extract text from master slides
"includeNotes": false, // boolean. Whether to extract text from notes slides
"ignoreEmptyText": true, // boolean. Whether to filter out empty text boxes
"onlyBrief": true // boolean. Whether to return a simplified summary structure (reduces payload size)
}
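The options above can be assembled and checked before submission with a helper such as the one below. The function name `pptx_text_params` and its Python argument names are hypothetical. The default values mirror the sample values shown above, which may not match the service's own defaults, so treat them as assumptions. The two range checks follow the stated constraints: page numbers are 1-based and `imageFilter` lies between 0 and 1.

```python
import json
from typing import Optional, Sequence


def pptx_text_params(
    *,
    has_children: bool = False,
    pick_pages: Optional[Sequence[int]] = None,
    image_filter: Optional[float] = None,
    include_layout: bool = False,
    include_master: bool = False,
    include_notes: bool = False,
    ignore_empty_text: bool = True,
    only_brief: bool = True,
) -> str:
    """Return the JSON string to send in params for the PPTX Text Extractor."""
    params = {
        "hasChildren": has_children,
        "includeLayout": include_layout,
        "includeMaster": include_master,
        "includeNotes": include_notes,
        "ignoreEmptyText": ignore_empty_text,
        "onlyBrief": only_brief,
    }
    # Optional keys are left out entirely when not provided.
    if pick_pages is not None:
        if any(page < 1 for page in pick_pages):
            raise ValueError("pickPages uses 1-based slide numbers")
        params["pickPages"] = list(pick_pages)
    if image_filter is not None:
        if not 0 <= image_filter <= 1:
            raise ValueError("imageFilter must be between 0 and 1")
        params["imageFilter"] = image_filter
    return json.dumps(params)
```

Calling `pptx_text_params(pick_pages=[1, 2, 5], include_notes=True)` produces a string containing only the keys that were set or defaulted, with no comments, ready to be sent in the `params` form field.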