Extractor

This page documents the Extractor API endpoints and task parameters. The Extractor supports optical character recognition (OCR) on images, plus font detection and text extraction for PPTX presentations.

API Endpoints

File extraction tasks are processed asynchronously: create a task with the first endpoint below, then query it with the second to retrieve the result.

POST /tools/tasks

Creates an asynchronous extraction task. Pass the identifier of the extractor tool to run in the `type` field.

GET /tools/tasks/:id

Returns the execution status of an extraction task. The download link for the final output is provided in `result.url`.
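The retrieval step can be sketched in Python as a simple polling loop. This is a minimal sketch under several assumptions: the host is not given on this page, so `BASE_URL` is a placeholder; the task response is assumed to be a JSON object; and because status field names are not documented here, the loop only checks whether `result.url` has appeared. `fetch_task` and `wait_for_result` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: the real host is not given on this page


def fetch_task(task_id, api_key):
    """GET /tools/tasks/:id and return the decoded JSON body (assumed to be an object)."""
    req = urllib.request.Request(
        f"{BASE_URL}/tools/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(task_id, api_key, fetch=fetch_task, interval=2.0, max_attempts=30):
    """Poll until result.url appears, then return it; raise TimeoutError otherwise."""
    for attempt in range(max_attempts):
        task = fetch(task_id, api_key)
        # Assumption: result is absent or null until the output is ready.
        url = (task.get("result") or {}).get("url")
        if url:
            return url
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"task {task_id} had no result.url after {max_attempts} polls")
```

Since this sketch cannot tell a failed task from a slow one, a real client would probably also inspect whatever status field the response carries. Supplying `notifyURL` when creating the task is the documented alternative to polling.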

Request Headers

Header	Type	Required	Description
`Authorization`	`String`	Yes	`Bearer <YOUR_API_KEY>`. Your API key, sent as a Bearer token.
`Content-Type`	`String`	Yes (POST)	Must be `multipart/form-data`.

POST Request Body Parameters (Form Data)

Parameter	Type	Required	Description
`files`	`File`	Yes	The source file to be processed. Supports uploading multiple files.
`type`	`String`	Yes	Task type identifier. For example: `image.ocr`.
`params`	`String (JSON)`	No	JSON-encoded string of tool-specific parameters. Defaults to an empty JSON object, `"{}"`.
`notifyURL`	`String`	No	Webhook callback URL to receive task status update notifications.
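The form fields above can be assembled into a request using only the Python standard library. This is a sketch, not an official client: `BASE_URL` is a placeholder because the host is not given on this page, `build_create_request` is a hypothetical helper, and the function only builds the request so that it can be inspected before sending.

```python
import json
import mimetypes
import urllib.request
import uuid

BASE_URL = "https://api.example.com"  # placeholder: the real host is not given on this page


def build_create_request(api_key, task_type, files, params=None, notify_url=None):
    """Build (but do not send) a POST /tools/tasks request.

    files is a list of (filename, bytes) pairs; each becomes one `files` part.
    """
    boundary = uuid.uuid4().hex
    # params must travel as a JSON string, defaulting to "{}".
    fields = [("type", task_type), ("params", json.dumps(params or {}))]
    if notify_url:
        fields.append(("notifyURL", notify_url))

    chunks = []
    for name, value in fields:
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for filename, data in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="files"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        )
        chunks.append(header.encode() + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{BASE_URL}/tools/tasks",
        data=b"".join(chunks),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The shape of the creation response is not described on this page, so reading the new task's id from it is left out. File names containing quotes or non-ASCII characters would need escaping that this sketch does not do.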

Task Parameter Specifications

Image OCR

Extracts multilingual text from images. Accepts any image/* file type, with a maximum file size of 50 MB. The structure of params is as follows:

{
  "language": "zh-hans" // string. Recognition language. Options: "zh-hans" (Simplified Chinese) | "zh-hant" (Traditional Chinese) | "en" (English) | "ja" (Japanese) | "ko" (Korean) | "fr" (French) | "de" (German) | "es" (Spanish)
}
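Because `params` is sent as a JSON string, the comment in the sample above cannot be part of the actual value. A small Python sketch of producing that string follows; `ocr_params` is a hypothetical helper, and the client-side language check is an assumption about what is useful, not a statement of how the server validates input.

```python
import json

# Language codes listed in the Image OCR section above.
OCR_LANGUAGES = {"zh-hans", "zh-hant", "en", "ja", "ko", "fr", "de", "es"}


def ocr_params(language):
    """Return the `params` form value for an image.ocr task as a JSON string."""
    if language not in OCR_LANGUAGES:
        raise ValueError(f"unsupported OCR language: {language!r}")
    return json.dumps({"language": language})
```

For example, `ocr_params("ja")` yields the string `{"language": "ja"}`, which is sent in the `params` field alongside `type` set to `image.ocr`.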

PPTX Font Finder

Scans and lists all font families referenced in a presentation's slides. Supports the .pptx format, with a maximum file size of 300 MB. params is an empty object {}.

PPTX Text Extractor

Extracts text shapes from all slide layers, supporting custom structural filtering. Supports .pptx format, with a file size limit of 300 MB. The structure of params is as follows:

{
  "hasChildren": false,       // boolean. Whether to include child layers (recursively extract text from child shapes)
  "pickPages": [1, 2, 5],     // array. Optional. List of slide page numbers to extract (1-based index)
  "imageFilter": 0.2,         // number. Optional. Filter elements based on area ratio relative to slide (0~1)
  "includeLayout": false,     // boolean. Whether to extract text from layout slides
  "includeMaster": false,     // boolean. Whether to extract text from master slides
  "includeNotes": false,      // boolean. Whether to extract text from notes slides
  "ignoreEmptyText": true,    // boolean. Whether to filter out empty text boxes
  "onlyBrief": true           // boolean. Whether to return a simplified summary structure (reduces payload size)
}
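The options above can be assembled and range-checked before serialization. The following Python sketch assumes that omitted options fall back to server-side defaults, which this page does not state; `pptx_text_params` is a hypothetical helper name.

```python
import json

# Boolean options listed in the PPTX Text Extractor section above.
BOOLEAN_OPTIONS = {
    "hasChildren", "includeLayout", "includeMaster",
    "includeNotes", "ignoreEmptyText", "onlyBrief",
}


def pptx_text_params(pick_pages=None, image_filter=None, **flags):
    """Build the `params` JSON string for a PPTX text extraction task."""
    params = {}
    for name, value in flags.items():
        if name not in BOOLEAN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
        params[name] = value
    if pick_pages is not None:
        # Slide numbers are 1-based, so 0 and negatives are rejected.
        if any(isinstance(p, bool) or not isinstance(p, int) or p < 1 for p in pick_pages):
            raise ValueError("pickPages must contain 1-based slide numbers")
        params["pickPages"] = list(pick_pages)
    if image_filter is not None:
        # imageFilter is an area ratio relative to the slide, in the range 0 to 1.
        if not 0 <= image_filter <= 1:
            raise ValueError("imageFilter must be between 0 and 1")
        params["imageFilter"] = image_filter
    return json.dumps(params)
```

A call such as `pptx_text_params(pick_pages=[1, 2, 5], includeNotes=True)` produces a string ready for the `params` form field. Calling it with no arguments returns `{}`.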

GET Path Parameters

Parameter	Type	Required	Description
`id`	`String`	Yes	Task UUID.
