LiteParse extracts text from PDFs, Office documents, and images with precise spatial layout and bounding boxes. It includes built-in Tesseract.js OCR, supports pluggable HTTP OCR servers, and generates high-quality page screenshots for LLM agents — all without sending data to the cloud.

Quick Start

Install LiteParse and parse your first document in under 2 minutes.

Library Usage

Use LiteParse as a Node.js library in your application.

CLI Reference

Explore all CLI commands: parse, batch-parse, and screenshot.

API Reference

Full TypeScript API — the LiteParse class, config options, and types.

Key features

Spatial text extraction

Preserves text layout with precise bounding boxes using PDF.js — ideal for structured documents.

Built-in OCR

Tesseract.js is included out of the box. No setup required for scanned documents.

Pluggable OCR servers

Connect EasyOCR, PaddleOCR, or any custom OCR server via a simple HTTP API.

Multi-format input

Automatically converts DOCX, XLSX, PPTX, and images to PDF before parsing.

Screenshot generation

Generate high-quality page screenshots for LLM visual agents.

Runs locally

No cloud dependencies. Everything runs on your machine — Linux, macOS, or Windows.

Get started

1. Install LiteParse

Install globally via npm to use the lit CLI, or add it to your project as a library dependency.
npm install -g @llamaindex/liteparse
2. Parse a document

Run the lit parse command on any PDF, Office document, or image.
lit parse document.pdf
3. Use as a library

Import LiteParse in your Node.js project for programmatic access.
import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);
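
When parsing more than one file programmatically, it can help to keep one unreadable document from aborting the whole run. The following is a minimal sketch of that pattern, not part of the LiteParse API: `TextParser`, `ParseOutcome`, and `parseAll` are hypothetical names introduced here, and the interface assumes only what the example above shows, namely a `parse(path)` call that resolves to an object with a `text` field.

```typescript
// Hypothetical minimal parser shape: it mirrors only the calls used in the
// library example above (parse(path) resolving to an object with `text`).
interface TextParser {
  parse(path: string): Promise<{ text: string }>;
}

// Outcome for one file: either the extracted text or the error message.
type ParseOutcome =
  | { path: string; ok: true; text: string }
  | { path: string; ok: false; error: string };

// Parse files one at a time and record failures instead of throwing,
// so a single bad document does not stop the remaining files.
async function parseAll(
  parser: TextParser,
  paths: string[],
): Promise<ParseOutcome[]> {
  const outcomes: ParseOutcome[] = [];
  for (const path of paths) {
    try {
      const result = await parser.parse(path);
      outcomes.push({ path, ok: true, text: result.text });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ path, ok: false, error });
    }
  }
  return outcomes;
}
```

Assuming the object returned by `new LiteParse({ ocrEnabled: true })` is compatible with this interface, it could be passed as `parser`, for example `await parseAll(parser, ['a.pdf', 'b.docx'])`.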
Need higher accuracy on complex documents — dense tables, multi-column layouts, or handwritten text? Try LlamaParse, the cloud-based document parser built for production pipelines.