Quickstart - Mint Starter Kit

CLI

Install the CLI

Install the lit command globally:

npm install -g @llamaindex/liteparse

Verify the install:

lit --version

Parse your first document

Pass any PDF (or supported document format) to lit parse:

lit parse document.pdf

Output is printed to stdout as plain text with the spatial layout preserved. To save it to a file:

lit parse document.pdf -o output.txt

Pipe a remote PDF directly without downloading it first:

curl -sL https://example.com/report.pdf | lit parse -

Try JSON output

Use --format json to get structured output with per-page text items and bounding boxes:

lit parse document.pdf --format json -o output.json
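Once output.json exists, a few lines of TypeScript can summarize it. The sketch below is illustrative only: it assumes the file has a top-level pages array whose entries carry page and textItems, mirroring the library's result.json shape shown later on this page. The interface names and the summarizePages helper are hypothetical, and an inline sample stands in for the file contents.

```typescript
// Assumed (not confirmed) shape of the JSON written by `lit parse --format json`.
interface TextItem {
  text: string;
  x: number;
  y: number;
}

interface ParsedPage {
  page: number;
  textItems: TextItem[];
}

interface ParsedDocument {
  pages: ParsedPage[];
}

// Hypothetical helper: one summary line per page.
function summarizePages(jsonText: string): string[] {
  const doc = JSON.parse(jsonText) as ParsedDocument;
  return doc.pages.map(
    (p) => `Page ${p.page}: ${p.textItems.length} text items`
  );
}

// Inline sample standing in for the contents of output.json.
const sample = JSON.stringify({
  pages: [
    {
      page: 1,
      textItems: [
        { text: "Hello", x: 72, y: 700 },
        { text: "world", x: 110, y: 700 },
      ],
    },
    { page: 2, textItems: [] },
  ],
});

console.log(summarizePages(sample).join("\n"));
```

In a real script, the string passed to summarizePages would come from reading output.json rather than from the inline sample.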

Library

Install the package

Add LiteParse as a dependency:

npm install @llamaindex/liteparse

Parse a file

Import LiteParse and call parse() with a file path:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);

result.text contains the full document text with spatial layout preserved across all pages. Per-page data is available in result.pages.

Parse from a Buffer or Uint8Array

You can pass raw bytes instead of a file path. PDF bytes go straight to the parser with zero disk I/O; non-PDF bytes are written to a temp file for format conversion.
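If you want to know ahead of time which path your bytes are likely to take, you can check for the PDF signature yourself. The following is a minimal sketch, not LiteParse's actual detection logic: looksLikePdf is a hypothetical helper that tests whether the data starts with the five-byte marker "%PDF-". Lenient PDF readers also accept a header that appears slightly later in the file, which this check ignores.

```typescript
// "%PDF-" encoded as bytes.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: true when the data begins with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) {
    return false;
  }
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}

// "%PDF-1.7" as raw bytes.
const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
// "PK\x03\x04", the ZIP signature used by formats such as .docx.
const zipHeader = new Uint8Array([0x50, 0x4b, 0x03, 0x04]);

console.log(looksLikePdf(pdfHeader)); // true
console.log(looksLikePdf(zipHeader)); // false
```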

From disk

import { LiteParse } from '@llamaindex/liteparse';
import { readFile } from 'fs/promises';

const parser = new LiteParse();

const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);
console.log(result.text);

From HTTP

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse();

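const response = await fetch('https://example.com/document.pdf');
// Fail early on HTTP errors instead of trying to parse an error page
if (!response.ok) {
  throw new Error(`Download failed with HTTP ${response.status}`);
}
const buffer = Buffer.from(await response.arrayBuffer());
const result = await parser.parse(buffer);
console.log(result.text);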

Get structured JSON output

Set outputFormat: 'json' to receive per-page text items with coordinates and font metadata:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ outputFormat: 'json' });
const result = await parser.parse('document.pdf');

for (const page of result.json.pages) {
  console.log(`Page ${page.page}: ${page.textItems.length} text items`);
  for (const item of page.textItems) {
    console.log(`  "${item.text}" at (${item.x}, ${item.y})`);
  }
}

Each textItem includes text, x, y, width, height, fontName, and fontSize.
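The coordinates make it possible to rebuild reading order yourself. The sketch below types a text item with the fields listed above and groups items into lines by their y position. It rests on assumptions that are not confirmed by this page: that all fields other than text and fontName are numbers, that y increases down the page, and that a tolerance of 2 units is a reasonable default. TextItem, groupIntoLines, and item are illustrative names, not part of the LiteParse API.

```typescript
// Fields as listed in the docs; the concrete types are assumptions.
interface TextItem {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fontName: string;
  fontSize: number;
}

// Hypothetical helper: items whose y lies within `tolerance` of a line's
// first item join that line; each line is then read left to right.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: { y: number; items: TextItem[] }[] = [];

  for (const item of sorted) {
    const last = lines[lines.length - 1];
    if (last !== undefined && Math.abs(item.y - last.y) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  return lines.map((line) =>
    [...line.items]
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" ")
  );
}

// Small factory so the sample data stays readable.
function item(text: string, x: number, y: number): TextItem {
  return { text, x, y, width: 30, height: 12, fontName: "Helvetica", fontSize: 12 };
}

const sampleItems = [item("world", 110, 100.5), item("Hello", 72, 100), item("Second", 72, 120)];
console.log(groupIntoLines(sampleItems)); // [ 'Hello world', 'Second' ]
```

With real output you would pass page.textItems for each page; if the coordinate origin turns out to be bottom-left, reverse the y comparison.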

OCR is enabled by default using the built-in Tesseract.js engine, so no setup is required. On the first run, Tesseract.js downloads language data from the internet. For offline use, set TESSDATA_PREFIX to a directory containing pre-downloaded .traineddata files.
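Before going offline, it can help to confirm which languages the directory actually provides. The sketch below assumes the usual Tesseract naming convention of one <language>.traineddata file per language (for example eng.traineddata). trainedLanguages is a hypothetical helper that works on a plain list of file names, so it does not touch the file system or the LiteParse API.

```typescript
// Hypothetical helper: extract language codes from a directory listing.
function trainedLanguages(fileNames: string[]): string[] {
  const suffix = ".traineddata";
  return fileNames
    .filter((name) => name.endsWith(suffix) && name.length > suffix.length)
    .map((name) => name.slice(0, -suffix.length))
    .sort();
}

// Example listing of a TESSDATA_PREFIX directory.
const listing = ["eng.traineddata", "deu.traineddata", "README.md"];
console.log(trainedLanguages(listing)); // [ 'deu', 'eng' ]
```

In Node.js, the file names could come from reading the directory that TESSDATA_PREFIX points to; an empty result suggests the language data has not been downloaded yet.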

Next steps

Library usage

Explore the full LiteParse API: configuration options, OCR setup, screenshot generation, and more.

CLI reference

Full reference for lit parse, lit batch-parse, and lit screenshot commands and all their flags.
