Open Source · CPU-Only · No AI Calls

Cut your LLM token bill
by 17–46% — no model changes,
no code rewrite

TextCompressor compresses prompts automatically as a local proxy or hosted API. Works with OpenAI, Anthropic, Ollama, LM Studio, and any OpenAI-compatible client.

Start Free View on GitHub
17–46%
Token savings per request
<10ms
Compression overhead
CPU-only
No model dependency

Real savings on real prompts

4,821
Original tokens
3,940
Compressed tokens
18.3% saved — typical OpenClaw session

Tested across 11,760 benchmark questions on War & Peace, legal (FAR), financial (SEC 10-K), and technical (RFC 7231) documents. Light compression saves ~17% with only ~2.7pp accuracy penalty. See full benchmark report →

Three steps, zero friction

1

Point your LLM client at localhost:8080

Start the TextCompressor proxy. Set your app's base URL — no other code changes needed.

textcompressor-proxy --level light --port 8080
# Then set: OPENAI_BASE_URL=http://localhost:8080
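In Python, that base-URL override is the only change your app needs. A minimal sketch (the `resolve_base_url` helper is illustrative, standing in for however your app reads its API endpoint at startup):

```python
import os

# Point any OpenAI-compatible client at the local TextCompressor proxy.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8080"

def resolve_base_url(default: str = "https://api.openai.com/v1") -> str:
    """Return the API base URL, honouring the proxy override if set."""
    return os.environ.get("OPENAI_BASE_URL", default)

print(resolve_base_url())  # http://localhost:8080
```

Clients that already respect `OPENAI_BASE_URL` need no code at all: exporting the variable before launch is enough.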
2

TextCompressor strips filler words from every prompt

Stop words, filler phrases, and redundant tokens are removed deterministically. Nouns, verbs, and meaning are preserved. Legal domain mode protects regulatory language.
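The idea can be sketched in a few lines of Python. This is a toy illustration with a tiny hand-picked stop-word list, not TextCompressor's actual rules, which are far more extensive and domain-aware:

```python
# Toy sketch of deterministic filler removal: drop stop words,
# keep content words in their original order.
STOP_WORDS = {"the", "a", "an", "that", "please", "note", "however", "very"}

def compress(prompt: str) -> str:
    """Remove stop words while preserving word order and meaning-bearing tokens."""
    kept = [w for w in prompt.split() if w.lower().strip(",.") not in STOP_WORDS]
    return " ".join(kept)

print(compress("Please note that the server returned a very unexpected error"))
# → "server returned unexpected error"
```

Because the rules are deterministic, the same prompt always compresses to the same output, with no model call and no randomness.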

3

Your LLM gets a tighter prompt — same answers, fewer tokens

Every request is forwarded to the real API with the savings reported in response headers: X-TC-Token-Reduction-Pct, X-TC-Tokens-Saved.
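Reading those headers from a proxied response might look like this. The header names come from the docs above; the plain dict stands in for whatever headers object your HTTP library returns, and the numbers echo the example session earlier on this page:

```python
# Sketch: report per-request savings from TextCompressor's response headers.
def report_savings(headers: dict) -> str:
    """Format the proxy's token-savings headers into a one-line summary."""
    pct = float(headers.get("X-TC-Token-Reduction-Pct", 0))
    saved = int(headers.get("X-TC-Tokens-Saved", 0))
    return f"saved {saved} tokens ({pct:.1f}%)"

print(report_savings({"X-TC-Token-Reduction-Pct": "18.3",
                      "X-TC-Tokens-Saved": "881"}))
# → saved 881 tokens (18.3%)
```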

Works with
OpenAI · Anthropic · Ollama · LM Studio · OpenClaw · Any OpenAI-compatible API

Start free, scale when you need to

Free
$0
100 requests / month
  • All 3 domains (general, legal, technical)
  • Light & medium compression
  • Local proxy (unlimited, offline)
  Not included: aggressive compression, priority support
Pro
$99/mo
10M tokens / month
  • All domains
  • All compression levels
  • Local proxy (unlimited)
  • Hosted API access
  • Priority support

Local proxy is always free and unlimited — the hosted API is for teams who don't want to run their own server. Enterprise pricing available.

Install the local proxy

🍎
macOS
macOS 12+, Apple Silicon & Intel
Download .zip
🪟
Windows
Windows 10/11, x64
Download .exe
🐧
Linux
Ubuntu, Debian, Arch, Fedora
Download .tar.gz

Or install via pip: pip install textcompressor

Common questions

Does it change the meaning of my prompts?
No. TextCompressor removes stop words and filler phrases only — words like "the", "a", "however", "please note that". Nouns, verbs, numbers, and domain-specific terms are untouched. Your LLM sees a denser version of the same message.
Does my data leave my machine?
Not with the local proxy. Compression runs entirely on your CPU — no external calls. Only the final compressed prompt is forwarded to your LLM API (OpenAI, Ollama, etc.), exactly as it would be without TextCompressor.
Will it work with my LLM?
If it has an OpenAI-compatible API, yes. Tested with GPT-4o, Claude, Ollama, LM Studio, and OpenClaw. Any client that respects OPENAI_BASE_URL works with zero code changes.
What's the accuracy impact?
Light compression (~17% token savings) shows only a ~2.7pp accuracy penalty across 11,760 benchmark tests — and on legal/regulatory text, GPT-4o actually performs slightly better with light compression than without. Medium compression saves ~34% with a ~5.1pp penalty. We recommend starting with light.
Is it open source?
The source is available on GitHub for review. The hosted API and paid plans fund ongoing development.