TextCompressor Documentation
TextCompressor cuts your LLM token bill by 17–46% by compressing prompts before they reach the model — no code rewrite, no model changes, no data stored.
This reference covers the hosted API, the local proxy, all configuration options, and code examples in Python, Node.js, and cURL.
Overview
TextCompressor works as a transparent proxy between your application and your LLM provider. It removes stop words, filler phrases, and redundant tokens deterministically on your CPU before forwarding the prompt to the real API.
- Hosted API: send requests to https://api.textcompressor.unmutedlive.com/v1 instead of directly to OpenAI/Anthropic. Requires a TC API key.
- Local proxy: run TextCompressor on your own machine. Zero latency overhead, fully offline, unlimited use.
Quick Start
Hosted API
Point any OpenAI-compatible client at our API base URL and add your TC key as a header:
# Python + openai SDK
import openai

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="YOUR_OPENAI_KEY",
    default_headers={"X-TC-Key": "tc-your-key-here"},
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the following document..."}],
)

# Token savings are in the response headers
# X-TC-Tokens-Saved: 312
# X-TC-Token-Reduction-Pct: 18.3
Local proxy
# Install
pip install textcompressor

# Start the proxy (defaults to port 8080)
textcompressor-proxy --level light --port 8080

# Then just redirect your client
export OPENAI_BASE_URL=http://localhost:8080
Authentication
The hosted API requires a TC API key sent in the X-TC-Key request header. Your actual LLM provider key (OpenAI, Anthropic, etc.) goes in the standard Authorization: Bearer ... header as usual.
| Header | Value | Required |
|---|---|---|
| X-TC-Key | Your TextCompressor API key (tc-...) | Yes |
| Authorization | Bearer YOUR_OPENAI_KEY | Yes (proxied to LLM) |
| X-TC-Level | light / medium / aggressive | No (default: account setting) |
| X-TC-Domain | general / legal / technical | No (default: general) |
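The header table above can be turned into a small helper that assembles and validates the headers before a request is sent. This is a minimal sketch: `build_headers` is a hypothetical name, not part of any TextCompressor SDK, and the allowed values are copied from the table.

```python
from typing import Dict, Optional

LEVELS = {"light", "medium", "aggressive"}
DOMAINS = {"general", "legal", "technical"}


def build_headers(tc_key: str, provider_key: str,
                  level: Optional[str] = None,
                  domain: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers (hypothetical helper, values from the table)."""
    if level is not None and level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if domain is not None and domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    headers = {
        "X-TC-Key": tc_key,                         # TextCompressor key
        "Authorization": f"Bearer {provider_key}",  # proxied to the LLM provider
    }
    # Optional headers are left out so the documented defaults apply.
    if level is not None:
        headers["X-TC-Level"] = level
    if domain is not None:
        headers["X-TC-Domain"] = domain
    return headers
```

Rejecting unknown values on the client side gives an immediate error instead of a round trip to the API.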
POST /v1/compress
Compress a text string and return the result without proxying to an LLM. Useful for inspecting what the compressor does before enabling it end-to-end.
Request body
| Field | Type | Description |
|---|---|---|
| text | string | The text to compress. Required. |
| level | string | light / medium / aggressive. Default: account setting. |
| domain | string | general / legal / technical. Default: general. |
| model | string | Target model (used only for savings estimate). Default: gpt-4o. |
Response
{
"original_text": "Please note that the following document...",
"compressed_text": "following document...",
"tokens_before": 847,
"tokens_after": 692,
"tokens_saved": 155,
"reduction_pct": 18.3,
"estimated_savings_usd": 0.000388,
"model": "gpt-4o",
"level": "light",
"domain": "general"
}
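A call to this endpoint might look like the following sketch, which uses only the Python standard library. The function names are hypothetical, and the 30-second timeout is an arbitrary choice rather than a documented value.

```python
import json
import urllib.request

API_BASE = "https://api.textcompressor.unmutedlive.com/v1"


def build_compress_request(text, tc_key, level="light", domain="general"):
    """Build (but do not send) a POST /v1/compress request."""
    body = json.dumps({"text": text, "level": level, "domain": domain})
    return urllib.request.Request(
        f"{API_BASE}/compress",
        data=body.encode("utf-8"),
        headers={"X-TC-Key": tc_key, "Content-Type": "application/json"},
        method="POST",
    )


def compress(text, tc_key, **options):
    """Send the request and return the decoded JSON response."""
    request = build_compress_request(text, tc_key, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def counts_agree(result):
    """Check that tokens_saved equals tokens_before minus tokens_after."""
    return result["tokens_before"] - result["tokens_after"] == result["tokens_saved"]
```

Calling `compress("Please note that ...", "tc-your-tc-key")` should return a dictionary shaped like the response above, and `counts_agree` is a quick sanity check on it.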
POST /v1/chat/completions
Drop-in replacement for the OpenAI /v1/chat/completions endpoint. TextCompressor compresses your messages, forwards the request to OpenAI (or any upstream URL you specify), and returns the unmodified response.
The request and response body formats are identical to the OpenAI Chat Completions API. Savings are reported in response headers.
Streaming is supported: set "stream": true in the request body as normal. TextCompressor compresses the prompt and streams the upstream response back transparently.
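With the openai Python SDK, streaming through the proxy should look the same as streaming from OpenAI directly. The sketch below assumes `client` is an `openai.OpenAI` instance configured as in the Quick Start; `stream_completion` is a hypothetical helper name.

```python
def stream_completion(client, prompt, model="gpt-4o"):
    """Yield text fragments as they arrive from a streamed chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example the last one).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage, assuming a configured client:
# for fragment in stream_completion(client, "Summarize the following document..."):
#     print(fragment, end="", flush=True)
```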
GET /v1/usage
Returns your account's usage totals for all time or a given date range.
# All time
GET /v1/usage
X-TC-Key: tc-your-key-here

# Since a specific date
GET /v1/usage?since=2026-04-01
X-TC-Key: tc-your-key-here
Response
{
"totals": {
"requests": 142,
"tokens_before": 982400,
"tokens_after": 802100,
"tokens_saved": 180300,
"total_savings_usd": 0.4508,
"avg_latency_ms": 6.2
},
"by_level": [...],
"by_model": [...]
}
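The totals in this response are enough to derive an overall reduction percentage. The following sketch shows one way to build the request URL and summarize the totals; both function names are hypothetical, and only the `since` parameter shown above is used.

```python
from urllib.parse import urlencode

API_BASE = "https://api.textcompressor.unmutedlive.com/v1"


def usage_url(since=None):
    """Build the GET /v1/usage URL, optionally restricted by a `since` date."""
    if since:
        return f"{API_BASE}/usage?" + urlencode({"since": since})
    return f"{API_BASE}/usage"


def summarize_totals(usage):
    """Derive tokens saved and the overall reduction from the `totals` object."""
    totals = usage["totals"]
    before = totals["tokens_before"]
    saved = before - totals["tokens_after"]
    # Guard against an account with no recorded traffic.
    reduction_pct = round(100 * saved / before, 2) if before else 0.0
    return {"tokens_saved": saved, "reduction_pct": reduction_pct}
```

For the sample response above, `summarize_totals` gives 180300 tokens saved, a reduction of about 18.35%.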
GET /health
Returns {"status": "ok"}. No authentication required. Use this to verify the API is reachable.
Compression Levels
| Level | Typical Savings | Accuracy Impact | Best For |
|---|---|---|---|
| light | ~17% | ~2.7pp | General use, production workloads, legal text |
| medium | ~34% | ~5.1pp | High-volume pipelines, summaries, classification |
| aggressive | ~46% | ~9.4pp | Batch processing, embeddings, low-stakes tasks |
We recommend starting with light. In benchmark testing on legal and regulatory documents, GPT-4o accuracy was slightly higher with light compression than without — likely because the denser prompt reduces noise.
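For a rough sense of what each level might save, the typical savings figures from the table can be plugged into a back-of-the-envelope estimate. This is an illustrative sketch only: `estimate_savings_usd` is a hypothetical name, the price per million tokens is whatever your provider charges, and real savings depend on your prompts.

```python
# Typical savings per level, taken from the table above (approximate).
TYPICAL_SAVINGS = {"light": 0.17, "medium": 0.34, "aggressive": 0.46}


def estimate_savings_usd(input_tokens, usd_per_million_tokens, level="light"):
    """Estimate input-token cost avoided for a given volume and price."""
    saved_tokens = input_tokens * TYPICAL_SAVINGS[level]
    return saved_tokens * usd_per_million_tokens / 1_000_000


# Example: 100M input tokens a month at an assumed $2.50 per million tokens.
for level in TYPICAL_SAVINGS:
    print(level, round(estimate_savings_usd(100_000_000, 2.50, level), 2))
```

The estimate covers input tokens only, since compression applies to the prompt and not to the model's output.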
Domain Modes
Domain modes tell the compressor which vocabulary to protect. Switching domains does not change how stop words are removed; it changes which words count as stop words, so protected terms are left in place.
| Domain | Protected Vocabulary | Use Case |
|---|---|---|
| general | Standard English | General chat, code, summaries |
| legal | Regulatory terms (shall, pursuant to, whereas, OFAC, FAR…) | Contracts, compliance, legal research |
| technical | RFC/spec keywords, HTTP terms, system calls | API docs, technical specs, RFC analysis |
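The idea of protected vocabulary can be illustrated with a toy stop-word filter. This is not TextCompressor's actual algorithm or word list: the stop words, the protected sets, and `toy_compress` are all made up for the example.

```python
# Toy word lists, invented for illustration only.
BASE_STOP_WORDS = {"the", "a", "of", "please", "shall", "whereas", "must"}
PROTECTED = {
    "general": set(),
    "legal": {"shall", "whereas"},
    "technical": {"must"},
}


def toy_compress(text, domain="general"):
    """Drop stop words, except those the chosen domain protects."""
    stop_words = BASE_STOP_WORDS - PROTECTED[domain]
    kept = [
        word for word in text.split()
        if word.lower().strip(".,;:") not in stop_words
    ]
    return " ".join(kept)


sentence = "The contractor shall submit the report"
print(toy_compress(sentence, "general"))  # contractor submit report
print(toy_compress(sentence, "legal"))    # contractor shall submit report
```

The removal step is identical in both calls; only the set of words treated as stop words differs, which is why "shall" survives in legal mode.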
Response Headers
Every proxied request returns these headers alongside the normal LLM response:
| Header | Value |
|---|---|
| X-TC-Tokens-Before | Token count before compression |
| X-TC-Tokens-After | Token count after compression |
| X-TC-Tokens-Saved | Tokens removed |
| X-TC-Token-Reduction-Pct | Percentage reduction (e.g. 18.3) |
| X-TC-Latency-Ms | Compression time in milliseconds |
| X-TC-Level | Compression level applied |
| X-TC-Domain | Domain mode applied |
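These headers can be collected into a typed dictionary for logging or dashboards. The sketch below assumes the numeric headers parse cleanly as integers or floats; `parse_tc_headers` is a hypothetical helper, and it accepts any mapping of header names to string values.

```python
INT_FIELDS = ("X-TC-Tokens-Before", "X-TC-Tokens-After", "X-TC-Tokens-Saved")
FLOAT_FIELDS = ("X-TC-Token-Reduction-Pct", "X-TC-Latency-Ms")
TEXT_FIELDS = ("X-TC-Level", "X-TC-Domain")


def parse_tc_headers(headers):
    """Extract the X-TC-* headers from a response, converting numeric values."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    stats = {}
    for fields, convert in ((INT_FIELDS, int), (FLOAT_FIELDS, float), (TEXT_FIELDS, str)):
        for name in fields:
            value = lowered.get(name.lower())
            if value is not None:
                stats[name] = convert(value)
    return stats


# With the openai SDK, headers are available through the raw-response wrapper:
# raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
# stats = parse_tc_headers(raw.headers)
# completion = raw.parse()
```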
Python Example
import openai

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="sk-your-openai-key",
    default_headers={
        "X-TC-Key": "tc-your-tc-key",
        "X-TC-Level": "light",
        "X-TC-Domain": "general",
    },
)

# Placeholder prompt; replace with your own text
prompt = "Summarize the following document..."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
Node.js Example
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.textcompressor.unmutedlive.com/v1',
  apiKey: 'sk-your-openai-key',
  defaultHeaders: {
    'X-TC-Key': 'tc-your-tc-key',
    'X-TC-Level': 'light',
  },
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

console.log(response.choices[0].message.content);
cURL Example
curl https://api.textcompressor.unmutedlive.com/v1/compress \
  -H "X-TC-Key: tc-your-tc-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Please note that the following document outlines the terms...", "level": "light"}'
Local Proxy
The local proxy runs on your machine. No data leaves your network during compression — only the final compressed prompt is forwarded to your LLM API.
# Install
pip install textcompressor

# Start (defaults: port 8080, level light, domain general)
textcompressor-proxy

# Custom options
textcompressor-proxy --port 8080 --level medium --domain legal

# Point your client at localhost instead of api.openai.com
export OPENAI_BASE_URL=http://localhost:8080

# Works with any OpenAI-compatible client (Ollama, LM Studio, etc.)
export OPENAI_BASE_URL=http://localhost:8080   # TextCompressor
export OLLAMA_HOST=http://localhost:11434      # Ollama upstream
Rate Limits
| Plan | Requests/month | Requests/minute |
|---|---|---|
| free | 100 | 10 |
| starter | Unlimited (1M token budget) | 60 |
| pro | Unlimited (10M token budget) | 300 |
The local proxy has no rate limits.
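When calling the hosted API from a batch job, spacing requests evenly is one simple way to stay under the per-minute limits listed above. The `Pacer` class below is a hypothetical sketch; how the server actually windows its per-minute limit is not documented here, so even spacing is a conservative assumption.

```python
import time

# Requests per minute by plan, copied from the table above.
REQUESTS_PER_MINUTE = {"free": 10, "starter": 60, "pro": 300}


class Pacer:
    """Space requests evenly so a plan's per-minute limit is not exceeded."""

    def __init__(self, plan, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / REQUESTS_PER_MINUTE[plan]
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            delay = 0.0
            self._next_allowed = now + self.interval
        else:
            delay = self._next_allowed - now
            self._sleep(delay)
            self._next_allowed += self.interval
        return delay
```

Call `pacer.wait()` immediately before each request. The `clock` and `sleep` parameters exist so the pacing logic can be tested without real delays.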
Error Codes
| HTTP Status | Code | Meaning |
|---|---|---|
| 401 | missing_key | No X-TC-Key header present |
| 403 | invalid_key | Key not found or inactive |
| 429 | rate_limited | Monthly request limit reached |
| 400 | bad_request | Malformed JSON or missing required fields |
| 502 | upstream_error | Upstream LLM API returned an error |
| 500 | internal_error | Internal server error — email support |
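Client code can map these statuses to a next step. In the sketch below the status-to-code mapping comes from the table, while the suggested actions are assumptions about sensible client behavior, not documented TextCompressor guidance; `describe_error` is a hypothetical name.

```python
# Status-to-code mapping, copied from the table above.
TC_ERROR_CODES = {
    400: "bad_request",
    401: "missing_key",
    403: "invalid_key",
    429: "rate_limited",
    500: "internal_error",
    502: "upstream_error",
}

# Suggested client reactions (assumptions, not documented behavior).
SUGGESTED_ACTION = {
    "bad_request": "fix the request body",
    "missing_key": "add the X-TC-Key header",
    "invalid_key": "check that the key is correct and active",
    "rate_limited": "wait for the limit to reset or change plan",
    "internal_error": "retry later or contact support",
    "upstream_error": "inspect the LLM provider's error, then retry",
}


def describe_error(status):
    """Return (code, suggested action) for a documented status, else None."""
    code = TC_ERROR_CODES.get(status)
    if code is None:
        return None
    return code, SUGGESTED_ACTION[code]
```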
Benchmarks
Tested across 11,760 benchmark questions on four document types:
- Literary (War & Peace)
- Legal/regulatory (Federal Acquisition Regulation)
- Financial (SEC 10-K filings)
- Technical (RFC 7231)
| Level | Token Savings | Accuracy (GPT-4o) | Accuracy Delta |
|---|---|---|---|
| none (baseline) | — | 87.4% | — |
| light | 17.2% | 84.9% | −2.7pp |
| medium | 33.8% | 82.6% | −5.1pp |
| aggressive | 46.1% | 78.3% | −9.4pp |
On legal/regulatory text (FAR corpus), GPT-4o accuracy was higher with light compression (88.1%) than without (87.4%), likely because the denser prompt reduces noise in verbose regulatory language.