TextCompressor Documentation

TextCompressor cuts your LLM token bill by 17–46% by compressing prompts before they reach the model — no code rewrite, no model changes, no data stored.

This reference covers the hosted API, the local proxy, all configuration options, and code examples in Python, Node.js, and cURL.

Overview

TextCompressor works as a transparent proxy between your application and your LLM provider. Before forwarding a prompt to the real API, it deterministically removes stop words, filler phrases, and redundant tokens.

Hosted API: send requests to https://api.textcompressor.unmutedlive.com/v1 instead of directly to OpenAI or Anthropic. Requires a TC API key.
Local proxy: run TextCompressor on your own machine. Compression runs locally, so there is no extra network hop and no usage limit.

No AI calls are made during compression. The compression algorithm is deterministic CPU logic, so your prompts are never sent to any model other than the LLM you are already calling.
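To make "deterministic CPU logic" concrete, here is a toy sketch of filler-phrase and stop-word removal. It is not TextCompressor's algorithm: the `FILLER_PHRASES` and `STOP_WORDS` tables and the `toy_compress` function are hypothetical and far smaller than any production list. What it shares with the description above is that the same input always produces the same output, with no model in the loop.

```python
import re

# Hypothetical, deliberately tiny tables; not TextCompressor's real rules.
FILLER_PHRASES = {
    "please note that": "",
    "it is important to note that": "",
    "in order to": "to",
}
STOP_WORDS = {"the", "a", "an", "that", "very", "really", "just"}


def toy_compress(text: str) -> str:
    """Deterministically strip filler phrases, then stop words (illustration only)."""
    result = text
    # Rewrite multi-word filler phrases first, case-insensitively, on word boundaries.
    for phrase, replacement in FILLER_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, replacement, result, flags=re.IGNORECASE)
    # Drop single stop words; trailing punctuation stays attached to kept words.
    kept = [w for w in result.split() if w.lower().strip(".,;:!?") not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(toy_compress("Please note that the following document outlines the terms."))
```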

Quick Start

Hosted API

Point any OpenAI-compatible client at our API base URL and add your TC key as a header:

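# Python + openai SDK (v1.x)
import openai

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="YOUR_OPENAI_KEY",  # sent as Authorization: Bearer ... and forwarded upstream
    default_headers={"X-TC-Key": "tc-your-key-here"},
)

# with_raw_response exposes the HTTP headers; .parse() returns the usual completion object
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the following document..."}],
)
response = raw.parse()

# Token savings are in the response headers, e.g.
# X-TC-Tokens-Saved: 312
# X-TC-Token-Reduction-Pct: 18.3
print(raw.headers.get("X-TC-Tokens-Saved"), raw.headers.get("X-TC-Token-Reduction-Pct"))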

Local proxy

# Install
pip install textcompressor

# Start the proxy (defaults to port 8080)
textcompressor-proxy --level light --port 8080

# Then just redirect your client
export OPENAI_BASE_URL=http://localhost:8080

Authentication

The hosted API requires a TC API key sent in the X-TC-Key request header. Your actual LLM provider key (OpenAI, Anthropic, etc.) goes in the standard Authorization: Bearer ... header as usual.

Header	Value	Required
X-TC-Key	Your TextCompressor API key (`tc-...`)	Yes
Authorization	`Bearer YOUR_OPENAI_KEY`	Yes (proxied to LLM)
X-TC-Level	`light` / `medium` / `aggressive`	No (default: account setting)
X-TC-Domain	`general` / `legal` / `technical`	No (default: `general`)
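If you call the hosted API without an SDK, the header table translates directly into a request-header dictionary. The sketch below is a hypothetical helper (`tc_headers` is not part of any TextCompressor package); it rejects level and domain values not listed above and leaves the optional headers out entirely when unset, so the documented defaults apply.

```python
from typing import Dict, Optional

VALID_LEVELS = {"light", "medium", "aggressive"}
VALID_DOMAINS = {"general", "legal", "technical"}


def tc_headers(
    tc_key: str,
    provider_key: str,
    level: Optional[str] = None,
    domain: Optional[str] = None,
) -> Dict[str, str]:
    """Build hosted-API request headers (hypothetical helper)."""
    if not tc_key.startswith("tc-"):
        raise ValueError("TextCompressor keys start with 'tc-'")
    headers = {
        "X-TC-Key": tc_key,
        # Your LLM provider key, forwarded upstream by the proxy.
        "Authorization": f"Bearer {provider_key}",
    }
    # Optional headers are omitted when unset so the server defaults apply.
    if level is not None:
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown level: {level}")
        headers["X-TC-Level"] = level
    if domain is not None:
        if domain not in VALID_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        headers["X-TC-Domain"] = domain
    return headers
```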


POST /v1/compress

Compress a text string and return the result without proxying to an LLM. Useful for inspecting what the compressor does before enabling it end-to-end.

Request body

Field	Type	Description
text	string	The text to compress. Required.
level	string	`light` / `medium` / `aggressive`. Default: account setting.
domain	string	`general` / `legal` / `technical`. Default: `general`.
model	string	Target model (used only for savings estimate). Default: `gpt-4o`.
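A minimal standard-library sketch of building a `/v1/compress` request from the fields above. `build_compress_request` is a hypothetical helper, not part of the `textcompressor` package; it only includes the optional fields you set, so the server-side defaults apply otherwise, and it builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.textcompressor.unmutedlive.com/v1"


def build_compress_request(
    tc_key: str,
    text: str,
    level: Optional[str] = None,
    domain: Optional[str] = None,
    model: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/compress request."""
    if not text:
        raise ValueError("text is required")
    body = {"text": text}
    # Only send optional fields that were explicitly set.
    for field, value in (("level", level), ("domain", domain), ("model", model)):
        if value is not None:
            body[field] = value
    return urllib.request.Request(
        f"{API_BASE}/compress",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-TC-Key": tc_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the body with `json.load`.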

Response

{
  "original_text":   "Please note that the following document...",
  "compressed_text": "following document...",
  "tokens_before":   847,
  "tokens_after":    692,
  "tokens_saved":    155,
  "reduction_pct":   18.3,
  "estimated_savings_usd": 0.000388,
  "model":           "gpt-4o",
  "level":           "light",
  "domain":          "general"
}
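The numeric fields are related: `tokens_saved` is `tokens_before − tokens_after`, and `reduction_pct` is `tokens_saved / tokens_before × 100`, rounded to one decimal (155 / 847 ≈ 18.3%). The sketch below recomputes them client-side. The per-token price is an assumption: the sample `estimated_savings_usd` is consistent with an input price of $2.50 per million tokens for `gpt-4o`, but the pricing table TextCompressor actually uses is not documented here.

```python
from typing import Dict, Optional

# ASSUMPTION: input price per token; not TextCompressor's documented pricing table.
ASSUMED_INPUT_PRICE_PER_TOKEN = {"gpt-4o": 2.50 / 1_000_000}


def derived_metrics(resp: Dict) -> Dict[str, Optional[float]]:
    """Recompute the savings fields from tokens_before and tokens_after."""
    saved = resp["tokens_before"] - resp["tokens_after"]
    price = ASSUMED_INPUT_PRICE_PER_TOKEN.get(resp["model"])
    return {
        "tokens_saved": saved,
        "reduction_pct": round(100 * saved / resp["tokens_before"], 1),
        # None when no assumed price is known for the model.
        "estimated_savings_usd": None if price is None else saved * price,
    }
```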

POST /v1/chat/completions

Drop-in replacement for the OpenAI /v1/chat/completions endpoint. TextCompressor compresses your messages, forwards the request to OpenAI (or any upstream URL you specify), and returns the upstream response body unmodified.

The request and response body format are identical to the OpenAI Chat Completions API. Savings are reported in response headers.

Streaming is supported. Pass "stream": true in the request body as normal — TextCompressor will compress the prompt and stream the upstream response back transparently.
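Because the upstream stream is passed back unchanged, a streaming response arrives in OpenAI's server-sent-events format: `data: {...}` lines whose text is in `choices[].delta.content`, ending with `data: [DONE]`. The OpenAI SDKs parse this for you when you pass `stream=True`; the sketch below shows the same reassembly with only the standard library, for clients that read the raw body. `collect_stream_text` is a hypothetical name, and it keeps only the first choice (index 0).

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Reassemble the first choice's text from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Ignore blank separator lines and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            content = (choice.get("delta") or {}).get("content")
            if content:  # role-only and final chunks carry no content
                parts.append(content)
    return "".join(parts)
```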

GET /v1/usage

Returns your account's usage totals, either for all time or for requests since a given date (`since`, e.g. `2026-04-01`).

# All time
GET /v1/usage
X-TC-Key: tc-your-key-here

# Since a specific date
GET /v1/usage?since=2026-04-01
X-TC-Key: tc-your-key-here

Response

{
  "totals": {
    "requests":          142,
    "tokens_before":     982400,
    "tokens_after":      802100,
    "tokens_saved":      180300,
    "total_savings_usd": 0.4508,
    "avg_latency_ms":    6.2
  },
  "by_level": [...],
  "by_model": [...]
}
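A small sketch for querying usage from scripts: it builds the `GET /v1/usage` URL (assuming `since` takes a `YYYY-MM-DD` date, as in the example above) and derives an overall reduction percentage from `totals`. With the sample totals, 180,300 / 982,400 ≈ 18.4% of input tokens were removed. Both function names are hypothetical.

```python
import datetime
import urllib.parse
from typing import Dict, Optional

USAGE_URL = "https://api.textcompressor.unmutedlive.com/v1/usage"


def usage_url(since: Optional[datetime.date] = None) -> str:
    """Build the usage URL; `since` is sent as YYYY-MM-DD (assumed format)."""
    if since is None:
        return USAGE_URL
    return f"{USAGE_URL}?{urllib.parse.urlencode({'since': since.isoformat()})}"


def overall_reduction_pct(totals: Dict[str, float]) -> float:
    """Share of input tokens removed across all counted requests, in percent."""
    return round(100 * totals["tokens_saved"] / totals["tokens_before"], 1)
```

Send the URL with an `X-TC-Key` header, as in the raw requests above.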

GET /health

Returns {"status": "ok"}. No authentication required. Use this to verify the API is reachable.

Compression Levels

Level	Typical Savings	Accuracy Change (GPT-4o)	Best For
light	~17%	−2.7pp	General use, production workloads, legal text
medium	~34%	−5.1pp	High-volume pipelines, summaries, classification
aggressive	~46%	−9.4pp	Batch processing, embeddings, low-stakes tasks

We recommend starting with light. In benchmark testing on legal and regulatory documents, GPT-4o accuracy was slightly higher with light compression than without — likely because the denser prompt reduces noise.
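For capacity planning, the typical savings column can be turned into a rough estimate of post-compression prompt size. This hypothetical helper uses the rounded percentages from the table; actual savings depend heavily on how verbose the prompt is, so treat the result as a ballpark, not a guarantee.

```python
# Rounded "Typical Savings" figures from the table above, in percent.
TYPICAL_SAVINGS_PCT = {"light": 17, "medium": 34, "aggressive": 46}


def estimate_tokens_after(tokens_before: int, level: str) -> int:
    """Ballpark prompt size after compression at the given level."""
    if level not in TYPICAL_SAVINGS_PCT:
        raise ValueError(f"unknown level: {level}")
    # Integer arithmetic avoids float rounding surprises.
    return tokens_before * (100 - TYPICAL_SAVINGS_PCT[level]) // 100


if __name__ == "__main__":
    for level in TYPICAL_SAVINGS_PCT:
        print(level, estimate_tokens_after(10_000, level))
```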

Domain Modes

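Domain modes tell the compressor which vocabulary to protect. Switching domains does not change how words are removed; it changes which words count as stop words. In `legal` mode, for example, terms such as "shall" and "pursuant to" are protected and never stripped.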

Domain	Protected Vocabulary	Use Case
general	Standard English	General chat, code, summaries
legal	Regulatory terms (shall, pursuant to, whereas, OFAC, FAR…)	Contracts, compliance, legal research
technical	RFC/spec keywords, HTTP terms, system calls	API docs, technical specs, RFC analysis
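A toy illustration of the idea that a domain changes what counts as a stop word rather than how words are removed: each domain subtracts a protected set from one shared removable set. The word lists and the `compress_words` function are hypothetical and not TextCompressor's real vocabularies; the `technical` set borrows RFC 2119 keywords (MUST, SHALL, SHOULD, MAY) as an example.

```python
# Hypothetical word lists; the real protected vocabularies are not published here.
BASE_STOP_WORDS = {"the", "of", "to", "shall", "should", "whereas", "must", "may"}
PROTECTED = {
    "general": set(),
    "legal": {"shall", "whereas", "may"},
    "technical": {"must", "shall", "should", "may"},  # RFC 2119-style keywords
}


def compress_words(text: str, domain: str = "general") -> str:
    """Remove stop words, except those the domain protects."""
    # The removal rule is identical for every domain; only the word set shrinks.
    removable = BASE_STOP_WORDS - PROTECTED[domain]
    return " ".join(w for w in text.split() if w.lower() not in removable)
```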

Response Headers

Every proxied request returns these headers alongside the normal LLM response:

Header	Value
X-TC-Tokens-Before	Token count before compression
X-TC-Tokens-After	Token count after compression
X-TC-Tokens-Saved	Tokens removed
X-TC-Token-Reduction-Pct	Percentage reduction (e.g. `18.3`)
X-TC-Latency-Ms	Compression time in milliseconds
X-TC-Level	Compression level applied
X-TC-Domain	Domain mode applied
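Since every proxied response carries the same `X-TC-*` headers, a small parser is handy for logging savings. This sketch (`parse_tc_headers` is a hypothetical name) accepts any mapping of header names to string values, for example a plain dict or the `headers` object of a raw SDK response, and matches names case-insensitively as HTTP requires. Missing headers are simply left out of the result.

```python
from typing import Any, Dict, Mapping

# Lower-cased header name -> (result key, converter)
TC_HEADERS = {
    "x-tc-tokens-before": ("tokens_before", int),
    "x-tc-tokens-after": ("tokens_after", int),
    "x-tc-tokens-saved": ("tokens_saved", int),
    "x-tc-token-reduction-pct": ("reduction_pct", float),
    "x-tc-latency-ms": ("latency_ms", float),
    "x-tc-level": ("level", str),
    "x-tc-domain": ("domain", str),
}


def parse_tc_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Pull the X-TC-* savings headers out of any header mapping."""
    # HTTP header names are case-insensitive, so compare lower-cased names.
    lowered = {name.lower(): value for name, value in headers.items()}
    result: Dict[str, Any] = {}
    for name, (key, convert) in TC_HEADERS.items():
        if name in lowered:
            result[key] = convert(lowered[name])
    return result
```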

Python Example

import openai

prompt = "Summarize the following document..."  # your input text

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="sk-your-openai-key",
    default_headers={
        "X-TC-Key":    "tc-your-tc-key",
        "X-TC-Level":  "light",
        "X-TC-Domain": "general",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)

Node.js Example

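// ES module syntax with top-level await: use a .mjs file or "type": "module"
import OpenAI from 'openai';

const prompt = 'Summarize the following document...'; // your input text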

const client = new OpenAI({
  baseURL: 'https://api.textcompressor.unmutedlive.com/v1',
  apiKey:  'sk-your-openai-key',
  defaultHeaders: {
    'X-TC-Key':   'tc-your-tc-key',
    'X-TC-Level': 'light',
  },
});

const response = await client.chat.completions.create({
  model:    'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

console.log(response.choices[0].message.content);

cURL Example

curl https://api.textcompressor.unmutedlive.com/v1/compress \
  -H "X-TC-Key: tc-your-tc-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Please note that the following document outlines the terms...", "level": "light"}'

Local Proxy

The local proxy runs on your machine. No data leaves your network during compression — only the final compressed prompt is forwarded to your LLM API.

# Install
pip install textcompressor

# Start (defaults: port 8080, level light, domain general)
textcompressor-proxy

# Custom options
textcompressor-proxy --port 8080 --level medium --domain legal

# Point your client at localhost instead of api.openai.com
export OPENAI_BASE_URL=http://localhost:8080

# Also works with OpenAI-compatible local servers such as Ollama or LM Studio as the upstream
export OPENAI_BASE_URL=http://localhost:8080  # TextCompressor
export OLLAMA_HOST=http://localhost:11434      # Ollama upstream

The local proxy is free and unlimited: no TextCompressor API key and no rate limits (your LLM provider's own key is still needed if it requires one). The hosted API is for teams who don't want to run their own server.

Rate Limits

Plan	Requests/month	Requests/minute
free	100	10
starter	Unlimited (1M token budget)	60
pro	Unlimited (10M token budget)	300

The local proxy has no rate limits.
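To stay under a requests-per-minute cap on the client side (for example 10/min on the free plan), a sliding-window limiter is one option. This is a sketch under the assumption that the server counts requests over a rolling 60-second window; the actual window semantics are not documented here, and the monthly cap is not tracked by this class. The injectable `clock` makes it easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter for a requests-per-minute cap."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.per_minute = per_minute
        self.clock = clock
        self.sent: Deque[float] = deque()  # send times within the current window

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the last 60 seconds."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.per_minute:
            self.sent.append(now)
            return True
        return False
```

Call `try_acquire()` before each request and wait briefly whenever it returns False.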

Error Codes

HTTP Status	Code	Meaning
401	missing_key	No `X-TC-Key` header present
403	invalid_key	Key not found or inactive
429	rate_limited	Monthly request limit reached
400	bad_request	Malformed JSON or missing required fields
502	upstream_error	Upstream LLM API returned an error
500	internal_error	Internal server error — email support
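One way to act on these codes in a client: treat 4xx errors as permanent and retry only 500 and 502 with exponential backoff. The status-to-code mapping below comes from the table; the retry policy, the backoff schedule, and the function names are assumptions, not documented TextCompressor behavior. 429 is treated as non-retryable because the table describes it as the monthly limit.

```python
from typing import List, Tuple

# (code, retryable) per HTTP status. Codes come from the table above; the
# retryable flags are a suggested client policy, not documented behavior.
ERRORS = {
    400: ("bad_request", False),     # fix the request body instead
    401: ("missing_key", False),
    403: ("invalid_key", False),
    429: ("rate_limited", False),    # monthly limit: retrying soon will not help
    500: ("internal_error", True),
    502: ("upstream_error", True),   # often transient upstream failures
}


def classify(status: int) -> Tuple[str, bool]:
    """Return (error code, retryable) for a TextCompressor HTTP status."""
    return ERRORS.get(status, ("unknown", False))


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```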

Benchmarks

Tested across 11,760 benchmark questions on four document types:

Literary (War & Peace)
Legal/regulatory (Federal Acquisition Regulation)
Financial (SEC 10-K filings)
Technical (RFC 7231)

Level	Token Savings	Accuracy (GPT-4o)	Accuracy Delta
none (baseline)	—	87.4%	—
light	17.2%	84.9%	−2.7pp
medium	33.8%	82.6%	−5.1pp
aggressive	46.1%	78.3%	−9.4pp

On legal/regulatory text (FAR corpus), GPT-4o accuracy was higher with light compression (88.1%) than without (87.4%), possibly because the denser prompt reduces noise in verbose regulatory language.
