TextCompressor Documentation

TextCompressor cuts your LLM token bill by 17–46% by compressing prompts before they reach the model — no code rewrite, no model changes, no data stored.

This reference covers the hosted API, the local proxy, all configuration options, and code examples in Python, Node.js, and cURL.

Overview

TextCompressor works as a transparent proxy between your application and your LLM provider. Before forwarding a prompt to the real API, it removes stop words, filler phrases, and redundant tokens using deterministic, CPU-only logic.

No AI calls are made during compression. The compression algorithm is deterministic CPU logic — your prompts are never sent to a third-party model.

Quick Start

Hosted API

Point any OpenAI-compatible client at our API base URL and add your TC key as a header:

# Python + openai SDK
import openai

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="YOUR_OPENAI_KEY",
    default_headers={"X-TC-Key": "tc-your-key-here"},
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the following document..."}],
)

# Token savings are in the response headers
# X-TC-Tokens-Saved: 312
# X-TC-Token-Reduction-Pct: 18.3

Local proxy

# Install
pip install textcompressor

# Start the proxy (defaults to port 8080)
textcompressor-proxy --level light --port 8080

# Then just redirect your client
export OPENAI_BASE_URL=http://localhost:8080

Authentication

The hosted API requires a TC API key sent in the X-TC-Key request header. Your actual LLM provider key (OpenAI, Anthropic, etc.) goes in the standard Authorization: Bearer ... header as usual.

Header        | Value                                | Required
X-TC-Key      | Your TextCompressor API key (tc-...) | Yes
Authorization | Bearer YOUR_OPENAI_KEY               | Yes (proxied to LLM)
X-TC-Level    | light / medium / aggressive          | No (default: account setting)
X-TC-Domain   | general / legal / technical          | No (default: general)


POST /v1/compress

Compress a text string and return the result without proxying to an LLM. Useful for inspecting what the compressor does before enabling it end-to-end.

Request body

Field  | Type   | Description
text   | string | The text to compress. Required.
level  | string | light / medium / aggressive. Default: account setting.
domain | string | general / legal / technical. Default: general.
model  | string | Target model (used only for savings estimate). Default: gpt-4o.

Response

{
  "original_text":   "Please note that the following document...",
  "compressed_text": "following document...",
  "tokens_before":   847,
  "tokens_after":    692,
  "tokens_saved":    155,
  "reduction_pct":   18.3,
  "estimated_savings_usd": 0.000388,
  "model":           "gpt-4o",
  "level":           "light",
  "domain":          "general"
}
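
The derived fields in the sample response can be recomputed from the two token counts. The sketch below is a worked check, not the service's actual formula: the function name savings_summary and the price of $2.50 per million input tokens are assumptions, the latter chosen because it reproduces the sample's estimated_savings_usd to within rounding.

```python
def savings_summary(tokens_before: int, tokens_after: int,
                    usd_per_million_tokens: float) -> dict:
    """Recompute the derived fields of a /v1/compress response."""
    tokens_saved = tokens_before - tokens_after
    reduction_pct = round(100 * tokens_saved / tokens_before, 1)
    # Assumed pricing model: saved input tokens times a flat per-token price.
    estimated_savings_usd = tokens_saved * usd_per_million_tokens / 1_000_000
    return {
        "tokens_saved": tokens_saved,
        "reduction_pct": reduction_pct,
        "estimated_savings_usd": estimated_savings_usd,
    }


# Token counts taken from the sample response above.
summary = savings_summary(847, 692, usd_per_million_tokens=2.50)
print(summary)
```

With the sample counts this gives 155 tokens saved and an 18.3% reduction, matching the response shown above.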

POST /v1/chat/completions

Drop-in replacement for the OpenAI /v1/chat/completions endpoint. TextCompressor compresses your messages, forwards the request to OpenAI (or any upstream URL you specify), and returns the unmodified response.

The request and response body format are identical to the OpenAI Chat Completions API. Savings are reported in response headers.

Streaming is supported. Pass "stream": true in the request body as normal — TextCompressor will compress the prompt and stream the upstream response back transparently.

GET /v1/usage

Returns your account's usage totals for all time or a given date range.

# All time
GET /v1/usage
X-TC-Key: tc-your-key-here

# Since a specific date
GET /v1/usage?since=2026-04-01
X-TC-Key: tc-your-key-here

Response

{
  "totals": {
    "requests":          142,
    "tokens_before":     982400,
    "tokens_after":      802100,
    "tokens_saved":      180300,
    "total_savings_usd": 0.4508,
    "avg_latency_ms":    6.2
  },
  "by_level": [...],
  "by_model": [...]
}
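
As a rough sketch of working with this endpoint from Python, the helpers below build the request URL with the optional since parameter and derive an overall reduction percentage from the totals object. The helper names usage_url and overall_reduction_pct are hypothetical, and the host is assumed to be the same one used in the Quick Start base URL.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed host, taken from the Quick Start base URL.
BASE_URL = "https://api.textcompressor.unmutedlive.com"


def usage_url(since: Optional[str] = None) -> str:
    """Build the /v1/usage URL, adding ?since=YYYY-MM-DD when given."""
    query = "?" + urlencode({"since": since}) if since else ""
    return f"{BASE_URL}/v1/usage{query}"


def overall_reduction_pct(totals: dict) -> float:
    """Percentage of tokens removed across all requests in the totals."""
    return round(100 * totals["tokens_saved"] / totals["tokens_before"], 1)


# Totals copied from the sample response above.
sample_totals = {"tokens_before": 982400, "tokens_after": 802100,
                 "tokens_saved": 180300}
print(usage_url("2026-04-01"))
print(overall_reduction_pct(sample_totals))
```

Send the resulting URL as a GET request with your X-TC-Key header, as in the raw requests above.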

GET /health

Returns {"status": "ok"}. No authentication required. Use this to verify the API is reachable.

Compression Levels

Level      | Typical Savings | Accuracy Impact | Best For
light      | ~17%            | ~2.7pp          | General use, production workloads, legal text
medium     | ~34%            | ~5.1pp          | High-volume pipelines, summaries, classification
aggressive | ~46%            | ~9.4pp          | Batch processing, embeddings, low-stakes tasks

We recommend starting with light. In benchmark testing on legal and regulatory documents, GPT-4o accuracy was slightly higher with light compression than without — likely because the denser prompt reduces noise.
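
If you would rather pick a level from an accuracy budget than by trial, the trade-off can be sketched as a small lookup. This is an illustration only: pick_level is a hypothetical helper, and the figures are the typical values quoted in this section, which may not match your own workload.

```python
from typing import Optional

# Typical savings (%) and accuracy impact (pp), as quoted in this section.
LEVELS = {
    "light": (17, 2.7),
    "medium": (34, 5.1),
    "aggressive": (46, 9.4),
}


def pick_level(max_accuracy_loss_pp: float) -> Optional[str]:
    """Return the highest-savings level within the accuracy budget, if any."""
    eligible = [
        (savings, name)
        for name, (savings, loss) in LEVELS.items()
        if loss <= max_accuracy_loss_pp
    ]
    # max() compares on savings first, so the most aggressive eligible level wins.
    return max(eligible)[1] if eligible else None


print(pick_level(3.0))
print(pick_level(6.0))
```

The returned name can be sent as the X-TC-Level header or passed to the local proxy with --level. A result of None means no level fits the budget and the prompt should go uncompressed.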

Domain Modes

Domain modes tell the compressor which vocabulary to protect. Switching domains does not change the compression algorithm; it changes which words are treated as removable stop words, so terms that carry meaning in that domain are left in place.

Domain    | Protected Vocabulary                                         | Use Case
general   | Standard English                                             | General chat, code, summaries
legal     | Regulatory terms (shall, pursuant to, whereas, OFAC, FAR…)   | Contracts, compliance, legal research
technical | RFC/spec keywords, HTTP terms, system calls                  | API docs, technical specs, RFC analysis
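
To make the idea of protected vocabulary concrete, here is a toy sketch. It is not TextCompressor's algorithm: the word lists, the PROTECTED mapping, and strip_stop_words are all invented for illustration, and the real compressor's vocabularies are far larger.

```python
# Toy word lists, invented for illustration only.
STOP_WORDS = {"please", "note", "that", "the", "shall", "whereas", "must"}
PROTECTED = {
    "general": set(),
    "legal": {"shall", "whereas"},
    "technical": {"must"},
}


def strip_stop_words(text: str, domain: str = "general") -> str:
    """Drop stop words, except those the chosen domain protects."""
    removable = STOP_WORDS - PROTECTED[domain]
    kept = [
        word for word in text.split()
        # Compare case-insensitively and ignore trailing punctuation.
        if word.lower().strip(".,;:") not in removable
    ]
    return " ".join(kept)


sentence = "The contractor shall comply"
print(strip_stop_words(sentence, "general"))
print(strip_stop_words(sentence, "legal"))
```

In this toy version the same removal logic runs for every domain; only the set of removable words differs, which is why "shall" survives in legal mode and is dropped in general mode.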

Response Headers

Every proxied request returns these headers alongside the normal LLM response:

Header                   | Value
X-TC-Tokens-Before       | Token count before compression
X-TC-Tokens-After        | Token count after compression
X-TC-Tokens-Saved        | Tokens removed
X-TC-Token-Reduction-Pct | Percentage reduction (e.g. 18.3)
X-TC-Latency-Ms          | Compression time in milliseconds
X-TC-Level               | Compression level applied
X-TC-Domain              | Domain mode applied
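
A minimal sketch of collecting these headers into typed values follows. The helper name parse_tc_headers is hypothetical, the sample values are made up for the example, and the value formats (integers for counts, decimals for the percentage and latency) are assumed from the examples in this document.

```python
from typing import Mapping


def parse_tc_headers(headers: Mapping[str, str]) -> dict:
    """Collect the X-TC-* savings headers into a typed dict."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lower = {name.lower(): value for name, value in headers.items()}
    return {
        "tokens_before": int(lower["x-tc-tokens-before"]),
        "tokens_after": int(lower["x-tc-tokens-after"]),
        "tokens_saved": int(lower["x-tc-tokens-saved"]),
        "reduction_pct": float(lower["x-tc-token-reduction-pct"]),
        "latency_ms": float(lower["x-tc-latency-ms"]),
        "level": lower["x-tc-level"],
        "domain": lower["x-tc-domain"],
    }


# Made-up sample headers, shaped like the table above.
sample_headers = {
    "X-TC-Tokens-Before": "1705",
    "X-TC-Tokens-After": "1393",
    "X-TC-Tokens-Saved": "312",
    "X-TC-Token-Reduction-Pct": "18.3",
    "X-TC-Latency-Ms": "6.2",
    "X-TC-Level": "light",
    "X-TC-Domain": "general",
}
stats = parse_tc_headers(sample_headers)
print(stats)
```

With the openai Python SDK, the headers are available when you call client.chat.completions.with_raw_response.create(...): pass the returned object's headers attribute to the helper, and call its parse() method to get the usual completion object.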

Python Example

import openai

client = openai.OpenAI(
    base_url="https://api.textcompressor.unmutedlive.com/v1",
    api_key="sk-your-openai-key",
    default_headers={
        "X-TC-Key":    "tc-your-tc-key",
        "X-TC-Level":  "light",
        "X-TC-Domain": "general",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)

Node.js Example

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.textcompressor.unmutedlive.com/v1',
  apiKey:  'sk-your-openai-key',
  defaultHeaders: {
    'X-TC-Key':   'tc-your-tc-key',
    'X-TC-Level': 'light',
  },
});

const response = await client.chat.completions.create({
  model:    'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

console.log(response.choices[0].message.content);

cURL Example

curl https://api.textcompressor.unmutedlive.com/v1/compress \
  -H "X-TC-Key: tc-your-tc-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Please note that the following document outlines the terms...", "level": "light"}'
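
The same call can be sketched in Python with only the standard library. The helper name build_compress_request is an assumption; the URL, headers, and body fields mirror the cURL command above. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request


def build_compress_request(text: str, tc_key: str,
                           level: str = "light") -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/compress."""
    body = json.dumps({"text": text, "level": level}).encode("utf-8")
    return urllib.request.Request(
        "https://api.textcompressor.unmutedlive.com/v1/compress",
        data=body,
        headers={"X-TC-Key": tc_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_compress_request(
    "Please note that the following document outlines the terms...",
    "tc-your-tc-key",
)
print(req.get_method(), req.full_url)

# To send it and read the JSON result:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["compressed_text"], result["tokens_saved"])
```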

Local Proxy

The local proxy runs on your machine. No data leaves your network during compression — only the final compressed prompt is forwarded to your LLM API.

# Install
pip install textcompressor

# Start (defaults: port 8080, level light, domain general)
textcompressor-proxy

# Custom options
textcompressor-proxy --port 8080 --level medium --domain legal

# Point your client at localhost instead of api.openai.com
export OPENAI_BASE_URL=http://localhost:8080

# Works with any OpenAI-compatible client — Ollama, LM Studio, etc.
export OPENAI_BASE_URL=http://localhost:8080  # TextCompressor
export OLLAMA_HOST=http://localhost:11434      # Ollama upstream

The local proxy is free and unlimited: no API key required, no rate limits. The hosted API is for teams who don't want to run their own server.

Rate Limits

Plan    | Requests/month               | Requests/minute
free    | 100                          | 10
starter | Unlimited (1M token budget)  | 60
pro     | Unlimited (10M token budget) | 300

The local proxy has no rate limits.
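
To stay under the per-minute limit on the hosted API, a client can throttle itself. The class below is a minimal sketch under stated assumptions: MinuteThrottle is a hypothetical name, and it assumes a sliding 60-second window, which may not be how the server actually counts requests.

```python
import time
from collections import deque


class MinuteThrottle:
    """Client-side guard allowing at most `limit` calls per 60-second window."""

    def __init__(self, limit: int, clock=time.monotonic):
        self.limit = limit
        self.clock = clock          # injectable so the logic can be tested
        self.sent = deque()         # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())


# Free plan: 10 requests/minute.
throttle = MinuteThrottle(limit=10)
delay = throttle.wait_time()
if delay > 0:
    time.sleep(delay)
throttle.record()  # call right after sending each request
```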

Error Codes

HTTP Status | Code           | Meaning
401         | missing_key    | No X-TC-Key header present
403         | invalid_key    | Key not found or inactive
429         | rate_limited   | Monthly request limit reached
400         | bad_request    | Malformed JSON or missing required fields
502         | upstream_error | Upstream LLM API returned an error
500         | internal_error | Internal server error — email support
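
One way a client might react to these statuses is sketched below. The action names and the action_for helper are hypothetical, and the choice of what to do for each status is our suggestion rather than documented behavior. The sketch keys on the HTTP status only, because the error response body format is not described here.

```python
# Suggested client reaction per HTTP status from the table above.
ACTIONS = {
    400: "fix_request",      # malformed JSON or missing fields: do not retry as-is
    401: "add_key",          # X-TC-Key header missing
    403: "check_key",        # key unknown or inactive
    429: "back_off",         # limit reached: wait or upgrade before retrying
    500: "contact_support",  # internal error on the TextCompressor side
    502: "retry_upstream",   # the upstream LLM API failed; a retry may succeed
}


def action_for(status: int) -> str:
    """Map an HTTP status to a suggested client action."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unknown")


print(action_for(502))
print(action_for(401))
```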

Benchmarks

Tested across 11,760 benchmark questions on four document types:

Level           | Token Savings | Accuracy (GPT-4o) | Accuracy Delta
none (baseline) | -             | 87.4%             | -
light           | 17.2%         | 84.9%             | −2.7pp
medium          | 33.8%         | 82.6%             | −5.1pp
aggressive      | 46.1%         | 78.3%             | −9.4pp

On legal/regulatory text (FAR corpus), GPT-4o accuracy was higher with light compression (88.1%) than without (87.4%) — the denser prompt reduces noise in verbose regulatory language.
