AI Token Estimator

Analyze language mix and estimate token usage across OpenAI, Codex, Claude, and DeepSeek profiles

Estimate token usage for pasted text or uploaded TXT/Markdown files.

What it does:

  • Detects mixed language/script composition, including Chinese Han, Latin, Kana, Hangul, Cyrillic, Arabic, emoji, symbols, and code-like lines
  • Counts OpenAI / Codex o200k_base and OpenAI cl100k_base with an offline tokenizer
  • Counts Claude with Anthropic count_tokens when CLAUDE_API_KEY or ANTHROPIC_API_KEY is available, and falls back to a heuristic only if the official call fails
  • Estimates DeepSeek token usage with transparent heuristics when exact provider token counters are unavailable
  • Marks each profile as exact-offline-tokenizer, official-provider-api, or heuristic so the result does not overclaim precision
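
The script-composition step listed above can be illustrated with a minimal Python sketch. It is not the tool's actual detector: the `SCRIPT_PREFIXES` mapping and the `script_of` and `language_mix` helpers are hypothetical, and the sketch only classifies letters by the prefix of their Unicode character name, ignoring digits, punctuation, emoji, and code-like lines.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical mapping from Unicode character-name prefixes to script labels.
SCRIPT_PREFIXES = {
    "CJK UNIFIED": "Chinese Han",
    "LATIN": "Latin",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "HANGUL": "Hangul",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
}


def script_of(ch: str) -> Optional[str]:
    """Return a coarse script label for a letter, or None for non-letters."""
    if not ch.isalpha():
        return None  # digits, whitespace, punctuation, and emoji are skipped here
    name = unicodedata.name(ch, "")
    for prefix, label in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "Other"


def language_mix(text: str) -> dict:
    """Count letters per script and report the dominant one."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    primary = counts.most_common(1)[0][0] if counts else None
    return {"primary": primary, "mixed": len(counts) > 1, "scripts": dict(counts)}


mix = language_mix("请总结 this API design and list 3 risks.")
print(mix["primary"], mix["mixed"])  # Latin True
```

For the sample prompt used on this page, the sketch reports Latin as the primary script and flags the text as mixed, which agrees with the example result shown below.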

Example Results

1 example

Estimate a mixed Chinese and English prompt

Analyze a short mixed-language instruction before sending it to multiple AI models

{
  "result": {
    "input": {
      "characters": 37
    },
    "language": {
      "primary": "Latin",
      "mixed": true
    },
    "estimates": [
      {
        "profile": "openai-codex-o200k-base"
      }
    ]
  }
}
Input parameters:
{ "inputText": "请总结 this API design and list 3 risks.", "modelProfile": "All Profiles", "countMode": "raw-text" }

Maximum file size: 20MB. Supported formats: text/plain, text/markdown, .txt, .md, .csv, .json, .log

Key Facts

Category
AI
Input Types
textarea, file, select
Output Type
json
Sample Coverage
4
API Ready
Yes

Overview

The AI Token Estimator analyzes text composition and estimates token usage across OpenAI, Codex, Claude, and DeepSeek profiles. It evaluates mixed-language scripts, code blocks, and symbols, then reports exact offline token counts for the OpenAI encodings alongside API-backed or heuristic estimates for the other profiles, to help you control prompt costs and stay within context window limits.

When to Use

  • When preparing large prompts or datasets containing mixed languages, code, or emojis and you need to estimate costs before sending them to LLM APIs.
  • When trimming prompts to fit a model's context window, using counts from tokenizer encodings such as OpenAI cl100k_base or o200k_base.
  • When analyzing log files, CSVs, or Markdown documents to calculate bulk token consumption for batch processing.
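
As a rough illustration of the cost-estimation use above, a token count can be converted into a budget figure once you know a per-token price. The function name and the price used here are hypothetical; actual prices depend on the provider and model and are not part of this tool's output.

```python
def estimate_cost_usd(token_count: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into a cost, given a caller-supplied price."""
    if token_count < 0 or usd_per_million_tokens < 0:
        raise ValueError("token_count and price must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


# Hypothetical price of 2.00 USD per million input tokens.
print(estimate_cost_usd(250_000, 2.0))  # 0.5
```

When the count comes from a heuristic profile rather than an exact tokenizer, treat the resulting figure as an approximation.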

How It Works

  • Paste your text into the input area or upload a supported file format such as TXT, Markdown, CSV, JSON, or log files.
  • Select your target model profile (e.g., OpenAI cl100k_base, Claude, DeepSeek, or All Profiles) and choose between Raw Text or Chat Message counting modes.
  • The tool analyzes the script composition (such as Latin, Chinese Han, or symbols) and runs offline tokenizers or API-based estimators.
  • View the structured JSON output detailing character counts, detected language mix, and token estimates labeled by precision type (exact, official API, or heuristic).
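
To make the idea of a precision-labeled heuristic concrete, here is a minimal sketch. The ratios are assumptions chosen for illustration (one token per CJK ideograph, one token per four other characters) and are not the tool's actual heuristics; the `heuristic_estimate` function and the `deepseek` profile name are likewise hypothetical. Only the `heuristic` label is taken from this page.

```python
import math


def heuristic_estimate(text: str, profile: str) -> dict:
    """Rough token estimate, labeled so it is not mistaken for an exact count."""
    # Characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    # Assumed ratios: 1 token per ideograph, 1 token per 4 other characters.
    tokens = cjk + math.ceil(other / 4)
    return {"profile": profile, "tokens": tokens, "precision": "heuristic"}


print(heuristic_estimate("请总结 this API design and list 3 risks.", "deepseek"))
# {'profile': 'deepseek', 'tokens': 12, 'precision': 'heuristic'}
```

Carrying the precision label with every estimate lets downstream code decide how much headroom to leave when a count is not exact.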

Use Cases

Budgeting API costs for high-volume translation tasks involving mixed English and Chinese text.
Pre-filtering large Markdown documentation files to ensure they do not exceed Claude or OpenAI context limits.
Analyzing system log files to estimate the token footprint of debugging data before feeding it to an LLM.

Examples

1. Estimating Multilingual Prompt Tokens

Localization Engineer
Background
A localization engineer needs to translate a mixed English and Chinese instruction set and wants to estimate the token usage across multiple LLM providers to choose the most cost-effective model.
Problem
Manually counting characters does not reflect how different tokenizers (like o200k_base vs cl100k_base) handle mixed CJK and Latin characters.
How to Use
Paste the mixed-language prompt into the Input Text area, select 'All Profiles' under Model Profiles, and click run.
Example Config
inputText: '请总结 this API design and list 3 risks.', modelProfile: 'All Profiles', countMode: 'raw-text'
Outcome
The tool outputs a JSON breakdown with exact token counts for the OpenAI profiles, an official API count for Claude when an API key is available (heuristic otherwise), and a heuristic estimate for DeepSeek, along with the detected language mix.

2. Checking Markdown Documentation Size

Technical Writer
Background
A technical writer wants to feed a long Markdown API documentation file into Claude to generate a summary but needs to ensure it fits within token limits.
Problem
Large files with code blocks and symbols can consume unexpected amounts of tokens depending on the model's tokenizer.
How to Use
Upload the `.md` file using the Text File input, select 'Claude Sonnet Estimate' as the Model Profile, and run the analysis.
Example Config
textFile: 'api_docs.md', modelProfile: 'Claude Sonnet Estimate', countMode: 'raw-text'
Outcome
The tool returns a token estimate for Claude, labeled by precision type, which the writer can compare against the model's context window to check that the file fits.
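
The context-window check in this example can be sketched as a simple comparison. The function and the limits below are hypothetical; the real context limit depends on the model and must be supplied by the caller.

```python
def fits_context(token_count: int, context_limit: int, reserved_output_tokens: int = 0) -> bool:
    """True if the input plus the space reserved for the reply fits in the window."""
    return token_count + reserved_output_tokens <= context_limit


# Hypothetical 200,000-token window with 8,000 tokens reserved for the summary.
print(fits_context(150_000, 200_000, 8_000))  # True
print(fits_context(195_000, 200_000, 8_000))  # False
```

Reserving room for the model's reply matters because the document and the generated summary share the same window.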

Try with Samples

json, csv, markdown

FAQ

How accurate are the token estimates?

OpenAI and Codex counts are exact because they come from offline tokenizers. Claude counts come from the official API when a key is available and fall back to a heuristic if that call fails. DeepSeek and other profiles use transparent heuristic estimates.

Can I estimate tokens for chat messages instead of raw text?

Yes, you can switch the Count Mode option from Raw Text to Chat Message to simulate chat format overhead.

What file formats does the estimator support?

You can upload TXT, Markdown (MD), CSV, JSON, and log files up to 20MB.

Does this tool support multilingual text?

Yes, it automatically detects mixed scripts including Chinese Han, Latin, Kana, Hangul, Cyrillic, Arabic, emojis, and code lines.

Are my API keys or text data stored?

No, all text processing and offline tokenization happen locally, and API calls are made directly to the providers without storing your data.

API Documentation

Request Endpoint

POST /en/api/tools/ai-token-estimator

Request Parameters

Parameter Name   Type                     Required   Description
inputText        textarea                 No         -
textFile         file (Upload required)   No         -
modelProfile     select                   No         -
countMode        select                   No         -

File parameters must be uploaded first via POST /upload/ai-token-estimator to obtain a filePath; then pass that filePath in the corresponding file field.
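
A text-only request to the endpoint might be built as in the sketch below. Several details are assumptions not stated on this page: the host (taken from the MCP configuration further down), the JSON request body, and the `build_estimate_request` helper itself. The parameter names and sample values come from the example configuration above.

```python
import json
import urllib.request

# Assumption: the API is served from the same host as the MCP endpoint.
BASE_URL = "https://elysiatools.com"


def build_estimate_request(
    input_text: str,
    model_profile: str = "All Profiles",
    count_mode: str = "raw-text",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the token estimator."""
    payload = {
        "inputText": input_text,
        "modelProfile": model_profile,
        "countMode": count_mode,
    }
    return urllib.request.Request(
        BASE_URL + "/en/api/tools/ai-token-estimator",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


req = build_estimate_request("请总结 this API design and list 3 risks.")
print(req.get_method(), req.full_url)
```

The request object can then be sent with `urllib.request.urlopen(req)`. If the service expects form-encoded or multipart data instead of JSON, only the body encoding and header need to change.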

Response Format

{
  "key": {...},
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
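
A client can handle this envelope by checking the optional error field before reading the payload. The sketch below is a hypothetical helper, and raising an exception on error is one design choice among several.

```python
import json


def unwrap_response(body: str) -> dict:
    """Parse a response body and raise if the optional 'error' field is set."""
    data = json.loads(body)
    if data.get("error"):
        raise RuntimeError(data["error"])
    return data


ok = unwrap_response('{"result": {"input": {"characters": 37}}}')
print(ok["result"]["input"]["characters"])  # 37
```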

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-ai-token-estimator": {
      "name": "ai-token-estimator",
      "description": "Analyze language mix and estimate token usage across OpenAI, Codex, Claude, and DeepSeek profiles",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=ai-token-estimator",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`, max 20 tools.
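
A small helper can assemble a chained URL and enforce the 20-tool limit before the configuration is written. The function name is hypothetical; the base URL, the comma-separated `toolId` format, and the limit are taken from the text above.

```python
MCP_SSE_URL = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated in the documentation above


def chained_mcp_url(tool_ids: list) -> str:
    """Join tool IDs into a single MCP SSE URL, rejecting empty or oversized lists."""
    if not tool_ids:
        raise ValueError("at least one tool ID is required")
    if len(tool_ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained")
    return f"{MCP_SSE_URL}?toolId={','.join(tool_ids)}"


print(chained_mcp_url(["png-to-webp", "jpg-to-webp", "gif-to-webp"]))
# https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp
```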

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]