Large language models (LLMs) such as ChatGPT, Claude Cowork and GitHub Copilot have revolutionised the way individuals and organizations interact with artificial intelligence for content generation, coding assistance and collaborative work. At the core of these advancements lies the concept of tokenization — a fundamental process that dictates how user inputs are interpreted, processed and ultimately billed. Understanding tokenization is crucial for tech-savvy professionals seeking to optimise their usage, predict costs and appreciate the nuanced differences between leading AI platforms.
Understanding tokenization: Tokens versus words and sentences
Tokenization refers to the method by which LLMs break down text into smaller, more manageable units called tokens. Unlike words or sentences, tokens are not strictly defined by linguistic boundaries; rather, they are subunits that may represent a single character, a fragment of a word, an entire word or even punctuation marks.
For instance, the English word “unbelievable” might be split into tokens such as “un,” “believ” and “able,” depending on the underlying tokenizer. This approach allows models to handle a wider range of languages, complex vocabulary and even programming syntax with greater efficiency. Consequently, tokenization is more granular than word or sentence segmentation, enabling LLMs to manage context and meaning with remarkable flexibility.
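As a rough illustration, OpenAI’s open-source tiktoken library can show how one common encoding splits a word into subword pieces. The encoding name below is one illustrative choice; other models and platforms use their own vocabularies, so the actual split (and whether it matches the “un,” “believ,” “able” illustration above) will vary.

```python
# Minimal sketch using tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common OpenAI encoding; others differ

token_ids = enc.encode("unbelievable")
pieces = [enc.decode([tid]) for tid in token_ids]
print(token_ids)  # numeric token identifiers
print(pieces)     # the corresponding subword fragments
```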
Prompt input lifecycle: From user entry to model response
The journey of a prompt through an LLM begins when a user submits their input — be it a question, instruction or code snippet. This input is first processed by a tokenizer specific to the platform, which converts the raw text into a sequence of tokens. Each token is then assigned a unique identifier, forming a numerical representation of the prompt. The LLM receives this sequence and processes it using its neural architecture, which has been trained to predict the most probable next token based on the context provided by preceding tokens.
As the model processes the input, it generates a response token by token, constructing the output iteratively until a stop condition is met — such as reaching a maximum token limit or encountering an end-of-sequence marker. The resulting output is then detokenized, meaning the sequence of tokens is translated back into human-readable text before being presented to the user. Throughout this lifecycle, both the prompt and the generated response contribute to the total token count, which is central to calculating usage and costs.
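The lifecycle can be sketched in a few lines of Python. The toy_next_token function below is a deliberately trivial stand-in for the neural network (a real LLM predicts the most probable next token from the full context); the encoding name and token limit are illustrative assumptions, not any platform’s actual settings.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative choice of tokenizer

def toy_next_token(context_ids: list[int]) -> int:
    """Stand-in for the model: a real LLM predicts the next token from the
    context; this toy version immediately signals 'stop'."""
    return enc.eot_token

prompt = "Explain tokenization in one sentence."
prompt_ids = enc.encode(prompt)        # 1. tokenize: raw text -> token ids
output_ids: list[int] = []
MAX_NEW_TOKENS = 50                    # stop condition: maximum token limit

while len(output_ids) < MAX_NEW_TOKENS:
    next_id = toy_next_token(prompt_ids + output_ids)
    if next_id == enc.eot_token:       # stop condition: end-of-sequence marker
        break
    output_ids.append(next_id)

response = enc.decode(output_ids)      # 2. detokenize: token ids -> readable text
```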
Token consumption calculation: Measuring and charging usage
Token consumption is a critical metric for both users and providers of LLM services, as it directly impacts performance, cost and feasibility of large-scale deployments. Most platforms calculate token usage by summing the number of tokens in the prompt and the response. For example, if a user submits a prompt that tokenizes into 50 tokens and the model returns 100 tokens in its reply, the total consumption is 150 tokens for that interaction. This approach ensures that users are billed proportionally to the computational effort their queries require.
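A back-of-the-envelope cost estimate follows the same arithmetic. The per-token rates below are placeholders rather than real prices, and encoding_for_model simply selects the tokenizer tiktoken associates with a given OpenAI model name; consult your provider’s current price list for actual figures.

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # tokenizer tiktoken maps to this model

prompt = "List three ways subword tokenization helps multilingual models."
response = "It shares vocabulary across languages, handles rare words, and ..."

prompt_tokens = len(enc.encode(prompt))
response_tokens = len(enc.encode(response))
total_tokens = prompt_tokens + response_tokens

# Placeholder per-token rates purely for illustration.
INPUT_RATE = 0.00003
OUTPUT_RATE = 0.00006

cost = prompt_tokens * INPUT_RATE + response_tokens * OUTPUT_RATE
print(f"{prompt_tokens} prompt + {response_tokens} response = {total_tokens} tokens")
print(f"Estimated charge: ${cost:.5f}")
```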
The granularity of tokenization means that the same phrase may yield different token counts depending on the language, punctuation or even the specific tokenizer algorithm in use. As such, users may notice slight variations in token consumption when interacting with different models or platforms, even when submitting identical prompts. Understanding these nuances allows professionals to craft more efficient queries and better estimate their usage.
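The effect is easy to demonstrate with two encodings that ship with tiktoken. They serve here only as stand-ins: Anthropic and other vendors use their own tokenizers, which are not exposed through tiktoken, so counts for the same sentence can differ again on those platforms.

```python
import tiktoken

phrase = "Tokenization is more granular than word or sentence segmentation!"

for name in ("p50k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(phrase))} tokens")
```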
Platform comparisons: ChatGPT, Claude Cowork and GitHub Copilot
While the foundational process of tokenization is conceptually similar across platforms, each service employs its own implementation and optimizations. ChatGPT, developed by OpenAI, uses a byte pair encoding (BPE)-based tokenizer, which splits text into subword units to balance efficiency and vocabulary coverage. Token limits per interaction and billing structures are well documented, allowing users to predict consumption with reasonable accuracy.
Claude Cowork, powered by Anthropic’s Claude model, also relies on a subword tokenization method but may use a different variant of BPE or a unique algorithm tailored to its training data. The specifics of token segmentation and consumption calculation can thus differ slightly from OpenAI’s approach. Claude Cowork often emphasizes safety and context retention, which may influence how prompts are broken down and processed, potentially leading to distinct token counts for similar inputs.
GitHub Copilot, designed primarily as a coding assistant, leverages the Codex model, a derivative of OpenAI’s GPT architecture. Its tokenizer is optimised for programming languages, enabling it to handle code syntax, indentation and comments with high fidelity. As a result, tokenization in Copilot is particularly sensitive to code structure, and token consumption may spike with verbose or complex code snippets. Additionally, Copilot’s integration within development environments means that token usage is often abstracted from the user, though underlying billing and performance considerations remain consistent with LLM principles.
Comparing various features of popular Generative AI solutions
| Feature | ChatGPT | Claude Cowork | GitHub Copilot |
| --- | --- | --- | --- |
| Token consumption | Employs a byte pair encoding (BPE) tokenizer, breaking text into subword units. Token usage is transparent, with well-documented limits per interaction. | Relies on subword tokenization, possibly with a unique algorithm tailored to its model. Token segmentation and counts may differ slightly, with a focus on safety and context retention. | Its tokenizer is optimised for code and sensitive to programming syntax and structure. Token usage can rise with complex code and is generally abstracted from the user. |
| Cost per prompt | Transparent pricing based on token count, allowing for easy estimation of costs with each prompt. | Charges are based on token consumption, though the billing structure may vary slightly due to algorithmic differences. | Costs are linked to underlying token use, but users typically see subscription-based pricing rather than per-prompt charges. |
| Variety of models available | Offers several model versions (e.g., GPT-3.5, GPT-4), catering to different needs for accuracy and efficiency. | Presents model options within the Claude family, with configurations aimed at collaborative and secure use cases. | Primarily uses Codex, a GPT-based model fine-tuned for code, with updates released periodically. |
| User experience | Designed for general queries and conversations, offering a predictable and straightforward experience. | Focuses on a collaborative workspace, emphasising safety, extended context and team-oriented workflows. | Integrated directly into code editors, providing real-time code suggestions with minimal disruption to development flow. |
| Licence cost | Usually subscription-based, with options for free tiers and paid plans depending on usage volume. | May offer both individual and enterprise licensing, tailored to collaborative environments. | Charged as a monthly or annual subscription, often with a free trial for initial usage. |
| Additional notable features | Provides API access, with extensive documentation and support for integration into various applications. | Emphasizes ethical responses and safety in outputs, supporting longer context windows for complex tasks. | Specialised for software development tasks, with deep integration in popular IDEs and support for multiple programming languages. |
Each of these platforms is crafted to meet specific user needs, and their approaches to tokenization, billing and user interaction reflect their primary audiences. Whether one is seeking clarity in cost and usage, collaborative features or seamless coding assistance, understanding these distinctions can help users select the platform that best fits their requirements.
In summary, while all three platforms convert prompts into tokens using subword or character-based algorithms, the specifics of tokenization, usage calculation and processing are shaped by their respective target audiences and applications. ChatGPT offers transparency and predictability for general-purpose queries, Claude Cowork tailors its approach for collaborative and secure interactions, and GitHub Copilot optimizes for code-centric workloads.
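To make the point about code-centric workloads concrete, the sketch below counts tokens for a compact and a deliberately verbose version of the same trivial function. An OpenAI encoding is used as a stand-in, since Copilot’s internal tokenizer is not publicly exposed; the exact numbers are therefore only indicative.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # stand-in for Copilot's internal tokenizer

compact = "def add(a, b):\n    return a + b\n"
verbose = (
    "# This function is designed to accept two numeric arguments named a and b\n"
    "# and to return the arithmetic sum of those two arguments to the caller.\n"
    "def add_two_numbers_together(first_number, second_number):\n"
    "    result_of_addition = first_number + second_number\n"
    "    return result_of_addition\n"
)

print("compact:", len(enc.encode(compact)), "tokens")
print("verbose:", len(enc.encode(verbose)), "tokens")
```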
Best practices in token optimization
Effective token optimization is essential for maximising the value and efficiency of interactions with advanced LLM platforms. By carefully considering how prompts are structured and processed, users can reduce unnecessary token consumption, streamline responses and ultimately lower costs. Below, we explore practical strategies and examples for optimising tokens in GitHub Copilot, Claude Cowork and ChatGPT.
With GitHub Copilot, developers should aim to write concise code comments and avoid overly verbose explanations within prompts. For instance, rather than elaborating every requirement, providing clear, targeted instructions—such as “generate a Python function to sort a list”—can produce accurate results while minimising token usage. Additionally, breaking down complex tasks into smaller, manageable prompts helps maintain clarity and reduces the likelihood of excessive token consumption.
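The difference is easiest to see in the prompts themselves. The snippet below contrasts a concise, Copilot-style comment with a verbose alternative; the function name and wording are illustrative only, not output from any particular assistant.

```python
# Concise, targeted instruction: usually enough for a code assistant.
# generate a Python function to sort a list

# Verbose alternative (more tokens, rarely more accurate):
# "Please could you write for me a function that accepts a list containing
#  numbers of any kind and then returns that same list but sorted from the
#  smallest value up to the largest value."

def sort_numbers(values: list[float]) -> list[float]:
    """Return the values sorted in ascending order."""
    return sorted(values)
```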
For collaborative platforms like Claude Cowork, it is beneficial to tailor prompts to the specific context and participants. Using succinct language and focusing on actionable requests ensures that token usage is distributed efficiently during team discussions. For example, instead of a lengthy background, stating “summarise today’s meeting notes for the project” provides precise guidance and optimizes the response length.
When engaging with ChatGPT, users should avoid redundant phrasing and combine related queries into single prompts where feasible. By framing questions such as “What are the key features of platform X?” instead of listing multiple isolated questions, users can obtain comprehensive answers in fewer tokens. Employing bullet points or numbered lists within prompts can also help clarify requirements and reduce ambiguity.
Across all platforms, reviewing prompt history and analysing token consumption patterns can lead to more strategic usage. By leveraging platform-specific documentation and tools, users can refine their approach and develop prompt templates that consistently yield efficient results. Ultimately, mindful prompt engineering and a clear understanding of platform behaviour are key to achieving optimal token utilization in LLM workflows.
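One practical way to apply this is a small helper that counts the tokens in a filled-in prompt template before it is submitted. The template text and the encoding choice below are illustrative assumptions; swap in whatever tokenizer matches the platform you actually use.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: an OpenAI-style encoding

TEMPLATE = "Summarise today's meeting notes for the {project} project."

def estimate_prompt_tokens(template: str, **fields: str) -> int:
    """Fill a prompt template and return its token count before it is sent."""
    return len(enc.encode(template.format(**fields)))

print(estimate_prompt_tokens(TEMPLATE, project="Apollo"))
```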
Conclusion
A thorough understanding of tokenization and token consumption is indispensable for professionals engaging with advanced LLM platforms. Recognizing that tokenization operates at a level finer than words or sentences enables users to craft more efficient prompts and anticipate usage costs with greater accuracy. While the lifecycle from prompt input to model response shares commonalities across ChatGPT, Claude Cowork and GitHub Copilot, platform-specific differences in tokenization algorithms and application focus lead to distinct user experiences. By staying informed about these processes, users can make more strategic choices, optimise their workflows and fully leverage the capabilities of modern language models.
This article was made possible by our partnership with the IASA Chief Architect Forum. The CAF’s purpose is to test, challenge and support the art and science of Business Technology Architecture and its evolution over time as well as grow the influence and leadership of chief architects both inside and outside the profession. The CAF is a leadership community of the IASA, the leading non-profit professional association for business technology architects.
This article is published as part of the Foundry Expert Contributor Network.