Context Compaction
Forge includes automatic context management that keeps long AI conversations within model token limits while preserving important information.
What is Context Compaction?
As conversations with AI agents grow longer, they can exceed token limits and become inefficient. Context compaction automatically summarizes older parts of conversations when they reach configurable thresholds, allowing you to maintain longer, more productive interactions without hitting model context limits.
Key benefits include:
- Extended Conversations: Continue conversations beyond normal token limits
- Optimized Performance: Reduce token usage and improve response times
- Preserved Context: Keep critical information while summarizing less important details
- Cost Efficiency: Reduce token usage in API calls
How It Works
When a conversation reaches the configured token threshold:
- The system identifies which messages to preserve (based on the retention window)
- Older messages are sent to the configured summarization model
- A concise summary replaces the older messages
- New conversation turns continue with the summarized context
This process happens automatically and transparently to the user.
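In outline, the retention window splits the history into an older portion that gets summarized and a recent portion that is kept verbatim. The sketch below illustrates that flow only; it is not Forge's actual implementation, and the token counting and summarization calls are stand-ins for the real tokenizer and model call.

```python
# Illustrative sketch of threshold-triggered compaction -- not Forge's source code.

def count_tokens(messages):
    # Stand-in: a real implementation would use the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages, max_tokens):
    # Stand-in for a call to the configured summarization model,
    # which would be asked to keep its summary under max_tokens.
    text = " ".join(m["content"] for m in messages)
    return {"role": "assistant", "content": f"[summary] {text[:400]}"}

def compact(messages, token_threshold, retention_window, max_tokens):
    if count_tokens(messages) < token_threshold or len(messages) <= retention_window:
        return messages                        # below threshold: nothing to compact yet
    older = messages[:-retention_window]       # summarized
    recent = messages[-retention_window:]      # preserved verbatim
    return [summarize(older, max_tokens), *recent]

# Example call: compact(history, token_threshold=80000, retention_window=6, max_tokens=2000)
```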
Configuration Options
Add the following to your forge.yaml file under an agent configuration:
```yaml
agents:
  - id: assistant
    model: anthropic/claude-3.5-sonnet
    compact:
      max_tokens: 2000                     # Maximum tokens for the summary
      token_threshold: 80000               # When to trigger compaction
      model: google/gemini-2.0-flash-001   # Model to use for summarization
      retention_window: 6                  # Recent messages to preserve
      prompt: "{{> system-prompt-context-summarizer.hbs }}" # Optional custom prompt
```
Configuration Parameters
Parameter | Required | Description |
---|---|---|
max_tokens | Yes | Maximum token count for the generated summary |
token_threshold | Yes | Conversation token count that triggers compaction |
model | Yes | AI model to use for generating the summary |
retention_window | No | Number of recent messages to preserve unchanged |
prompt | No | Custom prompt template for summarization |
Best Practices
Selecting Appropriate Thresholds
Set token_threshold based on your model's context window size. For example:
- For Claude 3.7 Sonnet (~200K token window): 150,000 to 180,000 tokens
- For Claude 3.5 Haiku (~200K token window): 120,000 to 160,000 tokens (see the configuration sketch below)
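In configuration terms, an agent on a ~200K-token model (such as the one in the earlier example) might target roughly 80% of the window. The exact values below are illustrative choices, not Forge defaults:

```yaml
agents:
  - id: assistant
    model: anthropic/claude-3.5-sonnet
    compact:
      token_threshold: 160000   # ~80% of a 200K context window (illustrative)
      max_tokens: 2000
      model: google/gemini-2.0-flash-001
```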
Choosing Summarization Models
For the summarization model, balance speed and quality:
- Fast models (like Gemini Flash) provide quicker summaries
- More powerful models may provide better context preservation but take longer
Retention Window Considerations
The retention window controls how many recent messages are preserved verbatim:
- Larger windows maintain more recent context but reduce compaction efficiency
- Smaller windows allow for more aggressive compaction but may lose recent details
- A typical value is 6-10 messages
Customizing the Summarization Prompt
If you supply a custom prompt, the template receives the conversation history and should instruct the model on how to create an effective summary.
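A minimal sketch of a custom prompt is shown below. It assumes the prompt field accepts an arbitrary template string; the default configuration references a Handlebars partial (system-prompt-context-summarizer.hbs) instead, so adapt this to how your templates are organized:

```yaml
compact:
  # Hypothetical inline prompt (assumption); the default uses the
  # "{{> system-prompt-context-summarizer.hbs }}" partial instead.
  prompt: >
    Summarize the conversation so far. Preserve file paths, error messages,
    commands that were run, and decisions that were made.
```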
Example Use Cases
Long Debugging Sessions
When debugging complex issues, conversations can become lengthy. Context compaction allows the agent to remember key debugging steps while summarizing earlier diagnostics.
Multi-Stage Project Development
For projects developed over multiple sessions, context compaction enables the agent to maintain awareness of project requirements and previous decisions while focusing on current tasks.
Interactive Learning and Tutorials
When using Forge for learning or following tutorials, compaction helps maintain the thread of the lesson while summarizing earlier explanations.
Performance Considerations
Context compaction runs asynchronously to minimize impact on response times. However, consider these performance factors:
- Summarization Latency: More powerful summarization models may take longer
- Summary Quality vs. Speed: Balance between fast models and high-quality summaries
- Token Thresholds: Lower thresholds trigger more frequent compaction
- Retention Window Size: Larger windows preserve more context but reduce compaction efficiency
Troubleshooting
Issue: Context Seems Lost After Compaction
- Increase max_tokens to allow for more detailed summaries
- Use a more capable summarization model
- Increase the retention window to preserve more recent messages
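Combining these adjustments, the compact block for the agent might look like this (the values and the choice of summarizer are illustrative starting points, not project recommendations):

```yaml
compact:
  max_tokens: 4000                     # allow more detailed summaries
  token_threshold: 80000
  model: anthropic/claude-3.5-sonnet   # a more capable summarizer (example choice)
  retention_window: 10                 # preserve more recent messages verbatim
```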
Issue: Slow Responses After Threshold is Reached
- Choose a faster summarization model
- Reduce the token threshold to trigger earlier compaction
- Consider lowering max_tokens if full context isn't critical
Related Features
- Agent Configuration - Learn about other agent configuration options
- Operation Modes - Understand how context works in different operation modes
- Tools Reference - Explore the tools agents use to interact with your system
By effectively using context compaction, you can maintain longer, more productive AI conversations while optimizing for performance and cost efficiency.