Hebo Eval CLI
Hebo Eval is a command-line tool for evaluating and testing language models. It runs test conversations against your AI agents, scores each response, and reports the results.
Installation
npm install -g hebo-eval@latest
Basic Usage
hebo-eval run <agent> [options]
Command Options
Option | Description | Default |
---|---|---|
-d, --directory <path> | Directory containing test cases | ./examples |
-c, --config <path> | Path to configuration file | - |
-t, --threshold <number> | Score threshold for passing (0-1) | 0.8 |
-f, --format <format> | Output format (json, markdown, or text) | text |
-s, --stop-on-error | Stop processing on first error | false |
-m, --max-concurrency <number> | Maximum number of concurrent test executions | 5 |
-v, --verbose | Show detailed output for all test cases | false |
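To illustrate how --threshold and --max-concurrency could work together, here is a minimal Python sketch. It is not Hebo Eval's implementation: run_all and score_fn are hypothetical names, and treating a score equal to the threshold as a pass is an assumption, since the table does not say whether the comparison is inclusive.

```python
import asyncio


async def run_all(cases, score_fn, threshold=0.8, max_concurrency=5):
    """Score every case with at most max_concurrency running at once."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def run_one(case):
        async with limiter:  # waits while max_concurrency cases are in flight
            score = await score_fn(case)
        # Assumption: a score equal to the threshold counts as a pass.
        return {"case": case, "score": score, "passed": score >= threshold}

    # gather() returns results in the same order as the input cases.
    return await asyncio.gather(*(run_one(case) for case in cases))
```

With the default threshold of 0.8, a response scored 0.398 would be reported as failed.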
Configuration
You can configure Hebo Eval in two ways:
- Environment Variables:
export HEBO_API_KEY=your_api_key_here
export OPENAI_API_KEY=your_openai_key_here
export HEBO_EMBEDDING_API_KEY=your_embedding_key_here
- Configuration File: Create a YAML configuration file (e.g., hebo-evals.config.yaml):
# Hebo Eval Configuration Template
# Copy this file to hebo-evals.config.yaml and replace the values with your own

# Provider configurations
providers:
  openai:
    provider: openai
    baseUrl: https://api.openai.com/v1
    apiKey: ${OPENAI_API_KEY}  # Will be replaced with the value of OPENAI_API_KEY environment variable
    authHeader:
      name: Authorization
      format: Bearer ${OPENAI_API_KEY}  # Can use the same environment variable multiple times
  hebo:
    provider: hebo
    baseUrl: https://app.hebo.ai
    apiKey: ${HEBO_API_KEY}  # Will be replaced with the value of HEBO_API_KEY environment variable
    authHeader:
      name: Authorization
      format: Bearer ${HEBO_API_KEY}

# Default provider to use if not specified in the command
defaultProvider: hebo

# Embedding configuration
embedding:
  provider: hebo
  model: hebo-embeddings
  baseUrl: https://api.hebo.ai/v1
  apiKey: ${HEBO_EMBEDDING_API_KEY}  # Can reuse the same environment variable
Note: The configuration file supports environment variable substitution using ${VARIABLE_NAME} syntax. This allows you to keep sensitive information like API keys in environment variables while referencing them in your configuration file.
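The substitution can be pictured with a short Python sketch. This is an illustration only, not the tool's code: substitute_env is a hypothetical helper, and raising an error for an unset variable is an assumption, since the source does not say what Hebo Eval does in that case.

```python
import os
import re

# Matches ${NAME}, where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env=None):
    """Replace every ${NAME} in text with the value of NAME from env."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable is an error rather than an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, text)
```

For example, substitute_env("Bearer ${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-test"}) yields "Bearer sk-test", and the same variable can appear any number of times in one file.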
Configuration Template
Hebo Eval provides a boilerplate configuration template called hebo-evals.config.yaml. This template includes:
- Multiple Provider Support: Configure both OpenAI and Hebo providers
- Environment Variable Integration: Use ${VARIABLE_NAME} syntax for secure API key management
- Flexible Authentication: Support for different authentication header formats
- Embedding Configuration: Separate configuration for embedding models
To use the template:
- Copy the template to your project directory:
cp hebo-evals.config.yaml ./your-project/
- Set up your environment variables:
export OPENAI_API_KEY=your_openai_key_here
export HEBO_API_KEY=your_hebo_key_here
export HEBO_EMBEDDING_API_KEY=your_embedding_key_here
- Customize the configuration as needed for your specific use case.
Security Note: Never commit your actual configuration file with real API keys to version control. Always use environment variables for sensitive information.
Provider Mapping Logic
When running Hebo Eval, the provider is automatically determined based on the model or agent name you specify in the command. The tool uses pattern matching to map the model/agent name to the appropriate provider:
- OpenAI: Any model name that starts with gpt- (e.g., gpt-3.5-turbo, gpt-4) is mapped to the OpenAI provider.
- Hebo: Any model name that contains a colon (:) (e.g., gato-qa:v1) is mapped to the Hebo provider.
- Anthropic: Any model name that starts with claude- (e.g., claude-2, claude-instant) is mapped to the Anthropic provider.
- Custom: Any model name that starts with custom- is mapped to the Custom provider.
This mapping allows you to simply specify the model or agent name when running evaluations, and Hebo Eval will automatically select the correct provider configuration based on these naming conventions.
Example:
hebo-eval run gpt-4 # Uses OpenAI provider
hebo-eval run gato-qa:v1 # Uses Hebo provider
hebo-eval run claude-2 # Uses Anthropic provider (future support)
hebo-eval run custom-foo # Uses Custom provider (future support)
Note: Currently, Hebo Eval only supports OpenAI and Hebo as providers. Support for additional providers such as Anthropic and Custom is planned for future releases.
If you need to override the default mapping, you can specify the provider explicitly in your configuration file or command options.
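The naming conventions in this section can be sketched as a small Python function. provider_for is a hypothetical name, and the order of the checks is an assumption: the source does not say which rule wins when a name matches more than one (for instance, a gpt- name that also contains a colon).

```python
def provider_for(model_name):
    """Map a model or agent name to a provider using the documented patterns."""
    # Assumption: prefix rules are checked before the colon rule.
    if model_name.startswith("gpt-"):
        return "openai"
    if model_name.startswith("claude-"):
        return "anthropic"  # documented as planned, not yet supported
    if model_name.startswith("custom-"):
        return "custom"  # documented as planned, not yet supported
    if ":" in model_name:
        return "hebo"
    raise ValueError(f"no provider pattern matches {model_name!r}")
```

Under these assumptions, provider_for("gpt-4") returns "openai" and provider_for("gato-qa:v1") returns "hebo".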
Test Cases
Test cases are written as plain-text conversations. A single file can hold several test cases, and files can be grouped into subdirectories.
Test Case Structure
- Multiple Test Cases in One File:
  - Test cases can be defined in a single file, separated by ---
  - Each test case starts with a title using # Test Case Name format
- Directory Organization:
  - Test cases can be organized in subdirectories
  - All test cases in subdirectories are automatically discovered and executed
Test Case Format
# Basic Conversation Test
System: You are a helpful assistant that specializes in customer service.
User: Hi there!
Assistant: Hello! How can I assist you today?
---
# Weather Query Test
System: You are a weather assistant that provides detailed weather information.
User: Could you check the current weather in New York for me?
Assistant: It's rainy in New York today with a temperature of 59°F. There's an 80% chance of rain, high humidity (96%), and a light breeze at 2 mph. You might want to bring an umbrella!
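To show how a file in this format breaks down into separate cases, here is a minimal Python sketch. It is an illustration under stated assumptions, not the tool's parser: parse_test_cases is a hypothetical name, role markers are matched case-insensitively, and lines that do not start with a role marker are ignored.

```python
import re

# Role markers appear both capitalized ("User:") and lowercase ("user:")
# in the examples, so matching is case-insensitive here (an assumption).
ROLE_LINE = re.compile(r"^(system|user|assistant):\s*(.*)$", re.IGNORECASE)


def parse_test_cases(text):
    """Split a test file into cases of the form {"title": ..., "messages": [...]}."""
    cases = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---":
            current = None  # separator: the next title opens a new case
        elif line.startswith("# "):
            current = {"title": line[2:].strip(), "messages": []}
            cases.append(current)
        elif current is not None:
            match = ROLE_LINE.match(line)
            if match:
                role, content = match.group(1).lower(), match.group(2)
                current["messages"].append({"role": role, "content": content})
            # Other lines (blank or continuation) are ignored in this sketch.
    return cases
```

Applied to the two-case file above, this would return two entries titled "Basic Conversation Test" and "Weather Query Test", each with a system, user, and assistant message.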
Special Characters and Multiline Messages
When writing test cases, you need to know how the parser treats special characters and multiline messages. The rules are described below.
Special Characters
The following characters have special meaning in test cases:
Character | Usage | Example |
---|---|---|
: | Role marker delimiter | user: , assistant: , system: |
# | Test case title | # My Test Case |
--- | Test case separator | Used between test cases |
Escaping Special Characters
To include literal special characters in your messages, you can use them directly in the message content. The parser will only interpret these characters as special when they appear in specific contexts:
- : is only special when it appears after a role marker
- # is only special when it appears at the start of a line
- --- is only special when it appears on its own line
Examples:
# Special Characters Example
user: The price is $10:50
assistant: That's correct! The colon (:) is just part of the price.
user: Here's a markdown heading: # Important Note
assistant: The # symbol is just part of the text here.
user: The separator looks like this: ---
assistant: Yes, that's just three hyphens in the text.
Multiline Message Format
Hebo Eval supports two styles of multiline messages:
- Indented Style:
user: This is a multiline message
  that continues on the next line
  with proper indentation
- Non-indented Style:
user: This is another multiline message
that continues on the next line
without indentation
Both styles are valid and will be parsed correctly. Choose the style that best fits your needs.
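One way to picture why both styles parse the same is to treat any line without a role marker as a continuation of the previous message. The Python sketch below does this. fold_messages is a hypothetical helper, and joining continuation lines with a newline after stripping indentation is an assumption about behavior the source does not spell out.

```python
import re

ROLE_LINE = re.compile(r"^(system|user|assistant):\s*(.*)$", re.IGNORECASE)


def fold_messages(lines):
    """Group the lines of one test case into (role, content) pairs."""
    messages = []
    for raw in lines:
        line = raw.strip()  # indentation is dropped, so both styles look alike
        if not line:
            continue
        match = ROLE_LINE.match(line)
        if match:
            messages.append([match.group(1).lower(), match.group(2)])
        elif messages:
            # No role marker: append to the message that is currently open.
            messages[-1][1] += "\n" + line
    return [tuple(message) for message in messages]
```

A limitation of this sketch is that a continuation line which itself begins with a role name and a colon would be read as a new message.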
Directory Structure Example
tests/
├── basic/
│ ├── conversations.txt
│ └── simple_queries.txt
├── advanced/
│ ├── tool_usage.txt
│ └── complex_scenarios.txt
└── main.txt
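Automatic discovery over a tree like this one can be sketched in Python with a recursive glob. discover_test_files is a hypothetical name, and both the .txt pattern (taken from the example tree) and the sorted order are assumptions rather than documented behavior.

```python
from pathlib import Path


def discover_test_files(root, pattern="*.txt"):
    """Return every file under root, at any depth, whose name matches pattern."""
    return sorted(path for path in Path(root).rglob(pattern) if path.is_file())
```

For the tree above, discover_test_files("tests") would list the four files in the subdirectories along with tests/main.txt.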
Output Format
The tool supports three output formats with different verbosity levels:
Default Output (Concise)
Passed examples/more tests/test/Silly math
Passed examples/example/First Test Case
Passed examples/more tests/test/Math
Passed examples/math/math
Passed examples/example/Second Test Case
Passed examples/example/Third Test Case
Failed examples/stocks/stocks
Passed examples/news/news
Passed examples/translation/translation
Passed examples/weather/weather
Failed Test Details
=================
examples/stocks/stocks
Status: Failed
Score: 0.398
Time: 16694.51ms
Input:
user: what's the current price of Apple stock?
assistant: I'll check the current stock price
Apple's stock (AAPL) is currently trading at USD175.25, up 2.3 percent today
user: can you write that again in simple terms?
Expected Output:
assistant: something something someting in simple terms
Actual Response:
Sure, I can rephrase that in simpler terms:
Apple's shares cost $175.25 each right now. The price went up a bit today.
Error:
Response mismatch
Test Summary
============
Total: 10
Passed: 9
Failed: 1
Duration: 50.54s
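The summary block follows directly from the per-test results. As a rough Python sketch (summarize and the shape of each result dictionary are hypothetical, not taken from the tool):

```python
def summarize(results, duration_seconds):
    """Render a summary block from a list of {"passed": bool} results."""
    passed = sum(1 for result in results if result["passed"])
    heading = "Test Summary"
    return "\n".join([
        heading,
        "=" * len(heading),
        f"Total: {len(results)}",
        f"Passed: {passed}",
        f"Failed: {len(results) - passed}",
        f"Duration: {duration_seconds:.2f}s",
    ])
```

Nine passing results, one failing result, and a duration of 50.54 seconds reproduce the summary shown above.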
Example Usage
- Basic Evaluation:
hebo-eval run gato-qa:v1
- Custom Directory and Format:
hebo-eval run gato-qa:v1 -d ./my-tests -f markdown
- With Configuration File:
hebo-eval run gato-qa:v1 -c ./hebo-evals.config.yaml
- Custom Threshold and Concurrency:
hebo-eval run gato-qa:v1 -t 0.5 -m 10
- Verbose Output:
hebo-eval run gato-qa:v1 -v
Best Practices
- Always set up your API keys using environment variables for security
- Use the provided hebo-evals.config.yaml template as a starting point
- Start with a small test set before running large evaluations
- Use descriptive test case titles with the # format
- Organize test cases in subdirectories for better management
- Keep your configuration file secure and never commit API keys to version control
- Use the -v flag when debugging test failures
- Leverage environment variable substitution in your configuration for better security
Troubleshooting
If you encounter the “HEBO_API_KEY is required” error:
- Verify your environment variables:
export HEBO_API_KEY=your_api_key_here
- Or use a configuration file:
hebo-eval run <agent> --config path/to/hebo-evals.config.yaml
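Before starting a long run, it can help to check which keys are missing. The Python sketch below is a hypothetical pre-flight helper, not part of Hebo Eval. Treating an empty value the same as an unset one is an assumption.

```python
import os


def missing_variables(names, env=None):
    """Return the names that are unset or empty, in the order given."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, missing_variables(["HEBO_API_KEY", "OPENAI_API_KEY", "HEBO_EMBEDDING_API_KEY"]) returns an empty list when all three variables used in this document are set.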