NOMOS brings software engineering best practices to AI agent development, making it possible to apply traditional testing methodologies to ensure your agents behave reliably and predictably.
NOMOS supports two primary approaches for writing tests:
```yaml
# YAML Test Configuration: Define tests declaratively using YAML files
llm:
  provider: openai
  model: gpt-4o-mini

unit:
  test_greeting_response:
    input: ""
    expectation: "Greets the user warmly and asks how to help"

  test_order_taking_with_context:
    context:
      current_step_id: "take_order"
      history:
        - type: summary
          summary:
            - "Customer expressed interest in ordering coffee"
            - "Agent moved to order-taking step"
    input: "I'd like a large latte"
    expectation: "Acknowledges the order and asks for any additional items"

  test_invalid_transition:
    context:
      current_step_id: "greeting"
    input: "Process my payment"
    expectation: "Explains that payment processing comes after order confirmation"
    invalid: true  # This test expects the agent to NOT transition inappropriately
```
When using the Pythonic approach, NOMOS provides special pytest features:
```python
# AI-Powered Test Validation: Use smart_assert for natural language test validation
import pytest


def test_tool_call_validation(agent: Agent):
    """Test that agent makes correct tool calls."""
    decision, _, _ = agent.next("I want to calculate my budget for $5000 income")

    # Traditional assertion
    assert decision.action.value == "TOOL_CALL"
    assert decision.tool_call.tool_name == "calculate_budget"

    # Smart assertion using natural language
    smart_assert(
        decision,
        "Calls the calculate_budget tool with monthly income of 5000",
        agent.llm
    )

    # You can also check negative cases
    with pytest.raises(AssertionError):
        smart_assert(
            decision,
            "Responds with text instead of calling a tool",
            agent.llm
        )
```
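Under the hood, an assertion like `smart_assert` works by asking a judge LLM whether the agent's decision satisfies a natural-language expectation. The following is an illustrative, framework-agnostic sketch of that idea (the function and its prompt are hypothetical, not the NOMOS implementation):

```python
# Illustrative LLM-as-judge assertion, NOT the NOMOS implementation.
# The judge LLM is shown the decision and the expectation, and must
# answer "yes" or "no"; a "no" becomes an AssertionError.

def smart_assert_sketch(decision, expectation, llm):
    """Raise AssertionError unless the judge LLM says the decision matches."""
    prompt = (
        f"Agent decision: {decision!r}\n"
        f"Expectation: {expectation}\n"
        "Answer 'yes' if the decision satisfies the expectation, otherwise 'no'."
    )
    verdict = llm(prompt)  # llm: Callable[[str], str]; the real API differs
    if not verdict.strip().lower().startswith("yes"):
        raise AssertionError(f"Expectation not met: {expectation}")

# Usage with a stubbed judge that always approves:
always_yes = lambda prompt: "yes"
smart_assert_sketch({"action": "TOOL_CALL"}, "Calls a tool", always_yes)
```

Because the judge is just another LLM call, these assertions are non-deterministic; keep expectations short and unambiguous to minimize flaky verdicts.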
Testing Tool Integration:

```yaml
test_tool_integration:
  context:
    current_step_id: "check_inventory"
  input: "Do you have medium lattes available?"
  expectation: "Uses get_available_coffee_options tool and provides accurate availability"
```
Testing Step Transitions:
```yaml
test_step_routing:
  context:
    current_step_id: "order_complete"
  input: "Thank you, goodbye"
  expectation: "Transitions to farewell step and thanks customer"
```
Testing Error Handling:
```yaml
test_invalid_input:
  context:
    current_step_id: "payment"
  input: "banana helicopter"
  expectation: "Asks for clarification about payment method"
  invalid: true
```
```bash
# YAML Configuration Testing: Use the NOMOS CLI to run YAML-defined tests

# Run all tests for an agent
nomos test --config config.agent.yaml --tests tests.agent.yaml

# Run specific test cases
nomos test --config config.agent.yaml --tests tests.agent.yaml --filter "test_greeting"

# Run tests with verbose output
nomos test --config config.agent.yaml --tests tests.agent.yaml --verbose

# Generate test coverage report
nomos test --config config.agent.yaml --tests tests.agent.yaml --coverage
```
Create tests for each step’s specific behavior and available tools
```yaml
# Test greeting step
test_greeting:
  context:
    current_step_id: "greeting"
  input: "Hello"
  expectation: "Warm greeting and explanation of available services"

# Test order step
test_order_taking:
  context:
    current_step_id: "take_order"
  input: "I want a latte"
  expectation: "Uses menu tools and confirms order details"
```
Verify that tools are called correctly with proper parameters
```yaml
test_tool_parameters:
  context:
    current_step_id: "add_item"
  input: "Add a large cappuccino to my order"
  expectation: "Calls add_to_cart with coffee_type='Cappuccino', size='Large'"
```
While unit testing validates individual steps, end-to-end (E2E) testing validates complete user scenarios from start to finish. NOMOS provides scenario-based testing to simulate real user interactions.
E2E tests use scenarios that describe complete user journeys:
```yaml
# YAML Scenario Definition: Define scenarios declaratively
llm:
  provider: openai
  model: gpt-4o-mini

scenarios:
  complete_coffee_order:
    scenario: "New customer wants to order a medium cappuccino with an extra shot and pay by card"
    expectation: "Agent should greet, show menu, take order, confirm details, and process payment"
    max_turns: 15

  handle_unavailable_item:
    scenario: "Customer orders an item that's not available and needs an alternative"
    expectation: "Agent politely explains unavailability and suggests alternatives"
    max_turns: 8
```
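Conceptually, a scenario runner alternates between a simulated user and the agent under test, stopping when the expectation is met or the `max_turns` budget runs out. A minimal sketch of that loop, using stub callables rather than the NOMOS API (all names here are illustrative):

```python
# Conceptual sketch of scenario-based E2E testing -- NOT the NOMOS
# implementation. A simulated user and the agent alternate turns
# until the expectation is satisfied or max_turns is exhausted.

def run_scenario(agent_reply, user_reply, is_done, max_turns):
    """Alternate user/agent turns; return (transcript, success)."""
    transcript = []
    user_msg = user_reply(transcript)          # opening user message
    for _ in range(max_turns):
        agent_msg = agent_reply(user_msg)
        transcript.append((user_msg, agent_msg))
        if is_done(transcript):                # expectation satisfied?
            return transcript, True
        user_msg = user_reply(transcript)
    return transcript, False                   # turn budget exhausted

# Stubbed example: the "agent" completes the order on the third turn.
agent = lambda msg: "order complete" if msg == "pay by card" else "ok"
script = iter(["I'd like a cappuccino", "extra shot", "pay by card"])
user = lambda transcript: next(script)
done = lambda transcript: transcript[-1][1] == "order complete"
transcript, ok = run_scenario(agent, user, done, max_turns=5)
```

The `max_turns` cap in the YAML scenarios plays the same role as the loop bound here: it keeps a misbehaving agent from running the simulated conversation forever.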
Define E2E test scenarios in your test configuration:
```yaml
# tests.agent.yaml
llm:
  provider: openai
  model: gpt-4o-mini

scenarios:
  complete_coffee_order:
    scenario: "New customer wants to order a medium cappuccino with an extra shot and pay by card"
    expectation: "Agent should greet, show menu, take order, confirm details, and process payment"
    max_turns: 15

  handle_unavailable_item:
    scenario: "Customer orders an item that's not available and needs an alternative"
    expectation: "Agent politely explains unavailability and suggests alternatives"
    max_turns: 8

  complex_multi_item_order:
    scenario: "Customer orders multiple different drinks with modifications for a group"
    expectation: "Agent accurately captures all items and modifications, confirms total"
    max_turns: 20
```
```bash
# Run all E2E scenarios
nomos test --e2e ./e2e_tests.yaml

# Run specific scenario
nomos test --e2e ./e2e_tests.yaml --scenario complete_coffee_order

# Run with detailed output
nomos test --e2e ./e2e_tests.yaml --verbose
```
Here’s how the barista agent might be tested using both approaches:
```yaml
# YAML Test Configuration: Declarative test definitions
llm:
  provider: openai
  model: gpt-4o-mini

unit:
  test_greeting_new_customer:
    input: "Hi there"
    expectation: "Greets warmly and offers to show menu or take order"

  test_menu_inquiry:
    context:
      current_step_id: "start"
    input: "What drinks do you have?"
    expectation: "Uses get_available_coffee_options and lists available drinks with prices"

  test_add_to_cart:
    context:
      current_step_id: "take_coffee_order"
    input: "I'll have a large latte"
    expectation: "Calls add_to_cart with correct parameters and confirms addition"

  test_invalid_payment_method:
    context:
      current_step_id: "finalize_order"
    input: "I'll pay with bitcoin"
    expectation: "Explains accepted payment methods (Card or Cash)"
    invalid: true

scenarios:
  complete_order_flow:
    scenario: "Customer orders a medium cappuccino and pays with card"
    expectation: "Successfully processes order from greeting to payment completion"
    max_turns: 12
```
This comprehensive testing approach ensures your NOMOS agents are reliable, predictable, and ready for production deployment.