# Requirements
This section lists the project requirements identified during the analysis phase of the Test 2 Code with LLMs (T2C) project.
## Business Requirements
- Test2Code shall provide working code implementations that adhere to the given test specifications.
- Test2Code shall demonstrate the feasibility of integrating AI-powered code generation into CI/CD pipelines for automated development workflows.
- Test2Code shall experiment with multiple LLMs to evaluate and compare their performance on code generation tasks, providing empirical evidence for model selection.
- Test2Code shall serve as a proof-of-concept for shifting developer roles from code writers to test specification providers.
## Domain Requirements
### Code Generation Domain
- The system shall generate syntactically valid code in the target programming language(s) based on provided test specifications.
- Generated code shall pass all provided test cases as-is, without manual modification, ensuring functional correctness.
- The system shall support multiple types of test inputs (see the example after this list):
    - Unit tests (testing individual functions/methods)
    - Integration tests (testing component interactions)
    - Acceptance tests (testing user-facing behavior)
- Generated code shall follow common programming conventions and best practices for the target language.
- The system shall handle increasingly complex software projects as case studies:
    - Simple games (Tic Tac Toe)
    - Moderate complexity (Snake game)
    - Higher complexity (Space Invaders)
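To make the expected input concrete, the following is a minimal pytest-style unit test of the kind the tool might consume for the Tic Tac Toe case study. The module `tictactoe` and the function `check_winner` are illustrative names only, not part of the project's actual test suite:

```python
# Hypothetical test specification: the tool reads tests like these and
# must generate a `tictactoe` module whose `check_winner` makes them pass.
from tictactoe import check_winner

def test_row_win():
    board = [["X", "X", "X"],
             ["O", "O", " "],
             [" ", " ", " "]]
    assert check_winner(board) == "X"

def test_no_winner_on_empty_board():
    board = [[" "] * 3 for _ in range(3)]
    assert check_winner(board) is None
```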
### Large Language Model Domain
- The system shall interface with multiple LLM providers (a provider-interface sketch follows this list):
    - Open-source models (Mistral, DeepSeek R1, SmolLM2, Qwen3)
    - Commercial APIs (GitHub Copilot, Gemini Flash)
- The system shall implement proper prompt engineering techniques to optimize code generation quality from test specifications.
- The system shall handle LLM API rate limits and failure scenarios gracefully.
- Generated code quality shall be measurable and comparable across different LLM models.
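One way to satisfy the provider and rate-limit requirements together is a small abstract interface that every backend implements, wrapped in a retry helper with exponential backoff. This is a minimal sketch, assuming hypothetical `LLMProvider` and `RateLimitError` types rather than any specific SDK:

```python
import time
from abc import ABC, abstractmethod

class RateLimitError(Exception):
    """Raised by a backend when the provider's API rate limit is hit."""

class LLMProvider(ABC):
    """Common interface so open-source and commercial backends are interchangeable."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return generated source code for the given prompt."""

def generate_with_backoff(provider: LLMProvider, prompt: str,
                          max_attempts: int = 3, base_delay: float = 2.0) -> str:
    # Exponential backoff keeps a CI run alive across transient rate limits.
    for attempt in range(max_attempts):
        try:
            return provider.generate(prompt)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay ** attempt)
```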
### Testing Domain
- The system shall parse and understand various test frameworks and assertion patterns.
- Test specifications shall provide sufficient context for the LLM to understand:
    - Expected function signatures
    - Input/output relationships
    - Business logic requirements
    - Edge case handling
- The system shall validate that generated code passes the original test suite through automated execution.
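Validation can be as simple as running the original suite in a subprocess and checking the exit code. A minimal sketch, assuming pytest as the test framework:

```python
import subprocess
import sys

def run_test_suite(test_dir: str, timeout: int = 120) -> bool:
    """Run the original test suite against the generated code; report pass/fail."""
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pytest", test_dir, "-q"],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # hung or pathologically slow generated code counts as a failure
    # pytest exits with 0 only when every collected test passed.
    return result.returncode == 0
```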
### CI/CD Integration Domain
- The system shall be packageable as a command-line tool suitable for CI/CD pipeline integration.
- The system shall support configurable retry mechanisms when initial code generation attempts fail tests.
- The system shall provide detailed logging and reporting for pipeline integration and debugging.
- The system shall handle version control workflows, including:
    - Reading tests from feature branches
    - Generating code to designated output directories
    - Integration with pull request workflows
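For pipeline integration, the logging and exit-code behavior might look like the sketch below. `run_pipeline` is a hypothetical stand-in for the generate-then-test loop, and the exit-code convention (0 = success, 1 = tests still failing, 2 = infrastructure error) is an assumption, not a fixed interface:

```python
import logging
import sys

def run_pipeline() -> bool:
    """Hypothetical stand-in: generate code, run the suite, return pass/fail."""
    ...

def main() -> int:
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    try:
        passed = run_pipeline()
    except Exception:
        logging.exception("generation pipeline failed")
        return 2  # infrastructure error: API outage, bad config, ...
    return 0 if passed else 1  # 1 = generated code still fails the tests

if __name__ == "__main__":
    sys.exit(main())
```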
### Research Domain
- The system shall collect metrics on code generation success rates across (a record sketch follows this list):
    - Test types (unit, integration, acceptance)
    - Project complexity levels
    - LLM models and configurations
- The system shall enable reproducible experiments with consistent test cases across different model evaluations.
- Generated code quality shall be evaluated using multiple criteria:
    - Test pass rates
    - Code complexity metrics
    - Performance characteristics
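A per-run record such as the following would support both the metric breakdowns and the reproducibility requirement; the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class GenerationResult:
    model: str          # e.g. "mistral", "gemini-flash"
    project: str        # e.g. "tictactoe", "snake", "space-invaders"
    test_type: str      # "unit", "integration", or "acceptance"
    attempts: int       # refinement rounds before the suite passed (or we gave up)
    tests_passed: int
    tests_total: int

    @property
    def pass_rate(self) -> float:
        return self.tests_passed / self.tests_total if self.tests_total else 0.0
```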
## Functional Requirements
### User Functional Requirements
- Users shall provide a directory of test files to generate corresponding implementation code (a CLI sketch follows this list).
- Users shall specify the target output directory for generated code.
- Users shall select which LLM model to use for code generation.
- Users shall run comparative experiments across multiple models with identical test inputs.
- Users shall integrate the tool into automated pipelines with appropriate exit codes and logging.
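The user-facing surface could be a small argparse-based CLI along these lines; the flag names and defaults are assumptions for illustration, not a committed interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Illustrative flags covering the user-facing requirements above.
    parser = argparse.ArgumentParser(prog="t2c")
    parser.add_argument("--tests", required=True,
                        help="directory of test files to generate code from")
    parser.add_argument("--output", required=True,
                        help="target directory for generated code")
    parser.add_argument("--model", default="mistral",
                        help="LLM to use, e.g. mistral, deepseek-r1, qwen3")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="refinement attempts when generated code fails tests")
    return parser
```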
### System Functional Requirements
- The system shall execute generated code against the original test suite to validate correctness.
- The system shall implement retry logic with iterative refinement when tests fail.
- The system shall generate comprehensive reports on generation success/failure rates.
- The system shall support batch processing of multiple test files in a single run.
- The system shall validate generated code syntax before test execution.
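Syntax validation before test execution avoids wasting a full test run on unparseable LLM output. For Python targets this is a single standard-library call:

```python
import ast

def is_valid_python(source: str) -> bool:
    """Cheap pre-flight check: reject unparseable output before running tests."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```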
## Non-Functional Requirements
- The system shall generate code for simple test cases (< 10 tests) within 2 minutes per LLM model.
- The system shall handle network failures and API timeouts gracefully with appropriate retry mechanisms.
- The system architecture shall allow easy addition of new LLM providers.
- API keys and credentials shall be stored securely and not logged or exposed.
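For the credential requirement, one conventional approach is to read keys from the environment (populated by a CI secret store) and never echo them; the variable-name scheme below is an assumption:

```python
import os

def get_api_key(provider: str) -> str:
    # Hypothetical naming scheme, e.g. T2C_GEMINI_API_KEY. The key is read
    # from the environment (or a CI secret store) and must never be logged.
    key = os.environ.get(f"T2C_{provider.upper()}_API_KEY")
    if not key:
        raise RuntimeError(f"no API key configured for provider '{provider}'")
    return key
```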
## Implementation Requirements
- The system shall be implemented in Python 3.9+ for broad compatibility and rich ecosystem support.
- The system shall implement a plugin architecture for easy addition of new LLM providers (a registry sketch follows this list).
- The system shall include unit tests covering all core functionality with >80% code coverage.
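One possible plugin mechanism is a decorator-based registry keyed by model name, so a new provider only has to register itself to become selectable via the CLI. This is a sketch under that assumption, not a committed design:

```python
PROVIDERS: dict[str, type] = {}  # model name -> provider class (Python 3.9+ syntax)

def register(name: str):
    """Class decorator that makes a provider discoverable by name."""
    def wrap(cls):
        PROVIDERS[name] = cls
        return cls
    return wrap

@register("mistral")
class MistralProvider:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # the real call to the model goes here
```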