AI feature development playbook

This playbook outlines our approach to developing AI features at GitLab. The process runs similar to, and concurrent with, the Build track of our product development flow, and covers both feature development and operational considerations.

Getting Started

AI Feature Development Flow

The AI feature development process consists of five key interdependent and iterative phases:

Plan

This phase prepares AI features so they are ready to be built by engineering. It supplements the plan phase of the build track of the product development flow.

At this point, the customer problem should be well understood, either because of a clearly stated requirement, or by working through the product development flow validation track.

As part of this phase, teams decide whether already-approved models satisfy the requirements of the new feature, or submit a proposal for the approval of other models. Teams also design or adopt testing and evaluation strategies, including identifying the datasets they will need.
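As a starting point, an evaluation plan can be captured as a small, versioned dataset of representative inputs and the qualities a good response should show. The sketch below is only illustrative; the field names and example cases are hypothetical, not a prescribed GitLab format.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One representative scenario the feature must handle well."""
    prompt: str                                     # input the feature will receive
    expected_phrases: list[str]                     # phrases a good answer should contain
    tags: list[str] = field(default_factory=list)   # e.g. language, product area


# Hypothetical starter dataset for a code-explanation feature.
EVAL_CASES = [
    EvalCase(
        prompt="Explain this Ruby method: def total; items.sum(&:price); end",
        expected_phrases=["sum", "price"],
        tags=["ruby", "code-explanation"],
    ),
    EvalCase(
        prompt="Explain this SQL: SELECT COUNT(*) FROM issues WHERE state = 'opened'",
        expected_phrases=["count", "open"],
        tags=["sql", "code-explanation"],
    ),
]
```

A dataset like this grows during the test and evaluate phase; starting it in planning keeps success criteria concrete from the outset.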

Key Activities

  • Define AI feature requirements and success criteria
  • Select models and assess their capabilities
  • Plan testing and evaluation strategy

Develop

The develop phase, and the closely aligned test and evaluate phase, are where we build AI features, address bugs or technical debt, and test the solutions before launching them. It supplements the develop and test phase of the build track of the product development flow.

This phase includes prompt engineering, where teams craft and refine prompts to achieve desired AI model behavior. This often requires multiple iterations to optimize for accuracy, consistency, and user experience.
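For example, prompts are easier to iterate on when they are kept as named, versioned templates with explicit placeholders rather than strings assembled ad hoc, because each iteration can then be evaluated and compared. This sketch assumes nothing about GitLab's internal prompt tooling; the template names are illustrative.

```python
from string import Template

# Keeping prompt iterations side by side makes it easy to compare them
# during evaluation and to roll back a regression.
PROMPT_V1 = Template(
    "You are a code review assistant.\n"
    "Summarize the change below in two sentences.\n\n"
    "Diff:\n$diff\n"
)

PROMPT_V2 = Template(
    "You are a code review assistant.\n"
    "Summarize the change below in two sentences, then list up to three risks.\n"
    "If the diff is empty, say so instead of guessing.\n\n"
    "Diff:\n$diff\n"
)


def build_prompt(diff: str, template: Template = PROMPT_V2) -> str:
    """Render the selected prompt version for a given diff."""
    return template.substitute(diff=diff)
```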

Development might include integrating the chosen models with GitLab infrastructure through the AI Gateway and implementing the necessary API interfaces. Teams must also consider the requirements for supporting GitLab Duo Self-Hosted.
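The sketch below shows the kind of thin wrapper a feature might keep around its model call so that the backing endpoint can change, for example when a GitLab Duo Self-Hosted deployment points at a customer-managed model. The URL, payload shape, and class name are hypothetical; they do not describe the actual AI Gateway contract.

```python
import requests  # plain HTTP client, used here for illustration only


class CompletionClient:
    """Hypothetical wrapper around a model-serving endpoint.

    The real integration goes through the AI Gateway; the path and
    payload below are placeholders, not the gateway's actual API.
    """

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def complete(self, prompt: str, timeout: float = 30.0) -> str:
        response = requests.post(
            f"{self.base_url}/v1/complete",              # placeholder path
            json={"prompt": prompt},                     # placeholder payload
            headers={"Authorization": f"Bearer {self.token}"},
            timeout=timeout,
        )
        response.raise_for_status()
        return response.json()["completion"]             # placeholder field
```

Keeping the call behind one interface like this also simplifies testing, since the client can be stubbed in unit tests and evaluations.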

Test & Evaluate

In the test and evaluate phase, we validate AI feature quality, performance, and security, using traditional automated testing practices, as well as evaluation of AI-generated content. It supplements the develop and test phase of the build track of the product development flow.

Evaluation involves creating datasets that represent real-world usage scenarios to ensure comprehensive coverage of the feature's behavior. Teams implement evaluation strategies covering multiple aspects of the quality of AI-generated content, as well as performance characteristics.
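As a minimal illustration, the sketch below scores generated outputs against the kind of dataset outlined in the Plan phase, using a deliberately simple phrase check as the judge; real evaluations typically use richer metrics or LLM-as-judge approaches. All names here are illustrative.

```python
from collections.abc import Callable


def evaluate(cases: list[dict], generate: Callable[[str], str]) -> float:
    """Return the fraction of cases where every expected phrase appears.

    Each case looks like {"prompt": ..., "expected_phrases": [...]};
    `generate` is whatever function produces the feature's output.
    The substring check is a stand-in for a real quality metric.
    """
    passed = 0
    for case in cases:
        output = generate(case["prompt"]).lower()
        if all(phrase.lower() in output for phrase in case["expected_phrases"]):
            passed += 1
    return passed / len(cases) if cases else 0.0


# Example run with a stubbed generator:
cases = [{"prompt": "What is 2 + 2?", "expected_phrases": ["4"]}]
print(evaluate(cases, generate=lambda prompt: "The answer is 4."))  # 1.0
```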

Launch & Monitor

This phase focuses on safely introducing AI features to production through controlled rollouts and comprehensive monitoring. It supplements the launch phase of the build track of the product development flow.

We employ feature flags to control access and gradually expand user exposure, starting with internal teams before broader incremental release. Monitoring tracks technical metrics (latency, error rates, resource usage) and AI-specific indicators (model performance, response quality, user satisfaction). Alerting systems can be used to detect performance degradation, unusual patterns, or safety concerns that require immediate attention.
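A percentage-based rollout can be as simple as deterministically bucketing each actor by a stable hash, so the same user keeps the same experience as the percentage grows. The sketch below is a generic illustration, not GitLab's actual feature flag implementation; the flag name is hypothetical.

```python
import hashlib


def flag_enabled(flag_name: str, actor_id: str, rollout_percentage: int) -> bool:
    """Deterministically bucket an actor into a percentage rollout.

    Hashing the flag name plus the actor id gives a stable bucket in
    [0, 100), so raising the percentage only adds users and never
    flips existing users back off.
    """
    digest = hashlib.sha256(f"{flag_name}:{actor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percentage


# Example: expose the hypothetical `ai_diff_summary` flag to 25% of users.
print(flag_enabled("ai_diff_summary", "user-42", 25))
```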

Improve

This phase focuses on iteratively improving the feature based on data, user feedback, and changing requirements. It supplements the improve phase of the build track of the product development flow.

We analyze real-world usage patterns and performance metrics to identify opportunities for improvement, whether in prompt engineering, model selection, system architecture, or feature design. User feedback complements these metrics with qualitative insight into satisfaction. Teams can iteratively refine prompts based on user interactions and feedback.
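One concrete input to that analysis is a simple satisfaction rate aggregated from feedback events; the event shape below is hypothetical and stands in for whatever telemetry the feature actually emits.

```python
from collections import Counter


def feedback_summary(events: list[dict]) -> dict:
    """Summarize hypothetical thumbs-up/thumbs-down feedback events.

    Each event looks like {"feature": ..., "rating": "up" or "down"}.
    """
    ratings = Counter(event["rating"] for event in events)
    total = ratings["up"] + ratings["down"]
    return {
        "total": total,
        "satisfaction_rate": ratings["up"] / total if total else None,
    }


# Example:
events = [
    {"feature": "duo_chat", "rating": "up"},
    {"feature": "duo_chat", "rating": "down"},
    {"feature": "duo_chat", "rating": "up"},
]
print(feedback_summary(events))  # {'total': 3, 'satisfaction_rate': 0.666...}
```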

This phase includes model migrations as newer, more capable models become available.

Phase Interdependencies

Each phase can feed back into any or all earlier phases as development proceeds. The develop and test & evaluate phases are especially intertwined. Examples of interdependencies include:

  • Evaluation insights might require new development iterations.
  • Production monitoring results may suggest architectural replanning.
  • User feedback could inform evaluation strategy changes.