Prompt Engineering Guide

This guide outlines the key aspects of prompt engineering when working with Large Language Models (LLMs), including prompt design, optimization, evaluation, and monitoring.

Understanding prompt engineering

For an overview, see this video.

Most important takeaways:

  • Definition of a prompt:

    • An instruction sent to a language model to solve a task
    • Forms the core of AI features in user interfaces
  • Importance of prompt quality:

    • Greatly influences the quality of the language model's response
    • Iterating on prompts is crucial for optimal results
  • Key considerations when crafting prompts:

    • Understand the task you're asking the model to perform
    • Know what kind of response you're expecting
    • Prepare a dataset to test the prompts
    • Be specific - provide lots of details and context to help the AI understand
    • Give examples of potential questions and desired answers
  • Prompt universality:

    • Prompts are not universal across different language models
    • When changing models, prompts need to be adjusted
    • Consult the language model provider's documentation for specific tips
    • Test new models before fully switching
  • Tools for working with prompts:

    • Anthropic Console: A platform for writing and testing prompts
    • Prompt Generator: A tool in the Anthropic Console that generates a first-draft prompt from a task description
  • Prompt structure:

    • Typically includes a general task description
    • Contains placeholders for input text
    • May include specific instructions and suggested output formats
    • Consider wrapping inputs in XML tags so the model can reliably separate instructions from data (see the first example after this list)
  • System prompts:

    • Set the role, tone, and context for the language model
    • Can improve the model's performance
    • Usually placed at the beginning of the prompt (see the second example after this list)
  • Best practices:

    • Invest time in understanding the assignment
    • Use prompt generation tools as a starting point
    • Test and iterate on prompts to improve results
    • Use proper English grammar and syntax to help the AI understand
    • Allow uncertainty - tell the AI to say "I don't know" when it is unsure, which reduces hallucinated answers
    • Use positive phrasing - say what the AI should do, not what it shouldn't do
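
As a concrete illustration of the prompt structure described above, here is a minimal sketch in Python. The task, tag names, and exact wording are illustrative assumptions, not a prescribed format:

    # A general task description, an XML-wrapped placeholder for the input,
    # a suggested output format, and permission to say "I don't know".
    PROMPT_TEMPLATE = """Summarize the merge request description inside the
    <description> tags in one sentence for a changelog entry.

    If the description does not contain enough information, say "I don't know".

    <description>
    {description}
    </description>

    Respond with only the one-sentence summary, without preamble."""

    prompt = PROMPT_TEMPLATE.format(description="Adds retry logic to the pipeline runner.")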

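A system prompt is passed separately from the user message. Here is a hedged sketch using the Anthropic Python SDK; the model name and role text are assumptions for illustration:

    # Sketch: setting a role with a system prompt via the Anthropic Python SDK.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumption: pick any available model
        max_tokens=256,
        # The system prompt sets the role and tone before any user turns.
        system="You are a concise technical writer for release notes.",
        messages=[{"role": "user", "content": "Summarize: Adds retry logic to the pipeline runner."}],
    )
    print(response.content[0].text)
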
Best practices for writing effective prompts

For an overview, see this video about writing effective prompts.

Here are the key takeaways from this video:

  • No universal "good" prompt:

    • The effectiveness of a prompt depends on the specific task.
    • There's no one-size-fits-all approach to prompt writing.
  • Characteristics of effective prompts:

    • Clearly explain the task and the expected outcome.
    • Direct and detailed.
    • Specific about the desired output.
  • Key elements to consider:

    • Understand the task, audience, and end goal.
    • Explain these elements clearly in the prompt.
  • Strategies for improving prompt performance:

    • Add instructions in sequential steps.
    • Include relevant examples.
    • Ask the model to think step by step (chain of thought).
    • Request reasoning before the final answer.
    • Guide the input - use delimiters to clearly mark where the user's input starts and ends (see the first example after this list).
  • Adapting to model preferences:

    • Adjust prompts to suit the preferred data structure of the model.
    • For example, Anthropic models work well with XML tags.
  • Importance of system prompts:

    • Set the role for the language model.
    • Placed at the beginning of the interaction.
    • Can make the model aware of available tools or long-context input.
  • Iteration is crucial:

    • Emphasized as the most important part of working with prompts.
    • Continual refinement leads to better results.
    • Build quality control - automate prompt tests (for example, with RSpec or Rake tasks) to catch regressions when a prompt or model changes (see the second example after this list).
  • Use traditional code:

    • If a task can be done efficiently outside of calling an LLM, use code for more reliable and deterministic outputs.
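
The strategies above can be combined in a single prompt. Here is a hedged sketch; the task, delimiters, and tag names are illustrative assumptions:

    # Combines sequential instructions, delimiters around the user's input,
    # and a request for reasoning before the final answer (chain of thought).
    COT_PROMPT = """Classify the sentiment of the review between the <review>
    tags as positive, negative, or mixed.

    First, reason step by step about the evidence inside <thinking> tags.
    Then give the final label inside <answer> tags.

    <review>
    {review}
    </review>"""

    prompt = COT_PROMPT.format(review="Setup was painful, but the editor integration is superb.")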

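For the quality-control point, the guide suggests RSpec or Rake tasks; the sketch below expresses the same idea with Python's pytest instead. The model name, prompt, and test cases are assumptions; the key point is asserting on stable properties of the output rather than exact wording:

    # Hedged sketch of automated prompt regression tests (pytest stands in
    # for the RSpec/Rake setup mentioned above).
    import anthropic
    import pytest

    client = anthropic.Anthropic()

    def run_prompt(review: str) -> str:
        # Illustrative helper: classify a review and return the raw text.
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",  # assumption: any available model
            max_tokens=64,
            messages=[{
                "role": "user",
                "content": f"Classify the sentiment of this review as positive, negative, or mixed: {review}",
            }],
        )
        return response.content[0].text

    @pytest.mark.parametrize("review,expected", [
        ("The editor integration is superb.", "positive"),
        ("The setup crashed twice and lost my work.", "negative"),
    ])
    def test_sentiment_prompt(review, expected):
        # Assert on a stable property, not exact wording, so the test catches
        # regressions without being brittle.
        assert expected in run_prompt(review).lower()
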
Tuning and optimizing workflows for prompts

This section covers prompt tuning for LLMs using LangSmith and the Anthropic Workbench together, plus the Centralized Evaluation Framework (CEF).

Iterating on the prompt using the Anthropic Console

For an overview, see this video.

Iterating on the prompt using LangSmith

For an overview, see this video.

Using Datasets for prompt tuning with LangSmith

For an overview, see this video.

Using automated evaluation in LangSmith

For an overview, see this video.

Using pairwise experiments in LangSmith

For an overview, see this video.

View the CEF documentation.

When to use LangSmith and when to use CEF

For an overview, see this video.

Key points on the CEF (Centralized Evaluation Framework) project

  1. Initial Development

    • Start with pure LangSmith for prompt iteration (see the sketch after this list)
    • Easier and quicker to set up
    • More cost-effective for early stages
  2. When to Transition to CEF

    • When investing more in the feature
    • For working with larger datasets
    • For repeated, long-term use
  3. CEF Setup Considerations

    • Requires upfront time investment
    • Need to adjust evaluations for specific features
    • Set up input data (for example, local GDK for chat features)
  4. Challenges

    • Ensuring consistent data across different users
    • Exploring options such as seeds and imports for sharing data between environments
  5. Current CEF Capabilities

    • Supports chat questions about code
    • Handles documentation-related queries
    • Includes evaluations for code suggestions
  6. Advantages of CEF

    • Allows running evaluations on local GDK
    • Results viewable in LangSmith UI
    • Enables use of larger datasets
  7. Flexibility

    • Requires customization for specific use cases
    • Not a one-size-fits-all solution
  8. Documentation

    • CEF has extensive documentation available.
  9. Adoption

    • Already in use by some teams, including the Code Suggestions and Create teams
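
To make the "start with pure LangSmith" step concrete, here is a minimal hedged sketch using the LangSmith Python SDK. The dataset name, example, target function, and evaluator are illustrative assumptions:

    # Sketch: create a small dataset and run an automated evaluation over it.
    from langsmith import Client, evaluate

    client = Client()  # reads LANGSMITH_API_KEY from the environment

    dataset = client.create_dataset(dataset_name="sentiment-prompt-examples")
    client.create_examples(
        inputs=[{"review": "The editor integration is superb."}],
        outputs=[{"label": "positive"}],
        dataset_id=dataset.id,
    )

    def target(inputs: dict) -> dict:
        # Call your prompt/model here; stubbed to keep the sketch short.
        return {"label": "positive"}

    def exact_match(run, example) -> dict:
        # Simple evaluator: does the predicted label match the reference?
        return {"key": "exact_match",
                "score": run.outputs["label"] == example.outputs["label"]}

    # Results appear in the LangSmith UI as an experiment over the dataset.
    evaluate(target, data="sentiment-prompt-examples", evaluators=[exact_match])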

Further resources

For more comprehensive prompt engineering guides, see: