# promptfoo
**Repository Path**: sharpmddr/promptfoo
## Basic Information
- **Project Name**: promptfoo
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: azure-pricing
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-01-10
- **Last Updated**: 2025-01-10
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# promptfoo: test your LLM app locally
[npm](https://npmjs.com/package/promptfoo) · [CI](https://github.com/promptfoo/promptfoo/actions/workflows/main.yml) · [Discord](https://discord.gg/gHPS9jjfbs)
`promptfoo` is a tool for testing, evaluating, and red-teaming LLM apps.
With promptfoo, you can:
- **Build reliable prompts, models, and RAGs** with benchmarks specific to your use-case
- **Secure your apps** with automated [red teaming](https://www.promptfoo.dev/docs/red-team/) and pentesting
- **Speed up evaluations** with caching, concurrency, and live reloading
- **Score outputs automatically** by defining [metrics](https://www.promptfoo.dev/docs/configuration/expected-outputs)
- Use as a [CLI](https://www.promptfoo.dev/docs/usage/command-line), [library](https://www.promptfoo.dev/docs/usage/node-package), or in [CI/CD](https://www.promptfoo.dev/docs/integrations/github-action)
- Use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for [any LLM API](https://www.promptfoo.dev/docs/providers)
The goal: **test-driven LLM development** instead of trial-and-error.
```sh
npx promptfoo@latest init
```
# [» View full documentation «](https://www.promptfoo.dev/docs/intro)
promptfoo produces matrix views that let you quickly evaluate outputs across many prompts and inputs:

It works on the command line too:

It also produces high-level vulnerability and risk reports:

## Why choose promptfoo?
There are many different ways to evaluate prompts. Here are some reasons to consider promptfoo:
- **Developer friendly**: promptfoo is fast, with quality-of-life features like live reloads and caching.
- **Battle-tested**: Originally built for LLM apps serving over 10 million users in production. Our tooling is flexible and can be adapted to many setups.
- **Simple, declarative test cases**: Define evals without writing code or working with heavy notebooks.
- **Language agnostic**: Use Python, JavaScript, or any other language.
- **Share & collaborate**: Built-in share functionality & web viewer for working with teammates.
- **Open-source**: LLM evals are a commodity and should be served by 100% open-source projects with no strings attached.
- **Private**: This software runs completely locally. The evals run on your machine and talk directly with the LLM.
## Workflow
Start by establishing a handful of test cases - core use cases and failure cases that you want to ensure your prompt can handle.
As you explore modifications to the prompt, use `promptfoo eval` to rate all outputs. This ensures the prompt is actually improving overall.
As you collect more examples and establish a user feedback loop, continue to build the pool of test cases.
## Usage - evals
To get started, run this command:
```sh
npx promptfoo@latest init
```
This will create a `promptfooconfig.yaml` placeholder in your current directory.
After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:
```sh
npx promptfoo@latest eval
```
## Usage - red teaming/pentesting
Run this command:
```sh
npx promptfoo@latest redteam init
```
This will ask you questions about what types of vulnerabilities you want to find and walk you through running your first scan.
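The wizard then writes a red team section into your `promptfooconfig.yaml`. As a rough sketch (the plugin names and keys below are illustrative assumptions, not the wizard's literal output):

```yaml
# A minimal red team config sketch; run `redteam init` for the real scaffold.
targets:
  - openai:gpt-4o-mini # the app or model under test
redteam:
  purpose: 'Customer support agent for an online store'
  plugins:
    - harmful # probe for harmful content
    - pii # probe for leakage of personal data
```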
### Configuration
The YAML configuration format runs each prompt through a series of example inputs (aka "test cases") and checks whether they meet requirements (aka "asserts").
See the [Configuration docs](https://www.promptfoo.dev/docs/configuration/guide) for a detailed guide.
```yaml
prompts:
  - file://prompt1.txt
  - file://prompt2.txt
providers:
  - openai:gpt-4o-mini
  - ollama:llama3.1:70b
tests:
  - description: 'Test translation to French'
    vars:
      language: French
      input: Hello world
    assert:
      - type: contains-json
      - type: javascript
        value: output.length < 100

  - description: 'Test translation to German'
    vars:
      language: German
      input: How's it going?
    assert:
      - type: llm-rubric
        value: does not describe self as an AI, model, or chatbot
      - type: similar
        value: was geht
        threshold: 0.6 # cosine similarity
```
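The prompt files referenced above are plain text templates: `{{variables}}` are filled in from each test case's `vars`. A hypothetical `prompt1.txt` for this config might look like:

```
Translate the following into {{language}}, and reply with JSON:

{{input}}
```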
### Supported assertion types
See [Test assertions](https://www.promptfoo.dev/docs/configuration/expected-outputs) for full details.
Deterministic eval metrics
| Assertion Type | Returns true if... |
| ------------------------------- | ----------------------------------------------------------------- |
| `equals` | output matches exactly |
| `contains` | output contains substring |
| `icontains` | output contains substring, case insensitive |
| `regex` | output matches regex |
| `starts-with` | output starts with string |
| `contains-any` | output contains any of the listed substrings |
| `contains-all` | output contains all of the listed substrings |
| `icontains-any` | output contains any of the listed substrings, case insensitive |
| `icontains-all` | output contains all of the listed substrings, case insensitive |
| `is-json` | output is valid json (optional json schema validation) |
| `contains-json` | output contains valid json (optional json schema validation) |
| `is-sql` | output is valid sql |
| `contains-sql` | output contains valid sql |
| `is-xml` | output is valid xml |
| `contains-xml` | output contains valid xml |
| `javascript` | provided JavaScript function validates the output |
| `python` | provided Python function validates the output |
| `webhook` | provided webhook returns `{pass: true}` |
| `rouge-n` | ROUGE-N score is above a given threshold (default 0.75) |
| `bleu` | BLEU score is above a given threshold (default 0.5) |
| `levenshtein` | Levenshtein distance is below a threshold |
| `latency` | Latency is below a threshold (milliseconds) |
| `perplexity` | Perplexity is below a threshold |
| `perplexity-score` | Normalized perplexity |
| `cost` | Cost is below a threshold (for models with cost info such as GPT) |
| `is-valid-openai-function-call` | Ensure that the function call matches the function's JSON schema |
| `is-valid-openai-tools-call` | Ensure that all tool calls match the tools JSON schema |
Model-assisted eval metrics
| Assertion Type | Method |
| --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| [similar](https://www.promptfoo.dev/docs/configuration/expected-outputs/similar) | Embedding cosine similarity is above a threshold |
| [classifier](https://www.promptfoo.dev/docs/configuration/expected-outputs/classifier) | Run LLM output through a classifier |
| [llm-rubric](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | LLM output matches a given rubric, using a Language Model to grade output |
| [answer-relevance](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | Ensure that LLM output is related to original query |
| [context-faithfulness](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | Ensure that LLM output uses the context |
| [context-recall](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | Ensure that ground truth appears in context |
| [context-relevance](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | Ensure that context is relevant to original query |
| [factuality](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | LLM output adheres to the given facts, using Factuality method from OpenAI eval |
| [model-graded-closedqa](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | LLM output adheres to given criteria, using Closed QA method from OpenAI eval |
| [moderation](https://www.promptfoo.dev/docs/configuration/expected-outputs/moderation) | Make sure outputs are safe |
| [select-best](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded) | Compare multiple outputs for a test case and pick the best one |
Every test type can be negated by prepending `not-`. For example, `not-equals` or `not-regex`.
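For example, a single test case can mix positive and negated assertions (types taken from the tables above):

```yaml
tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: icontains
        value: bonjour
      - type: not-regex
        value: 'as an AI' # fail if the output matches this pattern
      - type: latency
        threshold: 1000 # milliseconds
```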
### Tests from spreadsheet
Some people prefer to configure their LLM tests in a CSV. In that case, the config is pretty simple:
```yaml
prompts:
  - file://prompts.txt
providers:
  - openai:gpt-4o-mini
tests: file://tests.csv
```
See [example CSV](https://github.com/promptfoo/promptfoo/blob/main/examples/simple-test/tests.csv).
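Columns in the CSV map to test variables, and a special `__expected` column can hold an assertion per row. A hedged sketch of what `tests.csv` might contain (see the linked example for the exact conventions):

```csv
language,input,__expected
French,Hello world,contains: Bonjour
German,How's it going?,contains: geht
```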
### Command-line
If you're looking to customize your usage, you have a wide set of parameters at your disposal.
| Option | Description |
| ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `-p, --prompts <paths...>` | Paths to [prompt files](https://www.promptfoo.dev/docs/configuration/parameters#prompts), directory, or glob |
| `-r, --providers <name or path...>` | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See [API providers][providers-docs] |
| `-o, --output <path>` | Path to [output file](https://www.promptfoo.dev/docs/configuration/parameters#output-file) (csv, json, yaml, html) |
| `--tests <path>` | Path to [external test file](https://www.promptfoo.dev/docs/configuration/expected-outputs/assertions#load-an-external-tests-file) |
| `-c, --config <paths>` | Path to one or more [configuration files](https://www.promptfoo.dev/docs/configuration/guide). `promptfooconfig.yaml` is automatically loaded if present |
| `-j, --max-concurrency <number>` | Maximum number of concurrent API calls |
| `--table-cell-max-length <number>` | Truncate console table cells to this length |
| `--prompt-prefix <path>` | This prefix is prepended to every prompt |
| `--prompt-suffix <path>` | This suffix is appended to every prompt |
| `--grader` | [Provider][providers-docs] that will conduct the evaluation, if you are [using LLM to grade your output](https://www.promptfoo.dev/docs/configuration/expected-outputs#llm-evaluation) |
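For instance, several of the flags above can be combined in a single run (a sketch using only options from this table):

```sh
npx promptfoo eval -c promptfooconfig.yaml -o results.html -j 8 --table-cell-max-length 250
```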
After running an eval, you may optionally use the `view` command to open the web viewer:
```sh
npx promptfoo view
```
### Examples
#### Prompt quality
In [this example](https://github.com/promptfoo/promptfoo/tree/main/examples/assistant-cli), we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
```sh
npx promptfoo eval -p prompts.txt -r openai:gpt-4o-mini -t tests.csv
```
This command will evaluate the prompts in `prompts.txt`, substituting the variable values from `tests.csv`, and output results in your terminal.
You can also output a nice [spreadsheet](https://docs.google.com/spreadsheets/d/1nanoj3_TniWrDl1Sj-qYqIMD6jwm5FBy15xPFdUTsmI/edit?usp=sharing), [JSON](https://github.com/promptfoo/promptfoo/blob/main/examples/simple-cli/output.json), YAML, or an HTML file:

#### Model quality
In the [next example](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-4o-vs-4o-mini), we evaluate the difference between GPT-4o and GPT-4o-mini outputs for a given prompt:
```sh
npx promptfoo eval -p prompts.txt -r openai:gpt-4o openai:gpt-4o-mini -o output.html
```
Produces this HTML table:

## Usage (node package)
You can also use `promptfoo` as a library in your project by importing the `evaluate` function. The function takes the following parameters:
- `testSuite`: the JavaScript equivalent of the `promptfooconfig.yaml`
```typescript
interface EvaluateTestSuite {
  providers: string[]; // Valid provider name (e.g. openai:gpt-4o-mini)
  prompts: string[]; // List of prompts
  tests: string | TestCase[]; // Path to a CSV file, or list of test cases
  defaultTest?: Omit<TestCase, 'description'>; // Optional: add default vars and assertions on test case
  outputPath?: string | string[]; // Optional: write results to file
}

interface TestCase {
  // Optional description of what you're testing
  description?: string;

  // Key-value pairs to substitute in the prompt
  vars?: Record<string, string | string[] | object>;

  // Optional list of automatic checks to run on the LLM output
  assert?: Assertion[];

  // Additional configuration settings for the prompt
  options?: PromptConfig & OutputConfig & GradingConfig;

  // The required score for this test case. If not provided, the test case is graded pass/fail.
  threshold?: number;

  // Override the provider for this test
  provider?: string | ProviderOptions | ApiProvider;
}

interface Assertion {
  type: string;
  value?: string;
  threshold?: number; // Required score for pass
  weight?: number; // The weight of this assertion compared to other assertions in the test case. Defaults to 1.
  provider?: ApiProvider; // For assertions that require an LLM provider
}
```
- `options`: misc options related to how the tests are run
```typescript
interface EvaluateOptions {
  maxConcurrency?: number;
  showProgressBar?: boolean;
  generateSuggestions?: boolean;
}
```
### Example
`promptfoo` exports an `evaluate` function that you can use to run prompt evaluations.
```js
import promptfoo from 'promptfoo';
const results = await promptfoo.evaluate({
  prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
  providers: ['openai:gpt-4o-mini'],
  tests: [
    {
      vars: {
        body: 'Hello world',
      },
    },
    {
      vars: {
        body: "I'm hungry",
      },
    },
  ],
});
```
This code imports the `promptfoo` library, defines the evaluation options, and then calls the `evaluate` function with these options.
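The `EvaluateOptions` described earlier go in as an optional second argument. A minimal sketch, assuming the two-parameter form described above:

```js
const results = await promptfoo.evaluate(
  {
    prompts: ['Summarize this: {{body}}'],
    providers: ['openai:gpt-4o-mini'],
    tests: [{ vars: { body: 'Hello world' } }],
  },
  {
    maxConcurrency: 4, // cap parallel API calls
    showProgressBar: true,
  },
);
```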
See the full example [here](https://github.com/promptfoo/promptfoo/tree/main/examples/simple-import), which includes an example results object.
## Configuration
- **[Main guide](https://www.promptfoo.dev/docs/configuration/guide)**: Learn about how to configure your YAML file, setup prompt files, etc.
- **[Configuring test cases](https://www.promptfoo.dev/docs/configuration/expected-outputs)**: Learn more about how to configure assertions and metrics.
## Installation
Requires Node.js 18 or newer.
You can install promptfoo using npm, npx, Homebrew, or by cloning the repository.
### npm (recommended)
Install `promptfoo` globally:
```sh
npm install -g promptfoo
```
Or install it locally in your project:
```sh
npm install promptfoo
```
### npx
Run promptfoo without installing it:
```sh
npx promptfoo@latest init
```
This will create a `promptfooconfig.yaml` placeholder in your current directory.
### Homebrew
If you prefer using Homebrew, you can install promptfoo with:
```sh
brew install promptfoo
```
### From source
For the latest development version:
```sh
git clone https://github.com/promptfoo/promptfoo.git
cd promptfoo
npm install
npm run build
npm link
```
### Verify installation
To verify that promptfoo is installed correctly, run:
```sh
promptfoo --version
```
This should display the version number of promptfoo.
For more detailed installation instructions, including system requirements and troubleshooting, please visit our [installation guide](https://www.promptfoo.dev/docs/installation).
## API Providers
We support OpenAI's API as well as a number of open-source models. It's also possible to set up your own custom API provider. **[See Provider documentation][providers-docs]** for more details.
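Provider ids are plain strings in your config, and a custom provider can be loaded from a local script. A sketch (the model ids here are examples):

```yaml
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
  - file://./my-custom-provider.js # custom provider script
```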
## Development
Here's how to build and run locally:
```sh
git clone https://github.com/promptfoo/promptfoo.git
cd promptfoo
# Optionally use the Node.js version specified in the .nvmrc file - make sure you are on node >= 18
nvm use
npm i
cd path/to/experiment-with-promptfoo # contains your promptfooconfig.yaml
npx path/to/promptfoo-source eval
```
The web UI is located in `src/app`. To run it in dev mode, run `npm run local:app`. This will host the web UI at http://localhost:3000. The web UI expects `promptfoo view` to be running separately.
Then run:
```sh
npm run build
```
The build has some side effects, such as copying HTML templates, migrations, etc.
Contributions are welcome! Please feel free to submit a pull request or open an issue.
`promptfoo` includes several npm scripts to make development easier and more efficient. To use these scripts, run `npm run <script-name>` in the project directory.
Here are some of the available scripts:
- `build`: Transpile TypeScript files to JavaScript
- `build:watch`: Continuously watch and transpile TypeScript files on changes
- `test`: Run test suite
- `test:watch`: Continuously run test suite on changes
- `db:generate`: Generate new db migrations (and create the db if it doesn't already exist). Note that after generating a new migration, you'll have to `npm i` to copy the migrations into `dist/`.
- `db:migrate`: Run existing db migrations (and create the db if it doesn't already exist)
To run the CLI during development you can run a command like: `npm run local -- eval --config $(readlink -f ./examples/cloudflare-ai/chat_config.yaml)`, where any parts of the command after `--` are passed through to our CLI entrypoint. Since the Next dev server isn't supported in this mode, see the instructions above for running the web server.
# [» View full documentation «](https://www.promptfoo.dev/docs/intro)
[providers-docs]: https://www.promptfoo.dev/docs/providers
### Adding a New Provider
1. Create an implementation in `src/providers/SOME_PROVIDER_FILE` (see the sketch after this list)
2. Update `loadApiProvider` in `src/providers.ts` to load your provider via string
3. Add test cases in `test/providers.test.ts`
1. Test the actual provider implementation
2. Test loading the provider via a `loadApiProvider` test
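In practice, a provider is a class that exposes an id and a `callApi` method. A hypothetical minimal sketch; the exact `ApiProvider` and `ProviderResponse` shapes are assumptions here, so check `src/types.ts` for the real contract:

```typescript
// src/providers/echo.ts — a hypothetical minimal provider.
// Assumes an ApiProvider contract of id() plus callApi() returning
// { output }; verify against src/types.ts before relying on this.
import type { ApiProvider, ProviderResponse } from '../types';

export class EchoProvider implements ApiProvider {
  id(): string {
    return 'echo:default'; // string used to reference this provider in configs
  }

  async callApi(prompt: string): Promise<ProviderResponse> {
    // A real provider would call a model API here and map its response.
    return { output: `echo: ${prompt}` };
  }
}
```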