Evaluation how-to guides
Step-by-step guides that cover key tasks and operations for evaluating and testing your applications in LangSmith.
Evaluation SDK & API
Write evaluations to test and improve your application. A minimal SDK sketch follows the list below.
- Evaluate an LLM application in the SDK
- Define a custom evaluator
- Evaluate on intermediate steps
- Use LangChain off-the-shelf evaluators (Python only)
- Evaluate an existing experiment
- Run a pairwise evaluation
- Run evals using the API only
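As a quick illustration of the SDK workflow, here is a minimal sketch using the Python SDK's `evaluate` entry point with a custom evaluator. The target function `my_app` and the dataset name `my-dataset` are hypothetical placeholders; adapt them to your own application and data.

```python
from langsmith.evaluation import evaluate

# Hypothetical target: wraps the application under test.
# It receives a dataset example's inputs and returns outputs.
def my_app(inputs: dict) -> dict:
    return {"answer": f"Echo: {inputs['question']}"}

# Custom evaluator: compares the run's outputs against the
# example's reference outputs and returns a feedback dict.
def exact_match(run, example) -> dict:
    return {
        "key": "exact_match",
        "score": run.outputs["answer"] == example.outputs["answer"],
    }

results = evaluate(
    my_app,
    data="my-dataset",            # assumed dataset name
    evaluators=[exact_match],
    experiment_prefix="echo-baseline",
)
```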
Unit testing
Run assertions and expectations designed to quickly identify obvious bugs and regressions in your AI system, natively in your favorite testing library.
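For example, a plain pytest test can assert basic expectations against your application before any LangSmith-specific tooling is involved. `my_app` here is a hypothetical function standing in for the system under test.

```python
# Hypothetical application entry point standing in for your AI system.
def my_app(question: str) -> str:
    return "Paris" if "capital of France" in question else "I don't know"

def test_known_fact():
    # Quick assertion to catch obvious regressions on a known case.
    assert my_app("What is the capital of France?") == "Paris"

def test_unknown_question_is_honest():
    # The app should not invent an answer it cannot know.
    assert "don't know" in my_app("What is my neighbor's middle name?")
```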
Auto-evaluation
Set up auto-evaluators that LangSmith will automatically run on your experiments.
Online evaluation
Set up evaluations to run on incoming traces to understand your application's behavior in production.
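Online evaluators score the traces your application already sends to LangSmith, so the only code-side requirement is that the relevant functions are traced. A minimal sketch using the Python SDK's `traceable` decorator (the function and its contents are illustrative):

```python
from langsmith import traceable

@traceable(name="answer_question")
def answer_question(question: str) -> str:
    # ... call your model here ...
    return "stub answer"

# With tracing enabled (LANGSMITH_TRACING=true and LANGSMITH_API_KEY set),
# each call below produces a trace that configured online evaluators can score.
answer_question("What does LangSmith's online evaluation do?")
```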
Experiments
Use the experiments UI & API to understand your evaluations. A sketch of fetching results via the SDK follows the list below.
- Run an evaluation in the prompt playground
- Compare experiments with the comparison view
- Filter experiments
- View pairwise experiments
- Fetch experiment results in the SDK
- Upload experiments run outside of LangSmith with the REST API
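For instance, experiment results can be pulled into the SDK for further analysis; each experiment is backed by a tracing project whose name you can pass to `list_runs`. The experiment name below is a hypothetical placeholder.

```python
from langsmith import Client

client = Client()

# The root runs of an experiment hold the per-example
# inputs and outputs of the target function.
runs = client.list_runs(
    project_name="echo-baseline-1234abcd",  # assumed experiment name
    is_root=True,
)

for run in runs:
    print(run.inputs, run.outputs, run.error)
```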
Datasets
Manage the datasets in LangSmith that your offline evaluations (and other downstream applications) use. A programmatic sketch follows the list below.
- Manage datasets in the application
- Manage datasets programmatically
- Version datasets
- Share or unshare a dataset publicly
- Export filtered traces from an experiment to a dataset
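As a sketch of managing datasets programmatically with the Python SDK (the dataset name and example contents are illustrative):

```python
from langsmith import Client

client = Client()

# Create a dataset to hold evaluation examples.
dataset = client.create_dataset(
    dataset_name="my-dataset",
    description="Question/answer pairs for offline evaluation.",
)

# Add an input/output example to the dataset.
client.create_examples(
    inputs=[{"question": "What is the capital of France?"}],
    outputs=[{"answer": "Paris"}],
    dataset_id=dataset.id,
)
```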
Annotation queues and human feedback
Collect feedback from subject matter experts and users to improve your LLM applications.
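Feedback gathered from reviewers or end users can also be attached to runs programmatically; a minimal sketch using the Python SDK's `create_feedback` (the run ID is a placeholder):

```python
from langsmith import Client

client = Client()

# Attach a human-provided score and comment to an existing run.
client.create_feedback(
    run_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder run ID
    key="user_rating",
    score=1.0,
    comment="Accurate and concise answer.",
)
```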