Evaluation how-to guides
Step-by-step guides that cover key tasks and operations for evaluating and testing your applications in LangSmith.
Evaluation SDK & API
Write evaluations to test and improve your application. A minimal SDK sketch follows the list below.
- Evaluate an LLM application in the SDK
- Define a custom evaluator
- Evaluate on intermediate steps
- Use LangChain off-the-shelf evaluators (Python only)
- Evaluate an existing experiment
- Run a pairwise evaluation
- Run evals using the API only
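As a quick illustration of the SDK workflow, here is a minimal sketch using the Python SDK's `evaluate` entry point with a custom evaluator. The target function `my_app` and the dataset name `my-dataset` are hypothetical placeholders; adapt them to your own application and data.

```python
from langsmith.evaluation import evaluate

# Hypothetical target: wraps the application under test.
# It receives a dataset example's inputs and returns outputs.
def my_app(inputs: dict) -> dict:
    return {"answer": f"Echo: {inputs['question']}"}

# Custom evaluator: compares the run's outputs against the
# example's reference outputs and returns a feedback dict.
def exact_match(run, example) -> dict:
    return {
        "key": "exact_match",
        "score": run.outputs["answer"] == example.outputs["answer"],
    }

results = evaluate(
    my_app,
    data="my-dataset",            # assumed dataset name
    evaluators=[exact_match],
    experiment_prefix="echo-baseline",
)
```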
Unit testing
Run assertions and expectations designed to quickly identify obvious bugs and regressions in your AI system, natively in your favorite testing library.
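For example, a plain pytest test can assert basic expectations against your application before any LangSmith-specific tooling is involved. `my_app` here is a hypothetical function standing in for the system under test.

```python
# Hypothetical application entry point standing in for your AI system.
def my_app(question: str) -> str:
    return "Paris" if "capital of France" in question else "I don't know"

def test_known_fact():
    # Quick assertion to catch obvious regressions on a known case.
    assert my_app("What is the capital of France?") == "Paris"

def test_unknown_question_is_honest():
    # The app should not invent an answer it cannot know.
    assert "don't know" in my_app("What is my neighbor's middle name?")
```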
Auto-evaluation
Set up auto-evaluators that LangSmith will automatically run on your experiments.
Online evaluation
Set up evaluations to run on incoming traces to understand your application's behavior in production.
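Online evaluators score the traces your application already sends to LangSmith, so the only code-side requirement is that the relevant functions are traced. A minimal sketch using the Python SDK's `traceable` decorator (the function and its contents are illustrative):

```python
from langsmith import traceable

@traceable(name="answer_question")
def answer_question(question: str) -> str:
    # ... call your model here ...
    return "stub answer"

# With tracing enabled (LANGSMITH_TRACING=true and LANGSMITH_API_KEY set),
# each call below produces a trace that configured online evaluators can score.
answer_question("What does LangSmith's online evaluation do?")
```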
Experiments
Use the experiments UI & API to understand your evaluations. A sketch of fetching results via the SDK follows the list below.
- Run an evaluation in the prompt playground
- Compare experiments with the comparison view
- Filter experiments
- View pairwise experiments
- Fetch experiment results in the SDK
- Upload experiments run outside of LangSmith with the REST API
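For instance, experiment results can be pulled into the SDK for further analysis; each experiment is backed by a tracing project whose name you can pass to `list_runs`. The experiment name below is a hypothetical placeholder.

```python
from langsmith import Client

client = Client()

# The root runs of an experiment hold the per-example
# inputs and outputs of the target function.
runs = client.list_runs(
    project_name="echo-baseline-1234abcd",  # assumed experiment name
    is_root=True,
)

for run in runs:
    print(run.inputs, run.outputs, run.error)
```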
Datasets
Manage the datasets in LangSmith that your offline evaluations (and other downstream applications) use. A programmatic sketch follows the list below.
- Manage datasets in the application
- Manage datasets programmatically
- Version datasets
- Share or unshare a dataset publicly
- Export filtered traces from an experiment to a dataset
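As a sketch of managing datasets programmatically with the Python SDK (the dataset name and example contents are illustrative):

```python
from langsmith import Client

client = Client()

# Create a dataset to hold evaluation examples.
dataset = client.create_dataset(
    dataset_name="my-dataset",
    description="Question/answer pairs for offline evaluation.",
)

# Add an input/output example to the dataset.
client.create_examples(
    inputs=[{"question": "What is the capital of France?"}],
    outputs=[{"answer": "Paris"}],
    dataset_id=dataset.id,
)
```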
Annotation queues and human feedback
Collect feedback from subject matter experts and users to improve your LLM applications.
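Feedback gathered from reviewers or end users can also be attached to runs programmatically; a minimal sketch using the Python SDK's `create_feedback` (the run ID is a placeholder):

```python
from langsmith import Client

client = Client()

# Attach a human-provided score and comment to an existing run.
client.create_feedback(
    run_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder run ID
    key="user_rating",
    score=1.0,
    comment="Accurate and concise answer.",
)
```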