# Integrated Validation and Reask with LLMs and Pydantic

Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-heal.

## Applications and Scenarios

- **Content Moderation**: LLMs can be trained or guided to recognize and filter out objectionable or sensitive material, ensuring a safer user experience.
- **Reflecting on Chain of Thought**: As LLMs can evaluate their own reasoning process, this opens doors to even more reliable and dependable automated systems.
- **Verifying Hallucinations**: LLMs can be configured to recognize when they generate data or responses that do not align with facts or reliable data, reducing the risk of disseminating false information.
- **Data Integrity**: Enforces data quality standards.

## Pythonic Validation with Pydantic and Instructor

1. **Uniform Validation API**: Pydantic provides an identical developer experience, whether using code-based or LLM-based validation.
2. **Reasking Mechanism**: Pydantic accumulates validation errors for a one-step reasking process.
3. **Prompt Chaining via Error Messages**: Instructor utilizes validation error messages to refine LLM outputs without any new abstractions.

## Uniform Validation: Code-Based vs. LLM

Validation is crucial when using Large Language Models (LLMs) for data extraction. It ensures data integrity by checking both quantitative and qualitative correctness, using code-based and LLM-based validators.

!!! note "Pydantic Validation Docs"
    Pydantic supports validating individual fields or the whole model dict all at once.

    - [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/)
    - [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators)

    To see the most up-to-date examples, check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators).

### Code-Based Validation Example

!!! note "Model Level Evaluation"
    Right now we only go over field-level examples. Check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model-level evaluation.

Enforce a naming rule using Pydantic's built-in validation:

```python hl_lines="5-8 12"
from pydantic import BaseModel, ValidationError
from typing_extensions import Annotated
from pydantic import AfterValidator

def name_must_contain_space(v: str) -> str:
    if " " not in v:
        raise ValueError("Name must contain a space.")
    return v.lower()

class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)]

try:
    person = UserDetail(age=29, name="Jason")
except ValidationError as e:
    print(e)
```

#### Output for Code-Based Validation

```plaintext
1 validation error for UserDetail
name
  Value error, Name must contain a space. (type=value_error)
```

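The example above fails on a single field. When several fields are invalid at once, Pydantic reports all of them in one `ValidationError`, which is what makes a one-step reask possible. The sketch below is illustrative and not part of instructor: `collect_errors` and `reask_prompt` are hypothetical names, and the prompt wording is an assumption.

```python
from typing import Annotated, List

from pydantic import AfterValidator, BaseModel, ValidationError


def name_must_contain_space(v: str) -> str:
    if " " not in v:
        raise ValueError("Name must contain a space.")
    return v


class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)]


def collect_errors(**data) -> List[str]:
    """Hypothetical helper: return one 'field: message' line per failing field."""
    try:
        UserDetail(**data)
    except ValidationError as e:
        # e.errors() lists every failure found in this validation pass
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in e.errors()
        ]
    return []


# Both fields are invalid, so both problems are reported together
problems = collect_errors(age="twenty-nine", name="Jason")

# Assumed prompt wording: all errors go into a single follow-up message
reask_prompt = "Please correct the following errors:\n" + "\n".join(problems)
print(reask_prompt)
```

Collecting the messages into one prompt means the model can fix every problem in a single retry rather than one field per round trip.
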
### LLM-Based Validation Example

LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.

```python hl_lines="9 15"
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated
from instructor import llm_validator

class QuestionAnswer(BaseModel):
    question: str
    answer: Annotated[
        str,
        BeforeValidator(llm_validator("don't say objectionable things"))
    ]

try:
    qa = QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
except ValidationError as e:
    print(e)
```

#### Output for LLM-Based Validation

It's important to note here that the error message is generated by the LLM, not the code, so it will be helpful when reasking the model.

```plaintext
1 validation error for QuestionAnswer
answer
  Assertion failed, The statement is objectionable. (type=assertion_error)
```

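The uniform API works because, to Pydantic, a validator is just a callable that either returns the value or raises. The sketch below illustrates that idea under stated assumptions; it is not how `llm_validator` is implemented. `rule_validator` and `keyword_judge` are hypothetical names, and the keyword check is a stub standing in for a real LLM call.

```python
from typing import Annotated, Callable, Optional

from pydantic import AfterValidator, BaseModel, ValidationError


def rule_validator(rule: str, judge: Callable[[str, str], Optional[str]]):
    """Hypothetical factory: wrap a judge into a Pydantic-compatible validator."""

    def validate(v: str) -> str:
        reason = judge(rule, v)  # None means the value passed the rule
        if reason is not None:
            raise ValueError(reason)
        return v

    return validate


def keyword_judge(rule: str, text: str) -> Optional[str]:
    """Stub for an LLM call (assumption): flags one hard-coded keyword."""
    if "steal" in text.lower():
        return f"The statement violates the rule: {rule}"
    return None


class QuestionAnswer(BaseModel):
    question: str
    answer: Annotated[
        str,
        AfterValidator(rule_validator("don't say objectionable things", keyword_judge)),
    ]


try:
    QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
    error_text = ""
except ValidationError as e:
    error_text = str(e)
    print(error_text)
```

Swapping the stub for a model-backed judge would leave the Pydantic model unchanged, which is the point of keeping both kinds of validation behind the same interface.
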
## Using Reasking Logic to Correct Outputs

Validators are a great tool for ensuring some property of the outputs. When you apply `instructor.patch()` to the `openai` client, the `max_retries` parameter sets how many times the model may be reasked to correct its output.

It's a great layer of defense against two kinds of bad output:

1. Pydantic validation errors (code-based or LLM-based)
2. JSON decoding errors (when the model returns a bad response)

### Step 1: Define the Response Model with Validators

Notice that the field validator requires the name to be in uppercase, while the user input is lowercase. The validator raises a `ValueError` if the name is not in uppercase.

```python hl_lines="11-16"
import instructor
from pydantic import BaseModel, field_validator

# Apply the patch to the OpenAI client
instructor.patch()

class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def validate_name(cls, v):
        if v.upper() != v:
            raise ValueError("Name must be in uppercase.")
        return v
```

### Step 2: Using the Client with Retries

Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.

```python hl_lines="6 12"
import openai

model = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert model.name == "JASON"
```

### What happens behind the scenes?

Behind the scenes, `instructor.patch()` adds a `max_retries` parameter to the `openai.ChatCompletion.create()` method. With `max_retries=2`, up to two reattempts are triggered if the `name` attribute fails the uppercase validation in `UserDetails`. On each failure, the model's previous response and the error message are appended to the conversation before the model is asked again:

```python
try:
    ...
except (ValidationError, JSONDecodeError) as e:
    # Keep the model's failed response in the conversation history
    kwargs["messages"].append(dict(**response.choices[0].message))
    # Feed the error text back as a user message so the model can correct itself
    kwargs["messages"].append(
        {
            "role": "user",
            "content": f"Please correct the function call; errors encountered:\n{e}",
        }
    )
```

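To make the loop concrete, here is a self-contained sketch of the same reask pattern using only the standard library. It is not instructor's implementation: `fake_llm`, `validate_user`, and `create_with_retries` are hypothetical, and the stubbed model simply corrects itself once it sees an error message in the conversation.

```python
import json
from json import JSONDecodeError
from typing import Dict, List, Tuple


def validate_user(raw: str) -> dict:
    """Parse the raw response and enforce the uppercase-name rule."""
    data = json.loads(raw)  # may raise JSONDecodeError
    if data["name"] != data["name"].upper():
        raise ValueError("Name must be in uppercase.")
    return data


def fake_llm(messages: List[Dict[str, str]]) -> str:
    """Stub model (assumption): fixes its answer after seeing an error message."""
    saw_error = any("errors encountered" in m["content"] for m in messages)
    return json.dumps({"name": "JASON" if saw_error else "jason", "age": 25})


def create_with_retries(
    messages: List[Dict[str, str]], max_retries: int = 2
) -> Tuple[dict, int]:
    """Return the validated result and the number of retries that were used."""
    messages = list(messages)  # do not mutate the caller's list
    retries = 0
    while True:
        raw = fake_llm(messages)
        try:
            return validate_user(raw), retries
        except (ValueError, JSONDecodeError) as e:
            if retries >= max_retries:
                raise  # retries exhausted: surface the last error
            retries += 1
            # Same two appends as above: failed response, then the error text
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {
                    "role": "user",
                    "content": f"Please correct the function call; errors encountered:\n{e}",
                }
            )


user, retries_used = create_with_retries(
    [{"role": "user", "content": "Extract jason is 25 years old"}]
)
print(user, retries_used)
```

The first attempt fails validation, the error is appended to the conversation, and the second attempt succeeds, so one retry is consumed out of the two allowed.
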
## Advanced Validation Techniques

The docs are currently incomplete, but we have a few advanced validation techniques that we're working on documenting better. For an example of model-level validation and of using a validation context, check out our example on [verifying citations](examples/exact_citations.md), which covers:

1. Validating the entire object with all attributes rather than one attribute at a time
2. Using some 'context' to validate the object; in this case we use the `context` to check if the citation existed in the original text

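As a rough sketch of both ideas using Pydantic v2's validation context, the example below checks a quote against source text passed at validation time and adds one whole-object check. It is not the code from the linked example: the `Citation` model, the `text_chunk` context key, and the rule that a statement must differ from its quote are all assumptions made for illustration.

```python
from pydantic import (
    BaseModel,
    ValidationError,
    ValidationInfo,
    field_validator,
    model_validator,
)


class Citation(BaseModel):
    statement: str
    quote: str

    @field_validator("quote")
    @classmethod
    def quote_must_exist_in_source(cls, v: str, info: ValidationInfo) -> str:
        # info.context is whatever was passed as context= to model_validate
        source = (info.context or {}).get("text_chunk", "")
        if v not in source:
            raise ValueError(f"Quote {v!r} was not found in the source text.")
        return v

    @model_validator(mode="after")
    def statement_differs_from_quote(self) -> "Citation":
        # Whole-object check (hypothetical rule): compares two fields at once
        if self.statement.strip() == self.quote.strip():
            raise ValueError("Statement must not simply repeat the quote.")
        return self


source_text = "Jason is 25 years old and lives in Toronto."

valid = Citation.model_validate(
    {"statement": "Jason's age is 25.", "quote": "Jason is 25 years old"},
    context={"text_chunk": source_text},
)

try:
    Citation.model_validate(
        {"statement": "Jason's age is 30.", "quote": "Jason is 30 years old"},
        context={"text_chunk": source_text},
    )
    citation_error = ""
except ValidationError as e:
    citation_error = str(e)
    print(citation_error)
```

Because the failure is an ordinary `ValidationError`, its message can be fed back to the model in the same reasking loop described earlier.
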
## Takeaways

By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.