fix docs

2026-06-05 22:50:18 +00:00 · 2023-09-08 16:15:41 -04:00
parent 1cc45e3faf
commit a640916b8a
3 changed files with 171 additions and 16 deletions
@@ -0,0 +1,170 @@
+# Integrated Validation and Reask with LLMs and Pydantic
+
+Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-heal.
+
+## Applications and Scenarios
+
+- **Content Moderation**: LLMs can be trained or guided to recognize and filter out objectionable or sensitive material, ensuring a safer user experience.
+- **Reflecting on Chain of Thought**: Because LLMs can be prompted to evaluate their own reasoning, validators can check a chain of thought before its conclusion is accepted, making automated systems more reliable.
+- **Verifying Hallucinations**: LLMs can be configured to recognize when they generate data or responses that do not align with facts or reliable data, reducing the risk of disseminating false information.
+- **Data Integrity**: Validators enforce data quality standards on extracted outputs.
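
One simple, code-based way to catch a common kind of hallucination is to require that every quote the model cites actually appears in the source text. The sketch below uses only the standard library; `check_quotes_grounded` and `validate_citations` are hypothetical helpers, not part of Pydantic or Instructor, and a substring check only catches fabricated quotes, not every kind of hallucination.

```python
from typing import List


def check_quotes_grounded(quotes: List[str], context: str) -> List[str]:
    """Return the quotes that do not appear in the context.

    Matching ignores case and collapses runs of whitespace, so an empty
    result means every cited quote is grounded in the source text.
    """

    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    haystack = normalize(context)
    return [q for q in quotes if normalize(q) not in haystack]


def validate_citations(quotes: List[str], context: str) -> List[str]:
    """Raise ValueError naming any ungrounded quotes so the message can drive a reask."""
    missing = check_quotes_grounded(quotes, context)
    if missing:
        raise ValueError(f"Quotes not found in the source text: {missing}")
    return quotes


context = "Jason is 25 years old and lives in Toronto."
print(check_quotes_grounded(["jason is 25 years old", "lives in Paris"], context))
# ['lives in Paris']
```

Because `validate_citations` returns the value or raises `ValueError`, it has the same shape as the field validators used throughout this page.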
+
+## Pythonic Validation with Pydantic and Instructor
+
+1. **Uniform Validation API**: Pydantic offers the same developer experience whether a validator is code-based or LLM-based.
+2. **Reasking Mechanism**: Pydantic collects all validation errors together, so the model can be reasked to fix them in a single step.
+3. **Prompt Chaining via Error Messages**: Instructor feeds validation error messages back to the LLM to refine its output, without any new abstractions.
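
To make the reasking idea concrete, here is a dependency-free sketch (it uses neither Pydantic nor Instructor) that runs every validator, collects all failures instead of stopping at the first, and folds them into one follow-up message. The names `collect_errors`, `build_reask_message`, `must_be_int`, and `must_contain_space` are hypothetical, chosen for illustration.

```python
from typing import Callable, Dict, List

# A validator takes a field value and raises ValueError with a readable message
Validator = Callable[[object], None]


def collect_errors(data: Dict[str, object], validators: Dict[str, List[Validator]]) -> List[str]:
    """Run every validator and collect all failures instead of stopping at the first."""
    errors: List[str] = []
    for field, checks in validators.items():
        for check in checks:
            try:
                check(data.get(field))
            except ValueError as exc:
                errors.append(f"{field}: {exc}")
    return errors


def build_reask_message(errors: List[str]) -> Dict[str, str]:
    """Fold all error messages into one user turn so the model can fix everything at once."""
    joined = "\n".join(f"- {e}" for e in errors)
    return {"role": "user", "content": f"Please fix the following errors:\n{joined}"}


def must_be_int(v: object) -> None:
    if not isinstance(v, int):
        raise ValueError("must be an integer")


def must_contain_space(v: object) -> None:
    if not isinstance(v, str) or " " not in v:
        raise ValueError("must contain a space")


errors = collect_errors(
    {"name": "Jason", "age": "twenty"},
    {"name": [must_contain_space], "age": [must_be_int]},
)
print(build_reask_message(errors)["content"])
```

Sending one message with every error, rather than one error per round trip, is what keeps the reask to a single step.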
+
+## Uniform Validation: Code-Based vs. LLM
+
+Validation is crucial when using Large Language Models (LLMs) for data extraction. It protects data integrity on two fronts: code-based validators check quantitative, rule-based properties, while LLM-based validators check qualitative properties that are hard to express as rules.
+
+!!! note "Pydantic Validation Docs"
+    Pydantic supports validating individual fields or the whole model at once.
+
+    - [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/)
+    - [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators)
+
+    For the most up-to-date examples, check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators).
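
The note above distinguishes field-level from model-level validation. Field-level checks see one value at a time, while model-level checks see the whole object, which is what cross-field rules need. As a rough, dependency-free analogue (a standard-library dataclass, not Pydantic's `model_validator`), the hypothetical `DateRange` below checks each field on its own and then checks a rule that spans two fields.

```python
from dataclasses import dataclass


@dataclass
class DateRange:
    start_year: int
    end_year: int

    def __post_init__(self) -> None:
        # Field-level checks: each value is validated on its own
        for name in ("start_year", "end_year"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        # Model-level check: the rule depends on two fields together
        if self.end_year < self.start_year:
            raise ValueError("end_year must not be earlier than start_year")


DateRange(2020, 2023)  # passes both kinds of checks
try:
    DateRange(2023, 2020)
except ValueError as e:
    print(e)  # end_year must not be earlier than start_year
```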
+
+
+### Code-Based Validation Example
+
+!!! note "Model-Level Validation"
+    This section covers only field-level examples. To validate the whole model at once, see [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators).
+
+Enforce a naming rule using Pydantic's built-in validation:
+
+```python hl_lines="5-8 12"
+from pydantic import BaseModel, ValidationError
+from typing_extensions import Annotated
+from pydantic import AfterValidator
+
+def name_must_contain_space(v: str) -> str:
+    if " " not in v:
+        raise ValueError("Name must contain a space.")
+    return v.lower()
+
+class UserDetail(BaseModel):
+    age: int
+    name: Annotated[str, AfterValidator(name_must_contain_space)]
+
+try:
+    person = UserDetail(age=29, name="Jason")
+except ValidationError as e:
+    print(e)
+```
+
+#### Output for Code-Based Validation
+
+```plaintext
+1 validation error for UserDetail
+name
+   Value error, Name must contain a space. (type=value_error)
+```
+
+### LLM-Based Validation Example
+
+LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
+
+```python hl_lines="9 15"
+from pydantic import BaseModel, ValidationError, BeforeValidator
+from typing_extensions import Annotated
+from instructor import llm_validator
+
+class QuestionAnswer(BaseModel):
+    question: str
+    answer: Annotated[
+        str,
+        BeforeValidator(llm_validator("don't say objectionable things"))
+    ]
+
+try:
+    qa = QuestionAnswer(
+        question="What is the meaning of life?",
+        answer="The meaning of life is to be evil and steal",
+    )
+except ValidationError as e:
+    print(e)
+```
+
+#### Output for LLM-Based Validation
+
+It's important to note that this error message is generated by the LLM, not by the code, so it gives the model useful feedback when it is reasked.
+
+```plaintext
+1 validation error for QuestionAnswer
+answer
+   Assertion failed, The statement is objectionable. (type=assertion_error)
+```
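
The LLM-based check fits the same mold as `name_must_contain_space`: it is a function that returns the value or raises with a message. The sketch below shows that shape under stated assumptions: `make_llm_validator` and the `judge` callable are hypothetical stand-ins, not Instructor's `llm_validator`, and the keyword-matching `fake_judge` replaces a real model call so the example runs offline.

```python
from typing import Callable, Tuple

# Hypothetical judge signature: (statement, rule) -> (is_valid, reason)
Judge = Callable[[str, str], Tuple[bool, str]]


def make_llm_validator(rule: str, judge: Judge) -> Callable[[str], str]:
    """Build a validator with the same shape as a code-based one."""

    def validator(value: str) -> str:
        is_valid, reason = judge(value, rule)
        # The error text comes from the judge, so it can be fed back when reasking
        if not is_valid:
            raise ValueError(reason)
        return value

    return validator


def fake_judge(statement: str, rule: str) -> Tuple[bool, str]:
    """Offline stand-in; a real judge would send `rule` and `statement` to a model."""
    if any(word in statement.lower() for word in ("evil", "steal")):
        return False, "The statement is objectionable."
    return True, ""


check = make_llm_validator("don't say objectionable things", fake_judge)
print(check("The meaning of life is to be kind"))
try:
    check("The meaning of life is to be evil and steal")
except ValueError as e:
    print(e)  # The statement is objectionable.
```

Because the returned function raises `ValueError`, it could be wrapped in `AfterValidator` or `BeforeValidator` just like the code-based validator above.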
+
+## Using Reasking Logic to Correct Outputs
+
+Validators are a great tool for enforcing properties of the outputs. After calling `instructor.patch()` to patch the `openai` client, you can pass a `max_retries` parameter to set how many times the model may be reasked to correct its output.
+
+This is a useful layer of defense against two kinds of bad outputs:
+
+1. Pydantic validation errors (code-based or LLM-based)
+2. JSON decoding errors (when the model returns malformed JSON)
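
Both failure modes end in the same place: an exception whose message can be sent back to the model. This standard-library sketch separates the two stages; `parse_user_details` and `classify` are hypothetical helpers, and the uppercase-name rule is borrowed from the next example.

```python
import json
from typing import Dict


def parse_user_details(raw: str) -> Dict[str, object]:
    """Decode a model response, then validate it; each stage raises on bad output."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object.")
    name = data.get("name")
    if not isinstance(name, str) or name.upper() != name:
        raise ValueError("Name must be in uppercase.")  # validation failure
    return data


def classify(raw: str) -> str:
    """Label a response by the kind of failure it triggers, if any."""
    # JSONDecodeError subclasses ValueError, so it must be caught first
    try:
        parse_user_details(raw)
    except json.JSONDecodeError:
        return "json_error"
    except ValueError:
        return "validation_error"
    return "ok"


for raw in ['{"name": "JASON", "age": 25}', '{"name": "jason", "age": 25}', '{"name": "JASON"']:
    print(classify(raw))  # ok, validation_error, json_error
```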
+
+
+### Step 1: Define the Response Model with Validators
+
+Notice that the field validator requires the name to be uppercase, while the name in the user input is lowercase. The validator raises a `ValueError` whenever the name is not uppercase.
+
+```python hl_lines="12-17"
+import instructor
+import openai
+from pydantic import BaseModel, field_validator
+
+# Apply the patch to the OpenAI client
+instructor.patch()
+
+class UserDetails(BaseModel):
+    name: str
+    age: int
+
+    @field_validator("name")
+    @classmethod
+    def validate_name(cls, v):
+        if v.upper() != v:
+            raise ValueError("Name must be in uppercase.")
+        return v
+```
+
+### Step 2: Use the Client with Retries
+
+Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
+
+```python hl_lines="4 10"
+model = openai.ChatCompletion.create(
+    model="gpt-3.5-turbo",
+    response_model=UserDetails,
+    max_retries=2,
+    messages=[
+        {"role": "user", "content": "Extract jason is 25 years old"},
+    ],
+)
+
+assert model.name == "JASON"
+```
+
+### What happens behind the scenes?
+
+Behind the scenes, `instructor.patch()` adds a `max_retries` parameter to `openai.ChatCompletion.create()`. With `max_retries=2`, the client reasks the model up to 2 more times if the `name` attribute fails the uppercase validation in `UserDetails`. On each failure, the model's previous response and the error messages are appended to `messages`:
+
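+```python
+from json import JSONDecodeError
+
+from pydantic import ValidationError
+
+# Simplified retry step inside the patched `create()` call
+try:
+    ...  # call the API and validate the response against `response_model`
+except (ValidationError, JSONDecodeError) as e:
+    # Keep the model's failed attempt in the conversation...
+    kwargs["messages"].append(dict(**response.choices[0].message))
+    # ...then ask it to fix the errors, quoting the error messages
+    kwargs["messages"].append(
+        {
+            "role": "user",
+            "content": f"Please correct the function call; errors encountered:\n{e}",
+        }
+    )
+```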
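
To see the whole reask cycle end to end without an API key, here is a self-contained simulation. It is not Instructor's code: `create_with_retries`, `validate_user`, and `FakeModel` are hypothetical, the fake model returns a lowercase name first and an uppercase one once it sees an error message, and `max_retries` is treated as the number of extra attempts after the first call.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def validate_user(data: Dict[str, object]) -> Dict[str, object]:
    # Same rule as the UserDetails field validator above
    name = data.get("name")
    if not isinstance(name, str) or name.upper() != name:
        raise ValueError("Name must be in uppercase.")
    return data


class FakeModel:
    """Stand-in for the chat API: lowercase on the first try, corrected after feedback."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, messages: List[Message]) -> str:
        self.calls += 1
        saw_error = any("errors encountered" in m["content"] for m in messages)
        name = "JASON" if saw_error else "jason"
        return json.dumps({"name": name, "age": 25})


def create_with_retries(
    model: Callable[[List[Message]], str],
    messages: List[Message],
    validate: Callable[[Dict[str, object]], Dict[str, object]],
    max_retries: int = 2,
) -> Dict[str, object]:
    messages = list(messages)  # do not mutate the caller's list
    for attempt in range(max_retries + 1):
        raw = model(messages)
        try:
            return validate(json.loads(raw))
        except ValueError as e:  # also catches json.JSONDecodeError, a ValueError subclass
            if attempt == max_retries:
                raise
            # Keep the failed answer, then ask for a fix that quotes the error
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Please correct the function call; errors encountered:\n{e}"}
            )
    raise AssertionError("unreachable when max_retries >= 0")


model = FakeModel()
result = create_with_retries(
    model,
    [{"role": "user", "content": "Extract jason is 25 years old"}],
    validate_user,
    max_retries=2,
)
print(result, model.calls)  # {'name': 'JASON', 'age': 25} 2
```

Here the first attempt fails validation, the error text is appended to the conversation, and the second attempt succeeds, so only one of the two allowed retries is used.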
+
+## Takeaways
+
+By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
@@ -150,18 +150,4 @@ def llm_validator(
    return llm
 ```

-## Summary and Future Implications
-
-### Classical vs. Future Validation Mechanisms
-
-Classical validation methods are effective for assessing the quality and integrity of data. They are rule-based and evaluate data against a predetermined set of criteria.
-
-The future is likely to see the integration of LLMs in the validation process itself, utilizing a suite of field-level or model-level evaluations that can self-critique and self-evaluate the outputs generated by these models. This will go beyond the mere integrity of data and extend into the realm of content quality, reasoning, and even ethical considerations.
-
-### Applications and Scenarios
-
- **Content Moderation**: LLMs can be trained or guided to recognize and filter out objectionable or sensitive material, ensuring a safer user experience.
- **Reflecting on Chain of Thought**: As LLMs can evaluate their own reasoning process, this opens doors to even more reliable and dependable automated systems.
- **Verifying Hallucinations**: LLMs can be configured to recognize when they generate data or responses that do not align with facts or reliable data, reducing the risk of disseminating false information.
-
 By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
@@ -47,8 +47,7 @@ nav:
      - Getting Started: 'index.md'
      - Prompt Engineering Tips: 'tips/index.md'
      - Helpers:
-        - Validations (self critique): "validation.md"
-        - Reasking via Validators: "reask.md"
+        - Reasking and Validation Overview: "reask_validation.md"
        - Multiple Extractions: "multitask.md"
        - Handling Missing Content: "maybe.md"
      - Philosophy: 'philosophy.md'