diff --git a/docs/blog/posts/chain-of-density.md b/docs/blog/posts/chain-of-density.md index 8e3e5f7..ab87c15 100644 --- a/docs/blog/posts/chain-of-density.md +++ b/docs/blog/posts/chain-of-density.md @@ -212,7 +212,7 @@ class RewrittenSummary(BaseModel): For a more in-depth walkthrough on how to use `Pydantic` validators with the `Instructor` library, we recommend checking out our previous article on LLM - validation - [Good LLM Validation is just Good Validation](/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/) + validation - [Good LLM Validation is just Good Validation](../posts/validation-part1.md) Ideally, we'd like for `Missing` to have a length between 1 and 3, `Absent` to be an empty list and for our rewritten summaries to keep a minimum entity density. With `Instructor`, we can implement this logic using native `Pydantic` validators that are simply declared as part of the class itself. @@ -470,7 +470,7 @@ instructor jobs create-from-file generated.jsonl ??? notes "Finetuning Reference" - Checking out our [Finetuning CLI](cli/finetune/) to learn about other hyperparameters that you can tune to improve your model's performance. + Check out our [Finetuning CLI](../../cli/finetune.md) to learn about other hyperparameters that you can tune to improve your model's performance. Once the job is complete, all we need to do is to then change the annotation in the function call to `distil_summarization` in our original file above to start using our new model.
diff --git a/docs/distillation.md b/docs/concepts/distillation.md similarity index 100% rename from docs/distillation.md rename to docs/concepts/distillation.md diff --git a/docs/philosophy.md b/docs/concepts/philosophy.md similarity index 100% rename from docs/philosophy.md rename to docs/concepts/philosophy.md diff --git a/docs/tips/index.md b/docs/concepts/prompting.md similarity index 98% rename from docs/tips/index.md rename to docs/concepts/prompting.md index 97cdbd6..1b61275 100644 --- a/docs/tips/index.md +++ b/docs/concepts/prompting.md @@ -1,6 +1,4 @@ -# Prompt Engineering for Function Calling - -The overarching theme of using instructor and pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use. +The overarching theme of using Instructor and Pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use. - **Modularity**: Design self-contained components for reuse. - **Self-Description**: Use Pydantic's `Field` for clear field descriptions. 
@@ -39,7 +37,6 @@ class UserDetail(BaseModel): age: int name: str role: Optional[str] = Field(default=None) - ``` ## Handling Errors Within Function Calls @@ -121,7 +118,6 @@ class UserDetail(BaseModel): age: int name: str role: Role - ``` ## Handle Arbitrary Properties @@ -139,7 +135,6 @@ class UserDetail(BaseModel): age: int name: str properties: List[Property] = Field(..., description="Extract any other properties that might be relevant.") - ``` ## Limiting the Length of Lists diff --git a/docs/reask_validation.md b/docs/concepts/reask_validation.md similarity index 75% rename from docs/reask_validation.md rename to docs/concepts/reask_validation.md index c2368b7..67f0018 100644 --- a/docs/reask_validation.md +++ b/docs/concepts/reask_validation.md @@ -1,30 +1,18 @@ -# Validation and Reask with LLMs and Pydantic +# Validation and Reasking -Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the systen can use to self heal. +Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-correct. -## Pythonic Validation with Pydantic and Instructor +## Pydantic -1. **Uniform Validation API**: Pydantic provides identical developer experience, whether using code-based or LLM-based validation. -2. **Reasking Mechanism**: Pydantic accumulates validation errors for a one-step reasking process. -3. **Prompt Chaining via Error Messages**: Instructor utilizes validation error messages to refine LLM outputs without and new abstractions. +Pydantic offers a customizable and expressive validation framework for Python. Instructor leverages Pydantic's validation framework to provide a uniform developer experience for both code-based and LLM-based validation, as well as a reasking mechanism for correcting LLM outputs based on validation errors.
To learn more check out the [Pydantic docs](https://docs.pydantic.dev/latest/concepts/validators/) on validators. -## Uniform Validation: Code-Based vs. LLM +!!! note "Good LLM Validation is just Good Validation" -Validation is crucial when using Large Language Models (LLMs) for data extraction. It ensures data integrity, ensuring both quantitative and qualititave correctness with code and llm validations. - -!!! note "Pydantic Validation Docs" - - Pydantic supports validation individual fields or the whole model dict all at once. - - - [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/) - - [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) - - To see the most up to date examples check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators) + If you want to see more examples of validators, check out our blog post [Good LLM Validation is just Good Validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/) ### Code-Based Validation Example -!!! note "Model Level Evaluation" -Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation +First, define a Pydantic model with a validator using the `Annotated` type from `typing_extensions`. Enforce a naming rule using Pydantic's built-in validation: @@ -56,6 +44,8 @@ name Value error, name must contain a space (type=value_error) ``` +As we can see, Pydantic raises a validation error when the name attribute does not contain a space. This is a simple example, but it demonstrates how Pydantic can be used to validate attributes of a model. ### LLM-Based Validation Example LLM-based validation can also be plugged into the same Pydantic model.
Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error. @@ -166,7 +156,7 @@ except (ValidationError, JSONDecodeError) as e: ## Advanced Validation Techniques -The docs are currently incomplete, but we have a few advanced validation techniques that we're working on documenting better, for a example of model level validation, and using a validation context check out our example on [verifying citations](examples/exact_citations.md) which covers +The docs are currently incomplete, but we're working on better documenting a few advanced validation techniques. For an example of model-level validation and of using a validation context, check out our example on [verifying citations](../examples/exact_citations.md), which covers: 1. Validate the entire object with all attributes rather than one attribute at a time 2. Using some 'context' to validate the object, in this case we use the `context` to check if the citation existed in the original text. diff --git a/docs/contributing.md b/docs/contributing.md new file mode 100644 index 0000000..66ca952 --- /dev/null +++ b/docs/contributing.md @@ -0,0 +1,34 @@ +We would love for you to contribute to `Instructor`. + +## Issues + +If you find a bug, please file an issue on [our issue tracker on GitHub](https://github.com/jxnl/instructor/issues). + +To help us reproduce the bug, please provide a minimal reproducible example, including a code snippet, the full error message, and the following details: + +1. The `response_model` you are using. +2. The `messages` you are using. +3. The `model` you are using. + +## Pull Requests + +We welcome pull requests! There is plenty to do, and we are happy to discuss any contributions you would like to make. + +If it is not a small change, please [file an issue](https://github.com/jxnl/instructor/issues) first.
+ +If you need ideas, you can check out the [help wanted](https://github.com/jxnl/instructor/labels/help%20wanted) or [good first issue](https://github.com/jxnl/instructor/labels/good%20first%20issue) labels. + +# Contributors + + + + + + + + + + + + + diff --git a/docs/help.md b/docs/help.md new file mode 100644 index 0000000..20bcd4d --- /dev/null +++ b/docs/help.md @@ -0,0 +1,27 @@ +# Getting help with Instructor + +If you need help getting started with Instructor or with advanced usage, the following sources may be useful. + +## :material-creation: Concepts + +The [concepts](concepts/index.md) section explains the core concepts of Instructor and how to prompt with models. + +## :material-chef-hat: Cookbooks + +The [cookbooks](examples/index.md) are a great place to start. They contain a variety of examples that demonstrate how to use Instructor in different scenarios. + +## :material-book: Blog + +The [blog](blog/index.md) contains articles that explain how to use Instructor in different scenarios. + +## :material-github: GitHub Discussions + +[GitHub discussions](https://github.com/jxnl/instructor/discussions) are useful for asking questions; your question and its answer will help everyone. + +## :material-github: GitHub Issues + +[GitHub issues](https://github.com/jxnl/instructor/issues) are useful for reporting bugs or requesting new features. + +## :material-twitter: Twitter + +You can also reach out to me on [Twitter](https://twitter.com/jxnlco) if you have any questions or ideas.
diff --git a/docs/index.md b/docs/index.md index b846679..7b3e7b2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,35 +1,16 @@ -# Welcome to Instructor - Your Gateway to Structured Outputs with OpenAI +# Instructor _Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control._ --- +[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev) +[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco) [![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor) -![Star us on Github!](https://img.shields.io/github/stars/jxnl/instructor.svg?style=social) [![Documentation](https://img.shields.io/badge/docs-available-brightgreen)](https://jxnl.github.io/instructor) [![GitHub issues](https://img.shields.io/github/issues/jxnl/instructor.svg)](https://github.com/jxnl/instructor/issues) -[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco) -Dive into the world of Python-based structured extraction, empowered by OpenAI's cutting-edge function calling API. Instructor stands out for its simplicity, transparency, and user-centric design. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and its results insightful. - -## Get Started in Moments - -Installing Instructor is a breeze. Just run `pip install instructor` in your terminal and you're on your way to a smoother data handling experience. - -## How Instructor Enhances Your Workflow - -Our `instructor.patch` for the `OpenAI` class introduces three key enhancements: - -- **Response Mode:** Specify a Pydantic model to streamline data extraction. -- **Max Retries:** Set your desired number of retry attempts for requests. 
-- **Validation Context:** Provide a context object for enhanced validator access. - A Glimpse into Instructor's Capabilities - -!!! note "Using Validators" - - Learn more about validators checkout our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/) - -With Instructor, your code becomes more efficient and readable. Here’s a quick peek: +Dive into the world of Python-based structured extraction, powered by OpenAI's function calling API and Pydantic, the most widely used data validation library for Python. Instructor stands out for its simplicity, transparency, and user-centric design. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and steerable. ## Usage @@ -59,24 +40,7 @@ assert user.name == "Jason" assert user.age == 25 ``` -**"Using `openai<1.0.0`"** - -If you're using `openai<1.0.0` then make sure you `pip install instructor<0.3.0` -where you can patch a global client like so: - -```python hl_lines="4 8" -import openai -import instructor - -instructor.patch() - -user = openai.ChatCompletion.create( - ..., - response_model=UserDetail, -) -``` - -**"Using async clients"** +**Using async clients** For async clients you must use apatch vs patch like so: @@ -101,115 +65,25 @@ model = await aclient.chat.completions.create( assert isinstance(model, UserExtract) ``` -### Step 1: Patch the client +## Why use Instructor? -First, import the required libraries and apply the patch function to the OpenAI module. This allows us to parse the raw JSON from our OpenAI completions into Pydantic output. +The question of why to use Instructor is fundamentally a question of why to use Pydantic. -```python -import instructor -from openai import OpenAI -from pydantic import BaseModel +1. **Powered by type hints** — Instructor is powered by Pydantic, which is powered by type hints.
Schema validation and prompting are controlled by type annotations; less to learn, less code to write, and integration with your IDE. -# This enables response_model keyword -# from client.chat.completions.create -client = instructor.patch(OpenAI()) -``` +2. **Powered by OpenAI** — Instructor is powered by OpenAI's function calling API. This means you can use the same API for both prompting and extraction. -### Step 2: Define the Pydantic Model +3. **Customizable** — Pydantic is highly customizable. You can define your own validators, custom error messages, and more. -Create a Pydantic model to define the structure of the data extracted from the OpenAI response. This model will map directly to the information in the prompt. +4. **Ecosystem** Pydantic is the most widely used data validation library for Python. It's used by FastAPI, Typer, and many other popular libraries. -```python -class UserDetail(BaseModel): - name: str - age: int -``` +5. **Battle Tested** — Pydantic is downloaded over 100M times per month, and is supported by a large community of contributors. -### Step 3: Extract +## More Examples -Use the `client.chat.completions.create` method to generate a completion and extract response data into the Pydantic object. The response_model parameter enables autocomplete and spell check in your IDE. +If you'd like to see more, check out our [cookbook](examples/index.md). -```python -user: UserDetail = client.chat.completions.create( - model="gpt-3.5-turbo", - response_model=UserDetail, - messages=[ - {"role": "user", "content": "Extract Jason is 25 years old"}, - ] -) - -assert user.name == "Jason" -assert user.age == 25 -``` - -## Pydantic Validation - -Validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
- -```python hl_lines="9 15" -from pydantic import BaseModel, ValidationError, BeforeValidator -from typing_extensions import Annotated -from instructor import llm_validator - -class QuestionAnswer(BaseModel): - question: str - answer: Annotated[ - str, - BeforeValidator(llm_validator("don't say objectionable things")) - ] - -try: - qa = QuestionAnswer( - question="What is the meaning of life?", - answer="The meaning of life is to be evil and steal", - ) -except ValidationError as e: - print(e) -``` - -Note, the error message is generated by the LLM, not the code, so it'll be helpful for re asking the model. - -```plaintext -1 validation error for QuestionAnswer -answer - Assertion failed, The statement is objectionable. (type=assertion_error) -``` - -## Reask on validation error - -Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2. - -```python -import instructor - -from openai import OpenAI -from pydantic import BaseModel, field_validator - -# Apply the patch to the OpenAI client -client = instructor.patch(OpenAI()) - -class UserDetails(BaseModel): - name: str - age: int - - @field_validator("name") - @classmethod - def validate_name(cls, v): - if v.upper() != v: - raise ValueError("Name must be in uppercase.") - return v - -model = client.chat.completions.create( - model="gpt-3.5-turbo", - response_model=UserDetails, - max_retries=2, - messages=[ - {"role": "user", "content": "Extract jason is 25 years old"}, - ], -) - -assert model.name == "JASON" -``` +[Installing Instructor](installation.md) is a breeze. Just run `pip install instructor`. ## Contributing @@ -218,18 +92,3 @@ If you want to help out checkout some of the issues marked as `good-first-issue` ## License This project is licensed under the terms of the MIT License. 
- -# Contributors - - - - - - - - - - - - - diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..8046b82 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,14 @@ +Installation is as simple as: + +```bash +pip install instructor +``` + +Instructor has a few dependencies: + +- [`openai`](https://pypi.org/project/openai/): OpenAI's Python client. +- [`typer`](https://pypi.org/project/typer/): Build great CLIs. Easy to code. Based on Python type hints. +- [`docstring-parser`](https://pypi.org/project/docstring-parser/): A parser for Python docstrings, to improve the experience of working with docstrings in JSON Schema. +- [`pydantic`](https://pypi.org/project/pydantic/): Data validation and settings management using Python type annotations. + +If you've got Python 3.9+ and `pip` installed, you're good to go. diff --git a/docs/why.md b/docs/why.md new file mode 100644 index 0000000..5036c6f --- /dev/null +++ b/docs/why.md @@ -0,0 +1,151 @@ +# Why use Instructor? + +??? question "Why use Pydantic?" + + It's hard to answer the question of why to use Instructor without first answering [why use Pydantic](https://docs.pydantic.dev/latest/why/): + + + - **Powered by type hints** — with Pydantic, schema validation and serialization are controlled by type annotations; less to learn, less code to write, and integration with your IDE and static analysis tools. + + - **Speed** — Pydantic's core validation logic is written in Rust. As a result, Pydantic is among the fastest data validation libraries for Python. + + - **JSON Schema** — Pydantic models can emit JSON Schema, allowing for easy integration with other tools. + + - **Customisation** — Pydantic allows custom validators and serializers to alter how data is processed in many powerful ways. + + - **Ecosystem** — around 8,000 packages on PyPI use Pydantic, including massively popular libraries like + _FastAPI_, _huggingface_, _Django Ninja_, _SQLModel_, & _LangChain_.
+ + - **Battle tested** — Pydantic is downloaded over 70M times/month and is used by all FAANG companies and 20 of the 25 largest companies on NASDAQ. If you're trying to do something with Pydantic, someone else has probably already done it. + +Our `instructor.patch` for the `OpenAI` class introduces three key enhancements: + +- **Response Model:** Specify a Pydantic model to streamline data extraction. +- **Max Retries:** Set your desired number of retry attempts for requests. +- **Validation Context:** Provide a context object for enhanced validator access. +## A Glimpse into Instructor's Capabilities + +!!! note "Using Validators" + + To learn more about validators, check out our blog post [Good LLM Validation is just Good Validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/) + +With Instructor, your code becomes more efficient and readable. Here’s a quick peek: + +## Understanding the `patch` + +Let's go over the `patch` function and see how we can leverage it to make use of Instructor. + +### Step 1: Patch the client + +First, import the required libraries and apply the patch function to the OpenAI client. This exposes new functionality through the `response_model` parameter. + +```python +import instructor +from openai import OpenAI +from pydantic import BaseModel + +# This enables response_model keyword +# from client.chat.completions.create +client = instructor.patch(OpenAI()) +``` + +### Step 2: Define the Pydantic Model + +Create a Pydantic model to define the structure of the data you want to extract. This model will map directly to the information in the prompt. + +```python +from pydantic import BaseModel + +class UserDetail(BaseModel): + name: str + age: int +``` + +### Step 3: Extract + +Use the `client.chat.completions.create` method to send a prompt and extract the data into the Pydantic object. The `response_model` parameter specifies the Pydantic model to use for extraction.
It's helpful to annotate the variable with the type of the response model, +which will help your IDE provide autocomplete and spell check. + +```python +user: UserDetail = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=UserDetail, + messages=[ + {"role": "user", "content": "Extract Jason is 25 years old"}, + ] +) + +assert user.name == "Jason" +assert user.age == 25 +``` + +## Understanding Validation + +LLM-based validation can also be plugged into a Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error. + +```python hl_lines="9 15" +from pydantic import BaseModel, ValidationError, BeforeValidator +from typing_extensions import Annotated +from instructor import llm_validator + +class QuestionAnswer(BaseModel): + question: str + answer: Annotated[ + str, + BeforeValidator(llm_validator("don't say objectionable things")) + ] + +try: + qa = QuestionAnswer( + question="What is the meaning of life?", + answer="The meaning of life is to be evil and steal", + ) +except ValidationError as e: + print(e) +``` + +It's important to note here that the error message is generated by the LLM, not the code, so it will be helpful when reasking the model. + +```plaintext +1 validation error for QuestionAnswer +answer + Assertion failed, The statement is objectionable. (type=assertion_error) +``` + +## Self-Correcting on Validation Error + +Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
+ +```python +import instructor + +from openai import OpenAI +from pydantic import BaseModel, field_validator + +# Apply the patch to the OpenAI client +client = instructor.patch(OpenAI()) + +class UserDetails(BaseModel): + name: str + age: int + + @field_validator("name") + @classmethod + def validate_name(cls, v): + if v.upper() != v: + raise ValueError("Name must be in uppercase.") + return v + +model = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=UserDetails, + max_retries=2, + messages=[ + {"role": "user", "content": "Extract jason is 25 years old"}, + ], +) + +assert model.name == "JASON" +``` + +As you can see, we've baked a self-correcting mechanism into the model. This is a powerful way to make your models more robust and less brittle without including a lot of extra code or prompting. diff --git a/mkdocs.yml b/mkdocs.yml index ee828f1..c835d45 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -121,11 +121,16 @@ markdown_extensions: - pymdownx.tilde nav: - Introduction: - - Quick Start: 'index.md' - - Validators: "reask_validation.md" - - Distillation: "distillation.md" - - Prompt Engineering Tips: 'tips/index.md' - - Philosophy: 'philosophy.md' + - Welcome To Instructor: 'index.md' + - Why use Instructor?: 'why.md' + - Help with Instructor: 'help.md' + - Installation: 'installation.md' + - Contributing: 'contributing.md' + - Concepts: + - Schema Engineering: 'concepts/prompting.md' + - Validators: "concepts/reask_validation.md" + - Distillation: "concepts/distillation.md" + - Philosophy: 'concepts/philosophy.md' - Cookbook: - Overview: 'examples/index.md' - Streaming Lists: "examples/multitask.md" diff --git a/requirements.txt b/requirements.txt index 9575d18..bf8687a 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,3 @@ openai>=1.1.0 pydantic -pytest docstring-parser \ No newline at end of file