clean up documentation

This commit is contained in:
Jason Liu
2023-11-08 00:24:03 -05:00
parent c12c162919
commit 198cb7a69d
6 changed files with 138 additions and 114 deletions
+1 -1
@@ -1,4 +1,4 @@
# Instructor (openai_function_call)
# Instructor
[![GitHub stars](https://img.shields.io/github/stars/jxnl/instructor.svg)](https://github.com/jxnl/instructor/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/jxnl/instructor.svg)](https://github.com/jxnl/instructor/network)
+108 -58
@@ -1,42 +1,46 @@
# Instructor (openai_function_call)
# Getting Started with Instructor
!!! note "Renaming from openai_function_call"
This library used to be called `openai_function_call`. Simply change the import and you should be good to go!
_Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control._
```sh
find /path/to/dir -type f -exec sed -i 's/openai_function_call/instructor/g' {} \;
```
---
*Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control.*
[Star us on Github!](https://jxnl.github.io/instructor).
-----
[![GitHub stars](https://img.shields.io/github/stars/jxnl/instructor.svg)](https://github.com/jxnl/instructor/stargazers)
[![GitHub issues](https://img.shields.io/github/issues/jxnl/instructor.svg)](https://github.com/jxnl/instructor/issues)
[![Github discussions](https://img.shields.io/github/discussions/jxnl/instructor)](https://github.com/jxnl/instructor/discussions)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Donate-yellow)](https://www.buymeacoffee.com/jxnlco)
[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco)
This library is built to interact with openai's function call api from python code, with python structs / objects. It's designed to be intuitive and easy to use, while giving great visibility into how we call openai.
Built to interact solely with openai's function calling api from python. It's designed to be intuitive, easy to use, and provide great visibility into your prompts.
The approach of combining a human prompt and a "response schema" is not necessarily unique; however, it shows great promise. As we have been concentrating on translating user intent into structured data, we have discovered that Python with Pydantic is exceptionally well-suited for this task.
```py hl_lines="6 14"
import openai
import instructor
from pydantic import BaseModel
**OpenAISchema** is based on Python type annotations, and powered by Pydantic.
# Enables `response_model`
instructor.patch()
The key features are:
class UserDetail(BaseModel):
name: str
age: int
* **Intuitive to write**: Great support for editors, completions. Spend less time debugging.
* **Writing prompts as code**: Collocate docstrings and descriptions as part of your prompting.
* **Extensible**: Bring your own kitchen sink without being weighted down by abstractions.
user = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
{"role": "user", "content": "Extract Jason is 25 years old"},
]
)
## Structured Extraction with `openai`
Welcome to the Quick Start Guide for OpenAI Function Call. This guide will walk you through the installation process and provide examples demonstrating the usage of function calls and schemas with OpenAI and Pydantic.
### Requirements
This library depends on **Pydantic** and **OpenAI** that's all.
assert isinstance(user, UserDetail)
assert user.name == "Jason"
assert user.age == 25
```
### Installation
To get started with OpenAI Function Call, you need to install it using `pip`. Run the following command in your terminal:
!!! note "Requirement"
Ensure you have Python version 3.9 or above.
To get started you need to install it using `pip`. Run the following command in your terminal:
```sh
$ pip install instructor
@@ -44,7 +48,19 @@ $ pip install instructor
## Quick Start with Patching ChatCompletion
To simplify your work with OpenAI models and streamline the extraction of Pydantic objects from prompts, we offer a patching mechanism for the `ChatCompletion`` class. Here's a step-by-step guide:
To simplify your work with OpenAI we offer a patching mechanism for the `ChatCompletion` class.
Here's a step-by-step guide:
This patch introduces 3 features to the `ChatCompletion` class:
1. The `response_model` parameter, which allows you to specify a Pydantic model to extract data into.
2. The `max_retries` parameter, which allows you to specify the number of times to retry the request if it fails.
3. The `validation_context` parameter, which allows you to specify a context object that validators have access to.
!!! note "Using Validators"
To learn more about validators, check out our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/).
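As a rough illustration of the third feature, the sketch below shows only the Pydantic v2 mechanism that a context object relies on: a value passed as `context=` to `model_validate` is exposed to validators as `info.context`. The `Citation` model, the `source_text` key, and the sample strings are hypothetical, and how `instructor` forwards `validation_context` internally is not shown here.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Citation(BaseModel):
    """Hypothetical model whose validator needs outside information."""

    quote: str

    @field_validator("quote")
    @classmethod
    def quote_must_appear_in_source(cls, v, info):
        # `info.context` holds whatever was passed as `context=` below
        # (it is None when no context was supplied).
        context = info.context or {}
        if v not in context.get("source_text", ""):
            raise ValueError("quote not found in source text")
        return v


# Assumed context layout: a plain dict with a `source_text` key.
context = {"source_text": "Jason is 25 years old"}

# A quote that appears in the source text passes validation.
valid = Citation.model_validate({"quote": "25 years old"}, context=context)

# A quote that does not appear in the source text is rejected.
try:
    Citation.model_validate({"quote": "30 years old"}, context=context)
    rejected = False
except ValidationError:
    rejected = True
```

Keeping the source text in the context rather than in the model means the schema sent to the LLM stays small while validators can still check the output against it.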
### Step 1: Import and Patch the Module
@@ -53,10 +69,7 @@ First, import the required libraries and apply the patch function to the OpenAI
```python
import openai
import instructor
from pydantic import BaseModel
# This enables response_model keyword
# from openai.ChatCompletion.create
instructor.patch()
```
@@ -65,6 +78,8 @@ instructor.patch()
Create a Pydantic model to define the structure of the data you want to extract. This model will map directly to the information in the prompt.
```python
from pydantic import BaseModel
class UserDetail(BaseModel):
name: str
age: int
@@ -82,6 +97,9 @@ user: UserDetail = openai.ChatCompletion.create(
{"role": "user", "content": "Extract Jason is 25 years old"},
]
)
assert user.name == "Jason"
assert user.age == 25
```
### Step 4: Validate the Extracted Data
@@ -93,41 +111,73 @@ assert user.name == "Jason"
assert user.age == 25
```
## IDE Support
### LLM-Based Validation
Everything is designed for you to get the best developer experience possible, with the best editor support.
LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
Including **autocompletion**:
```python hl_lines="9 15"
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated
from instructor import llm_validator
![autocomplete](img/ide_support.png)
class QuestionAnswer(BaseModel):
question: str
answer: Annotated[
str,
BeforeValidator(llm_validator("don't say objectionable things"))
]
And even **inline errors**
![errors](img/error2.png)
## OpenAI Schema and Pydantic
This quick start guide provided you with a basic understanding of how to use OpenAI Function Call for schema extraction and function calls. You can now explore more advanced use cases and creative applications of this library.
Since `UserDetails` is an `OpenAISchema` and a `pydantic.BaseModel`, you can use inheritance and nesting to create more complex models while avoiding code duplication.
```python
class UserDetails(OpenAISchema):
name: str = Field(..., description="User's full name")
age: int
class UserWithAddress(UserDetails):
address: str
class UserWithFriends(UserDetails):
best_friend: UserDetails
friends: List[UserDetails]
try:
qa = QuestionAnswer(
question="What is the meaning of life?",
answer="The meaning of life is to be evil and steal",
)
except ValidationError as e:
print(e)
```
If you have any questions, feel free to leave an issue or reach out to the library's author on [Twitter](https://twitter.com/jxnlco). For a more comprehensive solution with additional features, consider checking out [MarvinAI](https://www.askmarvin.ai/).
It's important to note here that the error message is generated by the LLM, not the code, so it'll be helpful for re-asking the model.
To see more examples of how we can create interesting models check out some [examples.](examples/index.md)
```plaintext
1 validation error for QuestionAnswer
answer
Assertion failed, The statement is objectionable. (type=assertion_error)
```
## Using the Client with Retries
Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
```python
import openai
import instructor
from pydantic import BaseModel, field_validator
# Apply the patch to the OpenAI client
instructor.patch()
class UserDetails(BaseModel):
name: str
age: int
@field_validator("name")
@classmethod
def validate_name(cls, v):
if v.upper() != v:
raise ValueError("Name must be in uppercase.")
return v
model = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
response_model=UserDetails,
max_retries=2,
messages=[
{"role": "user", "content": "Extract jason is 25 years old"},
],
)
assert model.name == "JASON"
```
## License
This project is licensed under ther terms of the MIT License.
This project is licensed under the terms of the MIT License.
+18 -35
@@ -1,4 +1,4 @@
# Patterns for Extracting Multiple Items
# Streaming and MultiTask
A common use case of structured extraction is to define a single schema class and then define a second schema that wraps a list of it, so that multiple items can be extracted at once.
@@ -20,51 +20,45 @@ Defining a task and creating a list of classes is a common enough pattern that w
By using `MultiTask` you get a very convenient class with prompts and names automatically defined. You get `from_response` just like any other `OpenAISchema`, and you can extract the list of objects you want with `MultiTask.tasks`.
```python hl_lines="13"
from instructor import OpenAISchema, MultiTask
```python
import openai
import instructor
from pydantic import BaseModel
# Enable `response_model`
instructor.patch()
class User(BaseModel):
name: str
age: int
MultiUser = MultiTask(User)
completion = openai.ChatCompletion.create(
model="gpt-4-0613",
temperature=0.1,
stream=False,
functions=[MultiUser.openai_schema],
function_call={"name": MultiUser.openai_schema["name"]},
results = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
response_model=instructor.MultiTask(User),
messages=[
{
"role": "user",
"content": f"Consider the data below: Jason is 10 and John is 30",
},
],
max_tokens=1000,
)
MultiUser.from_response(completion)
```
```sh
{"tasks": [
{"name": "Jason", "age": 10},
{"name": "John", "age": 30}
]}
```json
{
"tasks": [
{ "name": "Jason", "age": 10 },
{ "name": "John", "age": 30 }
]
}
```
## Streaming Tasks
Since a `MultiTask(T)` is well constrained to `tasks: List[T]`, we can make assumptions about how tokens are used and provide a helper method that allows you to generate tasks as the tokens are streamed in.
!!! tips "Why would we want this?"
While `gpt-3.5-turbo` is quite fast, `gpt-4` will take a while if there are many objects or if each object schema is complex. If 10 entities are created and each takes 100ms to complete, it would take 1 second before we had access to our objects. With streaming you'd get the first object in 100ms, a 10x perceived improvement in latency! While this may not make sense for most use cases, if we were dynamically building UI based on entities, streaming entities one by one could improve the user experience dramatically.
Since a `MultiTask(T)` is well constrained to `tasks: List[T]`, we can make assumptions about how tokens are used and provide a helper method that allows you to generate tasks as the tokens are streamed in. This currently isn't supported via the `response_model` parameter but can be used with the `functions` parameter.
Let's look at an example in action with the same class:
```python hl_lines="6 26"
MultiUser = MultiTask(User)
MultiUser = instructor.MultiTask(User)
completion = openai.ChatCompletion.create(
model="gpt-4-0613",
@@ -96,14 +90,3 @@ for user in MultiUser.from_streaming_response(completion):
>>> name="Jason" age=10
>>> name="John" age=30
```
!!! usage "How??"
Consider this incomplete json string.
```json
{"tasks": [{"name": "Jason", "age": 10}
```
Notice how, while this isn't valid json, we know that one complete `User` object was generated so we `yield` that object to be used elsewhere as soon as possible.
This streaming is still a prototype, but should work quite well for simple schemas.
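The idea above can be sketched with the standard library alone. This is a minimal illustration, not the library's `from_streaming_response` implementation: the helper name `iter_complete_tasks` and the chunk boundaries are hypothetical, and it assumes the payload has the `{"tasks": [...]}` shape with every task being a JSON object.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_complete_tasks(chunks: Iterable[str]) -> Iterator[Dict]:
    """Yield each object in the `tasks` array as soon as it is complete."""
    decoder = json.JSONDecoder()
    buffer = ""
    pos = None  # index of the next unparsed character inside the array

    for chunk in chunks:
        buffer += chunk
        if pos is None:
            start = buffer.find("[")  # assumes the first "[" opens `tasks`
            if start == -1:
                continue
            pos = start + 1
        while True:
            # Skip separators between array elements.
            while pos < len(buffer) and buffer[pos] in " \t\r\n,":
                pos += 1
            if pos >= len(buffer) or buffer[pos] == "]":
                break
            try:
                obj, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                break  # the current object is incomplete; wait for more text
            yield obj
            pos = end


# The incomplete string from the note above already yields one object.
partial = list(iter_complete_tasks(['{"tasks": [{"name": "Jason", "age": 10}']))

# Hypothetical chunk boundaries, split mid-object on purpose.
stream = [
    '{"tasks": [{"name": "Jas',
    'on", "age": 10}',
    ', {"name": "John", ',
    '"age": 30}]}',
]
users = list(iter_complete_tasks(stream))
```

Relying on `raw_decode` keeps the sketch short, because an object missing its closing brace always fails to decode; bare numbers or strings as array elements would need extra care, since a truncated number can still parse.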
+6 -14
@@ -1,14 +1,7 @@
# Integrated Validation and Reask with LLMs and Pydantic
# Validation and Reask with LLMs and Pydantic
Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-heal.
## Applications and Scenarios
- **Content Moderation**: LLMs can be trained or guided to recognize and filter out objectionable or sensitive material, ensuring a safer user experience.
- **Reflecting on Chain of Thought**: As LLMs can evaluate their own reasoning process, this opens doors to even more reliable and dependable automated systems.
- **Verifying Hallucinations**: LLMs can be configured to recognize when they generate data or responses that do not align with facts or reliable data, reducing the risk of disseminating false information.
- **Data Integrity**: Enforces data quality standards.
## Pythonic Validation with Pydantic and Instructor
1. **Uniform Validation API**: Pydantic provides identical developer experience, whether using code-based or LLM-based validation.
@@ -20,6 +13,7 @@ Instead of framing "self-critique" or "self-reflection" in AI as new concepts, w
Validation is crucial when using Large Language Models (LLMs) for data extraction. It ensures data integrity, covering both quantitative and qualitative correctness through code-based and LLM-based validation.
!!! note "Pydantic Validation Docs"
Pydantic supports validating individual fields or the whole model dict all at once.
- [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/)
@@ -27,11 +21,10 @@ Validation is crucial when using Large Language Models (LLMs) for data extractio
To see the most up to date examples check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators)
### Code-Based Validation Example
!!! note "Model Level Evaluation"
Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation
Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation
Enforce a naming rule using Pydantic's built-in validation:
@@ -75,7 +68,7 @@ from instruct import llm_validator
class QuestionAnswer(BaseModel):
question: str
answer: Annotated[
str,
str,
BeforeValidator(llm_validator("don't say objectionable things"))
]
@@ -107,7 +100,6 @@ Its a great layer of defense against bad outputs of two forms.
1. Pydantic Validation Errors (code or llm based)
2. JSON Decoding Errors (when the model returns a bad response)
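To make the two failure modes concrete, here is a minimal standard-library sketch of a reask loop. It is an assumption about the general shape of such a loop, not instructor's actual code: `parse_user`, `create_with_retries`, and `scripted_model` are hypothetical names, and the scripted responses stand in for real API calls.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def parse_user(raw: str) -> dict:
    """Decode and validate one response; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("Field 'name' must be a string.")
    if data["name"].upper() != data["name"]:
        raise ValueError("Name must be in uppercase.")
    return data


def create_with_retries(
    call_model: Callable[[List[Message]], str],
    messages: List[Message],
    max_retries: int = 2,
) -> dict:
    """Call the model, feeding each error back until the output validates."""
    messages = list(messages)  # do not mutate the caller's list
    for attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            return parse_user(raw)
        except ValueError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Reask: show the model its own output and the error message.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Fix the errors and try again: {err}"}
            )
    raise AssertionError("unreachable")


def scripted_model(responses: List[str]) -> Callable[[List[Message]], str]:
    """Hypothetical stand-in for an LLM call that replays fixed responses."""
    replies = iter(responses)
    return lambda messages: next(replies)


responses = [
    '{"name": "jason"',               # JSON decoding error
    '{"name": "jason", "age": 25}',   # validation error (not uppercase)
    '{"name": "JASON", "age": 25}',   # valid
]
prompt = [{"role": "user", "content": "Extract jason is 25 years old"}]
user = create_with_retries(scripted_model(responses), prompt, max_retries=2)
```

Both error forms are handled by one `except` clause because a JSON decoding failure is also a `ValueError`; the error text is what gives the model something specific to correct on the next attempt.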
### Step 1: Define the Response Model with Validators
Notice that the field validator wants the name in uppercase, but the user input is lowercase. The validator will raise a `ValueError` if the name is not in uppercase.
@@ -150,7 +142,7 @@ assert model.name == "JASON"
### What happens behind the scenes?
Behind the scenes, the `instructor.patch()` method adds a `max_retries` parameter to the `openai.ChatCompletion.create()` method. The `max_retries` parameter will trigger up to 2 reattempts if the `name` attribute fails the uppercase validation in `UserDetails`.
Behind the scenes, the `instructor.patch()` method adds a `max_retries` parameter to the `openai.ChatCompletion.create()` method. The `max_retries` parameter will trigger up to 2 reattempts if the `name` attribute fails the uppercase validation in `UserDetails`.
```python
try:
@@ -174,4 +166,4 @@ The docs are currently incomplete, but we have a few advanced validation techniq
## Takeaways
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
+3 -4
@@ -56,11 +56,10 @@ nav:
- Introduction:
- Getting Started: 'index.md'
- Prompt Engineering Tips: 'tips/index.md'
- Validation: "reask_validation.md"
- Streaming Lists: "multitask.md"
- Handling Missing Content: "maybe.md"
- Distillation: 'distillation.md'
- Helpers:
- Reasking and Validation Overview: "reask_validation.md"
- Multiple Extractions: "multitask.md"
- Handling Missing Content: "maybe.md"
- Philosophy: 'philosophy.md'
- Cookbook:
- Overview: 'examples/index.md'
+2 -2
@@ -1,4 +1,4 @@
openai
pydantic
openai<1.0.0
pydantic>=2.0.0
pytest
docstring-parser