Merge branch 'main' into tutorials

Jason Liu
2023-11-10 20:48:14 -05:00
committed by GitHub
80 changed files with 1500 additions and 2618 deletions
+1
@@ -0,0 +1 @@
github: jxnl
+2
@@ -29,3 +29,5 @@ jobs:
- name: Run test
run: poetry run pytest tests/
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+8
@@ -0,0 +1,8 @@
version: 0.0.1
patterns:
- name: github.com/getgrit/js#*
- name: github.com/getgrit/python#*
- name: github.com/getgrit/json#*
- name: github.com/getgrit/hcl#*
- name: github.com/getgrit/python#openai
level: info
-6
@@ -1,6 +0,0 @@
{
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
},
"python.formatting.provider": "none"
}
+80 -44
@@ -1,19 +1,68 @@
# Instructor (openai_function_call)
# Getting Started with Instructor
_Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control._
---
[Star us on Github!](https://jxnl.github.io/instructor).
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Donate-yellow)](https://www.buymeacoffee.com/jxnlco)
[![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor)
[![GitHub stars](https://img.shields.io/github/stars/jxnl/instructor.svg)](https://github.com/jxnl/instructor/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/jxnl/instructor.svg)](https://github.com/jxnl/instructor/network)
[![Documentation](https://img.shields.io/badge/docs-available-brightgreen)](https://jxnl.github.io/instructor)
[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco)
[![GitHub issues](https://img.shields.io/github/issues/jxnl/instructor.svg)](https://github.com/jxnl/instructor/issues)
[![GitHub license](https://img.shields.io/github/license/jxnl/instructor.svg)](https://github.com/jxnl/instructor/blob/main/LICENSE)
[![Github discussions](https://img.shields.io/github/discussions/jxnl/instructor)](https://github.com/jxnl/instructor/discussions)
[![Documentation](https://img.shields.io/badge/docs-available-brightgreen)](https://jxnl.github.io/instructor)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Donate-yellow)](https://www.buymeacoffee.com/jxnlco)
[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco)
[![PyPI version](https://img.shields.io/pypi/v/instructor.svg)](https://pypi.python.org/pypi/instructor)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/instructor.svg)](https://pypi.python.org/pypi/instructor)
_Structured extraction in Python, powered by OpenAI's function calling API, designed for simplicity, transparency, and control._
Built to interact solely with openai's function calling api from python. It's designed to be intuitive, easy to use, and provide great visibility into your prompts.
Built to interact solely with OpenAI's function calling API from Python. It's designed to be intuitive and easy to use while giving great visibility into how we call OpenAI. My goal isn't to hide the API, but to make it easier to use and show you how to leverage it via the [docs](https://jxnl.github.io/instructor).
## Usage
### Installation
```py hl_lines="5 13"
from openai import OpenAI
import instructor
# Enables `response_model`
client = instructor.patch(OpenAI())
class UserDetail(BaseModel):
name: str
age: int
user = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
{"role": "user", "content": "Extract Jason is 25 years old"},
]
)
assert isinstance(user, UserDetail)
assert user.name == "Jason"
assert user.age == 25
```
!!! note "Using `openai<1.0.0`"
If you're using `openai<1.0.0` then make sure you `pip install "instructor<0.3.0"`
where you can patch a global client like so:
```python hl_lines="4 8"
import openai
import instructor
instructor.patch()
user = openai.ChatCompletion.create(
...,
response_model=UserDetail,
)
```
## Installation
To get started you need to install it using `pip`. Run the following command in your terminal:
@@ -21,27 +70,31 @@ To get started you need to install it using `pip`. Run the following command in
$ pip install instructor
```
## Quick Start with Patching ChatCompletion
## Quick Start
To simplify your work with OpenAI models and streamline the extraction of Pydantic objects from prompts, we offer a patching mechanism for the `ChatCompletion`` class. Here's a step-by-step guide:
This patch introduces 2 features to the `ChatCompletion` class:
To simplify your work with OpenAI we offer a patching mechanism for the `ChatCompletion` class.
The patch introduces 3 features to the `ChatCompletion` class:
1. The `response_model` parameter, which allows you to specify a Pydantic model to extract data into.
2. The `max_retries` parameter, which allows you to specify the number of times to retry the request if it fails.
3. The `validation_context` parameter, which allows you to specify a context object that validators have access to.
note: to learn more about validators checkout our [blog](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)
!!! note "Using Validators"
### Step 1: Import and Patch the Module
To learn more about validators, check out our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)
### Step 1: Patch the client
First, import the required libraries and apply the patch function to the OpenAI module. This exposes new functionality with the response_model parameter.
```python
import openai
import instructor
from openai import OpenAI
from pydantic import BaseModel
instructor.patch()
# This enables response_model keyword
# from client.chat.completions.create
client = instructor.patch(OpenAI())
```
### Step 2: Define the Pydantic Model
@@ -56,32 +109,27 @@ class UserDetail(BaseModel):
age: int
```
### Step 3: Extract Data with ChatCompletion
### Step 3: Extract
Use the openai.ChatCompletion.create method to send a prompt and extract the data into the Pydantic object. The response_model parameter specifies the Pydantic model to use for extraction.
Use the `client.chat.completions.create` method to send a prompt and extract the data into the Pydantic object. The `response_model` parameter specifies the Pydantic model to use for extraction. It's helpful to annotate the variable with the type of the response model, which will help your IDE provide autocomplete and spell check.
```python
user: UserDetail = openai.ChatCompletion.create(
user: UserDetail = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
{"role": "user", "content": "Extract Jason is 25 years old"},
]
)
```
### Step 4: Validate the Extracted Data
You can then validate the extracted data by asserting the expected values. By adding the type things you also get a bunch of nice benefits with your IDE like spell check and auto complete!
```python
assert user.name == "Jason"
assert user.age == 25
```
### LLM-Based Validation
## Pydantic Validation
LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
Validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
```python hl_lines="9 15"
from pydantic import BaseModel, ValidationError, BeforeValidator
@@ -112,16 +160,18 @@ answer
Assertion failed, The statement is objectionable. (type=assertion_error)
```
## Using the Client with Retries
## Reask on validation error
Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
```python
from openai import OpenAI
import instructor
from pydantic import BaseModel, field_validator
# Apply the patch to the OpenAI client
instructor.patch()
client = instructor.patch(OpenAI())
class UserDetails(BaseModel):
name: str
@@ -134,7 +184,7 @@ class UserDetails(BaseModel):
raise ValueError("Name must be in uppercase.")
return v
model = openai.ChatCompletion.create(
model = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetails,
max_retries=2,
@@ -146,20 +196,6 @@ model = openai.ChatCompletion.create(
assert model.name == "JASON"
```
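The reask behaviour described in this section can be pictured as a plain retry loop. The sketch below is a standard-library-only illustration and not Instructor's actual implementation: `fake_llm`, `validate_name`, and `extract_with_retries` are hypothetical names, the "model" is a stub that only corrects itself once it sees the error message, and treating `max_retries=2` as "one initial attempt plus up to two retries" is an assumption.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def validate_name(name: str) -> str:
    """Mirror the uppercase rule used by the validator above."""
    if name.upper() != name:
        raise ValueError("Name must be in uppercase.")
    return name


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stand-in for a model call.

    It only 'fixes' its answer after being told about the error.
    """
    saw_error = any("Name must be in uppercase" in m["content"] for m in messages)
    return "JASON" if saw_error else "Jason"


def extract_with_retries(
    llm: Callable[[List[Message]], str],
    messages: List[Message],
    max_retries: int = 2,
) -> str:
    """Call the model, validate the result, and re-ask with the error on failure."""
    history = list(messages)
    for attempt in range(max_retries + 1):
        answer = llm(history)
        try:
            return validate_name(answer)
        except ValueError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the validation error
            # Feed the failed answer and the validation error back to the model.
            history.append({"role": "assistant", "content": answer})
            history.append({"role": "user", "content": f"Please fix: {err}"})
    raise RuntimeError("unreachable")


name = extract_with_retries(
    fake_llm, [{"role": "user", "content": "Extract jason is 25 years old"}]
)
```

With the stub above, the first answer fails validation and the second one passes, so `name` ends up uppercase. Setting `max_retries=0` in this sketch would raise the `ValueError` immediately.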
## IDE Support
Everything is designed for you to get the best developer experience possible, with the best editor support.
Including **autocompletion**:
![autocomplete](docs/img/ide_support.png)
And even **inline errors**
![errors](docs/img/error2.png)
To see more examples of how we can create interesting models check out some [examples.](https://jxnl.github.io/instructor/examples/)
## License
This project is licensed under the terms of the MIT License.
+5 -5
@@ -30,10 +30,10 @@ Instructor uses Pydantic to simplify the interaction between the programmer and
```python
import pydantic
import instructor
import openai
from openai import OpenAI
# Enables the response_model
instructor.patch()
client = instructor.patch(OpenAI())
class UserDetail(pydantic.BaseModel):
name: str
@@ -42,7 +42,7 @@ class UserDetail(pydantic.BaseModel):
def introduce(self):
return f"Hello I'm {self.name} and I'm {self.age} years old"
user: UserDetail = openai.ChatCompletion.create(
user: UserDetail = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
@@ -60,7 +60,7 @@ from typing_extensions import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator, patch
import openai
from openai import OpenAI
class QuestionAnswerNoEvil(BaseModel):
question: str
@@ -161,7 +161,7 @@ async def get_user(user_id: int) -> UserDetails:
```python
def extract_user(str) -> UserDetails:
return openai.ChatCompletion(
return client.chat.completions.create(
response_model=UserDetails,
messages=[...]
)
+6 -6
@@ -93,12 +93,12 @@ Note how we model a rewritten query, range of published dates, and a list of dom
```python
import instructor
import openai
from openai import OpenAI
# Enables response_model in the openai client
instructor.patch()
client = instructor.patch(OpenAI())
query = openai.ChatCompletion.create(
query = client.chat.completions.create(
model="gpt-4",
response_model=MetaphorQuery,
messages=[
@@ -175,12 +175,12 @@ Now we can call this with a simple query like "What do I have today?" and it wil
```python
import instructor
import openai
from openai import OpenAI
# Enables response_model in the openai client
instructor.patch()
client = instructor.patch(OpenAI())
retrival = openai.ChatCompletion.create(
retrival = client.chat.completions.create(
model="gpt-4",
response_model=Retrival,
messages=[
+19 -19
@@ -31,20 +31,20 @@ def validation_function(value):
`Instructor` helps to ensure you get the exact response type you're looking for when using openai's function call api. Once you've defined the `Pydantic` model for your desired response, `Instructor` handles all the complicated logic in-between - from the parsing/validation of the response to the automatic retries for invalid responses. This means that we can build in validators 'for free' and have a clear separation of concerns between the prompt and the code that calls openai.
```python
import openai
from openai import OpenAI
import instructor # pip install instructor
from pydantic import BaseModel
# This enables response_model keyword
# from openai.ChatCompletion.create
instructor.patch() # (1)!
# from client.chat.completions.create
client = instructor.patch(OpenAI()) # (1)!
class UserDetail(BaseModel):
name: str
age: int
user: UserDetail = openai.ChatCompletion.create(
user: UserDetail = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
@@ -210,14 +210,14 @@ Using this structure, we can implement the same logic as before and utilize `Ins
```python
import instructor
import openai
from openai import OpenAI
# Enables `response_model` and `max_retries` parameters
instructor.patch()
client = instructor.patch(OpenAI())
def validator(v):
statement = "don't say objectionable things"
resp = openai.ChatCompletion.create(
resp = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
@@ -229,7 +229,7 @@ def validator(v):
"content": f"Does `{v}` follow the rules: {statement}",
},
],
# this comes from instructor.patch()
# this comes from client = instructor.patch(OpenAI())
response_model=Validation, # (1)!
)
if not resp.is_valid:
@@ -237,7 +237,7 @@ def validator(v):
return v
```
1. The new parameter of `response_model` comes from `instructor.patch()` and does not exist in the original OpenAI SDK. This
1. The new parameter of `response_model` comes from `client = instructor.patch(OpenAI())` and does not exist in the original OpenAI SDK. This
allows us to pass in the `Pydantic` model that we want as a response.
Now we can use this validator in the same way we used the `llm_validator` from `Instructor`.
@@ -259,7 +259,7 @@ We can utilise `Pydantic` and `Instructor` to perform a validation to check of t
def validate_chain_of_thought(values):
chain_of_thought = values["chain_of_thought"]
answer = values["answer"]
resp = openai.ChatCompletion.create(
resp = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
@@ -271,7 +271,7 @@ def validate_chain_of_thought(values):
"content": f"Verify that `{answer}` follows the chain of thought: {chain_of_thought}",
},
],
# this comes from instructor.patch()
# this comes from client = instructor.patch(OpenAI())
response_model=Validation,
)
if not resp.is_valid:
@@ -366,19 +366,19 @@ Value error, Citation `Jason is cool` not found in text chunks [type=value_error
For further information visit https://errors.pydantic.dev/2.4/v/value_error
```
## Putting it all together with `instructor.patch()`
## Putting it all together with `client = instructor.patch(OpenAI())`
To pass this context from the `openai.ChatCompletion.create` call, `instructor.patch()` also passes the `validation_context`, which will be accessible from the `info` argument in the decorated validator functions.
To pass this context from the `client.chat.completions.create` call, `client = instructor.patch(OpenAI())` also passes the `validation_context`, which will be accessible from the `info` argument in the decorated validator functions.
```python
import openai
from openai import OpenAI
import instructor
# Enables `response_model` and `max_retries` parameters
instructor.patch()
client = instructor.patch(OpenAI())
def answer_question(question:str, text_chunk: str) -> AnswerWithCitation:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
@@ -393,7 +393,7 @@ def answer_question(question:str, text_chunk: str) -> AnswerWithCitation:
## Error Handling and Re-Asking
Validators can ensure certain properties of the outputs by throwing errors, in an AI system we can use the errors and allow language model to self correct. The by running `instructor.patch()` not only do we add `response_model` and `validation_context` it also allows you to use the `max_retries` parameter to specify the number of times to try and self correct.
Validators can ensure certain properties of the outputs by throwing errors. In an AI system, we can use these errors to let the language model self-correct. Running `client = instructor.patch(OpenAI())` not only adds `response_model` and `validation_context`, it also lets you use the `max_retries` parameter to specify the number of times to try to self-correct.
This approach provides a layer of defense against two types of bad outputs:
@@ -422,12 +422,12 @@ class UserModel(BaseModel):
This is where the `max_retries` parameter comes in. It allows the model to self correct and retry the prompt using the error message rather than the prompt.
```python
model = openai.ChatCompletion.create(
model = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": "Extract jason is 25 years old"},
],
# Powered by instructor.patch()
# Powered by client = instructor.patch(OpenAI())
response_model=UserModel,
max_retries=2,
)
-21
@@ -1,21 +0,0 @@
# Using the Prompt Pipeline
To use the Prompt Pipeline in OpenAI Function Call, you need to instantiate a `ChatCompletion` object and build the API call by piping messages and functions to it.
## The ChatCompletion Object
The `ChatCompletion` object is the starting point for constructing your API call. It provides the necessary methods and attributes to define the conversation flow and include function calls.
::: instructor.dsl.completion
## Messages Types
The basis of a message is defined as a `dataclass`. However, we provide helper functions and classes that provide additional functionality in the form of templates.
::: instructor.dsl.messages.base
## Helper Messages / Templates
::: instructor.dsl.messages.messages
::: instructor.dsl.messages.user
+7 -6
@@ -4,7 +4,6 @@
If you want to see the full example checkout [examples/distillation](https://github.com/jxnl/instructor/tree/main/examples/distilations)
## The Challenges in Function-Level Fine-Tuning
Replicating the behavior of a Python function in a language model involves intricate data preparation. For instance, teaching a model to execute three-digit multiplication is not as trivial as implementing `def f(a, b): return a * b`. OpenAI's fine-tuning script coupled with their function calling utility provides a structured output, thereby simplifying the data collection process. Additionally, this eliminates the need for passing the schema to the model, thus conserving tokens.
@@ -14,6 +13,7 @@ Replicating the behavior of a Python function in a language model involves intri
By using `Instructions`, you can annotate a Python function that returns a Pydantic object, thereby automating the dataset creation for fine-tuning. A handler for logging is all that's needed to build this dataset.
## How to Implement `Instructions` in Your Code
## Quick Start: How to Use Instructor's Distillation Feature
Before we dig into the nitty-gritty, let's look at how easy it is to use Instructor's distillation feature to use function calling finetuning to export the data to a JSONL file.
@@ -32,7 +32,7 @@ instructions = Instructions(
finetune_format="messages",
# log handler is used to save the data to a file
# you can imagine saving it to a database or other storage
# based on your needs!
# based on your needs!
log_handlers=[logging.FileHandler("math_finetunes.jsonl")]
)
@@ -72,6 +72,7 @@ The library offers two main benefits:
The `from instructor import Instructions` feature is a time saver. It auto-generates a fine-tuning dataset, making it a breeze to imitate a function's behavior.
## Logging Output and Running a Finetune
Here's how the logging output would look:
```python
@@ -79,10 +80,10 @@ Here's how the logging output would look:
"messages": [
{"role": "system", "content": 'Predict the results of this function: ...'},
{"role": "user", "content": 'Return fn(133, b=539)'},
{"role": "assistant",
"function_call":
{"role": "assistant",
"function_call":
{
"name": "Multiply",
"name": "Multiply",
"arguments": '{"a":133,"b":539,"result":89509}'
}
}
@@ -115,4 +116,4 @@ def fn(a: int, b: int) -> Multiply:
return Multiply(a=a, b=b, result=resp)
```
With this, you can swap the function implementation, making it backward compatible. You can even imagine using the different models for different tasks or validating and running evals by using the original function and comparing it to the distillation.
With this, you can swap the function implementation, making it backward compatible. You can even imagine using the different models for different tasks or validating and running evals by using the original function and comparing it to the distillation.
+17 -23
@@ -3,7 +3,8 @@
In this guide, we'll walk through how to extract action items from meeting transcripts using OpenAI's API and Pydantic. This use case is essential for automating project management tasks, such as task assignment and priority setting.
!!! tips "Motivation"
In the corporate world, a considerable amount of time is spent in meetings, and action items are often the actionable output of these discussions. Automating the extraction of action items can be a time-saver and ensures that nothing crucial is missed.
A significant amount of time is spent in meetings, where action items are the actionable outcomes of the discussion. Automating the extraction of action items can save time and help ensure that no critical tasks are overlooked.
## Defining the Structures
@@ -45,11 +46,16 @@ class ActionItems(BaseModel):
To extract action items from a meeting transcript, we use the **`generate`** function. It calls OpenAI's API, processes the text, and returns a set of action items modeled as **`ActionItems`**.
```python
import openai
import instructor
from openai import OpenAI
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def generate(data: str) -> ActionItems:
return openai.ChatCompletion.create(
model="gpt-3.5-turbo-0613",
return client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=ActionItems,
messages=[
{
@@ -62,7 +68,6 @@ def generate(data: str) -> ActionItems:
},
],
) # type: ignore
```
## Evaluation and Testing
@@ -98,7 +103,7 @@ Alice: Sounds like a plan. Let's get these tasks modeled out and get started."""
)
```
## Visualizing the tasks
## Visualizing the tasks
In order to quickly visualize the data we used code interpreter to create a graphviz export of the json version of the ActionItems array.
@@ -112,10 +117,7 @@ In order to quickly visualize the data we used code interpreter to create a grap
"name": "Improve Authentication System",
"description": "Revamp the front-end and optimize the back-end of the authentication system",
"priority": "High",
"assignees": [
"Bob",
"Carol"
],
"assignees": ["Bob", "Carol"],
"subtasks": [
{
"id": 2,
@@ -133,26 +135,18 @@ In order to quickly visualize the data we used code interpreter to create a grap
"name": "Integrate Authentication System with Billing System",
"description": "Integrate the improved authentication system with the new billing system",
"priority": "Medium",
"assignees": [
"Bob"
],
"assignees": ["Bob"],
"subtasks": [],
"dependencies": [
1
]
"dependencies": [1]
},
{
"id": 5,
"name": "Update User Documentation",
"description": "Update the user documentation to reflect the changes in the authentication system",
"priority": "Low",
"assignees": [
"Carol"
],
"assignees": ["Carol"],
"subtasks": [],
"dependencies": [
2
]
"dependencies": [2]
}
]
}
@@ -160,4 +154,4 @@ In order to quickly visualize the data we used code interpreter to create a grap
In this example, the **`generate`** function successfully identifies and segments the action items, assigning them priorities, assignees, subtasks, and dependencies as discussed in the meeting.
By automating this process, you can ensure that important tasks and details are not lost in the sea of meeting minutes, making project management more efficient and effective.
By automating this process, you can ensure that important tasks and details are not lost in the sea of meeting minutes, making project management more efficient and effective.
+19 -15
@@ -3,6 +3,7 @@
In this example, we'll demonstrate how to convert a text into dataframes using OpenAI Function Call. We will define the necessary data structures using Pydantic and show how to convert the text into dataframes.
!!! note "Motivation"
Often times when we parse data we have an opportunity to extract structured data, what if we could extract an arbitrary number of tables with arbitrary schemas? By pulling out dataframes we could write tables or .csv files and attach them to our retrieved data.
## Defining the Data Structures
@@ -10,19 +11,18 @@ In this example, we'll demonstrate how to convert a text into dataframes using O
Let's start by defining the data structures required for this task: `RowData`, `Dataframe`, and `Database`.
```python
from instructor import OpenAISchema
from pydantic import Field
from pydantic import Field, BaseModel
from typing import List, Any
class RowData(OpenAISchema):
class RowData(BaseModel):
row: List[Any] = Field(..., description="The values for each row")
citation: str = Field(
..., description="The citation for this row from the original source data"
)
class Dataframe(OpenAISchema):
class Dataframe(BaseModel):
"""
Class representing a dataframe. This class is used to convert
data into a frame that can be used by pandas.
@@ -47,7 +47,7 @@ class Dataframe(OpenAISchema):
return pd.DataFrame(data=data, columns=columns)
class Database(OpenAISchema):
class Database(BaseModel):
"""
A set of correct named and defined tables as dataframes
"""
@@ -69,14 +69,19 @@ The `Database` class represents a set of tables in a database. It contains a lis
To convert a text into dataframes, we'll use the Prompt Pipeline in OpenAI Function Call. We can define a function `dataframe` that takes a text as input and returns a `Database` object.
```python
import openai
import instructor
from openai import OpenAI
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def dataframe(data: str) -> Database:
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-4-0613",
temperature=0.1,
functions=[Database.openai_schema],
function_call={"name": Database.openai_schema["name"]},
response_model=Database,
messages=[
{
"role": "system",
@@ -90,7 +95,6 @@ def dataframe(data: str) -> Database:
],
max_tokens=1000,
)
return Database.from_response(completion)
```
The `dataframe` function takes a string `data` as input and creates a completion using the Prompt Pipeline. It prompts the model to map the data into a dataframe and define the correct columns and rows. The resulting completion is then converted into a `Database` object.
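Once the tables come back, each one is essentially a name, a list of column names, and a list of rows, which is enough to write a .csv file. The sketch below is a standard-library illustration under that assumption: `Table` and `to_csv` are hypothetical names standing in for the `Dataframe` model, and the rows are made-up sample values rather than real model output.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Table:
    """Hypothetical plain-Python analogue of the `Dataframe` model."""

    name: str
    columns: List[str]
    rows: List[List[Any]]

    def to_csv(self) -> str:
        """Serialize the header and rows as CSV text."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(self.columns)
        writer.writerows(self.rows)
        return buffer.getvalue()


# Sample values only; a real run would take these from the extracted tables.
people = Table(
    name="People",
    columns=["Name", "Age", "City"],
    rows=[["John", 25, "New York"], ["Mike", 30, "San Francisco"]],
)
csv_text = people.to_csv()
```

Keeping the table as columns plus rows makes it easy to hand the same data to pandas or to write it to disk, whichever the surrounding pipeline needs.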
@@ -100,12 +104,12 @@ The `dataframe` function takes a string `data` as input and creates a completion
Let's evaluate the example by converting a text into dataframes using the `dataframe` function and print the resulting dataframes.
```python
dfs = dataframe("""My name is John and I am 25 years old. I live in
New York and I like to play basketball. His name is
Mike and he is 30 years old. He lives in San Francisco
and he likes to play baseball. Sarah is 20 years old
dfs = dataframe("""My name is John and I am 25 years old. I live in
New York and I like to play basketball. His name is
Mike and he is 30 years old. He lives in San Francisco
and he likes to play baseball. Sarah is 20 years old
and she lives in Los Angeles. She likes to play tennis.
Her name is Mary and she is 35 years old.
Her name is Mary and she is 35 years old.
She lives in Chicago.
On one team 'Tigers' the captain is John and there are 12 players.
+7 -7
@@ -34,16 +34,16 @@ class SinglePrediction(BaseModel):
The function **`classify`** will perform the single-label classification.
```python
import openai
from openai import OpenAI
import instructor
# Patch the OpenAI API to use the `ChatCompletion`
# endpoint with `response_model` enabled.
instructor.patch()
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def classify(data: str) -> SinglePrediction:
"""Perform single-label classification on the input text."""
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=SinglePrediction,
messages=[
@@ -95,7 +95,7 @@ The function **`multi_classify`** is responsible for multi-label classification.
```python
def multi_classify(data: str) -> MultiClassPrediction:
"""Perform multi-label classification on the input text."""
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=MultiClassPrediction,
messages=[
@@ -118,4 +118,4 @@ ticket = "My account is locked and I can't access my billing info."
prediction = multi_classify(ticket)
assert MultiLabels.TECH_ISSUE in prediction.class_labels
assert MultiLabels.BILLING in prediction.class_labels
```
```
+11 -11
@@ -3,7 +3,7 @@
In this guide, we demonstrate how to extract and resolve entities from a sample legal contract. Then, we visualize these entities and their dependencies as an entity graph. This approach can be invaluable for legal tech applications, aiding in the understanding of complex documents.
!!! tips "Motivation"
Legal contracts are full of intricate details and interconnected clauses. Automatically extracting and visualizing these elements can make it easier to understand the document's overall structure and terms.
Legal contracts are full of intricate details and interconnected clauses. Automatically extracting and visualizing these elements can make it easier to understand the document's overall structure and terms.
## Defining the Data Structures
@@ -51,15 +51,15 @@ class DocumentExtraction(BaseModel):
The **`ask_ai`** function utilizes OpenAI's API to extract and resolve entities from the input content.
```python
import openai
import instructor
from openai import OpenAI
# Adds response_model to ChatCompletion
# Allows the return of Pydantic model rather than raw JSON
instructor.patch()
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def ask_ai(content) -> DocumentExtraction:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-4",
response_model=DocumentExtraction,
messages=[
@@ -89,15 +89,15 @@ def generate_html_label(entity: Entity) -> str:
def generate_graph(data: DocumentExtraction):
dot = Digraph(comment="Entity Graph", node_attr={"shape": "plaintext"})
for entity in data.entities:
label = generate_html_label(entity)
dot.node(str(entity.id), label)
for entity in data.entities:
for dep_id in entity.dependencies:
dot.edge(str(entity.id), str(dep_id))
dot.render("entity.gv", view=True)
```
@@ -134,6 +134,6 @@ model = ask_ai(content)
generate_graph(model)
```
This will produce a graphical representation of the entities and their dependencies, stored as "entity.gv".
This will produce a graphical representation of the entities and their dependencies, stored as "entity.gv".
![Entity Graph](entity_resolution.png)
![Entity Graph](entity_resolution.png)
+17 -12
@@ -26,7 +26,7 @@ from typing import List
class Fact(BaseModel):
fact: str = Field(...)
substring_quote: List[str] = Field(...)
@model_validator(mode="after")
def validate_sources(self, info: FieldValidationInfo) -> "Fact":
text_chunks = info.context.get("text_chunk", None)
@@ -55,10 +55,10 @@ This class encapsulates the question and its corresponding answer. It contains t
This method checks that each `Fact` object in the `answer` list has at least one valid source. If a `Fact` object has no valid sources, it is removed from the `answer` list.
```python hl_lines="5-8"
class QuestionAnswer(instructor.OpenAISchema):
class QuestionAnswer(BaseModel):
question: str = Field(...)
answer: List[Fact] = Field(...)
@model_validator(mode="after")
def validate_sources(self) -> "QuestionAnswer":
self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
@@ -74,32 +74,37 @@ This function takes a string `question` and a string `context` and returns a `Qu
To understand how validation context works in Pydantic, check out [pydantic's docs](https://docs.pydantic.dev/usage/validators/#model-validators)
```python hl_lines="5 6 14"
from openai import OpenAI
import instructor
# Apply the patch to the OpenAI client
# enables response_model, validation_context keyword
client = instructor.patch(OpenAI())
def ask_ai(question: str, context: str) -> QuestionAnswer:
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0,
functions=[QuestionAnswer.openai_schema],
function_call={"name": QuestionAnswer.openai_schema["name"]},
response_model=QuestionAnswer,
messages=[
{"role": "system", "content": "You are a world class algorithm to answer questions with correct and exact citations."},
{"role": "user", "content": f"{context}"},
{"role": "user", "content": f"Question: {question}"}
],
)
return QuestionAnswer.from_response(
completion, validation_context={"text_chunk": context}
validation_context={"text_chunk": context},
)
```
## Example
Here's an example of using these classes and functions to ask a question and validate the answer.
```python
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""
@@ -127,4 +132,4 @@ The output would be a `QuestionAnswer` object containing validated facts and the
}
```
This ensures that every piece of information in the answer has been validated against the context.
This ensures that every piece of information in the answer has been validated against the context.
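The check behind `substring_quote` can be sketched without any API call. The helper names below (`find_spans`, `keep_supported_quotes`) are hypothetical and not part of instructor; this is only a minimal sketch, assuming a quote counts as valid when it appears verbatim in the context.

```python
import re
from typing import List, Tuple


def find_spans(quote: str, context: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every verbatim occurrence of quote in context."""
    if not quote:
        return []
    return [match.span() for match in re.finditer(re.escape(quote), context)]


def keep_supported_quotes(quotes: List[str], context: str) -> List[str]:
    """Drop quotes that never occur verbatim in the context."""
    return [quote for quote in quotes if find_spans(quote, context)]


context = "I studied Computational Mathematics and physics."
quotes = ["Computational Mathematics and physics", "studied art history"]
supported = keep_supported_quotes(quotes, context)
```

A fact whose quotes are all dropped this way ends up with an empty `substring_quote` list, which is the condition the `QuestionAnswer` validator filters on.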
+88 -74
@@ -2,10 +2,9 @@
This example shows how to create a multi-file program from a specification using OpenAI function calling. We will define the necessary data structures using Pydantic and demonstrate how to convert a specification (prompt) into multiple files.
!!! note "Motivation"
Creating multiple file programs based on specifications is a challenging and rewarding skill that can help you build complex and scalable applications.
With OpenAI Function Call, you can leverage the power of language models to generate an entire codebase and code snippets that match your specifications.
Creating multiple file programs based on specifications is a challenging and rewarding skill that can help you build complex and scalable applications.
With OpenAI Function Call, you can leverage the power of language models to generate an entire codebase and code snippets that match your specifications.
## Defining the Data Structures
@@ -14,10 +13,10 @@ Let's start by defining the data structure of `File` and `Program`.
```python
from typing import List
from pydantic import Field
from instructor import OpenAISchema
from pydantic import BaseModel
class File(OpenAISchema):
class File(BaseModel):
"""
Correctly named file with contents.
"""
@@ -32,7 +31,7 @@ class File(OpenAISchema):
f.write(self.body)
class Program(OpenAISchema):
class Program(BaseModel):
"""
Set of files that represent a complete and correct program
"""
@@ -40,26 +39,31 @@ class Program(OpenAISchema):
files: List[File] = Field(..., description="List of files")
```
The `File` class represents a single file or script, and it contains a `name` attribute and `body` for the text content of the file.
The `File` class represents a single file or script, and it contains a `name` attribute and `body` for the text content of the file.
Notice that we added the `save` method to the `File` class. This method writes the body of the file to disk, using the name as the path.
The `Program` class represents a collection of files that form a complete and correct program.
The `Program` class represents a collection of files that form a complete and correct program.
It contains a list of `File` objects in the `files` attribute.
## Calling Completions
To create the files, we will use the base `openai` API.
To create the files, we will use the base `openai` API.
We can define a function that takes in a string and returns a `Program` object.
```python
import openai
import instructor
from openai import OpenAI
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def develop(data: str) -> Program:
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.1,
functions=[Program.openai_schema],
function_call={"name": Program.openai_schema["name"]},
response_model=Program,
messages=[
{
"role": "system",
@@ -72,7 +76,6 @@ def develop(data: str) -> Program:
],
max_tokens=1000,
)
return Program.from_response(completion)
```
## Evaluating an Example
@@ -82,10 +85,10 @@ Let's evaluate the example by specifying the program to create and print the res
```python
program = develop(
"""
Create a fastapi app with a readme.md file and a main.py file with
some basic math functions. the datamodels should use pydantic and
the main.py should use fastapi. the readme.md should have a title
and a description. The readme should contain some helpful information
Create a fastapi app with a readme.md file and a main.py file with
some basic math functions. the datamodels should use pydantic and
the main.py should use fastapi. the readme.md should have a title
and a description. The readme should contain some helpful information
and a curl example"""
)
@@ -97,29 +100,31 @@ for file in program.files:
```
The output will be:
```markdown
````markdown
# readme.md
-
# FastAPI App
This is a FastAPI app that provides some basic math functions.
- # FastAPI App
## Usage
This is a FastAPI app that provides some basic math functions.
To use this app, follow the instructions below:
## Usage
1. Install the required dependencies by running `pip install -r requirements.txt`.
2. Start the app by running `uvicorn main:app --reload`.
3. Open your browser and navigate to `http://localhost:8000/docs` to access the Swagger UI documentation.
To use this app, follow the instructions below:
## Example
1. Install the required dependencies by running `pip install -r requirements.txt`.
2. Start the app by running `uvicorn main:app --reload`.
3. Open your browser and navigate to `http://localhost:8000/docs` to access the Swagger UI documentation.
You can use the following curl command to test the `/add` endpoint:
## Example
You can use the following curl command to test the `/add` endpoint:
```bash
$ curl -X POST -H "Content-Type: application/json" -d '{"a": 2, "b": 3}' http://localhost:8000/add
```
````
```bash
$ curl -X POST -H "Content-Type: application/json" -d '{"a": 2, "b": 3}' http://localhost:8000/add
```
```
```python
# main.py
-
@@ -155,12 +160,13 @@ The output will be:
return {'error': 'Cannot divide by zero'}
return {'result': numbers.a / numbers.b}
```
```markdown
# requirements.txt
-
fastapi
uvicorn
pydantic
- fastapi
uvicorn
pydantic
```
## Add Refactoring Capabilities
@@ -173,9 +179,9 @@ This will be our definition for a change in our code base:
```python
from pydantic import Field
from instructor import OpenAISchema
from pydantic import BaseModel
class Diff(OpenAISchema):
class Diff(BaseModel):
"""
Changes that must be correctly made in a program's code repository defined as a
complete diff (Unified Format) file which will be used to `patch` the repository.
@@ -226,26 +232,24 @@ class Diff(OpenAISchema):
)
```
The `Diff` class represents a *diff* file: a set of changes that can be applied to our program using a tool like `patch` or Git.
The `Diff` class represents a _diff_ file: a set of changes that can be applied to our program using a tool like `patch` or Git.
## Calling Refactor Completions
We'll define a function that will pass the program and the new specifications to the OpenAI API:
```python
import openai
from generate import Program
def refactor(new_requirements: str, program: Program) -> Diff:
program_description = "\n".join(
[f"{code.file_name}\n[[[\n{code.body}\n]]]\n" for code in program.files]
)
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
# model="gpt-3.5-turbo-0613",
model="gpt-4",
temperature=0,
functions=[Diff.openai_schema],
function_call={"name": Diff.openai_schema["name"]},
response_model=Diff,
messages=[
{
"role": "system",
@@ -268,7 +272,6 @@ def refactor(new_requirements: str, program: Program) -> Diff:
],
max_tokens=1000,
)
return Diff.from_response(completion)
```
Notice that we're using `gpt-4` here, which is more powerful but also more expensive.
@@ -287,7 +290,7 @@ print(changes.diff)
The output will be this:
```diff
````diff
--- readme.md
+++ readme.md
@@ -1,9 +1,9 @@
@@ -312,7 +315,7 @@ The output will be this:
```bash
-curl -X POST -H "Content-Type: application/json" -d '{"operation": "add", "operands": [2, 3]}' http://localhost:8000/calculate
+curl -X POST -H "Content-Type: application/json" -d '{"operation": "add", "operands": [2, 3]}' http://localhost:5000/calculate
```
````
--- main.py
+++ main.py
@@ -322,46 +325,54 @@ The output will be this:
+from flask import Flask, request, jsonify
-app = FastAPI()
+app = Flask(__name__)
+app = Flask(**name**)
-class Operation(BaseModel):
- operation: str
- operands: list
+@app.route('/calculate', methods=['POST'])
+def calculate():
+ data = request.get_json()
+ operation = data.get('operation')
+ operands = data.get('operands')
- operation: str
- operands: list
+@app.route('/calculate', methods=['POST'])
+def calculate():
* data = request.get_json()
* operation = data.get('operation')
* operands = data.get('operands')
-@app.post('/calculate')
-async def calculate(operation: Operation):
- if operation.operation == 'add':
- if operation.operation == 'add':
- result = sum(operation.operands)
- elif operation.operation == 'subtract':
- elif operation.operation == 'subtract':
- result = operation.operands[0] - sum(operation.operands[1:])
- elif operation.operation == 'multiply':
+ if operation == 'add':
+ result = sum(operands)
+ elif operation == 'subtract':
+ result = operands[0] - sum(operands[1:])
+ elif operation == 'multiply':
result = 1
- elif operation.operation == 'multiply':
* if operation == 'add':
* result = sum(operands)
* elif operation == 'subtract':
* result = operands[0] - sum(operands[1:])
* elif operation == 'multiply':
result = 1
- for operand in operation.operands:
+ for operand in operands:
* for operand in operands:
result *= operand
- elif operation.operation == 'divide':
- elif operation.operation == 'divide':
- result = operation.operands[0]
- for operand in operation.operands[1:]:
+ elif operation == 'divide':
+ result = operands[0]
+ for operand in operands[1:]:
* elif operation == 'divide':
* result = operands[0]
* for operand in operands[1:]:
result /= operand
else:
result = None
- return {'result': result}
+ return jsonify({'result': result})
else:
result = None
- return {'result': result}
* return jsonify({'result': result})
--- requirements.txt
+++ requirements.txt
@@ -371,4 +382,7 @@ The output will be this:
-pydantic
+flask
+flask-cors
```
```
+5 -4
@@ -3,6 +3,7 @@
In this guide, you'll discover how to visualize a detailed knowledge graph for understanding complex topics, in this case, quantum mechanics. We leverage OpenAI's API and the Graphviz library to bring structure to intricate subjects.
!!! tips "Motivation"
Knowledge graphs offer a visually appealing and coherent way to understand complicated topics like quantum mechanics. By generating these graphs automatically, you can accelerate the learning process and make it easier to digest complex information.
## Defining the Structures
@@ -34,15 +35,15 @@ class KnowledgeGraph(BaseModel):
The **`generate_graph`** function leverages OpenAI's API to generate a knowledge graph based on the input query.
```python
import openai
from openai import OpenAI
import instructor
# Adds response_model to ChatCompletion
# Allows the return of Pydantic model rather than raw JSON
instructor.patch()
client = instructor.patch(OpenAI())
def generate_graph(input) -> KnowledgeGraph:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
@@ -89,4 +90,4 @@ visualize_knowledge_graph(graph)
This will produce a visual representation of the knowledge graph, stored as "knowledge_graph.gv". You can open this file to explore the key concepts and their relationships in quantum mechanics.
By leveraging automated knowledge graphs, you can dissect complex topics into digestible pieces, making the learning journey less daunting and more effective.
By leveraging automated knowledge graphs, you can dissect complex topics into digestible pieces, making the learning journey less daunting and more effective.
+4 -4
@@ -39,17 +39,17 @@ class PIIDataExtraction(BaseModel):
The OpenAI API is utilized to extract PII information from a given document.
```python
import openai
from openai import OpenAI
import instructor
instructor.patch()
client = instructor.patch(OpenAI())
EXAMPLE_DOCUMENT = """
# Fake Document with PII for Testing PII Scrubbing Model
# (The content here)
"""
pii_data: PIIDataExtraction = openai.ChatCompletion.create(
pii_data: PIIDataExtraction = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=PIIDataExtraction,
messages=[
@@ -123,4 +123,4 @@ John Doe was born on <date_0>. His social security number is <ssn_1>. He has bee
## Residence
John currently resides at <address_4>. He's been living there for about 5 years now.
```
```
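The placeholder format in the output above (`<date_0>`, `<ssn_1>`, `<address_4>`) suggests a simple substitution step. The sketch below is an assumption about how such scrubbing could work, not the library's code: `PrivateValue` and `scrub` are hypothetical names, and a dataclass stands in for the Pydantic model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrivateValue:
    """Stand-in for one extracted PII item (field names are assumptions)."""
    index: int
    data_type: str
    value: str


def scrub(text: str, items: List[PrivateValue]) -> str:
    """Replace each PII value with a placeholder such as <date_0>."""
    for item in items:
        text = text.replace(item.value, f"<{item.data_type}_{item.index}>")
    return text


items = [
    PrivateValue(0, "date", "01/02/1980"),
    PrivateValue(1, "ssn", "123-45-6789"),
]
scrubbed = scrub("Born on 01/02/1980, SSN 123-45-6789.", items)
```

Plain string replacement only catches values that the model extracted exactly as they appear in the document, so it is worth checking the scrubbed text for leftovers.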
+16 -14
@@ -3,8 +3,9 @@
This example demonstrates how to use the OpenAI Function Call ChatCompletion model to plan and execute a query plan in a question-answering system. By breaking down a complex question into smaller sub-questions with defined dependencies, the system can systematically gather the necessary information to answer the main question.
!!! tips "Motivation"
The goal of this example is to showcase how query planning can be used to handle complex questions, facilitate iterative information gathering, automate workflows, and optimize processes. By leveraging the OpenAI Function Call model, you can design and execute a structured plan to find answers effectively.
**Use Cases:**
* Complex question answering
@@ -23,7 +24,6 @@ import enum
from typing import List
from pydantic import Field
from instructor import OpenAISchema
class QueryType(str, enum.Enum):
@@ -33,7 +33,7 @@ class QueryType(str, enum.Enum):
MERGE_MULTIPLE_RESPONSES = "MERGE_MULTIPLE_RESPONSES"
class Query(OpenAISchema):
class Query(BaseModel):
"""Class representing a single question in a query plan."""
id: int = Field(..., description="Unique id of the query")
@@ -51,7 +51,7 @@ class Query(OpenAISchema):
)
class QueryPlan(OpenAISchema):
class QueryPlan(BaseModel):
"""Container class representing a tree of questions to ask a question answering system."""
query_graph: List[Query] = Field(
@@ -64,6 +64,7 @@ class QueryPlan(OpenAISchema):
```
!!! warning "Graph Generation"
Notice that this example produces a flat list of items whose dependencies resemble a graph. While pydantic allows for recursive definitions, it's much easier and less confusing for the model to generate flat schemas rather than recursive ones. If you want to see a recursive example, see [recursive schemas](recursive.md).
## Planning a Query Plan
@@ -72,8 +73,12 @@ Now, let's demonstrate how to plan and execute a query plan using the defined mo
```python
import asyncio
import instructor
from openai import OpenAI
import openai
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def query_planner(question: str) -> QueryPlan:
PLANNING_MODEL = "gpt-4-0613"
@@ -89,19 +94,16 @@ def query_planner(question: str) -> QueryPlan:
},
]
completion = openai.ChatCompletion.create(
root = client.chat.completions.create(
model=PLANNING_MODEL,
temperature=0,
functions=[QueryPlan.openai_schema],
function_call={"name": QueryPlan.openai_schema["name"]},
response_model=QueryPlan,
messages=messages,
max_tokens=1000,
)
root = QueryPlan.from_response(completion)
return root
```
```
plan = query_planner(
"What is the difference in populations of Canada and the Jason's home country?"
@@ -110,6 +112,7 @@ plan.dict()
```
!!! warning "No RAG"
While we build the query plan in this example, we do not propose a method to actually answer the question. You can implement your own answer function that perhaps makes a retrieval and calls openai for retrieval augmented generation. That step would also make use of function calls but goes beyond the scope of this example.
```python
@@ -128,19 +131,18 @@ plan.dict()
{'dependancies': [2, 3],
'id': 4,
'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>,
'question': 'Calculate the difference in populations between '
"Canada and Jason's home country"}]}
'question': "Calculate the difference in populations between Canada and Jason's home country"}]}
```
In the above code, we define a `query_planner` function that takes a question as input and generates a query plan using the OpenAI API.
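Once a flat plan like the one above exists, its dependency lists determine a safe execution order. The following is a minimal sketch using the standard library's `graphlib` (Python 3.9+); plain dicts stand in for `Query` objects, and the ids, questions, and the `dependencies` key are illustrative assumptions rather than real model output.

```python
from graphlib import TopologicalSorter

# Plain dicts stand in for `Query` objects; the contents are illustrative only.
query_graph = [
    {"id": 1, "question": "What is Jason's home country?", "dependencies": []},
    {"id": 2, "question": "What is the population of Canada?", "dependencies": []},
    {"id": 3, "question": "What is the population of Jason's home country?", "dependencies": [1]},
    {"id": 4, "question": "What is the difference between the two populations?", "dependencies": [2, 3]},
]

# Map each query id to the ids it must wait for, then order them.
sorter = TopologicalSorter({q["id"]: q["dependencies"] for q in query_graph})
execution_order = list(sorter.static_order())
```

Every query appears in `execution_order` after all of the queries it depends on, so answers to earlier sub-questions can be passed into later ones.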
## Conclusion
In this example, we demonstrated how to use the OpenAI Function Call `ChatCompletion` model to plan and execute a query plan using a question-answering system. We defined the necessary structures using Pydantic and created a query planner function.
In this example, we demonstrated how to use the OpenAI Function Call `ChatCompletion` model to plan and execute a query plan using a question-answering system. We defined the necessary structures using Pydantic and created a query planner function.
If you want to see multiple versions of this style of code, please visit:
1. [query planning example](https://github.com/jxnl/instructor/blob/main/examples/query_planner_execution/query_planner_execution.py)
2. [task planning with topo sort](https://github.com/jxnl/instructor/blob/main/examples/task_planner/task_planner_topological_sort.py)
Feel free to modify the code to fit your specific use case and explore other possibilities of using the OpenAI Function Call model to plan and execute complex workflows.
Feel free to modify the code to fit your specific use case and explore other possibilities of using the OpenAI Function Call model to plan and execute complex workflows.
+12 -12
@@ -7,20 +7,19 @@ In this example, we will demonstrate how define and use a recursive class defini
We will use Pydantic to define the necessary data structures representing the directory tree and its nodes. We have two classes, `Node` and `DirectoryTree`, which are used to model individual nodes and the entire directory tree, respectively.
!!! warning "Flat is better than nested"
While it's easier to model things as nested, returning flat items with dependencies tends to yield better results. For a flat example, check out [planning tasks](planning-tasks.md) where we model a query plan as a DAG.
While it's easier to model things as nested, returning flat items with dependencies tends to yield better results. For a flat example, check out [planning tasks](planning-tasks.md) where we model a query plan as a DAG.
```python
import enum
from typing import List
from pydantic import Field
from instructor import OpenAISchema
class NodeType(str, enum.Enum):
"""Enumeration representing the types of nodes in a filesystem."""
FILE = "file"
FOLDER = "folder"
class Node(OpenAISchema):
class Node(BaseModel):
"""
Class representing a single node in a filesystem. Can be either a file or a folder.
Note that a file cannot have children, but a folder can.
@@ -54,7 +53,7 @@ class Node(OpenAISchema):
else:
print(f"{parent_path}/{self.name}", self.node_type)
class DirectoryTree(OpenAISchema):
class DirectoryTree(BaseModel):
"""
Container class representing a directory tree.
@@ -83,7 +82,12 @@ The `DirectoryTree` class represents the entire directory tree. It has a single
We define a function `parse_tree_to_filesystem` to convert a string representing a directory tree into a filesystem structure using OpenAI.
```python
import openai
import instructor
from openai import OpenAI
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def parse_tree_to_filesystem(data: str) -> DirectoryTree:
"""
@@ -97,11 +101,9 @@ def parse_tree_to_filesystem(data: str) -> DirectoryTree:
DirectoryTree: The directory tree representing the filesystem.
"""
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.2,
functions=[DirectoryTree.openai_schema],
function_call={"name": DirectoryTree.openai_schema["name"]},
response_model=DirectoryTree,
messages=[
{
"role": "system",
@@ -114,8 +116,6 @@ def parse_tree_to_filesystem(data: str) -> DirectoryTree:
],
max_tokens=1000,
)
root = DirectoryTree.from_response(completion)
return root
```
@@ -160,4 +160,4 @@ root/folder2/subfolder1/file4.txt NodeType.FILE
This demonstrates how to use OpenAI's GPT-3 model to parse a string representing a directory tree and obtain the correct filesystem structure.
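The recursive structure can be explored without calling the API. The sketch below assumes a simplified stand-in for the `Node` model (`TreeNode` and `list_paths` are hypothetical names, and the node type is omitted) and shows the kind of depth-first walk that produces path listings like the one above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """Stand-in for the Pydantic `Node` model: a file or a folder."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def list_paths(node: TreeNode, parent: str = "") -> List[str]:
    """Return the path of node followed by the paths of all its descendants."""
    path = f"{parent}/{node.name}" if parent else node.name
    paths = [path]
    for child in node.children:
        paths.extend(list_paths(child, path))
    return paths


root = TreeNode(
    "root",
    [TreeNode("folder1", [TreeNode("file1.txt")]), TreeNode("file2.txt")],
)
paths = list_paths(root)
```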
I hope this example helps you understand how to leverage OpenAI Function Call for parsing recursive trees. If you have any further questions, feel free to ask!
I hope this example helps you understand how to leverage OpenAI Function Call for parsing recursive trees. If you have any further questions, feel free to ask!
+11 -10
@@ -6,7 +6,6 @@ In this example, we will demonstrate how to leverage the `MultiTask` and `enum.E
Extracting a list of tasks from text is a common use case for leveraging language models. This pattern can be applied to various applications, such as virtual assistants like Siri or Alexa, where understanding user intent and breaking down requests into actionable tasks is crucial. In this example, we will demonstrate how to use OpenAI Function Call to segment search queries and execute them in parallel.
## Defining the Structures
Let's model the problem as breaking down a search request into a list of search queries. We will use an enum to represent different types of searches and take advantage of Python objects to add additional query logic.
@@ -14,14 +13,13 @@ Let's model the problem as breaking down a search request into a list of search
```python
import enum
from pydantic import Field
from instructor import OpenAISchema
class SearchType(str, enum.Enum):
"""Enumeration representing the types of searches that can be performed."""
VIDEO = "video"
EMAIL = "email"
class Search(OpenAISchema):
class Search(BaseModel):
"""
Class representing a single search query.
"""
@@ -40,14 +38,14 @@ Next, let's define a class to represent multiple search queries.
```python
from typing import List
class MultiSearch(OpenAISchema):
class MultiSearch(BaseModel):
"Correctly segmented set of search results"
tasks: List[Search]
```
The `MultiSearch` class has a single attribute, `tasks`, which is a list of `Search` objects.
This pattern is so common that we've added a helper function `MultiTask` to make this simpler:
This pattern is so common that we've added a helper function `MultiTask` to make this simpler:
```python
from instructor.dsl import MultiTask
@@ -60,10 +58,15 @@ MultiSearch = MultiTask(Search)
To segment a search query, we will use the base openai api. We can define a function that takes a string and returns segmented search queries using the `MultiSearch` class.
```python hl_lines="7 8"
import openai
import instructor
from openai import OpenAI
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
def segment(data: str) -> MultiSearch:
completion = openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.1,
functions=[MultiSearch.openai_schema],
@@ -76,8 +79,6 @@ def segment(data: str) -> MultiSearch:
],
max_tokens=1000,
)
return MultiSearch.from_response(completion)
```
The `segment` function takes a string `data` and creates a completion. It prompts the model to segment the data into multiple search queries and returns the result as a `MultiSearch` object.
@@ -106,4 +107,4 @@ The output will be:
```
Searching for `Please send me the video from last week about the investment case study` with query `Please send me the video from last week about the investment case study` using `SearchType.VIDEO`
Searching for `also documents about your GDPR policy?` with query `also documents about your GDPR policy?` using `SearchType.EMAIL`
```
```
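Running the segmented searches in parallel can be sketched with `asyncio`. This is an illustration under stated assumptions: `SearchTask` is a hypothetical dataclass standing in for the `Search` model, and `execute` only formats a message where a real backend call would go.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class SearchTask:
    """Stand-in for the `Search` model; `kind` mirrors the search type."""
    title: str
    query: str
    kind: str

    async def execute(self) -> str:
        # Placeholder for a real backend call; sleep(0) only yields control.
        await asyncio.sleep(0)
        return f"Searching for `{self.title}` with query `{self.query}` using `{self.kind}`"


async def run_all(tasks: List[SearchTask]) -> List[str]:
    """Run every search concurrently; gather preserves the input order."""
    return list(await asyncio.gather(*(task.execute() for task in tasks)))


tasks = [
    SearchTask("investment video", "investment case study", "video"),
    SearchTask("GDPR policy", "GDPR policy documents", "email"),
]
results = asyncio.run(run_all(tasks))
```

Because `asyncio.gather` returns results in the order the tasks were passed in, each result can be matched back to its originating search.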
+11 -8
@@ -11,10 +11,6 @@ Import required modules and apply compatibility patches.
```python
from typing_extensions import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator, patch
import openai
patch()
```
## Defining Models
@@ -33,10 +29,17 @@ class QuestionAnswer(BaseModel):
Here we coerce the model to generate a response that is objectionable.
```python
from openai import OpenAI
import instructor
# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())
question = "What is the meaning of life?"
context = "According to the devil, the meaning of life is to live a life of sin and debauchery."
qa: QuestionAnswer = openai.ChatCompletion.create(
qa: QuestionAnswer = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswer,
messages=[
@@ -79,7 +82,7 @@ class QuestionAnswerNoEvil(BaseModel):
]
try:
qa: QuestionAnswerNoEvil = openai.ChatCompletion.create(
qa: QuestionAnswerNoEvil = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswerNoEvil,
messages=[
@@ -112,7 +115,7 @@ answer
By adding the `max_retries` parameter, we can retry the request and use the error message to correct the output.
```python
qa: QuestionAnswerNoEvil = openai.ChatCompletion.create(
qa: QuestionAnswerNoEvil = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswerNoEvil,
max_retries=1,
@@ -138,4 +141,4 @@ Now, we get a valid response that is not objectionable!
"question": "What is the meaning of life?",
"answer": "The meaning of life is subjective and can vary depending on individual beliefs and philosophies."
}
```
```
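The retry idea can be illustrated without a model in the loop. The sketch below is not instructor's implementation: `validate_answer`, `ask_with_retries`, and `fake_model` are hypothetical names, and a toy word check stands in for an LLM-backed validator. It only shows the general shape of feeding a validation error back into the next attempt.

```python
from typing import Callable, Optional


def validate_answer(answer: str) -> str:
    """Toy rule standing in for an LLM-backed validator."""
    if "sin" in answer.lower().split():
        raise ValueError("Answer contains objectionable content")
    return answer


def ask_with_retries(generate: Callable[[Optional[str]], str], max_retries: int = 1) -> str:
    """Call generate, and on a validation error call it again with the error text."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        candidate = generate(feedback)
        try:
            return validate_answer(candidate)
        except ValueError as err:
            feedback = str(err)  # passed to the next attempt as a correction hint
    raise ValueError(f"No valid answer after {max_retries} retries: {feedback}")


def fake_model(feedback: Optional[str]) -> str:
    # Hypothetical model stub: behaves better once it sees an error message.
    if feedback is None:
        return "The meaning of life is a life of sin and debauchery."
    return "The meaning of life is subjective."


answer = ask_with_retries(fake_model, max_retries=1)
```

With `max_retries=0` the first invalid answer is final and the error propagates to the caller, which matches the behaviour of the unretried request shown earlier.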
+149 -80
@@ -1,63 +1,101 @@
# Instructor (openai_function_call)
# Getting Started with Instructor
!!! note "Renaming from openai_function_call"
This library used to be called `openai_function_call`. Simply change the import and you should be good to go!
_Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control._
```sh
find /path/to/dir -type f -exec sed -i 's/openai_function_call/instructor/g' {} \;
Built to interact solely with openai's function calling api from python. It's designed to be intuitive, easy to use, and provide great visibility into your prompts.
---
[Star us on Github!](https://jxnl.github.io/instructor)
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Donate-yellow)](https://www.buymeacoffee.com/jxnlco)
[![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor)
[![GitHub stars](https://img.shields.io/github/stars/jxnl/instructor.svg)](https://github.com/jxnl/instructor/stargazers)
[![Documentation](https://img.shields.io/badge/docs-available-brightgreen)](https://jxnl.github.io/instructor)
[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco)
[![GitHub issues](https://img.shields.io/github/issues/jxnl/instructor.svg)](https://github.com/jxnl/instructor/issues)
[![GitHub license](https://img.shields.io/github/license/jxnl/instructor.svg)](https://github.com/jxnl/instructor/blob/main/LICENSE)
[![Github discussions](https://img.shields.io/github/discussions/jxnl/instructor)](https://github.com/jxnl/instructor/discussions)
[![PyPI version](https://img.shields.io/pypi/v/instructor.svg)](https://pypi.python.org/pypi/instructor)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/instructor.svg)](https://pypi.python.org/pypi/instructor)
---
## Usage
```py hl_lines="6 14"
from openai import OpenAI
from pydantic import BaseModel
import instructor
# Enables `response_model`
client = instructor.patch(OpenAI())
class UserDetail(BaseModel):
name: str
age: int
user = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetail,
messages=[
{"role": "user", "content": "Extract Jason is 25 years old"},
]
)
assert isinstance(user, UserDetail)
assert user.name == "Jason"
assert user.age == 25
```
!!! note "Using `openai<1.0.0`"
If you're using `openai<1.0.0` then make sure you `pip install "instructor<0.3.0"`
where you can patch a global client like so:
```python hl_lines="4 8"
import openai
import instructor
instructor.patch()
user = openai.ChatCompletion.create(
...,
response_model=UserDetail,
)
```
*Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control.*
## Installation
-----
This library is built to interact with openai's function call api from python code, with python structs / objects. It's designed to be intuitive and easy to use, while giving great visibility into how we call openai.
The approach of combining a human prompt and a "response schema" is not necessarily unique; however, it shows great promise. As we have been concentrating on translating user intent into structured data, we have discovered that Python with Pydantic is exceptionally well-suited for this task.
**OpenAISchema** is based on Python type annotations, and powered by Pydantic.
The key features are:
* **Intuitive to write**: Great support for editors, completions. Spend less time debugging.
* **Writing prompts as code**: Collocate docstrings and descriptions as part of your prompting.
* **Extensible**: Bring your own kitchen sink without being weighted down by abstractions.
## Structured Extraction with `openai`
Welcome to the Quick Start Guide for OpenAI Function Call. This guide will walk you through the installation process and provide examples demonstrating the usage of function calls and schemas with OpenAI and Pydantic.
### Requirements
This library depends only on **Pydantic** and **OpenAI**.
### Installation
To get started with OpenAI Function Call, you need to install it using `pip`. Run the following command in your terminal:
!!! note "Requirement"
Ensure you have Python version 3.9 or above.
To get started you need to install it using `pip`. Run the following command in your terminal:
```sh
$ pip install instructor
```
## Quick Start with Patching ChatCompletion
## Quick Start
To simplify your work with OpenAI models and streamline the extraction of Pydantic objects from prompts, we offer a patching mechanism for the `ChatCompletion` class. Here's a step-by-step guide:
To simplify your work with OpenAI we offer a patching mechanism for the `ChatCompletion` class.
The patch introduces 3 features to the `ChatCompletion` class:
### Step 1: Import and Patch the Module
1. The `response_model` parameter, which allows you to specify a Pydantic model to extract data into.
2. The `max_retries` parameter, which allows you to specify the number of times to retry the request if it fails.
3. The `validation_context` parameter, which allows you to specify a context object that validators have access to.
!!! note "Using Validators"
To learn more about validators, check out our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)
### Step 1: Patch the client
First, import the required libraries and apply the patch function to the OpenAI module. This exposes new functionality with the response_model parameter.
```python
import openai
```python hl_lines="6"
import instructor
from pydantic import BaseModel
from openai import OpenAI
# This enables response_model keyword
# from openai.ChatCompletion.create
instructor.patch()
# from client.chat.completions.create
client = instructor.patch(OpenAI())
```
### Step 2: Define the Pydantic Model
@@ -65,69 +103,100 @@ instructor.patch()
Create a Pydantic model to define the structure of the data you want to extract. This model will map directly to the information in the prompt.
```python
from pydantic import BaseModel
class UserDetail(BaseModel):
    name: str
    age: int
```
### Step 3: Extract Data with ChatCompletion
### Step 3: Extract
Use the openai.ChatCompletion.create method to send a prompt and extract the data into the Pydantic object. The response_model parameter specifies the Pydantic model to use for extraction.
Use the `client.chat.completions.create` method to send a prompt and extract the data into the Pydantic object. The `response_model` parameter specifies the Pydantic model to use for extraction. It's helpful to annotate the variable with the type of the response model, which will help your IDE provide autocomplete and spell check.
```python
user: UserDetail = openai.ChatCompletion.create(
```python hl_lines="3"
user: UserDetail = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ]
)
```
### Step 4: Validate the Extracted Data
You can then validate the extracted data by asserting the expected values. Annotating the variable with its type also gives you IDE benefits such as spell check and autocomplete!
```python
assert user.name == "Jason"
assert user.age == 25
```
## IDE Support
## Advanced: Pydantic Validation
Everything is designed for you to get the best developer experience possible, with the best editor support.
Validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
Including **autocompletion**:
```python hl_lines="9 15"
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated
from instructor import llm_validator
![autocomplete](img/ide_support.png)
class QuestionAnswer(BaseModel):
    question: str
    answer: Annotated[
        str,
        BeforeValidator(llm_validator("don't say objectionable things"))
    ]
And even **inline errors**
![errors](img/error2.png)
## OpenAI Schema and Pydantic
This quick start guide provided you with a basic understanding of how to use OpenAI Function Call for schema extraction and function calls. You can now explore more advanced use cases and creative applications of this library.
Since `UserDetails` is an `OpenAISchema` and a `pydantic.BaseModel`, you can use inheritance and nesting to create more complex models while avoiding code duplication.
```python
class UserDetails(OpenAISchema):
name: str = Field(..., description="User's full name")
age: int
class UserWithAddress(UserDetails):
address: str
class UserWithFriends(UserDetails):
best_friend: UserDetails
friends: List[UserDetails]
try:
    qa = QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
except ValidationError as e:
    print(e)
```
If you have any questions, feel free to leave an issue or reach out to the library's author on [Twitter](https://twitter.com/jxnlco). For a more comprehensive solution with additional features, consider checking out [MarvinAI](https://www.askmarvin.ai/).
It's important to note here that the error message is generated by the LLM, not the code, so it will be helpful when re-asking the model.
To see more examples of how we can create interesting models check out some [examples.](examples/index.md)
```plaintext hl_lines="3"
1 validation error for QuestionAnswer
answer
Assertion failed, The statement is objectionable. (type=assertion_error)
```
## Advanced: Reask on validation error
Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
```python hl_lines="15-18 22 23 29"
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator
# Apply the patch to the OpenAI client
client = instructor.patch(OpenAI())
class UserDetails(BaseModel):
    name: str
    age: int
    @field_validator("name")
    @classmethod
    def validate_name(cls, v):
        if v.upper() != v:
            raise ValueError("Name must be in uppercase.")
        return v
model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)
assert model.name == "JASON"
```
## License
This project is licensed under ther terms of the MIT License.
This project is licensed under the terms of the MIT License.
+24 -31
View File
@@ -1,13 +1,15 @@
# Patterns for Extracting Multiple Items
# Streaming and MultiTask
A common use case of structured extraction is defining a single schema class and then making another schema to create a list to do multiple extraction
```python
class User(OpenAISchema):
from pydantic import BaseModel
class User(BaseModel):
name: str
age: int
class Users(OpenAISchema):
class Users(BaseModel):
users: List[User]
```
@@ -18,19 +20,23 @@ Defining a task and creating a list of classes is a common enough pattern that w
## Extracting Tasks using MultiTask
By using `MultiTask` you get a very convenient class with prompts and names automatically defined. You get `from_response` just like any other `OpenAISchema`, and you can extract the list of objects you want with `MultiTask.tasks`.
By using `MultiTask` you get a very convenient class with prompts and names automatically defined. You get `from_response` just like any other `BaseModel`, and you can extract the list of objects you want with `MultiTask.tasks`.
```python hl_lines="13"
from instructor import OpenAISchema, MultiTask
import instructor
from openai import OpenAI
client = instructor.patch(OpenAI())
class User(BaseModel):
name: str
age: int
MultiUser = MultiTask(User)
MultiUser = instructor.MultiTask(User)
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0.1,
stream=False,
@@ -42,36 +48,32 @@ completion = openai.ChatCompletion.create(
"content": f"Consider the data below: Jason is 10 and John is 30",
},
],
max_tokens=1000,
)
MultiUser.from_response(completion)
```
```sh
{"tasks": [
{"name": "Jason", "age": 10},
{"name": "John", "age": 30}
]}
```json
{
"tasks": [
{ "name": "Jason", "age": 10 },
{ "name": "John", "age": 30 }
]
}
```
## Streaming Tasks
Since a `MultiTask(T)` is well constrained to `tasks: List[T]`, we can make assumptions about how tokens are used and provide a helper method that lets you generate tasks as the tokens are streamed in.
!!! tips "Why would we want this?"
While `gpt-3.5-turbo` is quite fast, `gpt-4` will take a while if there are many objects or if each object schema is complex. If 10 entities are created and each takes 100ms to complete, it would take 1 second before we had access to our objects. With streaming you'd get the first object in 100ms, a 10x perceived improvement in latency! While this may not make sense for most use cases, if we were dynamically building UI based on entities, streaming entities one by one could improve the user experience dramatically.
Let's look at an example in action with the same class.
```python hl_lines="6 26"
MultiUser = MultiTask(User)
MultiUser = instructor.MultiTask(User)
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0.1,
stream=True,
functions=[MultiUser.openai_schema],
function_call={"name": MultiUser.openai_schema["name"]},
response_model=MultiUser,
messages=[
{
"role": "system",
@@ -97,13 +99,4 @@ for user in MultiUser.from_streaming_response(completion):
>>> name="John" "age"=10
```
!!! usage "How??"
Consider this incomplete json string.
```json
{"tasks": [{"name": "Jason", "age": 10}
```
Notice how, while this isn't valid json, we know that one complete `User` object was generated so we `yield` that object to be used elsewhere as soon as possible.
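A minimal sketch of that idea using only the standard library follows. `iter_tasks` is a hypothetical helper written for illustration, not the library's implementation, and it assumes the payload is a single `{"tasks": [...]}` object whose items are JSON objects.

```python
import json
from typing import Iterable, Iterator


def iter_tasks(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each complete object inside {"tasks": [...]} as soon as it closes."""
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # True once we are past the opening '[' of the tasks array
    for chunk in chunks:
        buffer += chunk
        if not started:
            start = buffer.find("[")
            if start == -1:
                continue  # the array has not started yet
            buffer = buffer[start + 1:]
            started = True
        while True:
            # Drop separators between items before trying to decode the next one
            buffer = buffer.lstrip(" \t\r\n,")
            if not buffer.startswith("{"):
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the object is not complete yet; wait for more tokens
            yield obj
            buffer = buffer[end:]


# Simulated token stream, split at arbitrary positions
stream = ['{"tasks": [{"name": "Jas', 'on", "age": 10}', ', {"name": "John", "age": 30}]}']
users = list(iter_tasks(stream))
```

The generator never waits for the closing `]}`; each item is handed to the caller the moment its own closing brace arrives.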
This streaming is still a prototype, but should work quite well for simple schemas.
This streaming is still a prototype, but should work quite well for simple schemas.
-55
View File
@@ -1,55 +0,0 @@
# OpenAI Schema
The `OpenAISchema` is an extension of `Pydantic.BaseModel` that offers a minimally invasive way to define schemas for OpenAI completions. It provides two main methods: `openai_schema` to generate the correct schema and `from_response` to create an instance of the class from the completion result.
## Prompt Placement
Our philosophy is to keep prompts close to the code. This is achieved by using docstrings and field descriptions to provide prompts and descriptions for your schema fields.
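To make this concrete, here is a small sketch, assuming Pydantic v2 and its `model_json_schema()` method, of how a docstring and a field description end up in the JSON schema. It uses a plain `BaseModel` rather than `OpenAISchema`, and it only shows the Pydantic side; how the library turns this schema into a function definition is not covered here.

```python
from pydantic import BaseModel, Field


class UserDetails(BaseModel):
    """Details of a user"""

    name: str = Field(..., description="User's full name")
    age: int


schema = UserDetails.model_json_schema()

# The class docstring becomes the top-level description of the schema
print(schema["description"])

# The field description is attached to the corresponding property
print(schema["properties"]["name"]["description"])
```

Because both strings live next to the fields they describe, changing the prompt is the same edit as changing the code.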
## Structured Extraction
You can directly use the `OpenAISchema` class in your `openai` API create calls by passing in the `openai_schema` class property and extracting the class out using the `from_response` method. This style of usage provides full control over configuration and prompting.
```python
import openai
from instructor import OpenAISchema
from pydantic import Field
class UserDetails(OpenAISchema):
"""Details of a user"""
name: str = Field(..., description="User's full name")
age: int
completion = openai.ChatCompletion.create(
model="gpt-3.5-turbo-0613",
functions=[UserDetails.openai_schema],
function_call={"name": UserDetails.openai_schema["name"]},
messages=[
{"role": "system", "content": "Extract user details from my requests"},
{"role": "user", "content": "My name is John Doe and I'm 30 years old."},
],
)
user_details = UserDetails.from_response(completion)
print(user_details) # UserDetails(name='John Doe', age=30)
```
You can also use the `@openai_schema` decorator to decorate `BaseModels`, but you may lose some type hinting as a result.
```python
import openai
from instructor import openai_schema
from pydantic import Field, BaseModel
@openai_schema
class UserDetails(BaseModel):
"""Details of a user"""
name: str = Field(..., description="User's full name")
age: int
```
## Code Reference
For more information about the code, including the complete API reference, please refer to the `instructor` documentation.
::: instructor.function_calls
-82
View File
@@ -1,82 +0,0 @@
# Reasking When Validation Fails
Validators are a great tool for ensuring some property of the outputs. When you use the `patch()` method with the `openai` client, you can use the `max_retries` parameter to set the number of times you can reask. This allows the client to reattempt the API call a specified number of times if validation fails. It's another layer of defense against bad outputs of two forms:
1. Pydantic Validation Errors
2. JSON Decoding Errors
## Future Improvements
!!! notes "Contributions Welcome"
The current retry mechanism relies on a while loop. For a more robust solution, contributions to integrate the `tenacity` library are welcome.
## Example: Using Validators for Reasking
The example utilizes Pydantic's field validators in tandem with the `max_retries` parameter. In this example if the `name` field fails validation, the `openai` client will reattempt the API call. Here we use a plain validator, but we can also use [llms for validation](validation.md)
### Step 1: Define the Response Model with Validators
```python
import instructor
from pydantic import BaseModel, field_validator
# Apply the patch to the OpenAI client
instructor.patch()
class UserDetails(BaseModel):
name: str
age: int
@field_validator("name")
@classmethod
def validate_name(cls, v):
if v.upper() != v:
raise ValueError("Name must be in uppercase.")
return v
```
Here, the `UserDetails` class includes a validator for the `name` attribute. The validator checks that the name is in uppercase and raises a `ValueError` otherwise.
### Step 2: Exception Handling and Reasking
When validation fails, several steps are taken:
1. The existing messages are retained for the new API request.
2. The previous function call's response is added back.
3. A user prompt is included to reask the model, with details on the error.
```python
try:
...
except (ValidationError, JSONDecodeError) as e:
kwargs["messages"].append(dict(**response.choices[0].message))
kwargs["messages"].append(
{
"role": "user",
"content": f"Please correct the function call; errors encountered:\n{e}",
}
)
```
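The three steps can be sketched end to end with a stand-in for the API. This is a simplified illustration under stated assumptions, not the library's code: `parse_user`, `create_with_reask`, and `fake_complete` are hypothetical names, and the model output is treated as a raw JSON string.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def parse_user(raw: str) -> dict:
    """Decode the model output and apply the uppercase-name rule."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if data["name"].upper() != data["name"]:
        raise ValueError("Name must be in uppercase.")
    return data


def create_with_reask(
    complete: Callable[[List[Message]], str],
    messages: List[Message],
    max_retries: int = 2,
) -> dict:
    """Call `complete`, re-asking with the error message up to `max_retries` times."""
    messages = list(messages)  # copy so the caller's list is not mutated
    retries = 0
    while True:
        raw = complete(messages)
        try:
            return parse_user(raw)
        except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
            if retries >= max_retries:
                raise
            retries += 1
            # Keep the existing messages, add the previous response back,
            # then ask the model to correct it, including the error details
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {
                    "role": "user",
                    "content": f"Please correct the function call; errors encountered:\n{e}",
                }
            )


def fake_complete(messages: List[Message]) -> str:
    """Stand-in for the API: fixes the name only after being told about the error."""
    if "uppercase" in messages[-1]["content"]:
        return '{"name": "JASON", "age": 25}'
    return '{"name": "jason", "age": 25}'


user = create_with_reask(
    fake_complete,
    [{"role": "user", "content": "Extract jason is 25 years old"}],
)
```

With `max_retries=0` the first validation error propagates unchanged, which is a convenient way to see what the model would have been told.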
## Using the Client with Retries
Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
```python
model = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
response_model=UserDetails,
max_retries=2,
messages=[
{"role": "user", "content": "Extract jason is 25 years old"},
],
)
assert model.name == "JASON"
```
The `max_retries` parameter will trigger up to 2 reattempts if the `name` attribute fails the uppercase validation in `UserDetails`.
## Takeaways
Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to heal. This approach leverages existing programming practices for error handling, avoiding the need for new methodologies. We simplify the issue into code we already know how to write and leverage Pydantic's powerful validation system to do so.
+8 -16
View File
@@ -1,14 +1,7 @@
# Integrated Validation and Reask with LLMs and Pydantic
# Validation and Reask with LLMs and Pydantic
Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-heal.
## Applications and Scenarios
- **Content Moderation**: LLMs can be trained or guided to recognize and filter out objectionable or sensitive material, ensuring a safer user experience.
- **Reflecting on Chain of Thought**: As LLMs can evaluate their own reasoning process, this opens doors to even more reliable and dependable automated systems.
- **Verifying Hallucinations**: LLMs can be configured to recognize when they generate data or responses that do not align with facts or reliable data, reducing the risk of disseminating false information.
- **Data Integrity**: Enforces data quality standards.
## Pythonic Validation with Pydantic and Instructor
1. **Uniform Validation API**: Pydantic provides identical developer experience, whether using code-based or LLM-based validation.
@@ -20,6 +13,7 @@ Instead of framing "self-critique" or "self-reflection" in AI as new concepts, w
Validation is crucial when using Large Language Models (LLMs) for data extraction. It ensures data integrity, covering both quantitative and qualitative correctness with code-based and LLM-based validations.
!!! note "Pydantic Validation Docs"
Pydantic supports validating individual fields or the whole model dict at once.
- [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/)
@@ -27,11 +21,10 @@ Validation is crucial when using Large Language Models (LLMs) for data extractio
To see the most up to date examples check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators)
### Code-Based Validation Example
!!! note "Model Level Evaluation"
Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation
Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation
Enforce a naming rule using Pydantic's built-in validation:
@@ -75,7 +68,7 @@ from instruct import llm_validator
class QuestionAnswer(BaseModel):
question: str
answer: Annotated[
str,
str,
BeforeValidator(llm_validator("don't say objectionable things"))
]
@@ -107,7 +100,6 @@ Its a great layer of defense against bad outputs of two forms.
1. Pydantic Validation Errors (code or llm based)
2. JSON Decoding Errors (when the model returns a bad response)
### Step 1: Define the Response Model with Validators
Notice that the field validator wants the name in uppercase, but the user input is lowercase. The validator will raise a `ValueError` if the name is not in uppercase.
@@ -117,7 +109,7 @@ import instructor
from pydantic import BaseModel, field_validator
# Apply the patch to the OpenAI client
instructor.patch()
client = instructor.patch(OpenAI())
class UserDetails(BaseModel):
name: str
@@ -136,7 +128,7 @@ class UserDetails(BaseModel):
Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.
```python hl_lines="4 10"
model = openai.ChatCompletion.create(
model = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserDetails,
max_retries=2,
@@ -150,7 +142,7 @@ assert model.name == "JASON"
### What happens behind the scenes?
Behind the scenes, the `instructor.patch()` method adds a `max_retries` parameter to the `openai.ChatCompletion.create()` method. The `max_retries` parameter will trigger up to 2 reattempts if the `name` attribute fails the uppercase validation in `UserDetails`.
Behind the scenes, the `instructor.patch()` method adds a `max_retries` parameter to the `openai.ChatCompletion.create()` method. The `max_retries` parameter will trigger up to 2 reattempts if the `name` attribute fails the uppercase validation in `UserDetails`.
```python
try:
@@ -174,4 +166,4 @@ The docs are currently incomplete, but we have a few advanced validation techniq
## Takeaways
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
+13 -4
View File
@@ -18,7 +18,7 @@ This approach to "chain of thought" improves data quality but can have modular c
from pydantic import BaseModel, Field
class Role(BaseModel):
chain_of_thought: str = Field(...,
chain_of_thought: str = Field(...,
description="Think step by step to determine the correct title")
title: str
@@ -92,9 +92,19 @@ class UserDetail(BaseModel):
age: int
name: str
role: Role = Field(description="Correctly assign one of the predefined roles to the user.")
```
If you're having a hard time with `Enum`, an alternative is to use `Literal`.
```python hl_lines="4"
class UserDetail(BaseModel):
age: int
name: str
role: Literal["PRINCIPAL", "TEACHER", "STUDENT", "OTHER"]
```
If you'd like to improve performance more you can reiterate the requirements in the field descriptions or in the docstrings.
## Reiterate Long Instructions
For complex attributes, it helps to reiterate the instructions in the field's description.
@@ -166,7 +176,7 @@ For multiple users, aim to use consistent key names when extracting properties.
```python
class UserDetails(BaseModel):
"""
Extract information for multiple users.
Extract information for multiple users.
Use consistent key names for properties across users.
"""
users: List[UserDetail]
@@ -215,4 +225,3 @@ class TimeRange(BaseModel):
start_time: int = Field(..., description="The start time in hours.")
end_time: int = Field(..., description="The end time in hours.")
```
-153
View File
@@ -1,153 +0,0 @@
# Introduction to Validation in Pydantic and LLMs
Validation is crucial when using Large Language Models (LLMs) for data extraction. It ensures data integrity, enables [reasking for better results](reask.md), and allows for overwriting incorrect values. Pydantic offers versatile validation capabilities suitable for use with LLM outputs.
!!! note "Pydantic Validation Docs"
Pydantic supports validating individual fields or the whole model dict at once.
- [Field-Level Validation](https://docs.pydantic.dev/latest/usage/validators/)
- [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators)
To see the most up to date examples check out our repo [jxnl/instructor/examples/validators](https://github.com/jxnl/instructor/tree/main/examples/validators)
## Importance of LLM Validation
- **Data Integrity**: Enforces data quality standards.
- **[Reasking](reask.md)**: Utilizes Pydantic's error messages to improve LLM outputs.
- **Overwriting**: Overwrites incorrect values during API calls.
## Code Examples
### Simple Validation with Pydantic
The example uses a custom validator function to enforce a rule on the name attribute. If a user fails to input a full name (first and last name separated by a space), Pydantic will raise a validation error. If you want the LLM to automatically fix the error check out our [reasking docs.](reask.md)
```python
from pydantic import BaseModel, ValidationError
from typing_extensions import Annotated, AfterValidator
def name_must_contain_space(v: str) -> str:
if " " not in v:
raise ValueError("name must be a first and last name separated by a space")
return v.lower()
class UserDetail(BaseModel):
age: int
name: Annotated[str, AfterValidator(name_must_contain_space)]
try:
person = UserDetail(age=29, name="Jason")
except ValidationError as e:
print(e)
# Output:
# 1 validation error for UserDetail
# name
# Value error, name must be a first and last name separated by a space (type=value_error)
```
### LLM-Based Validation
This example demonstrates using an LLM as a validator. If the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error. This level of validation can be essential when the model is used in real-time systems where it can generate a broad range of outputs. It is akin to Constitutional AI and self-reflection, but applied at the level of a single attribute, which can be much more efficient.
```python
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated
import instructor
from instructor.dsl.validators import llm_validator
instructor.patch()
class QuestionAnswer(BaseModel):
question: str
answer: Annotated[
str,
BeforeValidator(
llm_validator("don't say objectionable things", allow_override=True)
),
]
try:
qa = QuestionAnswer(
question="What is the meaning of life?",
answer="The meaning of life is to be evil and kill people",
)
except ValidationError as e:
print(e)
# Output:
# 1 validation error for QuestionAnswer
# answer
# Assertion failed, The statement promotes violence and harm to others, which is objectionable. (type=assertion_error)
```
!!! note "Model Level Evaluation"
Right now we only go over the field level examples, check out [Model-Level Validation](https://docs.pydantic.dev/latest/usage/validators/#model-validators) if you want to see how to do model level evaluation
## Create Your Own LLM Validator
The section shows how to create a custom LLM validator function. You can modify the function to suit your specific requirements, making it a powerful tool for advanced validation scenarios.
The `llm_validator` function can be extended or customized to fit specific requirements.
```python
from pydantic import BaseModel, Field
from typing import Optional
import instructor
import openai
instructor.patch()
class Validator(BaseModel):
is_valid: bool = Field(default=True)
reason: Optional[str] = Field(default=None)
fixed_value: Optional[str] = Field(default=None)
def llm_validator(
statement: str,
allow_override: bool = False,
model: str = "gpt-3.5-turbo",
temperature: float = 0,
):
"""
Create a validator that uses the LLM to validate an attribute
Parameters:
statement (str): The statement to validate
model (str): The LLM to use for validation (default: "gpt-3.5-turbo")
temperature (float): The temperature to use for the LLM (default: 0)
"""
def llm(v):
resp: Validator = openai.ChatCompletion.create(
response_model=Validator,
messages=[
{
"role": "system",
"content": "You are a world class validation model. Capable to determine if the following value is valid for the statement, if it is not, explain why and suggest a new value.",
},
{
"role": "user",
"content": f"Does `{v}` follow the rules: {statement}",
},
],
model=model,
temperature=temperature,
) # type: ignore
# If the response is not valid, return the reason, this could be used in
# the future to generate a better response, via reasking mechanism.
assert resp.is_valid, resp.reason
if allow_override and not resp.is_valid and resp.fixed_value is not None:
# If the value is not valid, but we allow override, return the fixed value
return resp.fixed_value
return v
return llm
```
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
-178
View File
@@ -1,178 +0,0 @@
# Writing prompts with `ChatCompletion`
The ChatCompletion pipeline API provides a convenient way to build prompts with clear instructions and structure. It helps avoid the need to remember best practices for wording and prompt construction. This documentation will demonstrate an example pipeline and guide you through the process of using it.
Our goals are to:
1. Define some best practices with a light abstraction over a chat message
2. Allow the pipeline to be intuitive and readable.
3. Abstract the output shape and deserialization for better usability
## Example Pipeline
We will begin by defining a task to segment queries and add instructions using the prompt pipeline API.
1. We want to define a search object to extract
2. We want to extract multiple instances of such an object
3. We want to define the pipeline with a set of instructions
4. We want to easily call OpenAI and extract the data back out of the completion
!!! note "Applications"
Extracting a repeated task out of instructions is fairly common.
The usual prompting tips are to define the task clearly, model the output object, and provide tips to the LLM for better performance. Something like this can be used to power agents like Siri or Alexa in performing multiple tasks in one request. [Read more](examples/search.md)
### Designing the Schema
First, let's design the schema for our task. In this example, we will have a `SearchQuery` schema with a single field called `query`. The `query` field will represent a detailed, comprehensive, and specific query to be used for semantic search.
```python
from instructor import OpenAISchema, dsl
from pydantic import Field
class SearchQuery(OpenAISchema):
query: str = Field(
...,
description="Detailed, comprehensive, and specific query to be used for semantic search",
)
SearchResponse = dsl.MultiTask(
subtask_class=SearchQuery,
)
```
!!! tip "MultiTask"
To learn more about the `MultiTask` functionality, you can refer to the [MultiTask](multitask.md) documentation.
### Building our Prompts
Next, let's write our prompt using the pipeline style. We will leverage the features provided by the `ChatCompletion` class and utilize the `|` operator to chain different components of our prompt together.
```python
task = (
dsl.ChatCompletion(#(1)!
name="Segmenting Search requests example",
model='gpt-3.5-turbo-0613',
max_token=1000)
| dsl.SystemTask(task="Segment search results") #(2)!
| dsl.TaggedMessage(#(3)!
content="can you send me the data about the video investment and the one about spot the dog?",
tag="query")
| dsl.TipsMessage(#(4)!
tips=[
"Expand query to contain multiple forms of the same word (SSO -> Single Sign On)",
"Use the title to explain what the query should return, but use the query to complete the search",
"The query should be detailed, specific, and cast a wide net when possible",
])
| SearchResponse #(5)!
)
```
1. Define the completion object (consider this both task and prompt)
2. SystemTask augments the `task` with "You are a *world class* ... *correctly* complete the task: {task}"
3. TaggedMessage wraps content with `<query></query>` to set clear boundaries for the data you wish to process
4. TipsMessage allows you to pass a list of strings as tips; as a result, we can potentially create this list dynamically
5. The last step defines the output model you want to use to parse the results; if no output model is defined, we revert to the usual OpenAI completion.
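To show what these components contribute, here is a rough sketch in plain Python that rebuilds the same `messages` printed under "Inspecting the API Call". The functions `system_task`, `tagged_message`, and `tips_message` are hypothetical stand-ins, not the `dsl` classes themselves; the message templates are copied from that printed output.

```python
from typing import Dict, List

Message = Dict[str, str]


def system_task(task: str) -> Message:
    # Wraps the task in the "world class algorithm" system prompt
    return {
        "role": "system",
        "content": (
            "You are a world class state of the art algorithm capable of "
            f"correctly completing the following task: `{task}`."
        ),
    }


def tagged_message(content: str, tag: str) -> Message:
    # Wraps the data in <tag></tag> to set clear boundaries
    return {
        "role": "user",
        "content": f"Consider the following data:\n\n<{tag}>{content}</{tag}>",
    }


def tips_message(tips: List[str]) -> Message:
    # Renders the tips as a bulleted list; the list can be built dynamically
    bullets = "\n".join(f"* {tip}" for tip in tips)
    return {
        "role": "user",
        "content": f"Here are some tips to help you complete the task:\n\n{bullets}",
    }


messages = [
    system_task("Segment search results"),
    tagged_message(
        "can you send me the data about the video investment and the one about spot the dog?",
        tag="query",
    ),
    tips_message(
        [
            "Expand query to contain multiple forms of the same word (SSO -> Single Sign On)",
            "The query should be detailed, specific, and cast a wide net when possible",
        ]
    ),
]
```

Each helper returns an ordinary chat message, so the resulting list could be passed directly as the `messages` argument of a chat completion call.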
The `ChatCompletion` class is responsible for model configuration, while the `|` operator allows us to construct the prompt in a readable manner. We can add `Messages` or `OpenAISchema` components to the prompt pipeline using `|`, and the `ChatCompletion` class will handle the prompt construction for us.
In the above example, we:
- Initialize a `ChatCompletion` object with the desired model and maximum token count.
- Add a `SystemTask` component to segment search results.
- Include a `TaggedMessage` component to provide a query with a specific tag.
- Use a `TipsMessage` component to include some helpful tips related to the task.
- Connect the `SearchResponse` schema to the pipeline.
Lastly, we create the `search_request` using `task.create()`. The `search_request` object will be of type `SearchResponse`, and we can print it as a JSON object.
!!! tip
If you want to see the exact input sent to OpenAI, scroll to the bottom of the page.
```python
search_request = task.create() # type: ignore
assert isinstance(search_request, SearchResponse)
print(search_request.json(indent=2))
```
The output will be a JSON object containing the segmented search queries.
```json
{
"tasks": [
{
"query": "data about video investment"
},
{
"query": "data about spot the dog"
}
]
}
```
## Inspecting the API Call
To make it easy for you to understand what this API is doing, by default we only construct the kwargs for the chat completion call.
```python
print(task.kwargs)
```
```json
{
"messages": [
{
"role": "system",
"content": "You are a world class state of the art algorithm capable of correctly completing the following task: `Segment search results`."
},
{
"role": "user",
"content": "Consider the following data:\n\n<query>can you send me the data about the video investment and the one about spot the dog?</query>"
},
{
"role": "user",
"content": "Here are some tips to help you complete the task:\n\n* Expand query to contain multiple forms of the same word (SSO -> Single Sign On)\n* Use the title to explain what the query should return, but use the query to complete the search\n* The query should be detailed, specific, and cast a wide net when possible"
}
],
"functions": [
{
"name": "MultiSearchQuery",
"description": "Correctly segmented set of search queries",
"parameters": {
"type": "object",
"properties": {
"tasks": {
"description": "Correctly segmented list of `SearchQuery` tasks",
"type": "array",
"items": {
"$ref": "#/definitions/SearchQuery"
}
}
},
"definitions": {
"SearchQuery": {
"type": "object",
"properties": {
"query": {
"description": "Detailed, comprehensive, and specific query to be used for semantic search",
"type": "string"
}
},
"required": [
"query"
]
}
},
"required": [
"tasks"
]
}
}
],
"function_call": {
"name": "MultiSearchQuery"
},
"max_tokens": 1000,
"temperature": 0.1,
"model": "gpt-3.5-turbo-0613"
}
```
+5 -4
View File
@@ -1,10 +1,11 @@
import openai
import instructor
from openai import OpenAI
from typing import List, Optional
from pydantic import BaseModel, Field
from instructor import patch
from enum import Enum
patch()
client = instructor.patch(OpenAI())
class PriorityEnum(str, Enum):
@@ -49,7 +50,7 @@ class ActionItems(BaseModel):
def generate(data: str) -> ActionItems:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=ActionItems,
messages=[
@@ -1,7 +1,9 @@
from instructor import OpenAISchema
from pydantic import Field
from typing import List, Any
import openai
from openai import OpenAI
client = OpenAI()
class RowData(OpenAISchema):
@@ -33,7 +35,7 @@ class Dataframe(OpenAISchema):
def dataframe(data: str) -> Dataframe:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.1,
functions=[Dataframe.openai_schema],
@@ -42,7 +44,7 @@ def dataframe(data: str) -> Dataframe:
{
"role": "system",
"content": """Map this data into a dataframe a
nd correctly define the correct columns and rows""",
nd correctly define the correct columns and rows""",
},
{
"role": "user",
@@ -1,7 +1,9 @@
from instructor import OpenAISchema
from pydantic import Field
from typing import List, Any
import openai
from openai import OpenAI
client = OpenAI()
class RowData(OpenAISchema):
@@ -42,7 +44,7 @@ class Database(OpenAISchema):
def dataframe(data: str) -> Database:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0.0,
functions=[Database.openai_schema],
@@ -51,7 +53,7 @@ def dataframe(data: str) -> Database:
{
"role": "system",
"content": """Map this data into a dataframe a
nd correctly define the correct columns and rows""",
nd correctly define the correct columns and rows""",
},
{
"role": "user",
+5 -2
View File
@@ -1,10 +1,13 @@
import instructor
import openai
from openai import OpenAI
from pydantic import BaseModel, Field
from pprint import pprint
from typing import List
client = instructor.patch(OpenAI())
class Summary(BaseModel):
"""Represents a summary entry in the list.
@@ -58,7 +61,7 @@ ChainOfDenseSummaries = instructor.MultiTask(
def summarize_article(article: str, n_summaries: int = 5, stream: bool = True):
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-16k",
stream=stream,
messages=[
@@ -1,12 +1,12 @@
from typing import List
from loguru import logger
import openai
import instructor
from typing import List
from loguru import logger
from openai import OpenAI
from pydantic import Field, BaseModel, FieldValidationInfo, model_validator
client = instructor.patch(OpenAI())
class Fact(BaseModel):
statement: str = Field(
@@ -82,7 +82,7 @@ class QuestionAnswer(instructor.OpenAISchema):
def ask_ai(question: str, context: str) -> QuestionAnswer:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0,
functions=[QuestionAnswer.openai_schema],
+7 -3
View File
@@ -7,11 +7,13 @@ from pydantic import BaseModel, Field
from starlette.responses import StreamingResponse
import os
import openai
import instructor
import logging
from openai import OpenAI
from instructor.dsl.multitask import MultiTaskBase
client = instructor.patch(OpenAI())
logger = logging.getLogger(__name__)
# FastAPI app
@@ -79,7 +81,7 @@ class Question(BaseModel):
# Function to extract entities from input text using GPT-3.5
def stream_extract(question: Question) -> Iterable[Fact]:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0,
stream=True,
@@ -124,7 +126,9 @@ def get_api_key(request: Request):
# Route to handle SSE events and return users
@app.post("/extract", response_class=StreamingResponse)
async def extract(question: Question, openai_key=Depends(get_api_key)):
openai.api_key = openai_key
raise Exception(
"The 'openai.api_key' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(api_key=openai_key)'"
)
facts = stream_extract(question)
async def generate():
@@ -1,6 +1,6 @@
fastapi
uvicorn
openai
openai>=1.0.0
pydantic
instructor
regex
+7 -6
View File
@@ -1,10 +1,11 @@
from typing import List
import enum
import openai
from pydantic import BaseModel
from instructor import patch
import instructor
patch()
from typing import List
from openai import OpenAI
from pydantic import BaseModel
client = instructor.patch(OpenAI())
# Define new Enum class for multiple labels
@@ -21,7 +22,7 @@ class MultiClassPrediction(BaseModel):
# Modify the classify function
def multi_classify(data: str) -> MultiClassPrediction:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=MultiClassPrediction,
messages=[
+6 -5
View File
@@ -1,9 +1,10 @@
import enum
import openai
from pydantic import BaseModel
from instructor import patch
import instructor
from openai import OpenAI
patch()
from pydantic import BaseModel
client = instructor.patch(OpenAI())
class Labels(str, enum.Enum):
@@ -20,7 +21,7 @@ class SinglePrediction(BaseModel):
def classify(data: str) -> SinglePrediction:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=SinglePrediction,
messages=[
+5 -5
View File
@@ -4,15 +4,15 @@
# api_path: /api/v1/extract_person
# json_schema_path: ./input.json
import instructor
from fastapi import FastAPI
from pydantic import BaseModel
from jinja2 import Template
from models import ExtractPerson
from openai import AsyncOpenAI
import openai
import instructor
instructor.patch()
aclient = instructor.apatch(AsyncOpenAI())
app = FastAPI()
@@ -35,7 +35,7 @@ PROMPT_TEMPLATE = Template(
@app.post("/api/v1/extract_person", response_model=ExtractPerson)
async def extract_person(input: RequestSchema) -> ExtractPerson:
rendered_prompt = PROMPT_TEMPLATE.render(**input.template_variables.model_dump())
return await openai.ChatCompletion.acreate(
return await aclient.chat.completions.create(
model=input.model,
temperature=input.temperature,
response_model=ExtractPerson,
+6 -7
View File
@@ -1,11 +1,10 @@
from typing import List
from enum import Enum
from pydantic import BaseModel, Field
import openai
import instructor
from openai import OpenAI
instructor.patch()
client = instructor.patch(OpenAI())
class CRMSource(Enum):
@@ -41,16 +40,16 @@ class CRMSearchQuery(BaseModel):
def query_crm(query: str) -> CRMSearchQuery:
queries = openai.ChatCompletion.create(
queries = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=CRMSearchQuery,
messages=[
{
"role": "system",
"content": """
You are a world class CRM search career generator.
You will take the user query and decompose it into a set of CRM queries queries.
""",
You are a world class CRM search career generator.
You will take the user query and decompose it into a set of CRM queries queries.
""",
},
{"role": "user", "content": query},
],
+4 -5
View File
@@ -1,11 +1,10 @@
import instructor
import openai
from typing import List
from pydantic import BaseModel
instructor.patch()
import instructor
from openai import OpenAI
# Define Schemas for PII data
client = instructor.patch(OpenAI())
class Data(BaseModel):
@@ -50,7 +49,7 @@ At the moment, John is employed at Company A. He started his role as a Software
"""
# Define the PII Scrubbing Model
pii_data: PIIDataExtraction = openai.ChatCompletion.create(
pii_data: PIIDataExtraction = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=PIIDataExtraction,
messages=[
+5 -2
View File
@@ -1,9 +1,12 @@
import openai
import instructor
from openai import OpenAI
from typing import List
from pydantic import Field
from instructor import OpenAISchema
client = instructor.patch(OpenAI())
class File(OpenAISchema):
"""
@@ -29,7 +32,7 @@ class Program(OpenAISchema):
def develop(data: str) -> Program:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.1,
functions=[Program.openai_schema],
+5 -4
View File
@@ -1,10 +1,12 @@
import openai
import instructor
from openai import OpenAI
from pydantic import Field, parse_file_as
from instructor import OpenAISchema
from generate import Program
client = instructor.patch(OpenAI())
class Diff(OpenAISchema):
"""
@@ -61,8 +63,7 @@ def refactor(new_requirements: str, program: Program) -> Diff:
program_description = "\n".join(
[f"{code.file_name}\n[[[\n{code.body}\n]]]\n" for code in program.files]
)
completion = openai.ChatCompletion.create(
# model="gpt-3.5-turbo-0613",
completion = client.chat.completions.create(
model="gpt-4",
temperature=0,
functions=[Diff.openai_schema],
+6 -4
View File
@@ -1,10 +1,12 @@
import instructor
from graphviz import Digraph
from pydantic import BaseModel, Field
from typing import List
import openai
import instructor
from openai import OpenAI
instructor.patch()
client = instructor.patch(OpenAI())
class Node(BaseModel):
@@ -26,7 +28,7 @@ class KnowledgeGraph(BaseModel):
def generate_graph(input) -> KnowledgeGraph:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-16k",
messages=[
{
+27 -30
View File
@@ -1,10 +1,11 @@
from typing import List, Optional
import instructor
from pydantic import BaseModel, Field
import openai
import enum
instructor.patch()
from typing import List, Optional
from pydantic import BaseModel, Field
from openai import OpenAI
client = instructor.patch(OpenAI())
class Action(enum.Enum):
@@ -85,35 +86,31 @@ initial_messages = [
},
]
response: Response = openai.ChatCompletion.create(
response: Response = client.chat.completions.create(
messages=initial_messages, response_model=Response, model="gpt-4"
) # type: ignore
print(response.model_dump_json(indent=2))
"""
{
"text": "Updating task to create 20 GIFs and creating a new task to create an additional 20 animated GIFs after the initial task is done.",
"task_action": [
{
"id": 23,
"method": "update_task",
"waiting_on": null,
"name": "Create 20 new GIFs",
"notes": "The user increased the number of GIFs from 10 to 20. They plan to create these as they work through their daily tasks, creating about one to two GIFs per day. If this plan doesn't work, they will reconsider their strategy.",
"bucket": "taskbot",
"project": "personal_site"
},
{
"id": 24,
"method": "create_task",
"waiting_on": [
23
],
"name": "Create 20 new animated GIFs",
"notes": "The task will be initiated once the task with id 23 is completed.",
"bucket": "taskbot",
"project": "personal_site"
}
]
"text": "Updating task to create 20 GIFs and creating a new task to create an additional 20 animated GIFs after the initial task is done.",
"task_action": [
{
"id": 23,
"method": "update_task",
"waiting_on": None,
"name": "Create 20 new GIFs",
"notes": "The user increased the number of GIFs from 10 to 20. They plan to create these as they work through their daily tasks, creating about one to two GIFs per day. If this plan doesn't work, they will reconsider their strategy.",
"bucket": "taskbot",
"project": "personal_site",
},
{
"id": 24,
"method": "create_task",
"waiting_on": [23],
"name": "Create 20 new animated GIFs",
"notes": "The task will be initiated once the task with id 23 is completed.",
"bucket": "taskbot",
"project": "personal_site",
},
],
}
"""
@@ -1,28 +1,11 @@
"""
This script is used to segment a request into multiple search queries and perform them asynchronously.
The `Search` class represents a single search query and has the `execute` method to perform the search.
The `MultiSearch` class represents multiple searches and has an `execute` method that runs all the
searches concurrently using asyncio.
The `segment` function uses OpenAI's GPT-3 model to convert a given string into multiple search queries,
which are then run by calling the `execute` method of the returned `MultiSearch` object.
Examples:
>>> queries = segment(
... "Please send me the video from last week about the investment case study and also documents about your GPDR policy?"
... )
>>> queries.execute()
# Expected output:
# >>> Searching for `Video` with query `investment case study` using `SearchType.VIDEO`
# >>> Searching for `Documents` with query `GPDR policy` using `SearchType.EMAIL`
"""
import enum
import instructor
from typing import List
from openai import OpenAI
from pydantic import Field, BaseModel
import openai
from pydantic import Field
from instructor import OpenAISchema
client = instructor.patch(OpenAI())
class SearchType(str, enum.Enum):
@@ -32,7 +15,7 @@ class SearchType(str, enum.Enum):
EMAIL = "email"
class Search(OpenAISchema):
class Search(BaseModel):
"""
Class representing a single search query which contains title, query and the search type
"""
@@ -50,7 +33,7 @@ class Search(OpenAISchema):
)
class MultiSearch(OpenAISchema):
class MultiSearch(BaseModel):
"""
Class representing multiple search queries.
Make sure they contain all the required attributes
@@ -81,11 +64,10 @@ def segment(data: str) -> MultiSearch:
MultiSearch: An object representing the multiple search queries.
"""
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0.1,
functions=[MultiSearch.openai_schema],
function_call={"name": MultiSearch.openai_schema["name"]},
response_model=MultiSearch,
messages=[
{
"role": "system",
+10 -4
View File
@@ -1,9 +1,11 @@
import openai
import instructor
from openai import OpenAI
from pydantic import BaseModel
from instructor import patch
# instructor.patch wraps the client so that chat.completions.create supports the response_model parameter
patch()
client = instructor.patch(OpenAI())
# Now, we can use the response_model parameter using only a base model
@@ -13,7 +15,7 @@ class UserExtract(BaseModel):
age: int
user: UserExtract = openai.ChatCompletion.create(
user: UserExtract = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserExtract,
messages=[
@@ -22,3 +24,7 @@ user: UserExtract = openai.ChatCompletion.create(
) # type: ignore
print(user)
{
"name": "Jason",
"age": 25,
}
@@ -1,11 +1,12 @@
import asyncio
import enum
import instructor
from typing import List
from openai import OpenAI
from pydantic import Field, BaseModel
import openai
from pydantic import Field
from instructor import OpenAISchema
client = instructor.patch(OpenAI())
class QueryType(str, enum.Enum):
@@ -18,7 +19,7 @@ class QueryType(str, enum.Enum):
MERGE_MULTIPLE_RESPONSES = "MERGE_MULTIPLE_RESPONSES"
class ComputeQuery(OpenAISchema):
class ComputeQuery(BaseModel):
"""
Models a computation of a query, assume this can be some RAG system like llamaindex
"""
@@ -27,7 +28,7 @@ class ComputeQuery(OpenAISchema):
response: str = "..."
class MergedResponses(OpenAISchema):
class MergedResponses(BaseModel):
"""
Models a merged response of multiple queries.
Currently we just concatenate them but we can do much more complex things.
@@ -36,7 +37,7 @@ class MergedResponses(OpenAISchema):
responses: List[ComputeQuery]
class Query(OpenAISchema):
class Query(BaseModel):
"""
Class representing a single question in a question answer subquery.
Can be either a single question or a multi question merge.
@@ -82,7 +83,7 @@ class Query(OpenAISchema):
return resp
class QueryPlan(OpenAISchema):
class QueryPlan(BaseModel):
"""
Container class representing a tree of questions to ask a question answer system.
and its dependencies. Make sure every question is in the tree, and every question is asked only once.
@@ -131,11 +132,8 @@ def query_planner(question: str, plan=False) -> QueryPlan:
"content": "Let's think step by step to find the correct set of queries and their dependencies, and not make any assumptions about what is known.",
},
)
completion = openai.ChatCompletion.create(
model=PLANNING_MODEL,
temperature=0,
messages=messages,
max_tokens=1000,
completion = client.chat.completions.create(
model=PLANNING_MODEL, temperature=0, messages=messages, max_tokens=1000
)
messages.append(completion.choices[0].message)
@@ -147,7 +145,7 @@ def query_planner(question: str, plan=False) -> QueryPlan:
}
)
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model=ANSWERING_MODEL,
temperature=0,
functions=[QueryPlan.openai_schema],
@@ -1,11 +1,12 @@
import enum
import instructor
from typing import List
from openai import OpenAI
from pydantic import BaseModel, Field
import openai
from pydantic import Field
from tenacity import retry, stop_after_attempt
from instructor import OpenAISchema
client = instructor.patch(OpenAI())
class NodeType(str, enum.Enum):
@@ -15,7 +16,7 @@ class NodeType(str, enum.Enum):
FOLDER = "folder"
class Node(OpenAISchema):
class Node(BaseModel):
"""
Class representing a single node in a filesystem. Can be either a file or a folder.
Note that a file cannot have children, but a folder can.
@@ -54,7 +55,7 @@ class Node(OpenAISchema):
print(f"{parent_path}/{self.name}", self.node_type)
class DirectoryTree(OpenAISchema):
class DirectoryTree(BaseModel):
"""
Container class representing a directory tree.
@@ -77,7 +78,6 @@ Node.model_rebuild()
DirectoryTree.model_rebuild()
@retry(stop=stop_after_attempt(3))
def parse_tree_to_filesystem(data: str) -> DirectoryTree:
"""
Convert a string representing a directory tree into a filesystem structure
@@ -90,11 +90,9 @@ def parse_tree_to_filesystem(data: str) -> DirectoryTree:
DirectoryTree: The directory tree representing the filesystem.
"""
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0.2,
functions=[DirectoryTree.openai_schema],
function_call={"name": DirectoryTree.openai_schema["name"]},
response_model=DirectoryTree,
messages=[
{
"role": "system",
+4 -2
View File
@@ -3,7 +3,9 @@ from graphviz import Digraph
from pydantic import BaseModel, Field
import instructor
import openai
from openai import OpenAI
client = OpenAI()
# Patch openai to use instructor
# allows for response_model
@@ -43,7 +45,7 @@ class DocumentExtraction(BaseModel):
def ask_ai(content) -> DocumentExtraction:
resp: DocumentExtraction = openai.ChatCompletion.create(
resp: DocumentExtraction = client.chat.completions.create(
model="gpt-4",
response_model=DocumentExtraction,
messages=[
+12 -11
View File
@@ -1,10 +1,11 @@
import enum
import instructor
from typing import Any, List
from openai import OpenAI
from pydantic import BaseModel, Field
import openai
from pydantic import Field
from instructor import OpenAISchema
client = instructor.patch(OpenAI())
class SQLTemplateType(str, enum.Enum):
@@ -12,7 +13,7 @@ class SQLTemplateType(str, enum.Enum):
IDENTIFIER = "identifier"
class Parameters(OpenAISchema):
class Parameters(BaseModel):
key: str
value: Any
type: SQLTemplateType = Field(
@@ -22,7 +23,7 @@ class Parameters(OpenAISchema):
)
class SQL(OpenAISchema):
class SQL(BaseModel):
"""
Class representing a single search query. and its query parameters
Correctly mark the query as safe or dangerous if it looks like a sql injection attempt or an abusive query
@@ -56,7 +57,7 @@ class SQL(OpenAISchema):
def create_query(data: str) -> SQL:
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-3.5-turbo-0613",
temperature=0,
functions=[SQL.openai_schema],
@@ -65,18 +66,18 @@ def create_query(data: str) -> SQL:
{
"role": "system",
"content": """You are a sql agent that produces correct SQL based on external users requests.
Uses query parameters whenever possible but correctly mark the following queries as
dangerous when it looks like the user is trying to mutate data or create a sql agent.""",
Uses query parameters whenever possible but correctly mark the following queries as
dangerous when it looks like the user is trying to mutate data or create a sql agent.""",
},
{
"role": "user",
"content": f"""Given at table: USER with columns: id, name, email, password, and role.
Please write a sql query to answer the following question: <question>{data}</question>""",
Please write a sql query to answer the following question: <question>{data}</question>""",
},
{
"role": "user",
"content": """Make sure you correctly mark sql injections and mutations as dangerous.
Make sure it uses query parameters whenever possible.""",
Make sure it uses query parameters whenever possible.""",
},
],
max_tokens=1000,
+4 -3
View File
@@ -1,9 +1,10 @@
import instructor
import openai
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional
instructor.patch()
client = instructor.patch(OpenAI())
class UserDetail(BaseModel):
@@ -16,7 +17,7 @@ MaybeUser = instructor.Maybe(UserDetail)
def get_user_detail(string) -> MaybeUser: # type: ignore
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=MaybeUser,
messages=[
+4 -3
View File
@@ -1,9 +1,10 @@
import instructor
import openai
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional
instructor.patch()
client = instructor.patch(OpenAI())
class UserDetail(BaseModel):
@@ -13,7 +14,7 @@ class UserDetail(BaseModel):
def get_user_detail(string) -> UserDetail:
return openai.ChatCompletion.create(
return client.chat.completions.create(
model="gpt-3.5-turbo-0613",
response_model=UserDetail,
messages=[
@@ -1,19 +1,24 @@
from typing import Iterable
import openai
import time
from instructor import MultiTask, OpenAISchema
from typing import Iterable
from openai import OpenAI
from pydantic import BaseModel
import instructor
class User(OpenAISchema):
client = instructor.patch(OpenAI())
class User(BaseModel):
name: str
job: str
age: int
def stream_extract(input: str, cls) -> Iterable[User]:
MultiUser = MultiTask(cls)
completion = openai.ChatCompletion.create(
MultiUser = instructor.MultiTask(cls)
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0.1,
stream=True,
@@ -13,10 +13,13 @@ Added by Jan Philipp Harries / @jpdus
import asyncio
from typing import List, Generator
import openai
from openai import OpenAI
from pydantic import Field, BaseModel
from instructor import OpenAISchema
import instructor
client = instructor.patch(OpenAI())
class TaskResult(BaseModel):
@@ -28,7 +31,7 @@ class TaskResults(BaseModel):
results: List[TaskResult]
class Task(OpenAISchema):
class Task(BaseModel):
"""
Class representing a single task in a task plan.
"""
@@ -57,7 +60,7 @@ class Task(OpenAISchema):
return TaskResult(task_id=self.id, result=f"`{self.task}`")
class TaskPlan(OpenAISchema):
class TaskPlan(BaseModel):
"""
Container class representing a tree of tasks and subtasks.
Make sure every task is in the tree, and every task is done only once.
@@ -137,8 +140,8 @@ class TaskPlan(OpenAISchema):
return task_results
Task.update_forward_refs()
TaskPlan.update_forward_refs()
Task.model_rebuild()
TaskPlan.model_rebuild()
def task_planner(question: str) -> TaskPlan:
@@ -153,11 +156,10 @@ def task_planner(question: str) -> TaskPlan:
},
]
completion = openai.ChatCompletion.create(
completion = client.chat.completions.create(
model="gpt-4-0613",
temperature=0,
functions=[TaskPlan.openai_schema],
function_call={"name": TaskPlan.openai_schema["name"]},
response_model=TaskPlan,
messages=messages,
max_tokens=1000,
)
@@ -167,52 +169,42 @@ def task_planner(question: str) -> TaskPlan:
if __name__ == "__main__":
from pprint import pprint
plan = task_planner(
"What is the difference in populations between the adjacent countries of Jan's home country and the adjacent countries of Jason's home country?"
)
pprint(plan.dict())
"""
{'task_graph': [{'id': 1,
'subtasks': [],
'task': "Identify Jan's home country"},
{'id': 2,
'subtasks': [1],
'task': "Identify the adjacent countries of Jan's home "
'country'},
{'id': 3,
'subtasks': [2],
'task': 'Calculate the total population of the adjacent '
"countries of Jan's home country"},
{'id': 4,
'subtasks': [],
'task': "Identify Jason's home country"},
{'id': 5,
'subtasks': [4],
'task': "Identify the adjacent countries of Jason's home "
'country'},
{'id': 6,
'subtasks': [5],
'task': 'Calculate the total population of the adjacent '
"countries of Jason's home country"},
{'id': 7,
'subtasks': [3, 6],
'task': 'Calculate the difference in populations between the '
"adjacent countries of Jan's home country and the "
"adjacent countries of Jason's home country"}]}
"""
# execute the plan
results = asyncio.run(plan.execute())
pprint(results, sort_dicts=False)
"""
{1: TaskResult(task_id=1, result="`Identify Jan's home country`"),
4: TaskResult(task_id=4, result="`Identify Jason's home country`"),
2: TaskResult(task_id=2, result="`Identify the adjacent countries of Jan's home country`"),
5: TaskResult(task_id=5, result="`Identify the adjacent countries of Jason's home country`"),
3: TaskResult(task_id=3, result="`Calculate the total population of the adjacent countries of Jan's home country`"),
6: TaskResult(task_id=6, result="`Calculate the total population of the adjacent countries of Jason's home country`"),
7: TaskResult(task_id=7, result="`Calculate the difference in populations between the adjacent countries of Jan's home country and the adjacent countries of Jason's home country`")}
"""
print(plan.model_dump_json(indent=2))
{
"task_graph": [
{"id": 1, "subtasks": [], "task": "Identify Jan's home country"},
{
"id": 2,
"subtasks": [1],
"task": "Identify the adjacent countries of Jan's home " "country",
},
{
"id": 3,
"subtasks": [2],
"task": "Calculate the total population of the adjacent "
"countries of Jan's home country",
},
{"id": 4, "subtasks": [], "task": "Identify Jason's home country"},
{
"id": 5,
"subtasks": [4],
"task": "Identify the adjacent countries of Jason's home " "country",
},
{
"id": 6,
"subtasks": [5],
"task": "Calculate the total population of the adjacent "
"countries of Jason's home country",
},
{
"id": 7,
"subtasks": [3, 6],
"task": "Calculate the difference in populations between the "
"adjacent countries of Jan's home country and the "
"adjacent countries of Jason's home country",
},
]
}
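The `subtasks` lists in the task graph above act as dependencies: a task can run once every id it lists has finished. A minimal sketch of executing such a graph in dependency order with asyncio follows. It is not the `TaskPlan.execute` implementation from this file; `run_task` and `execute_plan` are hypothetical names, and the graph is reduced to a mapping from task id to dependency ids.

```python
import asyncio
from typing import Dict, List

# Same shape as the task graph above: task id -> ids it depends on.
TASK_GRAPH: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [2], 4: [], 5: [4], 6: [5], 7: [3, 6],
}


async def run_task(task_id: int) -> str:
    """Stand-in for real work (hypothetical)."""
    await asyncio.sleep(0)
    return f"done:{task_id}"


async def execute_plan(graph: Dict[int, List[int]]) -> Dict[int, str]:
    results: Dict[int, str] = {}
    pending = dict(graph)
    while pending:
        # A task is ready once all of its dependencies have results.
        ready = sorted(
            t for t, deps in pending.items() if all(d in results for d in deps)
        )
        if not ready:
            raise ValueError("cycle or missing dependency in task graph")
        # Run every ready task of this wave concurrently.
        outputs = await asyncio.gather(*(run_task(t) for t in ready))
        for task_id, output in zip(ready, outputs):
            results[task_id] = output
            del pending[task_id]
    return results


plan_results = asyncio.run(execute_plan(TASK_GRAPH))
```

Tasks 1 and 4 have no dependencies and run in the first wave, then 2 and 5, then 3 and 6, and finally 7.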
+4 -2
View File
@@ -2,7 +2,9 @@ import asyncio
from typing_extensions import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator, patch
import openai
from openai import AsyncOpenAI
aclient = AsyncOpenAI()
patch()
@@ -22,7 +24,7 @@ async def main():
question = "What is the meaning of life?"
try:
qa: QuestionAnswerNoEvil = await openai.ChatCompletion.acreate(
qa: QuestionAnswerNoEvil = await aclient.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswerNoEvil,
max_retries=2,
@@ -1,10 +1,11 @@
import instructor
import openai
from openai import OpenAI
from pydantic import BaseModel, Field, model_validator
from typing import Optional
# Enables `response_model` and `max_retries` parameters
instructor.patch()
client = instructor.patch(OpenAI())
class Validation(BaseModel):
@@ -20,7 +21,7 @@ class Validation(BaseModel):
def validator(values):
chain_of_thought = values["chain_of_thought"]
answer = values["answer"]
resp = openai.ChatCompletion.create(
resp = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
+6 -4
View File
@@ -5,7 +5,9 @@ from pydantic import (
)
from instructor import llm_validator, patch
import openai
from openai import OpenAI
client = OpenAI()
patch()
@@ -18,7 +20,7 @@ class QuestionAnswer(BaseModel):
question = "What is the meaning of life?"
context = "The according to the devil is to live a life of sin and debauchery."
qa: QuestionAnswer = openai.ChatCompletion.create(
qa: QuestionAnswer = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswer,
messages=[
@@ -55,7 +57,7 @@ class QuestionAnswerNoEvil(BaseModel):
try:
qa: QuestionAnswerNoEvil = openai.ChatCompletion.create(
qa: QuestionAnswerNoEvil = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswerNoEvil,
messages=[
@@ -78,7 +80,7 @@ except Exception as e:
For further information visit https://errors.pydantic.dev/2.3/v/assertion_error
"""
qa: QuestionAnswerNoEvil = openai.ChatCompletion.create(
qa: QuestionAnswerNoEvil = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=QuestionAnswerNoEvil,
max_retries=1,
+1 -1
View File
@@ -1,7 +1,7 @@
from .function_calls import OpenAISchema, openai_function, openai_schema
from .distil import FinetuneFormat, Instructions
from .dsl import MultiTask, Maybe, llm_validator, CitationMixin
from .patch import patch, unpatch
from .patch import patch
__all__ = [
"OpenAISchema",
+13 -9
View File
@@ -3,16 +3,19 @@ from rich.table import Table
from rich.console import Console
from datetime import datetime
from openai import OpenAI
import openai
import typer
import time
client = OpenAI()
app = typer.Typer()
console = Console()
# Sample response data
def generate_file_table(files: List[openai.File]) -> Table:
def generate_file_table(files: List[openai.types.FileObject]) -> Table:
table = Table(
title="OpenAI Files",
)
@@ -34,15 +37,16 @@ def generate_file_table(files: List[openai.File]) -> Table:
return table
def get_files(limit: int = 5) -> List[openai.File]:
files = openai.File.list(limit=limit)["data"] # type: ignore
files = sorted(files, key=lambda x: x["created_at"], reverse=True)
def get_files(limit: int = 5) -> List[openai.types.FileObject]:
files = client.files.list(limit=limit)
files = files.data
files = sorted(files, key=lambda x: x.created_at, reverse=True)
return files[:limit]
def get_file_status(file_id: str) -> str:
response = openai.File.retrieve(file_id)
return response["status"]
response = client.files.retrieve(file_id)
return response.status
@app.command(
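The `get_files` change above swaps dictionary lookups (`x["created_at"]`) for attribute access (`x.created_at`), since the new client returns typed objects. A small sketch of the same sort-and-limit step follows, using `types.SimpleNamespace` as a stand-in for the client's file objects; the sample ids and timestamps are invented for illustration.

```python
from types import SimpleNamespace
from typing import List


def newest_first(files: List[SimpleNamespace], limit: int = 5) -> List[SimpleNamespace]:
    """Sort by the `created_at` attribute, newest first, and keep `limit` items."""
    return sorted(files, key=lambda f: f.created_at, reverse=True)[:limit]


# Stand-ins for file objects; only the attributes used above are modelled.
sample_files = [
    SimpleNamespace(id="file-a", created_at=100, status="processed"),
    SimpleNamespace(id="file-b", created_at=300, status="processed"),
    SimpleNamespace(id="file-c", created_at=200, status="uploaded"),
]
recent = newest_first(sample_files, limit=2)
```

Code that still indexes these objects like dictionaries would need the same attribute-style update.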
@@ -54,7 +58,7 @@ def upload(
poll: int = typer.Option(5, help="Polling interval in seconds"),
):
with open(filepath, "rb") as file:
response = openai.File.create(file=file, purpose=purpose)
response = client.files.create(file=file, purpose=purpose)
file_id = response.id
with console.status(f"Monitoring upload: {file_id}...") as status:
status.spinner_style = "dots"
@@ -74,7 +78,7 @@ def download(
output: str = typer.Argument(..., help="Output path for the downloaded file"),
):
with console.status(f"[bold green]Downloading file {file_id}...", spinner="dots"):
content = openai.File.download(file_id)
content = client.files.download(file_id)
with open(output, "wb") as file:
file.write(content)
console.log(f"[bold green]File {file_id} downloaded successfully!")
@@ -86,7 +90,7 @@ def download(
def delete(file_id: str = typer.Argument(..., help="ID of the file to delete")):
with console.status(f"[bold red]Deleting file {file_id}...", spinner="dots"):
try:
openai.File.delete(file_id)
client.files.delete(file_id)
console.log(f"[bold red]File {file_id} deleted successfully!")
except Exception as e:
console.log(f"[bold red]Error deleting file {file_id}: {e}")
+54 -38
View File
@@ -1,13 +1,14 @@
from typing import List
import openai
from openai import OpenAI
import typer
import time
import json
from rich.live import Live
from rich.table import Table
from rich.console import Console
from datetime import datetime
client = OpenAI()
app = typer.Typer()
console = Console()
@@ -64,12 +65,12 @@ def status_color(status: str) -> str:
)
def get_jobs(limit: int = 5) -> List[openai.FineTuningJob]:
return openai.FineTuningJob.list(limit=limit)["data"]
def get_jobs(limit: int = 5) -> List:
return client.fine_tuning.jobs.list(limit=limit).data
def get_file_status(file_id: str) -> str:
response = openai.File.retrieve(file_id)
response = client.files.retrieve(file_id)
return response.status
@@ -99,28 +100,35 @@ def watch(
def create_from_id(
id: str = typer.Argument(..., help="ID of the existing fine-tuning job"),
model: str = typer.Option("gpt-3.5-turbo", help="Model to use for fine-tuning"),
n_epochs: int = typer.Option(None, help="Number of epochs for fine-tuning", show_default=False),
batch_size: str = typer.Option(None, help="Batch size for fine-tuning", show_default=False),
learning_rate_multiplier: str = typer.Option(None, help="Learning rate multiplier for fine-tuning", show_default=False),
validation_file_id: str = typer.Option(None, help="ID of the uploaded validation file"),
):
n_epochs: int = typer.Option(
None, help="Number of epochs for fine-tuning", show_default=False
),
batch_size: str = typer.Option(
None, help="Batch size for fine-tuning", show_default=False
),
learning_rate_multiplier: str = typer.Option(
None, help="Learning rate multiplier for fine-tuning", show_default=False
),
validation_file_id: str = typer.Option(
None, help="ID of the uploaded validation file"
),
):
hyperparameters_dict = {}
if n_epochs is not None:
hyperparameters_dict['n_epochs'] = n_epochs
hyperparameters_dict["n_epochs"] = n_epochs
if batch_size is not None:
hyperparameters_dict['batch_size'] = batch_size
hyperparameters_dict["batch_size"] = batch_size
if learning_rate_multiplier is not None:
hyperparameters_dict['learning_rate_multiplier'] = learning_rate_multiplier
hyperparameters_dict["learning_rate_multiplier"] = learning_rate_multiplier
with console.status(
f"[bold green]Creating fine-tuning job from ID {id}...", spinner="dots"
):
job = openai.FineTuningJob.create(
training_file=id,
model=model,
hyperparameters = hyperparameters_dict if hyperparameters_dict else None,
validation_file=validation_file_id if validation_file_id else None
job = client.fine_tuning.jobs.create(
training_file=id,
model=model,
hyperparameters=hyperparameters_dict if hyperparameters_dict else None,
validation_file=validation_file_id if validation_file_id else None,
)
console.log(f"[bold green]Fine-tuning job created with ID: {job.id}") # type: ignore
watch(limit=5, poll=2, screen=False)
@@ -133,30 +141,34 @@ def create_from_file(
file: str = typer.Argument(..., help="Path to the file for fine-tuning"),
model: str = typer.Option("gpt-3.5-turbo", help="Model to use for fine-tuning"),
poll: int = typer.Option(2, help="Polling interval in seconds"),
n_epochs: int = typer.Option(None, help="Number of epochs for fine-tuning", show_default=False),
batch_size: str = typer.Option(None, help="Batch size for fine-tuning", show_default=False),
learning_rate_multiplier: str = typer.Option(None, help="Learning rate multiplier for fine-tuning", show_default=False),
n_epochs: int = typer.Option(
None, help="Number of epochs for fine-tuning", show_default=False
),
batch_size: str = typer.Option(
None, help="Batch size for fine-tuning", show_default=False
),
learning_rate_multiplier: str = typer.Option(
None, help="Learning rate multiplier for fine-tuning", show_default=False
),
validation_file: str = typer.Option(None, help="Path to the validation file"),
):
):
hyperparameters_dict = {}
if n_epochs is not None:
hyperparameters_dict['n_epochs'] = n_epochs
hyperparameters_dict["n_epochs"] = n_epochs
if batch_size is not None:
hyperparameters_dict['batch_size'] = batch_size
hyperparameters_dict["batch_size"] = batch_size
if learning_rate_multiplier is not None:
hyperparameters_dict['learning_rate_multiplier'] = learning_rate_multiplier
hyperparameters_dict["learning_rate_multiplier"] = learning_rate_multiplier
with open(file, "rb") as file:
response = openai.File.create(file=file, purpose="fine-tune")
response = client.files.create(file=file, purpose="fine-tune")
file_id = response["id"]
validation_file_id = None
if validation_file:
with open(validation_file, "rb") as val_file:
val_response = openai.File.create(file=val_file, purpose="fine-tune")
val_response = client.files.create(file=val_file, purpose="fine-tune")
validation_file_id = val_response["id"]
with console.status(f"Monitoring upload: {file_id} before finetuning...") as status:
@@ -166,19 +178,23 @@ def create_from_file(
if validation_file_id:
validation_file_status = get_file_status(validation_file_id)
if file_status == "processed" and (not validation_file_id or validation_file_status == "processed"):
if file_status == "processed" and (
not validation_file_id or validation_file_status == "processed"
):
console.log(f"[bold green]File {file_id} uploaded successfully!")
if validation_file_id:
console.log(f"[bold green]Validation file {validation_file_id} uploaded successfully!")
console.log(
f"[bold green]Validation file {validation_file_id} uploaded successfully!"
)
break
time.sleep(poll)
job = openai.FineTuningJob.create(
training_file = file_id,
model = model,
hyperparameters = hyperparameters_dict if hyperparameters_dict else None,
validation_file = validation_file_id if validation_file else None
job = client.fine_tuning.jobs.create(
training_file=file_id,
model=model,
hyperparameters=hyperparameters_dict if hyperparameters_dict else None,
validation_file=validation_file_id if validation_file else None,
)
if validation_file_id:
console.log(
@@ -197,7 +213,7 @@ def create_from_file(
def cancel(id: str = typer.Argument(..., help="ID of the fine-tuning job to cancel")):
with console.status(f"[bold red]Cancelling job {id}...", spinner="dots"):
try:
openai.FineTuningJob.cancel(id)
client.fine_tuning.jobs.cancel(id)
console.log(f"[bold red]Job {id} cancelled successfully!")
except Exception as e:
console.log(f"[bold red]Error cancelling job {id}: {e}")
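Both `create_from_id` and `create_from_file` in the diff above repeat the same handling of optional hyperparameters: only options the user actually set are forwarded, and an empty selection becomes `None`. A minimal sketch of that logic as a standalone helper follows; `build_hyperparameters` is a hypothetical name used for illustration and is not part of this commit.

```python
from typing import Any, Dict, Optional


def build_hyperparameters(
    n_epochs: Optional[int] = None,
    batch_size: Optional[str] = None,
    learning_rate_multiplier: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Collect only the options that were set (hypothetical helper)."""
    candidates = {
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "learning_rate_multiplier": learning_rate_multiplier,
    }
    # Drop unset options so the API can fall back to its own defaults.
    chosen = {key: value for key, value in candidates.items() if value is not None}
    # Mirror the diff's `hyperparameters_dict if hyperparameters_dict else None`.
    return chosen or None
```

Pulling the logic into one function would let both commands share it, at the cost of one more name to maintain.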
+7 -6
@@ -1,15 +1,14 @@
import enum
import functools
import inspect
import json
import uuid
import logging
import inspect
import functools
from typing import Any, Callable, List, Optional
from pydantic import BaseModel, validate_call
import uuid
import openai
from openai import OpenAI
from instructor.function_calls import openai_schema
@@ -86,6 +85,7 @@ class Instructions:
finetune_format: FinetuneFormat = FinetuneFormat.MESSAGES,
indent: int = 2,
include_code_body: bool = False,
openai_client: OpenAI = None,
):
"""
Instructions for distillation and dispatch.
@@ -103,6 +103,7 @@ class Instructions:
self.finetune_format = finetune_format
self.indent = indent
self.include_code_body = include_code_body
self.client = openai_client or OpenAI()
self.logger = logging.getLogger(self.name)
for handler in log_handlers or []:
@@ -155,7 +156,7 @@ class Instructions:
kwargs=kwargs,
base_model=return_base_model,
)
return openai.ChatCompletion.create(
return self.client.chat.completions.create(
**openai_kwargs, model=model, response_model=return_base_model
)
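The `openai_client or OpenAI()` fallback added in this hunk is a small dependency-injection pattern: callers may pass their own client (for example a pre-configured or test client), and a default one is built otherwise. The sketch below shows the same idea offline; `FakeClient` and `Component` are hypothetical stand-ins and do not exist in the library.

```python
from typing import Any, Optional


class FakeClient:
    """Offline stand-in for `openai.OpenAI` (hypothetical, for illustration)."""

    def __init__(self, label: str = "default") -> None:
        self.label = label


class Component:
    """Holds a client, building a default one only when none is supplied."""

    def __init__(self, client: Optional[Any] = None) -> None:
        # Use the caller's client when given; otherwise construct a default.
        self.client = client or FakeClient()


default_component = Component()
injected_client = FakeClient(label="test")
injected_component = Component(client=injected_client)
```

Note that `or` also replaces any falsy client object; an explicit `if client is None` check would be the stricter variant.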
-4
@@ -1,15 +1,11 @@
from .completion import ChatCompletion
from .messages import * # noqa: F403
from .multitask import MultiTask
from .maybe import Maybe
from .validators import llm_validator
from .citation import CitationMixin
__all__ = [ # noqa: F405
"ChatCompletion",
"CitationMixin",
"MultiTask",
"messages",
"Maybe",
"llm_validator",
]
-166
@@ -1,166 +0,0 @@
import openai
from typing import List, Union
from pydantic import BaseModel, Field
from instructor import OpenAISchema
from .messages import ChainOfThought, Message, MessageRole, SystemMessage
class ChatCompletion(BaseModel):
"""
A chat completion is a collection of messages and configuration options that can be used to
generate a chat response from the OpenAI API.
Usage:
To generate a chat response from the OpenAI API, create a chat completion and pipe it to a message and an `OpenAISchema`. When `create` or `acreate` is called, the response from the API is returned as an instance of that `OpenAISchema`.
Example:
```python
class Sum(OpenAISchema):
a: int
b: int
completion = (
ChatCompletion(name="example")
| TaggedMessage(content="What is 1 + 1?", tag="question")
| Sum
)
print(completion.create())
# Sum(a=1, b=1)
```
Tips:
* You can use the `|` operator to chain multiple messages and functions together
* There should be exactly one function call class (OpenAISchema) per chat completion
* System messages will be concatenated together
* Only one chain of thought message can be used per completion
Attributes:
name (str): The name of the chat completion
model (str): The model to use for the chat completion (default: "gpt-3.5-turbo-0613")
max_tokens (int): The maximum number of tokens to generate (default: 1000)
temperature (float): The temperature to use for the chat completion (default: 0.1)
stream (bool): Whether to stream the response from the API (default: False)
Warning:
Currently we do not support streaming the response from the API, so the stream parameter is not supported yet.
"""
name: str
model: str = Field(default="gpt-3.5-turbo-0613")
max_tokens: int = Field(default=1000)
temperature: float = Field(default=0.1)
stream: bool = Field(default=False)
messages: List[Message] = Field(default_factory=list, repr=False)
system_message: Message = Field(default=None, repr=False)
cot_message: ChainOfThought = Field(default=None, repr=False)
function: OpenAISchema = Field(default=None, repr=False)
def __post_init__(self):
assert not self.stream, "Stream is not supported yet"
def __or__(self, other: Union[Message, OpenAISchema]) -> "ChatCompletion":
"""
Add a message or function to the chat completion, this can be used to chain multiple messages and functions together. It should contain some set of user or system messages along with a function call class (OpenAISchema)
"""
if isinstance(other, Message):
if other.role == MessageRole.SYSTEM:
if not self.system_message:
self.system_message = other # type: ignore
else:
self.system_message.content += "\n\n" + other.content
else:
if isinstance(other, ChainOfThought):
if self.cot_message:
raise ValueError(
"Only one chain of thought message can be used per completion"
)
self.cot_message = other
self.messages.append(other)
else:
if self.function:
raise ValueError(
"Only one function can be used per completion, wrap your tools into a single toolkit schema"
)
self.function = other
assert self.model not in {
"gpt-3.5-turbo",
"gpt-4",
}, "Only *-0613 models can currently use functions"
return self
@property
def kwargs(self) -> dict:
"""
Construct the kwargs for the OpenAI API call
Example:
```python
result = openai.ChatCompletion.create(**self.kwargs)
```
"""
kwargs = {}
messages = []
if self.system_message:
messages.append(self.system_message.dict())
if self.messages:
special_types = {
SystemMessage,
ChainOfThought,
}
messages += [
message.dict()
for message in self.messages
if type(message) not in special_types
]
if self.cot_message:
messages.append(self.cot_message.dict())
kwargs["messages"] = messages
if self.function:
kwargs["functions"] = [self.function.openai_schema]
kwargs["function_call"] = {"name": self.function.openai_schema["name"]}
kwargs["max_tokens"] = self.max_tokens
kwargs["temperature"] = self.temperature
kwargs["model"] = self.model
return kwargs
def create(self):
"""
Create a chat response from the OpenAI API
Returns:
response (OpenAISchema): The response from the OpenAI API
"""
kwargs = self.kwargs
completion = openai.ChatCompletion.create(**kwargs)
if self.function:
return self.function.from_response(completion)
return completion
async def acreate(self):
"""
Create a chat response from the OpenAI API asynchronously
Returns:
response (OpenAISchema): The response from the OpenAI API
"""
kwargs = self.kwargs
completion = openai.ChatCompletion.acreate(**kwargs)
if self.function:
return self.function.from_response(await completion)
return await completion
-26
@@ -1,26 +0,0 @@
from .base import Message, MessageRole
from .messages import (
SystemMessage,
SystemGuidelines,
SystemIdentity,
SystemStyle,
SystemTask,
SystemTips,
ChainOfThought,
)
from .user import TaggedMessage, TipsMessage, UserMessage
__all__ = [
"Message",
"MessageRole",
"ChainOfThought",
"UserMessage",
"TaggedMessage",
"TipsMessage",
"SystemMessage",
"SystemGuidelines",
"SystemIdentity",
"SystemStyle",
"SystemTask",
"SystemTips",
]
-58
@@ -1,58 +0,0 @@
from enum import Enum, auto
from typing import Optional
from pydantic import Field
from pydantic.dataclasses import dataclass
class MessageRole(Enum):
"""
An enum that represents the role of a message.
Attributes:
USER: A message from the user.
SYSTEM: A message from the system.
ASSISTANT: A message from the assistant.
"""
USER = auto()
SYSTEM = auto()
ASSISTANT = auto()
@dataclass
class Message:
"""
A message class that helps build messages for the chat interface.
Attributes:
content (str): The content of the message.
role (MessageRole): The role of the message.
name (Optional[str]): The name of the user, only used if the role is USER.
Tips:
If you want custom messages, simply write a function that returns a `Message` and use it as part of your pipes. For example, to add additional context:
```python
def GetUserData(user_id) -> Message:
data = ...
return Message(
content=f"This is some more user data: {data} for {user_id}",
role=MessageRole.USER,
)
```
"""
content: str = Field(default=None, repr=True)
role: MessageRole = Field(default=MessageRole.USER, repr=False)
name: Optional[str] = Field(default=None)
def dict(self):
assert self.content is not None, "Content must be set!"
obj = {
"role": self.role.name.lower(),
"content": self.content,
}
if self.name and self.role == MessageRole.USER:
obj["name"] = self.name
return obj
-108
@@ -1,108 +0,0 @@
from typing import List
from .base import Message, MessageRole
from pydantic.dataclasses import dataclass
def SystemIdentity(identity: str) -> Message:
"""
Create a system message that tells the user what their identity is.
Parameters:
identity (str): The identity of the user.
Returns:
message (Message): A system message that tells the user what their identity is.
"""
return Message(content=f"You are a {identity.lower()}.", role=MessageRole.SYSTEM)
def SystemTask(task: str) -> Message:
"""
Create a system message that tells the user what task they are doing, uses language to
push the system to behave as a world class algorithm.
Parameters:
task (str): The task the user is doing.
Returns:
message (Message): A system message that tells the user what task they are doing.
"""
return Message(
content=f"You are a world class state of the art algorithm capable of correctly completing the following task: `{task}`.",
role=MessageRole.SYSTEM,
)
def SystemStyle(style: str) -> Message:
"""
Create a system message that tells the user what style they are responding in.
Parameters:
style (str): The style the user is responding in.
Returns:
message (Message): A system message that tells the user what style they are responding in.
"""
return Message(
content=f"You must respond with in following style: {style.lower()}.",
role=MessageRole.SYSTEM,
)
def SystemMessage(content: str) -> Message:
"""
Create a system message.
Parameters:
content (str): The content of the message.
Returns:
message (Message): A system message."""
return Message(content=content, role=MessageRole.SYSTEM)
def SystemGuidelines(guidelines: List[str]) -> Message:
"""
Create a system message that tells the user what guidelines they must follow when responding.
Parameters:
guidelines (List[str]): The guidelines the user must follow when responding.
Returns:
message (Message): A system message that tells the user what guidelines they must follow when responding.
"""
guideline_str = "\n* ".join(guidelines)
return Message(
content=f"Here are the guidelines you must to follow when responding:\n\n* {guideline_str}",
role=MessageRole.SYSTEM,
)
def SystemTips(tips: List[str]) -> Message:
"""
Create a system message that gives the user some tips before responding.
Parameters:
tips (List[str]): The tips the user should follow when responding.
Returns:
message (Message): A system message that gives the user some tips before responding.
"""
tips_str = "\n* ".join(tips)
return Message(
content=f"Here are some tips before responding:\n\n* {tips_str}",
role=MessageRole.SYSTEM,
)
@dataclass
class ChainOfThought(Message):
"""
Special message type to correctly leverage chain of thought reasoning
for the task. This is automatically set as the last message.
"""
def __post_init__(self):
self.content = "Lets think step by step to get the correct answer:"
self.role = MessageRole.ASSISTANT
-54
@@ -1,54 +0,0 @@
from typing import List
from .base import Message, MessageRole
def TipsMessage(
tips: List[str], header: str = "Here are some tips to help you complete the task"
) -> Message:
"""
Create a user message that gives the user tips to help them complete the task.
Parameters:
tips (List[str]): A list of tips to help the user complete the task.
header (str): The header of the message.
Returns:
message (Message): A user message that gives the user tips to help them complete the task.
"""
tips_str = "\n* ".join(tips)
return Message(
content=f"{header}:\n\n* {tips_str}",
role=MessageRole.USER,
)
def UserMessage(content: str) -> Message:
"""
Create a user message.
Parameters:
content (str): The content of the message.
Returns:
message (Message): A user message.
"""
return Message(content=content, role=MessageRole.USER)
def TaggedMessage(
content: str, tag: str = "data", header: str = "Consider the following data:"
) -> Message:
"""
Create a user message.
Parameters:
content (str): The content of the message.
tag (str): The tag to use, will show up as <tag>content</tag>.
header (str): The header to reference the data
Returns:
message (Message): A user message with the data tagged.
"""
content = f"{header}\n\n<{tag}>{content}</{tag}>"
return Message(content=content, role=MessageRole.USER)
+8 -3
@@ -1,7 +1,8 @@
import openai
from pydantic import Field
from typing import Optional
from openai import OpenAI
import instructor
import openai
class Validator(instructor.OpenAISchema):
@@ -29,6 +30,7 @@ def llm_validator(
allow_override: bool = False,
model: str = "gpt-3.5-turbo",
temperature: float = 0,
openai_client: OpenAI = None,
):
"""
Create a validator that uses the LLM to validate an attribute
@@ -40,7 +42,7 @@ def llm_validator(
from pydantic import BaseModel, Field, field_validator
class User(BaseModel):
name: str = Annotated[str, llm_validator("The name must be a full name all lowercase")]
name: str = Annotated[str, llm_validator("The name must be a full name all lowercase")
age: int = Field(description="The age of the person")
try:
@@ -61,10 +63,13 @@ def llm_validator(
statement (str): The statement to validate
model (str): The LLM to use for validation (default: "gpt-3.5-turbo")
temperature (float): The temperature to use for the LLM (default: 0)
openai_client (OpenAI): The OpenAI client to use (default: None)
"""
openai_client = openai_client or OpenAI()
def llm(v):
resp = openai.ChatCompletion.create(
resp = openai_client.chat.completions.create(
functions=[Validator.openai_schema],
function_call={"name": Validator.openai_schema["name"]},
messages=[
+2 -30
@@ -1,25 +1,3 @@
# MIT License
#
# Copyright (c) 2023 Jason Liu
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import json
from docstring_parser import parse
from functools import wraps
@@ -213,16 +191,10 @@ class OpenAISchema(BaseModel):
Returns:
cls (OpenAISchema): An instance of the class
"""
message = completion["choices"][0]["message"]
if throw_error:
assert "function_call" in message, "No function call detected"
assert (
message["function_call"]["name"] == cls.openai_schema["name"]
), "Function name does not match"
message = completion.choices[0].message
return cls.model_validate_json(
message["function_call"]["arguments"],
message.function_call.arguments,
context=validation_context,
strict=strict,
)
+35 -60
@@ -1,11 +1,9 @@
import inspect
from functools import wraps
from json import JSONDecodeError
from pydantic import ValidationError
import openai
import inspect
from pydantic import ValidationError, BaseModel
from typing import Callable, Type, Optional
from pydantic import BaseModel
from .function_calls import OpenAISchema, openai_schema
OVERRIDE_DOCS = """
@@ -22,9 +20,9 @@ If `stream=True` is specified, the response will be parsed using the `from_strea
If you need to obtain the raw response from OpenAI's API, you can access it using the `_raw_response` attribute of the response model.
Parameters:
response_model (Union[Type[BaseModel], Type[OpenAISchema]]): The response model to use for parsing the response from OpenAI's API, if available (default: None)
max_retries (int): The maximum number of retries to attempt if the response is not valid (default: 0)
validation_context (dict): The validation context to use for validating the response (default: None)
"""
@@ -124,7 +122,7 @@ def retry_sync(
None,
)
except (ValidationError, JSONDecodeError) as e:
kwargs["messages"].append(dict(**response.choices[0].message)) # type: ignore
kwargs["messages"].append(response.choices[0].message) # type: ignore
kwargs["messages"].append(
{
"role": "user",
@@ -141,7 +139,11 @@ def wrap_chatcompletion(func: Callable) -> Callable:
@wraps(func)
async def new_chatcompletion_async(
response_model=None, validation_context=None, *args, max_retries=1, **kwargs
response_model=None,
validation_context=None,
max_retries=1,
*args,
**kwargs,
):
response_model, new_kwargs = handle_response_model(response_model, kwargs) # type: ignore
response, error = await retry_async(
@@ -158,7 +160,11 @@ def wrap_chatcompletion(func: Callable) -> Callable:
@wraps(func)
def new_chatcompletion_sync(
response_model=None, validation_context=None, *args, max_retries=1, **kwargs
response_model=None,
validation_context=None,
max_retries=1,
*args,
**kwargs,
):
response_model, new_kwargs = handle_response_model(response_model, kwargs) # type: ignore
response, error = retry_sync(
@@ -178,13 +184,9 @@ def wrap_chatcompletion(func: Callable) -> Callable:
return wrapper_function
original_chatcompletion = openai.ChatCompletion.create
original_chatcompletion_async = openai.ChatCompletion.acreate
def patch():
def patch(client):
"""
Patch the `openai.ChatCompletion.create` and `openai.ChatCompletion.acreate` methods
Patch the `client.chat.completions.create` method
Enables the following features:
@@ -192,51 +194,24 @@ def patch():
- `max_retries` parameter to retry the function if the response is not valid
- `validation_context` parameter to validate the response using the pydantic model
- `strict` parameter to use strict json parsing
## Usage
```python
from pydantic import BaseModel, Field
import instructor
instructor.patch()
class User(BaseModel):
name: str = Field(description="The name of the person")
age: int = Field(description="The age of the person")
role: str = Field(description="The role of the person")
user = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{
"role": "user",
"content": "Jason is 20 years old",
},
],
response_model=User,
validation_context={...},
strict=True,
)
print(user.model_dump())
```
## Result
```
{
"name": "Jason Liu",
"age": 20,
"role": "student",
}
```
"""
openai.ChatCompletion.create = wrap_chatcompletion(original_chatcompletion)
openai.ChatCompletion.acreate = wrap_chatcompletion(original_chatcompletion_async)
client.chat.completions.create = wrap_chatcompletion(client.chat.completions.create)
return client
def unpatch():
openai.ChatCompletion.create = original_chatcompletion
openai.ChatCompletion.acreate = original_chatcompletion_async
def apatch(client):
"""
Patch the `client.chat.completions.acreate` method
Enables the following features:
- `response_model` parameter to parse the response from OpenAI's API
- `max_retries` parameter to retry the function if the response is not valid
- `validation_context` parameter to validate the response using the pydantic model
- `strict` parameter to use strict json parsing
"""
client.chat.completions.acreate = wrap_chatcompletion(
client.chat.completions.acreate
)
return client
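The move from patching module-level functions to `patch(client)` means the wrapper is now attached to one client instance and the same client is returned. The sketch below illustrates that shape offline. It is not the library's implementation: `wrap_create`, `fake_create`, the `User` dataclass, and the `SimpleNamespace` client are all assumptions made so the example runs without network access, and retries and validation are left out.

```python
import json
from dataclasses import dataclass
from functools import wraps
from types import SimpleNamespace


def wrap_create(func):
    """Wrap a `create`-style callable so it accepts `response_model` (sketch)."""

    @wraps(func)
    def wrapper(*args, response_model=None, **kwargs):
        response = func(*args, **kwargs)
        if response_model is None:
            # No model requested: hand back the raw response untouched.
            return response
        # Attribute access, matching the v1-style response objects in the diff.
        arguments = response.choices[0].message.function_call.arguments
        return response_model(**json.loads(arguments))

    return wrapper


def patch(client):
    """Replace `client.chat.completions.create` in place and return the client."""
    client.chat.completions.create = wrap_create(client.chat.completions.create)
    return client


@dataclass
class User:
    name: str
    age: int


def fake_create(**kwargs):
    """Offline stand-in for the real API call (hypothetical)."""
    call = SimpleNamespace(arguments='{"name": "Jason", "age": 20}')
    message = SimpleNamespace(function_call=call)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)
client = patch(fake_client)
user = client.chat.completions.create(messages=[], response_model=User)
```

Because only the given instance is modified, two clients in the same process can be patched (or left unpatched) independently, which the earlier global `patch()` / `unpatch()` pair could not offer.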
+100 -50
@@ -1,78 +1,128 @@
site_name: Instructor (openai_function_call)
site_author: Jason Liu
site_description: Enhancing OpenAI function calling with Pydantic
repo_name: instructor
repo_url: https://github.com/jxnl/instructor/
site_url: https://jxnl.github.io/instructor/
edit_uri: edit/main/docs/
copyright: Copyright &copy; 2023 Jason Liu
theme:
name: material
icon:
repo: fontawesome/brands/github
edit: material/pencil
view: material/eye
features:
- announce.dismiss
- content.action.edit
- content.action.view
- content.code.annotate
- content.code.copy
- content.code.select
- content.tabs.link
- content.tooltips
- header.autohide
- navigation.expand
- navigation.footer
- navigation.indexes
- navigation.instant
- navigation.instant.prefetch
- navigation.instant.progress
- navigation.prune
- navigation.sections
- navigation.tabs
- navigation.tabs.sticky
- navigation.expand
- navigation.path
- content.tooltips
- content.code.annotate
- content.code.select
- content.code.copy
- navigation.footer
- search.suggest
# - navigation.tabs.sticky
- navigation.top
- navigation.tracking
- search.highlight
- search.share
- search.suggest
- toc.follow
# - toc.integrate
palette:
# Palette toggle for light mode
- media: "(prefers-color-scheme: light)"
scheme: default
toggle:
icon: material/brightness-7
name: Switch to dark mode
# Palette toggle for dark mode
- media: "(prefers-color-scheme: dark)"
scheme: slate
toggle:
icon: material/brightness-4
name: Switch to light mode
- scheme: default
primary: indigo
accent: indigo
toggle:
icon: material/brightness-7
name: Switch to dark mode
- scheme: slate
primary: black
accent: indigo
toggle:
icon: material/brightness-4
name: Switch to light mode
font:
text: Roboto
code: Roboto Mono
# Extensions
markdown_extensions:
- pymdownx.critic
- abbr
- admonition
- attr_list
- def_list
- footnotes
- md_in_html
- toc:
permalink: true
- pymdownx.arithmatex:
generic: true
- pymdownx.betterem:
smart_enable: all
- pymdownx.caret
- pymdownx.keys
- pymdownx.mark
- pymdownx.tilde
- pymdownx.details
- pymdownx.emoji:
emoji_generator: !!python/name:material.extensions.emoji.to_svg
emoji_index: !!python/name:material.extensions.emoji.twemoji
- pymdownx.highlight:
anchor_linenums: true
line_spans: __span
pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.superfences
- attr_list
- md_in_html
- admonition
- pymdownx.keys
- pymdownx.magiclink:
normalize_issue_symbols: true
repo_url_shorthand: true
user: jxnl
repo: instructor
- pymdownx.mark
- pymdownx.smartsymbols
- pymdownx.snippets:
auto_append:
- includes/mkdocs.md
- pymdownx.superfences:
custom_fences:
- name: mermaid
class: mermaid
format: !!python/name:pymdownx.superfences.fence_code_format
- pymdownx.tabbed:
alternate_style: true
combine_header_slug: true
slugify: !!python/object/apply:pymdownx.slugs.slugify
kwds:
case: lower
- pymdownx.tasklist:
custom_checkbox: true
- pymdownx.tilde
nav:
- Introduction:
- Getting Started: 'index.md'
- Prompt Engineering Tips: 'tips/index.md'
- Distillation: 'distillation.md'
- Helpers:
- Reasking and Validation Overview: "reask_validation.md"
- Multiple Extractions: "multitask.md"
- Handling Missing Content: "maybe.md"
- Using Validations: "reask_validation.md"
- Streaming Lists: "multitask.md"
- Handling Missing Content: "maybe.md"
- Philosophy: 'philosophy.md'
- Cookbook:
- Overview: 'examples/index.md'
- Text Classification Techniques: 'examples/classification.md'
- AI Self-Assessment: 'examples/self_critique.md'
- Citation Retrieval via Regex: 'examples/exact_citations.md'
- Knowledge Graph Generation: 'examples/knowledge_graph.md'
- Text Classification: 'examples/classification.md'
- Self Critique: 'examples/self_critique.md'
- Citations: 'examples/exact_citations.md'
- Knowledge Graph: 'examples/knowledge_graph.md'
- Entity Resolution: 'examples/entity_resolution.md'
- Search Query Segmentation: 'examples/search.md'
- Query Decomposition in One Go: 'examples/planning-tasks.md'
- Working with Recursive Schemas: 'examples/recursive.md'
- Table Extraction from Text: 'examples/autodataframe.md'
- Search Queries: 'examples/search.md'
- Query Decomposition: 'examples/planning-tasks.md'
- Recursive Schemas: 'examples/recursive.md'
- Table Extraction: 'examples/autodataframe.md'
- Action Item and Dependency Mapping: 'examples/action_items.md'
- Multi-File Code Generation: 'examples/gpt-engineer.md'
- PII Data Sanitization: 'examples/pii.md'
@@ -80,19 +130,19 @@ nav:
- Distillation: "distillation.md"
- CLI Reference:
- "Introduction": "cli/index.md"
- "Finetuning GPT": "cli/finetune.md"
- "Finetuning GPT-3.5": "cli/finetune.md"
- "Usage Tracking": "cli/usage.md"
- API Reference:
- 'Core Library': 'api.md'
- "Prompting DSL: Intro": "writing-prompts.md"
- "Prompting DSL Reference": "chat-completion.md"
- Blog:
- "blog/index.md"
plugins:
- social
- search
- search:
separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])'
- minify:
minify_html: true
- mkdocstrings:
handlers:
python:
Generated
+480 -769
File diff suppressed because it is too large
+3 -3
@@ -1,7 +1,7 @@
[tool.poetry]
name = "instructor"
version = "0.2.11"
description = "Pythonic OpenAI function calling, for humans"
version = "0.3.1"
description = "Helper functions that allow us to improve openai's function_call ergonomics"
authors = ["Jason Liu <jason@jxnl.co>"]
license = "MIT"
readme = "README.md"
@@ -10,7 +10,7 @@ repository = "https://github.com/jxnl/instructor"
[tool.poetry.dependencies]
python = "^3.9"
openai = "^0.28.0"
openai = "^1.1.0"
pydantic = "^2.0.2"
docstring-parser = "^0.15"
typer = "^0.9.0"
+2 -2
@@ -1,4 +1,4 @@
openai
openai>=1.1.0
pydantic
pytest
docstring-parser
docstring-parser
+7 -6
@@ -1,6 +1,9 @@
import pytest
import openai
import instructor
from openai import OpenAI
from pydantic import BaseModel
from instructor.distil import (
Instructions,
format_function,
@@ -8,7 +11,7 @@ from instructor.distil import (
is_return_type_base_model_or_instance,
)
# Replace `your_module_name` with your actual module name
client = instructor.patch(OpenAI())
instructions = Instructions(
name="test_distil",
@@ -92,8 +95,6 @@ def mock_track(*args, **kwargs):
def fn(a: int, b: int) -> int:
return openai.ChatCompletion.create(
messages=[],
model="davinci",
response_model=SimpleModel,
return client.chat.completions.create(
messages=[], model="davinci", response_model=SimpleModel
)
-30
@@ -1,30 +0,0 @@
from instructor import OpenAISchema, MultiTask
from instructor.dsl import ChatCompletion
from instructor.dsl import messages as m
from instructor.dsl.messages import messages as s
def test_chatcompletion_has_kwargs():
class Search(OpenAISchema):
id: int
query: str
task = (
ChatCompletion(name="Acme Inc Email Segmentation", model="gpt3.5-turbo-0613")
| s.SystemTask(task="Segment emails into search queries")
| MultiTask(subtask_class=Search)
| m.TaggedMessage(
tag="email",
content="Can you find the video I sent last week and also the post about dogs",
)
| m.TipsMessage(
tips=[
"When unsure about the correct segmentation, try to think about the task as a whole",
"If acronyms are used expand them to their full form",
"Use multiple phrases to describe the same thing",
]
)
| m.ChainOfThought()
)
assert isinstance(task, ChatCompletion)
assert isinstance(task.kwargs, dict)
-49
@@ -1,49 +0,0 @@
from instructor.dsl import messages as m
from instructor.dsl.messages import messages as s
def test_create_message():
assert m.Message(
role=m.MessageRole.SYSTEM,
content="Hello, world!",
).dict() == {
"role": "system",
"content": "Hello, world!",
}
def test_create_user_message():
assert m.UserMessage(
content="Hello, world!",
).dict() == {
"role": "user",
"content": "Hello, world!",
}
def test_create_system_message():
assert m.SystemMessage(content="I am nice").dict() == {
"role": "system",
"content": "I am nice",
}
def test_create_tagged_message():
assert m.TaggedMessage(content="I am nice", tag="data").dict() == {
"role": "user",
"content": "Consider the following data:\n\n<data>I am nice</data>",
}
def test_task_message():
assert s.SystemTask(task="task").dict() == {
"role": "system",
"content": "You are a world class state of the art algorithm capable of correctly completing the following task: `task`.",
}
def test_chain_of_thought_message():
assert m.ChainOfThought().dict() == {
"role": "assistant",
"content": "Lets think step by step to get the correct answer:",
}
+7 -11
@@ -1,18 +1,17 @@
from pydantic import BaseModel
import pytest
import openai
from instructor import patch
from openai import OpenAI
import instructor
client = instructor.patch(OpenAI())
@pytest.mark.skip("Not implemented")
def test_runmodel():
patch()
class UserExtract(BaseModel):
name: str
age: int
model = openai.ChatCompletion.create(
model = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserExtract,
messages=[
@@ -26,10 +25,7 @@ def test_runmodel():
), "The raw response should be available from OpenAI"
@pytest.mark.skip("Not implemented")
def test_runmodel_validator():
patch()
from pydantic import field_validator
class UserExtract(BaseModel):
@@ -43,7 +39,7 @@ def test_runmodel_validator():
raise ValueError("Name should be uppercase")
return v
model = openai.ChatCompletion.create(
model = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserExtract,
max_retries=2,