mirror of
https://github.com/kennethreitz/instructor.git
synced 2026-06-05 22:50:18 +00:00
168 lines
3.9 KiB
Markdown
168 lines
3.9 KiB
Markdown
# Multi-task and Streaming
|
|
|
|
A common use case of structured extraction is defining a single schema class and then making another schema to create a list to do multiple extraction
|
|
|
|
```python
|
|
from typing import List
|
|
from pydantic import BaseModel
|
|
|
|
|
|
class User(BaseModel):
|
|
name: str
|
|
age: int
|
|
|
|
|
|
class Users(BaseModel):
|
|
users: List[User]
|
|
|
|
|
|
print(Users.model_json_schema())
|
|
"""
|
|
{
|
|
'$defs': {
|
|
'User': {
|
|
'properties': {
|
|
'name': {'title': 'Name', 'type': 'string'},
|
|
'age': {'title': 'Age', 'type': 'integer'},
|
|
},
|
|
'required': ['name', 'age'],
|
|
'title': 'User',
|
|
'type': 'object',
|
|
}
|
|
},
|
|
'properties': {
|
|
'users': {'items': {'$ref': '#/$defs/User'}, 'title': 'Users', 'type': 'array'}
|
|
},
|
|
'required': ['users'],
|
|
'title': 'Users',
|
|
'type': 'object',
|
|
}
|
|
"""
|
|
```
|
|
|
|
Defining a task and creating a list of classes is a common enough pattern that we make this convenient by making use of `Iterable[T]`. This lets us dynamically create a new class that:
|
|
|
|
1. Has dynamic docstrings and class name based on the task
|
|
2. Support streaming by collecting tokens until a task is received back out.
|
|
|
|
## Extracting Tasks using Iterable
|
|
|
|
By using `Iterable` you get a very convenient class with prompts and names automatically defined:
|
|
|
|
```python
|
|
import instructor
|
|
from openai import OpenAI
|
|
from typing import Iterable
|
|
from pydantic import BaseModel
|
|
|
|
client = instructor.patch(OpenAI(), mode=instructor.function_calls.Mode.JSON)
|
|
|
|
|
|
class User(BaseModel):
|
|
name: str
|
|
age: int
|
|
|
|
|
|
users = client.chat.completions.create(
|
|
model="gpt-3.5-turbo-1106",
|
|
temperature=0.1,
|
|
response_model=Iterable[User],
|
|
stream=False,
|
|
messages=[
|
|
{
|
|
"role": "user",
|
|
"content": "Consider this data: Jason is 10 and John is 30.\
|
|
Correctly segment it into entitites\
|
|
Make sure the JSON is correct",
|
|
},
|
|
],
|
|
)
|
|
for user in users:
|
|
print(user)
|
|
#> name='Jason' age=10
|
|
#> name='John' age=30
|
|
```
|
|
|
|
## Streaming Tasks
|
|
|
|
We can also generate tasks as the tokens are streamed in by defining an `Iterable[T]` type.
|
|
|
|
Lets look at an example in action with the same class
|
|
|
|
```python hl_lines="6 26"
|
|
import instructor
|
|
import openai
|
|
from typing import Iterable
|
|
from pydantic import BaseModel
|
|
|
|
client = instructor.patch(openai.OpenAI(), mode=instructor.Mode.TOOLS)
|
|
|
|
|
|
class User(BaseModel):
|
|
name: str
|
|
age: int
|
|
|
|
|
|
users = client.chat.completions.create(
|
|
model="gpt-4",
|
|
temperature=0.1,
|
|
stream=True,
|
|
response_model=Iterable[User],
|
|
messages=[
|
|
{
|
|
"role": "system",
|
|
"content": "You are a perfect entity extraction system",
|
|
},
|
|
{
|
|
"role": "user",
|
|
"content": (f"Extract `Jason is 10 and John is 10`"),
|
|
},
|
|
],
|
|
max_tokens=1000,
|
|
)
|
|
|
|
for user in users:
|
|
print(user)
|
|
#> name='Jason' age=10
|
|
#> name='John' age=10
|
|
```
|
|
|
|
## Asynchronous Streaming
|
|
|
|
I also just want to call out in this example that `instructor` also supports asynchronous streaming. This is useful when you want to stream a response model and process the results as they come in, but you'll need to use the `async for` syntax to iterate over the results.
|
|
|
|
```python
|
|
import instructor
|
|
import openai
|
|
from typing import Iterable
|
|
from pydantic import BaseModel
|
|
|
|
client = instructor.patch(openai.AsyncOpenAI(), mode=instructor.Mode.TOOLS)
|
|
|
|
|
|
class UserExtract(BaseModel):
|
|
name: str
|
|
age: int
|
|
|
|
|
|
async def print_iterable_results():
|
|
model = await client.chat.completions.create(
|
|
model="gpt-4",
|
|
response_model=Iterable[UserExtract],
|
|
max_retries=2,
|
|
stream=True,
|
|
messages=[
|
|
{"role": "user", "content": "Make two up people"},
|
|
],
|
|
)
|
|
async for m in model:
|
|
print(m)
|
|
#> name='John Smith' age=30
|
|
#> name='Mary Jane' age=28
|
|
|
|
|
|
import asyncio
|
|
|
|
asyncio.run(print_iterable_results())
|
|
```
|