{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Thinking with Types: What's the problem?\n",
"\n",
"If you've seen my [talk](https://www.youtube.com/watch?v=yj-wSRJwrrc&t=1s) on this topic, you can skip this chapter.\n",
"\n",
"Many times, when we want to use language models, it's not to make chatbots, but to communicate with other computer systems. This commonly means we want to use a model to output structured data like JSON. However, working with raw JSON or dictionaries can be a pain.\n",
"\n",
"This notebook highlights the core concepts of Pydantic and OpenAI function calling. With a foundational understanding of these two tools, we can lay the groundwork for introducing my library, Instructor."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 1: Working with JSON, Validation, and Pydantic\n",
"\n",
"Let's say we have some simple JSON data that we want to work with. We can use the `json` module to load it into a dictionary, but then we have to manually check the types of the data and whether the data is valid at all. For example, let's say we have a list of JSON objects that looks like this:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"data = [\n",
"    {\"first_name\": \"Jason\", \"age\": 10},\n",
"    {\"firstName\": \"Jason\", \"age\": \"10\"}\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have a `first_name` field, which should be a string, and an `age` field, which should be an integer. However, once this is loaded into dictionaries, we have no way of knowing if the data is valid. In the second record the key is spelled `firstName` and the age is the string `\"10\"`. The age could just as easily have been a float, or the name could have been a list."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jason is 10\n",
"Next year he will be 11 years old\n",
"None is 10\n"
]
},
{
"ename": "TypeError",
"evalue": "can only concatenate str (not \"int\") to str",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 5\u001b[0m line \u001b[0;36m5\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#W4sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m age \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mage\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#W4sZmlsZQ%3D%3D?line=3'>4</a>\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mname\u001b[39m}\u001b[39;00m\u001b[39m is \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#W4sZmlsZQ%3D%3D?line=4'>5</a>\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mNext year he will be \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m+\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m}\u001b[39;00m\u001b[39m years old\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[0;31mTypeError\u001b[0m: can only concatenate str (not \"int\") to str"
]
}
],
"source": [
"for obj in data:\n",
"    name = obj.get(\"first_name\")\n",
"    age = obj.get(\"age\")\n",
"    print(f\"{name} is {age}\")\n",
"    print(f\"Next year he will be {age+1} years old\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we were able to program against a dictionary, but nothing guaranteed that the data was valid: the misspelled key silently produced `None`, and the string age raised a `TypeError`. To be safe, we would have to manually check every key and every type. This is a pain, and we can do better."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pydantic to the rescue\n",
"\n",
"Pydantic is a library that allows us to define data structures with type annotations, and then validate data against them."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(name='Sam', age=30)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Person(BaseModel):\n",
"    name: str\n",
"    age: int\n",
"\n",
"person = Person(name=\"Sam\", age=30)\n",
"person"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(name='Sam', age=30)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Data is correctly cast to the right type\n",
"person = Person.model_validate({\"name\": \"Sam\", \"age\": \"30\"})\n",
"person"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"assert person.name == \"Sam\"\n",
"assert person.age == 30"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
"evalue": "1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/int_parsing",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 11\u001b[0m line \u001b[0;36m2\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X13sZmlsZQ%3D%3D?line=0'>1</a>\u001b[0m \u001b[39m# Data is validated to get better error messages\u001b[39;00m\n\u001b[0;32m----> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X13sZmlsZQ%3D%3D?line=1'>2</a>\u001b[0m person \u001b[39m=\u001b[39m Person\u001b[39m.\u001b[39;49mmodel_validate({\u001b[39m\"\u001b[39;49m\u001b[39mname\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m\"\u001b[39;49m\u001b[39mage\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39m30.2\u001b[39;49m\u001b[39m\"\u001b[39;49m})\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X13sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m person\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:503\u001b[0m, in \u001b[0;36mBaseModel.model_validate\u001b[0;34m(cls, obj, strict, from_attributes, context)\u001b[0m\n\u001b[1;32m 501\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 502\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 503\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(\n\u001b[1;32m 504\u001b[0m obj, strict\u001b[39m=\u001b[39;49mstrict, from_attributes\u001b[39m=\u001b[39;49mfrom_attributes, context\u001b[39m=\u001b[39;49mcontext\n\u001b[1;32m 505\u001b[0m )\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/int_parsing"
]
}
],
"source": [
"# Data is validated to get better error messages\n",
"person = Person.model_validate({\"name\": \"Sam\", \"age\": \"30.2\"})\n",
"person"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By introducing Pydantic into any Python codebase you get type checking, validation, and editor autocomplete. This is a huge win, because it means you can catch errors before they propagate through your program. This is even more useful when we rely on language models to generate data for us.\n",
"\n",
"You can also define validators that run on the data, so that invalid values are rejected as soon as a model is created. For example, you can define a validator that checks that the age is greater than 0. Without one, a negative age is accepted silently:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(name='Sam', age=-10)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Person(name=\"Sam\", age=-10)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
"evalue": "1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 14\u001b[0m line \u001b[0;36m5\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X36sZmlsZQ%3D%3D?line=1'>2</a>\u001b[0m name: \u001b[39mstr\u001b[39m\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X36sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m age: \u001b[39mint\u001b[39m \u001b[39m=\u001b[39m Field(\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m, gt\u001b[39m=\u001b[39m\u001b[39m0\u001b[39m)\n\u001b[0;32m----> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X36sZmlsZQ%3D%3D?line=4'>5</a>\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m-\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than"
]
}
],
"source": [
"class Person(BaseModel):\n",
"    name: str\n",
"    age: int = Field(..., gt=0)\n",
"\n",
"Person(name=\"Sam\", age=-10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, you can also define custom functions that validate the data. In this case we use the [@field_validator](https://docs.pydantic.dev/latest/concepts/validators/#field-validators) decorator."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
"evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 16\u001b[0m line \u001b[0;36m1\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X43sZmlsZQ%3D%3D?line=10'>11</a>\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mmust contain a space\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X43sZmlsZQ%3D%3D?line=11'>12</a>\u001b[0m \u001b[39mreturn\u001b[39;00m v\n\u001b[0;32m---> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X43sZmlsZQ%3D%3D?line=13'>14</a>\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
]
}
],
"source": [
"from pydantic import field_validator\n",
"\n",
"\n",
"class Person(BaseModel):\n",
"    name: str\n",
"    age: int = Field(..., gt=0)\n",
"\n",
"    @field_validator(\"name\")\n",
"    def name_must_contain_space(cls, v):\n",
"        if \" \" not in v:\n",
"            raise ValueError(\"must contain a space\")\n",
"        return v\n",
"\n",
"Person(name=\"Sam\", age=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`...` is used as a placeholder for a required value in Pydantic's [Field](https://docs.pydantic.dev/latest/concepts/fields/) function.\n",
"\n",
"`age: int = Field(..., gt=0)` defines a field `age` of type `int` in the `Person` model. The `...` indicates that this field is required and must be provided when creating an instance of the `Person` model. The `gt=0` constraint ensures that the age is greater than 0.\n",
"\n",
"If you try to create a `Person` without providing an age, Pydantic will raise a validation error because the field is required."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(name='Sam Liu', age=10)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Person(name=\"Sam Liu\", age=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Asking for JSON from OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
"evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 19\u001b[0m line \u001b[0;36m1\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m client \u001b[39m=\u001b[39m OpenAI()\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=4'>5</a>\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=5'>6</a>\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=6'>7</a>\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=7'>8</a>\u001b[0m {\u001b[39m\"\u001b[39m\u001b[39mrole\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39muser\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcontent\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39mExtract `Jason is 25 years old` into json\u001b[39m\u001b[39m\"\u001b[39m},\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=8'>9</a>\u001b[0m ]\n\u001b[1;32m <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=9'>10</a>\u001b[0m )\n\u001b[0;32m---> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb#X16sZmlsZQ%3D%3D?line=11'>12</a>\u001b[0m Person\u001b[39m.\u001b[39;49mmodel_validate_json(resp\u001b[39m.\u001b[39;49mchoices[\u001b[39m0\u001b[39;49m]\u001b[39m.\u001b[39;49mmessage\u001b[39m.\u001b[39;49mcontent)\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:530\u001b[0m, in \u001b[0;36mBaseModel.model_validate_json\u001b[0;34m(cls, json_data, strict, context)\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 529\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 530\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_json(json_data, strict\u001b[39m=\u001b[39;49mstrict, context\u001b[39m=\u001b[39;49mcontext)\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
]
}
],
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI()\n",
"\n",
"resp = client.chat.completions.create(\n",
"    model=\"gpt-3.5-turbo\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": \"Extract `Jason is 25 years old` into json\"},\n",
"    ]\n",
")\n",
"\n",
"Person.model_validate_json(resp.choices[0].message.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"resp = client.chat.completions.create(\n",
"    model=\"gpt-3.5-turbo\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": \"Extract `Jason Liu is thirty years old` into json\"},\n",
"    ]\n",
")\n",
"\n",
"Person.model_validate_json(resp.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But what happens if I want to specify exactly how the schema should look? What if I want `full_name`, `age`, and a `birthday` parsed as a `datetime.date`?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"\n",
"class PersonBirthday(Person):\n",
"    birthday: datetime.date\n",
"\n",
"\n",
"resp = client.chat.completions.create(\n",
"    model=\"gpt-3.5-turbo\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json. Today is {datetime.date.today()}\"},\n",
"    ]\n",
")\n",
"\n",
"print(resp.choices[0].message.content)\n",
"print(Person.model_validate_json(resp.choices[0].message.content))\n",
"PersonBirthday.model_validate_json(resp.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to Function Calling\n",
"\n",
"The JSON could be anything! We could add more and more instructions to the prompt and hope it works, or we can use something called [function calling](https://platform.openai.com/docs/guides/function-calling) to directly specify the schema we want.\n",
"\n",
"\n",
"**Function Calling**\n",
"\n",
"In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"\n",
"schema = {\n",
"    'properties': {\n",
"        'name': {'type': 'string'},\n",
"        'age': {'type': 'integer'},\n",
"        'birthday': {'type': 'string', 'format': 'YYYY-MM-DD'},\n",
"    },\n",
"    'required': ['name', 'age'],\n",
"    'type': 'object'\n",
"}\n",
"\n",
"resp = client.chat.completions.create(\n",
"    model=\"gpt-3.5-turbo\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json. Today is {datetime.date.today()}\"},\n",
"    ],\n",
"    functions=[{\"name\": \"Person\", \"parameters\": schema}],\n",
"    # Force a call to `Person`; with \"auto\" the model may answer in plain text,\n",
"    # which leaves `message.function_call` as None.\n",
"    function_call={\"name\": \"Person\"}\n",
")\n",
"\n",
"\n",
"PersonBirthday.model_validate_json(resp.choices[0].message.function_call.arguments)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It turns out that Pydantic does more than handle our serialization: it can also generate the JSON schema for us, and we can attach additional documentation to it!"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"PersonBirthday.model_json_schema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can even define complex nested schemas, complete with documentation, with ease."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'$defs': {'Address': {'properties': {'address': {'description': 'Full street address',\n",
" 'title': 'Address',\n",
" 'type': 'string'},\n",
" 'city': {'title': 'City', 'type': 'string'},\n",
" 'state': {'title': 'State', 'type': 'string'}},\n",
" 'required': ['address', 'city', 'state'],\n",
" 'title': 'Address',\n",
" 'type': 'object'}},\n",
" 'description': 'A Person with an address',\n",
" 'properties': {'name': {'title': 'Name', 'type': 'string'},\n",
" 'age': {'exclusiveMinimum': 0, 'title': 'Age', 'type': 'integer'},\n",
" 'address': {'$ref': '#/$defs/Address'}},\n",
" 'required': ['name', 'age', 'address'],\n",
" 'title': 'PersonAddress',\n",
" 'type': 'object'}"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Address(BaseModel):\n",
"    address: str = Field(description=\"Full street address\")\n",
"    city: str\n",
"    state: str\n",
"\n",
"\n",
"class PersonAddress(Person):\n",
"    \"\"\"A Person with an address\"\"\"\n",
"    address: Address\n",
"\n",
"\n",
"PersonAddress.model_json_schema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These simple concepts are what we built into `instructor`, and most of the work has gone into documenting how to leverage schema engineering.\n",
"The difference is that we now use `instructor.patch()` to add more capabilities to the OpenAI SDK."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PersonAddress(name='Jason Liu', age=30, address=Address(address='123 Main St', city='San Francisco', state='CA'))"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import instructor\n",
"import datetime\n",
"\n",
"client = instructor.patch(client)\n",
"\n",
"resp = client.chat.completions.create(\n",
"    model=\"gpt-3.5-turbo\",\n",
"    messages=[\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": f\"\"\"\n",
"            Today is {datetime.date.today()}\n",
"\n",
"            Extract `Jason Liu is thirty years old his birthday is yesterday`\n",
"            he lives at 123 Main St, San Francisco, CA\"\"\"},\n",
"    ],\n",
"    response_model=PersonAddress\n",
")\n",
"resp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you can see that when we set `response_model`, the `create` call returns a Pydantic model. The data has already been validated, and we can work with it like any other Python object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Is instructor the only way to do this?\n",
"\n",
"No. Libraries like Marvin, LangChain, and LlamaIndex all now leverage Pydantic objects in similar ways, but each takes a different approach. With instructor the goal is to be as lightweight as possible, get you as close as possible to the OpenAI API, and then get out of your way.\n",
"\n",
"More importantly, we've also added straightforward validation and reasking to the mix.\n",
"\n",
"The goal of instructor is to show you how to think about structured prompting and provide examples and documentation that you can take with you to any framework.\n",
"\n",
"\n",
"- [Marvin](https://www.askmarvin.ai/)\n",
"- [LangChain](https://python.langchain.com/docs/modules/model_io/output_parsers/pydantic)\n",
"- [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/examples/output_parsing/openai_pydantic_program.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}