diff --git a/tutorials/1.introduction.ipynb b/tutorials/1.introduction.ipynb
index c8f4d9e..0d23fca 100644
--- a/tutorials/1.introduction.ipynb
+++ b/tutorials/1.introduction.ipynb
@@ -4,46 +4,43 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Thinking with Types: Whats the problem?\n",
+ "# Working with structured outputs\n",
"\n",
"If you've seen my [talk](https://www.youtube.com/watch?v=yj-wSRJwrrc&t=1s) on this topic, you can skip this chapter.\n",
"\n",
- "Many times, when we want to use language models, its not to make chatbots, but to communicate with other computer systems. This commonly means we want to use a model to output structured data like JSON. However, working with raw json or dictionaries can be a pain. \n",
+ "tl;dr\n",
"\n",
- "This notebook highlights the core concepts of Pydantic and open ai function calling. With a foundational understanding of these two libraries we can lay the ground work for introducing my library, Instructor."
+ "When we work with LLMs, we often find that we are not building chatbots; instead, we want structured outputs, machine-readable data that another program can use to solve a problem. However, the way we think about the problem is still heavily influenced by the way we think about chatbots, and that leads to a lot of confusion and frustration. In this chapter we'll try to understand why this happens and how we can fix it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Problem 1: Working with JSON, Validation, and Pydantic\n",
+ "## The fundamental problem with JSON and dictionaries\n",
"\n",
- "Lets say we have a simple JSON object, and we want to work with it. We can use the `json` module to load it into a dictionary, and then work with it. However, this is a bit of a pain, because we have to manually check the types of the data, and we have to manually check if the data is valid. For example, lets say we have a JSON object that looks like this:"
+ "Let's say we have some simple JSON and want to work with it. We can use the `json` module to load it into Python dictionaries, but then it is up to us to check the types of the data and whether the data is valid. For example, let's say we have JSON that looks like this:\n"
]
},
{
"cell_type": "code",
- "execution_count": 22,
+ "execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
- "data = [\n",
- " {\"first_name\": \"Jason\", \"age\": 10}, \n",
- " {\"firstName\": \"Jason\", \"age\": \"10\"}\n",
- "]"
+ "data = [{\"first_name\": \"Jason\", \"age\": 10}, {\"firstName\": \"Jason\", \"age\": \"10\"}]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "We have a `name` field, which is a string, and an `age` field, which is an integer. However, if we were to load this into a dictionary, we would have no way of knowing if the data is valid. For example, we could have a string for the age, or we could have a float for the age. We could also have a string for the name, or we could have a list for the name."
+ "Each object is meant to have a `first_name` field, which is a string, and an `age` field, which is an integer. However, once the data is loaded into a dictionary, nothing tells us whether it is valid: the second object uses the key `firstName` instead of `first_name`, and its `age` is a string rather than an integer. Nothing would stop `age` from being a float, or the name from being a list, either.\n"
]
},
{
"cell_type": "code",
- "execution_count": 23,
+ "execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -51,8 +48,8 @@
"output_type": "stream",
"text": [
"Jason is 10\n",
- "Next year he will be 11 years old\n",
- "None is 10\n"
+ "None is 10\n",
+ "Next year Jason will be 11 years old\n"
]
},
{
@@ -62,7 +59,7 @@
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 5\u001b[0m line \u001b[0;36m5\n\u001b[1;32m 3\u001b[0m age \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mage\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 4\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mname\u001b[39m}\u001b[39;00m\u001b[39m is \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mNext year he will be \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m+\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m}\u001b[39;00m\u001b[39m years old\u001b[39m\u001b[39m\"\u001b[39m)\n",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 5\u001b[0m line \u001b[0;36m9\n\u001b[1;32m 7\u001b[0m name \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mfirst_name\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 8\u001b[0m age \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mage\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> 9\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mNext year \u001b[39m\u001b[39m{\u001b[39;00mname\u001b[39m}\u001b[39;00m\u001b[39m will be \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m+\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m}\u001b[39;00m\u001b[39m years old\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[0;31mTypeError\u001b[0m: can only concatenate str (not \"int\") to str"
]
}
@@ -72,14 +69,18 @@
" name = obj.get(\"first_name\")\n",
" age = obj.get(\"age\")\n",
" print(f\"{name} is {age}\")\n",
- " print(f\"Next year he will be {age+1} years old\")"
+ "\n",
+ "for obj in data:\n",
+ " name = obj.get(\"first_name\")\n",
+ " age = obj.get(\"age\")\n",
+ " print(f\"Next year {name} will be {age+1} years old\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "You see that while we were able to program with a dictionary, we had issues with the data being valid. We would have had to manually check the types of the data, and we had to manually check if the data was valid. This is a pain, and we can do better."
+ "While we were able to program against the dictionaries, the invalid data slipped through: the mismatched key silently turned into `None`, and the string age only failed once we tried to add 1 to it. Catching these problems means checking every key and type by hand. This is a pain, and we can do better.\n"
]
},
{
@@ -88,12 +89,12 @@
"source": [
"## Pydantic to the rescue\n",
"\n",
- "Pydantic is a library that allows us to define data structures, and then validate them. It also allows us to define data structures."
+ "Pydantic is a library that lets us define data structures as Python classes with type annotations and then validate data against them.\n"
]
},
{
"cell_type": "code",
- "execution_count": 24,
+ "execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -102,7 +103,7 @@
"Person(name='Sam', age=30)"
]
},
- "execution_count": 24,
+ "execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -115,13 +116,14 @@
" name: str\n",
" age: int\n",
"\n",
+ "\n",
"person = Person(name=\"Sam\", age=30)\n",
"person"
]
},
{
"cell_type": "code",
- "execution_count": 25,
+ "execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -130,7 +132,7 @@
"Person(name='Sam', age=30)"
]
},
- "execution_count": 25,
+ "execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -143,7 +145,7 @@
},
{
"cell_type": "code",
- "execution_count": 26,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -165,19 +167,19 @@
},
{
"cell_type": "code",
- "execution_count": 27,
+ "execution_count": 6,
"metadata": {},
"outputs": [
{
"ename": "ValidationError",
- "evalue": "1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/int_parsing",
+ "evalue": "1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.5/v/int_parsing",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 11\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m \u001b[39m# Data is validated to get better error messages\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m person \u001b[39m=\u001b[39m Person\u001b[39m.\u001b[39;49mmodel_validate({\u001b[39m\"\u001b[39;49m\u001b[39mname\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m\"\u001b[39;49m\u001b[39mage\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39m30.2\u001b[39;49m\u001b[39m\"\u001b[39;49m})\n\u001b[1;32m 3\u001b[0m person\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:503\u001b[0m, in \u001b[0;36mBaseModel.model_validate\u001b[0;34m(cls, obj, strict, from_attributes, context)\u001b[0m\n\u001b[1;32m 501\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 502\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 503\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(\n\u001b[1;32m 504\u001b[0m obj, strict\u001b[39m=\u001b[39;49mstrict, from_attributes\u001b[39m=\u001b[39;49mfrom_attributes, context\u001b[39m=\u001b[39;49mcontext\n\u001b[1;32m 505\u001b[0m )\n",
- "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/int_parsing"
+ "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.5/v/int_parsing"
]
}
],
@@ -193,151 +195,48 @@
"source": [
"By introducing pydantic into any python codebase you can get a lot of benefits. You can get type checking, you can get validation, and you can get autocomplete. This is a huge win, because it means you can catch errors before they happen. This is even more useful when we rely on language models to generate data for us.\n",
"\n",
- "You can also define validators that are run on the data. This is useful because it means you can catch errors before they happen. For example, you can define a validator that checks if the age is greater than 0. This is useful because it means you can catch errors before they happen."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 28,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "Person(name='Sam', age=-10)"
- ]
- },
- "execution_count": 28,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "Person(name=\"Sam\", age=-10)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "metadata": {},
- "outputs": [
- {
- "ename": "ValidationError",
- "evalue": "1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 14\u001b[0m line \u001b[0;36m5\n\u001b[1;32m 2\u001b[0m name: \u001b[39mstr\u001b[39m\n\u001b[1;32m 3\u001b[0m age: \u001b[39mint\u001b[39m \u001b[39m=\u001b[39m Field(\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m, gt\u001b[39m=\u001b[39m\u001b[39m0\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m-\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
- "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
- "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than"
- ]
- }
- ],
- "source": [
- "class Person(BaseModel):\n",
- " name: str\n",
- " age: int = Field(..., gt=0)\n",
- "\n",
- "Person(name=\"Sam\", age=-10)"
+ "You can also define validators that run on the data. For example, a validator can check that the age is greater than 0, so invalid values are rejected when the object is created instead of causing errors later.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Lastly you can also define functions that run on the data. In this case we use the [@field_validator](https://docs.pydantic.dev/latest/concepts/validators/#field-validators) decorator."
+ "## Asking for JSON from OpenAI\n"
]
},
{
"cell_type": "code",
- "execution_count": 30,
+ "execution_count": 12,
"metadata": {},
"outputs": [
{
- "ename": "ValidationError",
- "evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 16\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 11\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mmust contain a space\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 12\u001b[0m \u001b[39mreturn\u001b[39;00m v\n\u001b[0;32m---> 14\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
- "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
- "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
- ]
- }
- ],
- "source": [
- "from pydantic import field_validator\n",
- "\n",
- "\n",
- "class Person(BaseModel):\n",
- " name: str\n",
- " age: int = Field(..., gt=0)\n",
- "\n",
- " @field_validator(\"name\")\n",
- " def name_must_contain_space(cls, v):\n",
- " if \" \" not in v:\n",
- " raise ValueError(\"must contain a space\")\n",
- " return v\n",
- " \n",
- "Person(name=\"Sam\", age=10)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "'...' is used as a placeholder for a required value in Pydantic's [Field](https://docs.pydantic.dev/latest/concepts/fields/) function.\n",
- "\n",
- "```age: int = Field(..., gt=0)```\n",
- "defines a field age of type int in the Person model. The ... indicates that this field is required and must be provided when creating an instance of the Person model. The gt=0 is a validation that ensures the age must be greater than 0.\n",
- "\n",
- "If you try to create a Person without providing an age, Pydantic will raise a validation error because of the ... placeholder."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 31,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "Person(name='Sam Liu', age=10)"
- ]
- },
- "execution_count": 31,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "Person(name=\"Sam Liu\", age=10)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Asking for JSON from OpenAI"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 32,
- "metadata": {},
- "outputs": [
- {
- "ename": "ValidationError",
- "evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 19\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 3\u001b[0m client \u001b[39m=\u001b[39m OpenAI()\n\u001b[1;32m 5\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m 6\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 7\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[1;32m 8\u001b[0m {\u001b[39m\"\u001b[39m\u001b[39mrole\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39muser\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcontent\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39mExtract `Jason is 25 years old` into json\u001b[39m\u001b[39m\"\u001b[39m},\n\u001b[1;32m 9\u001b[0m ]\n\u001b[1;32m 10\u001b[0m )\n\u001b[0;32m---> 12\u001b[0m Person\u001b[39m.\u001b[39;49mmodel_validate_json(resp\u001b[39m.\u001b[39;49mchoices[\u001b[39m0\u001b[39;49m]\u001b[39m.\u001b[39;49mmessage\u001b[39m.\u001b[39;49mcontent)\n",
- "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:530\u001b[0m, in \u001b[0;36mBaseModel.model_validate_json\u001b[0;34m(cls, json_data, strict, context)\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 529\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 530\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_json(json_data, strict\u001b[39m=\u001b[39;49mstrict, context\u001b[39m=\u001b[39;49mcontext)\n",
- "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\n",
+ " \"Jason\": {\n",
+ " \"age\": 10\n",
+ " }\n",
+ "}\n",
+ "Here is the JSON representation of `jason is 10` as a JSON object:\n",
+ "\n",
+ "```\n",
+ "{\n",
+ " \"name\": \"Jason\",\n",
+ " \"age\": 10\n",
+ "}\n",
+ "```\n",
+ "Here is the JSON object representation of \"Jason is 10\":\n",
+ "\n",
+ "```json\n",
+ "{\n",
+ " \"name\": \"Jason\",\n",
+ " \"age\": 10\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "In this JSON object, the key \"name\" corresponds to the value \"Jason\" and the key \"age\" corresponds to the value 10.\n"
]
}
],
@@ -349,113 +248,81 @@
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
- " {\"role\": \"user\", \"content\": \"Extract `Jason is 25 years old` into json\"},\n",
- " ]\n",
+ " {\"role\": \"user\", \"content\": \"Please give me jason is 10 as a json object\"},\n",
+ " ],\n",
+ " n=20,\n",
+ " temperature=1,\n",
")\n",
"\n",
- "Person.model_validate_json(resp.choices[0].message.content)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "resp = client.chat.completions.create(\n",
- " model=\"gpt-3.5-turbo\",\n",
- " messages=[\n",
- " {\"role\": \"user\", \"content\": \"Extract `Jason Liu is thirty years old` into json\"},\n",
- " ]\n",
- ")\n",
- "\n",
- "Person.model_validate_json(resp.choices[0].message.content)"
+ "for choice in resp.choices:\n",
+ "    content = choice.message.content\n",
+ "    try:\n",
+ "        Person.model_validate_json(content)\n",
+ "    except Exception:  # the reply was not valid JSON for the `Person` schema\n",
+ "        print(content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "But what happens if I want to describe specifically how the schema should look? What if I want full_name and age and birthday as a datetime?"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import datetime\n",
- "\n",
- "class PersonBirthday(Person):\n",
- " birthday: datetime.date\n",
- "\n",
- "\n",
- "resp = client.chat.completions.create(\n",
- " model=\"gpt-3.5-turbo\",\n",
- " messages=[\n",
- " {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json. Today is {datetime.date.today()}\"},\n",
- " ]\n",
- ")\n",
- "\n",
- "print(resp.choices[0].message.content)\n",
- "print(Person.model_validate_json(resp.choices[0].message.content))\n",
- "PersonBirthday.model_validate_json(resp.choices[0].message.content)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Introduction to Function Calling \n",
- "\n",
- "The json could be anything! We could add more and more into a prompt and hope it works, or we can use something called [function calling](https://platform.openai.com/docs/guides/function-calling) to directly specify the schema we want. \n",
+ "## Introduction to Function Calling\n",
"\n",
+ "As the outputs above show, the JSON could be anything! We could keep adding instructions to the prompt and hope it works, or we can use something called [function calling](https://platform.openai.com/docs/guides/function-calling) to specify the schema we want directly.\n",
"\n",
"**Function Calling**\n",
"\n",
- "In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code."
+ "In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code.\n"
]
},
{
"cell_type": "code",
- "execution_count": 33,
+ "execution_count": 13,
"metadata": {},
"outputs": [
{
- "ename": "NameError",
- "evalue": "name 'datetime' is not defined",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 24\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 1\u001b[0m schema \u001b[39m=\u001b[39m {\n\u001b[1;32m 2\u001b[0m \u001b[39m'\u001b[39m\u001b[39mproperties\u001b[39m\u001b[39m'\u001b[39m: \n\u001b[1;32m 3\u001b[0m {\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[39m'\u001b[39m\u001b[39mtype\u001b[39m\u001b[39m'\u001b[39m: \u001b[39m'\u001b[39m\u001b[39mobject\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 10\u001b[0m }\n\u001b[1;32m 12\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m 13\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 14\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[0;32m---> 15\u001b[0m {\u001b[39m\"\u001b[39m\u001b[39mrole\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39muser\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcontent\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mExtract `Jason Liu is thirty years old his birthday is yesturday` into json today is \u001b[39m\u001b[39m{\u001b[39;00mdatetime\u001b[39m.\u001b[39mdate\u001b[39m.\u001b[39mtoday()\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m},\n\u001b[1;32m 16\u001b[0m ],\n\u001b[1;32m 17\u001b[0m functions\u001b[39m=\u001b[39m[{\u001b[39m\"\u001b[39m\u001b[39mname\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39mPerson\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mparameters\u001b[39m\u001b[39m\"\u001b[39m: schema}],\n\u001b[1;32m 18\u001b[0m function_call\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mauto\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 19\u001b[0m )\n\u001b[1;32m 22\u001b[0m 
PersonBirthday\u001b[39m.\u001b[39mmodel_validate_json(resp\u001b[39m.\u001b[39mchoices[\u001b[39m0\u001b[39m]\u001b[39m.\u001b[39mmessage\u001b[39m.\u001b[39mfunction_call\u001b[39m.\u001b[39marguments)\n",
- "\u001b[0;31mNameError\u001b[0m: name 'datetime' is not defined"
- ]
+ "data": {
+ "text/plain": [
+ "PersonBirthday(name='Jason Liu', age=30, birthday=datetime.date(2023, 11, 30))"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
}
],
"source": [
+ "import datetime\n",
+ "\n",
+ "\n",
+ "class PersonBirthday(BaseModel):\n",
+ " name: str\n",
+ " age: int\n",
+ " birthday: datetime.date\n",
+ "\n",
+ "\n",
"schema = {\n",
- " 'properties': \n",
- " {\n",
- " 'name': {'type': 'string'},\n",
- " 'age': {'type': 'integer'},\n",
- " 'birthday': {'type': 'string', 'format': 'YYYY-MM-DD'},\n",
+ " \"properties\": {\n",
+ " \"name\": {\"type\": \"string\"},\n",
+ " \"age\": {\"type\": \"integer\"},\n",
+ " \"birthday\": {\"type\": \"string\", \"format\": \"YYYY-MM-DD\"},\n",
" },\n",
- " 'required': ['name', 'age'],\n",
- " 'type': 'object'\n",
+ " \"required\": [\"name\", \"age\"],\n",
+ " \"type\": \"object\",\n",
"}\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
- " {\"role\": \"user\", \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesturday` into json today is {datetime.date.today()}\"},\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": f\"Extract `Jason Liu is thirty years old his birthday is yesterday` into json today is {datetime.date.today()}\",\n",
+ " },\n",
" ],\n",
" functions=[{\"name\": \"Person\", \"parameters\": schema}],\n",
- " function_call=\"auto\"\n",
+ " function_call=\"auto\",\n",
")\n",
"\n",
- "\n",
"PersonBirthday.model_validate_json(resp.choices[0].message.function_call.arguments)"
]
},
@@ -463,24 +330,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "But it turns out, pydantic actually not only does our serialization, we can define the schema as well as add additional documentation!"
+ "It turns out that Pydantic doesn't just handle our serialization and validation: it can also generate the schema for us, along with any additional documentation!\n"
]
},
{
"cell_type": "code",
- "execution_count": 34,
+ "execution_count": 14,
"metadata": {},
"outputs": [
{
- "ename": "NameError",
- "evalue": "name 'PersonBirthday' is not defined",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 26\u001b[0m line \u001b[0;36m1\n\u001b[0;32m----> 1\u001b[0m PersonBirthday\u001b[39m.\u001b[39mmodel_json_schema()\n",
- "\u001b[0;31mNameError\u001b[0m: name 'PersonBirthday' is not defined"
- ]
+ "data": {
+ "text/plain": [
+ "{'properties': {'name': {'title': 'Name', 'type': 'string'},\n",
+ " 'age': {'title': 'Age', 'type': 'integer'},\n",
+ " 'birthday': {'format': 'date', 'title': 'Birthday', 'type': 'string'}},\n",
+ " 'required': ['name', 'age', 'birthday'],\n",
+ " 'title': 'PersonBirthday',\n",
+ " 'type': 'object'}"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
}
],
"source": [
@@ -491,12 +362,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We can even define nested complex schemas, and documentation with ease."
+ "We can even define complex nested schemas, with documentation, just as easily.\n"
]
},
{
"cell_type": "code",
- "execution_count": 35,
+ "execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -512,14 +383,14 @@
" 'type': 'object'}},\n",
" 'description': 'A Person with an address',\n",
" 'properties': {'name': {'title': 'Name', 'type': 'string'},\n",
- " 'age': {'exclusiveMinimum': 0, 'title': 'Age', 'type': 'integer'},\n",
+ " 'age': {'title': 'Age', 'type': 'integer'},\n",
" 'address': {'$ref': '#/$defs/Address'}},\n",
" 'required': ['name', 'age', 'address'],\n",
" 'title': 'PersonAddress',\n",
" 'type': 'object'}"
]
},
- "execution_count": 35,
+ "execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -533,6 +404,7 @@
"\n",
"class PersonAddress(Person):\n",
" \"\"\"A Person with an address\"\"\"\n",
+ "\n",
" address: Address\n",
"\n",
"\n",
@@ -544,12 +416,23 @@
"metadata": {},
"source": [
"These simple concepts become what we built into `instructor` and most of the work has been around documenting how we can leverage schema engineering.\n",
- "Except now we use `instructor.patch()` to add a bunch more capabilities to the OpenAI SDK."
+ "The difference is that we now use `instructor.patch()` to add several more capabilities to the OpenAI SDK.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The core ideas behind Instructor\n",
+ "\n",
+ "1. Function calling lets us specify the schema we want\n",
+ "2. Pydantic can define the schema and its documentation AND validate the response at runtime\n",
+ "3. Pydantic is a widely used library (100M downloads), so we can let it do the heavy lifting while fitting naturally into the Python ecosystem\n"
]
},
{
"cell_type": "code",
- "execution_count": 37,
+ "execution_count": 16,
"metadata": {},
"outputs": [
{
@@ -558,7 +441,7 @@
"PersonAddress(name='Jason Liu', age=30, address=Address(address='123 Main St', city='San Francisco', state='CA'))"
]
},
- "execution_count": 37,
+ "execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
@@ -567,20 +450,22 @@
"import instructor\n",
"import datetime\n",
"\n",
+ "# patch the client to add `response_model` to the `create` method\n",
"client = instructor.patch(client)\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
" {\n",
- " \"role\": \"user\", \n",
+ " \"role\": \"user\",\n",
" \"content\": f\"\"\"\n",
" Today is {datetime.date.today()} \n",
"\n",
" Extract `Jason Liu is thirty years old his birthday is yesturday` \n",
- " he lives at 123 Main St, San Francisco, CA\"\"\"},\n",
+ " he lives at 123 Main St, San Francisco, CA\"\"\",\n",
+ " },\n",
" ],\n",
- " response_model=PersonAddress\n",
+ " response_model=PersonAddress,\n",
")\n",
"resp"
]
@@ -589,7 +474,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now you can see that when we set `response_model` create call will now return a pydantic model, and we can use that to validate the data. and work with it as if it was a python object."
+ "By setting `response_model`, the `create` call returns a validated Pydantic object, so Pydantic does all the heavy lifting. Later we'll introduce the other features that `instructor.patch()` adds to the OpenAI SDK,\n",
+ "but for now, this small change already lets us do a lot more with the API.\n"
]
},
{
@@ -598,7 +484,7 @@
"source": [
"## Is instructor the only way to do this?\n",
"\n",
- "No. Libraries like Marvin, Langchain, and Llamaindex all now leverage the pydantic object in similar ways however they all have different approaches to how they do it. With instructor the goal is to be as light weight as possible, get you as close as possible to the openai api, and then get out of your way. \n",
+ "No. Libraries like Marvin, Langchain, and LlamaIndex now leverage Pydantic models in similar ways, though each takes a different approach. With Instructor, the goal is to be as lightweight as possible, keep you as close as possible to the OpenAI API, and then get out of your way.\n",
"\n",
"More importantly, we've also added straight forward validation and reasking to the mix.\n",
"\n",
@@ -606,10 +492,9 @@
"\n",
"For further exploration:\n",
"\n",
- "\n",
"- [Marvin](https://www.askmarvin.ai/)\n",
"- [Langchain](https://python.langchain.com/docs/modules/model_io/output_parsers/pydantic)\n",
- "- [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/examples/output_parsing/openai_pydantic_program.html)"
+ "- [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/examples/output_parsing/openai_pydantic_program.html)\n"
]
}
],
diff --git a/tutorials/2.tips.ipynb b/tutorials/2.tips.ipynb
index 40bec86..be5d94f 100644
--- a/tutorials/2.tips.ipynb
+++ b/tutorials/2.tips.ipynb
@@ -7,9 +7,10 @@
"source": [
"# General Tips on Prompting\n",
"\n",
- "Before we get into some big applications of schema engineering I want to equip you with the tools for success. \n",
+ "Before we get into some big applications of schema engineering, I want to equip you with the tools for success.\n",
+ "This notebook shares some general advice on using prompts to get the most out of your models.\n",
"\n",
- "This notebook is to share some general advice when using prompts to get the most of your models. "
+ "Previously, you might have thought of prompt engineering as massaging a wall of text, almost like coding in a notepad. With schema engineering, you can get a lot more out of your prompts with a lot less work.\n"
]
},
{
@@ -24,15 +25,14 @@
"1. using Enums\n",
"2. using Literals\n",
"\n",
- "\n",
"Use an enum in Python when you need a set of named constants that are related and you want to ensure type safety, readability, and prevent invalid values. Enums are helpful for grouping and iterating over these constants.\n",
"\n",
- "Use literals when you have a small, unchanging set of values that you don't need to group or iterate over, and when type safety and preventing invalid values is less of a concern. Literals are simpler and more direct for basic, one-off values."
+ "Use literals when you have a small, unchanging set of values that you don't need to group or iterate over, and when type safety and preventing invalid values is less of a concern. Literals are simpler and more direct for basic, one-off values.\n"
]
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 10,
"id": "fdf5e1d9-31ad-4e8a-a55e-e2e70fff598d",
"metadata": {},
"outputs": [
@@ -42,49 +42,71 @@
"{'age': 17, 'name': 'Harry Potter', 'house': }"
]
},
- "execution_count": 5,
+ "execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
+ "import instructor\n",
+ "from openai import OpenAI\n",
+ "\n",
"from enum import Enum\n",
"from pydantic import BaseModel, Field\n",
"from typing_extensions import Literal\n",
"\n",
- "import instructor\n",
- "from openai import OpenAI\n",
"\n",
"client = instructor.patch(OpenAI())\n",
"\n",
- "# Tip: Do not use auto() as they cast to 1,2,3,4 \n",
+ "\n",
+ "# Tip: don't use auto(); it assigns integer values (1, 2, 3, 4) instead of descriptive strings\n",
"class House(Enum):\n",
" Gryffindor = \"gryffindor\"\n",
" Hufflepuff = \"hufflepuff\"\n",
" Ravenclaw = \"ravenclaw\"\n",
" Slytherin = \"slytherin\"\n",
"\n",
+ "\n",
"class Character(BaseModel):\n",
" age: int\n",
" name: str\n",
" house: House\n",
- " \n",
+ "\n",
+ " def say_hello(self):\n",
+ " print(\n",
+ " f\"Hello, I'm {self.name}, I'm {self.age} years old and I'm from {self.house.value.title()}\"\n",
+ " )\n",
+ "\n",
+ "\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Character\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Harry Potter\"}],\n",
+ " response_model=Character,\n",
")\n",
"resp.model_dump()"
]
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 11,
+ "id": "c609eb44",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Hello, I'm Harry Potter, I'm 17 years old and I'm from Gryffindor\n"
+ ]
+ }
+ ],
+ "source": [
+ "resp.say_hello()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
"id": "03db160c-81e9-4373-bfec-7a107224b6dd",
"metadata": {},
"outputs": [
@@ -94,7 +116,7 @@
"{'age': 17, 'name': 'Harry Potter', 'house': 'Gryffindor'}"
]
},
- "execution_count": 6,
+ "execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -104,16 +126,12 @@
" age: int\n",
" name: str\n",
" house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
- " \n",
+ "\n",
+ "\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Character\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Harry Potter\"}],\n",
+ " response_model=Character,\n",
")\n",
"resp.model_dump()"
]
@@ -123,56 +141,37 @@
"id": "803e0ce6-6e7e-4d86-a7a8-49ebaad0a40b",
"metadata": {},
"source": [
- "## Arbitrary long properties\n",
+ "## Arbitrary properties\n",
"\n",
- "Often times there are long properties that you might want to extract from data that we can not specify in advanced. We can get around this by defining an arbitrary key value store like so:"
+ "Often there are properties you might want to extract from data that we cannot specify in advance. We can get around this by defining an arbitrary key-value store like so:\n"
]
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 13,
"id": "0e7938b8-4666-4df4-bd80-f53e8baf7550",
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "{'age': 38,\n",
- " 'name': 'Severus Snape',\n",
- " 'house': 'Slytherin',\n",
- " 'properties': [{'key': 'position', 'value': 'Professor of Potions'},\n",
- " {'key': 'loyalty', 'value': 'Dumbledore, Hogwarts'},\n",
- " {'key': 'patronus', 'value': 'Doe'},\n",
- " {'key': 'skill', 'value': 'Occlumency'}]}"
- ]
- },
- "execution_count": 7,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"from typing import List\n",
"\n",
+ "\n",
"class Property(BaseModel):\n",
" key: str = Field(description=\"Must be snake case\")\n",
" value: str\n",
"\n",
+ "\n",
"class Character(BaseModel):\n",
" age: int\n",
" name: str\n",
" house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
" properties: List[Property]\n",
"\n",
+ "\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"Snape from Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Character\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Snape from Harry Potter\"}],\n",
+ " response_model=Character,\n",
")\n",
"resp.model_dump()"
]
@@ -182,16 +181,16 @@
"id": "b3e62f68-a79f-4f65-9c1f-726e4e2d340a",
"metadata": {},
"source": [
- "## Limiting the length of lists \n",
+ "## Limiting the length of lists\n",
"\n",
"In later chapters we'll talk about how to use validators to assert the length of lists but we can also use prompting tricks to enumerate values. Here we'll define a index to count the properties.\n",
"\n",
- "In this following example instead of extraction we're going to work on generation instead."
+ "In the following example, we'll work on generation instead of extraction.\n"
]
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": null,
"id": "69a58d01-ab6f-41b6-bc0c-b0e55fdb6fe4",
"metadata": {},
"outputs": [
@@ -201,20 +200,22 @@
"{'age': 38,\n",
" 'name': 'Severus Snape',\n",
" 'house': 'Slytherin',\n",
- " 'properties': [{'index': '1',\n",
- " 'key': 'Occupation',\n",
- " 'value': 'Professor of Potions and later Defence Against the Dark Arts'},\n",
+ " 'properties': [{'index': '1', 'key': 'patronus', 'value': 'Doe'},\n",
" {'index': '2',\n",
- " 'key': 'Allegiance',\n",
- " 'value': 'Order of the Phoenix, Hogwarts'},\n",
- " {'index': '3', 'key': 'Patronus', 'value': 'Doe'},\n",
+ " 'key': 'position',\n",
+ " 'value': 'Potions Master, Defense Against the Dark Arts teacher, Headmaster'},\n",
+ " {'index': '3',\n",
+ " 'key': 'loyalty',\n",
+ " 'value': 'Hogwarts, Albus Dumbledore, Order of the Phoenix, Lily Evans'},\n",
" {'index': '4',\n",
- " 'key': 'Skills',\n",
- " 'value': 'Potions master, Occlumens, Legilimens'},\n",
- " {'index': '5', 'key': 'Portrayed by', 'value': 'Alan Rickman'}]}"
+ " 'key': 'skills',\n",
+ " 'value': 'Potions expertise, Occlumency, Legilimency'},\n",
+ " {'index': '5',\n",
+ " 'key': 'disguised_loyalty',\n",
+ " 'value': 'Death Eater (formerly, as a double agent)'}]}"
]
},
- "execution_count": 8,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -222,24 +223,24 @@
"source": [
"class Property(BaseModel):\n",
" index: str = Field(..., description=\"Monotonically increasing ID\")\n",
- " key: str\n",
+ " key: str = Field(description=\"Must be snake case\")\n",
" value: str\n",
"\n",
+ "\n",
"class Character(BaseModel):\n",
" age: int\n",
" name: str\n",
" house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
- " properties: List[Property] = Field(..., description=\"Numbered list of arbitrary extracted properties, should be exactly 5\")\n",
+ " properties: List[Property] = Field(\n",
+ " ...,\n",
+ " description=\"Numbered list of arbitrary extracted properties, should be exactly 5\",\n",
+ " )\n",
+ "\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"Snape from Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Character\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Snape from Harry Potter\"}],\n",
+ " response_model=Character,\n",
")\n",
"resp.model_dump()"
]
@@ -251,68 +252,78 @@
"source": [
"## Defining Multiple Entities\n",
"\n",
- "Now that we see a single entity with many properties we can continue to nest them into many users"
+ "Now that we've seen a single entity with many properties, we can extract multiple entities in one call by setting `response_model=Iterable[Character]`.\n"
]
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": null,
"id": "1f2a2b14-a956-4f96-90c9-e11ca04ab7d1",
"metadata": {},
"outputs": [
{
- "data": {
- "text/plain": [
- "{'users': [{'age': 38,\n",
- " 'name': 'Severus Snape',\n",
- " 'house': 'Slytherin',\n",
- " 'properties': [{'index': '1',\n",
- " 'key': 'Role',\n",
- " 'value': 'Professor of Potions, later Defence Against the Dark Arts, and head of Slytherin House'},\n",
- " {'index': '2', 'key': 'Patronus', 'value': 'Doe'},\n",
- " {'index': '3', 'key': 'Loyalty', 'value': 'Dumbledore, Harry, Hogwarts'},\n",
- " {'index': '4',\n",
- " 'key': 'Special Skill',\n",
- " 'value': 'Occlumens, Potions Master'},\n",
- " {'index': '5', 'key': 'Played by', 'value': 'Alan Rickman'}]},\n",
- " {'age': 115,\n",
- " 'name': 'Albus Dumbledore',\n",
- " 'house': 'Gryffindor',\n",
- " 'properties': [{'index': '1',\n",
- " 'key': 'Role',\n",
- " 'value': 'Headmaster of Hogwarts'},\n",
- " {'index': '2', 'key': 'Patronus', 'value': 'Phoenix'},\n",
- " {'index': '3',\n",
- " 'key': 'Loyalty',\n",
- " 'value': 'Order of the Phoenix, Hogwarts'},\n",
- " {'index': '4',\n",
- " 'key': 'Special Skill',\n",
- " 'value': 'Considered to be the most powerful wizard of his time'},\n",
- " {'index': '5',\n",
- " 'key': 'Played by',\n",
- " 'value': 'Richard Harris (films 1-2), Michael Gambon (films 3-6)'}]}]}"
- ]
- },
- "execution_count": 9,
- "metadata": {},
- "output_type": "execute_result"
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "age=38 name='Severus Snape' house='Slytherin'\n",
+ "age=115 name='Albus Dumbledore' house='Gryffindor'\n"
+ ]
}
],
"source": [
- "class Characters(BaseModel):\n",
- " users: List[Character]\n",
+ "from typing import Iterable\n",
+ "\n",
+ "\n",
+ "class Character(BaseModel):\n",
+ " age: int\n",
+ " name: str\n",
+ " house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
+ "\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"Snape and Dumbledore from Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Characters\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Snape and Dumbledore from Harry Potter\"}],\n",
+ " response_model=Iterable[Character],\n",
")\n",
- "resp.model_dump()"
+ "\n",
+ "for character in resp:\n",
+ " print(character)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a3091aba",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "age=38 name='Severus Snape' house='Slytherin'\n",
+ "age=115 name='Albus Dumbledore' house='Gryffindor'\n"
+ ]
+ }
+ ],
+ "source": [
+ "from typing import Iterable\n",
+ "\n",
+ "\n",
+ "class Character(BaseModel):\n",
+ " age: int\n",
+ " name: str\n",
+ " house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
+ "\n",
+ "\n",
+ "resp = client.chat.completions.create(\n",
+ " model=\"gpt-4-1106-preview\",\n",
+ " messages=[{\"role\": \"user\", \"content\": \"Snape and Dumbledore from Harry Potter\"}],\n",
+ " stream=True,\n",
+ " response_model=Iterable[Character],\n",
+ ")\n",
+ "\n",
+ "for character in resp:\n",
+ " print(character)"
]
},
{
@@ -320,30 +331,27 @@
"id": "f6ed3144-bde1-4033-9c94-a6926fa079d2",
"metadata": {},
"source": [
- "## Defining Relationships \n",
+ "## Defining Relationships\n",
"\n",
- "Now only can we define lists of users, with list of properties one of the more interesting things I've learned about prompting is that we can also easily define lists of references."
+ "Not only can we define lists of users with lists of properties, but we can also easily define lists of references. This is one of the more interesting things I've learned about prompting.\n"
]
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": null,
"id": "6de8768e-b36a-4a51-9cf9-940d178552f6",
"metadata": {},
"outputs": [
{
- "data": {
- "text/plain": [
- "{'users': [{'id': 1, 'name': 'Harry Potter', 'friends': [2, 3, 4, 5]},\n",
- " {'id': 2, 'name': 'Hermione Granger', 'friends': [1, 3, 4, 5]},\n",
- " {'id': 3, 'name': 'Ron Weasley', 'friends': [1, 2, 4, 5]},\n",
- " {'id': 4, 'name': 'Ginny Weasley', 'friends': [1, 2, 3, 5]},\n",
- " {'id': 5, 'name': 'Neville Longbottom', 'friends': [1, 2, 3, 4]}]}"
- ]
- },
- "execution_count": 12,
- "metadata": {},
- "output_type": "execute_result"
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "id=1 name='Harry Potter' friends=[2, 3, 4, 5]\n",
+ "id=2 name='Hermione Granger' friends=[1, 3, 4, 5]\n",
+ "id=3 name='Ron Weasley' friends=[1, 2, 4, 5]\n",
+ "id=4 name='Draco Malfoy' friends=[5]\n",
+ "id=5 name='Neville Longbottom' friends=[1, 2, 3, 4]\n"
+ ]
}
],
"source": [
@@ -352,25 +360,21 @@
" name: str\n",
" friends: List[int]\n",
"\n",
- "class Characters(BaseModel):\n",
- " users: List[Character]\n",
"\n",
"resp = client.chat.completions.create(\n",
" model=\"gpt-4-1106-preview\",\n",
- " messages=[\n",
- " {\n",
- " \"role\": \"user\", \n",
- " \"content\": \"The 5 kids from Harry Potter\"\n",
- " }\n",
- " ],\n",
- " response_model=Characters\n",
+ " messages=[{\"role\": \"user\", \"content\": \"The 5 kids from Harry Potter\"}],\n",
+ " stream=True,\n",
+ " response_model=Iterable[Character],\n",
")\n",
- "resp.model_dump()"
+ "\n",
+ "for character in resp:\n",
+ " print(character)"
]
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": null,
"id": "b31e10d7-ebd2-49b4-b2c4-20dd67ca135d",
"metadata": {},
"outputs": [
@@ -383,105 +387,93 @@
"\n",
"\n",
- "