Files
langchain/docs/extras/modules/chains/additional/extraction.ipynb
T
Harrison Chase 6a4a950a3c changes to llm chain (#6328)
- return raw and full output (but keep run shortcut method functional)
- change output parser to take in generations (good for working with
messages)
- add output parser to base class, always run (default to same as
current)

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-06-18 22:49:47 -07:00

272 lines
7.1 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "6605e7f7",
"metadata": {},
"source": [
"# Extraction\n",
"\n",
"The extraction chain uses the OpenAI `functions` parameter to specify a schema to extract entities from a document. This helps us make sure that the model outputs exactly the schema of entities and properties that we want, with their appropriate types.\n",
"\n",
"The extraction chain is to be used when we want to extract several entities with their properties from the same passage (i.e. what people were mentioned in this passage?)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "34f04daf",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic\n",
"from langchain.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a2648974",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
]
},
{
"cell_type": "markdown",
"id": "5ef034ce",
"metadata": {},
"source": [
"## Extracting entities"
]
},
{
"cell_type": "markdown",
"id": "78ff9df9",
"metadata": {},
"source": [
"To extract entities, we need to create a schema like the following, were we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4ac43eba",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"person_name\", \"person_height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "640bd005",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "64313214",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "17c48adb",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cc5436ed",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
" 'person_height': 5,\n",
" 'person_hair_color': 'blonde',\n",
" 'dog_name': 'Frosty',\n",
" 'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'}]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "698b4c4d",
"metadata": {},
"source": [
"## Pydantic example"
]
},
{
"cell_type": "markdown",
"id": "6504a6d9",
"metadata": {},
"source": [
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
"\n",
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
"\n",
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6792866b",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, List\n",
"from pydantic import BaseModel, Field"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "36a63761",
"metadata": {},
"outputs": [],
"source": [
"class Properties(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8ffd1e57",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "24baa954",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"id": "84e0a241",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f771df58",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0df61283",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}