doc: lead extraction hub example (#459)

Co-authored-by: Jason Liu <jxnl@users.noreply.github.com>
Stephen
2024-02-22 20:34:08 -05:00
committed by GitHub
parent 681ca6cdc3
commit 35db326d52
2 changed files with 98 additions and 0 deletions
+97
@@ -0,0 +1,97 @@
# Customer Information Extraction
In this guide, we'll walk through how to extract customer lead information using OpenAI's API and Pydantic. This pattern is useful for automating the extraction of specific details, such as a name and phone number, from free-form text.
If you want to try this out via `instructor hub`, you can pull it by running:
```bash
instructor hub pull --slug extract_contact_info --py > extract_contact_info.py
```
## Motivation
You could integrate this into a chatbot to extract relevant information from user messages. Combining model-driven extraction with automatic validation reduces the need for a human to verify the information.
## Defining the Structure
We'll model a customer lead as a `Lead` object with attributes for the name and phone number. We'll use the `PhoneNumber` type from `pydantic_extra_types` to validate the phone number, and a `Field` description to give the model more information on how to populate the object correctly.
## Extracting Lead Information
To extract lead information, we create the `parse_lead_from_message` function, which uses Instructor. It calls OpenAI's API, processes the text, and returns the extracted lead information as `Lead` objects.
## Evaluating Lead Extraction
To showcase the `parse_lead_from_message` function, we can provide sample user messages of the kind that might come from a dialogue with a chatbot assistant. Note that the response model is set to `Iterable[Lead]`, which allows multiple leads to be extracted from the same message.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from pydantic_extra_types.phone_numbers import PhoneNumber
from typing import Iterable


class Lead(BaseModel):
    name: str
    phone_number: PhoneNumber = Field(
        description="Needs to be a phone number with a country code. If none, assume +1"
    )

    # Can define some function here to send Lead information to a database using an API


client = instructor.patch(OpenAI())


def parse_lead_from_message(user_message: str):
    return client.chat.completions.create(
        model="gpt-4-turbo-preview",
        response_model=Iterable[Lead],
        messages=[
            {
                "role": "system",
                "content": "You are a data extraction system that extracts a user's name and phone number from a message.",
            },
            {
                "role": "user",
                "content": f"Extract the user's lead information from this user's message: {user_message}",
            },
        ],
    )


if __name__ == "__main__":
    lead = parse_lead_from_message(
        "Yes, that would be great if someone can reach out my name is Patrick King 9175554587"
    )
    assert all(isinstance(item, Lead) for item in lead)
    for item in lead:
        print(item.model_dump_json(indent=2))
        """
        {
          "name": "Patrick King",
          "phone_number": "tel:+1-917-555-4587"
        }
        """

    # Invalid phone number example:
    try:
        lead2 = parse_lead_from_message(
            "Yes, that would be great if someone can reach out my name is Patrick King 9172234"
        )
        assert all(isinstance(item, Lead) for item in lead2)
        for item in lead2:
            print(item.model_dump_json(indent=2))
    except Exception as e:
        print("ERROR:", e)
        """
        ERROR:
        1 validation error for IterableLead
        tasks.0.phone_number
          value is not a valid phone number [type=value_error, input_value='+19172234', input_type=str]
        """
```
In this example, the `parse_lead_from_message` function extracts lead information from a user message, showing how automation can make collecting accurate customer details more efficient. The second call shows that an invalid phone number raises a validation error, so your application can catch it and prompt the user again for a correct number.
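The re-prompting idea can be sketched as a small retry loop. This is a minimal illustration, not part of the hub example: `collect_lead_with_retries`, `ask_user`, and `fake_parse` are hypothetical names, and `fake_parse` is a stand-in for `parse_lead_from_message` so the snippet runs without an API key.

```python
from typing import Callable, List, Optional


def collect_lead_with_retries(
    parse: Callable[[str], List[object]],
    ask_user: Callable[[str], str],
    first_message: str,
    max_attempts: int = 3,
) -> Optional[List[object]]:
    """Try to parse a lead; on failure, ask the user again (hypothetical helper).

    `parse` stands in for parse_lead_from_message and is expected to raise
    on an invalid phone number. `ask_user` shows a prompt and returns the reply.
    """
    message = first_message
    for attempt in range(1, max_attempts + 1):
        try:
            # Materialize the result so any validation error surfaces here.
            return list(parse(message))
        except Exception as error:  # narrow to the validation error type in real code
            if attempt == max_attempts:
                break
            message = ask_user(
                f"That phone number looks invalid ({error}). "
                "Could you send it again, including the area code?"
            )
    return None


def fake_parse(message: str) -> List[object]:
    """Stand-in parser: accepts only messages containing exactly 10 digits."""
    digits = "".join(ch for ch in message if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("value is not a valid phone number")
    return [{"name": "Patrick King", "phone_number": digits}]


# Simulated conversation: the first message is invalid, the reply is valid.
replies = iter(["My number is 9175554587"])
result = collect_lead_with_retries(
    fake_parse, lambda prompt: next(replies), "Patrick King 9172234"
)
print(result)
```

Capping the number of attempts keeps the chatbot from looping forever; after the last failure the function returns `None`, and the caller can hand the conversation to a human.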
+1
@@ -176,6 +176,7 @@ nav:
- Batch Async Classification w/ Langsmith: 'hub/batch_classification_langsmith.md'
- Action Items: 'hub/action_items.md'
- Partial Streaming: 'hub/partial_streaming.md'
- Extract Contact Info: 'hub/extract_contact_info.md'
- Tutorials:
- Tutorials (Notebooks): 'tutorials/1-introduction.ipynb'