mirror of
https://github.com/kennethreitz/instructor.git
synced 2026-06-05 22:50:18 +00:00
Update classification
@@ -0,0 +1,117 @@
# Example: Text Classification using OpenAI and Pydantic

This tutorial shows how to implement text classification, both single-label and multi-label, using the OpenAI API, Python's **`enum`** module, and Pydantic models.

!!! tips "Motivation"

    Text classification is a common problem in many NLP applications, such as spam detection or support ticket categorization. The goal is to provide a systematic way to handle these cases using OpenAI's GPT models in combination with Python data structures.

## Single-Label Classification

### Defining the Structures

For single-label classification, we first define an **`enum`** for the possible labels and a Pydantic model for the output.

```python
import enum

from pydantic import BaseModel


class Labels(str, enum.Enum):
    """Enumeration for single-label text classification."""

    SPAM = "spam"
    NOT_SPAM = "not_spam"


class SinglePrediction(BaseModel):
    """
    Class for a single class label prediction.
    """

    class_label: Labels
```
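
To make the role of the enum concrete, here is a minimal offline sketch (no API call involved) of how Pydantic treats the `class_label` field. It repeats the two definitions above so that it runs on its own; the value `"maybe_spam"` is a made-up example of a label outside the enum.

```python
import enum

from pydantic import BaseModel, ValidationError


class Labels(str, enum.Enum):
    SPAM = "spam"
    NOT_SPAM = "not_spam"


class SinglePrediction(BaseModel):
    class_label: Labels


# A raw string that matches an enum value is coerced to the enum member.
ok = SinglePrediction(class_label="spam")
print(ok.class_label is Labels.SPAM)

# A value outside the enum is rejected rather than passed through.
try:
    SinglePrediction(class_label="maybe_spam")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Because the model can only be constructed with a valid label, any response that parses into `SinglePrediction` is guaranteed to carry one of the declared classes.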
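
### Classifying Text

The **`classify`** function performs single-label classification by passing `response_model=SinglePrediction` to `openai.ChatCompletion.create`. The `response_model` argument is available once `instructor.patch()` has been called, as in the example script that accompanies this page.
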
```python
import openai
from instructor import patch

# Patch the OpenAI client so `ChatCompletion.create` accepts `response_model`.
patch()


def classify(data: str) -> SinglePrediction:
    """Perform single-label classification on the input text."""
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=SinglePrediction,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following text: {data}",
            },
        ],
    )  # type: ignore
```

### Testing and Evaluation

Let's run an example to see if it correctly identifies a spam message.

```python
# Test single-label classification
prediction = classify("Hello there I'm a Nigerian prince and I want to give you money")
assert prediction.class_label == Labels.SPAM
```

## Multi-Label Classification

### Defining the Structures

For multi-label classification, we introduce a new enum class and a different Pydantic model to handle multiple labels.

```python
from typing import List


# Define Enum class for multiple labels
class MultiLabels(str, enum.Enum):
    TECH_ISSUE = "tech_issue"
    BILLING = "billing"
    GENERAL_QUERY = "general_query"


# Define the multi-class prediction model
class MultiClassPrediction(BaseModel):
    """
    Class for a multi-class label prediction.
    """

    class_labels: List[MultiLabels]
```

### Classifying Text

The function **`multi_classify`** is responsible for multi-label classification.

```python
def multi_classify(data: str) -> MultiClassPrediction:
    """Perform multi-label classification on the input text."""
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=MultiClassPrediction,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following support ticket: {data}",
            },
        ],
    )  # type: ignore
```

### Testing and Evaluation

Finally, we test the multi-label classification function using a sample support ticket.

```python
# Test multi-label classification
ticket = "My account is locked and I can't access my billing info."
prediction = multi_classify(ticket)
assert MultiLabels.TECH_ISSUE in prediction.class_labels
assert MultiLabels.BILLING in prediction.class_labels
```
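
A single assertion only checks one ticket. As a rough sketch of how evaluation could be extended to a small labeled set, the snippet below computes an exact-match rate and a mean Jaccard score over label sets. Everything in it is an assumption for illustration: the helper names `jaccard` and `evaluate`, the three-item dataset and its labels, and `keyword_classifier`, which is a hypothetical offline stand-in for `multi_classify` so the example runs without an API key.

```python
from typing import Callable, List, Set, Tuple


def jaccard(predicted: Set[str], expected: Set[str]) -> float:
    """Intersection over union of two label sets; 1.0 when both are empty."""
    if not predicted and not expected:
        return 1.0
    return len(predicted & expected) / len(predicted | expected)


def evaluate(
    classify_fn: Callable[[str], Set[str]],
    dataset: List[Tuple[str, Set[str]]],
) -> Tuple[float, float]:
    """Return (exact-match rate, mean Jaccard score) over the dataset."""
    exact = 0
    total_jaccard = 0.0
    for text, expected in dataset:
        predicted = classify_fn(text)
        exact += predicted == expected
        total_jaccard += jaccard(predicted, expected)
    n = len(dataset)
    return exact / n, total_jaccard / n


# Hypothetical stand-in for multi_classify so the sketch runs offline.
def keyword_classifier(text: str) -> Set[str]:
    labels = set()
    lowered = text.lower()
    if "billing" in lowered:
        labels.add("billing")
    if "locked" in lowered or "error" in lowered:
        labels.add("tech_issue")
    return labels or {"general_query"}


# Made-up tickets and labels, for illustration only.
dataset = [
    ("My account is locked and I can't access my billing info.", {"tech_issue", "billing"}),
    ("What are your opening hours?", {"general_query"}),
    ("I was charged twice on my billing statement and got an error.", {"billing"}),
]

exact_rate, mean_jaccard = evaluate(keyword_classifier, dataset)
print(f"exact match: {exact_rate:.2f}, mean Jaccard: {mean_jaccard:.2f}")
```

To score the real classifier, one could pass a wrapper such as `lambda t: {label.value for label in multi_classify(t).class_labels}` as `classify_fn`.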
@@ -4,6 +4,7 @@ Welcome to the examples page. Here you will find emails that highlight a range o

## Quick Links

- [Classifying Text](classification.md)
- [Segmenting search requests into multiple search queries](search.md)
- [One shot query planning](planning-tasks.md)
- [Using recursive schema](recursive.md)

@@ -15,7 +16,9 @@ Welcome to the examples page. Here you will find emails that highlight a range o

In this section, you will find examples demonstrating different aspects of our project's functionality.

- [Segmented Search](search.md): Learn how to perform segmented search using a multi task definition using function calling
- [Classifying Text](classification.md): Single-label and multi-label prediction using enums.

- [Segmented Search](search.md): Learn how to perform segmented search using a multi task definition using function calling

- [One shot Query Planning](planning-tasks.md): Explore how to plan and decompose a complex query into multiple subqueries in a single request.

@@ -0,0 +1,44 @@
from typing import List
import enum
import openai
from pydantic import BaseModel
from instructor import patch

patch()


# Define new Enum class for multiple labels
class MultiLabels(str, enum.Enum):
    TECH_ISSUE = "tech_issue"
    BILLING = "billing"
    GENERAL_QUERY = "general_query"


# Adjust the prediction model to accommodate a list of labels
class MultiClassPrediction(BaseModel):
    """
    List of correct class labels for the given text (Multi Class)
    """

    class_labels: List[MultiLabels]


# Modify the classify function
def multi_classify(data: str) -> MultiClassPrediction:
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=MultiClassPrediction,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following support ticket: {data}",
            },
        ],
    )  # type: ignore


# Example using a support ticket
ticket = "My account is locked and I can't access my billing info."
prediction = multi_classify(ticket)
assert MultiLabels.TECH_ISSUE in prediction.class_labels
assert MultiLabels.BILLING in prediction.class_labels
@@ -0,0 +1,36 @@
import enum
import openai
from pydantic import BaseModel
from instructor import patch

patch()


class Labels(str, enum.Enum):
    SPAM = "spam"
    NOT_SPAM = "not_spam"


class SinglePrediction(BaseModel):
    """
    Correct class label for the given text
    """

    class_label: Labels


def classify(data: str) -> SinglePrediction:
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        response_model=SinglePrediction,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following text: {data}",
            },
        ],
    )  # type: ignore


prediction = classify("Hello there I'm a nigerian prince and I want to give you money")
assert prediction.class_label == Labels.SPAM
@@ -0,0 +1,20 @@
# Legal Document Entity Resolution

This example demonstrates how to use an entity resolution system to extract and resolve entities from a legal document. The system leverages OpenAI's GPT-4 language model to achieve this task. The primary purpose of this example is to showcase the capabilities of the entity resolution system in a simple and illustrative manner.

## Overview

The entity resolution system processes a given legal document and identifies key entities such as parties, dates, terms, and clauses. It then resolves relevant information to provide a structured output. This example uses a Python script to interact with the system and demonstrates the process with a sample legal contract.

## How to Use

* **Input Document:** Provide the legal document you want to analyze. The document should include relevant legal terms, dates, parties' names, and other pertinent information.

* **Entity Extraction:** The system employs the GPT-4 model to extract entities from the input document.

* **Entity Resolution:** Extracted entities are resolved to their absolute values when applicable. For instance, relative date phrases are converted to specific dates.

* **Dependency Handling:** The system identifies dependencies between entities. If one entity's resolution depends on another's, it ensures proper order of resolution.
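
The dependency-handling step can be pictured as a topological sort over entities. The sketch below is not the example's actual implementation: the entity ids, the dates, and the `(depends_on, resolver)` layout are all hypothetical, and it uses the standard library's `graphlib` (Python 3.9+) to resolve each entity only after the ones it depends on.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter

# Hypothetical extracted entities: id -> (depends_on, resolver).
# A resolver receives the already-resolved values and returns an absolute value.
entities = {
    "effective_date": ([], lambda r: date(2023, 1, 1)),
    "term_end": (
        ["effective_date"],
        lambda r: r["effective_date"] + timedelta(days=365),
    ),
    "notice_deadline": (
        ["term_end"],
        lambda r: r["term_end"] - timedelta(days=30),
    ),
}

# Map each entity to the set of entities that must be resolved before it.
graph = {eid: set(deps) for eid, (deps, _) in entities.items()}

resolved = {}
# static_order() yields every entity after all of its dependencies.
for eid in TopologicalSorter(graph).static_order():
    _, resolver = entities[eid]
    resolved[eid] = resolver(resolved)

for eid, value in resolved.items():
    print(eid, value.isoformat())
```

A cyclic dependency (two entities that each depend on the other) would make `static_order()` raise `graphlib.CycleError`, which is a convenient signal that the extracted dependencies are inconsistent.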

## Limitations

The context window is the biggest limitation on the size of the document, but I imagine a system where you stream chunks of the document into a model that accumulates the entities in some state and formats a simple version of that state back into the prompt (id, name, absolute_resolved_value). The output then emits only 'new' entities, so you can think of it as accumulating the object chunk by chunk.
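
The idea is only described here, not implemented, so the following is a speculative sketch of what such an accumulation loop might look like. The `Entity` tuple mirrors the (id, name, absolute_resolved_value) fields mentioned above; `summarize`, `accumulate`, and `fake_extract` are hypothetical names, and `fake_extract` is a toy stand-in for the model call that simply treats capitalized words as entities.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Entity(NamedTuple):
    id: str
    name: str
    absolute_resolved_value: str


def summarize(state: Dict[str, Entity]) -> str:
    """Compact view of known entities, meant to be placed back into the prompt."""
    return "\n".join(
        f"{e.id}, {e.name}, {e.absolute_resolved_value}" for e in state.values()
    )


def accumulate(
    chunks: Iterable[str],
    extract_new: Callable[[str, str], List[Entity]],
) -> Dict[str, Entity]:
    """Feed chunks one at a time, keeping a running state of entities."""
    state: Dict[str, Entity] = {}
    for chunk in chunks:
        # The extractor sees the chunk plus a summary of what is already known
        # and is expected to return only entities that are new.
        for entity in extract_new(chunk, summarize(state)):
            state.setdefault(entity.id, entity)
    return state


# Hypothetical stand-in for the model call: capitalized words are "entities".
def fake_extract(chunk: str, known: str) -> List[Entity]:
    known_ids = {line.split(", ")[0] for line in known.splitlines()}
    return [
        Entity(id=word.lower(), name=word, absolute_resolved_value=word)
        for word in chunk.split()
        if word.istitle() and word.lower() not in known_ids
    ]


state = accumulate(["Acme hires Bob", "Bob pays Acme and Carol"], fake_extract)
print(list(state))
```

Only the summary grows with the document, not the raw text, which is what would keep each request within the context window in this scheme.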
@@ -49,6 +49,7 @@ nav:
  - Philosophy: 'philosophy.md'
  - Use Cases:
    - 'Overview': 'examples/index.md'
    - 'Classification': 'examples/classification.md'
    - 'Segmented Search': 'examples/search.md'
    - 'One shot Query Planning': 'examples/planning-tasks.md'
    - 'Recursive Schemas': 'examples/recursive.md'