diff --git a/docs/examples/classification.md b/docs/examples/classification.md
new file mode 100644
index 0000000..db4b74c
--- /dev/null
+++ b/docs/examples/classification.md
@@ -0,0 +1,117 @@
+# Example: Text Classification using OpenAI and Pydantic
+
+This tutorial showcases how to implement text classification tasks, specifically single-label and multi-label classification, using the OpenAI API, Python's **`enum`** module, and Pydantic models.
+
+!!! tips "Motivation"
+
+    Text classification is a common problem in many NLP applications, such as spam detection or support ticket categorization. The goal is to provide a systematic way to handle these cases using OpenAI's GPT models in combination with Python data structures.
+
+## Single-Label Classification
+
+### Defining the Structures
+
+For single-label classification, we first define an **`enum`** for possible labels and a Pydantic model for the output.
+
+```python
+import enum
+from pydantic import BaseModel
+
+class Labels(str, enum.Enum):
+    """Enumeration for single-label text classification."""
+    SPAM = "spam"
+    NOT_SPAM = "not_spam"
+
+class SinglePrediction(BaseModel):
+    """
+    Class for a single class label prediction.
+    """
+    class_label: Labels
+
+```
+
+### Classifying Text
+
+The function **`classify`** will perform the single-label classification.
+
+```python
+import openai
+
+def classify(data: str) -> SinglePrediction:
+    """Perform single-label classification on the input text."""
+    return openai.ChatCompletion.create(
+        model="gpt-3.5-turbo-0613",
+        response_model=SinglePrediction,
+        messages=[
+            {
+                "role": "user",
+                "content": f"Classify the following text: {data}",
+            },
+        ],
+    )  # type: ignore
+
+```
+
+### Testing and Evaluation
+
+Let's run an example to see if it correctly identifies a spam message.
+
+```python
+# Test single-label classification
+prediction = classify("Hello there I'm a Nigerian prince and I want to give you money")
+assert prediction.class_label == Labels.SPAM
+
+```
+
+## Multi-Label Classification
+
+### Defining the Structures
+
+For multi-label classification, we introduce a new enum class and a different Pydantic model to handle multiple labels.
+
+```python
+# Define Enum class for multiple labels
+class MultiLabels(str, enum.Enum):
+    TECH_ISSUE = "tech_issue"
+    BILLING = "billing"
+    GENERAL_QUERY = "general_query"
+
+# Define the multi-class prediction model
+class MultiClassPrediction(BaseModel):
+    """
+    Class for a multi-class label prediction.
+    """
+    class_labels: List[MultiLabels]
+
+```
+
+### Classifying Text
+
+The function **`multi_classify`** is responsible for multi-label classification.
+
+```python
+def multi_classify(data: str) -> MultiClassPrediction:
+    """Perform multi-label classification on the input text."""
+    return openai.ChatCompletion.create(
+        model="gpt-3.5-turbo-0613",
+        response_model=MultiClassPrediction,
+        messages=[
+            {
+                "role": "user",
+                "content": f"Classify the following support ticket: {data}",
+            },
+        ],
+    )  # type: ignore
+
+```
+
+### Testing and Evaluation
+
+Finally, we test the multi-label classification function using a sample support ticket.
+
+```python
+# Test multi-label classification
+ticket = "My account is locked and I can't access my billing info."
+prediction = multi_classify(ticket)
+assert MultiLabels.TECH_ISSUE in prediction.class_labels
+assert MultiLabels.BILLING in prediction.class_labels
+```
\ No newline at end of file
diff --git a/docs/examples/index.md b/docs/examples/index.md
index 107ebe3..6c23462 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -4,6 +4,7 @@ Welcome to the examples page. Here you will find emails that highlight a range o
 
 ## Quick Links
 
+- [Classifying Text](classification.md)
 - [Segmenting search requests into multiple search queries](search.md)
 - [One shot query planning](planning-tasks.md)
 - [Using recursive schema](recursive.md)
@@ -15,7 +16,9 @@ Welcome to the examples page. Here you will find emails that highlight a range o
 
 In this section, you will find examples demonstrating different aspects of our project's functionality.
 
-- [Segmented Search](search.md): Learn how to perform segmented search using a multi task definition using function calling
+- [Classifying Text](classification.md): Perform single-label and multi-label classification using enums.
+
+- [Segmented Search](search.md): Learn how to perform segmented search using a multi task definition using function calling
 
 - [One shot Query Planning](planning-tasks.md): Explore how to plan and decompose a complex query into multiple subqueries in a single request.
 
diff --git a/examples/classification/multi_prediction.py b/examples/classification/multi_prediction.py
new file mode 100644
index 0000000..1f4f261
--- /dev/null
+++ b/examples/classification/multi_prediction.py
@@ -0,0 +1,44 @@
+from typing import List
+import enum
+import openai
+from pydantic import BaseModel
+from instructor import patch
+
+patch()
+
+
+# Define new Enum class for multiple labels
+class MultiLabels(str, enum.Enum):
+    TECH_ISSUE = "tech_issue"
+    BILLING = "billing"
+    GENERAL_QUERY = "general_query"
+
+
+# Adjust the prediction model to accommodate a list of labels
+class MultiClassPrediction(BaseModel):
+    """
+    List of correct class labels for the given text (Multi Class)
+    """
+
+    class_labels: List[MultiLabels]
+
+
+# Modify the classify function
+def multi_classify(data: str) -> MultiClassPrediction:
+    return openai.ChatCompletion.create(
+        model="gpt-3.5-turbo-0613",
+        response_model=MultiClassPrediction,
+        messages=[
+            {
+                "role": "user",
+                "content": f"Classify the following support ticket: {data}",
+            },
+        ],
+    )  # type: ignore
+
+
+# Example using a support ticket
+ticket = "My account is locked and I can't access my billing info."
+prediction = multi_classify(ticket)
+assert MultiLabels.TECH_ISSUE in prediction.class_labels
+assert MultiLabels.BILLING in prediction.class_labels
diff --git a/examples/classification/simple_prediction.py b/examples/classification/simple_prediction.py
new file mode 100644
index 0000000..c31a6c7
--- /dev/null
+++ b/examples/classification/simple_prediction.py
@@ -0,0 +1,36 @@
+import enum
+import openai
+from pydantic import BaseModel
+from instructor import patch
+
+patch()
+
+
+class Labels(str, enum.Enum):
+    SPAM = "spam"
+    NOT_SPAM = "not_spam"
+
+
+class SinglePrediction(BaseModel):
+    """
+    Correct class label for the given text
+    """
+
+    class_label: Labels
+
+
+def classify(data: str) -> SinglePrediction:
+    return openai.ChatCompletion.create(
+        model="gpt-3.5-turbo-0613",
+        response_model=SinglePrediction,
+        messages=[
+            {
+                "role": "user",
+                "content": f"Classify the following text: {data}",
+            },
+        ],
+    )  # type: ignore
+
+
+prediction = classify("Hello there I'm a Nigerian prince and I want to give you money")
+assert prediction.class_label == Labels.SPAM
diff --git a/examples/reference-citation/readme.md b/examples/reference-citation/readme.md
new file mode 100644
index 0000000..af0134f
--- /dev/null
+++ b/examples/reference-citation/readme.md
@@ -0,0 +1,20 @@
+# Legal Document Entity Resolution
+
+This example demonstrates how to use an entity resolution system to extract and resolve entities from a legal document. The system leverages OpenAI's GPT-4 language model to achieve this task. The primary purpose of this example is to showcase the capabilities of the entity resolution system in a simple and illustrative manner.
+
+## Overview
+The entity resolution system processes a given legal document and identifies key entities such as parties, dates, terms, and clauses. It then resolves relevant information to provide a structured output. This example uses a Python script to interact with the system and demonstrates the process with a sample legal contract.
+
+## How to Use
+
+* **Input Document:** Provide the legal document you want to analyze. The document should include relevant legal terms, dates, parties' names, and other pertinent information.
+
+* **Entity Extraction:** The system employs the GPT-4 model to extract entities from the input document.
+
+* **Entity Resolution:** Extracted entities are resolved to their absolute values when applicable. For instance, relative date phrases are converted to specific dates.
+
+* **Dependency Handling:** The system identifies dependencies between entities. If one entity's resolution depends on another's, it ensures proper order of resolution.
+
+## Limitations
+
+The context window is the biggest limit on document size, but I imagine a system where you stream chunks of the document into a model that accumulates the entities in some state and formats a simple version back into the prompt (id, name, absolute_resolved_value), so that the output emits only 'new' entities; think of it as accumulating the object chunk by chunk.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index b94b72a..55fbf5e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -49,6 +49,7 @@ nav:
   - Philosophy: 'philosophy.md'
   - Use Cases:
     - 'Overview': 'examples/index.md'
+    - 'Classification': 'examples/classification.md'
     - 'Segmented Search': 'examples/search.md'
     - 'One shot Query Planning': 'examples/planning-tasks.md'
     - 'Recursive Schemas': 'examples/recursive.md'
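
The classification examples in this patch use `str`-valued enums so that a label string returned by the model can be checked against a closed set of labels. The following is a rough, standard-library-only sketch of that check, with no OpenAI call and no Pydantic. The `parse_labels` helper is hypothetical and not part of the patch; it only illustrates the by-value enum lookup that the `class_label` and `class_labels` fields depend on.

```python
import enum
from typing import List


class MultiLabels(str, enum.Enum):
    # Same members as the enum added in this patch.
    TECH_ISSUE = "tech_issue"
    BILLING = "billing"
    GENERAL_QUERY = "general_query"


def parse_labels(raw: List[str]) -> List[MultiLabels]:
    """Hypothetical helper: map raw strings to enum members.

    Enum lookup by value raises ValueError for any string that is
    outside the declared label set.
    """
    return [MultiLabels(value) for value in raw]


labels = parse_labels(["tech_issue", "billing"])
assert labels == [MultiLabels.TECH_ISSUE, MultiLabels.BILLING]

# The str mixin lets members compare equal to their plain string values.
assert MultiLabels.BILLING == "billing"

# A label outside the enum is rejected instead of passing through silently.
try:
    parse_labels(["refund"])
except ValueError:
    unknown_label_rejected = True
else:
    unknown_label_rejected = False
assert unknown_label_rejected
```

Pydantic performs the same by-value lookup when it validates an enum-typed field, so an unexpected label from the model surfaces as a validation error rather than as an arbitrary string.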