more docs!

This commit is contained in:
Jason
2023-07-09 14:55:03 +08:00
parent 34c97e63d1
commit fad1940796
6 changed files with 650 additions and 1 deletions
+131
@@ -0,0 +1,131 @@
# Example: Converting Text into Dataframes
In this example, we'll demonstrate how to convert free text into dataframes using OpenAI Function Call. We will define the necessary data structures using Pydantic and then extract one or more tables from a passage of text.
## Defining the Data Structures
Let's start by defining the data structures required for this task: `RowData`, `Dataframe`, and `Database`.
```python
from openai_function_call import OpenAISchema
from pydantic import Field
from typing import List, Any


class RowData(OpenAISchema):
    row: List[Any] = Field(..., description="The values for each row")
    citation: str = Field(
        ..., description="The citation for this row from the original source data"
    )


class Dataframe(OpenAISchema):
    """
    Class representing a dataframe. This class is used to convert
    data into a frame that can be used by pandas.
    """

    name: str = Field(..., description="The name of the dataframe")
    data: List[RowData] = Field(
        ...,
        description="Correct rows of data aligned to column names, Nones are allowed",
    )
    columns: List[str] = Field(
        ...,
        description="Column names relevant from source data, should be in snake_case",
    )

    def to_pandas(self):
        import pandas as pd

        # Append each row's citation as a trailing "citation" column.
        columns = self.columns + ["citation"]
        data = [row.row + [row.citation] for row in self.data]
        return pd.DataFrame(data=data, columns=columns)


class Database(OpenAISchema):
    """
    A set of correct named and defined tables as dataframes
    """

    tables: List[Dataframe] = Field(
        ...,
        description="List of tables in the database",
    )
```
The `RowData` class represents a single row of data in the dataframe. It contains a `row` attribute for the values in each row and a `citation` attribute for the citation from the original source data.
The `Dataframe` class represents a dataframe and consists of a `name` attribute, a list of `RowData` objects in the `data` attribute, and a list of column names in the `columns` attribute. It also provides a `to_pandas` method to convert the dataframe into a Pandas DataFrame.
The `Database` class represents a set of tables in a database. It contains a list of `Dataframe` objects in the `tables` attribute.
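To make the row layout concrete, here is a minimal sketch of the assembly step that `to_pandas` performs, written with plain dataclasses so it runs without `pandas` or an API key. `SketchRow` and `to_records` are hypothetical names used only for this illustration; they are not part of `openai_function_call`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchRow:
    """Hypothetical stand-in for `RowData`: the row values plus their citation."""

    row: List[Any]
    citation: str


def to_records(columns: List[str], data: List[SketchRow]) -> List[Dict[str, Any]]:
    """Pair each row's values with the column names and append the citation,
    mirroring how `to_pandas` adds a trailing "citation" column."""
    header = columns + ["citation"]
    return [dict(zip(header, r.row + [r.citation])) for r in data]


records = to_records(
    ["name", "age"],
    [
        SketchRow(["John", 25], "My name is John and I am 25 years old."),
        SketchRow(["Mary", None], "Her name is Mary"),  # missing values stay None
    ],
)
print(records[0])
```

Each record carries its own citation, so every extracted value can be traced back to the sentence it came from.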
## Using the Prompt Pipeline
To convert text into dataframes, we call `openai.ChatCompletion.create` directly, pass the `Database` schema as the only available function, and force the model to call it. The `dataframe` function below takes a text as input and returns a `Database` object.
```python
import openai


def dataframe(data: str) -> Database:
    completion = openai.ChatCompletion.create(
        model="gpt-4-0613",
        temperature=0.1,
        functions=[Database.openai_schema],
        # Force the model to call the `Database` function.
        function_call={"name": Database.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": "Map this data into a dataframe and correctly define the correct columns and rows",
            },
            {
                "role": "user",
                "content": f"{data}",
            },
        ],
        max_tokens=1000,
    )
    return Database.from_response(completion)
```
The `dataframe` function takes a string `data` as input and requests a chat completion with function calling. The system message asks the model to map the data into a dataframe and define the correct columns and rows, and the `function_call` argument forces the reply to follow the `Database` schema. The resulting completion is then converted into a `Database` object with `from_response`.
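The conversion step can be pictured as decoding the JSON arguments of the function call. The sketch below assumes the response follows the Chat Completions function-calling shape and uses a hand-written response instead of a real API call. `parse_function_call` and `fake_completion` are hypothetical names for illustration; the library's actual `from_response` may differ in its details.

```python
import json
from typing import Any, Dict


def parse_function_call(completion: Dict[str, Any], expected_name: str) -> Dict[str, Any]:
    """Return the decoded arguments of the function call in the first choice."""
    message = completion["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        raise ValueError("The model did not return a function call")
    if call["name"] != expected_name:
        raise ValueError(f"Expected {expected_name!r}, got {call['name']!r}")
    # The arguments arrive as a JSON-encoded string, not as a dict.
    return json.loads(call["arguments"])


# Hand-written stand-in for an API response (assumed shape, not real output).
fake_completion = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "Database",
                    "arguments": json.dumps(
                        {"tables": [{"name": "people", "columns": ["name"], "data": []}]}
                    ),
                },
            }
        }
    ]
}

arguments = parse_function_call(fake_completion, "Database")
print(arguments["tables"][0]["name"])
```

Because the arguments are plain JSON, they can be validated by a Pydantic model such as `Database` before any downstream code touches them.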
## Evaluating an Example
Let's evaluate the example by converting a text into dataframes using the `dataframe` function and printing the resulting dataframes.
```python
dfs = dataframe("""My name is John and I am 25 years old. I live in
New York and I like to play basketball. His name is
Mike and he is 30 years old. He lives in San Francisco
and he likes to play baseball. Sarah is 20 years old
and she lives in Los Angeles. She likes to play tennis.
Her name is Mary and she is 35 years old.
She lives in Chicago.
On one team 'Tigers' the captain is John and there are 12 players.
On the other team 'Lions' the captain is Mike and there are 10 players.
""")
for df in dfs.tables:
    print(df.name)
    print(df.to_pandas())
```
The output will be:
```sh
People
    Name  Age           City Favorite Sport
0   John   25       New York     Basketball
1   Mike   30  San Francisco       Baseball
2  Sarah   20    Los Angeles         Tennis
3   Mary   35        Chicago           None
Teams
  Team Name Captain  Number of Players
0    Tigers    John                 12
1     Lions    Mike                 10
```
+188
@@ -0,0 +1,188 @@
# Example: Answering Questions with Citations
In this example, we'll demonstrate how to use OpenAI Function Call to ask an AI a question and get back an answer with correct citations. We'll define the necessary data structures using Pydantic and show how to retrieve the citations for each answer.
## Defining the Data Structures
Let's start by defining the data structures required for this task: `Fact` and `QuestionAnswer`.
!!! tip "Prompting as documentation"
    Make sure to include detailed and useful docstrings and field descriptions in your class definitions. Naming becomes very important, since names are semantically meaningful in the prompt.

    * `substring_quote` performs better than `quote` since it suggests it should be a substring of the original content.
    * Notice that the docstring contains instructions on splitting facts, which will be used by OpenAI.

!!! tip "Embedding computation"
    While it's not the best idea to go overboard and add 100 methods to your class, colocating some computation is often useful. Here we implement the substring search directly in the `Fact` class.
```python
import openai
from pydantic import Field, BaseModel
from typing import List
from openai_function_call import OpenAISchema


class Fact(BaseModel):
    """
    Each fact has a body and a list of sources.
    If there are multiple facts, make sure to break them apart such that each one only uses a set of sources that are relevant to it.
    """

    fact: str = Field(..., description="Body of the sentence as part of a response")
    substring_quote: List[str] = Field(
        ...,
        description="Each source should be a direct quote from the context, as a substring of the original content",
    )

    def _get_span(self, quote, context, errs=100):
        # Third-party `regex` package: supports fuzzy matching via {e<=n}.
        import regex

        # Escape the quote so punctuation in it is matched literally.
        minor = regex.escape(quote)
        major = context

        # Start with an exact match and allow one more error per attempt.
        errs_ = 0
        s = regex.search(f"({minor}){{e<={errs_}}}", major)
        while s is None and errs_ <= errs:
            errs_ += 1
            s = regex.search(f"({minor}){{e<={errs_}}}", major)

        if s is not None:
            yield from s.spans()

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)


class QuestionAnswer(OpenAISchema):
    """
    Class representing a question and its answer as a list of facts, where each fact should have a source.
    Each sentence contains a body and a list of sources.
    """

    question: str = Field(..., description="Question that was asked")
    answer: List[Fact] = Field(
        ...,
        description="Body of the answer, each fact should be its separate object with a body and a list of sources",
    )
```
The `Fact` class represents a single statement in the answer. It contains a `fact` attribute for the body of the sentence and a `substring_quote` attribute for the sources, which are direct quotes from the context.
The `QuestionAnswer` class represents a question and its answer. It consists of a `question` attribute for the question asked and a list of `Fact` objects in the `answer` attribute.
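The span search in `Fact` relies on the third-party `regex` package and its fuzzy `{e<=n}` syntax. As a minimal sketch of just the zero-error case, the same idea can be written with the standard library's `re` module. `exact_spans` is a hypothetical helper for illustration and does not tolerate typos or paraphrased quotes the way the fuzzy version does.

```python
import re
from typing import Iterator, List, Tuple


def exact_spans(quotes: List[str], context: str) -> Iterator[Tuple[int, int]]:
    """Yield the (start, end) span of the first exact occurrence of each quote."""
    for quote in quotes:
        # Escape the quote so characters like "." or "(" are matched literally.
        match = re.search(re.escape(quote), context)
        if match is not None:
            yield match.span()


context = "I went to an arts high school but in university I studied physics."
spans = list(exact_spans(["arts high school", "not in the text"], context))

# Quotes that are not found are skipped, so only one span is returned here.
start, end = spans[0]
print(context[start:end])
```

Returning spans rather than strings lets the caller slice the original context, which is what makes it possible to show the surrounding text next to each citation.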
## Asking AI a Question
To ask the AI a question and get back an answer with citations, we can define a function `ask_ai` that takes a question and context as input and returns a `QuestionAnswer` object.
!!! tip "Prompting Tip: Expert system"
    Expert prompting is a great trick for getting better results. It can easily be done by saying things like:

    * you are a world class expert that can correctly ...
    * you are jeff dean give me a code review ...
```python
def ask_ai(question: str, context: str) -> QuestionAnswer:
    """
    Ask the AI a question and get back a `QuestionAnswer` object.

    Args:
        question (str): The question to ask the AI.
        context (str): The context for the question.

    Returns:
        QuestionAnswer: The answer as a list of facts with their sources.
    """
    # Request a completion and force the model to call `QuestionAnswer`
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0.2,
        max_tokens=1000,
        functions=[QuestionAnswer.openai_schema],
        function_call={"name": QuestionAnswer.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": "You are a world class algorithm to answer questions with correct and exact citations. ",
            },
            {"role": "user", "content": "Answer question using the following context"},
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"},
            {
                "role": "user",
                "content": "Tips: Make sure to cite your sources, and use the exact words from the context.",
            },
        ],
    )

    # Create a QuestionAnswer object from the completion response
    return QuestionAnswer.from_response(completion)
```
The `ask_ai` function takes a string `question` and a string `context` as input. It makes a completion request to the AI model, providing the question and context as part of the prompt. The resulting completion is then converted into a `QuestionAnswer` object.
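One way to inspect or unit test the prompt without calling the API is to build the message list in a separate function. This is a sketch of that design choice only: `build_messages` is a hypothetical helper, not something the library provides, and it simply reproduces the messages used in `ask_ai`.

```python
from typing import Dict, List


def build_messages(question: str, context: str) -> List[Dict[str, str]]:
    """Assemble the chat messages for a cited answer (hypothetical helper)."""
    return [
        {
            "role": "system",
            "content": "You are a world class algorithm to answer questions with correct and exact citations.",
        },
        {"role": "user", "content": "Answer question using the following context"},
        {"role": "user", "content": context},
        {"role": "user", "content": f"Question: {question}"},
        {
            "role": "user",
            "content": "Tips: Make sure to cite your sources, and use the exact words from the context.",
        },
    ]


messages = build_messages("What did the author do during college?", "I studied physics.")
for message in messages:
    print(message["role"], "|", message["content"])
```

Keeping the context in its own message means the quotes the model returns can be checked against exactly the string that was sent.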
## Evaluating an Example
Let's evaluate the example by asking the AI a question and getting back an answer with citations. We'll ask the question "What did the author do during college?" with the given context.
!!! usage "Highlight"
    This just adds some color and captures the citation in `<>`.
```python
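def highlight(text, span):
    # Clamp the window start so a span near the beginning of the text
    # does not turn into a negative slice index.
    start = max(span[0] - 50, 0)
    return (
        "..."
        + text[start : span[0]].replace("\n", "")
        + "\033[91m"  # ANSI escape: switch to red
        + "<"
        + text[span[0] : span[1]].replace("\n", "")
        + "> "
        + "\033[0m"  # ANSI escape: reset color
        + text[span[1] : span[1] + 20].replace("\n", "")
        + "..."
    )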
```
```python
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""
answer = ask_ai(question, context)
print("Question:", question)
print()
for fact in answer.answer:
    print("Statement:", fact.fact)
    for span in fact.get_spans(context):
        print("Citation:", highlight(context, span))
    print()
```
In this code snippet, we print the question and iterate over each fact in the answer. For each fact, we print the statement and highlight the corresponding citation in the context using the `highlight` function.
Here is the expected output for the example:
```
Question: What did the author do during college?
Statement: The author studied Computational Mathematics and physics in university.
Citation: ...s born in China.I went to an arts high school but <in university I studied Computational Mathematics and physics> . As part of coop I...
Statement: The author started the Data Science club at the University of Waterloo and was the president of the club for 2 years.
Citation: ...y companies including Stitchfix, Facebook.I also <started the Data Science club at the University of Waterloo> and I was the presi...
Citation: ... club at the University of Waterloo and I was the <president of the club for 2 years> ...
```
The output includes the question, followed by each statement in the answer with its corresponding citation highlighted in the context.
Feel free to try this code with different questions and contexts to see how the AI responds with accurate citations.
+31
@@ -0,0 +1,31 @@
# List of Examples
Welcome to the examples page. Here you will find detailed information on how to use our code and examples demonstrating various features and functionalities.
## Library
- [Segmented Search](examples/search.md)
- [One shot Query Planning](examples/planning-tasks.md)
- [Recursive Schemas](examples/recursive.md)
- [Exact Citations](examples/exact_citations.md)
- [Automated Dataframe Extraction](examples/autodataframe.md)
## Details
In this section, you will find examples demonstrating different aspects of our project's functionality.
- [Segmented Search](examples/search.md): Learn how to perform segmented search using a multi-task definition with function calling.
- [One shot Query Planning](examples/planning-tasks.md): Explore how to plan and decompose a complex query into multiple subqueries in a single request.
- [Recursive Schemas](examples/recursive.md): Understand how to work with recursive schemas, and why flat is better than nested.
- [Exact Citations](examples/exact_citations.md): Find out how to generate exact citations by using smart prompting and regular expressions.
- [Automated Dataframe Extraction](examples/autodataframe.md): Discover how to automate dataframe extraction to return not just one table, but possibly multiple tables.
Feel free to explore these examples to see how creative prompting, descriptions, and structuring of `OpenAISchema` classes can unlock new capabilities.
If you have any questions or need further assistance, please refer to the specific example documentation or reach out to our support team.
Happy exploring!
+131
@@ -0,0 +1,131 @@
# Example: Planning and Executing a Query Plan
In this example, we will demonstrate how to use OpenAI Function Call with the `ChatCompletion` API to build a query plan for a question-answering system. We will define the necessary structures using Pydantic and show how a question is decomposed into a graph of dependent sub-queries.
!!! note "Graph Generation"
    Notice that this example produces a flat list of items with dependencies that resembles a graph. While Pydantic allows for recursive definitions, it's much easier and less confusing for the model to generate flat schemas rather than recursive schemas. If you want to see a recursive example, see [recursive schemas](recursive.md).
## Defining the Structures
Let's define the necessary Pydantic models to represent the query plan and the queries.
```python
import enum
from typing import List

from pydantic import Field
from openai_function_call import OpenAISchema


class QueryType(str, enum.Enum):
    """Enumeration representing the types of queries that can be asked to a question answer system."""

    SINGLE_QUESTION = "SINGLE"
    MERGE_MULTIPLE_RESPONSES = "MERGE_MULTIPLE_RESPONSES"


class Query(OpenAISchema):
    """Class representing a single question in a query plan."""

    id: int = Field(..., description="Unique id of the query")
    question: str = Field(
        ...,
        description="Question asked using a question answering system",
    )
    dependancies: List[int] = Field(
        default_factory=list,
        description="List of sub questions that need to be answered before asking this question",
    )
    node_type: QueryType = Field(
        default=QueryType.SINGLE_QUESTION,
        description="Type of question, either a single question or a multi-question merge",
    )


class QueryPlan(OpenAISchema):
    """Container class representing a tree of questions to ask a question answering system."""

    query_graph: List[Query] = Field(
        ..., description="The query graph representing the plan"
    )

    def _dependencies(self, ids: List[int]) -> List[Query]:
        """Returns the dependencies of a query given their ids."""
        return [q for q in self.query_graph if q.id in ids]
```
## Planning a Query Plan
Now, let's demonstrate how to generate a query plan using the defined models and the OpenAI API.
```python
import openai


def query_planner(question: str) -> QueryPlan:
    PLANNING_MODEL = "gpt-4-0613"

    messages = [
        {
            "role": "system",
            "content": "You are a world class query planning algorithm capable of breaking apart questions into its dependency queries such that the answers can be used to inform the parent question. Do not answer the questions, simply provide a correct compute graph with good specific questions to ask and relevant dependencies. Before you call the function, think step-by-step to get a better understanding of the problem.",
        },
        {
            "role": "user",
            "content": f"Consider: {question}\nGenerate the correct query plan.",
        },
    ]

    completion = openai.ChatCompletion.create(
        model=PLANNING_MODEL,
        temperature=0,
        functions=[QueryPlan.openai_schema],
        # Force the model to call the `QueryPlan` function.
        function_call={"name": QueryPlan.openai_schema["name"]},
        messages=messages,
        max_tokens=1000,
    )
    root = QueryPlan.from_response(completion)
    return root
```
```python
plan = query_planner(
    "What is the difference in populations of Canada and Jason's home country?"
)
plan.dict()
```
```python
{'query_graph': [{'dependancies': [],
                  'id': 1,
                  'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>,
                  'question': "Identify Jason's home country"},
                 {'dependancies': [],
                  'id': 2,
                  'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>,
                  'question': 'Find the population of Canada'},
                 {'dependancies': [1],
                  'id': 3,
                  'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>,
                  'question': "Find the population of Jason's home country"},
                 {'dependancies': [2, 3],
                  'id': 4,
                  'node_type': <QueryType.SINGLE_QUESTION: 'SINGLE'>,
                  'question': 'Calculate the difference in populations between '
                              "Canada and Jason's home country"}]}
```
In the above code, we define a `query_planner` function that takes a question as input and generates a query plan using the OpenAI API.
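Because the plan is a flat list of ids and dependencies, an execution order can be derived with a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+) on a plain dictionary that mirrors the plan printed above. `execution_order` is a hypothetical helper, and actually answering each query is left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


def execution_order(dependencies: Dict[int, List[int]]) -> List[int]:
    """Return query ids ordered so that every dependency comes before
    the query that needs it. Raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(dependencies).static_order())


# Query id -> ids it depends on, copied from the example plan above.
plan_dependencies = {1: [], 2: [], 3: [1], 4: [2, 3]}

order = execution_order(plan_dependencies)
print(order)
```

Several orders can be valid for the same plan (queries 1 and 2 are independent), so code that consumes the result should rely only on dependencies preceding their dependents.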
## Conclusion
In this example, we demonstrated how to use OpenAI Function Call with the `ChatCompletion` API to build a query plan for a question-answering system. We defined the necessary structures using Pydantic and created a query planner function that returns a flat graph of dependent sub-queries.
If you want to see multiple versions of this style of code, please visit:
1. [query planning example](https://github.com/jxnl/openai_function_call/blob/main/examples/query_planner_execution/query_planner_execution.py)
2. [task planning with topo sort](https://github.com/jxnl/openai_function_call/blob/main/examples/task_planner/task_planner_topological_sort.py)
Feel free to modify the code to fit your specific use case and explore other possibilities of using the OpenAI Function Call model to plan and execute complex workflows.
+163
@@ -0,0 +1,163 @@
# Example: Parsing a Directory Tree
In this example, we will demonstrate how to convert a string representing a directory tree into a filesystem structure using OpenAI's `gpt-3.5-turbo` model. We will define the necessary structures using Pydantic, create a function to parse the tree, and provide an example of how to use it.
## Defining the Structures
We will use Pydantic to define the necessary data structures representing the directory tree and its nodes. We have two classes, `Node` and `DirectoryTree`, which are used to model individual nodes and the entire directory tree, respectively.
!!! warning "Flat is better than nested"
    While it's easier to model things as nested, returning flat items with dependencies tends to yield better results. For a flat example, check out [planning tasks](planning-tasks.md), where we model a query plan as a DAG.
```python
import enum
from typing import List

from pydantic import Field
from openai_function_call import OpenAISchema


class NodeType(str, enum.Enum):
    """Enumeration representing the types of nodes in a filesystem."""

    FILE = "file"
    FOLDER = "folder"


class Node(OpenAISchema):
    """
    Class representing a single node in a filesystem. Can be either a file or a folder.
    Note that a file cannot have children, but a folder can.

    Args:
        name (str): The name of the node.
        children (List[Node]): The list of child nodes (if any).
        node_type (NodeType): The type of the node, either a file or a folder.

    Methods:
        print_paths: Prints the path of the node and its children.
    """

    name: str = Field(..., description="Name of the folder")
    children: List["Node"] = Field(
        default_factory=list,
        description="List of children nodes, only applicable for folders, files cannot have children",
    )
    node_type: NodeType = Field(
        default=NodeType.FILE,
        description="Either a file or folder, use the name to determine which it could be",
    )

    def print_paths(self, parent_path=""):
        """Prints the path of the node and its children."""
        if self.node_type == NodeType.FOLDER:
            path = f"{parent_path}/{self.name}" if parent_path != "" else self.name
            print(path, self.node_type)
            if self.children is not None:
                for child in self.children:
                    child.print_paths(path)
        else:
            print(f"{parent_path}/{self.name}", self.node_type)


class DirectoryTree(OpenAISchema):
    """
    Container class representing a directory tree.

    Args:
        root (Node): The root node of the tree.

    Methods:
        print_paths: Prints the paths of the root node and its children.
    """

    root: Node = Field(..., description="Root folder of the directory tree")

    def print_paths(self):
        """Prints the paths of the root node and its children."""
        self.root.print_paths()


# Resolve the "Node" forward reference used in `children`.
Node.update_forward_refs()
DirectoryTree.update_forward_refs()
```
The `Node` class represents a single node in the directory tree. It has a name, a list of children nodes (applicable only to folders), and a node type (either a file or a folder). The `print_paths` method can be used to print the path of the node and its children.
The `DirectoryTree` class represents the entire directory tree. It has a single attribute, `root`, which is the root node of the tree. The `print_paths` method can be used to print the paths of the root node and its children.
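The recursive walk behind `print_paths` can be tried without an API call. The sketch below uses a plain dataclass in place of the Pydantic model and collects the paths into a list instead of printing them. `SketchNode` and `collect_paths` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchNode:
    """Hypothetical stand-in for `Node`: a name and optional children."""

    name: str
    children: List["SketchNode"] = field(default_factory=list)


def collect_paths(node: SketchNode, parent_path: str = "") -> List[str]:
    """Return the path of `node` followed by the paths of all descendants,
    visiting children depth-first in their listed order."""
    path = f"{parent_path}/{node.name}" if parent_path else node.name
    paths = [path]
    for child in node.children:
        paths.extend(collect_paths(child, path))
    return paths


tree = SketchNode(
    "root",
    [
        SketchNode("folder1", [SketchNode("file1.txt")]),
        SketchNode("file2.txt"),
    ],
)
paths = collect_paths(tree)
print("\n".join(paths))
```

Returning a list makes the traversal easy to assert on in tests, which is harder when the method prints directly.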
## Parsing the Tree
We define a function `parse_tree_to_filesystem` to convert a string representing a directory tree into a filesystem structure using OpenAI's `gpt-3.5-turbo` model.
```python
import openai


def parse_tree_to_filesystem(data: str) -> DirectoryTree:
    """
    Convert a string representing a directory tree into a filesystem structure
    using OpenAI's gpt-3.5-turbo model.

    Args:
        data (str): The string to convert into a filesystem.

    Returns:
        DirectoryTree: The directory tree representing the filesystem.
    """
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0.2,
        functions=[DirectoryTree.openai_schema],
        # Force the model to call the `DirectoryTree` function.
        function_call={"name": DirectoryTree.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": "You are a perfect file system parsing algorithm. You are given a string representing a directory tree. You must return the correct filesystem structure.",
            },
            {
                "role": "user",
                "content": f"Consider the data below:\n{data} and return the correctly labeled filesystem",
            },
        ],
        max_tokens=1000,
    )
    root = DirectoryTree.from_response(completion)
    return root
```
The `parse_tree_to_filesystem` function takes a string `data` representing the directory tree and returns a `DirectoryTree` object representing the filesystem structure. It uses the OpenAI Chat API to complete the prompt and extract the directory tree.
## Example Usage
Let's demonstrate how to use the `parse_tree_to_filesystem` function with an example:
```python
root = parse_tree_to_filesystem(
    """
    root
    ├── folder1
    │   ├── file1.txt
    │   └── file2.txt
    └── folder2
        ├── file3.txt
        └── subfolder1
            └── file4.txt
    """
)
root.print_paths()
```
In this example, we call `parse_tree_to_filesystem` with a string representing a directory tree. The directory tree has a root node named 'root' with two subfolders (folder1 and folder2). The 'folder1' subfolder contains two files (file1.txt and file2.txt), while the 'folder2' subfolder contains a file (file3.txt) and a subfolder (subfolder1) that, in turn, contains a file (file4.txt).
After parsing the string into a `DirectoryTree` object, we call `root.print_paths()` to print the paths of the root node and its children. The output of this example will be:
```
root NodeType.FOLDER
root/folder1 NodeType.FOLDER
root/folder1/file1.txt NodeType.FILE
root/folder1/file2.txt NodeType.FILE
root/folder2 NodeType.FOLDER
root/folder2/file3.txt NodeType.FILE
root/folder2/subfolder1 NodeType.FOLDER
root/folder2/subfolder1/file4.txt NodeType.FILE
```
This demonstrates how to use OpenAI's `gpt-3.5-turbo` model to parse a string representing a directory tree and obtain the correct filesystem structure.
I hope this example helps you understand how to leverage OpenAI Function Call for parsing recursive trees. If you have any further questions, feel free to ask!
+6 -1
@@ -40,4 +40,9 @@ nav:
- "Introduction: Pipeline API": "pipeline-example.md"
- "Message Templates": "chat-completion.md"
- Examples:
- 'Segmented Search': 'examples/search.md'
- 'Table of contents': 'examples/index.md'
- 'Segmented Search': 'examples/search.md'
- 'One shot Query Planning': 'examples/planning-tasks.md'
- 'Recursive Schemas': 'examples/recursive.md'
- 'Exact Citations': 'examples/exact_citations.md'
- 'Automated Dataframe Extraction': "examples/autodataframe.md"