mirror of
https://github.com/kennethreitz/instructor.git
synced 2026-06-05 22:50:18 +00:00
164 lines
6.2 KiB
Markdown
164 lines
6.2 KiB
Markdown
# Example: Parsing a Directory Tree
|
|
|
|
In this example, we will demonstrate how define and use a recursive class definition to convert a string representing a directory tree into a filesystem structure using OpenAI's function call api. We will define the necessary structures using Pydantic, create a function to parse the tree, and provide an example of how to use it.
|
|
|
|
## Defining the Structures
|
|
|
|
We will use Pydantic to define the necessary data structures representing the directory tree and its nodes. We have two classes, `Node` and `DirectoryTree`, which are used to model individual nodes and the entire directory tree, respectively.
|
|
|
|
!!! warning "Flat is better than nested"
|
|
While it's easier to model things as nested, returning flat items with dependencies tends to yield better results. For a flat example, check out [planning tasks](planning-tasks.md) where we model a query plan as a dag.
|
|
|
|
```python
|
|
import enum
|
|
from typing import List
|
|
from pydantic import Field
|
|
|
|
class NodeType(str, enum.Enum):
|
|
"""Enumeration representing the types of nodes in a filesystem."""
|
|
FILE = "file"
|
|
FOLDER = "folder"
|
|
|
|
class Node(BaseModel):
|
|
"""
|
|
Class representing a single node in a filesystem. Can be either a file or a folder.
|
|
Note that a file cannot have children, but a folder can.
|
|
|
|
Args:
|
|
name (str): The name of the node.
|
|
children (List[Node]): The list of child nodes (if any).
|
|
node_type (NodeType): The type of the node, either a file or a folder.
|
|
|
|
Methods:
|
|
print_paths: Prints the path of the node and its children.
|
|
"""
|
|
name: str = Field(..., description="Name of the folder")
|
|
children: List["Node"] = Field(
|
|
default_factory=list,
|
|
description="List of children nodes, only applicable for folders, files cannot have children",
|
|
)
|
|
node_type: NodeType = Field(
|
|
default=NodeType.FILE,
|
|
description="Either a file or folder, use the name to determine which it could be",
|
|
)
|
|
|
|
def print_paths(self, parent_path=""):
|
|
"""Prints the path of the node and its children."""
|
|
if self.node_type == NodeType.FOLDER:
|
|
path = f"{parent_path}/{self.name}" if parent_path != "" else self.name
|
|
print(path, self.node_type)
|
|
if self.children is not None:
|
|
for child in self.children:
|
|
child.print_paths(path)
|
|
else:
|
|
print(f"{parent_path}/{self.name}", self.node_type)
|
|
|
|
class DirectoryTree(BaseModel):
|
|
"""
|
|
Container class representing a directory tree.
|
|
|
|
Args:
|
|
root (Node): The root node of the tree.
|
|
|
|
Methods:
|
|
print_paths: Prints the paths of the root node and its children.
|
|
"""
|
|
root: Node = Field(..., description="Root folder of the directory tree")
|
|
|
|
def print_paths(self):
|
|
"""Prints the paths of the root node and its children."""
|
|
self.root.print_paths()
|
|
|
|
Node.update_forward_refs()
|
|
DirectoryTree.update_forward_refs()
|
|
```
|
|
|
|
The `Node` class represents a single node in the directory tree. It has a name, a list of children nodes (applicable only to folders), and a node type (either a file or a folder). The `print_paths` method can be used to print the path of the node and its children.
|
|
|
|
The `DirectoryTree` class represents the entire directory tree. It has a single attribute, `root`, which is the root node of the tree. The `print_paths` method can be used to print the paths of the root node and its children.
|
|
|
|
## Parsing the Tree
|
|
|
|
We define a function `parse_tree_to_filesystem` to convert a string representing a directory tree into a filesystem structure using OpenAI.
|
|
|
|
```python
|
|
import instructor
|
|
from openai import OpenAI
|
|
|
|
# Apply the patch to the OpenAI client
|
|
# enables response_model keyword
|
|
client = instructor.patch(OpenAI())
|
|
|
|
def parse_tree_to_filesystem(data: str) -> DirectoryTree:
|
|
"""
|
|
Convert a string representing a directory tree into a filesystem structure
|
|
using OpenAI's GPT-3 model.
|
|
|
|
Args:
|
|
data (str): The string to convert into a filesystem.
|
|
|
|
Returns:
|
|
DirectoryTree: The directory tree representing the filesystem.
|
|
"""
|
|
|
|
return client.chat.completions.create(
|
|
model="gpt-3.5-turbo-0613",
|
|
response_model=DirectoryTree,
|
|
messages=[
|
|
{
|
|
"role": "system",
|
|
"content": "You are a perfect file system parsing algorithm. You are given a string representing a directory tree. You must return the correct filesystem structure.",
|
|
},
|
|
{
|
|
"role": "user",
|
|
"content": f"Consider the data below:\n{data} and return the correctly labeled filesystem",
|
|
},
|
|
],
|
|
max_tokens=1000,
|
|
)
|
|
|
|
```
|
|
|
|
The `parse_tree_to_filesystem` function takes a string `data` representing the directory tree and returns a `DirectoryTree` object representing the filesystem structure. It uses the OpenAI Chat API to complete the prompt and extract the directory tree.
|
|
|
|
## Example Usage
|
|
|
|
Let's demonstrate how to use the `parse_tree_to_filesystem`
|
|
|
|
function with an example:
|
|
|
|
```python
|
|
root = parse_tree_to_filesystem(
|
|
"""
|
|
root
|
|
├── folder1
|
|
│ ├── file1.txt
|
|
│ └── file2.txt
|
|
└── folder2
|
|
├── file3.txt
|
|
└── subfolder1
|
|
└── file4.txt
|
|
"""
|
|
)
|
|
root.print_paths()
|
|
```
|
|
|
|
In this example, we call `parse_tree_to_filesystem` with a string representing a directory tree.
|
|
|
|
After parsing the string into a `DirectoryTree` object, we call `root.print_paths()` to print the paths of the root node and its children. The output of this example will be:
|
|
|
|
```python
|
|
root NodeType.FOLDER
|
|
root/folder1 NodeType.FOLDER
|
|
root/folder1/file1.txt NodeType.FILE
|
|
root/folder1/file2.txt NodeType.FILE
|
|
root/folder2 NodeType.FOLDER
|
|
root/folder2/file3.txt NodeType.FILE
|
|
root/folder2/subfolder1 NodeType.FOLDER
|
|
root/folder2/subfolder1/file4.txt NodeType.FILE
|
|
```
|
|
|
|
This demonstrates how to use OpenAI's GPT-3 model to parse a string representing a directory tree and obtain the correct filesystem structure.
|
|
|
|
I hope this example helps you understand how to leverage OpenAI Function Call for parsing recursive trees. If you have any further questions, feel free to ask!
|