feat(Instructor): introduce Instructor Hub with tutorials, examples, and new CLI (#439)

Jason Liu
2024-02-17 21:04:35 -05:00
committed by GitHub
parent 70334887d6
commit 4eedd5e968
23 changed files with 2250 additions and 109 deletions
+92
@@ -0,0 +1,92 @@
# Instructor Hub
Welcome to the Instructor Hub. The goal of this project is to provide a set of tutorials and examples that help you get started with `instructor`, and to let you pull in the code you need directly.
Make sure you're using the latest version of `instructor` by running:
```bash
pip install -U instructor
```
## Contributing
We welcome contributions to the instructor hub. If you have a tutorial or example you'd like to add, please open a pull request in `docs/hub` and we'll review it. Each contribution should meet these requirements:
1. The code must be in a single file.
2. The page must be referenced in `mkdocs.yml`.
3. The code must be unit tested.
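Requirement 2 matters because the hub proxy (the Cloudflare Worker later in this commit) discovers cookbooks by reading the `Hub` section of `mkdocs.yml` and deriving each slug from the file path. A minimal Python sketch of that derivation, assuming the nav has already been parsed into lists and dicts (the `nav` literal and the `cookbooks_from_nav` helper are hypothetical stand-ins, not part of `instructor`):

```python
from typing import Dict, List


def cookbooks_from_nav(nav: List[dict]) -> List[Dict[str, object]]:
    """Mirror the worker's logic: find the 'Hub' nav entry and derive slugs."""
    hub = next(entry["Hub"] for entry in nav if "Hub" in entry)
    cookbooks = []
    for index, item in enumerate(hub):
        # Each nav item is a single-key dict: {title: path}
        name, path = next(iter(item.items()))
        # Slug = text between the last '/' and the last '.'
        slug = path[path.rfind("/") + 1 : path.rfind(".")]
        cookbooks.append({"id": index, "name": name, "path": path, "slug": slug})
    # The introduction page is not a cookbook; it is dropped after ids are assigned
    return [c for c in cookbooks if c["slug"] != "index"]


nav = [
    {"Examples": [{"Ollama": "examples/ollama.md"}]},
    {
        "Hub": [
            {"Introduction": "hub/index.md"},
            {"Single Classification Model": "hub/single_classification.md"},
            {"Multiple Classification Model": "hub/multiple_classification.md"},
        ]
    },
]

for cookbook in cookbooks_from_nav(nav):
    print(cookbook["id"], cookbook["slug"])
# 1 single_classification
# 2 multiple_classification
```

Because ids come from the position in the nav list, the introduction page takes id 0 and the cookbooks start at 1, which matches the `hub_id` column shown by `instructor hub list`.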
### Using pytest_examples
Running the following command runs the tests and updates the examples, which keeps them up to date, correctly linted, and working. To make this possible, include an `if __name__ == "__main__":` block in your code and add some asserts that check the code is working.
```bash
poetry run pytest tests/openai/docs/test_hub.py --update-examples
```
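As a rough template, a hub example file could follow the shape below. `count_words` is a hypothetical placeholder for whatever your cookbook demonstrates; it makes no API call so the asserts run offline. The `#>` comment is the print-output format that `pytest_examples` writes back when run with `--update-examples`.

```python
def count_words(text: str) -> int:
    """Hypothetical placeholder for the code your cookbook demonstrates."""
    return len(text.split())


if __name__ == "__main__":
    n = count_words("structured outputs for llm")
    # Asserts make the example fail loudly in CI if its behavior changes
    assert n == 4
    print(n)
    #> 4
```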
## CLI Usage
Instructor hub comes with a command line interface (CLI) that lets you browse the tutorials and examples and pull in the code you need to get started with the API.
### List Cookbooks
Run `instructor hub list` to see all the available tutorials and examples, and add `--sort` to order them by popularity. Clicking (doc) opens the full tutorial on this website.
```bash
$ instructor hub list --sort
```
| hub_id | slug | title | n_downloads |
| ------ | ----------------------------- | ----------------------------- | ----------- |
| 2 | multiple_classification (doc) | Multiple Classification Model | 24 |
| 1 | single_classification (doc) | Single Classification Model | 2 |
### Searching for Cookbooks
You can search for a tutorial by running `instructor hub list -q <QUERY>`. This will return a list of tutorials that match the query.
```bash
$ instructor hub list -q multi
```
| hub_id | slug | title | n_downloads |
| ------ | ----------------------------- | ----------------------------- | ----------- |
| 2 | multiple_classification (doc) | Multiple Classification Model | 24 |
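Under the hood, the query is sent to the hub proxy as a `q` parameter, and the proxy fuzzy-matches it against each cookbook's title and slug using Fuse.js with a threshold of 0.3. Queries containing spaces or `&` need URL encoding. A sketch of building such a request URL with the standard library, assuming the `/api/<branch>/items/` route used by the CLI (`search_url` is a hypothetical helper):

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://instructor-hub-proxy.jason-a3f.workers.dev"


def search_url(branch: str, q: Optional[str] = None) -> str:
    """Build the collection-index URL, percent-encoding the optional query."""
    url = f"{BASE_URL}/api/{branch}/items/"
    if q:
        # quote_via=quote encodes spaces as %20 rather than '+'
        url += "?" + urlencode({"q": q}, quote_via=quote)
    return url


print(search_url("main", "multi label"))
# https://instructor-hub-proxy.jason-a3f.workers.dev/api/main/items/?q=multi%20label
```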
### Reading a Cookbook
To read a tutorial, run `instructor hub pull --id <hub_id> --page` to view the full tutorial in a pager in the terminal. Use `j` and `k` to scroll down and up, and `q` to quit. Without `--page`, the tutorial is printed directly to the terminal.
```bash
$ instructor hub pull --id 2 --page
```
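`pull --id` works by fetching the cookbook list for the branch, finding the entry with the matching `hub_id`, and then requesting its Markdown or Python content by slug. A minimal sketch of the lookup and of the URLs involved, using the route patterns from this commit's CLI and worker (`find_slug` and `content_url` are hypothetical helpers; the real CLI works with `HubPage` models):

```python
from typing import Dict, Optional

BASE_URL = "https://instructor-hub-proxy.jason-a3f.workers.dev"


def find_slug(cookbooks: Dict[int, str], hub_id: int) -> Optional[str]:
    """Return the slug for a hub id, or None if the id is unknown."""
    return cookbooks.get(hub_id)


def content_url(branch: str, slug: str, py: bool = False) -> str:
    # --py asks the proxy for extracted Python code; otherwise raw Markdown
    kind = "py" if py else "md"
    return f"{BASE_URL}/api/{branch}/items/{slug}/{kind}/"


cookbooks = {1: "single_classification", 2: "multiple_classification"}
slug = find_slug(cookbooks, 2)
if slug is not None:
    print(content_url("main", slug))
# https://instructor-hub-proxy.jason-a3f.workers.dev/api/main/items/multiple_classification/md/
```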
### Pulling in Code
You can save the code to a file with `--py --output=<filename>`, or omit `--output` to print the code to the terminal and redirect it yourself.
```bash
$ instructor hub pull --id 2 --py --output=run.py
$ instructor hub pull --id 2 --py > run.py
```
You can run the code immediately by piping it to `python`:
```bash
$ instructor hub pull --id 2 --py | python
```
## Call for Contributions
We're looking for many more hub examples. If you have a tutorial or example you'd like to add, please open a pull request in `docs/hub` and we'll review it.
- [ ] Converting the cookbooks to the new format
- [ ] Validator examples
- [ ] Data extraction examples
- [ ] Streaming examples (Iterable and Partial)
- [ ] Batch Parsing examples
- [ ] Open model examples (together, anyscale, ollama, llama-cpp, etc.)
- [ ] Query Expansion examples
- [ ] Batch Data Processing examples
- [ ] Batch Data Processing examples with Cache
+51
@@ -0,0 +1,51 @@
For multi-label classification, we define the allowed labels as a `Literal` type and use a different Pydantic model whose `labels` field is a list, so a single ticket can receive several labels.
```python
import openai
import instructor
from typing import List, Literal
from pydantic import BaseModel, Field

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(openai.OpenAI())

LABELS = Literal["ACCOUNT", "BILLING", "GENERAL_QUERY"]


class MultiClassPrediction(BaseModel):
    labels: List[LABELS] = Field(
        ...,
        description="Only select the labels that apply to the support ticket.",
    )


def multi_classify(data: str) -> MultiClassPrediction:
    return client.chat.completions.create(
        model="gpt-4-turbo-preview",  # gpt-3.5-turbo fails
        response_model=MultiClassPrediction,
        messages=[
            {
                "role": "system",
                "content": "You are a support agent at a tech company. Only select the labels that apply to the support ticket.",
            },
            {
                "role": "user",
                "content": f"Classify the following support ticket: {data}",
            },
        ],
    )  # type: ignore


if __name__ == "__main__":
    ticket = "My account is locked and I can't access my billing info."
    prediction = multi_classify(ticket)
    assert {"ACCOUNT", "BILLING"} == set(prediction.labels)
    print("input:", ticket)
    #> input: My account is locked and I can't access my billing info.
    print("labels:", LABELS)
    #> labels: typing.Literal['ACCOUNT', 'BILLING', 'GENERAL_QUERY']
    print("prediction:", prediction)
    #> prediction: labels=['ACCOUNT', 'BILLING']
```
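Because `LABELS` is a `Literal`, its allowed values can be recovered at runtime with `typing.get_args`. Pydantic enforces this constraint for you when `instructor` parses the response; the stand-alone sketch below shows the same check without calling the API, which can be handy when writing asserts for an example. The `invalid_labels` helper is hypothetical and not part of `instructor`.

```python
from typing import List, Literal, get_args

LABELS = Literal["ACCOUNT", "BILLING", "GENERAL_QUERY"]


def invalid_labels(labels: List[str]) -> List[str]:
    """Return any labels that are not among the allowed LABELS values."""
    allowed = set(get_args(LABELS))
    return [label for label in labels if label not in allowed]


print(invalid_labels(["ACCOUNT", "BILLING"]))
# []
print(invalid_labels(["ACCOUNT", "REFUND"]))
# ['REFUND']
```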
+47
@@ -0,0 +1,47 @@
# Single-Label Classification
This example demonstrates how to perform single-label classification using the OpenAI API. The example uses the `gpt-3.5-turbo` model to classify text as either `SPAM` or `NOT_SPAM`.
```python
from pydantic import BaseModel, Field
from typing import Literal
from openai import OpenAI
import instructor

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())


class ClassificationResponse(BaseModel):
    label: Literal["SPAM", "NOT_SPAM"] = Field(
        ...,
        description="The predicted class label.",
    )


def classify(data: str) -> ClassificationResponse:
    """Perform single-label classification on the input text."""
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=ClassificationResponse,
        messages=[
            {
                "role": "user",
                "content": f"Classify the following text: {data}",
            },
        ],
    )


if __name__ == "__main__":
    for text, label in [
        ("Hey Jason! You're awesome", "NOT_SPAM"),
        ("I am a nigerian prince and I need your help.", "SPAM"),
    ]:
        prediction = classify(text)
        assert prediction.label == label
        print(f"Text: {text}, Predicted Label: {prediction.label}")
        #> Text: Hey Jason! You're awesome, Predicted Label: NOT_SPAM
        #> Text: I am a nigerian prince and I need your help., Predicted Label: SPAM
```
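The `__main__` block above checks each example with an `assert`. When a cookbook has more than a handful of labeled examples, reporting accuracy can be more informative than stopping at the first mismatch. A sketch, where `keyword_classifier` is a hypothetical offline stand-in for `classify` so the snippet runs without an API key, and the third example is made up for illustration:

```python
from typing import Callable, List, Literal, Tuple

Label = Literal["SPAM", "NOT_SPAM"]


def accuracy(predict: Callable[[str], Label], dataset: List[Tuple[str, Label]]) -> float:
    """Fraction of examples whose predicted label matches the expected label."""
    correct = sum(1 for text, expected in dataset if predict(text) == expected)
    return correct / len(dataset)


def keyword_classifier(text: str) -> Label:
    # Hypothetical stand-in for the LLM call; with the real client you could
    # pass `lambda t: classify(t).label` instead
    return "SPAM" if "prince" in text.lower() else "NOT_SPAM"


dataset: List[Tuple[str, Label]] = [
    ("Hey Jason! You're awesome", "NOT_SPAM"),
    ("I am a nigerian prince and I need your help.", "SPAM"),
    ("Claim your free prize now", "SPAM"),
]
print(accuracy(keyword_classifier, dataset))
```

The keyword rule misses the third example, so the score comes out at two thirds rather than failing on the first wrong prediction.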
+13
@@ -0,0 +1,13 @@
# http://editorconfig.org
root = true
[*]
indent_style = tab
tab_width = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.yml]
indent_style = space
+172
@@ -0,0 +1,172 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
.pnpm-debug.log*
# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Runtime data
pids
*.pid
*.seed
*.pid.lock
# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov
# Coverage directory used by tools like istanbul
coverage
*.lcov
# nyc test coverage
.nyc_output
# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt
# Bower dependency directory (https://bower.io/)
bower_components
# node-waf configuration
.lock-wscript
# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release
# Dependency directories
node_modules/
jspm_packages/
# Snowpack dependency directory (https://snowpack.dev/)
web_modules/
# TypeScript cache
*.tsbuildinfo
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Optional stylelint cache
.stylelintcache
# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# Yarn Integrity file
.yarn-integrity
# dotenv environment variable files
.env
.env.development.local
.env.test.local
.env.production.local
.env.local
# parcel-bundler cache (https://parceljs.org/)
.cache
.parcel-cache
# Next.js build output
.next
out
# Nuxt.js build / generate output
.nuxt
dist
# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and not Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public
# vuepress build output
.vuepress/dist
# vuepress v2.x temp and cache directory
.temp
.cache
# Docusaurus cache and generated files
.docusaurus
# Serverless directories
.serverless/
# FuseBox cache
.fusebox/
# DynamoDB Local files
.dynamodb/
# TernJS port file
.tern-port
# Stores VSCode versions used for testing VSCode extensions
.vscode-test
# yarn v2
.yarn/cache
.yarn/unplugged
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*
# wrangler project
.dev.vars
.wrangler/
+6
@@ -0,0 +1,6 @@
{
"printWidth": 140,
"singleQuote": true,
"semi": true,
"useTabs": true
}
+10
@@ -0,0 +1,10 @@
CREATE TABLE hub_analytics (
id SERIAL PRIMARY KEY,
event_type VARCHAR(255) NOT NULL,
user_agent VARCHAR(255) NOT NULL,
request_ip VARCHAR(100) NOT NULL,
request_time TIMESTAMP WITH TIME ZONE NOT NULL,
branch VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL
);
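Cloudflare D1 is built on SQLite, so the analytics insert and the per-slug count query used by the worker's collection-index route can be tried locally with Python's `sqlite3`. A sketch under that assumption: column types are simplified to SQLite's `TEXT`, and `id` is declared as `INTEGER PRIMARY KEY`, which SQLite auto-populates, because `SERIAL` is PostgreSQL syntax that SQLite treats as an ordinary type name. The `track` helper and the inserted events are hypothetical sample data.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hub_analytics (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        user_agent TEXT NOT NULL,
        request_ip TEXT NOT NULL,
        request_time TEXT NOT NULL,
        branch TEXT NOT NULL,
        slug TEXT NOT NULL
    )
    """
)


def track(event_type: str, slug: str, branch: str = "main") -> None:
    # Same column order and placeholder style as trackAnalytics in the worker
    conn.execute(
        "INSERT INTO hub_analytics (event_type, user_agent, request_ip, request_time, slug, branch)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (event_type, "unknown", "unknown", datetime.now(timezone.utc).isoformat(), slug, branch),
    )


track("COLLECTION_INDEX", "index")
track("CONTENT_MARKDOWN", "multiple_classification")
track("CONTENT_PYTHON", "multiple_classification")
track("CONTENT_PYTHON", "single_classification")

# Per-slug counts that exclude list requests, as in GET /api/:branch/items
counts = dict(
    conn.execute(
        "SELECT slug, count(1) AS n FROM hub_analytics"
        " WHERE branch = ? AND event_type != 'COLLECTION_INDEX' GROUP BY slug",
        ("main",),
    ).fetchall()
)
print(counts)
```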
File diff suppressed because it is too large
+20
@@ -0,0 +1,20 @@
{
"name": "instructor-hub-proxy",
"version": "0.0.0",
"private": true,
"scripts": {
"deploy": "wrangler deploy",
"dev": "wrangler dev",
"start": "wrangler dev"
},
"devDependencies": {
"@cloudflare/workers-types": "^4.20240208.0",
"itty-router": "^3.0.12",
"typescript": "^5.0.4",
"wrangler": "^3.0.0"
},
"dependencies": {
"fuse.js": "^7.0.0",
"yaml": "^2.3.4"
}
}
+27
@@ -0,0 +1,27 @@
/**
* Welcome to Cloudflare Workers! This is your first worker.
*
* - Run `npm run dev` in your terminal to start a development server
* - Open a browser tab at http://localhost:8787/ to see your worker in action
* - Run `npm run deploy` to publish your worker
*
* Learn more at https://developers.cloudflare.com/workers/
*/
import apiRouter from './router';
export interface Env {
// If you set another name in wrangler.toml as the value for 'binding',
// replace "DB" with the variable name you defined.
DB: D1Database;
}
// Export a default object containing event handlers
export default {
// The fetch handler is invoked when this worker receives a HTTP(S) request
// and should return a Response (optionally wrapped in a Promise)
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
// @ts-ignore
request.env = env;
return apiRouter.handle(request);
},
};
+128
@@ -0,0 +1,128 @@
import { Router } from 'itty-router';
import YAML from 'yaml';
import Fuse from 'fuse.js';
// now let's create a router (note the lack of "new")
const router = Router();
// Function to track analytics
async function trackAnalytics(request: any, env: Env, event_type: string, slug: string, branch: string) {
const user_agent = request.headers.get('User-Agent') || 'unknown';
const request_ip = request.headers.get('CF-Connecting-IP') || 'unknown'; // Cloudflare passes the client IP
const request_time = new Date().toISOString();
// Prepare and execute the insert statement for analytics tracking
// @ts-ignore
await env.DB.prepare(
'INSERT INTO hub_analytics (event_type, user_agent, request_ip, request_time, slug, branch) VALUES (?, ?, ?, ?, ?, ?)'
)
.bind(event_type, user_agent, request_ip, request_time, slug, branch)
.run();
}
// GET collection index
router.get('/api/:branch/items', async (request) => {
const { query, params, env } = request;
await trackAnalytics(request, env, 'COLLECTION_INDEX', 'index', params.branch);
/**
* {
* success: true,
* meta: {...},
* results: [ { slug: 'single_classification', 'n': 2 } ]
* }
*/
const counts = await env.DB.prepare(
`SELECT slug, count(1) as n
FROM hub_analytics
WHERE branch = ? AND event_type != 'COLLECTION_INDEX'
GROUP BY slug`
)
.bind(params.branch)
.all();
const url = `https://raw.githubusercontent.com/jxnl/instructor/${params.branch}/mkdocs.yml?raw=true`;
const mkdoc_yml = await fetch(url).then((res) => res.text());
const mkdocs = YAML.parse(mkdoc_yml);
var cookbooks = mkdocs.nav
?.filter((obj: Map<string, string>) => 'Hub' in obj)[0]
.Hub.map((obj: any, index: number) => {
const [name, path] = Object.entries(obj)[0];
// Extract slug by getting the substring after the last '/'
// @ts-ignore
const slug = path.substring(path.lastIndexOf('/') + 1, path.lastIndexOf('.'));
const count = counts.results.find((obj: any) => obj.slug === slug)?.n || 0;
return { id: index, name, path, slug, count };
})
.filter(({ slug }: any) => slug !== 'index');
// Search for cookbooks
const queryStr = query.q;
if (queryStr !== undefined && queryStr !== '') {
const fuse = new Fuse(cookbooks, {
keys: ['name', 'slug'],
threshold: 0.3,
});
cookbooks = fuse.search(queryStr as string).map((obj: any) => obj.item);
}
return new Response(JSON.stringify(cookbooks), {
headers: {
'content-type': 'application/json',
},
});
});
// GET content
router.get('/api/:branch/items/:slug/md', async (request) => {
const { params, env } = request;
await trackAnalytics(request, env, 'CONTENT_MARKDOWN', params.slug, params.branch);
const raw_content = `https://raw.githubusercontent.com/jxnl/instructor/${params.branch}/docs/hub/${params.slug}.md?raw=true`;
const content = await fetch(raw_content).then((res) => res.text());
return new Response(content, {
headers: {
'content-type': 'text/plain',
},
});
});
// GET content python
router.get('/api/:branch/items/:slug/py', async (request) => {
const { params, env } = request;
await trackAnalytics(request, env, 'CONTENT_PYTHON', params.slug, params.branch);
const raw_content = `https://raw.githubusercontent.com/jxnl/instructor/${params.branch}/docs/hub/${params.slug}.md?raw=true`;
const content = await fetch(raw_content).then((res) => res.text());
// Extract all Python code blocks from within ```py or ```python blocks in the markdown
const python_codes = content.match(/(?<=```(?:py|python)\n)[\s\S]+?(?=\n```)/g);
// String.match with the /g flag returns null (never an empty array) when nothing matches
if (python_codes === null || python_codes.length === 0) {
return new Response('No Python code found in this document.', {
headers: {
'content-type': 'text/plain',
},
});
}
const python_code = python_codes.join('\n\n');
return new Response(python_code, {
headers: {
'content-type': 'text/plain',
},
});
});
// 404 for everything else
router.all('*', () => new Response('Not Found.', { status: 404 }));
export default router;
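The `/py` route concatenates every fenced `py`/`python` block in the cookbook's Markdown, separated by a blank line. Python's `re` module rejects variable-width look-behinds like the one in the worker's pattern, so a Python port of the same extraction uses a capturing group instead. A sketch (`extract_python` is a hypothetical helper; the fence string is built at runtime only so the snippet can itself sit inside a Markdown code block):

```python
import re

FENCE = "`" * 3  # a literal triple backtick, built at runtime

# Capture the body of each py/python fenced block, up to its closing fence
PYTHON_BLOCK = re.compile(FENCE + r"(?:py|python)\n([\s\S]+?)\n" + FENCE)


def extract_python(markdown: str) -> str:
    """Join all Python code blocks with a blank line, as the worker does."""
    blocks = PYTHON_BLOCK.findall(markdown)
    if not blocks:
        return "No Python code found in this document."
    return "\n\n".join(blocks)


doc = "\n".join(
    [
        "# Example",
        FENCE + "python", "import instructor", FENCE,
        FENCE + "bash", "pip install -U instructor", FENCE,
        FENCE + "py", "print('hi')", FENCE,
    ]
)
print(extract_python(doc))
```

The bash block is skipped because only fences tagged `py` or `python` match, mirroring the worker's alternation.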
+103
@@ -0,0 +1,103 @@
{
"compilerOptions": {
/* Visit https://aka.ms/tsconfig.json to read more about this file */
/* Projects */
// "incremental": true, /* Enable incremental compilation */
// "composite": true, /* Enable constraints that allow a TypeScript project to be used with project references. */
// "tsBuildInfoFile": "./", /* Specify the folder for .tsbuildinfo incremental compilation files. */
// "disableSourceOfProjectReferenceRedirect": true, /* Disable preferring source files instead of declaration files when referencing composite projects */
// "disableSolutionSearching": true, /* Opt a project out of multi-project reference checking when editing. */
// "disableReferencedProjectLoad": true, /* Reduce the number of projects loaded automatically by TypeScript. */
/* Language and Environment */
"target": "es2021" /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */,
"lib": ["es2021"] /* Specify a set of bundled library declaration files that describe the target runtime environment. */,
"jsx": "react" /* Specify what JSX code is generated. */,
// "experimentalDecorators": true, /* Enable experimental support for TC39 stage 2 draft decorators. */
// "emitDecoratorMetadata": true, /* Emit design-type metadata for decorated declarations in source files. */
// "jsxFactory": "", /* Specify the JSX factory function used when targeting React JSX emit, e.g. 'React.createElement' or 'h' */
// "jsxFragmentFactory": "", /* Specify the JSX Fragment reference used for fragments when targeting React JSX emit e.g. 'React.Fragment' or 'Fragment'. */
// "jsxImportSource": "", /* Specify module specifier used to import the JSX factory functions when using `jsx: react-jsx*`.` */
// "reactNamespace": "", /* Specify the object invoked for `createElement`. This only applies when targeting `react` JSX emit. */
// "noLib": true, /* Disable including any library files, including the default lib.d.ts. */
// "useDefineForClassFields": true, /* Emit ECMAScript-standard-compliant class fields. */
/* Modules */
"module": "es2022" /* Specify what module code is generated. */,
// "rootDir": "./", /* Specify the root folder within your source files. */
"moduleResolution": "node" /* Specify how TypeScript looks up a file from a given module specifier. */,
// "baseUrl": "./", /* Specify the base directory to resolve non-relative module names. */
// "paths": {}, /* Specify a set of entries that re-map imports to additional lookup locations. */
// "rootDirs": [], /* Allow multiple folders to be treated as one when resolving modules. */
// "typeRoots": [], /* Specify multiple folders that act like `./node_modules/@types`. */
"types": [
"@cloudflare/workers-types/2023-07-01"
] /* Specify type package names to be included without being referenced in a source file. */,
// "allowUmdGlobalAccess": true, /* Allow accessing UMD globals from modules. */
"resolveJsonModule": true /* Enable importing .json files */,
// "noResolve": true, /* Disallow `import`s, `require`s or `<reference>`s from expanding the number of files TypeScript should add to a project. */
/* JavaScript Support */
"allowJs": true /* Allow JavaScript files to be a part of your program. Use the `checkJS` option to get errors from these files. */,
"checkJs": false /* Enable error reporting in type-checked JavaScript files. */,
// "maxNodeModuleJsDepth": 1, /* Specify the maximum folder depth used for checking JavaScript files from `node_modules`. Only applicable with `allowJs`. */
/* Emit */
// "declaration": true, /* Generate .d.ts files from TypeScript and JavaScript files in your project. */
// "declarationMap": true, /* Create sourcemaps for d.ts files. */
// "emitDeclarationOnly": true, /* Only output d.ts files and not JavaScript files. */
// "sourceMap": true, /* Create source map files for emitted JavaScript files. */
// "outFile": "./", /* Specify a file that bundles all outputs into one JavaScript file. If `declaration` is true, also designates a file that bundles all .d.ts output. */
// "outDir": "./", /* Specify an output folder for all emitted files. */
// "removeComments": true, /* Disable emitting comments. */
"noEmit": true /* Disable emitting files from a compilation. */,
// "importHelpers": true, /* Allow importing helper functions from tslib once per project, instead of including them per-file. */
// "importsNotUsedAsValues": "remove", /* Specify emit/checking behavior for imports that are only used for types */
// "downlevelIteration": true, /* Emit more compliant, but verbose and less performant JavaScript for iteration. */
// "sourceRoot": "", /* Specify the root path for debuggers to find the reference source code. */
// "mapRoot": "", /* Specify the location where debugger should locate map files instead of generated locations. */
// "inlineSourceMap": true, /* Include sourcemap files inside the emitted JavaScript. */
// "inlineSources": true, /* Include source code in the sourcemaps inside the emitted JavaScript. */
// "emitBOM": true, /* Emit a UTF-8 Byte Order Mark (BOM) in the beginning of output files. */
// "newLine": "crlf", /* Set the newline character for emitting files. */
// "stripInternal": true, /* Disable emitting declarations that have `@internal` in their JSDoc comments. */
// "noEmitHelpers": true, /* Disable generating custom helper functions like `__extends` in compiled output. */
// "noEmitOnError": true, /* Disable emitting files if any type checking errors are reported. */
// "preserveConstEnums": true, /* Disable erasing `const enum` declarations in generated code. */
// "declarationDir": "./", /* Specify the output directory for generated declaration files. */
// "preserveValueImports": true, /* Preserve unused imported values in the JavaScript output that would otherwise be removed. */
/* Interop Constraints */
"isolatedModules": true /* Ensure that each file can be safely transpiled without relying on other imports. */,
"allowSyntheticDefaultImports": true /* Allow 'import x from y' when a module doesn't have a default export. */,
// "esModuleInterop": true /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables `allowSyntheticDefaultImports` for type compatibility. */,
// "preserveSymlinks": true, /* Disable resolving symlinks to their realpath. This correlates to the same flag in node. */
"forceConsistentCasingInFileNames": true /* Ensure that casing is correct in imports. */,
/* Type Checking */
"strict": true /* Enable all strict type-checking options. */,
// "noImplicitAny": true, /* Enable error reporting for expressions and declarations with an implied `any` type.. */
// "strictNullChecks": true, /* When type checking, take into account `null` and `undefined`. */
// "strictFunctionTypes": true, /* When assigning functions, check to ensure parameters and the return values are subtype-compatible. */
// "strictBindCallApply": true, /* Check that the arguments for `bind`, `call`, and `apply` methods match the original function. */
// "strictPropertyInitialization": true, /* Check for class properties that are declared but not set in the constructor. */
// "noImplicitThis": true, /* Enable error reporting when `this` is given the type `any`. */
// "useUnknownInCatchVariables": true, /* Type catch clause variables as 'unknown' instead of 'any'. */
// "alwaysStrict": true, /* Ensure 'use strict' is always emitted. */
// "noUnusedLocals": true, /* Enable error reporting when a local variables aren't read. */
// "noUnusedParameters": true, /* Raise an error when a function parameter isn't read */
// "exactOptionalPropertyTypes": true, /* Interpret optional property types as written, rather than adding 'undefined'. */
// "noImplicitReturns": true, /* Enable error reporting for codepaths that do not explicitly return in a function. */
// "noFallthroughCasesInSwitch": true, /* Enable error reporting for fallthrough cases in switch statements. */
// "noUncheckedIndexedAccess": true, /* Include 'undefined' in index signature results */
// "noImplicitOverride": true, /* Ensure overriding members in derived classes are marked with an override modifier. */
// "noPropertyAccessFromIndexSignature": true, /* Enforces using indexed accessors for keys declared using an indexed type */
// "allowUnusedLabels": true, /* Disable error reporting for unused labels. */
// "allowUnreachableCode": true, /* Disable error reporting for unreachable code. */
/* Completeness */
// "skipDefaultLibCheck": true, /* Skip type checking .d.ts files that are included with TypeScript. */
"skipLibCheck": true /* Skip type checking all .d.ts files. */
}
}
+16
@@ -0,0 +1,16 @@
interface Env {
// Example binding to KV. Learn more at https://developers.cloudflare.com/workers/runtime-apis/kv/
// MY_KV_NAMESPACE: KVNamespace;
//
// Example binding to Durable Object. Learn more at https://developers.cloudflare.com/workers/runtime-apis/durable-objects/
// MY_DURABLE_OBJECT: DurableObjectNamespace;
//
// Example binding to R2. Learn more at https://developers.cloudflare.com/workers/runtime-apis/r2/
// MY_BUCKET: R2Bucket;
//
// Example binding to a Service. Learn more at https://developers.cloudflare.com/workers/runtime-apis/service-bindings/
// MY_SERVICE: Fetcher;
//
// Example binding to a Queue. Learn more at https://developers.cloudflare.com/queues/javascript-apis/
// MY_QUEUE: Queue;
}
+56
@@ -0,0 +1,56 @@
name = "instructor-hub-proxy"
main = "src/index.ts"
compatibility_date = "2024-02-08"
# Variable bindings. These are arbitrary, plaintext strings (similar to environment variables)
# Note: Use secrets to store sensitive data.
# Docs: https://developers.cloudflare.com/workers/platform/environment-variables
# [vars]
# MY_VARIABLE = "production_value"
# Bind a KV Namespace. Use KV as persistent storage for small key-value pairs.
# Docs: https://developers.cloudflare.com/workers/runtime-apis/kv
# [[kv_namespaces]]
# binding = "MY_KV_NAMESPACE"
# id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Bind an R2 Bucket. Use R2 to store arbitrarily large blobs of data, such as files.
# Docs: https://developers.cloudflare.com/r2/api/workers/workers-api-usage/
# [[r2_buckets]]
# binding = "MY_BUCKET"
# bucket_name = "my-bucket"
# Bind a Queue producer. Use this binding to schedule an arbitrary task that may be processed later by a Queue consumer.
# Docs: https://developers.cloudflare.com/queues/get-started
# [[queues.producers]]
# binding = "MY_QUEUE"
# queue = "my-queue"
# Bind a Queue consumer. Queue Consumers can retrieve tasks scheduled by Producers to act on them.
# Docs: https://developers.cloudflare.com/queues/get-started
# [[queues.consumers]]
# queue = "my-queue"
# Bind another Worker service. Use this binding to call another Worker without network overhead.
# Docs: https://developers.cloudflare.com/workers/platform/services
# [[services]]
# binding = "MY_SERVICE"
# service = "my-service"
# Bind a Durable Object. Durable objects are a scale-to-zero compute primitive based on the actor model.
# Durable Objects can live for as long as needed. Use these when you need a long-running "server", such as in realtime apps.
# Docs: https://developers.cloudflare.com/workers/runtime-apis/durable-objects
# [[durable_objects.bindings]]
# name = "MY_DURABLE_OBJECT"
# class_name = "MyDurableObject"
# Durable Object migrations.
# Docs: https://developers.cloudflare.com/workers/learning/using-durable-objects#configure-durable-object-classes-with-migrations
# [[migrations]]
# tag = "v1"
# new_classes = ["MyDurableObject"]
[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "cli_analytics"
database_id = "607034d8-267d-42d7-8c0b-462aec83d955"
+2
@@ -2,6 +2,7 @@ import typer
import instructor.cli.jobs as jobs
import instructor.cli.files as files
import instructor.cli.usage as usage
import instructor.cli.hub as hub
app = typer.Typer(
name="instructor-ft",
@@ -11,3 +12,4 @@ app = typer.Typer(
app.add_typer(jobs.app, name="jobs", help="Monitor and create fine tuning jobs")
app.add_typer(files.app, name="files", help="Manage files on OpenAI's servers")
app.add_typer(usage.app, name="usage", help="Check OpenAI API usage data")
app.add_typer(hub.app, name="hub", help="Interact with the instructor hub")
+167
@@ -0,0 +1,167 @@
from typing import Optional

import typer
import httpx
from pydantic import BaseModel
from rich.console import Console
from rich.table import Table
from rich.markdown import Markdown

app = typer.Typer(
    name="hub",
    help="Interact with the instructor hub, a collection of examples and cookbooks for the instructor library.",
    short_help="Interact with the instructor hub",
)

console = Console()
class HubPage(BaseModel):
    id: int
    name: str
    slug: str
    branch: str = "main"
    count: int = 0

    def get_doc_url(self) -> str:
        return f"https://jxnl.github.io/instructor/hub/{self.slug}/"

    def get_md_url(self) -> str:
        return f"https://raw.githubusercontent.com/jxnl/instructor/{self.branch}/docs/hub/{self.slug}.md?raw=true"

    def render_doc_link(self) -> str:
        return f"[link={self.get_doc_url()}](doc)[/link]"

    def render_slug(self) -> str:
        return f"{self.slug} {self.render_doc_link()}"
class HubClient:
    def __init__(
        self, base_url: str = "https://instructor-hub-proxy.jason-a3f.workers.dev"
    ):
        self.base_url = base_url

    def get_cookbooks(self, branch: str, q: Optional[str] = None, sort: bool = False):
        """Get collection index of cookbooks."""
        url = f"{self.base_url}/api/{branch}/items/"
        # Let httpx percent-encode the search query instead of concatenating it
        params = {"q": q} if q else None
        response = httpx.get(url, params=params)
        if response.status_code == 200:
            pages = [HubPage(**page) for page in response.json()]
            if sort:
                return sorted(pages, key=lambda x: x.count, reverse=True)
            return pages
        else:
            raise Exception(f"Failed to fetch cookbooks: {response.status_code}")

    def get_content_markdown(self, branch, slug):
        """Get markdown content."""
        url = f"{self.base_url}/api/{branch}/items/{slug}/md/"
        response = httpx.get(url)
        if response.status_code == 200:
            return response.text
        else:
            raise Exception(f"Failed to fetch markdown content: {response.status_code}")

    def get_content_python(self, branch, slug):
        """Get Python code blocks from content."""
        url = f"{self.base_url}/api/{branch}/items/{slug}/py/"
        response = httpx.get(url)
        if response.status_code == 200:
            return response.text
        else:
            raise Exception(f"Failed to fetch Python content: {response.status_code}")

    def get_cookbook_id(self, id: int, branch: str = "main") -> Optional[HubPage]:
        for cookbook in self.get_cookbooks(branch):
            if cookbook.id == id:
                return cookbook
        return None

    def get_cookbook_slug(self, slug: str, branch: str = "main") -> Optional[HubPage]:
        for cookbook in self.get_cookbooks(branch):
            if cookbook.slug == slug:
                return cookbook
        return None
@app.command(
    "list",
    help="List all available cookbooks",
    short_help="List all available cookbooks",
)
def list_cookbooks(
    q: Optional[str] = typer.Option(None, "-q", help="Search for cookbooks by name"),
    sort: bool = typer.Option(False, "--sort", help="Sort the cookbooks by popularity"),
    branch: str = typer.Option(
        "main",
        "--branch",
        "-b",
        help="Specific branch to fetch the cookbooks from. Defaults to 'main'.",
    ),
):
    table = Table(title="Available Cookbooks")
    table.add_column("hub_id", justify="right", style="cyan", no_wrap=True)
    table.add_column("slug", style="green")
    table.add_column("title", style="white")
    table.add_column("n_downloads", justify="right")

    client = HubClient()
    for cookbook in client.get_cookbooks(branch, q=q, sort=sort):
        ii = cookbook.id
        slug = cookbook.render_slug()
        title = cookbook.name
        table.add_row(str(ii), slug, title, str(cookbook.count))

    console.print(table)
@app.command(
    "pull",
    help="Pull the latest cookbooks from the instructor hub, optionally outputting to a file",
    short_help="Pull the latest cookbooks",
)
def pull(
    id: Optional[int] = typer.Option(None, "--id", "-i", help="The cookbook id"),
    slug: Optional[str] = typer.Option(None, "--slug", "-s", help="The cookbook slug"),
    py: bool = typer.Option(False, "--py", help="Output to a Python file"),
    file: Optional[str] = typer.Option(None, "--output", help="Output to a file"),
    branch: str = typer.Option(
        "main", help="Specific branch to fetch the cookbooks from."
    ),
    page: bool = typer.Option(
        False, "--page", "-p", help="Paginate the output with a less-like pager"
    ),
):
    client = HubClient()
    # `is not None` so that an id of 0 is not mistaken for "no id given"
    cookbook = (
        client.get_cookbook_id(id, branch=branch)
        if id is not None
        else client.get_cookbook_slug(slug, branch=branch)
        if slug
        else None
    )
    if cookbook is None:
        typer.echo("Please provide a valid cookbook id or slug.")
        raise typer.Exit(code=1)

    # Keep the raw text: a rich Markdown object cannot be written to a file
    text = (
        client.get_content_python(branch, cookbook.slug)
        if py
        else client.get_content_markdown(branch, cookbook.slug)
    )

    if file:
        with open(file, "w") as f:
            f.write(text)
        return

    # Only Markdown is rendered through rich; Python code is printed verbatim
    output = text if py else Markdown(text)
    if page:
        with console.pager(styles=True):
            console.print(output)
    elif py:
        print(output)
    else:
        console.print(output)
+5 -3
@@ -114,7 +114,6 @@ markdown_extensions:
- pymdownx.tabbed:
alternate_style: true
combine_header_slug: true
slugify: !!python/object/apply:pymdownx.slugs.slugify
- pymdownx.tasklist:
custom_checkbox: true
nav:
@@ -164,6 +163,10 @@ nav:
- Image to Ad Copy: 'examples/image_to_ad_copy.md'
- Ollama: 'examples/ollama.md'
- SQLModel Integration: 'examples/sqlmodel.md'
- Hub:
- Introduction: 'hub/index.md'
- Single Classification Model: 'hub/single_classification.md'
- Multiple Classification Model: 'hub/multiple_classification.md'
- Tutorials:
- Introduction: 'tutorials/1-introduction.ipynb'
- Tips and Tricks: 'tutorials/2-tips.ipynb'
@@ -234,5 +237,4 @@ extra:
- icon: fontawesome/brands/twitter
link: https://twitter.com/jxnlco
- icon: fontawesome/brands/github
link: https://github.com/jxnl
copyright: Copyright &copy; 2023 Jason Liu
link: https://github.com/jxnl
+1 -1
@@ -1,6 +1,6 @@
[tool.poetry]
name = "instructor"
version = "0.5.2"
version = "0.6.0"
description = "structured outputs for llm"
authors = ["Jason Liu <jason@jxnl.co>"]
license = "MIT"
+12
@@ -0,0 +1,12 @@
import pytest
from pytest_examples import find_examples, CodeExample, EvalExample


@pytest.mark.parametrize("example", find_examples("docs/hub"), ids=str)
def test_format_blog(example: CodeExample, eval_example: EvalExample):
    if eval_example.update_examples:
        eval_example.format(example)
        eval_example.run_print_update(example)
    else:
        eval_example.lint(example)
        eval_example.run(example)
-64
@@ -1,64 +0,0 @@
# Introduction
This section includes a list of notebooks that walk you through some simple concepts in Instructor. We start small and then work our way up to more complex and tricky implementations using the library.
## Overview
The following notebooks are currently available:
1. `Introduction` - A quick walkthrough of some of the benefits of Pydantic and how the Instructor library integrates with Pydantic via `instructor.patch()`.
2. `Tips` - A quick demonstration of how to use enums, `Pydantic` models, and structured prompting to get specific output formats.
3. `Applications Rag` - Learn how to generate nested models with `Pydantic` by rewriting user queries.
4. `Knowledge Graphs` - Dive into using LLMs to break down complex topics into simple knowledge graphs.
5. `Validation` - Learn how to use Pydantic's built-in validators to perform more complex validation and checks on the outputs of your functions.
6. `Chain Of Density` - Learn how to produce high-quality summaries that consistently beat human-written ones using `Chain of Density` summarization.
## Installation
We use the Graphviz package in this tutorial series. If you don't have it installed, download it: Mac users can run `brew install graphviz`, and Linux users can try `sudo apt install graphviz` (adjust the command for your system's package manager). See the official [documentation](https://graphviz.org/download/) for other platforms.
If you encounter an error like the following when running Graphviz after installing it, restart the notebook kernel and verify that Graphviz is installed by running `dot -V` in your shell.
```
Command '[PosixPath('dot'), '-Kdot', '-Tsvg']' died with <Signals.SIGKILL: 9>.
```
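To run the same check from inside a notebook cell, here is a minimal sketch using only the standard library. It assumes the Graphviz `dot` executable is expected on your `PATH`; `graphviz_version` is a hypothetical helper written for illustration, not part of `instructor` or the `graphviz` Python package.

```python
import shutil
import subprocess
from typing import Optional


def graphviz_version() -> Optional[str]:
    """Return the `dot -V` version banner, or None if Graphviz isn't found.

    Hypothetical helper for this tutorial, not a library API.
    """
    dot = shutil.which("dot")  # locate the `dot` executable on PATH
    if dot is None:
        return None
    # `dot -V` prints its version banner (usually to stderr) and exits.
    result = subprocess.run([dot, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip() or None


if __name__ == "__main__":
    print(graphviz_version() or "Graphviz `dot` not found on PATH")
```

If this prints the "not found" message even though you installed Graphviz, the notebook kernel was most likely started before the install and needs a restart.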
Follow these steps to start running the notebooks:
1. Create a virtual environment
```
python3 -m venv .venv
source .venv/bin/activate
```
2. Install the dependencies
```
pip3 install -r requirements.txt
```
3. Register the virtual environment as a Jupyter kernel
```
python -m ipykernel install --user --name=instructor-env
```
4. Set your OpenAI API key in your shell with the following command. It stays set for as long as the shell session is open.
```
export OPENAI_API_KEY=<api key goes here>
```
5. Start Jupyter Notebook
```
jupyter notebook
```
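Before opening a notebook, you can sanity-check steps 1 and 4 above from Python. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and it only checks that you are inside *some* virtual environment and that `OPENAI_API_KEY` is non-empty, not that the key is valid.

```python
import os
import sys


def check_environment() -> list[str]:
    """Return a list of setup problems; an empty list means both checks passed.

    Hypothetical helper covering the virtual-environment and API-key steps.
    """
    problems = []
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    if sys.prefix == sys.base_prefix:
        problems.append("not running inside a virtual environment")
    # The OpenAI Python client (openai>=1.0) reads the key from this
    # environment variable by default.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks good" if not issues else "\n".join(issues))
```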
File diff suppressed because one or more lines are too long
-32
@@ -1,32 +0,0 @@
import pandas as pd


def flatten_dict(d, parent_key="", sep="_"):
    """
    Flatten a nested dictionary.
    :param d: The nested dictionary to flatten.
    :param parent_key: The base key to use for the flattened keys.
    :param sep: Separator to use between keys.
    :return: A flattened dictionary.
    """
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


def dicts_to_df(list_of_dicts):
    """
    Convert a list of dictionaries to a pandas DataFrame.
    :param list_of_dicts: List of dictionaries, potentially nested.
    :return: A pandas DataFrame representing the flattened data.
    """
    # Flatten each dictionary and create a DataFrame
    flattened_data = [flatten_dict(d) for d in list_of_dicts]
    return pd.DataFrame(flattened_data)
-8
@@ -1,8 +0,0 @@
ipykernel
jupyter
instructor
openai>=1.1.0
pydantic
graphviz
spacy
nltk