---
draft: False
date: 2024-03-07
slug: open-source-local-structured-output-pydantic-json-openai
tags:
  - llms
  - opensource
  - together
  - llama-cpp-python
  - anyscale
  - groq
  - mistral
  - ollama
authors:
  - jxnl
---

# Structured Output for Open Source and Local LLMs

Originally, Instructor facilitated API interactions solely via the OpenAI SDK, with an emphasis on function calling, incorporating [Pydantic](https://pydantic-docs.helpmanual.io/) for structured data validation and serialization.

As the year progressed, we expanded our toolkit by integrating [JSON mode](../../concepts/patching.md#json-mode), thus enhancing our adaptability to vision models and open source models. This advancement now enables us to support an extensive range of models, from [GPT](https://openai.com/api/) and [Mistral](https://mistral.ai) to virtually any model accessible through [Ollama](https://ollama.ai) and [Hugging Face](https://huggingface.co/models), facilitated by [llama-cpp-python](../../hub/llama-cpp-python.md). For more insights into leveraging JSON mode with various models, refer to our detailed guide on [Patching](../../concepts/patching.md).

If you want a course on using Instructor with Pydantic, check out [Steering language models towards structured outputs](https://www.wandb.courses/courses/steering-language-models).

## Exploring Different OpenAI Clients with Instructor

The landscape of OpenAI clients is diverse, each offering unique functionalities tailored to different needs. Below, we explore some of the notable clients integrated with Instructor, providing structured outputs and enhanced capabilities, complete with examples of how to initialize and patch each client.

## Local Models

### Ollama: A New Frontier for Local Models

Ollama's introduction significantly impacts the open-source community, offering a way to merge structured outputs with local models via JSON schema, as detailed in our [Ollama documentation](../../hub/ollama.md).

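To make "via JSON schema" a little more concrete: a Pydantic model can emit the JSON schema that describes its fields, and it is this kind of schema that a JSON-capable model is asked to conform to. The sketch below uses only Pydantic (v2 is assumed for `model_json_schema`), and the `UserDetail` model is an illustrative stand-in. It does not show how Instructor itself passes the schema to Ollama.

```python
import json

from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


# Pydantic v2 (assumed) derives a JSON schema from the type annotations
schema = UserDetail.model_json_schema()

# Pretty-print the schema so the field names and types are easy to inspect
print(json.dumps(schema, indent=2))
```

The schema lists `name` as a string and `age` as an integer, and marks both as required, which is the contract the structured output is checked against.
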
For an in-depth exploration of Ollama, including setup and advanced features, refer to the documentation. The [Ollama official website](https://ollama.ai/download) also provides essential resources, model downloads, and community support for newcomers.

```
ollama run llama2
```

```python
from openai import OpenAI
from pydantic import BaseModel

import instructor


class UserDetail(BaseModel):
    name: str
    age: int


# enables `response_model` in create call
client = instructor.patch(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

user = client.chat.completions.create(
    model="llama2",
    messages=[
        {
            "role": "user",
            "content": "Jason is 30 years old",
        }
    ],
    response_model=UserDetail,
)

print(user)
#> name='Jason' age=30
```

### llama-cpp-python

Open-source LLMs are gaining popularity, and llama-cpp-python makes it possible to obtain structured outputs from `llama-cpp` models using JSON schema via a mixture of [constrained sampling](https://llama-cpp-python.readthedocs.io/en/latest/#json-schema-mode) and [speculative decoding](https://llama-cpp-python.readthedocs.io/en/latest/#speculative-decoding). It also supports an [OpenAI-compatible client](https://llama-cpp-python.readthedocs.io/en/latest/#openai-compatible-web-server), which can be used to obtain structured output as an in-process mechanism to avoid any network dependency.

For those interested in leveraging the power of llama-cpp-python for structured outputs, here's a quick example:

```python
import llama_cpp
import instructor
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding
from pydantic import BaseModel

llama = llama_cpp.Llama(
    model_path="../../models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,
    chat_format="chatml",
    n_ctx=2048,
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),
    logits_all=True,
    verbose=False,
)

create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,
)


class UserDetail(BaseModel):
    name: str
    age: int


user = create(
    messages=[
        {
            "role": "user",
            "content": "Extract `Jason is 30 years old`",
        }
    ],
    response_model=UserDetail,
)

print(user)
#> name='Jason' age=30
```

## Alternative Providers

### Anyscale

Anyscale's Mistral model, as detailed in our [Anyscale documentation](../../hub/anyscale.md) and on [Anyscale's official documentation](https://docs.anyscale.com/), introduces the ability to obtain structured outputs using JSON schema.

```bash
export ANYSCALE_API_KEY="your-api-key"
```

```python
import os
from openai import OpenAI
from pydantic import BaseModel

import instructor


class UserDetails(BaseModel):
    name: str
    age: int


# enables `response_model` in create call
client = instructor.patch(
    OpenAI(
        base_url="https://api.endpoints.anyscale.com/v1",
        api_key=os.environ["ANYSCALE_API_KEY"],
    ),
    # This uses Anyscale's json schema output mode
    mode=instructor.Mode.JSON_SCHEMA,
)

resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are a world class extractor"},
        {"role": "user", "content": 'Extract the following entities: "Jason is 20"'},
    ],
    response_model=UserDetails,
)

print(resp)
#> name='Jason' age=20
```

### Groq

Groq's platform, detailed further in our [Groq documentation](../../hub/groq.md) and on [Groq's official documentation](https://groq.com/), offers a unique approach to processing with its tensor architecture. This innovation significantly enhances the performance of structured output processing.

```bash
export GROQ_API_KEY="your-api-key"
```

```python
import os
import instructor
import groq
from pydantic import BaseModel

client = groq.Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

# By default, the patch function patches the client's chat completion
# `create` method to support the response_model parameter
client = instructor.patch(client, mode=instructor.Mode.MD_JSON)


# Now, we can use the response_model parameter using only a base model
# rather than having to use the OpenAISchema class
class UserExtract(BaseModel):
    name: str
    age: int


user: UserExtract = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert isinstance(user, UserExtract), "Should be instance of UserExtract"

print(user)
#> name='jason' age=25
```

### Together AI

Together AI, when combined with Instructor, offers a seamless experience for developers looking to leverage structured outputs in their applications. For more details, refer to our [Together AI documentation](../hub/together.md) and explore the [patching guide](../concepts/patching.md) to enhance your applications.

```bash
export TOGETHER_API_KEY="your-api-key"
```

```python
import os
import openai
from pydantic import BaseModel

import instructor

client = openai.OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

# enables `response_model` in create call
client = instructor.patch(client, mode=instructor.Mode.TOOLS)


class UserExtract(BaseModel):
    name: str
    age: int


user: UserExtract = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert isinstance(user, UserExtract), "Should be instance of UserExtract"

print(user)
#> name='jason' age=25
```

### Mistral

For those interested in exploring the capabilities of Mistral Large with Instructor, we highly recommend checking out our comprehensive guide on [Mistral Large](../../hub/mistral.md).

```python
import instructor
from pydantic import BaseModel
from mistralai.client import MistralClient

client = MistralClient()

# patch the chat method so it accepts `response_model`
patched_chat = instructor.patch(create=client.chat, mode=instructor.Mode.MISTRAL_TOOLS)


class UserDetails(BaseModel):
    name: str
    age: int


resp = patched_chat(
    model="mistral-large-latest",
    response_model=UserDetails,
    messages=[
        {
            "role": "user",
            "content": 'Extract the following entities: "Jason is 20"',
        },
    ],
)

print(resp)
#> name='Jason' age=20
```
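
Whichever provider and mode is used above, the text the model returns is ultimately checked against the `response_model`. The following is a minimal sketch of that validation step using Pydantic alone (v2 assumed). The two raw strings are hypothetical model outputs hard-coded for illustration, and this is not Instructor's internal code.

```python
from pydantic import BaseModel, ValidationError


class UserDetails(BaseModel):
    name: str
    age: int


# Hypothetical raw completions, hard-coded in place of a real model call
good_output = '{"name": "Jason", "age": 20}'
bad_output = '{"name": "Jason", "age": "twenty"}'

# Well-formed JSON that matches the schema becomes a typed object
user = UserDetails.model_validate_json(good_output)
print(user)

try:
    UserDetails.model_validate_json(bad_output)
    error_count = 0
except ValidationError as exc:
    # Each entry describes one field that failed validation
    error_count = len(exc.errors())
    print(exc.errors()[0]["loc"])
```

A validation failure like the second case is the signal that the output did not match the schema, which is why a typed `response_model` is more useful than parsing the JSON by hand.
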