diff --git a/docs/blog/posts/caching.md b/docs/blog/posts/caching.md index b011fba..24be7ad 100644 --- a/docs/blog/posts/caching.md +++ b/docs/blog/posts/caching.md @@ -20,7 +20,7 @@ Today, we're diving into optimizing instructor code while maintaining the excell -Lets first consider our canonical example, using the `OpenAI` Python client to extract user details. +Let's first consider our canonical example, using the `OpenAI` Python client to extract user details. ```python import instructor @@ -320,7 +320,7 @@ def extract(data) -> UserDetail: !!! note "Looking carefully" - If you look carefully at the code above you'll notice that we're using the same `instructor_cache` decorator as before. The implementatino is the same, but we're using a different caching backend! + If you look carefully at the code above you'll notice that we're using the same `instructor_cache` decorator as before. The implementation is the same, but we're using a different caching backend! ## Conclusion diff --git a/docs/blog/posts/learn-async.md b/docs/blog/posts/learn-async.md index 741b924..c4c0620 100644 --- a/docs/blog/posts/learn-async.md +++ b/docs/blog/posts/learn-async.md @@ -70,7 +70,7 @@ async def extract_person(text: str) -> Person: 1. We use `instructor.apatch` to patch the `create` method of `AsyncOpenAI` to accept a `response_model` argument. This is because the `create` method of `AsyncOpenAI` does not accept a `response_model` argument without this patch. 2. We use `await` here to wait for the response from the server before we return the result. This is because `create` returns a coroutine object, not the result of the coroutine. -Notice that now there are `async` and `await` keywords in the function definition. This is because we're using the `asyncio` library to run the function concurrently. Now lets define a batch of texts to process. +Notice that now there are `async` and `await` keywords in the function definition. This is because we're using the `asyncio` library to run the function concurrently. Now let's define a batch of texts to process. ```python dataset = [ @@ -125,7 +125,7 @@ However, these methods aim to complete as many tasks as possible as quickly as p !!! note "Ordering of results" - Its important to note that the order of the results will not be the same as the order of the dataset. This is because the tasks are completed in the order they finish, not the order they were started. If you need to preserve the order of the results, you can use `asyncio.gather` instead. + It is important to note that the order of the results will not be the same as the order of the dataset. This is because the tasks are completed in the order they finish, not the order they were started. If you need to preserve the order of the results, you can use `asyncio.gather` instead. ### **Rate-Limited Gather**: Using semaphores to limit concurrency. @@ -167,7 +167,7 @@ Now that we have seen the code, let's examine the results of processing 7 texts. !!! note "Other Options" - Its important to also note that here we are using a `semaphore` to limit the number of concurrent requests. However, there are other ways to limit concurrency esp since we have rate limit information from the `openai` request. You can imagine using a library like `ratelimit` to limit the number of requests per second. OR catching rate limit exceptions and using `tenacity` to retry the request after a certain amount of time. + It is important to also note that here we are using a `semaphore` to limit the number of concurrent requests. However, there are other ways to limit concurrency especially since we have rate limit information from the `openai` request. You can imagine using a library like `ratelimit` to limit the number of requests per second. OR catching rate limit exceptions and using `tenacity` to retry the request after a certain amount of time. - [tenacity](https://pypi.org/project/tenacity/) - [aiolimiter](https://pypi.org/project/aiolimiter/) diff --git a/docs/blog/posts/rag-and-beyond.md b/docs/blog/posts/rag-and-beyond.md index a4904ee..f3c6027 100644 --- a/docs/blog/posts/rag-and-beyond.md +++ b/docs/blog/posts/rag-and-beyond.md @@ -13,11 +13,11 @@ authors: # RAG is more than just embedding search -With the advent of large language models (LLM), retrival augmented generation (RAG) has become a hot topic. However throught the past year of [helping startups](https://jxnl.co) integrate LLMs into their stack I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware. +With the advent of large language models (LLM), retrieval augmented generation (RAG) has become a hot topic. However throughout the past year of [helping startups](https://jxnl.co) integrate LLMs into their stack I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware. !!! note "What is RAG?" - Retrival augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year using text embeddings with a vector databases has been the most popular approach I've seen being socialized. + Retrieval augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year using text embeddings with a vector databases has been the most popular approach I've seen being socialized.
![RAG](img/dumb_rag.png) @@ -30,7 +30,7 @@ So let's kick things off by examining what I like to call the 'Dumb' RAG Model ## The 'Dumb' RAG Model -When you ask a question like, "what is the capital of France?" The RAG 'dumb' model embeds the query and searches in some unopinonated search endpoint. Limited to a single method API like `search(query: str) -> List[str]`. This is fine for simple queries, since you'd expect words like 'paris is the capital of france' to be in the top results of say, your wikipedia embeddings. +When you ask a question like, "what is the capital of France?" The RAG 'dumb' model embeds the query and searches in some unopinionated search endpoint. Limited to a single method API like `search(query: str) -> List[str]`. This is fine for simple queries, since you'd expect words like 'paris is the capital of france' to be in the top results of say, your wikipedia embeddings. ### Why is this a problem? @@ -39,7 +39,7 @@ When you ask a question like, "what is the capital of France?" The RAG 'dumb' mo - **Monolithic Search Backend**: Assumes a single search backend, which is not always the case. You may have multiple search backends, each with their own API, and you want to route the query to vector stores, search clients, sql databases, and more. - **Limitation of text search**: Restricts complex queries to a single string (`{query: str}`), sacrificing expressiveness, in using keywords, filters, and other advanced features. For example, asking `what problems did we fix last week` cannot be answered by a simple text search since documents that contain `problem, last week` are going to be present at every week. -- **Limited ability to plan**: Assumes that the query is the only input to the search backend, but you may want to use other information to improve the search, like the user's location, or the time of day using the context to rewrite the query. For example, if you present the language model of more context its able to plan a suite of queries to execute to return the best results. +- **Limited ability to plan**: Assumes that the query is the only input to the search backend, but you may want to use other information to improve the search, like the user's location, or the time of day using the context to rewrite the query. For example, if you present the language model of more context it is able to plan a suite of queries to execute to return the best results. Now let's dive into how we can make it smarter with query understanding. This is where things get interesting. @@ -165,7 +165,7 @@ class SearchClient(BaseModel): elif self.source == ClientSource.CALENDAR: ... -class Retrival(BaseModel): +class Retrieval(BaseModel): queries: List[SearchClient] async def execute(self) -> str: @@ -181,9 +181,9 @@ from openai import OpenAI # Enables response_model in the openai client client = instructor.patch(OpenAI()) -retrival = client.chat.completions.create( +retrieval = client.chat.completions.create( model="gpt-4", - response_model=Retrival, + response_model=Retrieval, messages=[ {"role": "system", "content": "You are Jason's personal assistant."}, {"role": "user", "content": "What do I have today?"} @@ -220,13 +220,13 @@ retrival = client.chat.completions.create( Notice that we have a list of queries that route to different search backends (email and calendar). We can even dispatch them async to be as performance as possible. Not only do we dispatch to different backends (that we have no control over), but you are likely going to render them to the user differently as well. Perhaps you want to summarize the emails in text, but you want to render the calendar events as a list that they can scroll across on a mobile app. !!! Note "Can I used framework X?" -I get this question a lot, but it's just code. Within these dispatchs you can do whatever you want. You can use `input()` to ask the user for more information, make a post request, call a Langchain agent or LLamaindex query engine to get more information. The sky is the limit. +I get this question a lot, but it's just code. Within these dispatches you can do whatever you want. You can use `input()` to ask the user for more information, make a post request, call a Langchain agent or LLamaindex query engine to get more information. The sky is the limit. -Both of these examples showcase how both search providors and consumers can use `instructor` to model their systems. This is a powerful pattern that allows you to build a system that can be used by anyone, and can be used to build an LLM layer, from scratch, in front of any arbitrary backend. +Both of these examples showcase how both search providers and consumers can use `instructor` to model their systems. This is a powerful pattern that allows you to build a system that can be used by anyone, and can be used to build an LLM layer, from scratch, in front of any arbitrary backend. ## Conclusion -This isnt about fancy embedding tricks, it's just plain old information retrival and query understanding. The beauty of instructor is that it simplifies modeling the complex and lets you define the output of the language model, the prompts, and the payload we send to the backend in a single place. +This is not about fancy embedding tricks, it's just plain old information retrieval and query understanding. The beauty of instructor is that it simplifies modeling the complex and lets you define the output of the language model, the prompts, and the payload we send to the backend in a single place. ## What's Next? diff --git a/docs/blog/posts/validation-part1.md b/docs/blog/posts/validation-part1.md index bf9b850..45cb64f 100644 --- a/docs/blog/posts/validation-part1.md +++ b/docs/blog/posts/validation-part1.md @@ -404,7 +404,7 @@ This approach provides a layer of defense against two types of bad outputs: ### Define the Response Model with Validators -To keep things simple lets assume we have a model that returns a `UserModel` object. We can define the response model using Pydantic and add a field validator to ensure that the name is in uppercase. +To keep things simple let's assume we have a model that returns a `UserModel` object. We can define the response model using Pydantic and add a field validator to ensure that the name is in uppercase. ```python from pydantic import BaseModel, field_validator diff --git a/docs/examples/planning-tasks.md b/docs/examples/planning-tasks.md index 2de0915..468a863 100644 --- a/docs/examples/planning-tasks.md +++ b/docs/examples/planning-tasks.md @@ -113,7 +113,7 @@ plan.model_dump() !!! warning "No RAG" - While we build the query plan in this example, we do not propose a method to actually answer the question. You can implement your own answer function that perhaps makes a retrival and calls openai for retrival augmented generation. That step would also make use of function calls but goes beyond the scope of this example. + While we build the query plan in this example, we do not propose a method to actually answer the question. You can implement your own answer function that perhaps makes a retrieval and calls openai for retrieval augmented generation. That step would also make use of function calls but goes beyond the scope of this example. ```python { diff --git a/tutorials/3.0.applications-rag.ipynb b/tutorials/3.0.applications-rag.ipynb index e62129a..4aed6b2 100644 --- a/tutorials/3.0.applications-rag.ipynb +++ b/tutorials/3.0.applications-rag.ipynb @@ -687,7 +687,7 @@ " date_range: DateRange\n", "\n", "\n", - "class Retrival(BaseModel):\n", + "class Retrieval(BaseModel):\n", " queries: List[SearchClient]" ] }, @@ -741,9 +741,9 @@ } ], "source": [ - "retrival = client.chat.completions.create(\n", + "retrieval = client.chat.completions.create(\n", " model=\"gpt-3.5-turbo\",\n", - " response_model=Retrival,\n", + " response_model=Retrieval,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", @@ -754,7 +754,7 @@ " {\"role\": \"user\", \"content\": \"What do I have today for work? any new emails?\"},\n", " ],\n", ")\n", - "print(retrival.model_dump_json(indent=4))" + "print(retrieval.model_dump_json(indent=4))" ] }, { @@ -810,9 +810,9 @@ } ], "source": [ - "retrival = client.chat.completions.create(\n", + "retrieval = client.chat.completions.create(\n", " model=\"gpt-4-1106-preview\",\n", - " response_model=Retrival,\n", + " response_model=Retrieval,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", @@ -826,7 +826,7 @@ " },\n", " ],\n", ")\n", - "print(retrival.model_dump_json(indent=4))" + "print(retrieval.model_dump_json(indent=4))" ] }, { @@ -908,7 +908,7 @@ " )\n", "\n", "\n", - "retrival = client.chat.completions.create(\n", + "retrieval = client.chat.completions.create(\n", " model=\"gpt-4-1106-preview\",\n", " response_model=QueryPlan,\n", " messages=[\n", @@ -923,7 +923,7 @@ " ],\n", ")\n", "\n", - "print(retrival.model_dump_json(indent=4))" + "print(retrieval.model_dump_json(indent=4))" ] }, {