Blog: Mastering Caching (#219)

2026-06-05 22:50:18 +00:00 · 2023-11-25 19:40:35 -05:00
parent 359c5f9295
commit d65150a0f9
10 changed files with 824 additions and 28 deletions
@@ -160,3 +160,4 @@ cython_debug/
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 examples/citation_with_extraction/fly.toml
+my_cache_directory/
@@ -12,6 +12,7 @@ The goal of the blog is to capture some content that does not neatly fit within

 ## Learning Python

+- [How can I effectively cache my functions in Python?](posts/caching.md)
 - [What are the fundamentals of batch processing with async in Python?](posts/learn-async.md)

 ## Talks
@@ -0,0 +1,329 @@
+---
+draft: False
+date: 2023-11-26
+slug: python-caching
+tags:
+  - caching
+  - functools
+  - redis
+  - diskcache
+  - python
+authors:
+  - jxnl
+---
+
+# Introduction to Caching in Python
+
+> Instructor makes working with language models easy, but they are still computationally expensive.
+
+Today, we're diving into optimizing instructor code while maintaining the excellent DX offered by [Pydantic](https://docs.pydantic.dev/latest/) models. We'll tackle the challenges of caching Pydantic models, which are typically incompatible with `pickle`, and explore solutions built on decorators such as `functools.cache`. Then, we'll craft custom decorators with `diskcache` and `redis` to support persistent caching and distributed systems.
+
+Let's first consider our canonical example, using the `OpenAI` Python client to extract user details.
+
+```python
+import instructor
+from openai import OpenAI
+from pydantic import BaseModel
+
+# Enables `response_model`
+client = instructor.patch(OpenAI())
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+def extract(data) -> UserDetail:
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+Now imagine batch processing data, running tests or experiments, or simply calling `extract` multiple times over a workflow. We'll quickly run into performance issues, as the function may be called repeatedly, and the same data will be processed over and over again, costing us time and money.
+
+## 1. `functools.cache` for Simple In-Memory Caching
+
+**When to Use**: Ideal for functions with immutable arguments, called repeatedly with the same parameters in small to medium-sized applications. This makes sense when we might be reusing the same data within a single session, or in an application where we don't need to persist the cache between sessions.
+
+```python
+import functools
+
+@functools.cache
+def extract(data):
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+!!! warning "Changing the Model does not Invalidate the Cache"
+
+    Note that changing the model does not invalidate the cache. The cache key is built from the arguments passed to the function, not from the response model, so if we change the model, the cache will still return the old result for any input it has already seen.
+
+Now we can call `extract` multiple times with the same argument, and the result will be cached in memory for faster access.
+
+```python hl_lines="4 8 12"
+import time
+
+start = time.perf_counter() # (1)
+model = extract("Extract jason is 25 years old")
+print(f"Time taken: {time.perf_counter() - start}")
+
+start = time.perf_counter()
+model = extract("Extract jason is 25 years old") # (2)
+print(f"Time taken: {time.perf_counter() - start}")
+
+>>> Time taken: 0.9267581660533324
+>>> Time taken: 1.2080417945981026e-06 # (3)
+```
+
+1. Using `time.perf_counter()` to measure the time taken to run the function is better than using `time.time()` because it reads a high-resolution monotonic clock that is not affected by system clock changes.
+2. The second time we call `extract`, the result is returned from the cache, and the function is not called.
+3. The second call to `extract` is much faster because the result is returned from the cache!
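+
+To confirm that the second call really is a cache hit, and to reset the cache after changing the response model, the wrapper returned by `functools.cache` exposes `cache_info()` and `cache_clear()`. The sketch below is only an illustration: it swaps the API call for a hypothetical `slow_square` function so that it runs without network access.
+
+```python
+import functools
+
+calls = 0  # counts how often the underlying function body actually runs
+
+@functools.cache
+def slow_square(x: int) -> int:
+    """Hypothetical stand-in for an expensive call such as `extract`."""
+    global calls
+    calls += 1
+    return x * x
+
+slow_square(4)  # miss: the body runs
+slow_square(4)  # hit: served from memory
+info = slow_square.cache_info()  # named tuple with hits, misses, maxsize, currsize
+
+slow_square.cache_clear()  # drop every cached result, e.g. after editing the model
+slow_square(4)  # miss again, so the body runs a second time
+```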
+
+**Benefits**: Easy to implement, provides fast access due to in-memory storage, and requires no additional libraries.
+
+??? question "What is a decorator?"
+
+    A decorator is a function that takes another function and extends the behavior of the latter function without explicitly modifying it. In Python, decorators are functions that take a function as an argument and return a closure.
+
+    ```python hl_lines="3-5 9"
+    def decorator(func):
+        def wrapper(*args, **kwargs):
+            print("Do something before") # (1)
+            result = func(*args, **kwargs)
+            print("Do something after") # (2)
+            return result
+        return wrapper
+
+    @decorator
+    def say_hello():
+        print("Hello!")
+
+    say_hello()
+    >>> Do something before
+    >>> Hello!
+    >>> Do something after
+    ```
+
+    1. The code is executed before the function is called
+    2. The code is executed after the function is called
+
+## 2. `diskcache` for Persistent, Large Data Caching
+
+??? note "Copy Caching Code"
+
+    We'll be using the same `instructor_cache` decorator for both `diskcache` and `redis` caching. You can copy the code below and use it for both examples.
+
+    ```python
+    import functools
+    import inspect
+    import diskcache
+    from pydantic import BaseModel
+
+    cache = diskcache.Cache('./my_cache_directory') # (1)
+
+    def instructor_cache(func):
+        """Cache a function that returns a Pydantic model"""
+        return_type = inspect.signature(func).return_annotation
+        if not issubclass(return_type, BaseModel): # (2)
+            raise ValueError("The return type must be a Pydantic model")
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+            # Check if the result is already cached
+            if (cached := cache.get(key)) is not None:
+                # Deserialize from JSON based on the return type
+                return return_type.model_validate_json(cached)
+
+            # Call the function and cache its result
+            result = func(*args, **kwargs)
+            serialized_result = result.model_dump_json()
+            cache.set(key, serialized_result)
+
+            return result
+
+        return wrapper
+    ```
+
+    1. We create a new `diskcache.Cache` instance to store the cached data. This will create a new directory called `my_cache_directory` in the current working directory.
+    2. We only want to cache functions that return a Pydantic model to simplify serialization and deserialization logic in this example code
+
+    Remember that you can change this code to support non-Pydantic models, or to use a different caching backend. Moreover, don't forget that this cache does not invalidate when the model changes, so you might want to encode the `Model.model_json_schema()` as part of the key.
+
+**When to Use**: Suitable for applications needing cache persistence between sessions or dealing with large datasets. This is useful when we want to reuse the same data across multiple sessions, or when we need to store large amounts of data!
+
+```python hl_lines="10"
+import functools
+import inspect
+import instructor
+import diskcache
+
+from openai import OpenAI
+from pydantic import BaseModel
+
+client = instructor.patch(OpenAI())
+cache = diskcache.Cache('./my_cache_directory')
+
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation # (4)
+    if not issubclass(return_type, BaseModel): # (1)
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}" #  (2)
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type (3)
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+1. We only want to cache functions that return a Pydantic model to simplify serialization and deserialization logic
+2. We use `functools._make_key` to generate a unique key based on the function's name and arguments. This is important because we want to cache the result of each function call separately.
+3. We use Pydantic's `model_validate_json` to deserialize the cached result into a Pydantic model.
+4. We use `inspect.signature` to get the function's return type annotation, which we use to validate the cached result.
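+
+Because `functools._make_key` is a private helper, its output format is not guaranteed, and keys built from keyword arguments may not be stable from one process to the next. One possible alternative, sketched here under the assumption that every argument is JSON-serializable, hashes the call together with the model schema so that editing the model produces a new key. The `make_cache_key` helper and the two schema dicts are hypothetical; in real code the schema would come from `UserDetail.model_json_schema()`.
+
+```python
+import hashlib
+import json
+
+def make_cache_key(func_name: str, args: tuple, kwargs: dict, schema: dict) -> str:
+    """Build a deterministic key from the call and the response schema.
+
+    Assumes every argument is JSON-serializable (true for the `str` passed
+    to `extract`); `schema` stands in for `UserDetail.model_json_schema()`.
+    """
+    payload = json.dumps(
+        {"func": func_name, "args": list(args), "kwargs": kwargs, "schema": schema},
+        sort_keys=True,  # the same content always serializes to the same string
+    )
+    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+
+# Hypothetical schemas before and after adding a field to the model
+schema_v1 = {"properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}
+schema_v2 = {"properties": {**schema_v1["properties"], "email": {"type": "string"}}}
+
+key_v1 = make_cache_key("extract", ("jason is 25",), {}, schema_v1)
+key_v2 = make_cache_key("extract", ("jason is 25",), {}, schema_v2)
+```
+
+The two keys differ, so results cached under the old schema are simply never looked up again once the model changes.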
+
+**Benefits**: Reduces computation time for heavy data processing, provides disk-based caching for persistence.
+
+## 3. Redis Caching Decorator for Distributed Systems
+
+??? note "Copy Caching Code"
+
+    We'll be using the same `instructor_cache` decorator for both `diskcache` and `redis` caching. You can copy the code below and use it for both examples.
+
+    ```python
+    import functools
+    import inspect
+    import redis
+    from pydantic import BaseModel
+
+    cache = redis.Redis("localhost")
+
+    def instructor_cache(func):
+        """Cache a function that returns a Pydantic model"""
+        return_type = inspect.signature(func).return_annotation
+        if not issubclass(return_type, BaseModel):
+            raise ValueError("The return type must be a Pydantic model")
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+            # Check if the result is already cached
+            if (cached := cache.get(key)) is not None:
+                # Deserialize from JSON based on the return type
+                return return_type.model_validate_json(cached)
+
+            # Call the function and cache its result
+            result = func(*args, **kwargs)
+            serialized_result = result.model_dump_json()
+            cache.set(key, serialized_result)
+
+            return result
+
+        return wrapper
+    ```
+
+    Remember that you can change this code to support non-Pydantic models, or to use a different caching backend. Moreover, don't forget that this cache does not invalidate when the model changes, so you might want to encode the `Model.model_json_schema()` as part of the key.
+
+**When to Use**: Recommended for distributed systems where multiple processes need to access the cached data, or for applications requiring fast read/write access and handling complex data structures.
+
+```python
+import redis
+import functools
+import inspect
+import instructor
+
+from pydantic import BaseModel
+from openai import OpenAI
+
+client = instructor.patch(OpenAI())
+cache = redis.Redis("localhost")
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation
+    if not issubclass(return_type, BaseModel): # (1)
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}" # (2)
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    # Assuming client.chat.completions.create returns a UserDetail instance
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+1. We only want to cache functions that return a Pydantic model to simplify serialization and deserialization logic
+2. We use `functools._make_key` to generate a unique key based on the function's name and arguments. This is important because we want to cache the result of each function call separately.
+
+**Benefits**: Scalable for large-scale systems, supports fast in-memory data storage and retrieval, and is versatile for various data types.
+
+!!! note "Looking carefully"
+
+    If you look carefully at the code above you'll notice that we're using the same `instructor_cache` decorator as before. The implementation is the same, but we're using a different caching backend!
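+
+Since the decorator only ever calls `get` and `set` on the cache object, one way to avoid copying it per backend is to pass the backend in as a parameter. The following is a rough sketch rather than the library's approach: `CacheBackend`, `DictBackend`, `cached_with` and `extract_stub` are hypothetical names, and plain dicts with `json` stand in for Pydantic's `model_dump_json` and `model_validate_json` so the example runs without Redis, `diskcache` or an API key.
+
+```python
+import functools
+import json
+from typing import Any, Callable, Optional, Protocol
+
+class CacheBackend(Protocol):
+    """Hypothetical interface: the two methods the decorator relies on."""
+    def get(self, key: str) -> Optional[Any]: ...
+    def set(self, key: str, value: str) -> Any: ...
+
+class DictBackend:
+    """In-memory stand-in so the sketch runs without Redis or diskcache."""
+    def __init__(self) -> None:
+        self.store: dict = {}
+
+    def get(self, key: str) -> Optional[str]:
+        return self.store.get(key)
+
+    def set(self, key: str, value: str) -> bool:
+        self.store[key] = value
+        return True
+
+def cached_with(backend: CacheBackend) -> Callable:
+    """Return a decorator that caches JSON-serializable results in `backend`."""
+    def decorator(func: Callable) -> Callable:
+        @functools.wraps(func)
+        def wrapper(*args):
+            key = f"{func.__name__}-{json.dumps(args)}"
+            if (cached := backend.get(key)) is not None:
+                return json.loads(cached)  # cache hit: skip the call entirely
+            result = func(*args)
+            backend.set(key, json.dumps(result))
+            return result
+        return wrapper
+    return decorator
+
+backend = DictBackend()
+calls = []  # records each time the function body actually runs
+
+@cached_with(backend)
+def extract_stub(data: str) -> dict:
+    """Hypothetical replacement for the API call."""
+    calls.append(data)
+    return {"name": data.split()[0], "age": 25}
+
+first = extract_stub("jason is 25 years old")
+second = extract_stub("jason is 25 years old")  # served from the backend
+```
+
+Swapping `DictBackend()` for a `diskcache.Cache` or `redis.Redis` instance should work the same way, since both expose `get` and `set`.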
+
+## Conclusion
+
+Choosing the right caching strategy depends on your application's specific needs, such as the size and type of data, the need for persistence, and the system's architecture. Whether it's optimizing a function's performance in a small application or managing large datasets in a distributed environment, Python offers robust solutions to improve efficiency and reduce computational overhead.
+
+If you'd like to use this code, try sending it to ChatGPT to understand it better and to add features that matter to you. For example, the cache isn't invalidated when your `BaseModel` changes, so you might want to encode the `Model.model_json_schema()` as part of the key.
+
+If you like the content, check out our [GitHub](https://github.com/jxnl/instructor), give us a star, and check out the library.
@@ -265,4 +265,4 @@ except ValidationError as e:

 These examples demonstrate the potential of using Pydantic and OpenAI to enhance data accuracy through citation verification. While the LLM-based approach may not be efficient for runtime operations, it has exciting implications for generating a dataset of accurate responses. By leveraging this method during data generation, we can fine-tune a model that excels in citation accuracy. Similar to our last post on [finetuning a better summarizer](https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/).

-If you like the content check out our [GitHub](https://github.com/jxnl/instructor) as give us a start and checkout the library.
+If you like the content check out our [GitHub](https://github.com/jxnl/instructor) as give us a star and checkout the library.
@@ -1,5 +1,5 @@
 ---
-draft: False 
+draft: False
 date: 2023-09-17
 tags:
  - RAG
@@ -16,7 +16,7 @@ authors:
 With the advent of large language models (LLM), retrival augmented generation (RAG) has become a hot topic. However throught the past year of [helping startups](https://jxnl.notion.site/Working-with-me-ec2bb36a5ac048c2a8f6bd888faea6c2?pvs=4) integrate LLMs into their stack I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware.

 !!! note "What is RAG?"
-    Retrival augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year using text embeddings with a vector databases has been the most popular approach I've seen being socialized.
+Retrival augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year using text embeddings with a vector databases has been the most popular approach I've seen being socialized.

 <figure markdown>
  ![RAG](img/dumb_rag.png)
@@ -34,7 +34,6 @@ When you ask a question like, "what is the capital of France?" The RAG 'dumb' mo
 - **Query-Document Mismatch**: This model assumes that query embedding and the content embedding are similar in the embedding space, which is not always true based on the text you're trying to search over. Only using queries that are semantically similar to the content is a huge limitation!

 - **Monolithic Search Backend**: Assumes a single search backend, which is not always the case. You may have multiple search backends, each with their own API, and you want to route the query to vector stores, search clients, sql databases, and more.
-  
 - **Limitation of text search**: Restricts complex queries to a single string (`{query: str}`), sacrificing expressiveness, in using keywords, filters, and other advanced features. For example, asking `what problems did we fix last week` cannot be answered by a simple text search since documents that contain `problem, last week` are going to be present at every week.

 - **Limited ability to plan**: Assumes that the query is the only input to the search backend, but you may want to use other information to improve the search, like the user's location, or the time of day using the context to rewrite the query. For example, if you present the language model of more context its able to plan a suite of queries to execute to return the best results.
@@ -44,10 +43,9 @@ Now let's dive into how we can make it smarter with query understanding. This is
 ## Improving the RAG Model with Query Understanding

 !!! note "Shoutouts"
-    Much of this work has been inspired by / done in collab with a few of my clients at [new.computer](https://new.computer), [Metaphor Systems](https://metaphor.systems),  and [Naro](https://narohq.com), go check them out!
+Much of this work has been inspired by / done in collab with a few of my clients at [new.computer](https://new.computer), [Metaphor Systems](https://metaphor.systems), and [Naro](https://narohq.com), go check them out!

-
-Ultimately what you want to deploy is a [system that understands](https://en.wikipedia.org/wiki/Query_understanding) how to take the query and rewrite it to improve precision and recall. 
+Ultimately what you want to deploy is a [system that understands](https://en.wikipedia.org/wiki/Query_understanding) how to take the query and rewrite it to improve precision and recall.

 <figure markdown>
  ![RAG](img/query_understanding.png)
@@ -89,10 +87,10 @@ class MetaphorQuery(BaseModel):
        return await metaphor.search(...)
 ```

-Note how we model a rewritten query, range of published dates, and a list of domains to search in. This powerful pattern allows the user query to be restructured for better performance without the user having to know the details of how the search backend works. 
+Note how we model a rewritten query, range of published dates, and a list of domains to search in. This powerful pattern allows the user query to be restructured for better performance without the user having to know the details of how the search backend works.

 ```python
-import instructor 
+import instructor
 from openai import OpenAI

 # Enables response_model in the openai client
@@ -103,11 +101,11 @@ query = client.chat.completions.create(
    response_model=MetaphorQuery,
    messages=[
        {
-            "role": "system", 
+            "role": "system",
            "content": "You're a query understanding system for the Metafor Systems search engine. Here are some tips: ..."
        },
        {
-            "role": "user", 
+            "role": "user",
            "content": "What are some recent developments in AI?"
        }
    ],
@@ -118,12 +116,12 @@ query = client.chat.completions.create(

 ```json
 {
-    "rewritten_query": "novel developments advancements ai artificial intelligence machine learning",
-    "published_daterange": {
-        "start": "2023-09-17",
-        "end": "2021-06-17"
-    },
-    "domains_allow_list": ["arxiv.org"]
+  "rewritten_query": "novel developments advancements ai artificial intelligence machine learning",
+  "published_daterange": {
+    "start": "2023-09-17",
+    "end": "2021-06-17"
+  },
+  "domains_allow_list": ["arxiv.org"]
 }
 ```

@@ -174,7 +172,7 @@ class Retrival(BaseModel):
 Now we can call this with a simple query like "What do I have today?" and it will try to async dispatch to the correct backend. It's still important to prompt the language model well, but we'll leave that for another day.

 ```python
-import instructor 
+import instructor
 from openai import OpenAI

 # Enables response_model in the openai client
@@ -219,7 +217,7 @@ retrival = client.chat.completions.create(
 Notice that we have a list of queries that route to different search backends (email and calendar). We can even dispatch them async to be as performance as possible. Not only do we dispatch to different backends (that we have no control over), but you are likely going to render them to the user differently as well. Perhaps you want to summarize the emails in text, but you want to render the calendar events as a list that they can scroll across on a mobile app.

 !!! Note "Can I used framework X?"
-    I get this question a lot, but it's just code. Within these dispatchs you can do whatever you want. You can use `input()` to ask the user for more information, make a post request, call a Langchain agent or LLamaindex query engine to get more information. The sky is the limit.
+I get this question a lot, but it's just code. Within these dispatchs you can do whatever you want. You can use `input()` to ask the user for more information, make a post request, call a Langchain agent or LLamaindex query engine to get more information. The sky is the limit.

 Both of these examples showcase how both search providors and consumers can use `instructor` to model their systems. This is a powerful pattern that allows you to build a system that can be used by anyone, and can be used to build an LLM layer, from scratch, in front of any arbitrary backend.

@@ -231,7 +229,4 @@ This isnt about fancy embedding tricks, it's just plain old information retrival

 Here I want to show that `instructor`` isn’t just about data extraction. It’s a powerful framework for building a data model and integrating it with your LLM. Structured output is just the beginning — the untapped goldmine is skilled use of tools and APIs.

-I believe collaboration between domain experts and AI engineers is the key to enable advanced tool use. I’ve been building a new tool on top of instructor that enables seamless collaboration and experimentation on LLMs with structured outputs. If you’re interested, visit [useinstructor.com](https://useinstructor.com) and take our survey to join the waitlist.
-
-
-If you enjoy the content or want to try out `instructor` please check out the [github](https://github.com/jxnl/instructor) and give us a star!
+If you enjoy the content or want to try out `instructor` please check out the [github](https://github.com/jxnl/instructor) and give us a star!
@@ -0,0 +1,286 @@
+If you want to learn more about concepts in caching and how to use them in your own projects, check out our [blog](../blog/posts/caching.md) on the topic.
+
+## 1. `functools.cache` for Simple In-Memory Caching
+
+**When to Use**: Ideal for functions with immutable arguments, called repeatedly with the same parameters in small to medium-sized applications. This makes sense when we might be reusing the same data within a single session, or in an application where we don't need to persist the cache between sessions.
+
+```python
+import functools
+import instructor
+
+from openai import OpenAI
+from pydantic import BaseModel
+
+client = instructor.patch(OpenAI())
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@functools.cache
+def extract(data) -> UserDetail:
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+!!! warning "Changing the Model does not Invalidate the Cache"
+
+    Note that changing the model does not invalidate the cache. The cache key is built from the arguments passed to the function, not from the response model, so if we change the model, the cache will still return the old result for any input it has already seen.
+
+Now we can call `extract` multiple times with the same argument, and the result will be cached in memory for faster access.
+
+```python hl_lines="4 8 12"
+import time
+
+start = time.perf_counter() # (1)
+model = extract("Extract jason is 25 years old")
+print(f"Time taken: {time.perf_counter() - start}")
+
+start = time.perf_counter()
+model = extract("Extract jason is 25 years old") # (2)
+print(f"Time taken: {time.perf_counter() - start}")
+
+>>> Time taken: 0.9267581660533324
+>>> Time taken: 1.2080417945981026e-06 # (3)
+```
+
+1. Using `time.perf_counter()` to measure the time taken to run the function is better than using `time.time()` because it reads a high-resolution monotonic clock that is not affected by system clock changes.
+2. The second time we call `extract`, the result is returned from the cache, and the function is not called.
+3. The second call to `extract` is much faster because the result is returned from the cache!
+
+**Benefits**: Easy to implement, provides fast access due to in-memory storage, and requires no additional libraries.
+
+??? question "What is a decorator?"
+
+    A decorator is a function that takes another function and extends the behavior of the latter function without explicitly modifying it. In Python, decorators are functions that take a function as an argument and return a closure.
+
+    ```python hl_lines="3-5 9"
+    def decorator(func):
+        def wrapper(*args, **kwargs):
+            print("Do something before") # (1)
+            result = func(*args, **kwargs)
+            print("Do something after") # (2)
+            return result
+        return wrapper
+
+    @decorator
+    def say_hello():
+        print("Hello!")
+
+    say_hello()
+    >>> Do something before
+    >>> Hello!
+    >>> Do something after
+    ```
+
+    1. The code is executed before the function is called
+    2. The code is executed after the function is called
+
+## 2. `diskcache` for Persistent, Large Data Caching
+
+??? note "Copy Caching Code"
+
+    We'll be using the same `instructor_cache` decorator for both `diskcache` and `redis` caching. You can copy the code below and use it for both examples.
+
+    ```python
+    import functools
+    import inspect
+    import diskcache
+    from pydantic import BaseModel
+
+    cache = diskcache.Cache('./my_cache_directory') # (1)
+
+    def instructor_cache(func):
+        """Cache a function that returns a Pydantic model"""
+        return_type = inspect.signature(func).return_annotation
+        if not issubclass(return_type, BaseModel): # (2)
+            raise ValueError("The return type must be a Pydantic model")
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+            # Check if the result is already cached
+            if (cached := cache.get(key)) is not None:
+                # Deserialize from JSON based on the return type
+                return return_type.model_validate_json(cached)
+
+            # Call the function and cache its result
+            result = func(*args, **kwargs)
+            serialized_result = result.model_dump_json()
+            cache.set(key, serialized_result)
+
+            return result
+
+        return wrapper
+    ```
+
+    1. We create a new `diskcache.Cache` instance to store the cached data. This will create a new directory called `my_cache_directory` in the current working directory.
+    2. We only want to cache functions that return a Pydantic model to simplify serialization and deserialization logic in this example code
+
+    Remember that you can change this code to support non-Pydantic models, or to use a different caching backend. Moreover, don't forget that this cache does not invalidate when the model changes, so you might want to encode the `Model.model_json_schema()` as part of the key.
+
+**When to Use**: Suitable for applications needing cache persistence between sessions or dealing with large datasets. This is useful when we want to reuse the same data across multiple sessions, or when we need to store large amounts of data!
+
+```python hl_lines="10"
+import functools
+import inspect
+import instructor
+import diskcache
+
+from openai import OpenAI
+from pydantic import BaseModel
+
+client = instructor.patch(OpenAI())
+cache = diskcache.Cache('./my_cache_directory')
+
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation # (4)
+    if not issubclass(return_type, BaseModel): # (1)
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}" #  (2)
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type (3)
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+1. We only want to cache functions that return a Pydantic model to simplify serialization and deserialization logic
+2. We use `functools._make_key` to generate a unique key based on the function's name and arguments. This is important because we want to cache the result of each function call separately.
+3. We use Pydantic's `model_validate_json` to deserialize the cached result into a Pydantic model.
+4. We use `inspect.signature` to get the function's return type annotation, which we use to validate the cached result.
+
+**Benefits**: Reduces computation time for heavy data processing, provides disk-based caching for persistence.
+
+## 3. Redis Caching Decorator for Distributed Systems
+
+??? note "Copy Caching Code"
+
+    We'll be using the same `instructor_cache` decorator for both `diskcache` and `redis` caching. You can copy the code below and use it for both examples.
+
+    ```python
+    import functools
+    import inspect
+    import redis
+    from pydantic import BaseModel
+
+    cache = redis.Redis("localhost")
+
+    def instructor_cache(func):
+        """Cache a function that returns a Pydantic model"""
+        return_type = inspect.signature(func).return_annotation
+        if not issubclass(return_type, BaseModel):
+            raise ValueError("The return type must be a Pydantic model")
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+            # Check if the result is already cached
+            if (cached := cache.get(key)) is not None:
+                # Deserialize from JSON based on the return type
+                return return_type.model_validate_json(cached)
+
+            # Call the function and cache its result
+            result = func(*args, **kwargs)
+            serialized_result = result.model_dump_json()
+            cache.set(key, serialized_result)
+
+            return result
+
+        return wrapper
+    ```
+
+    Remember that you can adapt this code to support non-Pydantic return types or a different caching backend. Moreover, this cache is not invalidated when the model changes, so you might want to encode `Model.model_json_schema()` as part of the key.
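
To make the schema suggestion in the note above concrete, here is a minimal sketch of one way to fold the schema into the key. `schema_aware_key` is a hypothetical helper, and the two dictionaries are hand-written stand-ins for what `UserDetail.model_json_schema()` might return before and after a field is added, so the snippet runs without Pydantic installed.

```python
import hashlib
import json


def schema_aware_key(func_name, schema, call_key):
    """Prefix a call key with a short fingerprint of the model's JSON schema."""
    canonical = json.dumps(schema, sort_keys=True)
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{func_name}-{fingerprint}-{call_key}"


# Hand-written stand-ins for the output of `UserDetail.model_json_schema()`
schema_v1 = {
    "title": "UserDetail",
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
schema_v2 = {
    "title": "UserDetail",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
}

# The same input maps to different keys once the model changes
old_key = schema_aware_key("extract", schema_v1, "jason is 25")
new_key = schema_aware_key("extract", schema_v2, "jason is 25")
```

With a key like this, entries written under the old schema are simply never looked up again; they are not deleted, so the backend's own eviction or expiry settings still matter.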
+
+**When to Use**: Recommended for distributed systems where multiple processes need to access the cached data, or for applications requiring fast read/write access and handling complex data structures.
+
+```python
+import redis
+import functools
+import inspect
+import instructor
+
+from pydantic import BaseModel
+from openai import OpenAI
+
+client = instructor.patch(OpenAI())
+cache = redis.Redis("localhost")
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation
+    if not issubclass(return_type, BaseModel): # (1)
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}" # (2)
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    # With `response_model` set, the patched client is expected to return a UserDetail instance
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+```
+
+1. We only cache functions that return a Pydantic model, which keeps the serialization and deserialization logic simple.
+2. We combine the function's name with the output of `functools`' private `_make_key` helper to build a key from the call's arguments, so that each distinct call is cached separately.
+
+**Benefits**: Scales to large systems, keeps cached results in memory for fast storage and retrieval, and lets every process that connects to the same Redis instance share the cache.
+
+!!! note "Looking carefully"
+
+    If you look carefully at the code above you'll notice that we're using the same `instructor_cache` decorator as before. The implementation is the same, but we're using a different caching backend!
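
Since only the backend differs, the decorator could also take the backend as a parameter. The following is a rough sketch under stated assumptions, not the implementation used in this post: `make_instructor_cache`, `DictBackend`, and `FakeUser` are hypothetical names. `DictBackend` mimics the `get(key)` / `set(key, value)` subset that the examples call on `diskcache.Cache` and `redis.Redis`, `FakeUser` stands in for a Pydantic model so the snippet runs with the standard library alone, and the return-type check is omitted for brevity.

```python
import functools
import inspect
import json
from dataclasses import asdict, dataclass


class DictBackend:
    """Hypothetical in-memory stand-in exposing only get(key) and set(key, value)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def make_instructor_cache(backend):
    """Build a caching decorator bound to any backend with get() and set()."""

    def decorator(func):
        return_type = inspect.signature(func).return_annotation

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
            # Cache hit: rebuild the model from its stored JSON
            if (cached := backend.get(key)) is not None:
                return return_type.model_validate_json(cached)
            # Cache miss: call the function and store the serialized result
            result = func(*args, **kwargs)
            backend.set(key, result.model_dump_json())
            return result

        return wrapper

    return decorator


@dataclass
class FakeUser:
    """Stand-in for a Pydantic model; provides the two methods the decorator relies on."""

    name: str
    age: int

    def model_dump_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def model_validate_json(cls, raw):
        return cls(**json.loads(raw))


calls = []  # records how often the wrapped function really runs


@make_instructor_cache(DictBackend())
def extract(data) -> FakeUser:
    calls.append(data)
    return FakeUser(name="jason", age=25)


first = extract("jason is 25 years old")
second = extract("jason is 25 years old")  # served from the backend
```

Swapping `DictBackend()` for a `diskcache.Cache` or `redis.Redis` instance should be the only change needed to move between the two sections above, provided the backend keeps the same `get`/`set` behaviour.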
@@ -0,0 +1,71 @@
+import functools
+import inspect
+import instructor
+import diskcache
+
+from openai import OpenAI
+from pydantic import BaseModel
+
+client = instructor.patch(OpenAI())
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+cache = diskcache.Cache('./my_cache_directory')
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation
+    if not issubclass(return_type, BaseModel):
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+
+
+def test_extract():
+    import time 
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+
+if __name__ == "__main__":
+    test_extract()
+    # Time taken: 0.7285366660216823
+    # Time taken: 9.841693099588156e-05
@@ -0,0 +1,71 @@
+import redis
+import functools
+import inspect
+import instructor
+
+from pydantic import BaseModel
+from openai import OpenAI
+
+client = instructor.patch(OpenAI())
+cache = redis.Redis("localhost")
+
+def instructor_cache(func):
+    """Cache a function that returns a Pydantic model"""
+    return_type = inspect.signature(func).return_annotation
+    if not issubclass(return_type, BaseModel):
+        raise ValueError("The return type must be a Pydantic model")
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        key = f"{func.__name__}-{functools._make_key(args, kwargs, typed=False)}"
+        # Check if the result is already cached
+        if (cached := cache.get(key)) is not None:
+            # Deserialize from JSON based on the return type
+            return return_type.model_validate_json(cached)
+
+        # Call the function and cache its result
+        result = func(*args, **kwargs)
+        serialized_result = result.model_dump_json()
+        cache.set(key, serialized_result)
+
+        return result
+
+    return wrapper
+
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@instructor_cache
+def extract(data) -> UserDetail:
+    # With `response_model` set, the patched client is expected to return a UserDetail instance
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+
+def test_extract():
+    import time 
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+
+if __name__ == "__main__":
+    test_extract()
+    # Time taken: 0.798335583996959
+    # Time taken: 0.00017016706988215446
@@ -0,0 +1,41 @@
+import instructor
+from openai import OpenAI
+from pydantic import BaseModel
+import functools
+
+client = instructor.patch(OpenAI())
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+@functools.lru_cache
+def extract(data):
+    return client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        response_model=UserDetail,
+        messages=[
+            {"role": "user", "content": data},
+        ]
+    )
+
+
+def test_extract():
+    import time 
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+    start = time.perf_counter()
+    model = extract("Extract jason is 25 years old")
+    assert model.name.lower() == "jason"
+    assert model.age == 25
+    print(f"Time taken: {time.perf_counter() - start}")
+
+if __name__ == "__main__":
+    test_extract()
+    # Time taken: 0.9267581660533324
+    # Time taken: 1.2080417945981026e-06
@@ -128,18 +128,19 @@ nav:
    - Contributing: 'contributing.md'
  - Tips: 'concepts/prompting.md'
  - Concepts:
+    - Philosophy: 'concepts/philosophy.md'
    - Models: 'concepts/models.md'
    - Fields: 'concepts/fields.md'
-    - Types: 'concepts/types.md'
+    - Missing: "concepts/maybe.md"
    - Patching: 'concepts/patching.md'
    - Streaming: "concepts/lists.md"
+    - Caching: 'concepts/caching.md'
+    - Validators: "concepts/reask_validation.md"
+    - Distillation: "concepts/distillation.md"
+    - Types: 'concepts/types.md'
    - Union: 'concepts/union.md'
    - Alias: 'concepts/alias.md'
    - Type Adapter: 'concepts/typeadapter.md'
-    - Validators: "concepts/reask_validation.md"
-    - Missing: "concepts/maybe.md"
-    - Distillation: "concepts/distillation.md"
-    - Philosophy: 'concepts/philosophy.md'
  - Cookbook:
    - Overview: 'examples/index.md'
    - Text Classification: 'examples/classification.md'