Files
instructor/docs/examples/exact_citations.md
T
2023-08-16 21:11:06 -07:00

7.9 KiB

Example: Answering Questions with Citations

In this example, we'll demonstrate how to use OpenAI Function Call to ask an AI a question and get back an answer with correct citations. We'll define the necessary data structures using Pydantic and show how to retrieve the citations for each answer.

!!! tips "Motivation" When using AI models to answer questions, it's important to provide accurate and reliable information with appropriate citations. By including citations for each statement, we can ensure the information is backed by reliable sources and help readers verify the information themselves.

Defining the Data Structures

Let's start by defining the data structures required for this task: Fact and QuestionAnswer.

!!! tip "Prompting as documentation" Make sure to include detailed and useful docstrings and fields for your class definitions. Naming becomes very important since they are semantically meaningful in the prompt.

* `substring_quote` performs better than `quote` since it suggests it should be a substring of the original content.
* Notice that there are instructions on splitting facts in the docstring which will be used by OpenAI
import openai
from pydantic import Field, BaseModel
from typing import List
from instructor import OpenAISchema


class Fact(BaseModel):
    """
    Each fact has a body and a list of sources.
    If there are multiple facts, make sure to break them apart such that each one only uses a set of sources that are relevant to it.
    """

    fact: str = Field(..., description="Body of the sentence as part of a response")
    substring_quote: List[str] = Field(
        ...,
        description="Each source should be a direct quote from the context, as a substring of the original content",
    )

    def _get_span(self, quote, context, errs=100):
        import regex

        minor = quote
        major = context

        errs_ = 0
        s = regex.search(f"({minor}){{e<={errs_}}}", major)
        while s is None and errs_ <= errs:
            errs_ += 1
            s = regex.search(f"({minor}){{e<={errs_}}}", major)

        if s is not None:
            yield from s.spans()

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)


class QuestionAnswer(OpenAISchema):
    """
    Class representing a question and its answer as a list of facts, where each fact should have a source.
    Each sentence contains a body and a list of sources.
    """

    question: str = Field(..., description="Question that was asked")
    answer: List[Fact] = Field(
        ...,
        description="Body of the answer, each fact should be its separate object with a body and a list of sources",
    )

The Fact class represents a single statement in the answer. It contains a fact attribute for the body of the sentence and a substring_quote attribute for the sources, which are direct quotes from the context.

The QuestionAnswer class represents a question and its answer. It consists of a question attribute for the question asked and a list of Fact objects in the answer attribute.

!!! tip "Embedding computation" While it's not the best idea to get too crazy with adding 100 methods to your class collocating some computation is oftentimes useful, here we implement the substring search directly with the Fact class.

Asking AI a Question

To ask the AI a question and get back an answer with citations, we can define a function ask_ai that takes a question and context as input and returns a QuestionAnswer object.

!!! tips "Prompting Tip: Expert system" Expert prompting is a great trick to get results, it can be easily done by saying things like:

*  you are an world class expert that can correctly ...
*  you are jeff dean give me a code review ...
def ask_ai(question: str, context: str) -> QuestionAnswer:
    """
    Function to ask AI a question and get back an Answer object.
    but should be updated to use the actual method for making a request to the AI.

    Args:
        question (str): The question to ask the AI.
        context (str): The context for the question.

    Returns:
        Answer: The Answer object.
    """

    # Making a request to the hypothetical 'openai' module
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0.2,
        max_tokens=1000,
        functions=[QuestionAnswer.openai_schema],
        function_call={"name": QuestionAnswer.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": f"You are a world class algorithm to answer questions with correct and exact citations. ",
            },
            {"role": "user", "content": f"Answer question using the following context"},
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"},
            {
                "role": "user",
                "content": f"Tips: Make sure to cite your sources, and use the exact words from the context.",
            },
        ],
    )

    # Creating an Answer object from the completion response
    return QuestionAnswer.from_response(completion)

The ask_ai function takes a string question and a string context as input. It makes a completion request to the AI model, providing the question and context as part of the prompt. The resulting completion is then converted into a QuestionAnswer object.

Evaluating an Example

Let's evaluate the example by asking the AI a question and getting back an answer with citations. We'll ask the question "What did the author do during college?" with the given context.

!!! usage "Highlight" This just adds some color and captures the citation in <>

```python
def highlight(text, span):
    return (
        "..."
        + text[span[0] - 50 : span[0]].replace("\n", "")
        + "\033[91m"
        + "<"
        + text[span[0] : span[1]].replace("\n", "")
        + "> "
        + "\033[0m"
        + text[span[1] : span[1] + 20].replace("\n", "")
        + "..."
    )
```
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics. 
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

answer = ask_ai(question, context)

print("Question:", question)
print()
for fact in answer.answer:
    print("Statement:", fact.fact)
    for span in fact.get_spans(context):
        print("Citation:", highlight(context, span))
    print()

In this code snippet, we print the question and iterate over each fact in the answer. For each fact, we print the statement and highlight the corresponding citation in the context using the highlight function.

Here is the expected output for the example:

Question: What did the author do during college?

Statement: The author studied Computational Mathematics and physics in university.
Citation: ...s born in China.I went to an arts high school but <in university I studied Computational Mathematics and physics> . As part of coop I...

Statement: The author started the Data Science club at the University of Waterloo and was the president of the club for 2 years.
Citation: ...y companies including Stitchfix, Facebook.I also <started the Data Science club at the University of Waterloo>  and I was the presi...
Citation: ... club at the University of Waterloo and I was the <president of the club for 2 years> ...

The output includes the question, followed by each statement in the answer with its corresponding citation highlighted in the context.

Feel free to try this code with different questions and contexts to see how the AI responds with accurate citations.