2023-09-09 16:10:42 -04:00


# Example: Answering Questions with Validated Citations

In this example, we'll demonstrate how to use instructor with validators to add citations to answers generated by AI models.

!!! tips "Preventing Hallucinations"
    The citations an LLM generates are not guaranteed to be correct. We can use a blend of regex and validation to ensure that they are at least grounded in the provided context:

    1. Make sure each quote exists in the context
    2. Make sure every statement has at least one quote

## Defining the Data Structures

Let's start by defining the data structures required for this task: `Fact` and `QuestionAnswer`.

```python
import logging
from typing import List

import instructor
import openai
import regex  # third-party module with fuzzy matching: pip install regex
from pydantic import BaseModel, Field, ValidationInfo, model_validator

logger = logging.getLogger(__name__)


class Fact(BaseModel):
    """
    Each fact has a body and a list of sources.
    If there are multiple facts, make sure to break them apart such that each one only uses a set of sources that are relevant to it.
    """

    fact: str = Field(..., description="Body of the sentence as part of a response")
    substring_quote: List[str] = Field(
        ...,
        description="Each source should be a direct quote from the context, as a substring of the original content",
    )

    @model_validator(mode="after")
    def validate_sources(self, info: ValidationInfo) -> "Fact":
        """
        For each quote, find its span in the context and replace the quote with
        the exact matching substring. Quotes that cannot be found are dropped.
        """
        if info.context is None:
            logger.info("No context found, skipping validation")
            return self

        # Get the source text passed in through the validation context
        text_chunks = info.context.get("text_chunk")
        if text_chunks is None:
            logger.info("No text_chunk in context, skipping validation")
            return self

        # Get the span of each quote in the context
        spans = list(self.get_spans(text_chunks))
        logger.info(
            f"Found {len(spans)} span(s) from {len(self.substring_quote)} citation(s)."
        )
        # Replace each quote with the exact substring found in the context
        self.substring_quote = [text_chunks[start:end] for start, end in spans]
        return self

    def _get_span(self, quote, context, errs=100):
        # Escape the quote so characters such as "." or "(" match literally
        pattern = regex.escape(quote)

        # Try an exact match first, then allow one more edit at a time
        errs_ = 0
        s = regex.search(f"({pattern}){{e<={errs_}}}", context)
        while s is None and errs_ < errs:
            errs_ += 1
            s = regex.search(f"({pattern}){{e<={errs_}}}", context)

        if s is not None:
            yield from s.spans()

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)


class QuestionAnswer(instructor.OpenAISchema):
    """
    Class representing a question and its answer as a list of facts, where each fact should have a source.
    Each sentence contains a body and a list of sources.
    """

    question: str = Field(..., description="Question that was asked")
    answer: List[Fact] = Field(
        ...,
        description="Body of the answer, each fact should be its separate object with a body and a list of sources",
    )

    @model_validator(mode="after")
    def validate_sources(self) -> "QuestionAnswer":
        """
        Checks that each fact has at least one source, and removes those that do not.
        """
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
        return self
```
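`_get_span` builds a regex pattern out of each quote. Any regex metacharacters in a quote, such as `.`, `?` or `(`, will therefore change what the pattern matches, or make the pattern invalid, unless the quote is escaped first. The stdlib `re` module shows the same effect. This is a minimal illustration, and `find_span` is a hypothetical helper:

```python
import re

context = "As part of coop I worked at many companies including Stitchfix, Facebook."


def find_span(quote: str, text: str, escape: bool = True):
    """Return the (start, end) span of the first match of `quote`, or None."""
    pattern = re.escape(quote) if escape else quote
    match = re.search(pattern, text)
    return match.span() if match else None


# Unescaped, "?" makes the trailing "k" optional, so the pattern "matches"
# text that is not literally the quote "Facebook?".
print(find_span("Facebook?", context, escape=False))  # (64, 72)
print(find_span("Facebook?", context, escape=True))   # None

# Unescaped, "(" opens a group that is never closed: the pattern is invalid.
try:
    find_span("(coop", context, escape=False)
except re.error as exc:
    print("invalid pattern:", exc)
```

The `regex` module provides a `regex.escape` function for the same purpose.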

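The `Fact` class represents a single statement in the answer. It contains a `fact` attribute for the body of the sentence and a `substring_quote` attribute for the sources, which are direct quotes from the context. Its `validate_sources` method reads the original text from the validation context (under the `text_chunk` key). It then locates each quote using fuzzy matching from the `regex` library and replaces the quotes with the exact substrings it found. Quotes that cannot be located are dropped.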
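`_get_span` relies on the `regex` module's fuzzy matching, where `{e<=n}` allows up to `n` edits. A stdlib-only sketch of the same idea uses `difflib.SequenceMatcher`. This swaps in a similarity ratio in place of an edit budget, so it is an approximation, not the technique used above. The sketch locates a near-verbatim quote and recovers the exact substring from the context. The name `locate_quote` and the 0.9 threshold are hypothetical choices:

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def locate_quote(
    quote: str, context: str, min_ratio: float = 0.9
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of `quote` in `context`, or None if not found."""
    # Exact match first: cheapest and most trustworthy.
    start = context.find(quote)
    if start != -1:
        return start, start + len(quote)

    # Approximate match: compare the quote against every window whose length is
    # within about 10% of the quote's length, keeping the most similar window.
    n = len(quote)
    slack = max(1, n // 10)
    best_ratio, best_span = 0.0, None
    for size in range(max(1, n - slack), n + slack + 1):
        for i in range(len(context) - size + 1):
            ratio = SequenceMatcher(None, quote, context[i : i + size]).ratio()
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (i, i + size)
    return best_span if best_ratio >= min_ratio else None


context = (
    "I went to an arts high school but in university "
    "I studied Computational Mathematics and physics."
)
for quote in ["university", "arts highschool", "quantum chromodynamics"]:
    span = locate_quote(quote, context)
    # Keep the exact substring from the context, not the model's wording.
    print(quote, "->", context[span[0] : span[1]] if span else None)
```

The near miss `"arts highschool"` is mapped to the context's actual wording, `"arts high school"`. A quote with no close counterpart returns `None` and would be dropped. The brute-force window scan is fine for short contexts but grows with context length.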

The `QuestionAnswer` class represents a question and its answer. It consists of a `question` attribute for the question asked and a list of `Fact` objects in the `answer` attribute.

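The `QuestionAnswer` class also includes a `validate_sources` method that checks that each fact has at least one source and removes facts that do not. Each nested `Fact` is validated before this model-level "after" validator runs. So a fact whose quotes were all dropped by `Fact.validate_sources` arrives here with no sources and is filtered out.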
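The two validators can be pictured as a two-step pipeline: filter quotes per fact, then filter facts. Below is a plain-Python sketch of that ordering. `FactSketch` and `validate_answer` are hypothetical stand-ins, not the pydantic models, and an exact substring check replaces the fuzzy matching for brevity:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactSketch:
    """Hypothetical plain-Python stand-in for the Fact model."""

    fact: str
    substring_quote: List[str] = field(default_factory=list)


def validate_answer(facts: List[FactSketch], context: str) -> List[FactSketch]:
    # Step 1 (like Fact.validate_sources, run per fact): keep only quotes found
    # in the context. Exact substring checks stand in for fuzzy matching here.
    checked = [
        FactSketch(f.fact, [q for q in f.substring_quote if q in context])
        for f in facts
    ]
    # Step 2 (like QuestionAnswer.validate_sources): drop facts with no quotes left.
    return [f for f in checked if len(f.substring_quote) > 0]


context = (
    "I went to an arts high school but in university "
    "I studied Computational Mathematics and physics."
)
facts = [
    FactSketch(
        "The author studied Computational Mathematics and physics.",
        ["Computational Mathematics and physics"],
    ),
    # Its only quote is not in the context, so the whole fact is removed.
    FactSketch("The author studied at a conservatory.", ["studied music at a conservatory"]),
]
print([f.fact for f in validate_answer(facts, context)])
# ['The author studied Computational Mathematics and physics.']
```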

## Asking AI a Question

To ask the AI a question and get back an answer with citations, we can define a function `ask_ai` that takes a question and context as input and returns a `QuestionAnswer` object.

```python
def ask_ai(question: str, context: str) -> QuestionAnswer:
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        # Expose QuestionAnswer as a function and force the model to call it
        functions=[QuestionAnswer.openai_schema],
        function_call={"name": QuestionAnswer.openai_schema["name"]},
        messages=[
            {
                "role": "system",
                "content": "You are a world class algorithm to answer questions with correct and exact citations.",
            },
            {"role": "user", "content": "Answer the question using the following context"},
            {"role": "user", "content": context},
            {"role": "user", "content": f"Question: {question}"},
            {
                "role": "user",
                "content": "Tips: Make sure to cite your sources, and use the exact words from the context.",
            },
        ],
    )

    # Create a QuestionAnswer from the completion, passing the context so the
    # Fact validator can check each quote against it
    return QuestionAnswer.from_response(
        completion, validation_context={"text_chunk": context}
    )
```

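The `ask_ai` function takes a `question` string and a `context` string as input. It sends a chat completion request that exposes `QuestionAnswer` as a function and forces the model to call it, with the question and context supplied as messages. `from_response` then converts the completion into a `QuestionAnswer` object. It passes `{"text_chunk": context}` as the validation context, so `Fact.validate_sources` can check each quote against the source text. This example uses the pre-1.0 `openai` interface (`openai.ChatCompletion.create`).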
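Because `function_call` forces the model to call `QuestionAnswer`, the structured answer comes back as a JSON string in the function call's `arguments` field, not in the message `content`. The sketch below shows that extraction step on a hand-written response dict. `fake_completion` and `extract_arguments` are hypothetical names, and the exact shape of the response object depends on the `openai` version. In the example above, `from_response` is what turns those arguments into a validated `QuestionAnswer`, with `validation_context` made available to the validators.

```python
import json

# Hypothetical stand-in for a pre-1.0 ChatCompletion response, trimmed to the
# fields used here; real responses include ids, usage data, and more.
fake_completion = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "QuestionAnswer",
                    # The arguments arrive as a JSON *string*, not a dict.
                    "arguments": json.dumps(
                        {
                            "question": "What did the author do during college?",
                            "answer": [
                                {
                                    "fact": "The author started the Data Science club at the University of Waterloo.",
                                    "substring_quote": [
                                        "I also started the Data Science club at the University of Waterloo"
                                    ],
                                }
                            ],
                        }
                    ),
                },
            }
        }
    ]
}


def extract_arguments(completion: dict) -> dict:
    """Return the parsed arguments of the forced function call."""
    function_call = completion["choices"][0]["message"]["function_call"]
    return json.loads(function_call["arguments"])


args = extract_arguments(fake_completion)
print(args["answer"][0]["substring_quote"])
```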

## Evaluating an Example

Let's evaluate the example by asking the AI a question and getting back an answer with citations. We'll ask the question "What did the author do during college?" with the given context.

```python
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

answer = ask_ai(question, context)
print(answer.model_dump_json(indent=2))
```
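Sample output. Note that it was produced for the question "where did he go to school?" and uses the field names `statement` and `substring_phrase` rather than `fact` and `substring_quote`:

```json
{
  "question": "where did he go to school?",
  "answer": [
    {
      "statement": "Jason Liu went to an arts highschool.",
      "substring_phrase": [
        "arts highschool"
      ]
    },
    {
      "statement": "Jason Liu studied Computational Mathematics and physics in university.",
      "substring_phrase": [
        "university"
      ]
    }
  ]
}
```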
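A quick way to spot-check an answer like the sample output is to list the quotes that are not verbatim substrings of the context. The sketch below uses the sample's own field names (`answer`, `substring_phrase`), and `unverified_quotes` is a hypothetical helper. On this sample it flags `"arts highschool"`, because the context reads "arts high school". That is the kind of near miss the fuzzy matching in `_get_span` is meant to absorb.

```python
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

# The sample output above, as a Python dict (same keys and values).
sample_output = {
    "question": "where did he go to school?",
    "answer": [
        {
            "statement": "Jason Liu went to an arts highschool.",
            "substring_phrase": ["arts highschool"],
        },
        {
            "statement": "Jason Liu studied Computational Mathematics and physics in university.",
            "substring_phrase": ["university"],
        },
    ],
}


def unverified_quotes(answer: dict, context: str) -> list:
    """Return every cited quote that is not a verbatim substring of the context."""
    return [
        quote
        for fact in answer["answer"]
        for quote in fact["substring_phrase"]
        if quote not in context
    ]


print(unverified_quotes(sample_output, context))  # ['arts highschool']
```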