Examples of using LLMs for citation verification (#192)

2026-06-05 22:50:18 +00:00 · 2023-11-18 22:33:10 -05:00
parent 8ab9232e11
commit 1144f608aa
3 changed files with 497 additions and 3 deletions
@@ -4,9 +4,10 @@ The goal of the blog is to capture some content that does not neatly fit within

 ## Advanced Topics

- [Query Understanding and Expansion for RAG](posts/rag-and-beyond.md)
- [GPT-4 Level summarization with GPT3.5 Finetuning](posts/chain-of-density.md)
- [Deepdive on LLM Guardrails / Validation](posts/validation-part1.md)
+- [Query Understanding for RAG: Beyond Embeddings](posts/rag-and-beyond.md)
+- [Finetuning: GPT-4 level summaries with GPT-3.5-turbo](posts/chain-of-density.md)
+- [Introduction to Guardrails and Validation](posts/validation-part1.md)
+- [Validating Citations](posts/citations.md)
 - [A Guide to Fine-Tuning and Distillation](posts/distilation-part1.md)

 ## Learning Python
@@ -0,0 +1,268 @@
+---
+draft: False
+date: 2023-11-18
+slug: validate-citations
+tags:
+  - pydantic
+  - validation
+  - finetuning
+  - citations
+  - hallucination
+authors:
+  - jxnl
+---
+
+# Verifying LLM Citations with Pydantic
+
+Ensuring the accuracy of information is crucial. This blog post explores how Pydantic's powerful and flexible validators can enhance data accuracy through citation verification.
+
+We'll start with using a simple substring check to verify citations. Then we'll use `instructor` itself to power an LLM to verify citations and align answers with the given citations. Finally, we'll explore how we can use these techniques to generate a dataset of accurate responses.
+
+## Example 1: Simple Substring Check
+
+In this example, we use the `Statements` class to verify if a given substring quote exists within a text chunk. If the substring is not found, an error is raised.
+
+### Code Example:
+
+```python
+from typing import List, Optional
+from openai import OpenAI
+from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator, model_validator
+import instructor
+
+client = instructor.patch(OpenAI())
+
+class Statements(BaseModel):
+    body: str
+    substring_quote: str
+
+    @field_validator("substring_quote")
+    @classmethod
+    def substring_quote_exists(cls, v: str, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        for text_chunk in context.values():
+            if v in text_chunk: # (1)
+                return v
+        raise ValueError(f"Could not find substring_quote `{v}` in contexts")
+
+
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+```
+
+1. While we use a simple substring check in this example, we can use more complex techniques like regex or Levenshtein distance.
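
As a rough sketch of the looser matching mentioned in the note above, the exact `v in text_chunk` test could be swapped for a similarity ratio. This version assumes only the standard library's `difflib`, which is not Levenshtein distance proper but behaves similarly for short quotes. The helper names `best_match_ratio` and `quote_is_supported` and the `0.9` threshold are illustrative assumptions, not part of `instructor` or Pydantic.

```python
from difflib import SequenceMatcher
from typing import Dict


def best_match_ratio(quote: str, chunk: str) -> float:
    """Best similarity between `quote` and any quote-sized window of `chunk`."""
    if not quote:
        return 0.0
    if len(chunk) <= len(quote):
        return SequenceMatcher(None, quote, chunk).ratio()
    width = len(quote)
    # Slide a window of the quote's length across the chunk and keep the best score.
    return max(
        SequenceMatcher(None, quote, chunk[i : i + width]).ratio()
        for i in range(len(chunk) - width + 1)
    )


def quote_is_supported(
    quote: str, text_chunks: Dict[int, str], threshold: float = 0.9
) -> bool:
    """True if any chunk contains a near match for the quote (hypothetical helper)."""
    return any(
        best_match_ratio(quote, chunk) >= threshold
        for chunk in text_chunks.values()
    )
```

Inside the validator, `quote_is_supported(v, context)` would take the place of the loop over `context.values()`. The threshold deserves care: set too low, a fuzzy match will accept a chunk that negates the quote, such as "Paris is not the capital of France".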
+
+Once the class is defined, we can use it to validate the context and raise an error if the substring is not found.
+
+```python
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is not the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+```
+
+### Error Message Example:
+
+```
+answer.0.substring_quote
+  Value error, Could not find substring_quote `Paris is the capital of France` in contexts [type=value_error, input_value='Paris is the capital of France', input_type=str]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
+
+Pydantic raises a validation error when `substring_quote` does not appear in any of the text chunks passed as context. The same pattern extends to looser matching, such as regex or Levenshtein distance.
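
One lightweight variant, sketched here under the assumption that differences in whitespace and letter case should not count as citation errors, normalizes both sides with a regex before the substring test. `normalize` and `quote_in_chunks` are hypothetical helper names used only for illustration.

```python
import re
from typing import Dict


def normalize(text: str) -> str:
    """Collapse whitespace runs to a single space, trim the ends, and casefold."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_in_chunks(quote: str, text_chunks: Dict[int, str]) -> bool:
    """Substring check that ignores whitespace and case differences."""
    needle = normalize(quote)
    if not needle:
        # An empty quote is never treated as a valid citation.
        return False
    return any(needle in normalize(chunk) for chunk in text_chunks.values())
```

This stays an exact match on the words themselves, so a chunk that says "Paris is not the capital of France" still fails to support the quote "Paris is the capital of France".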
+
+## Example 2: Using LLM for Verification
+
+This approach leverages OpenAI's LLM to validate citations. If the citation does not exist in the context, the LLM returns an error message.
+
+### Code Example:
+
+```python
+class Validation(BaseModel):
+    is_valid: bool
+    error_messages: Optional[str] = Field(None, description="Error messages if any")
+
+
+class Statements(BaseModel):
+    body: str
+    substring_quote: str
+
+    @model_validator(mode="after")
+    def substring_quote_exists(self, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        resp: Validation = client.chat.completions.create(
+            response_model=Validation,
+            messages=[
+                {
+                    "role": "user",
+                    "content": f"Does the following citation exist in the following context?\n\nCitation: {self.substring_quote}\n\nContext: {context}",
+                }
+            ],
+            model="gpt-3.5-turbo",
+        )
+
+        if resp.is_valid:
+            return self
+
+        raise ValueError(resp.error_messages)
+
+
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+```
+
+Now when we use a correct citation, the LLM returns a valid response.
+
+```python
+resp = AnswerWithCitaton.model_validate(
+    {
+        "question": "What is the capital of France?",
+        "answer": [
+            {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+        ],
+    },
+    context={
+        "text_chunks": {
+            1: "Jason is a pirate",
+            2: "Paris is the capital of France",
+            3: "Irrelevant data",
+        }
+    },
+)
+print(resp.model_dump_json(indent=2))
+```
+
+### Result:
+
+```json
+{
+  "question": "What is the capital of France?",
+  "answer": [
+    {
+      "body": "Paris",
+      "substring_quote": "Paris is the capital of France"
+    }
+  ]
+}
+```
+
+When we have citations that don't exist in the context, the LLM returns an error message.
+
+```python
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is not the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+```
+
+### Error Message Example:
+
+```
+1 validation error for AnswerWithCitaton
+answer.0
+  Value error, Citation not found in context [type=value_error, input_value={'body': 'Paris', 'substr... the capital of France'}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
+
+## Example 3: Aligning Citations and Answers
+
+In this example, we ensure that the provided answers are aligned with the given citations and context. The LLM is used to verify the alignment.
+
+We use the same `Statements` model as above, but we add a new model for the answer that also verifies the alignment of citations.
+
+### Code Example:
+
+```python
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+
+    @model_validator(mode="after")
+    def validate_answer(self, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        resp: Validation = client.chat.completions.create(
+            response_model=Validation,
+            messages=[
+                {
+                    "role": "user",
+                    "content": f"Do the following answers match the question and the context?\n\nQuestion: {self.question}\n\nAnswer: {self.answer}\n\nContext: {context}",
+                }
+            ],
+            model="gpt-3.5-turbo",
+        )
+
+        if resp.is_valid:
+            return self
+
+        raise ValueError(resp.error_messages)
+```
+
+When we have a mismatch between the answer and the citation, the LLM returns an error message.
+
+```python
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Texas", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+```
+
+### Error Message Example:
+
+```
+1 validation error for AnswerWithCitaton
+  Value error, The answer does not match the question and context [type=value_error, input_value={'question': 'What is the...he capital of France'}]}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
+
+## Conclusion
+
+These examples demonstrate the potential of using Pydantic and OpenAI to enhance data accuracy through citation verification. While the LLM-based approach may not be efficient at runtime, it has exciting implications for generating a dataset of accurate responses. By applying these validators during data generation, we can build a dataset for fine-tuning a model that excels in citation accuracy, similar to our last post on [finetuning a better summarizer](https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/).
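
The data-generation idea can be sketched as a simple filter: keep only candidate answers whose citations pass a validator, and serialize the survivors as JSONL rows. The record layout and the names `substring_validator` and `build_dataset` below are assumptions made for illustration. In practice the validator could be any of the checks above, including the LLM-backed ones.

```python
import json
from typing import Callable, Dict, Iterable, List


def substring_validator(record: Dict) -> bool:
    """True if every statement's quote appears verbatim in some text chunk."""
    chunks = record["text_chunks"]
    statements = record["answer"]
    # Records with no statements are rejected rather than passing vacuously.
    return bool(statements) and all(
        any(stmt["substring_quote"] in chunk for chunk in chunks)
        for stmt in statements
    )


def build_dataset(
    candidates: Iterable[Dict], is_valid: Callable[[Dict], bool]
) -> List[str]:
    """Return one JSON line per candidate that passes validation."""
    return [json.dumps(record) for record in candidates if is_valid(record)]


candidates = [
    {
        "question": "What is the capital of France?",
        "answer": [{"body": "Paris", "substring_quote": "Paris is the capital of France"}],
        "text_chunks": ["Jason is a pirate", "Paris is the capital of France"],
    },
    {
        "question": "What is the capital of France?",
        "answer": [{"body": "Paris", "substring_quote": "Paris is the capital of France"}],
        "text_chunks": ["Jason is a pirate", "Paris is not the capital of France"],
    },
]

rows = build_dataset(candidates, substring_validator)
```

Only the first candidate survives the filter, because the second one quotes text that its chunks do not contain.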
+
+If you like the content, check out our [GitHub](https://github.com/jxnl/instructor), give us a star, and check out the library.
@@ -0,0 +1,225 @@
+from typing import List, Optional
+from openai import OpenAI
+from pydantic import (
+    BaseModel,
+    Field,
+    ValidationError,
+    ValidationInfo,
+    field_validator,
+    model_validator,
+)
+
+import instructor
+
+client = instructor.patch(OpenAI())
+
+""" 
+Example 1) Simple Substring check that compares a citation to a text chunk
+"""
+
+
+class Statements(BaseModel):
+    body: str
+    substring_quote: str
+
+    @field_validator("substring_quote")
+    @classmethod
+    def substring_quote_exists(cls, v: str, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        # Check if the substring_quote is in the text_chunk
+        # if not, raise an error
+        for text_chunk in context.values():
+            if v in text_chunk:
+                return v
+        raise ValueError(
+            f"Could not find substring_quote `{v}` in contexts",
+        )
+
+
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+
+
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is not the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+"""
+answer.0.substring_quote
+  Value error, Could not find substring_quote `Paris is the capital of France` in contexts [type=value_error, input_value='Paris is the capital of France', input_type=str]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""
+
+
+""" 
+Example 2) Using an LLM to verify if a citation exists in the context
+"""
+
+
+class Validation(BaseModel):
+    """
+    Verification response from the LLM,
+    the error message should be detailed if is_valid is False
+    but keep it to less than 100 characters, reference specific
+    attributes that you are comparing, use `...` if the string is too long
+    """
+
+    is_valid: bool
+    error_messages: Optional[str] = Field(None, description="Error messages if any")
+
+
+class Statements(BaseModel):
+    body: str
+    substring_quote: str
+
+    @model_validator(mode="after")
+    def substring_quote_exists(self, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        resp: Validation = client.chat.completions.create(
+            response_model=Validation,
+            messages=[
+                {
+                    "role": "user",
+                    "content": f"Does the following citation exist in the following context?\n\nCitation: {self.substring_quote}\n\nContext: {context}",
+                }
+            ],
+            model="gpt-3.5-turbo",
+        )
+
+        if resp.is_valid:
+            return self
+
+        raise ValueError(resp.error_messages)
+
+
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+
+
+resp = AnswerWithCitaton.model_validate(
+    {
+        "question": "What is the capital of France?",
+        "answer": [
+            {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+        ],
+    },
+    context={
+        "text_chunks": {
+            1: "Jason is a pirate",
+            2: "Paris is the capital of France",
+            3: "Irrelevant data",
+        }
+    },
+)
+# output: notice that there are no errors
+print(resp.model_dump_json(indent=2))
+{
+    "question": "What is the capital of France?",
+    "answer": [{"body": "Paris", "substring_quote": "Paris is the capital of France"}],
+}
+
+# Now we change the text chunk to something else, and we get an error
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is not the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+""" 
+1 validation error for AnswerWithCitaton
+answer.0
+  Value error, Citation not found in context [type=value_error, input_value={'body': 'Paris', 'substr... the capital of France'}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""
+
+# Example 3) Using an LLM to verify if the citations and the answers are all aligned
+
+
+# we keep the same model as above for Statements, but we add a new model for the answer
+# that also verifies that the citations are aligned with the answers
+class AnswerWithCitaton(BaseModel):
+    question: str
+    answer: List[Statements]
+
+    @model_validator(mode="after")
+    def validate_answer(self, info: ValidationInfo):
+        context = info.context.get("text_chunks", None)
+
+        resp: Validation = client.chat.completions.create(
+            response_model=Validation,
+            messages=[
+                {
+                    "role": "user",
+                    "content": f"Do the following answers match the question and the context?\n\nQuestion: {self.question}\n\nAnswer: {self.answer}\n\nContext: {context}",
+                }
+            ],
+            model="gpt-3.5-turbo",
+        )
+
+        if resp.is_valid:
+            return self
+
+        raise ValueError(resp.error_messages)
+
+
+""" 
+Using LLMs for citation verification is inefficient during runtime. 
+However, we can utilize them to create a dataset consisting only of accurate responses 
+where citations must be valid (as determined by LLM, fuzzy text search, etc.). 
+
+This approach would require an initial investment during data generation to obtain 
+a fine-tuned model with improved citation accuracy.
+"""
+try:
+    AnswerWithCitaton.model_validate(
+        {
+            "question": "What is the capital of France?",
+            "answer": [
+                {"body": "Texas", "substring_quote": "Paris is the capital of France"},
+            ],
+        },
+        context={
+            "text_chunks": {
+                1: "Jason is a pirate",
+                2: "Paris is the capital of France",
+                3: "Irrelevant data",
+            }
+        },
+    )
+except ValidationError as e:
+    print(e)
+""" 
+1 validation error for AnswerWithCitaton
+  Value error, The answer does not match the question and context [type=value_error, input_value={'question': 'What is the...he capital of France'}]}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""