diff --git a/tutorials/4.knowledge-graphs.ipynb b/tutorials/4.knowledge-graphs.ipynb index 61363ed..d5f6673 100644 --- a/tutorials/4.knowledge-graphs.ipynb +++ b/tutorials/4.knowledge-graphs.ipynb @@ -11,24 +11,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Introduction\n", + "# Introduction\n", "\n", - "**What is a knowledge graph**\n", + "**What is a knowledge graph?**\n", "\n", - "A knowledge graph, also known as a semantic network, represents a network of real-world entities—i.e. objects, events, situations, or concepts—and illustrates the relationship between them.\n", + "A knowledge graph, also known as a semantic network, represents real-world entities and their relationships. It consists of nodes, edges, and labels. Nodes can represent any entity, while edges define the connections between them. For example, a node representing an author like \"J.K. Rowling\" can be connected to another node representing one of her books, \"Harry Potter\", with the edge \"author of\".\n", "\n", - "A knowledge graph primarily consists of three elements: ``nodes``, ``edges``, and ``labels``. Nodes can represent any entity, be it an object, location, or individual. Edges establish the connection or relationship between these nodes. For instance, consider a node representing a popular author, \"J.K. Rowling\", and another node representing one of her books, \"Harry Potter\". The edge between these nodes could define the relationship as \"author of\", indicating that J.K. 
Rowling is the author of Harry Potter.\n", + "**Applications of knowledge graphs**\n", "\n", - "**Knowledge graph applications**\n", + "Knowledge graphs have various applications, including:\n", "\n", - "By using automated knowledge graphs, you can split hard topics into visually appealing and easy bits, making learning less scary and more helpful.\n", - "\n", - "some of the widely used examples are:\n", - "- Search Engines: Knowledge graphs are used by search engines like Google to enhance search results with semantic-search information gathered from a wide variety of sources.\n", - "- Recommendation Systems: They are used in recommendation systems to suggest products or services based on user's behavior and preferences.\n", - "- Natural Language Processing: In NLP, knowledge graphs are used to understand and generate human language.\n", - "- Data Integration: Knowledge graphs help in integrating data from different sources by understanding the relationship between them.\n", - "- Artificial Intelligence and Machine Learning: They are used in AI and ML to provide context to data, which helps in better decision making." + "- Search Engines: They enhance search results by incorporating semantic-search information from diverse sources.\n", + "- Recommendation Systems: They suggest products or services based on user behavior and preferences.\n", + "- Natural Language Processing: They aid in understanding and generating human language.\n", + "- Data Integration: They facilitate the integration of data from different sources by identifying relationships.\n", + "- Artificial Intelligence and Machine Learning: They provide contextual information to improve decision-making." 
] }, { @@ -54,7 +51,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -63,7 +60,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 7, "metadata": {}, "outputs": [], "source": [ @@ -84,20 +81,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Defining the structures" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Node and Edge Classes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ + "## Node and Edge Classes\n", + "\n", "We begin by modeling our knowledge graph with Node and Edge objects.\n", "\n", "Node objects represent key concepts or entities, while Edge objects signify the relationships between them." @@ -105,23 +90,18 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field\n", - "from typing import List\n", + "from typing import List, Optional\n", "\n", - "# The Node class represents key concepts or entities in our knowledge graph.\n", - "# Each node has an id, a label, and a color.\n", "class Node(BaseModel):\n", " id: int\n", " label: str\n", " color: str\n", "\n", - "# The Edge class signifies the relationships between nodes in our knowledge graph.\n", - "# Each edge has a source node, a target node, a label, and a color.\n", - "# By default, the color of an edge is set to \"black\".\n", "class Edge(BaseModel):\n", " source: int\n", " target: int\n", @@ -133,24 +113,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### KnowledgeGraph Class" + "## `KnowledgeGraph` Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The KnowledgeGraph class integrates the nodes and edges, forming a comprehensive structure of our graph. It contains a list of nodes and a list of edges. 
Each node represents a key concept or entity, and each edge represents a relationship between two nodes.\n", + "The `KnowledgeGraph` class combines nodes and edges to create a comprehensive graph structure. It includes lists of nodes and edges, where each node represents a key concept or entity, and each edge represents a relationship between two nodes.\n", "\n", - "Later you'll notice that we model this class to be match the graphviz library's graph object.\n", - "Making it easier to visualize our graph.\n", + "Later on, you'll see that we designed this class to match the graph object in the graphviz library, which makes it easier to visualize our graph.\n", "\n", - "The `visualize_knowledge_graph` function visualizes a knowledge graph. It accepts a `KnowledgeGraph` object as input, which includes nodes and edges. The function uses the `graphviz` library to create a directed graph (`Digraph`). Each node and edge from the `KnowledgeGraph` is added to the `Digraph` with their respective attributes (id, label, color). The graph is then rendered and displayed." + "The `visualize_knowledge_graph` function is used to visualize a knowledge graph. It takes a `KnowledgeGraph` object as input, which contains nodes and edges. The function utilizes the `graphviz` library to generate a directed graph (`Digraph`). Each node and edge from the `KnowledgeGraph` is added to the `Digraph` with their respective attributes (id, label, color). Finally, the graph is rendered and displayed." 
] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -166,7 +145,7 @@ " dot = Digraph(comment=\"Knowledge Graph\")\n", "\n", " for node in self.nodes:\n", - " dot.node(str(node.id), node.label, color=node.color)\n", + " dot.node(name=str(node.id), label=node.label, color=node.color)\n", " for edge in self.edges:\n", " dot.edge(str(edge.source), str(edge.target), label=edge.label, color=edge.color)\n", " \n", @@ -177,20 +156,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Generating the Knowledge Graph" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### generate_graph function" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ + "## Generating the Knowledge Graph\n", + "\n", + "### generate_graph function\n", + "\n", "The ``generate_graph`` function uses OpenAI's model to create a KnowledgeGraph object from an input string.\n", "\n", "It requests the model to interpret the input as a detailed knowledge graph and uses the response to form the KnowledgeGraph object." 
@@ -198,13 +167,13 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def generate_graph(input) -> KnowledgeGraph:\n", " return client.chat.completions.create(\n", - " model=\"gpt-4-1106-preview\",\n", + " model=\"gpt-3.5-turbo\",\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", @@ -217,7 +186,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -229,179 +198,125 @@ [rendered SVG output omitted. Removed graph: 1 Neural Network -contains-> 2 Input Layer, 3 Hidden Layers, 4 Output Layer; 1 -uses-> 6 Weights, 7 Bias; 1 -performs-> 9 Learning; 2, 3, 4 -composed of-> 5 Neurons; 5 -applies-> 8 Activation Function; 9 -updates-> 6, 7; 9 -involves-> 10 Backpropagation; 10 -uses-> 11 Loss Function. Added graph: 1 Artificial Intelligence -is a subset of-> 2 Machine Learning -is a subset of-> 3 Deep Learning -is a subset of-> 4 Neural Network; 4 -has-> 8 Weights; 4 -uses-> 9 Activation Function; 5 Input Layer, 6 Hidden Layer, 7 Output Layer -has-> 4] ], "text/plain": [ - "" + "" ] }, "metadata": {}, @@ -416,46 +331,34 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Advanced Iterative Graph Generation" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When dealing with extensive or segmented text inputs, processing them all at once might be challenging due to limitations in prompt length or the complexity of the content. In such scenarios, an iterative approach to building the knowledge graph proves beneficial. This method involves processing the text in smaller, manageable chunks, updating the graph with new information from each chunk." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### What are the benefits of this approach?" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- Scalability: This approach can handle large datasets by breaking them down into smaller, more manageable pieces.\n", + "## Advanced: Accumulating Knowledge Graphs\n", "\n", - "- Flexibility: It allows for dynamic updates to the graph, accommodating new information as it becomes available.\n", + "When dealing with larger datasets, or knowledge that grows over time, processing them all at once can be challenging due to limitations in prompt length or the complexity of the content. In such cases, an iterative approach to building the knowledge graph can be beneficial. This method involves processing the text in smaller, manageable chunks and updating the graph with new information from each chunk.\n", "\n", - "- Efficiency: Processing smaller chunks of text can be more efficient and less prone to errors or omissions." + "### What are the benefits of this approach?\n", + "\n", + "- Scalability: This approach can handle large datasets by breaking them down into smaller, more manageable pieces.\n", + "\n", + "- Flexibility: It allows for dynamic updates to the graph, accommodating new information as it becomes available.\n", + "\n", + "- Efficiency: Processing smaller chunks of text can be more efficient and less prone to errors or omissions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### What's different?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The Previous example laid the foundation, while this new example will adds more complexity and functionality. The Node and Edge classes have been augmented with a __hash__ method, enabling these objects to be used in sets, thereby making it easier to handle duplicates." + "### What has changed?\n", + "\n", + "The previous example provided a basic structure, while this new example introduces additional complexity and functionality. 
The Node and Edge classes now have a __hash__ method, allowing them to be used in sets and simplifying duplicate handling.\n", + "\n", + "The KnowledgeGraph class has been enhanced with two new methods: ``update`` and ``draw``.\n", + "\n", + "In the KnowledgeGraph class, the nodes and edges fields are now optional, offering greater flexibility.\n", + "\n", + "The ``update`` method enables the merging and removal of duplicates from two graphs.\n", + "\n", + "The ``draw`` method includes a prefix parameter, making it easier to create different graph versions during iterations." ] }, { @@ -482,27 +385,12 @@ " return hash((self.source, self.target, self.label))" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "KnowledgeGraph Class now have ``update`` and ``draw`` methods.\n", - "\n", - "The nodes and edges fields in the KnowledgeGraph class are now optional, providing more flexibility.\n", - "\n", - "``update``: This method allows for the combination and deduplication of two graphs.\n", - "\n", - "``draw``: includes a prefix parameter, facilitating the creation of different graph versions during iterations." - ] - }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ - "from typing import Optional\n", - "\n", "class KnowledgeGraph(BaseModel):\n", " # Optional list of nodes and edges in the knowledge graph\n", " nodes: Optional[List[Node]] = Field(..., default_factory=list)\n", @@ -530,29 +418,31 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "### Generate itrative graph" - ] + "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The new ``generate_graph`` function is designed to handle a list of inputs iteratively, updating the graph with each new piece of information.\n", + "### Generate iterative graphs\n", "\n", - "If you look carefully it looks liek a very common pattern in programming, a reduce, or fold function. 
A simple example could be iterating over a list of find the sum of all the elements squared.\n", + "The updated `generate_graph` function is specifically designed to handle a list of inputs iteratively. It updates the graph with each new piece of information.\n", + "\n", + "Upon closer inspection, this pattern resembles a common programming technique known as a \"reduce\" or \"fold\" function. A simple example of this would be iterating over a list to find the sum of all the elements squared.\n", + "\n", + "Here's an example in Python:\n", "\n", "```python\n", "cur_state = 0\n", "for i in [1, 2, 3, 4, 5]:\n", - " c += i**2\n", - "print(c)\n", + " cur_state += i**2\n", + "print(cur_state)\n", "```" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -617,360 +507,9 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 18, "metadata": {}, - "outputs": [ - [four removed display_data outputs, rendered SVG omitted. Outputs 1 and 2: 1 Jason -is a-> 3 physicist; 1 -knows-> 2 quantum mechanics; 1 -is a-> 4 professor. Output 3 adds: 5 Sarah -knows-> 1 Jason; 5 -is a student of-> 6 student. Output 4 adds: 7 University of Toronto -is in-> 8 Canada; 5 Sarah -is a student at-> 7] - ], + "outputs": [], "source": [ "text_chunks = [\n", " \"Jason knows a lot about quantum mechanics. He is a physicist. He is a professor\",\n", @@ -1050,11 +589,6 @@ "\n", "All of them will follow an idea of iteratively extracting more and more information and accumulating it some state." ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] } ], "metadata": {