{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "80f6cd99", "metadata": {}, "source": [ "# HTML\n", "\n", ">[HTML](https://en.wikipedia.org/wiki/HTML) is the standard markup language for documents designed to be displayed in a web browser.\n", "\n", "`HtmlTextSplitter` splits text along HTML structure. It's implemented as a simple subclass of `RecursiveCharacterTextSplitter` with HTML-specific separators. See the source code for the HTML separators used by default.\n", "\n", "1. How the text is split: by a list of HTML-specific separators\n", "2. How the chunk size is measured: by number of characters" ] }, { "cell_type": "code", "execution_count": 1, "id": "96d64839", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.text_splitter import HtmlTextSplitter" ] }, { "cell_type": "code", "execution_count": 12, "id": "cfb0da17", "metadata": { "tags": [] }, "outputs": [], "source": [ "html_text = \"\"\"\n", "\n", "\n", "
\n", "⚡ Building applications with LLMs through composability ⚡
\n", "⚡ Building applications with LLMs through composability ⚡
\\n