Basic usage:
By default, MarkdownHeaderTextSplitter strips the headers it splits on from each output chunk’s content. This can be disabled by setting strip_headers=False.
The default MarkdownHeaderTextSplitter strips whitespace and newlines. To preserve the original formatting of your Markdown documents, check out ExperimentalMarkdownSyntaxTextSplitter.
How to return Markdown lines as separate documents:
By default, MarkdownHeaderTextSplitter aggregates lines based on the headers specified in headers_to_split_on. We can disable this by setting return_each_line=True.
Note that header information is retained in the metadata for each document.
How to constrain chunk size:
Within each Markdown group, we can then apply any text splitter we want, such as RecursiveCharacterTextSplitter, which allows further control over the chunk size.