Microsoft Dev Blogs

Document Summarization Solution Patterns using Azure OpenAI & LangChain

Introduction

Document summarization condenses a large document into its key information, making trends and highlights easier to identify. It comes with challenges, however: the model's input size limit, information lost when summaries are themselves summarized, and redundancy when a separate summary is indexed for each chunk.

Initial Setup

The solution patterns discussed below authenticate to Azure OpenAI resources using role-based access control (RBAC) and chunk the text based on document size.

Pattern 1 - Chunking based on Document Size

This pattern chunks the text based on document size and calls the Azure OpenAI API to generate summaries.

Pros: Suitable for summarizing small documents efficiently.
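As a minimal sketch of the size-based chunking described above (not the original implementation), the snippet below splits text on whitespace into chunks that stay under a character limit, summarizes each chunk, and then summarizes the joined partial summaries. The names `chunk_by_size` and `summarize_document`, the `max_chars` default, and the `summarize` callable are assumptions for illustration; in practice `summarize` would wrap a call to an Azure OpenAI deployment.

```python
from typing import Callable, List


def chunk_by_size(text: str, max_chars: int) -> List[str]:
    """Split text into whitespace-delimited chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for word in text.split():
        # A single word longer than the limit is hard-split into pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current_len + extra > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_document(
    text: str,
    summarize: Callable[[str], str],  # hypothetical wrapper around the model call
    max_chars: int = 4000,            # assumed limit; tune to the model's input size
) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = chunk_by_size(text, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial))
```

A character limit is used here only to keep the sketch dependency-free; a token-based limit would track the model's real input constraint more closely.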

Pattern 2 - Using MapReduce

This pattern processes chunks in parallel using LangChain's MapReduce.

Pros: Faster processing time as chunks are processed in parallel.
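The parallel map step can be illustrated without LangChain. The following is a rough sketch, assuming a thread pool is an acceptable stand-in for whatever concurrency the library uses internally: each chunk is summarized concurrently (the map step), then the partial summaries are joined and summarized once more (the reduce step). `map_reduce_summarize`, `summarize`, and `max_workers` are hypothetical names, not LangChain APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],  # hypothetical wrapper around the model call
    max_workers: int = 4,             # assumed value; keep within API rate limits
) -> str:
    """Summarize chunks concurrently, then reduce the partial summaries to one."""
    if not chunks:
        return ""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so summaries stay aligned
        # with the original chunk sequence even though calls run concurrently.
        partial = list(pool.map(summarize, chunks))
    if len(partial) == 1:
        return partial[0]
    return summarize("\n\n".join(partial))
```

Threads suit this workload because each call spends most of its time waiting on the network; the speed-up is bounded by the service's rate limits rather than by local CPU.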

Pattern 3 - Map Reduce Chain

In this approach, each document is mapped to an individual summary using an LLMChain, and a ReduceDocumentsChain then combines the chunk summaries into a single summary.

Pros: Scalable to large documents, maintains continuity of context between documents, and helps prevent the loss of information that can occur in the plain MapReduce approach.
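The reduce step can be pictured as repeatedly collapsing groups of summaries until the remainder fits within a size budget, then combining them in a final pass. The sketch below illustrates that idea in plain Python; it is not LangChain's code, and `group_by_budget`, `reduce_summaries`, `map_reduce_chain`, `map_fn`, `combine`, and the character-based `budget` are all assumptions made for illustration.

```python
from typing import Callable, List


def group_by_budget(texts: List[str], budget: int) -> List[List[str]]:
    """Greedily pack consecutive texts into groups whose total length fits the budget."""
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if current and size + len(text) > budget:
            groups.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        groups.append(current)
    return groups


def reduce_summaries(
    summaries: List[str],
    combine: Callable[[str], str],  # hypothetical wrapper around the model call
    budget: int,                    # assumed character budget for one combine call
) -> str:
    """Collapse summaries group by group until they fit, then combine once more."""
    while len(summaries) > 1 and sum(len(s) for s in summaries) > budget:
        groups = group_by_budget(summaries, budget)
        if len(groups) == len(summaries):
            # No two summaries fit together, so collapsing cannot make progress.
            break
        summaries = [
            combine("\n".join(group)) if len(group) > 1 else group[0]
            for group in groups
        ]
    return combine("\n".join(summaries))


def map_reduce_chain(
    chunks: List[str],
    map_fn: Callable[[str], str],   # hypothetical per-chunk summarizer
    combine: Callable[[str], str],
    budget: int,
) -> str:
    """Map each chunk to a summary, then reduce the summaries within the budget."""
    return reduce_summaries([map_fn(chunk) for chunk in chunks], combine, budget)
```

Grouping consecutive summaries keeps neighbouring context together at each collapse round, which is one plausible reading of how this pattern preserves continuity between documents.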

For more details on the implementation and code snippets, please refer to the main document.