Legal summarization
Introduction
This guide demonstrates how to set up an automated legal document summarization system using Not Diamond. Efficient legal document summarization can significantly enhance a company's legal operations, saving time and resources while improving comprehension of complex legal texts.
Why Use Not Diamond for Legal Summarization?
Not Diamond offers several advantages for legal document summarization:
- Optimal Model Selection: Not Diamond routes each request to the Large Language Model (LLM) it predicts is best suited to that specific legal document, which helps produce accurate and relevant summaries.
- Adaptability: Different legal documents may require varying levels of detail or expertise. Not Diamond selects the most appropriate model based on the document's complexity and type.
- Cost Efficiency: By choosing the most suitable model for each task, Not Diamond optimizes LLM usage, potentially reducing costs for large-scale document processing.
- Scalability: As new, specialized legal AI models become available, Not Diamond can easily incorporate them into its routing decisions, keeping your summarization system up-to-date.
What You'll Learn
By the end of this guide, you will be able to:
- Set up a Not Diamond client for legal document summarization.
- Create a basic legal document summarization function.
- Implement document type detection for tailored summarization.
- Handle different types of legal documents (contracts, patents, court opinions).
- Generate structured summaries with key information extraction.
Prerequisites
- Python 3.10 or later.
- Basic knowledge of Python programming.
- API key for Not Diamond.
- Access to legal documents for summarization (ensure compliance with all relevant privacy and confidentiality regulations).
Step 1: Installation
First, install Not Diamond using pip:
pip install notdiamond[create]
Step 2: Setting Up the Not Diamond Client
Before using Not Diamond, you need to set up your API keys. We recommend using a .env file for secure storage of your API keys.
- Create a .env file in your project root:
NOTDIAMOND_API_KEY='your-notdiamond-api-key'
OPENAI_API_KEY='your-openai-api-key'
ANTHROPIC_API_KEY='your-anthropic-api-key'
- Install the python-dotenv package:
pip install python-dotenv
- Set up the Not Diamond client in your Python script:
import os
from dotenv import load_dotenv
from notdiamond import NotDiamond
# Load environment variables
load_dotenv()
# Initialize the Not Diamond client
client = NotDiamond()
Note: Replace 'your-notdiamond-api-key', 'your-openai-api-key', and 'your-anthropic-api-key' in the .env file with your actual API keys. Ensure you follow best practices for handling API keys, such as using environment variables or a secure secrets manager.
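A small pre-flight check can catch a missing or placeholder key before any request is sent. The sketch below is not part of the Not Diamond SDK: `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names, and treating any value that starts with `your-` as a placeholder is an assumption based on the sample .env file above.

```python
import os

# Key names taken from the sample .env file in this guide
REQUIRED_KEYS = ("NOTDIAMOND_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None):
    """Return required key names that are unset, blank, or still a 'your-...' placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_KEYS:
        value = (env.get(name) or "").strip()
        # Assumption: values beginning with "your-" are the unedited sample placeholders
        if not value or value.startswith("your-"):
            missing.append(name)
    return missing


# Example with an explicit mapping instead of the real environment
example_env = {"NOTDIAMOND_API_KEY": "nd-example", "OPENAI_API_KEY": ""}
print(missing_api_keys(example_env))  # ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY']
```

In a real script, one option is to call `missing_api_keys()` with no arguments right after `load_dotenv()` and stop with a clear error message if the returned list is not empty.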
Step 3: Creating a Basic Legal Document Summarization Function
Let's create a function that summarizes legal documents using Not Diamond:
def summarize_legal_document(document):
    messages = [
        {"role": "system", "content": "You are a legal AI assistant specialized in summarizing legal documents. Provide concise, accurate summaries that capture the key points of the document."},
        {"role": "user", "content": f"Please summarize the following legal document:\n\n{document}"}
    ]
    # create() returns a (result, session_id, provider) tuple
    result, session_id, provider = client.chat.completions.create(
        messages=messages,
        model=['openai/gpt-4', 'anthropic/claude-3-5-sonnet-20240620']
    )
    return result.content, session_id, provider.model
# Example usage
legal_document = """
[Insert a sample legal document here, such as a contract or legal opinion]
"""
summary, session_id, model_used = summarize_legal_document(legal_document)
print(f"Summary:\n{summary}")
print(f"Session ID: {session_id}")
print(f"Model used: {model_used}")
Explanation:
- We define a function summarize_legal_document that takes a legal document as input.
- We construct the messages array with a system prompt and the user's request.
- We call client.chat.completions.create with the messages and a list of models to choose from.
- The function returns the summary content, session ID, and the model used.
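The message construction described above can be pulled into a small helper so it can be inspected and tested without calling any API. This is a minimal sketch: `SYSTEM_PROMPT` and `build_summary_messages` are hypothetical names, and rejecting empty input is an added assumption rather than something the guide prescribes.

```python
# System prompt copied from the summarization function in this step
SYSTEM_PROMPT = (
    "You are a legal AI assistant specialized in summarizing legal documents. "
    "Provide concise, accurate summaries that capture the key points of the document."
)


def build_summary_messages(document: str) -> list[dict[str, str]]:
    """Build the system and user messages for a summarization request."""
    text = document.strip()
    if not text:
        # Assumption: an empty document is a caller error, not something to send to a model
        raise ValueError("document is empty")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Please summarize the following legal document:\n\n{text}"},
    ]


messages = build_summary_messages("This Agreement is made between Party A and Party B...")
print([message["role"] for message in messages])  # ['system', 'user']
```

The returned list has the same shape as the messages array passed to client.chat.completions.create, so it could be used in place of the inline list.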
Step 4: Implementing Document Type Detection
To provide more tailored summaries, we'll add document type detection:
import re
def detect_document_type(document):
    document = document.lower()
    if re.search(r'\b(agreement|contract|party|parties)\b', document):
        return 'contract'
    elif re.search(r'\b(patent|invention|claim|claims)\b', document):
        return 'patent'
    elif re.search(r'\b(court|opinion|judge|ruling|plaintiff|defendant)\b', document):
        return 'court_opinion'
    else:
        return 'general_legal'
def summarize_legal_document(document):
    doc_type = detect_document_type(document)
    system_messages = {
        'contract': "You are a legal AI assistant specialized in summarizing contracts. Focus on key terms, parties involved, obligations, and important clauses.",
        'patent': "You are a legal AI assistant specialized in summarizing patents. Focus on the invention description, claims, and any novel aspects.",
        'court_opinion': "You are a legal AI assistant specialized in summarizing court opinions. Focus on the key facts, legal issues, holdings, and reasoning.",
        'general_legal': "You are a legal AI assistant specialized in summarizing legal documents. Provide a concise summary capturing the key points and legal implications."
    }
    messages = [
        {"role": "system", "content": system_messages[doc_type]},
        {"role": "user", "content": f"Please summarize the following legal document:\n\n{document}"}
    ]
    # create() returns a (result, session_id, provider) tuple
    result, session_id, provider = client.chat.completions.create(
        messages=messages,
        model=['openai/gpt-4', 'anthropic/claude-3-5-sonnet-20240620']
    )
    return result.content, session_id, provider.model, doc_type
# Example usage
legal_documents = [
    "This Agreement is made between Party A and Party B...",
    "United States Patent 1234567: A novel method for...",
    "IN THE SUPREME COURT OF THE UNITED STATES: The opinion of the court was delivered by Justice...",
    "LEGAL MEMORANDUM: Regarding the application of statute XYZ to..."
]
for doc in legal_documents:
    summary, session_id, model_used, doc_type = summarize_legal_document(doc)
    print(f"\nDocument Type: {doc_type}")
    print(f"Summary:\n{summary}")
    print(f"Session ID: {session_id}")
    print(f"Model used: {model_used}")
Explanation:
- We use regular expressions to detect the document type.
- We tailor the system message based on the detected type.
- The function now returns the document type along with the summary.
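One limitation of first-match detection is that a single early keyword decides the type: a court opinion that mentions an "agreement" is classified as a contract. A possible variant, sketched below, counts keyword hits per type and picks the highest count. This is an illustrative alternative, not the guide's method; `KEYWORDS`, `score_document_types`, and `detect_document_type_scored` are hypothetical names, and the keyword lists are copied from the regular expressions above.

```python
import re

# Keyword lists copied from the regular expressions in detect_document_type
KEYWORDS = {
    "contract": ("agreement", "contract", "party", "parties"),
    "patent": ("patent", "invention", "claim", "claims"),
    "court_opinion": ("court", "opinion", "judge", "ruling", "plaintiff", "defendant"),
}


def score_document_types(document: str) -> dict[str, int]:
    """Count whole-word keyword occurrences for each document type."""
    text = document.lower()
    return {
        doc_type: sum(len(re.findall(rf"\b{re.escape(word)}\b", text)) for word in words)
        for doc_type, words in KEYWORDS.items()
    }


def detect_document_type_scored(document: str) -> str:
    """Pick the type with the most keyword hits; fall back to 'general_legal' on zero hits."""
    scores = score_document_types(document)
    # On a tie, max() keeps the first type in KEYWORDS order (contract, patent, court_opinion)
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "general_legal"


opinion = (
    "The court held that the defendant breached the agreement; "
    "the judge's ruling favored the plaintiff."
)
print(score_document_types(opinion))         # {'contract': 1, 'patent': 0, 'court_opinion': 5}
print(detect_document_type_scored(opinion))  # court_opinion
```

Keyword counting is still a heuristic. For documents where the type matters, a manual override or a classification prompt may be more reliable.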
Step 5: Generating Structured Summaries with Key Information Extraction
To make the summaries more useful, let's generate structured summaries that extract key information:
def generate_structured_summary(document):
    doc_type = detect_document_type(document)
    system_messages = {
        'contract': """
        You are a legal AI assistant specialized in summarizing contracts. Provide a structured summary with the following sections:
        1. **Parties Involved**
        2. **Key Terms**
        3. **Obligations**
        4. **Important Clauses**
        5. **Duration and Termination**
        6. **Governing Law**
        """,
        'patent': """
        You are a legal AI assistant specialized in summarizing patents. Provide a structured summary with the following sections:
        1. **Invention Title**
        2. **Inventor(s)**
        3. **Brief Description**
        4. **Key Claims**
        5. **Novel Aspects**
        6. **Potential Applications**
        """,
        'court_opinion': """
        You are a legal AI assistant specialized in summarizing court opinions. Provide a structured summary with the following sections:
        1. **Case Name**
        2. **Court and Date**
        3. **Key Facts**
        4. **Legal Issues**
        5. **Holdings**
        6. **Reasoning**
        7. **Implications**
        """,
        'general_legal': """
        You are a legal AI assistant specialized in summarizing legal documents. Provide a structured summary with the following sections:
        1. **Document Type**
        2. **Key Parties**
        3. **Main Subject Matter**
        4. **Important Points**
        5. **Legal Implications**
        6. **Action Items (if any)**
        """
    }
    messages = [
        {"role": "system", "content": system_messages[doc_type]},
        {"role": "user", "content": f"Please provide a structured summary of the following legal document:\n\n{document}"}
    ]
    # create() returns a (result, session_id, provider) tuple
    result, session_id, provider = client.chat.completions.create(
        messages=messages,
        model=['openai/gpt-4', 'anthropic/claude-3-5-sonnet-20240620']
    )
    return result.content, session_id, provider.model, doc_type
# Example usage
legal_documents = [
    "This Agreement is made between Party A and Party B...",
    "United States Patent 1234567: A novel method for...",
    "IN THE SUPREME COURT OF THE UNITED STATES: The opinion of the court was delivered by Justice...",
    "LEGAL MEMORANDUM: Regarding the application of statute XYZ to..."
]
for doc in legal_documents:
    summary, session_id, model_used, doc_type = generate_structured_summary(doc)
    print(f"\nDocument Type: {doc_type}")
    print(f"Structured Summary:\n{summary}")
    print(f"Session ID: {session_id}")
    print(f"Model used: {model_used}")
Explanation:
- The generate_structured_summary function creates a structured summary based on the document type.
- We use multi-line system prompts that request numbered sections with bold headings for clarity.
- This approach helps legal professionals quickly grasp essential points.
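Because the prompts ask for numbered, bold section headings, the returned text can be split into a dictionary for downstream use, and missing sections can be flagged. The sketch below assumes the model follows the "1. **Heading**" layout, which is not guaranteed. `parse_sections` and `missing_sections` are hypothetical helpers, and the sample summary is invented for illustration.

```python
import re

# Matches lines such as "1. **Key Terms**" or "**Key Terms:** text", with optional numbering
HEADING = re.compile(r"^\s*(?:\d+\.\s*)?\*\*(.+?)\*\*\s*:?\s*(.*)$")


def parse_sections(summary: str) -> dict[str, str]:
    """Split a structured summary into {heading: body text}."""
    sections: dict[str, list[str]] = {}
    current: str | None = None
    for line in summary.splitlines():
        match = HEADING.match(line)
        if match:
            # Any line that starts with bold text is treated as a new heading
            current = match.group(1).strip().rstrip(":")
            first = match.group(2).strip()
            sections[current] = [first] if first else []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {title: " ".join(body) for title, body in sections.items()}


def missing_sections(summary: str, expected: list[str]) -> list[str]:
    """Return expected headings that are absent or have no body text."""
    found = parse_sections(summary)
    return [title for title in expected if not found.get(title)]


# Invented sample output, for illustration only
sample = """1. **Parties Involved**: Party A and Party B
2. **Key Terms**
Payment is due within 30 days.
Late fees apply.
3. **Governing Law**
"""
print(parse_sections(sample)["Key Terms"])  # Payment is due within 30 days. Late fees apply.
print(missing_sections(sample, ["Parties Involved", "Key Terms", "Obligations", "Governing Law"]))
# ['Obligations', 'Governing Law']
```

A check like `missing_sections` could serve as a first, purely structural validation step. It says nothing about whether the content of each section is accurate.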
Conclusion
In this guide, we've built a versatile legal document summarization system using Not Diamond. This system can:
- Handle various types of legal documents (contracts, patents, court opinions, and general legal documents).
- Detect document types automatically.
- Generate structured summaries with key information extraction.
By leveraging Not Diamond's intelligent routing capabilities, this summarization system provides high-quality, tailored summaries while optimizing for both performance and cost. As you continue to develop and refine your legal document summarization system, consider expanding on this foundation to include advanced features such as:
- Integration with document management systems.
- Summarization quality assessment and validation.
- Comparison and analysis of multiple related documents.
- Extraction and indexing of specific clauses or terms across a large corpus.
Having followed this guide, you now have a working foundation for enhancing legal operations within your organization through efficient summarization of complex legal documents.
If you'd like to have the model read from long documents and files, check out our guide on Retrieval Augmented Generation (RAG) workflows.