The usefulness and outcomes of this LLM-assisted document analysis led the client to adopt these analyses for its own work.
In a high-stakes securities case, Cornerstone Research’s Data Science Center sought to unlock the full potential of document review by integrating cutting-edge large language model (LLM) technology into the workflow. Our team conducted several analyses using a traditional keyword search coupled with manual review, then compared them side by side with LLM-driven methods (semantic search and text retrieval–based querying) to evaluate the process, workflow, and results.
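The key difference between the two approaches is how they handle paraphrased language: keyword search requires an exact phrase, while semantic search ranks by similarity. A minimal sketch, using a toy bag-of-words vectorizer as a hypothetical stand-in for a production embedding model (the filing snippets and threshold are illustrative, not from the engagement):

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use a trained
    # embedding model rather than raw token counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def keyword_search(query_terms, docs):
    # Traditional approach: a document matches only if it contains an exact term.
    return [d for d in docs if any(t in d.lower() for t in query_terms)]

def semantic_search(query, docs, threshold=0.3):
    # Semantic approach: rank by similarity, so paraphrased wording can still match.
    q = vectorize(query)
    return [d for d in docs if cosine(q, vectorize(d)) >= threshold]
```

Note that a query like “severance benefits provided to the executive” shares no exact phrase with a provision worded as “the employment agreement provides severance benefits,” yet the similarity-based search still surfaces it; this is the wording-variation problem the keyword method struggles with.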
The Challenge
The team was tasked with finding what could have been a needle in a haystack—identifying relevant provisions in over 75,000 SEC filings spanning a thirteen-year period. The goal was to find examples comparable to an employment agreement for the defendant, a daunting task that required precise textual interpretation. The team used the two methods to:
- Identify At-Issue Disclosures: This document analysis task required the LLM to perform textual interpretation when classifying whether disclosures met the established criteria. The LLM-assisted analysis increased the number of relevant examples by 67% compared to traditional review, with an acceptable false positive rate that could be further reduced through human verification.
- Retrieve Text for At-Issue Provisions: This task challenged the traditional keyword search method because the at-issue provisions appeared in varied wording configurations, with unrelated and frequently misidentified information in close proximity to the needed text. The LLM-driven method navigated this text more effectively, increasing the number of relevant examples by 42%, again with an acceptable false positive rate that could be further reduced through human verification.
- Find Similar Language in SEC Filings: This approach, using the LLM to identify relevant examples, highlights the streamlining benefits of an LLM-driven process. With the traditional keyword search approach, the team ultimately had to review 1,395 candidate paragraphs; the LLM-driven analysis yielded just 212, an 85% reduction in review scope achieved by eliminating false positive matches.
Key Benefits
Across all the analyses, the LLM-assisted analysis delivered a “true positive” rate (i.e., an accurate identification rate) of 95%–99% compared with traditional methods, along with enhanced comprehensiveness. By using the LLM to remove “false positives” from the initial keyword searches, the team could apply broader criteria in the initial search, producing a document count that would normally surpass the constraints of human review, and increased the total number of relevant examples found by 35%–67%.
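The two-stage workflow described above, a deliberately broad keyword search followed by LLM screening of the candidates, can be sketched as follows. The `llm_is_relevant` stub and its criteria are hypothetical stand-ins; a real workflow would send each candidate paragraph, together with the at-issue criteria, to a model and parse its verdict:

```python
def broad_keyword_candidates(docs, terms):
    # Stage 1: deliberately broad keyword match, favoring recall over precision.
    # This produces a candidate set too large for manual review alone.
    return [d for d in docs if any(t in d.lower() for t in terms)]

def llm_is_relevant(paragraph: str) -> bool:
    # Stand-in for an LLM relevance check; here the "criteria" are simply
    # hard-coded substrings for illustration.
    text = paragraph.lower()
    return "employment agreement" in text and "terminat" in text

def two_stage_review(docs, terms):
    # Stage 2: LLM screening removes false positives, shrinking the set
    # that goes to human verification.
    candidates = broad_keyword_candidates(docs, terms)
    return [d for d in candidates if llm_is_relevant(d)]
```

The design point is that the LLM stage sits between the broad search and the human reviewers: because false positives are filtered automatically, the initial search can be far more inclusive than manual-review capacity would otherwise allow.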
Impact and Adoption
While the Data Science Center initially embarked on this project as an internal experiment conducted in parallel with the consulting team’s work (with our client’s awareness and consent), the usefulness and outcomes of the analyses led to their adoption for the client’s work.