
How AI and Machine Learning Are Impacting the Litigation Landscape

Mike DeCesaris and Sachin Sancheti detail how expert witnesses are incorporating artificial intelligence and machine learning into their testimony in a variety of civil cases.

Artificial intelligence has long been present in everyday activities, from a simple Google search to keeping a car centered in its lane on the highway. The public unveiling of ChatGPT in late 2022, however, brought the power of AI closer to home by making it accessible to anyone with a web browser. In the legal industry, we are seeing the use of AI and machine learning ramp up in litigation, especially in expert witness preparation and testimony.

Supporting expert witnesses has always required leading-edge analytical tools and data science techniques, and AI and machine learning are increasingly important tools in experts’ arsenals. The concept of technology that can “think” and make decisions, accomplishing tasks more quickly and with better results than humans, conjures thoughts of a “Jetsons-like” world run by robots. However, unlike the flying cars and all-purpose robot attendants of the 1960s Jetsons cartoons, the “futuristic” ideas about the impact of AI have proven not far off from a rapidly approaching reality. In fact, as older, rules-based AI has evolved into machine learning (ML), in which computers learn to predict outcomes accurately from patterns found in massive data sets, the legal industry has found that AI can do far more than many imagined.

In the world of litigation, law firms and economic and financial consulting firms have understood the power of AI and ML for years. Expert work in litigation matters formerly relied on heavily manual processes; AI is ideally suited to supporting, qualifying, and substantiating that work, improving both its efficiency and the quality of the data presented in testimony. Moreover, over the last several years, both plaintiff-side and defense-side experts have used AI and ML directly in their testimony.

Somewhat ironically, humans are at least partially responsible for driving the increased use of AI and ML in expert work, as we produce ever-growing volumes of user-generated content. Consumer reviews and social media posts, for example, are becoming increasingly relevant in regulatory and litigation matters, including consumer fraud and product liability cases. The volume of this content can be overwhelming, so one familiar approach uses keyword searches to identify a more manageable subset of data for review. This approach is limiting, however, as it often returns results that are irrelevant to the case while omitting relevant results that use novel language. By contrast, ML-based approaches can consider the entire text, using context and syntax to identify the linguistic elements that most accurately indicate relevance.
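The contrast between keyword filtering and an ML-based relevance model can be illustrated with a toy example. The sketch below is a minimal, assumption-laden illustration, not the authors’ method: it uses a hand-rolled multinomial naive Bayes classifier, a bag-of-words model that, unlike the context-aware models described above, ignores word order and syntax. The function and class names, reviews, labels, and keywords are all hypothetical. A keyword filter for “overheating” misses a review that describes the problem in other words, while the classifier, which weighs every token against hand-labeled examples, flags it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_filter(docs, keywords):
    """Keep only documents that contain at least one exact keyword token."""
    wanted = {k.lower() for k in keywords}
    return [d for d in docs if wanted & set(tokenize(d))]


class NaiveBayesRelevance:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing.

    Every token in a document contributes evidence, not just a fixed keyword
    list. Word order and syntax are ignored, unlike context-aware models.
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, label):
        # Log prior plus the smoothed log likelihood of each token under this class.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.totals[label] + len(self.vocab)
        for tok in tokenize(doc):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest log posterior score.
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical hand-labeled reviews for an alleged product-overheating matter.
train = [
    ("the battery got hot and burned my hand", "relevant"),
    ("charger overheated and started smoking", "relevant"),
    ("device was scorching after an hour of charging", "relevant"),
    ("love the color and the screen is bright", "irrelevant"),
    ("shipping was fast and the box was fine", "irrelevant"),
    ("great screen, nice color, fast shipping", "irrelevant"),
]
clf = NaiveBayesRelevance().fit([d for d, _ in train], [y for _, y in train])

new_reviews = [
    "it gets scorching hot while charging",
    "shipping was fast and the color is great",
]
print(keyword_filter(new_reviews, ["overheat", "overheating"]))  # []
print([clf.predict(r) for r in new_reviews])  # ['relevant', 'irrelevant']
```

In practice, a model like this would be trained on far more labeled examples and validated against held-out documents before its output informed any expert analysis.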

To see this approach in action, consider litigation involving alleged marketing misrepresentations or defamatory statements. Such cases require an examination of the at-issue content, and the most robust analyses are systematic and objective, which makes them well suited to the noncontroversial training data and impartial models that are hallmarks of state-of-the-art AI and ML approaches.

AI and ML have also proven to be valuable tools for experts across a broad spectrum of consumer fraud and product liability matters. While some applications may be obvious, humans have the creativity to adapt a solution to other use cases. Such novel uses include:

Domain-specific sentiment analysis – Publicly available sentiment models perform well on many problems but often fail on tasks that feature domain-specific linguistic structures. Such failures might arise when a model is tasked with measuring the sentiment surrounding an entity in an industry whose discussion features novel or counterintuitive language. Consider a defamation suit filed by a fitness influencer. Terms like “confusion,” “resistance,” and “to failure” generally have negative connotations, but in the fitness space they are often used to describe a successful workout. Likewise, slang terms like “guns” and “shredded” mean something entirely different in the fitness context than in conventional use. In these cases, a general-purpose sentiment model may mischaracterize or overlook such language, while a domain-specific sentiment model can provide a more accurate assessment of the sentiment contained in allegedly defamatory statements. Building such a model could involve gathering hundreds of thousands of user-generated reviews of industry products and then training a context-aware language model to predict each review’s score from its text. The resulting custom model can quantify the polarity of the discussion surrounding the influencer, which can then be tracked over time and around critical events.

Assessing marketing influence on social media – To assess allegations that a company steered an online discussion through social media marketing, AI and ML can compare the company’s posts to those generated by unaffiliated users (earned media). Language models and text similarity metrics can quantitatively and objectively assess whether earned media immediately following the company’s posts was more similar to the company’s posts than earned media either preceding the posts or selected at random.

Image object detection – To assess how often client logos and products appear in images posted to social media, a custom object detection model can be trained and then applied to a random sample of millions of social media images.

Public press topic modeling – To quantify the extent and timing of the public awareness of a marketing claim at issue, AI and ML can be applied to articles published in media outlets. This approach helps isolate the at-issue topic from other closely related but distinct topics. Such distinctions can then facilitate an analysis that is more narrowly focused on the claim at hand.

Multimedia characterization – Where there are allegations of product misrepresentation or improper marketing, AI and ML can characterize the nature of a company’s social media presence. A model trained on text and image content from unaffiliated but topically relevant brands can learn to distinguish content along the lines of broad brand identities (e.g., healthy vs. unhealthy, eco-friendly vs. climate-damaging). Applying such a model to at-issue social media content can quantify the extent to which that content conveys each of these brand features.

The nature of allegedly defamatory statements – Even in the presence of clearly negative statements, defamation is notoriously difficult to prove. Defendants may claim that statements were expressed not as fact but as opinion, possibility, entertainment, or satire. By leveraging datasets and models that measure the degree of certainty expressed in natural language, experts can objectively assess the degree to which reasonable consumers may interpret the information as fact.

Product liability – One growing area of research concerns the quantification and isolation of specific entities referenced in a broader text. Product liability cases, for instance, may examine user-generated product reviews to identify the importance of, and sentiment surrounding, at-issue product features. Rather than assessing each review as a whole, aspect-based sentiment analysis focuses only on the at-issue features, allowing for the extraction of strong indicators from nuanced or mixed reviews.

Class certification – A successful class certification challenge will demonstrate that the circumstances of putative class members were sufficiently varied to require individual treatment. Any of the methods discussed above, alone or in combination, can quantify the heterogeneity of the at-issue materials. For example, an expert in a case concerning marketing misrepresentations might train a classifier to distinguish at-issue marketing content from content not at issue, model the topics targeted throughout multiple distinct marketing campaigns, and summarize images to demonstrate that they appeal differently to different consumers.
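The earned-media comparison described under “Assessing marketing influence on social media” can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than a description of any expert’s actual workflow: bag-of-words cosine similarity stands in for the language models mentioned in the article, similarity is measured against the centroid of the company’s posts, and the helper names (`mean_similarity`, `compare_windows`) and all posts are hypothetical.

```python
import math
import random
import re
from collections import Counter


def bag_of_words(text):
    """Term counts from lowercase word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(posts, company_posts):
    """Hypothetical helper: average cosine similarity of each post to the
    centroid (summed term counts) of the company's posts."""
    centroid = sum((bag_of_words(p) for p in company_posts), Counter())
    return sum(cosine(bag_of_words(p), centroid) for p in posts) / len(posts)


def compare_windows(company_posts, after, before, pool, n_random, seed=0):
    """Hypothetical helper: similarity to the company's posts for earned media
    after the posts, before them, and drawn at random from a wider pool."""
    rng = random.Random(seed)  # seeded so the random baseline is reproducible
    baseline = rng.sample(pool, n_random)
    return {
        "after": mean_similarity(after, company_posts),
        "before": mean_similarity(before, company_posts),
        "random": mean_similarity(baseline, company_posts),
    }


# Hypothetical posts for illustration only.
company = [
    "try our new protein shake with zero sugar",
    "zero sugar protein shake now in vanilla",
]
after = [
    "just tried the new zero sugar protein shake",
    "vanilla protein shake with zero sugar is great",
]
before = [
    "anyone have a good post workout snack",
    "need a shake recommendation",
]
pool = [
    "the weather is nice today",
    "watching the game tonight",
    "my dog loves the park",
    "coffee first then work",
]
scores = compare_windows(company, after, before, pool, n_random=2)
print({k: round(v, 3) for k, v in scores.items()})
```

With these toy inputs, earned media after the company’s posts scores highest, the pre-post window scores lower, and the random baseline scores zero because it shares no vocabulary with the company’s posts. A real analysis would use far larger samples and would test whether the differences are statistically meaningful, for example with a permutation test.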

For centuries, the ability of humans to mold available resources to serve their needs has set them apart from other species. We see it in all walks of life, and the above examples demonstrate it in our small corner of the world. We will continue to see it as social media and other user-generated data continue to grow in volume and complexity. In the simplest terms, AI and ML are critical in helping us efficiently search through the “haystack” to find the “needle.” Those who try to find the needle by hand will inevitably be left behind.

This article was originally published by Law.com in March 2023.

The views expressed herein are solely those of the authors and do not necessarily represent the views of Cornerstone Research.

Authors

San Francisco

Mike DeCesaris

Vice President, Data Science Center

New York

Sachin Sancheti

Vice President
