5 Questions with Jonah Berger: Using Artificial Intelligence and Machine Learning in Litigation

5 Questions is a periodic feature produced by Cornerstone Research, which asks our affiliated experts, senior advisors, and professionals to answer five questions.

We interview Professor Jonah Berger, of the Wharton School, University of Pennsylvania, to gain his insights on the role of artificial intelligence (AI) and machine learning (ML) in litigation. Professor Berger discusses key tools that experts can use to bring rigorous, systematic analyses to large, unstructured, and previously difficult-to-analyze datasets.

How can you use AI and ML in litigation?

AI and ML tools apply to a wide variety of cases and are particularly powerful for analyzing large volumes of content, including text, images, and video. They can systematically and rigorously examine these types of content to answer specific questions that arise in litigation.

For example, in a case alleging false marketing claims, AI and ML can shed light on questions such as:

  • What types of claims was a company making?
  • How was it making those claims (e.g., in text, in images)?
  • How frequently did it make the claims?
  • How did the claims compare to the marketing claims other companies in the same industry were making?
  • How did consumers react to the claims?
  • Is there a relationship between the company’s claims and consumer sentiment or consumer purchasing decisions?

Overall, AI and ML are exciting areas for expert work that can help bring rigorous, systematic methods to large, unstructured datasets that previously were difficult to analyze.

Can you offer specific examples of AI and ML analyses that you can use to try to answer these questions?

Several AI and ML analyses can address these questions, ranging from the simple to the more complex. A few examples are:

  • Sentiment analysis: An intuitive way to assess how consumers reacted to a company’s marketing materials, or to other aspects of the company (e.g., the introduction of new products), is sentiment analysis. ML algorithms can evaluate the sentiment of text or images in documents like social media posts or customer reviews, classifying them as positive, negative, or neutral. Sentiment analysis can range from the simple to the complex. In the simple version, a researcher can apply a pre-trained algorithm to text to glean the text’s sentiment. In the more complex version, the researcher can create a custom algorithm by manually categorizing a dataset of documents as positive, negative, or neutral and training a model on those labels. I discuss the process of training an algorithm further below.
  • Word or object detection: Another ML analysis on the simpler side is searching for and identifying certain content within a document or a set of documents. I’ll talk about this analysis in the context of an example, in which there is an allegation that a paint company was deceptively and inaccurately marketing itself as environmentally friendly. An initial question in a case like this is the first question I raised above: “What types of claims was a company making?” That is, how was the company saying that it was environmentally friendly? Did it use the terms “environmentally friendly” or “green”? If so, how frequently?

    Similarly, on the images or video front, how and how often did the company portray itself as environmentally friendly? Did it include environmentally friendly images in its marketing materials (e.g., a recycling symbol, a picture of an outdoor scene)? Simple, computerized searches of the company’s marketing materials can shed light on answers to these questions.

  • Topic modeling: If the researcher does not know the most appropriate words or images for which to search, then the researcher can use topic modeling, a more complex form of AI and ML. Topic modeling takes a set of text, often across many documents, and generates a list of topics that the text discusses. An algorithm reviews the text and generates a researcher-defined number of topics by looking for words that appear together frequently and across many of the documents.

    Returning to the paint company “environmentally friendly” example, if the set of documents the researcher is analyzing is a year’s worth of the company’s marketing materials, then a topic model algorithm may find words associated with different marketing campaigns that the company ran during the year. For example, the algorithm may generate a topic based on words related to a major sale that the company had: words such as “sale,” “discount,” and “bargain.” Similarly, the algorithm may generate a topic based on words related to the environmentally friendly claim—words such as “green,” “environment,” or “recycle.” Once the researcher’s model has generated the topics, it is possible to analyze when, how frequently, and at what levels of intensity the company’s marketing materials discussed each topic at different points in time.

    The researcher can also apply topic modeling to other types of documents, such as consumer social media posts or public press articles, to assess what these materials convey about the company.

  • Classifying content: A researcher can also train an algorithm to classify content to help answer the questions I posed above. Below are two examples of algorithmically trained content classifiers.
    • Object detection: A more complex implementation of the image object detection analysis that I discussed earlier is image object detection based on a trained algorithm. To build an image object detection algorithm, a researcher must first curate a collection of images of two types. The first type of image is one that contains the object of interest, ideally in a variety of settings across images. The second type is one that does not contain the object of interest, but contains similar-looking objects, again, ideally in a variety of settings. By exposing the algorithm to the object of interest and to examples of what is not the object of interest, in a variety of settings, the researcher can train the algorithm to identify the object of interest in a number of visual contexts.

      To return to the paint company example, say the researcher is interested in how frequently the recycling symbol appears in the company’s marketing images. The researcher can collect a large dataset of two types of images: (i) images of the recycling symbol, and (ii) images of other arrows that are not the recycling symbol. The researcher can then use both types of images to train the algorithm to differentiate what is versus what is not the recycling symbol. Once the algorithm is trained, the researcher can then use it to analyze the company’s—or a competitor’s—marketing materials. For example, the researcher can ask how frequently the recycling symbol appears in the company’s marketing materials. Through this process, the researcher can estimate how frequently and when in time the company—or its competitors—used certain objects in their marketing materials.

    • Comparison classifier: A second type of algorithmic content classifier can address another of my initial questions: “How did the company’s marketing claims compare to the marketing claims that other companies in the same industry were making?” This algorithmic classifier evaluates how similar certain content is to other content, not just to an object.

      One can evaluate either text or image content with this type of algorithmic content classifier. To illustrate, let me return to the allegation of an outdoor paint company deceptively and inaccurately marketing itself as an environmentally friendly company. In this example, it could be helpful to empirically test whether the company’s marketing materials did, in fact, characterize the company as “environmentally friendly,” according to some benchmark. The issue is that a benchmark may not be immediately obvious. But with an ML-based algorithmic content classifier, other companies may serve as a viable benchmark. If that is the case, the researcher can collect the marketing materials of other companies or brands that are known to be “environmentally friendly,” as well as other companies or brands that are known not to be “environmentally friendly,” and then use those companies’ and brands’ materials to train an algorithm on what does—and does not—constitute environmentally friendly marketing.

      With the trained algorithm, the researcher can input the at-issue company’s allegedly environmentally friendly marketing materials and see whether the algorithm classifies the materials as more similar to the “environmentally friendly” training materials or to the non-“environmentally friendly” training materials. Algorithmic content classifiers like these can be helpful in systematically evaluating claims that content has a certain characteristic or is similar to other types of content, according to the benchmarks of other companies or brands.
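
The comparison classifier described above can be sketched in a few lines. This is only an illustration under stated assumptions: the interview does not name a specific algorithm, so a naive Bayes text classifier is swapped in here as one common choice, and the benchmark texts, the labels "eco" and "not_eco", and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_docs:
        word_counts.setdefault(label, Counter()).update(tokenize(text))
        doc_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label whose training materials the text most resembles."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so an unseen word/label pair is not fatal.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical benchmark materials from other companies (invented text).
benchmark = [
    ("Our recycled packaging keeps the planet green", "eco"),
    ("Low emission formula made with renewable ingredients", "eco"),
    ("Huge discount sale on every gallon this weekend", "not_eco"),
    ("Bold durable colors at a bargain price", "not_eco"),
]
model = train(benchmark)

# Hypothetical at-issue marketing text.
label = classify("A green formula with recycled ingredients", model)
print(label)
```

In practice the benchmark set would need to be far larger and chosen with care, since the classifier can only be as informative as the companies or brands selected as benchmarks.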

These analyses, ranging from the simple to the sophisticated, illustrate how a researcher can use AI and ML to address questions that arise in litigation.
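
At the simple end of that range, the word search described earlier (whether and how often the company used terms such as “environmentally friendly” or “green”) might look like the sketch below. The search terms, the sample materials, and the function name `count_terms_by_month` are assumptions made for illustration, not details from any actual matter.

```python
import re
from collections import Counter, defaultdict

# Hypothetical search terms for the paint company example.
TERMS = ["environmentally friendly", "green", "recycle"]


def count_terms_by_month(materials, terms=TERMS):
    """Return {month: Counter(term -> occurrences)} for (date, text) records.

    Dates are assumed to be ISO strings such as "2021-03-04".
    """
    # Whole-word, case-insensitive patterns, so "green" does not match "greenhouse".
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = defaultdict(Counter)
    for date, text in materials:
        month = date[:7]  # the "YYYY-MM" prefix of the ISO date
        for term, pattern in patterns.items():
            counts[month][term] += len(pattern.findall(text))
    return counts


# Invented marketing snippets with publication dates.
materials = [
    ("2021-03-04", "Go green with our environmentally friendly primer."),
    ("2021-03-18", "Green paint for a green home."),
    ("2021-04-02", "Spring sale: 20% off all exterior paint."),
]

monthly = count_terms_by_month(materials)
for month in sorted(monthly):
    print(month, dict(monthly[month]))
```

Grouping by month is one way to see how frequently, and when, a claim appeared; the same counts could be grouped by campaign or by media type instead.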

What data do you need to perform these AI and ML analyses?

Thanks to advances in computing power and methods to analyze unstructured data, really any type of digitized data can be analyzed. For example, a researcher can evaluate digitized versions of:

  • A company’s marketing materials, as well as other companies’ marketing materials. These materials can include digitized scans of traditional media, such as magazine advertisements, mailers, and banners in retail stores, as well as electronic versions of newer media, such as social media posts and banner advertisements on websites.
  • Third-party discussion about the company. These discussions can include social media posts mentioning the company (both the text and the images of the posts), as well as consumer discussions of the company in product reviews or customer management databases. These data are abundant, granular, time-stamped, and contain a lot of information that consumers express about companies and companies’ products. As such, these data are ripe for analyses of consumer perception of and sentiment toward companies and products at different points in time. The time element also allows the researcher to see how word-of-mouth might be driving sentiment about a company.

    In both my academic work and my expert witness work, I have analyzed these “new” forms of data, such as the text and images from X (formerly Twitter) posts.

  • Third-party discussions in public press articles about the company, such as those appearing in newspapers and magazines. For example, in one of my academic research papers, my coauthor and I analyzed 7,000 articles from the New York Times.1 I have also analyzed public press articles in my expert work, which can be particularly useful when conducting a topic modeling analysis about a company or product over time.

In what types of cases have you personally applied these AI and ML analyses?

I have applied these techniques in a number of matters. I have used the techniques to assess allegations of false or deceptive marketing, allegations of defamation, and allegations of harm to brand equity due to changing consumer perceptions. I have also seen other experts use these types of analyses, to varying degrees, in their expert reports.

What other types of cases could potentially benefit from AI and ML analyses?

Really, any case where you would like to analyze content in a systematic, rigorous way can have opportunities for AI and ML. Other types of cases that come to mind are those involving antitrust and finance issues.

In an antitrust matter, for example, a researcher could use ML to help answer questions about how products compete. ML could shed light on questions such as: Do consumers consider the products of Company A and of Company B to be close substitutes? What products do consumers consider to be substitutes for the products of Company A if consumers do not really view Company B’s products as a substitute? For the first question, a researcher could examine how often and in what context consumers mention Company B when they are talking about Company A, or vice versa, in social media posts or on review websites. The researcher could also evaluate public press or analyst reports. Co-occurrences of different companies in these sources can help the researcher glean the extent to which consumers and other third parties believe different products compete.
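
A minimal sketch of the co-occurrence idea above, assuming a small set of consumer posts: it counts how often two companies are mentioned in the same post. The company names are placeholders, the posts are invented, and the simple case-insensitive substring match is an assumption that a real analysis would likely replace with more careful name matching.

```python
from collections import Counter
from itertools import combinations

# Placeholder company names.
COMPANIES = ["Company A", "Company B", "Company C"]


def mentioned(post, companies=COMPANIES):
    """Return the sorted list of companies named in a post."""
    lowered = post.lower()
    return sorted(c for c in companies if c.lower() in lowered)


def co_mention_counts(posts):
    """Count, for each pair of companies, the posts that mention both."""
    pair_counts = Counter()
    for post in posts:
        for pair in combinations(mentioned(post), 2):
            pair_counts[pair] += 1
    return pair_counts


# Invented consumer posts.
posts = [
    "Switched from Company A to Company B and the finish is just as good.",
    "Company A or Company B? Both cover in one coat.",
    "Company C is pricey, but Company A was out of stock.",
    "Loving my new deck color from Company A!",
]

counts = co_mention_counts(posts)
for pair, n in counts.most_common():
    print(pair, n)
```

Raw counts like these only show how often companies appear together; the context of each mention (for example, whether consumers describe switching between products) still has to be examined before drawing conclusions about substitution.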

In a finance case, AI and ML, applied correctly and with attention to the particulars of the case, could help a researcher understand the types of publicly available information accessible to market participants.

The views expressed herein are solely those of the author and do not necessarily represent the views of Cornerstone Research.

Interviewee

Jonah Berger

Associate Professor of Marketing,
The Wharton School,
University of Pennsylvania