The EU AI Act: Using AI to Analyze the Public Response
Introduction
In December 2023, the European Council and European Parliament reached a provisional agreement on a new set of harmonized rules on artificial intelligence—the AI Act. The AI Act is a regulation that aims to ensure safety and adherence to fundamental rights and EU values, while maintaining the objective of encouraging innovation and investment in AI.
The goals of the AI Act are potentially conflicting. Indeed, it is well established that regulation that adds cost and uncertainty may interfere with the competitive process, for example, by reducing R&D incentives. Pursuing conflicting goals is neither rare nor necessarily a bad thing, as long as the regulator and the public are aware of the trade-offs involved.
Any important piece of new legislation is bound to spark heated public debate about the balance between those goals. This article aims to provide high-level insights into the public response to the AI Act and the general sentiment toward AI. To this end, the article analyzes data from the social media platform X (formerly Twitter), which carries public commentary from business leaders, policymakers, and the general public.
Data from social media platforms come in the form of free text or “natural language.” In that sense, they differ from structured data, organized in tables with named fields and values, that economists would collate and then fit econometric models to. Free text analysis requires a natural language processing (NLP) approach.
NLP allows for the quantitative analysis of text. Although NLP has existed as a field of study for several decades, it has seen rapid development over the past ten years. The most recent set of large language models (LLMs)—such as GPT-4 from OpenAI1 or LLaMA from Meta2—are examples of NLP models that generate highly accurate text in response to user prompts, often matching or surpassing the quality of human responses. It is due to this performance that such models are now interchangeably referred to as “AI models.”
Beyond generating text, NLP models are also very effective at extracting information from text—for example, identifying a text’s main topics or classifying text into different categories. While many traditional NLP approaches would be suitable for this exercise, the analysis in this article uses an LLM to analyze the X data. Modern LLMs provide very accurate extraction results and are much better suited to noisy, informal contexts such as social media data. Using this model, the analysis unpacks what people are talking about and how they feel about it. Perhaps unexpectedly, reactions to the AI Act are mixed, and some topics (such as innovation) are more polarizing than others.
What does the public response demonstrate?
The AI Act was first proposed in April 2021. Since then, there have been around 135,000 posts on the topic,3 with activity peaking in June 2023, when the European Parliament adopted a “compromise” text, and in December 2023, when the Parliament and Council reached a provisional agreement on the Act.
The authors of the analyzed posts include policymakers, industry participants and observers, members of the media, entrepreneurs, and members of the general public. The posts include both factual observations and opinions. For example, a post may simply note a particular milestone in the adoption process, express a concern about a particular aspect of the Act, or celebrate its expected benefits.
Manual human review of posts can provide valuable insights—especially when extended to a large sample. However, this approach is not easily replicable and does not scale well: asking a human to label each post and identify key factors is both error-prone and time-consuming. Automating this work using LLMs can unpack the content at scale with a precisely defined methodology. This is important, for example, when submitting this type of analysis as evidence in legal proceedings.
Deploying LLMs securely, effectively, and efficiently at scale is far from trivial. While chatbots such as ChatGPT provide a common, accessible interface for public users, running such models securely at a large scale to conduct robust data analysis requires significant specialized infrastructure and data science knowledge.
Selecting and deploying an appropriate LLM requires good judgment and knowledge of the model landscape. Several different LLMs are made available by corporations and researchers and differ in terms of their quality for specific use cases, operational performance, security, and financial cost.
The analysis in this article uses a private, self-hosted model, hosted and deployed by the Cornerstone Research Data Science Center.4 Cornerstone Research’s in-house data processing capabilities include running state-of-the-art graphics processing units (GPUs), sophisticated hardware that provides the processing power and memory needed to run LLMs efficiently. This infrastructure—built to address data confidentiality considerations that may arise when relying on third-party hosted models—enables the practical and secure analysis of text data at scale.
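As a rough illustration of what self-hosting involves—with the model name, library, and settings below as assumptions for exposition, since the article does not specify the actual stack—an open-weight model can be loaded onto local GPUs with the Hugging Face transformers library:

```python
# Illustrative sketch only: the model and settings here are assumptions,
# not the infrastructure described in the article.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # any open-weight chat model would do
    device_map="auto",  # shards the model across available GPUs (needs accelerate)
)

output = generator(
    "Summarize the main goals of the EU AI Act in one sentence.",
    max_new_tokens=100,
    do_sample=False,  # greedy decoding keeps outputs reproducible
)
print(output[0]["generated_text"])
```

In a production pipeline, a dedicated serving framework built for throughput would typically replace this direct call, but the principle—the model weights and the data never leave the analyst’s own hardware—is the same.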
A wide range of approaches can be used with an LLM to analyze free text and, more broadly, to understand how the public has been weighing the policy goals of the AI Act. This article follows a general approach applicable to many types of text analysis. It involved the following steps (a sketch of the labeling step appears after the list):
- Downloaded the full set of posts since April 2021 that contained either the text string “AI Act” or the hashtag “#AIAct”;
- Used a privately hosted LLM deployed to Cornerstone Research’s in-house secure data processing server;
- Created a query (“prompt”) to provide the model with examples of several posts and asked the model to return a structured response for each post containing (i) the main topics of the post; and (ii) a classification of whether the sentiment of the post is positive, neutral, or negative.
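As a concrete sketch of the labeling step, the following assumes the self-hosted model is exposed through an OpenAI-compatible chat endpoint; the URL, model identifier, and prompt wording are illustrative assumptions rather than the exact ones used in the analysis:

```python
# Minimal sketch of the labeling step. The endpoint URL, model name,
# and prompt wording are illustrative assumptions.
import json
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical self-hosted endpoint

PROMPT_TEMPLATE = """For the social media post below, return a JSON object with:
- "topics": a list of the post's main topics
- "sentiment": one of "positive", "neutral", or "negative"

Post: {post}
JSON:"""

def label_post(post_text: str) -> dict:
    """Ask the model for a structured topic/sentiment label for one post."""
    payload = {
        "model": "local-llm",  # placeholder model identifier
        "messages": [
            {"role": "user", "content": PROMPT_TEMPLATE.format(post=post_text)}
        ],
        "temperature": 0,  # deterministic labels aid replicability
    }
    reply = requests.post(API_URL, json=payload, timeout=60).json()
    return json.loads(reply["choices"][0]["message"]["content"])

label = label_post("The AI Act will stifle innovation in European startups.")
# e.g. {"topics": ["innovation", "startups"], "sentiment": "negative"}
```

Setting the temperature to zero keeps the labels deterministic, which matters when the methodology needs to be precisely defined and replicable, as in legal proceedings.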
While this process is technically sophisticated, it results in a simple output: a dataset containing each post, and labels identifying its sentiment and its identified topics. This dataset provides a clear, systematic labeling of each post and can be easily aggregated and analyzed to identify key trends.
This sort of analysis is designed to provide a high-level overview of trends. In doing so, the analysis assumes—reasonably—that what people are posting is a good reflection of their sentiment about the various goals of the regulation. Any more detailed analysis would need to consider issues such as how credibly a post reflects the stated values.
How does sentiment evolve?
Using sentiment classification, one can observe how the overall sentiment towards the AI Act has evolved.
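To illustrate the kind of aggregation behind this view, a minimal pandas sketch over a hypothetical labels dataframe might look as follows (the rows are toy data, not the actual dataset):

```python
# Sketch of the sentiment-over-time aggregation, assuming one row per labeled post.
import pandas as pd

labels = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-03", "2023-06-14", "2023-12-09"]),
    "topics": [["compliance"], ["innovation", "privacy"], ["innovation"]],
    "sentiment": ["neutral", "negative", "positive"],  # toy example rows
})

shares = (
    labels
    .groupby([labels["date"].dt.to_period("Q"), "sentiment"])
    .size()
    .unstack(fill_value=0)
    .pipe(lambda counts: counts.div(counts.sum(axis=1), axis=0))  # row shares
)
print(shares)  # share of positive/neutral/negative posts per quarter
```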
A few interesting observations emerge at this high level:
- The majority of posts are neutral and do not express an opinion one way or the other. This is consistent with much of the activity around such technocratic subjects being factual reporting of developments—for example, posts may simply note that a particular milestone has been reached or link to news articles about particular developments.5
- The strength of opinions has increased over time as the legislation has developed. Whereas the vast majority of posts were neutral during the early stages of the AI Act’s development, more recently between a third and a half of all posts have expressed a non-neutral position.
- Over the analyzed period, negative posts, on average, outweigh positive posts. This may be due to a negativity bias on the internet—those feeling somewhat satisfied with the AI Act would likely not post about it, whereas those feeling negative about it might feel the need to write something. Over time, negativity within the sample of posts has broadly been increasing. In Q4 2022, 11.8% of posts were negative. By Q4 2023, this increased to 26.3%. On the flip side, positivity also increased. In Q4 2022, the percentage of positive posts was 2.7%, which rose to 12.8% by the end of 2023. These broad trends are explored at a more granular level below using further insights from the LLM modeling.
While the predominance of neutrality and the increasing strength of opinions over time are consistent with prior expectations, the predominance of negativity is surprising. Indeed, one would expect legislation that aims to provide safety and protect rights to be perceived favorably by the commentating public. One explanation is that AI developers and business community members—who are more likely to comment than the average member of the general public—may feel that the AI Act represents regulatory overreach. While this issue is outside the scope of this analysis, it raises the questions of how sentiment is split across different categories of individuals and in what ways exactly negative or positive sentiment is being expressed.
Variation exists along the geographical dimension, too. Intuitively, one would expect countries perceived to have stronger political leanings toward free markets and innovation to show more negativity. UK and German commentators show more negative sentiment (22%–23%) toward the AI Act than their counterparts in Belgium (15%). However, this does not hold universally, with Sweden showing the highest level of negativity (28%). Similarly, across the Atlantic, U.S. commentators show a level of pessimism similar to the UK and Germany at 22%, while Canadian posts show substantially more negativity at 27%.6
Note: The chart only includes geographies with more than 50 posts. Due to data source restrictions, not all posts have an identified geography.
What are people posting about?
Many possible technical approaches can be used to summarize the discussion content. Indeed, with modern AI models, one could simply ask, “Could you please summarize the discussion?” While interesting, this leaves analytical ambiguity given the opaque process by which the summary is obtained—the usual “black box” critique of AI systems. Alternatively, researchers can use an NLP model to generate labels at the individual-post level and then analyze trends across those labels.
The process of generating labels summarizing content is often called “topic modeling.” Topic modeling is an NLP task that aims to identify the main themes or topics within a body of text. It has existed as a research field for many years; however, recent developments in LLMs allow this type of exercise to be undertaken with much more accuracy.
Applying this LLM approach identifies a large number of topics in the dataset. Since NLP models are agnostic as to what should be considered a relevant topic, the range of results is wide—consistent with the diversity of the topics being discussed. The analysis identified thousands of topics, some very infrequent (e.g., “autocorrect,” “cartel screening,” and “theology”) and some very frequent (e.g., “human rights,” “surveillance,” and “innovation”).
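Tallying topic frequencies from such per-post label lists is then a simple counting exercise; a minimal sketch, with illustrative topic strings:

```python
# Sketch of tallying topic frequencies across posts, assuming each
# labeled post carries a list of topic strings from the model.
from collections import Counter

post_topics = [
    ["innovation", "compliance"],
    ["human rights", "surveillance"],
    ["innovation", "foundation models"],
]  # toy example of per-post topic labels

topic_counts = Counter(topic for topics in post_topics for topic in topics)
print(topic_counts.most_common(5))
# e.g. [('innovation', 2), ('compliance', 1), ('human rights', 1), ...]
```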
The main topics that emerge—shown in the above word cloud7—are broadly consistent with what an informed observer would expect. Several conceptual topics feature prominently: “innovation” was the most commonly discussed topic, followed by “transparency,” “compliance,” “human rights,” and “fundamental rights.” Other interesting topics include the General Data Protection Regulation or GDPR (with commentators drawing comparisons to that legislation), ChatGPT (arguably the model that pushed LLMs into the mainstream media), foundation models, facial recognition, surveillance, ethics, open source, predictive policing, discrimination, safety, competition, privacy, and copyright.
How is sentiment on these topics changing?
For any of these topics, the dataset allows for a closer inspection of what the sentiment is and how it evolves. The chart below shows the evolution of sentiment for five topics relevant to multiple policy areas: competition, innovation, privacy, rights, and transparency.
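A sketch of how such a per-topic breakdown can be derived, continuing the hypothetical labels dataframe from the earlier sketch (the "topics" column holds the model’s list of topics for each post):

```python
# Sketch of the per-topic sentiment breakdown, continuing the hypothetical
# `labels` dataframe; assumes a "topics" column holding a list per post.
focus_topics = ["competition", "innovation", "privacy", "rights", "transparency"]

per_topic = labels.explode("topics")  # one row per (post, topic) pair
focus = per_topic[per_topic["topics"].isin(focus_topics)]

topic_sentiment = (
    focus
    .groupby([focus["date"].dt.to_period("Q"), "topics", "sentiment"])
    .size()
    .unstack(fill_value=0)
)
print(topic_sentiment)  # counts per quarter, topic, and sentiment class
```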
For most of the analysis period, there is minimal positive commentary on the impact of the Act on competition, and negative sentiment predominates—suggesting a general belief that AI regulation could have an adverse impact on competition. In Q4 2023, however, positive sentiment overwhelmingly dominates. Manual inspection finds that these are primarily posts celebrating the completed negotiations of the AI Act in December 2023.
Somewhat surprisingly, on the topic of innovation, sentiment is more positive than negative. Many advocates of the AI Act have maintained that clearly defined rules in this area may, in fact, produce more—and better—innovation, and this is reflected in the frequency of positive innovation-related posts.
In general, one would expect more positivity and less negativity around the areas in which the AI Act aims to make improvements—rights, privacy, and transparency. This does not seem to be the case. While more positivity on rights is seen in 2023 compared with 2022, the amount of negative sentiment is substantial and sustained. The same is true of the topics of privacy and transparency.
What are the implications? If feedback from internet commentary is to be believed, then this suggests that lawmakers still have some work to do to convince the public that the AI Act will indeed provide the protections that it aims to, and that the intended innovation effects will be beneficial.
Conclusion: gaining insights through intelligent automated approaches
Text is a very rich source of information—by harnessing it effectively, researchers can derive insights systematically and at scale from a variety of sources. Using the latest LLMs allows the analysis of topical issues such as new legislation, but the same techniques can equally be used to review and summarize evidence in legal proceedings, to enrich the data used in economic analysis, or to introduce efficiencies in how economists and lawyers work.
1 Achiam, J., et al. “GPT-4 Technical Report.” arXiv preprint arXiv:2303.08774 (2023). Available at https://doi.org/10.48550/arXiv.2303.08774
2 Touvron, Hugo, et al. “LLaMA: Open and Efficient Foundation Language Models.” arXiv preprint arXiv:2302.13971 (2023). Available at https://arxiv.org/abs/2302.13971
3 As identified by all posts (formerly “tweets”) with the text “AI Act” or those with the hashtag #AIAct.
4 The use of self-hosted models limits the data confidentiality considerations that may arise when relying on third-party cloud-hosted models, as is currently the case with alternative LLM offerings.
5 Manual checks of randomly selected classifications were conducted to verify that posts were correctly classified as neutral, as opposed to simply being difficult to classify.
6 As expected, those countries with the most negative sentiment also have the least positive sentiment, so the inverse holds true.
7 The size of each word indicates the frequency of the topic. This excludes “reposts,” i.e., instances where others share a post. Including these does not dramatically alter the results but places a lot of weight on some individual posts, potentially skewing the interpretation. This also excludes the topics of “AI,” “the AI Act,” and “regulation,” since by construction all posts are about these topics.
Acknowledgements: Gregor Langus provided valuable insights and ideas in the preparation of this article. Sam Tauke of the Data Science Center provided the infrastructure needed to conduct the analysis and assisted in preparing the charts.
The views expressed herein are solely those of the author and do not necessarily represent the views of Cornerstone Research.