We regularly work with substantial volumes of real-time and historical data, with individual tables comprising hundreds of billions of records.
Our in-house massively parallel processing capabilities and a team of programming specialists ensure that we can conduct large-scale data analytics efficiently and effectively. Our specialized expertise and secure computing facilities allow us to maintain, update, and manage the public and private data used to support expert consulting and testimony.
These resources allow us to ensure ongoing data integrity and to navigate the evolving data needs that arise across the many stages, and potentially long time horizons, of litigation, regulatory, and investigation matters. Our cutting-edge information-retrieval and management tools minimize business disruption and create efficiencies for our clients.
Client Data Production
Clients frequently rely on us to compile large datasets from disparate sources and incompatible formats. We also manage outside vendors for data entry, extraction, and reconciliation to minimize the cost of constructing these datasets.
We help counsel manage the discovery and data production process, and we work with clients to thoughtfully and efficiently extract information, both in anticipation of the analytical needs of subsequent phases of work and in response to direct requests from regulators or litigants.
- Cigna’s Acquisition of Express Scripts. Supported economic analysis and data production, and addressed issues involving large healthcare claims data analytics, patient record linkage, and de-identification.
- Analyzed overdraft fee practices in consumer financial services class actions, including individual account-level transaction data for millions of customer accounts.
- Collected and processed millions of data records for day-ahead, day-of, and real-time California energy markets. Our work entailed analyzing detailed and complex bid, price, and settlement data for all participants in each market.
- Analyzed large databases containing transaction-by-transaction and quote-by-quote data on all equity trades and quotes in U.S. equity markets over a five-year period. Analyzed equity audit trails from National Association of Securities Dealers files containing information on every quote and transaction by individual market makers.
- Anderson News LLC et al. v. American Media Inc. et al. Analyzed terabytes of delivery and sales data for more than 120,000 individual retail outlets and thousands of magazine titles.
Secure Analytics Infrastructure
The rapid growth in the volume of data collected and generated by companies across nearly every industry has created new challenges and opportunities for analyzing data at scale. Cornerstone Research has invested heavily in secure, on-premises analytics infrastructure, including sophisticated, high-performance, high-throughput hardware and software. We are also experienced in leveraging cloud computing capabilities for surge storage or compute capacity.
- Deployed server infrastructure interfacing with various cryptocurrency networks to collect and analyze hundreds of gigabytes of real-time and historical ledger data. Ingested and analyzed a table with 300 billion rows of order book data.
- Gathered public data using cloud infrastructure by leveraging our experience with Amazon Web Services and Microsoft Azure.
- Completed regular SOC 2 Type I and Type II audits covering security, availability, processing integrity, confidentiality, and privacy.
Featured Matter
Cornerstone Research’s Data Science Center assisted in investigating plaintiffs’ claims in an alleged market manipulation class action. Working with more than 200 TB of high-frequency trading data produced in the matter, the Data Science Center determined the best protocol for accessing these large and complex datasets and helped identify the relevant data subsets to be used in the analyses.
Conducting these analyses required understanding the characteristics of the databases, how the different tables were connected to one another, and how the data structure evolved over time. The Data Science Center identified and collected the information the team needed to build this understanding.
Under the direction of the testifying expert, the team helped develop complex code that drew on this understanding of the large databases to conduct multiple analyses.