Collections
Thursday, March 27, 2025

March 2025 Updates - Over 400K new Opioid and Juul Labs Documents Added

Collection Updates


Opioid Industry Documents Archive

Teva and Allergan Documents

OIDA staff added over 220,000 documents to the Teva and Allergan Documents. This batch brings the collection to more than 1.7 million documents and includes training materials, marketing communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected through 2025.

Truth Tobacco Industry Documents

Multistate Juul Documents Project

This month, IDL staff added over 200,000 documents produced by the Settling States in the multistate litigation against Juul Labs. When complete, the Juul Labs collection will contain approximately 7 million documents. IDL is working through the files as quickly as possible and will post new documents every month.

North Carolina Juul Labs Collection

18,000 new files have been added to the Juul Labs Collection under the State of North Carolina sub-collection. These materials represent some of the final batches to be processed from the North Carolina settlement agreement.

Explore the new NC Juul Labs Research Guide!

screenshot of the UNC Juul Labs Research Guide

UNC Libraries has created a comprehensive research guide to help you navigate the Juul Labs documents from the North Carolina settlement. This guide provides insight into the origins of the Juul Collection, covering the history of Juul Labs, the NC litigation, and key themes found in the documents. Whether you're researching industry practices, public health impacts, or legal actions, this resource is full of information to get you started.




Education and Research Updates



AI Research Assistant for Fossil Fuel Industry Documents

A team at the Climate Litigation Lab at the University of Oxford Sustainable Law Programme has built a free AI research assistant using our fossil fuel industry documents! The Climate Accountability Research Assistant (CLARA) uses large language models and the latest techniques in information retrieval to interrogate large collections of litigation-relevant historical documents, opening new possibilities for research.

Read more about it in this LinkedIn post.
Screenshot for homepage of CLARA


IDL is now on Bluesky!

Please follow us @ucsf-industrydocs.bsky.social for updates about new documents and events. We also continue to post to LinkedIn, Mastodon, and Twitter/X to stay connected with our research community wherever they are.


New Website Coming Soon

We’re busy updating our IDL website, which will launch later this year with a more modern design, faster performance, and better compatibility with mobile devices. As part of this work, we’re collecting feedback from people who use our website.

If you’re interested in taking part in this user research, please email us at industrydocuments@ucsf.edu. Your feedback will be invaluable in helping us make our website easy to navigate for all our users.

Friday, February 07, 2025

January 2025 Updates - Tobacco, Opioid and Chemical Industry Documents

Collection Updates


Opioid Industry Documents Archive
Teva and Allergan Documents

OIDA staff added 226,880 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 1.3 million documents and includes sales training presentations, marketing communications, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected through 2025.


Truth Tobacco Industry Documents
JUUL Labs Collection

2,800+ new documents were posted to the Juul Labs Collection today! In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL has processed and made available documents subject to public disclosure under Juul Labs’s 2021 settlement with North Carolina. The IDL is pleased to announce that processing of these documents is nearly complete! Since the project began in December 2023, our archivists have released an average of 240,000 documents every month to our public website. This January’s release is significantly smaller: it consists of documents that required more time-consuming and complicated PII redactions, or that posed technical challenges we saved for the end. This small release nonetheless indicates that the majority of the North Carolina Juul Labs documents are now fully available online to our research communities.

In the coming months, the IDL archiving team will work through what is left of the NC Juul documents: files that were originally packaged in large ZIP archives whose structure was disrupted, so that the contents came to the IDL as separate individual records. We have observed that these small files, unfortunately, do not offer much value without the greater context of the original ZIP, so we will work towards reconstructing that original structure and releasing the files accordingly.


New California JUUL Documents Coming Soon
Although we have neared the end of the North Carolina Juul documents, the IDL will soon release additional documents from the California Juul multistate settlement, which was negotiated by the California Department of Justice and six other states in 2023. These forthcoming releases will not be duplicates of the approximately 3 million Juul Labs records already in the IDL but rather are new additions that will further enrich the Juul Labs Collection. Our first release of the new California Juul documents will be coming next month.


Depositions and Trial Transcripts (DATTA)
57 new transcripts of tobacco trial testimony and depositions by Robert Proctor were added.


Chemical Industry Documents Archive: The Forever Pollution Project Collection

In February 2023, five European countries proposed a PFAS "universal restriction" under the EU chemical regulation REACH (Registration, Evaluation, Authorization and Restriction of Chemicals). The ban would include the entire PFAS chemical 'universe', with some derogations until alternatives are developed. In response, hundreds of industry players have been lobbying decision-makers across Europe to undermine and perhaps kill the proposal.

Over the course of a year, a team of 46 journalists in 16 countries investigated the lobbying and disinformation campaign by the PFAS industry and its allies. This cross-border, interdisciplinary investigation known as the Forever Lobbying Project collected over 14,000 unpublished documents on PFAS, constituting the world’s largest collection to date on the topic. The majority originate from 184 freedom of information requests, 66 of which were shared with the group by the EU lobby watchdog, Corporate Europe Observatory.

This unique trove of documents was donated by the Forever Lobbying Project and is now available to the public in our new Forever Pollution Project Collection.


Purdue/Sackler settlement under consideration includes document disclosure requirement:

The proposed $7.4 billion settlement with members of the Sackler family and their company, Purdue Pharma (Purdue), includes a provision for document disclosure, which would require Purdue to make public more than 30 million documents related to Purdue and the Sacklers’ opioid business.

According to the Office of the Massachusetts Attorney General, if the settlement is approved, the documents are “expected to be added to the existing public document repository” (UCSF-JHU Opioid Industry Documents Archive) that already houses millions of documents from multiple industries responsible for the crisis.

UCSF and Johns Hopkins University are pleased that these vitally significant documents are one step closer to being made public. The Opioid Industry Documents Archive provides evidence on how and why this crisis happened, so that this type of tragedy can be prevented from occurring again.

We look forward to having the opportunity to contribute our expertise in public health, digital archives, and information technology to enable timely and free public access to these important documents.


Education & Research Updates

Center to End Corporate Harm Launches at UCSF

We are very excited to announce the new UCSF Center to End Corporate Harm!

Products including fossil fuels, chemicals, alcohol, tobacco, and ultra-processed foods are now responsible for approximately one in three deaths worldwide. In the US, a rise in chronic diseases, including cancer (175%), diabetes (283%), Parkinson’s (133%), and dementias (75%), has led to what scientists say is an “industrial epidemic” of disease.

The Center to End Corporate Harm brings together scientists, researchers, and physicians who study various health-harming industries and, in collaboration with the UCSF Industry Documents Library, are working to identify, analyze, and prevent industry-driven disease and develop strategies to counter the destructive influence of polluters and poisoners.


Could You Be the 2025 UCSF Library Artist in Residence?

The UCSF Library Archives and Special Collections and Makers Lab are accepting proposals for the sixth annual UCSF Library Artist in Residence program. The UCSF Library Artist in Residence award, valued at $8,000, will be given annually to one candidate with a degree in studio arts or a related field or a history of exhibiting artistic work in professional venues. The 2025 residency will begin on July 1, 2025 and end on June 30, 2026.
For more information and the application process, please visit the UCSF Library site.


UC Love Data Week

The UC Love Data Week is a week-long offering of presentations and workshops focused on data access, management, security, sharing, and preservation. All members of the University of California community are welcome to attend.

The IDL will be featured in the Friday, February 14th session at 3pm: Unlocking image, audio, and video data in the Industry Documents Library: a Python based, open source stack for audio transcription, text extraction, sentiment analysis, and topic classification

Thursday, December 19, 2024

Industry Documents Library - 2024 in Review

Season’s Greetings from the UCSF Industry Documents Library!

As 2024 comes to a close, we’d like to share our gratitude for all of you in the IDL community and your ongoing support and connection to our work.
Here are some of the achievements you helped us reach in 2024:

22,459,816 documents now available through IDL!

  • In collaboration with Johns Hopkins University, we continued to acquire and make public millions of documents disclosed in opioid litigation through the UCSF-JHU Opioid Industry Documents Archive (OIDA), including a major new collection of Teva and Allergan materials. There are now over 4 million opioid industry documents available!
  • We launched the Juul Labs Collection in partnership with the University Libraries at the University of North Carolina at Chapel Hill. We’ve added close to 3 million documents to the collection this year and it will continue to expand with additional Juul Labs documents in 2025.
  • We welcomed Emma James and Julie Hillpot to the IDL Team: Emma is our project archivist for the Juul Labs Collection, and Julie is supporting our data annotation and quality control workflows for opioid industry documents.
  • We delivered multiple webinars, workshops, and presentations, including the annual Tobacco and Other Industry Documents Workshop in partnership with the UCSF Center for Tobacco Control Research and Education.
  • We continued to make significant progress on redesigning and rebuilding the IDL website to add new features and make it easier to search. Stay tuned for more news about this next year!
  • We continued our Student Data Science Summer Fellowship in collaboration with the UCSF Library Archives & Special Collections and the Data Science and Open Scholarship team.
  • We added 33 new publications which cite industry documents to our Bibliography, bringing the total number of citations to 1,209!

If you’re able, please consider making a tax-deductible donation to the Industry Documents Library to help us preserve and provide access to the collections for years to come.

From all of us at the IDL, we wish you a peaceful holiday season, and a healthy and hopeful New Year ahead.

Kate, Rachel, Rebecca, Sven, Melissa, J.A., Emma, and Julie

Thursday, December 19, 2024

New Teva and Juul Labs Documents Posted!

Opioid Industry Documents Archive
Teva and Allergan Documents

OIDA staff added 235,705 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to over 1 million documents and includes sales training presentations, marketing communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected through 2025.

Truth Tobacco Industry Documents
Juul Labs Collection

117,000+ new documents were posted to the Juul Labs Collection today. This brings the collection to over 2.9 million documents and includes social media reports, marketing campaigns, product complaint logs, product design materials, and more.

In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.

Thursday, November 21, 2024

November 2024 Updates - New Opioid and JUUL Documents

Opioid Industry Documents Archive - Teva and Allergan Documents


OIDA staff added 259,000+ documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 848,000 documents and includes sales training presentations, marketing communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024 to 2026.


Announcing the OIDA Data Products

Explore our newest resource, OIDA Data Products — tools that can facilitate and inspire research.

We created these datasets to provide access points for data analysis of Opioid Industry Documents. Researchers get a running start on exploring data, benefiting from our work to curate and deduplicate documents, provide a glossary of spreadsheet column names, and more. Users can craft queries online or select a subset of the data for download, allowing them to interact with OIDA data before dedicating time and resources to a full analysis.

“OIDA Data Products reduces some of the barriers to working with OIDA data, helping researchers get a sense of the many gems hidden among OIDA’s millions of documents,” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “Working with data wranglers, statisticians, and developers, we hope these data products will facilitate new research, helping us to better understand the opioid crisis.”

To learn more and access OIDA Data Products, visit https://data.oida-resources.jhu.edu/.


Tobacco Industry Documents Archive - Juul Labs Collection

151,000+ new documents were posted to the Juul Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.

In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.


An update regarding our audio-visual files:

The IDL partners with the excellent Internet Archive to host the audio and video files found in our industry documents archives.

As you may be aware, the Internet Archive has recently faced a series of cyberattacks, prompting them to enhance security measures, strengthen firewalls, and update software. Unfortunately, these challenges have temporarily prevented the upload of multimedia items, impacting our last two document releases (October and November 2024).

We are closely monitoring the situation and maintaining communication with the Internet Archive team. Once uploading can resume, we will begin posting the audio and video files from our latest collection additions.


New Papers and Publications


Thursday, October 31, 2024

October 2024 Updates - New Opioid and JUUL Documents

Collection Updates


Opioid Industry Documents Archive - Teva and Allergan Documents

OIDA staff added 218,267 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 588,000 documents and includes sales training presentations, interviews with prescribers, reports on focus groups, product communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024 to 2026.


Announcing the OIDA Image Collection and How You Can Help!

Screenshot of OIDA Image Collection browse view

We are proud to introduce the OIDA Image Collection, a website created to highlight images within the OIDA documents. Images provide unique entry points to understand a visual narrative of the opioid industry and gain insight into harmful corporate and marketing practices that contributed to the opioid crisis. Researchers can browse, limit their results by filters, and search by keyword. By viewing the source documents, you can see the images in their original context.

The OIDA team used artificial intelligence (AI) to write captions for highlighted images within the OIDA Image Collection but we could use your help! We have generated captions using two different AI models and need to decide which AI-generated caption is better for use in the OIDA Image Collection. Thanks to support from Hugging Face, a platform for collaborating on models and datasets for machine learning, and its Argilla data annotation tool, we have created a handy interface for voting on the quality of image captions. To help us out, you’ll just need to create a free Hugging Face account.

Your image labeling efforts will contribute to an open preference dataset, crucial for "steering" AI models towards generating more useful outputs in specific domains. Please email opioidarchive@jh.edu with any questions.


Tobacco Industry Documents Archive - Juul Labs Collection

117K new documents were posted to the Juul Labs Collection today! This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.

In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.

2019 SRNT slide deck.

Education & Research Updates

World Digital Preservation Day – 7 November 2024


Each year, the Digital Preservation Coalition promotes World Digital Preservation Day, which falls on the first Thursday of November.

In line with this year’s theme of 'Preserving Our Digital Content: Celebrating Communities,' the UC Libraries’ Digital Preservation Working Group (DPWG) is hosting a community-building Open House event which presents an opportunity for everyone to learn more about digital preservation while also sharing their own stories and experiences in this space.

Please join the event online on November 7, 2024, from 11 AM to 12 PM, where you’ll hear digital preservation stories from us at the UCSF Industry Documents Library as well as the UC & Jepson Herbaria.

Register via Zoom.

Annual Tobacco and Other Industry Documents Workshop: Recording Now Available!


IDL and the UCSF Center for Tobacco Control Research and Education (CTCRE) held the "Annual Tobacco and Other Industry Documents Workshop" virtually on October 8th from 9 am to 12:15 pm PT.

If you didn't have a chance to join us, the event recording is now available.


New Papers and Publications


Monday, October 21, 2024

2024 Undergraduate Summer Fellow: Gordon Lichtstein

The UCSF Industry Documents Library is pleased to highlight the work of 2024 Summer Fellow Gordon Lichtstein. Gordon is an incoming MIT student interested in the intersection of computer science and linguistics in natural language processing (NLP), and in applying NLP for the betterment of humanity, such as in environmental sustainability or the digital humanities.

Over the course of the 8-week internship, Gordon crafted and completed four distinct projects that leverage natural language processing and data science within the context of our JUUL Labs Collection and the broader IDL. Project One investigates the optical character recognition (OCR) accuracy of low-quality and handwritten documents in the absence of ground truth data. Project Two explores the implementation of embedding search algorithms and visualizations aimed at enhancing the relevance of document recommendations for users. Project Three employs txt-ferret to conduct a thorough scan of a substantial corpus of industry documents to identify sensitive information, including credit card numbers. Finally, Project Four assesses the biases present in large language model (LLM) summarization through the lens of sentiment analysis.
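To give a sense of what scanning text for credit card numbers can involve, here is a minimal Python sketch. It is not Gordon's code and not txt-ferret's implementation; it is an illustration under stated assumptions. It assumes candidates are runs of 13 to 19 digits, optionally separated by single spaces or hyphens, and it filters them with the Luhn checksum used by payment card numbers. The names CANDIDATE, luhn_valid, and find_card_numbers are hypothetical and chosen only for this example.

```python
import re

# Assumption: a candidate is 13 to 19 digits, optionally separated by
# single spaces or hyphens. Real scanners may use stricter patterns.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real account.
    sample = "Order ref 1234 5678 9012 3456, card 4111 1111 1111 1111 on file."
    print(find_card_numbers(sample))
```

The checksum step matters because long digit runs are common in industry documents (invoice numbers, phone lists, Bates numbers), and a pattern match alone would flag many of them. A hit from a sketch like this would still need human review before any redaction decision.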

Read Gordon's entire report and reflection via eScholarship.

The IDL staff is deeply appreciative of Gordon's thoughtful and comprehensive contributions, as well as his engagement in team meetings and Amazon Web Services workshops. His projects and use of NLP techniques with our document corpus have greatly enriched our understanding.
