Managing Opportunities and Risks in Generative AI Use for Clinical Research
Explore the core tensions between the potential of generative AI and the regulatory realities of clinical research.
Generative artificial intelligence (AI) has captured global attention for its transformative potential across industries, and nowhere is the promise greater — or more fraught — than in health care and clinical research. These domains are ripe for innovation. AI applications range from the simple to the complex: automating mundane tasks, increasing accuracy in documentation, accelerating drug development and uncovering complex correlations within vast datasets. But the stakes are uniquely high. Errors jeopardize patient safety, regulatory compliance and trust.
This blog explores the core tensions between generative AI’s potential and the regulatory realities of health care and clinical research. It outlines the current regulatory framework, the limitations in applying traditional guidance to generative AI, the risks of relying too heavily on “human in the loop” strategies, and a path forward that includes workforce education, smarter oversight tools, and rethinking what it means to “train” AI in this context.
Balancing generative AI opportunities and risks
Generative AI offers unparalleled opportunities for transforming various aspects of health care and clinical research. It automates repetitive tasks such as documentation, summarization and annotation — significantly improving efficiency. In clinical trials, generative AI accelerates trial design through protocol optimization and enrollment forecasting, leading to faster and more effective studies. Enhanced pharmacovigilance is also possible through rapid literature review and signal detection, ensuring better drug safety monitoring. Additionally, advanced data mining capabilities enable the extraction of real-world evidence and patient stratification, providing deeper insights into patient populations and treatment outcomes.
However, the risks associated with generative AI are equally significant. One major concern is its tendency to hallucinate, confidently producing factually incorrect information. In health care, decisions influenced by AI outputs have life-altering consequences, making accuracy and reliability paramount. In regulated domains, every step must be documented, auditable and reproducible to meet stringent regulatory requirements. Current generative AI models often fall short of these expectations, posing challenges in gaining trust from regulators who demand a high bar for reliability and accountability.
Regulatory guidelines for traditional AI models aren’t aligned with generative models
Regulatory agencies such as the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA) and the Medicines and Healthcare products Regulatory Agency (MHRA) have issued clear guidance on the use of AI/ML in clinical settings. The FDA, for example, issued its Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, emphasizing transparency, real-world performance monitoring and risk management.
Key guidelines include:
- Data documentation: Provenance, quality and representativeness of training data must be documented.
- Training process transparency: The training process, including the relevant inputs and parameters, must be reproducible.
- Validation and monitoring: Model performance must be evaluated using pre-defined metrics, with continuous monitoring in production.
These guidelines are designed for traditional, narrow AI — models trained for specific tasks using clearly defined datasets. However, generative AI doesn’t fit the mold.
Generative models (e.g., GPT, PaLM, LLaMA) represent a paradigm shift. They are pre-trained on massive datasets that are often proprietary and undisclosed. Organizations fine-tune or prompt-engineer them but cannot retrain them from scratch. And the inner workings of large language models are often opaque.
This presents three core conflicts:
- Opaque provenance: The data used to train the model is not transparent.
- Non-determinism: The same prompt may yield different outputs.
- No clear validation metrics: Performance is difficult to benchmark in regulated, high-stakes domains.
Thus, current regulatory frameworks are misaligned with generative AI’s architecture.
“Human in the loop” is the preferred interim solution
The most common mitigation strategy is to keep a “human in the loop.” That means generative AI creates only drafts, and a qualified human reviews, edits and approves all outputs. This maintains regulatory compliance and patient safety. For example, generative AI can be used to draft clinical trial summaries, which are then reviewed by a regulatory affairs team before submission. This approach enables efficiency in first-draft creation and allows human expertise to be focused on higher-value tasks.
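To make the gating concrete, here is a minimal sketch in Python of what that review gate can look like. The names (generate_draft, Study ABC-123, the reviewer role) are hypothetical, and the model call is a placeholder for whatever system an organization actually uses; the point is simply that nothing leaves draft status without a named human sign-off recorded alongside the content.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DraftRecord:
    """One AI-generated draft plus its human review status, retained for auditability."""
    content: str
    created_at: str
    status: str = "draft"              # draft -> approved
    reviewer: Optional[str] = None
    reviewer_notes: str = ""

def generate_draft(prompt: str) -> DraftRecord:
    """Placeholder for whatever generative-model call an organization actually uses."""
    text = f"[AI-drafted text responding to: {prompt}]"
    return DraftRecord(content=text, created_at=datetime.now(timezone.utc).isoformat())

def approve(record: DraftRecord, reviewer: str, notes: str = "") -> DraftRecord:
    """A draft only becomes usable downstream once a named, qualified human signs off."""
    record.reviewer = reviewer
    record.reviewer_notes = notes
    record.status = "approved"
    return record

draft = generate_draft("Summarize interim safety findings for the hypothetical Study ABC-123.")
final = approve(draft, reviewer="regulatory_affairs_lead", notes="Corrected adverse event counts.")
print(final.status, final.reviewer)
```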
The human-in-the-loop approach also introduces new challenges and limitations, including:
- Efficiency ceilings: The need for full human review means efficiency gains are capped.
- Cognitive drift: Over time, humans become desensitized to AI errors and may trust outputs without adequate scrutiny.
- Oversight fatigue: Humans are not well suited to sustaining close attention across large volumes of repetitive AI-generated content.
Relying solely on human oversight is not sustainable at scale.
Rethinking oversight in the AI era
To safely scale generative AI in health care, oversight must evolve to address the unique challenges and opportunities presented by these advanced technologies. Developers must create specialized oversight tools designed to highlight inconsistencies or hallucinations in AI outputs. These tools should be capable of flagging areas that require human verification, ensuring that critical decisions are not solely reliant on AI-generated information. Additionally, these oversight tools must compare generative AI outputs to validated reference data, providing a benchmark for accuracy and reliability.
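As a rough illustration of that comparison step, the sketch below flags any AI-generated sentence that has no close match in validated reference material. The difflib similarity, the 0.6 threshold and the sample sentences are illustrative stand-ins rather than a validated method; in practice the matching approach itself would need qualification, and flagged items would go to a human reviewer rather than being discarded.

```python
import difflib
from typing import List, Tuple

def flag_for_review(ai_sentences: List[str],
                    reference_sentences: List[str],
                    threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Flag AI-generated sentences with no close match in the validated reference material."""
    flagged = []
    for sentence in ai_sentences:
        best = max(
            (difflib.SequenceMatcher(None, sentence.lower(), ref.lower()).ratio()
             for ref in reference_sentences),
            default=0.0,
        )
        if best < threshold:
            flagged.append((sentence, best))   # route to a human verifier
    return flagged

reference = ["Drug X was administered at 10 mg once daily.",
             "No serious adverse events were reported in Cohort A."]
ai_output = ["Drug X was administered at 10 mg once daily.",
             "Enrollment closed two weeks ahead of schedule."]   # unsupported by the reference
for sentence, score in flag_for_review(ai_output, reference):
    print(f"Needs human verification ({score:.2f}): {sentence}")
```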
The next step in evolving oversight is the introduction of adversarial AI agents. These agents are designed to challenge or validate outputs from the primary model, acting as a secondary layer of scrutiny. By employing adversarial AI, organizations can identify potential errors or biases in the primary model’s outputs, enhancing the overall robustness and reliability of AI-driven decisions. For example, the Microsoft Azure Health Bot uses controlled models with built-in guardrails and auditability to support health care conversations, reducing risk.
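A bare-bones version of that verifier pattern might look like the following Python sketch. Both models here are toy stand-ins (no particular vendor API is assumed), and the point is the control flow: the primary model's answer is never accepted silently when the independent verifier disputes it.

```python
from typing import Callable, Dict

PrimaryModel = Callable[[str], str]
VerifierModel = Callable[[str, str], bool]   # (prompt, answer) -> does the verifier agree?

def generate_with_adversarial_check(prompt: str,
                                    primary: PrimaryModel,
                                    verifier: VerifierModel) -> Dict[str, object]:
    """Run the primary model, then have an independent verifier model challenge the answer."""
    answer = primary(prompt)
    agreed = verifier(prompt, answer)
    return {
        "answer": answer,
        "verifier_agreed": agreed,
        "disposition": "standard human review" if agreed else "disputed: escalate for closer review",
    }

# Toy stand-ins so the sketch runs end to end.
def primary(prompt: str) -> str:
    return "The protocol allows a 7-day screening window."

def verifier(prompt: str, answer: str) -> bool:
    return "7-day" in answer   # a real verifier would reason over the source documents

print(generate_with_adversarial_check("What screening window does the protocol allow?",
                                      primary, verifier))
```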
Further, it is critical that AI tools enable active learning loops, where the AI system continuously learns from human feedback over time. This approach allows AI models to improve and adapt based on real-world interactions and expert insights, fostering a dynamic and responsive AI ecosystem. Active learning loops ensure that AI systems remain relevant and accurate, evolving in tandem with advancements in medical knowledge and practices.
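In practice, the feedback half of such a loop can start very simply. The sketch below (the file name and field names are hypothetical) just records each reviewer correction alongside the original prompt and output; the accumulated log can then feed prompt refinement, evaluation sets or fine-tuning, depending on what an organization has validated.

```python
import json
from datetime import datetime, timezone

FEEDBACK_LOG = "ai_feedback_log.jsonl"   # hypothetical location for accumulated review feedback

def log_feedback(prompt: str, ai_output: str, human_correction: str, reviewer: str) -> None:
    """Append one reviewer correction for later use in model or prompt improvement."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "ai_output": ai_output,
        "human_correction": human_correction,
        "reviewer": reviewer,
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_feedback(
    prompt="Summarize protocol deviations for Site 041.",
    ai_output="No deviations were recorded.",
    human_correction="Two minor deviations were recorded; see deviation log entries 12 and 13.",
    reviewer="clinical_data_manager",
)
```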
Humans must shift from being mere editors of AI outputs to managers of AI ecosystems. This transition involves taking a more strategic role in overseeing AI deployment, ensuring that AI tools are used effectively and responsibly. Health care professionals, including data scientists and clinicians, must collaborate to set guidelines, monitor performance and provide feedback that drives continuous improvement.
By evolving oversight mechanisms, introducing adversarial AI agents and enabling active learning loops, health care organizations can safely scale generative AI. This approach not only mitigates risks but also leverages AI to enhance human judgment, accelerate scientific discovery and ultimately improve patient outcomes.
Raising digital literacy across the workforce
A generative AI-enabled future demands an AI-literate workforce, equipped with the knowledge and skills to effectively leverage these advanced technologies. Training must be ongoing to keep pace with the rapid advancements in AI capabilities and applications. Users should develop a thorough understanding of how generative AI works, including its underlying algorithms, data requirements and the contexts in which it performs best. Equally important is recognizing where generative AI fails, such as its propensity for generating inaccurate or misleading information, known as hallucinations.
Best practices for prompt engineering are crucial, as the quality of AI outputs heavily depends on the inputs provided. Training should cover techniques for crafting effective prompts, troubleshooting poor responses, and iterating on prompts to refine results. Additionally, users must learn how to interpret AI outputs critically, distinguishing between reliable information and potential errors or biases introduced by the AI.
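As a generic illustration of that iteration process (not a validated template), the snippet below contrasts a vague first-pass prompt with a refined version that states the role, constrains the model to the source text and asks for verifiable structure.

```python
# Illustrative only: a vague first-pass prompt and a refined iteration after a poor response.
first_attempt = "Summarize the adverse events in this study."

refined_prompt = """You are assisting a clinical data reviewer.
Task: Summarize adverse events from the source text below.
Constraints:
- Use only information present in the source text; if a detail is missing, say "not reported".
- Group events by severity (mild / moderate / severe) and give counts per group.
- Quote the sentence each count comes from so the reviewer can verify it.
Source text:
{source_text}
"""

print(refined_prompt.format(source_text="<validated source excerpt goes here>"))
```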
To stay aware of risks, users must also be educated on cognitive biases, such as over-reliance on automation. This involves understanding the limitations of AI and maintaining a healthy skepticism toward its outputs. Users should be trained to cross-verify AI-generated information with trusted sources and to use AI as a tool to augment human decision-making rather than replace it entirely. By fostering an AI-literate workforce, organizations can maximize the benefits of generative AI while mitigating its risks, ensuring that AI-driven innovations are both effective and responsible.
A new paradigm for AI training and governance
Generative AI users must adopt a dynamic, life cycle-based approach to governance to ensure the responsible and effective deployment of AI technologies. This approach recognizes that generative AI models, like human professionals, require continuous assessment and improvement. Just as human professionals are assessed annually, generative AI models should be reevaluated on a consistent cadence, ensuring they remain accurate, relevant and aligned with organizational goals and regulatory requirements.
Maintaining version control is critical in this process. Tracking changes to models and prompts as meticulously as software code enables transparency and accountability. This practice allows for the identification of specific versions used in decision-making processes, facilitating audits and compliance with regulatory standards. It also helps in understanding the evolution of AI models and the impact of updates or modifications.
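One lightweight way to make that traceability concrete is to log, for every generated output, exactly which model version, prompt template and settings produced it. The Python sketch below is a minimal example; the identifiers (internal-llm, ae-summary-v3) are hypothetical, and real programs would store these records in a qualified system rather than printing them.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class GenerationRecord:
    """Audit entry pinning exactly which model, prompt and settings produced an output."""
    model_id: str             # e.g., an identifier from an internal model registry
    model_version: str
    prompt_template_id: str
    prompt_sha256: str        # hash of the exact prompt text that was sent
    parameters_json: str      # serialized sampling settings (temperature, max tokens, ...)
    created_at: str

def record_generation(model_id: str, model_version: str, template_id: str,
                      prompt_text: str, parameters: dict) -> GenerationRecord:
    return GenerationRecord(
        model_id=model_id,
        model_version=model_version,
        prompt_template_id=template_id,
        prompt_sha256=hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
        parameters_json=json.dumps(parameters, sort_keys=True),
        created_at=datetime.now(timezone.utc).isoformat(),
    )

entry = record_generation("internal-llm", "2024.06", "ae-summary-v3",
                          "Summarize adverse events from the attached listing.",
                          {"temperature": 0.0, "max_tokens": 800})
print(asdict(entry))
```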
Users must treat AI agents like team members, assigning them specific roles and acknowledging their limitations. This perspective fosters a culture of accountability, where AI outputs are scrutinized and validated just as human contributions would be. It is essential to establish clear guidelines for the use of AI, including protocols for handling errors, biases and unexpected behaviors.
Cross-functional review boards are necessary to oversee AI use, blending clinical, technical and regulatory expertise. These boards should include stakeholders from various disciplines to provide a holistic view of AI deployment. Clinical experts ensure that AI applications align with medical standards and patient safety, technical experts address the intricacies of AI algorithms and data integrity, and regulatory experts ensure compliance with legal and ethical standards, providing a framework for responsible AI use.
Regular training and updates for all stakeholders involved in AI governance are essential to keep pace with technological advancements and evolving regulatory landscapes. By adopting a dynamic, life cycle-based approach to AI governance, organizations can harness the full potential of AI while mitigating risks, ensuring that AI-driven innovations are both effective and responsible.
Responsibly embracing generative AI in health care and clinical research
Generative AI presents us with a once-in-a-generation opportunity to transform health care and clinical research — but only if deployed responsibly. Going forward, regulators must evolve guidance to account for pre-trained models, and health care organizations must rethink oversight and governance. All health care professionals, from data scientists to clinicians, must increase AI literacy. Technology leaders must build tools that support meaningful human-AI collaboration, not just automation.
The path forward is not just about mitigating risk. It’s about building a future where generative AI enhances human judgment, accelerates science and ultimately improves patient outcomes.
Leverage our data solutions to accelerate your trials
At the PPD™ clinical research business of Thermo Fisher Scientific, our digital solutions are where innovation meets excellence. We are redesigning the entire clinical research model to be powered by purpose-built, CRO-owned technology incorporated into every study. As your dedicated partner, we bring forward-thinking, AI-enabled solutions that accelerate and optimize every phase of clinical development. When sponsors choose us, they benefit from faster study startup, smarter site selection, cleaner data, increased transparency and greater confidence across the board.
Accelerate your clinical research with our drug development digital solutions from Thermo Fisher Scientific.