Core Conclusions
AI drug discovery has moved from "research showcase" into a "tiered commercialization" phase. The first revenue to materialize comes not from "how many new drugs AI invented" but from the software, data, simulation, clinical and compliance platforms built around pharma R&D workflows: Schrödinger's computational chemistry software, Certara's biosimulation, Veeva's life-science cloud, IQVIA's clinical and data network, and Tempus's clinical-molecular data platform have all already generated verifiable revenue. The FDA has also disclosed that it received 500+ regulatory submissions containing AI components between 2016 and 2023, spanning nonclinical, clinical, post-market and manufacturing stages.
The links with the clearest real commercial value fall into five categories: first, computational chemistry / structure and molecular design software; second, biosimulation and PK/PD modeling; third, clinical trial design, patient recruitment, and RWD/RWE analytics; fourth, life-science R&D data platforms, ELN/LIMS, knowledge graphs and document-compliance automation; fifth, lab automation and closed-loop optimization. By contrast, assets with "only a foundation model, no proprietary data and no wet-lab loop" are far easier to commoditize.
AI can indeed shorten early R&D cycles and reduce wasted experiments, but a lift to overall clinical success rates has not yet been fully proven by large-sample industry data. Public cases led by Insilico show that the path from target identification to preclinical candidate nomination can be compressed to roughly 18 months, with screening completed on only about 80 molecules; Iambic's published early-model results claim a meaningful improvement in early developability prediction accuracy. What is still genuinely missing is evidence that these early efficiency gains carry through reliably to Phase II/III and to approval success.
The profit pool sits mainly with the "pick-and-shovel" vendors and "workflow platforms" in the near term, including software, data, simulation, clinical and R&D-IT infrastructure; the largest mid-to-long-term upside may go to AI-native biotechs that command platform + pipeline + closed-loop experimentation capabilities. Put differently, what is most visible today is ARR, subscription fees, service fees and partnership upfronts; the most nonlinear future value lies in milestones, royalties and proprietary-pipeline NPV.
"A large headline deal value" does not equal "high revenue quality." For example, Recursion's total potential milestones with Sanofi can reach roughly $5.2 billion, and its collaborations with Roche/Genentech also include large milestone and royalty terms; but what has actually landed is the early upfront plus triggered milestones, not the headline deal value. To judge deal quality, look at: the upfront ratio, the density of stage milestones, who bears clinical costs, the retained royalty level, and whether retained proprietary equity is allowed.
The companies that truly sit at the platform core are not the "most accurate model companies" but those that simultaneously command proprietary data, an experimental loop, pharma-workflow interfaces, customer relationships, and regulatory context. Representative examples include: Recursion (high-dimensional phenotyping + CRISPR phenomap), Tempus (clinical-molecular data network), IQVIA (global RWD and CRO network), Veeva (life-science industry cloud), Certara (biosimulation with extremely high regulatory acceptance), and Benchling/Dotmatics (R&D data / experiment-record infrastructure).
Among AI-native biotechs, the ones most worth studying are not those "best at telling a model story" but those that meet four conditions at once: repeat collaborations with pharma; proprietary or partnered programs in the clinic; data assets that can be continuously reused; and the organizational capability to turn model outputs into experimental/clinical decisions. The companies that currently fit this screen most closely include Isomorphic Labs, Insilico Medicine, Iambic, Generate:Biomedicines, and Recursion.
The excess profit pool of "protein structure prediction" itself is not large; the real profit pool sits downstream. AlphaFold and its ecosystem have dramatically reduced the scarcity of structural information; RoseTTAFold All-Atom has also advanced complex modeling and design. As a result, companies that do pure "structure prediction" have a weakening moat; more valuable are companies that integrate structure, sequence, function, manufacturability and wet-lab feedback into a production system.
AI protein design, antibody design and molecule generation are among the corners most prone to "bubbles", because demos are easy while real pharmaceutical developability, CMC, toxicology and clinical translation are hard; by contrast, R&D data platforms, biosimulation, clinical-trial AI, compliance-document automation and R&D operating systems are more likely to compound over the long run.
Among public companies, the AI beneficiaries with the strongest revenue validation are mostly not pure AI drug-discovery companies. From the angle of "revenue already landed, customers already validated, gross margin visible," Veeva, IQVIA, Certara, Schrödinger, Tempus, Thermo Fisher, Danaher, 10x Genomics and Illumina are more trackable than most preclinical AI-native biotechs.
The classic risk profile of "strong AI narrative, weak commercialization" includes: revenue that depends heavily on one-off deal recognition; undisclosed platform usage frequency and repeat-purchase rates; no recurring large-pharma customers; no proprietary or partnered programs in the clinic; cash burn that outpaces deal realization; and a valuation that already prices in clinical success. For such companies, investors most need to guard against "the milestone illusion" and "the model-capability illusion." This risk is especially pronounced in some early AI-native biotechs, automated labs, and "general-purpose bio foundation model" companies.
M&A is already validating the value of the "life-science R&D software layer," not just the "model layer." Siemens acquired Dotmatics at an enterprise value of $5.1 billion, explicitly to expand its AI-driven life-science R&D software footprint; public reporting also indicates that Roche intends to acquire PathAI for up to $1.05 billion, which, if completed, would further validate the strategic value of AI pathology / clinical R&D software assets.
The biggest catalysts over the next 12–24 months are not papers but three types of events: a continuing run of Phase II readouts for AI-discovered molecules; large-pharma collaborations upgrading from "exploratory partnerships" to "transfer of development and commercialization rights"; and larger M&A among life-science R&D-IT and automation platforms. Representative trackable events today include: Insilico's clinical progress, AbCellera's ABCL635/575 readouts, Generate's GB-0895 Phase III progression, Isomorphic's first entry into human studies, and more deals resembling Dotmatics/PathAI.
The biggest long-term risk is not that models are too small but that biology is too complex, that the translation step carries too much noise, and that technical barriers erode as foundation models are rapidly open-sourced. The FDA and EMA are both currently emphasizing risk-based approaches, interpretability, data governance and applicability boundaries, which essentially says that regulators accept AI but do not accept "a black box replacing validation."
The point most easily misunderstood by the market is: "AI drug discovery" is not a single track but a composite of software-platform value, data-asset value, experimental-loop value, clinical-validation value, and drug-pipeline value; companies that nominally all go by "AI biotech" can have completely different economics.
Industry-Chain Landscape and Profit-Pool Judgment
Validation gradient precedes valuation. In this theme, what matters most is not "the AI is strong" but which validation tier a company sits at: Scientific breakthrough (e.g., AlphaFold 3, RoseTTAFold All-Atom) → platform partnership (e.g., Isomorphic, Recursion, Generate, Insilico signing with large pharma) → revenue landing (e.g., Schrödinger, Certara, Veeva, IQVIA, Tempus, BenchSci) → clinical validation (e.g., Insilico, AbCellera, Generate, Relay) → drug approval (among the samples covered here, there is not yet a clear case of a pure AI-native biotech using "AI discovery" as its core narrative being formally approved for market by a mainstream regulator; this point still needs ongoing validation).
The core judgment on profit-pool distribution is: the most profitable layer in the near term is "digitalizing and simulating the R&D workflow"; the mid term favors "platform + service + partnership"; the long term favors "platform + pipeline + equity." The pure-model layer is the most easily diluted by open source and cloud vendors; the pure-CRO-labor layer is the most exposed to price pressure from efficiency gains; the true long-term high barrier lies in proprietary data × lab automation × pharma-workflow embedding × clinical/regulatory usability. This is exactly why Siemens acquired Dotmatics and why pharma companies pay for Veeva/Benchling/Certara/IQVIA products, rather than only buying a general-purpose large model.
| Industry-Chain Position | Sub-Segment | Core Products/Capabilities | AI Demand Driver | Revenue Model | Main Customers | Data Barrier | Experiment Barrier | Regulatory Barrier | Margin Profile | Representative Companies | Public/Private | Benefit Strength | Investment Upside |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Data layer | Genomic data | Sequencing, variant calling, sample databases | Target discovery, patient stratification, companion diagnostics | Instruments + consumables + services | Pharma, hospitals, research | High | Low–Mid | Mid | Stable, consumables-driven | Illumina, PacBio, Oxford Nanopore, BGI | Mixed public/private | High | Mid |
| Data layer | Proteomics/MS data | Proteomics, mass spectrometry | Mechanism and biomarker identification | Instruments + consumables + software | Pharma, CRO, research | Mid | Mid | Low–Mid | Mid-to-high margin for tools | Thermo Fisher, Bruker, Waters | Public | Mid | Mid |
| Data layer | Single-cell/spatial omics | Single-cell, spatial transcriptomics | Cell-state modeling, target validation | Instruments + reagents + analytics software | Pharma, academia, biotech | High | Mid | Low | Volatile early, high value long-term | 10x Genomics, Bruker Spatial | Public | High | High |
| Data layer | Clinical data | EHR, molecular-clinical longitudinal data | Enrollment, stratification, RWE, endpoint optimization | Diagnostics + data subscription + pharma services | Pharma, hospitals | Very high | Low | High | Margin lift after platformization | Tempus, IQVIA | Public | Very high | High |
| Data layer | Real-world data | Patient records, claims, registry | External controls, drug safety, post-market studies | Data licensing + analytics services | Pharma, regulatory support | Very high | Low | Very high | High stickiness | IQVIA, Owkin | Public/private | Very high | Mid-high |
| Data layer | Research data platform | ELN/LIMS/knowledge graph/data lake | Turn scattered experiment data into trainable assets | SaaS subscription | Pharma, biotech, CRO | High | Mid | Mid | High SaaS margin | Benchling, Dotmatics, Veeva | Private/acquired/public | Very high | High |
| Model layer | Bio foundation models | Protein/cell/multimodal foundation model | Design, prediction, Q&A, co-pilot | API/platform/partnership | Pharma, AI-biotech | Limited if open-sourced | Low | Low | Unstable for pure model | Isomorphic, EvolutionaryScale, Profluent | Private | Mid | High |
| Model layer | Protein structure models | Structure and complex prediction | Faster localization of binding sites and design space | Research partnership / platformization | Pharma, research | Mid | Low | Low | Easily commoditized | AlphaFold ecosystem, RoseTTAFold ecosystem | Platform/open source | Mid | Low–Mid |
| Model layer | Molecule generation models | Generation, optimization, developability prediction | Improve hit-to-lead, lead-optimization efficiency | Software + partnership + pipeline | Pharma, AI-biotech | Mid | Low | Low | Weak without a loop | Insilico, Iambic, Recursion | Public/private | High | Very high |
| Discovery layer | AI target discovery | Multi-omics, disease networks, causal modeling | Improve target hit rate, shorten validation time | Collaborative R&D + milestones | Large pharma | High | Mid-high | Mid | Volatile early revenue | Recursion, BenchSci, Xaira | Public/private | High | High |
| Discovery layer | Small-molecule design platform | Physics simulation + generative design | Improve chemistry efficiency and candidate quality | Software license + partnership + proprietary | Pharma, biotech | Mid | Mid | Mid | High software margin; heavy pipeline losses | Schrödinger, Iambic, Insilico | Public/private | Very high | Very high |
| Discovery layer | Antibody/protein design platform | Antibody discovery, sequence design, manufacturability optimization | Biologics optimization, bispecifics/ADC/TCE | Collaborative R&D + milestones + royalties | Large pharma | High | Very high | Mid | Large equity value if successful | AbCellera, Generate, BigHat, Cradle | Public/private | Very high | Very high |
| Experiment layer | Automated labs | Robotics, workstations, workflow orchestration | Reduce manual labor, support active-learning loops | Equipment + software + services | Pharma, CRO, research institutes | Mid | Very high | Mid | Asset-heavy early, improving later | Opentrons, Strateos, ECL, Thermo Fisher | Private/public | High | Mid-high |
| Experiment layer | High-throughput screening | HTS, phenotypic screening | Build training data and hit discovery | Service fees + platform partnership | Pharma, biotech | Mid | Very high | Low | Average service margin | Charles River, WuXi, Recursion | Public | Mid-high | Mid |
| Experiment layer | Preclinical validation | ADMET, toxicology, in-vitro models | Reduce early-attrition errors | Software + services | Pharma, biotech | Mid | Mid-high | Mid | Good margin for Certara-type; mid for traditional outsourcing | Certara, Charles River | Public | High | Mid |
| Clinical layer | Clinical trial design | Protocol design, site selection, risk monitoring | Reduce enrollment failure and execution delays | SaaS + service fees | Large pharma, CRO | High | Low | Very high | High stickiness | IQVIA, Veeva, Tempus, Tigermed | Public | Very high | Mid-high |
| Clinical layer | AI patient recruitment | EHR screening, trial matching | Shorten enrollment | Service fees / platform fees | Pharma, CRO, hospitals | Very high | Low | Very high | High value but needs network effects | Tempus, IQVIA | Public | Very high | Mid-high |
| Clinical layer | Drug safety and medical writing | PV, regulatory writing, submission automation | Reduce compliance and labor costs | SaaS + services | Pharma, CRO | Mid | Low | Very high | High stickiness | Veeva, Certara, Tigermed | Public | Mid-high | Mid |
| Platform layer | Pharma R&D platform | Industry cloud, R&D copilot, literature and patent mining | Improve organization-level R&D efficiency | SaaS subscription | Large pharma | High | Low | High | Strong SaaS compounding | Veeva, Benchling, Dotmatics | Public/private/acquired | Very high | High |
| Platform layer | CRO/R&D outsourcing | AI-assisted project delivery | Increase capacity and margin | FTE/project-based | Pharma, biotech | Mid | Mid | Mid-high | Easily price-pressured | IQVIA, WuXi, Tigermed, Charles River | Public | Mid | Mid |
| Platform layer | AI-native biotech | End-to-end platform + proprietary pipeline | Capture dual platform and drug-equity value | Partnership + milestones + royalties + pipeline | Pharma / end patients | High | Very high | Very high | Most volatile | Isomorphic, Insilico, Recursion, Generate, Xaira | Public/private | Very high | Very high |
| Infrastructure layer | Cloud and compute | Training/inference/data pipelines | Large models, biosimulation, automation control | Cloud services / API | Pharma, AI-biotech | Low | Low | Low | Scale profit | NVIDIA, Google/DeepMind ecosystem, cloud vendors | Public | Mid | Mid |
| Regulatory layer | Regulatory and compliance | GxP, model governance, review communication | Ensure AI acceptability | Consulting / software / validation services | Pharma, regulatory-filing teams | Mid | Low | Very high | High value-add | FDA/EMA-guidance-driven service ecosystem | Institutions/software vendors | Very high | Mid |

Simplified judgment: Highest near-term certainty belongs to "workflow-layer" companies like Veeva, IQVIA, Certara, Schrödinger, Tempus, Dotmatics/Benchling; largest mid-term optionality belongs to Isomorphic, Insilico, Generate, Iambic, Recursion, AbCellera; indirect beneficiaries that look more like pick-and-shovel vendors are Thermo Fisher, Danaher, 10x Genomics, Illumina, Opentrons; most exposed to disruption are the traditional discovery-outsourcing and manual processes with low technology density, low data content and low automation penetration.
Business Models, Technology Roadmaps and Valuation Frameworks
How AI drug-discovery companies make money. The revenue models the industry has already validated can be grouped into six categories: software subscription / SaaS (Schrödinger, Veeva, Benchling, Dotmatics, Certara); platform licensing and data-access fees (Tempus, IQVIA, and parts of the BenchSci/Recursion model); collaborative R&D revenue (Recursion, Generate, Insilico, AbCellera); milestones and royalties (Recursion, AbCellera, Insilico, Generate, Isomorphic); proprietary-pipeline value (Insilico, Generate, AbCellera, Recursion, Relay); M&A exit value (Dotmatics acquired by Siemens; PathAI reportedly to be acquired by Roche for up to $1.05 billion).
Which model is better. Looking only at revenue quality, the ranking is usually: subscription software > deeply embedded platform fees > repeatable analytics services > upfront-based partnerships > milestones/royalties > single proprietary-pipeline valuation. But if you look at upside, the order almost reverses. The optimal mix is usually neither pure software nor pure pipeline, but software/platform cash flow + some high-quality pipeline equity. Companies like Schrödinger, Certara, Tempus and Veeva are more readily accepted by the secondary market precisely because at least part of their revenue is predictable; pure AI-native biotechs, by contrast, must trade higher volatility for higher upside.
The advantages of the platform + pipeline hybrid model are: first, it lets pharma pay for the platform, easing cash burn; second, it lets platform output stay on the company's own books as a proprietary pipeline, capturing nonlinear returns; third, it more quickly validates whether the platform is repeatable. The risks are: high organizational complexity, heavy capex, volatile revenue recognition, and the tendency for a platform company to gradually "degrade" into an ordinary biotech. Recursion, AbCellera, Generate and Insilico are all going through this tension.
How to judge collaboration-agreement quality. A high-quality agreement usually features: a higher upfront, clearer near-term technical/development milestones, retained royalties or co-development equity, the large pharma bearing subsequent clinical and commercialization costs, and customers showing repeat-purchase / expansion behavior. For example, Isomorphic's collaborations with Lilly/Novartis carry total potential value approaching $3 billion; Generate's multi-target collaboration with Novartis carries total potential value over $1 billion, with an upfront of $65 million; Insilico's new global collaboration with Lilly includes a $115 million upfront and up to $2.75 billion in total potential value; Recursion has already received $134 million in upfront plus progress milestones from Sanofi, and $213 million in upfront and milestones from Roche/Genentech.
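The first of these checks, the upfront ratio, is simple arithmetic on the figures quoted above. The following is a minimal sketch only: the `Deal` class and its field names are illustrative assumptions, not a standard metric definition, and Generate's "over $1 billion" is entered as exactly $1,000 million, so its ratio is an upper bound.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    """Hypothetical container for one collaboration (names are illustrative)."""
    name: str
    upfront_musd: float    # upfront payment, in $ millions
    headline_musd: float   # total potential deal value, in $ millions

    @property
    def upfront_ratio(self) -> float:
        # Share of the headline value that is paid at signing.
        return self.upfront_musd / self.headline_musd


deals = [
    # "over $1 billion" is treated as its lower bound, so 6.5% is a ceiling.
    Deal("Generate / Novartis", 65.0, 1000.0),
    Deal("Insilico / Lilly", 115.0, 2750.0),
]

for deal in deals:
    print(f"{deal.name}: upfront is {deal.upfront_ratio:.1%} of headline value")
```

On these inputs both upfronts come out at a single-digit percentage of the headline figure, which is why the headline number alone says little about revenue quality.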
The value tiering of technology roadmaps can be summarized as follows:
| Technology Layer | Best Suited For | Long-Term Moat | Willingness to Pay | Open-Source/Cloud Commoditization Risk | M&A Likelihood | Judgment |
| --- | --- | --- | --- | --- | --- | --- |
| Bio data layer | Data platforms / hospital networks / pharma | Very high | Very high | Low | Very high | The most valuable is exclusive data ownership and governance capability. |
| Chemistry data layer | Pharma, software platforms | High | High | Mid | Mid-high | Needs to be linked with the experimental loop. |
| Clinical data layer | IQVIA/Tempus/hospital networks | Very high | Very high | Low | Very high | Subject to regulation and privacy protection; high barriers. |
| Literature/patent layer | R&D software companies | Mid | Mid | High | Mid | Search alone is not enough; it must be embedded in the workflow. |
| Multi-omics modeling layer | AI-native biotech/pharma | High | Mid-high | Mid | High | Needs proprietary experimental data feeding it continuously. |
| Protein structure and design layer | Protein-design companies/pharma | Mid-high | Mid-high | High | High | Enormous scientific value, but the pure structure layer is easily open-sourced. |
| Molecule generation layer | AI-native biotech/software platforms | Mid-high | High | Mid-high | High | The real moat lies in data and wet-lab. |
| Physics simulation layer | Computational chemistry software companies | High | Very high | Mid | Mid-high | Strong willingness to pay; one of the most mature for commercialization. |
| ADMET/toxicity layer | Certara/AI-biotech/pharma | High | Very high | Mid | Mid-high | Closely tied to regulation and decisions. |
| Experiment design layer | Benchling/automation platforms | High | High | Mid | Very high | High value once tied to ELN/LIMS/robotics. |
| Lab automation layer | Tool companies/cloud labs | Very high | High | Low | High | Asset-heavy, but with the strongest closed-loop capability. |
| Feedback learning layer | AI-native biotech | Very high | Mid-high | Low | High | The core of platform compounding. |
| Clinical translation layer | Tempus/IQVIA/pharma | Very high | Very high | Low | Very high | Closest to real commercial outcomes. |
| Pharma collaboration layer | Veeva/Benchling/Dotmatics | Very high | Very high | Low | Very high | Once embedded in the workflow, hard to replace. |
| Regulatory compliance layer | Certara/Veeva/IQVIA | Very high | Very high | Low | Mid-high | Compliance defines the boundary of scaling. |

Three Scenario Forecasts
| Dimension | Conservative | Base | Aggressive |
| --- | --- | --- | --- |
| Assumption | AI mainly improves early-research efficiency, no clear improvement to clinical outcomes | AI improves hit-to-lead, candidate quality and some patient stratification | AI is systematically validated in translational biology and clinical stratification |
| Pharma AI adoption rate | Mid | Mid-high | High |
| Number of AI-discovered drugs entering the clinic | Steady rise | Clear increase | Rapid rise |
| Clinical success-rate lift | Low or insignificant | Early lift, later to be confirmed | Mid-high |
| Platform licensing revenue | High | High | Very high |
| Milestone release | Low–Mid | Mid | High |
| Proprietary-pipeline value | Mid-low | Mid-high | Very high |
| Most-benefiting link | Software/clinical IT/simulation | Platform + partnership + some pipeline | Platform + pipeline dual-engine |
| Relatively benefiting companies | Veeva, IQVIA, Certara, Schrödinger | Tempus, Recursion, Insilico, Generate, AbCellera | Isomorphic, Insilico, Iambic, Generate, Tempus |
| Companies possibly under pressure | Pure story-driven AI-biotech | Low-efficiency CRO and manual processes | Traditional discovery outsourcing, general-model vendors |
| Main risk | Biological complexity, cooling partnership enthusiasm. | Platform evidence still needs ongoing clinical validation. | Valuation bubble and a regulatory cadence that does not keep pace. |

Conclusions on the valuation framework:
Software-platform type: prioritize ARR/subscription-revenue growth, gross margin, net retention, customer count, and depth of industry embedding.
Platform + partnership type: look at upfront quality, the density of triggerable milestones, partnership repeat rate, and the pace of proprietary-program advancement.
Pipeline AI-biotech: ultimately still reverts to traditional biotech rNPV, except that whether the platform can raise success rates affects the success-probability assumption.
Data-asset type: look at whether the data is exclusive, whether it is structured, and whether it can be used directly for pharma decisions and regulatory acceptance.
What should least be valued on its own is "the foundation model itself," because structure prediction and general-purpose generative models are being open-sourced and commoditized ever faster.
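To make the pipeline case concrete, the rNPV logic can be sketched as a sum of probability-weighted, discounted cash flows, with the platform entering only through the success-probability inputs. This is a minimal sketch under stated assumptions: the function name, the three-period cash flows, the probabilities and the 10% discount rate are all hypothetical placeholders chosen for illustration, not figures for any company covered here.

```python
def rnpv(cash_flows, cum_success_probs, discount_rate):
    """Risk-adjusted NPV.

    cash_flows[t]        : expected cash flow in period t (negative = cost)
    cum_success_probs[t] : cumulative probability the program reaches period t
    discount_rate        : per-period discount rate, e.g. 0.10
    """
    if len(cash_flows) != len(cum_success_probs):
        raise ValueError("cash_flows and cum_success_probs must align")
    return sum(
        prob * cf / (1.0 + discount_rate) ** t
        for t, (cf, prob) in enumerate(zip(cash_flows, cum_success_probs))
    )


# Hypothetical program: two cost periods, then one payoff period ($ millions).
cash_flows = [-10.0, -20.0, 100.0]

# Hypothetical baseline vs. "platform uplift" cumulative success probabilities.
baseline_probs = [1.0, 0.5, 0.25]
uplift_probs = [1.0, 0.6, 0.36]

base_value = rnpv(cash_flows, baseline_probs, discount_rate=0.10)
uplift_value = rnpv(cash_flows, uplift_probs, discount_rate=0.10)
print(f"baseline rNPV: {base_value:.2f}, with platform uplift: {uplift_value:.2f}")
```

The cash flows are identical in both runs; only the success-probability assumption changes. Whether a given platform actually justifies the higher probabilities is the open question the report keeps returning to.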
A Deep Breakdown of Sub-Tracks
The table below summarizes each track along "track logic, revenue conversion, barriers, risks, investment attractiveness." Scores reflect this report's research judgment, out of 10.
| Track | Commercialization Status | How AI Demand Converts to Revenue | Core Barriers | Main Risks | Future Catalysts | Investment Attractiveness |
| --- | --- | --- | --- | --- | --- | --- |
| AI target discovery | Pharma partnerships exist; revenue is mostly partnership-based | Target prioritization, disease-mechanism maps → collaborative R&D fees/milestones | Multi-omics + phenotype + causal inference + validation loop | False-positive targets, translation failure | More repeatable target wins | 8 |
| Multi-omics AI platforms | Maturity rising | Data subscriptions, joint research, analytics-platform fees | Data scale and annotation | Difficult sample standardization | Broader pharma adoption | 8 |
| Single-cell AI platforms | More tool/research-oriented, but commercializing | Instruments + reagents + analytics software + pharma projects | Data quality and spatial resolution | Volatile academic budgets | Spatial-omics penetration | 7 |
| Protein structure prediction | Enormous scientific value, limited profit pool | Mainly embedded as upstream capability into downstream products | Algorithm capability | Open source and commoditization | AF3/RoseTTA ecosystem applications | 5 |
| Protein design | Entering partnership and preclinical stages | Collaborative R&D, platform licensing, proprietary protein pipelines | Design × manufacturability × wet lab | The gap from sequence to drug | More large-pharma milestones | 9 |
| Antibody design | One of the clearest for commercialization | Collaboration milestones, royalties, proprietary antibodies | Wet lab, large-scale screening, CMC | Crowded competition | More clinical entries | 9 |
| Small-molecule generation | Commercialization accelerating | Collaborative R&D, proprietary pipeline, some software | Chemistry data, closed-loop experiments | Easy to generate, hard to translate | More Phase II validation | 8 |
| Virtual screening | Mature, embedded in the toolchain | Software fees, project-based services | Coupling with experimental validation | Single-point function under price pressure | Higher-hit-rate cases | 7 |
| Physics simulation/computational chemistry | One of the most mature tracks | Subscription fees, seat fees, cloud fees | Physics models, customer habits | Customer budget contraction | Cloud-delivery expansion | 9 |
| ADMET prediction | Reasonably well established | Software + services, influencing go/no-go decisions | Historical data, regulatory credibility | Data bias | Higher regulatory acceptance | 8 |
| Toxicity prediction | Mid-early validation | Project services + platform plugins | Data scarcity, hard negative samples | Mistakenly killing candidates | Animal-alternative regulatory push | 7 |
| Synthetic-route planning | Already in practical use | Software API, chemistry workflows | Reaction databases | Disconnection from real processes | Automated-synthesis integration | 6 |
| Automated labs | Real customers exist | Equipment revenue + software + operations | Hardware integration and SOPs | Long deployment cycles, asset-heavy | Recovery in large-pharma CAPEX | 8 |
| High-throughput screening | Mature but easily commoditized into services | Project fees, platform partnership fees | Equipment and processes | Labor-style price pressure | Combination with active learning | 7 |
| Biofoundry | Still high-end platform-oriented | Platform access, joint development, foundry services | Automation + engineering | Capacity utilization | Synthetic biology and protein-design expansion | 7 |
| AI preclinical validation | Rapidly developing | Simulation, candidate screening, in-vitro alternative models | Data + model + experiment | Still hard to extrapolate to humans | FDA push to reduce animal testing | 8 |
| AI clinical-trial design | Already established | Clinical SaaS + services | Historical-trial network and RWD | Long sales cycles | More pharma DCT budgets | 9 |
| AI patient recruitment | Already established | Site-screening and patient-matching service fees | EHR network, hospital channels | Privacy and interface fragmentation | Recruitment-efficiency case studies | 8 |
| Real-world data | High certainty | Data licensing + consulting + evidence generation | Data exclusivity and governance | Compliance/sovereignty | External-control expansion | 9 |
| Synthetic control arms | Regulatory acceptance still expanding | Research services + filing support | Long-term longitudinal data | Narrow applicability boundary | Oncology/rare-disease pilots | 6 |
| Drug-safety monitoring | Mature commercialization | PV software and services | Compliance and historical libraries | Hard to replace but slow to innovate | Generative-writing efficiency gains | 7 |
| Medical writing/regulatory filing | Meaningful efficiency gains already | Document-automation software + services | Compliance, review processes | Large-model hallucination | Veeva/partner-ecosystem expansion | 8 |
| AI pharma R&D platforms | Most able to compound | Industry cloud / knowledge platform / agent subscription | Workflow embedding, validation data | Large pharma building in-house | Larger M&A and a replacement wave | 10 |
| R&D data platforms | High value | ELN/LIMS/data lake/knowledge graph | Data structuring and migration cost | Open-source base components | Siemens-Dotmatics-type deals | 10 |
| ELN/LIMS/research data management | Mature, early in AI-ification | Subscription fees, expansion fees | Extremely high stickiness | Slow innovation cadence | AI-agent layering | 9 |
| AI pharma-process optimization | Adoption accelerating | On-site software, process services | GMP and manufacturing know-how | Long validation cycles | Advanced-manufacturing regulatory push | 7 |
| AI-native biotech | Largest upside | Partnership + milestones + royalties + pipeline | Data + model + experiment + clinical | Cash burn, clinical failure | Phase II/BD/M&A | 9 |
| AI + CRO | Indirect beneficiary | Efficiency gains improve delivery and margin | Customer and execution network | Price competition | DCT/RBQM adoption | 6 |
| AI + life-science tools | Real beneficiary | Instruments, consumables, software upgrades | Installed base | Volatile research budgets | Multi-omics/automation penetration | 8 |
| AI + cloud and compute | Necessary but not exclusive | Cloud fees, training/inference | Capital-intensive | Weak bio-proprietary value | Demand for larger models/simulation | 6 |

The tracks most prone to bubbles: general-purpose bio foundation models, "pure generation" companies without a wet-lab loop, single-point structure prediction, and AI-native biotechs that sell only vision without disclosing customer repeat-purchase and clinical progress.
The tracks most likely to produce real compounding: life-science R&D software platforms, clinical and real-world data, biosimulation, workflow-level AI, lab automation with a data loop, and protein/antibody-design companies that genuinely enter the clinic and retain equity.
Master List of Investment Targets and Company Tiering
Master List of Selected Public and Important Private Targets
Note: valuation metrics preferentially use market cap/share price as of 2026-05-19 plus disclosed full-year 2025 revenue for a rough basis; where complete current-period figures for EV/EBITDA, cash burn or R&D expense are not retrieved in this report, they are marked "needs further validation." The "benefit path" and "disruption path" are this report's research judgment.
| Company | Ticker/Market | Listing Status | Sub-Segment | Core AI Platform/Product | AI Benefit Path or Disruption Path | Key Validation | Disclosed Financials/Valuation Handles | Judgment |
|---|---|---|---|---|---|---|---|---|
| Recursion | RXRX / Nasdaq | Public | Phenomics + small molecules | Recursion OS, phenomap, Valence/Exscientia integration | Direct beneficiary: platform partnership + pipeline | 2025 revenue $74.68 million; cumulative receipts from Roche/Genentech $213 million; cumulative receipts from Sanofi $134 million; cash about $754 million. | Market cap about $1.53 billion; P/S about 20.5x; high cash burn. | High upside, high risk |
| Schrödinger | SDGR / Nasdaq | Public | Computational chemistry software + pipeline | Maestro, LiveDesign, physics platform | Direct beneficiary: software subscription + collaborative R&D + small pipeline | 2025 revenue $256 million, of which software revenue $199.5 million, software gross margin 74%; the TuneLab integration with Lilly shows the platform has become an industry gateway. | Market cap about $892 million; P/S about 3.5x. | High-certainty platform |
| Certara | CERT / Nasdaq | Public | Biosimulation/PKPD/regulatory | Simcyp, Phoenix, etc. | Direct beneficiary: software and services already deeply commercialized | 2025 revenue $418.8 million, software revenue $183.3 million, Adj. EBITDA $134.5 million, cash $189.4 million, customers 2,600+. | Market cap about $719 million; P/S about 1.7x. | An underrated high-quality platform sample |
| AbCellera | ABCL / Nasdaq | Public | Antibody discovery/protein engineering | Antibody discovery platform + proprietary pipeline | Direct beneficiary but already more of a platform + pipeline biotech | 2025 revenue $75.13 million; 104 partner-initiated programs, two proprietary assets in the clinic; cash and equivalents + marketable securities about $534 million. | Market cap about $1.26 billion; P/S about 16.8x. | Strong platform, clinical and commercialization still to be proven |
| Tempus AI | TEM / Nasdaq | Public | Clinical/molecular data platform | AI diagnostics + data and applications | Direct beneficiary: diagnostics, data services, trial matching | 2025 revenue $1.272 billion, up 83.4% YoY; Q4 revenue $367 million; expanded into a Japan JV. | Market cap about $7.81 billion; P/S about 6.1x. | One of the strongest commercializing "AI healthcare/R&D data" samples |
| Veeva | VEEV / NYSE | Public | Life-science industry cloud | Vault, Veeva AI Agents | Platform-type direct beneficiary | FY2026 revenue $3.195 billion, subscription revenue $2.684 billion; AI Agents are being rolled out in phases to Clinical/Regulatory/Medical. | Market cap about $27.68 billion; P/S about 8.7x. | High certainty, valuation not cheap |
| IQVIA | IQV / NYSE | Public | RWD + clinical trials + CRO | IQVIA AI, predictive modeling, site selection | Platform-type direct beneficiary | 2025 revenue $16.31 billion; holds 1.2 billion+ de-identified patient records and 4,600+ data assets. | Market cap about $29.3 billion; P/S about 1.8x. | A low-valuation, high-certainty clinical/RWD platform |
| Thermo Fisher | TMO / NYSE | Public | Life-science tools + clinical infrastructure | Instruments, software, automation, Clario/PPD | Tools-type beneficiary | 2025 revenue $44.56 billion; continued M&A of Clario to expand clinical-trial digitalization. | Market cap about $164.9 billion; P/S about 3.7x. | The pick-and-shovel leader |
| Danaher | DHR / NYSE | Public | Bioprocessing/life-science tools | Cytiva, Beckman, IDBS ecosystem | Tools-type beneficiary | 2025 full-year performance solid, Q4 revenue $6.8 billion. | Market cap about $116.5 billion; full-year EV/S needs further validation. | Pick-and-shovel, indirect AI financial upside |
| Illumina | ILMN / Nasdaq | Public | Sequencing infrastructure | NGS platform | Tools-type indirect beneficiary | 2025 Q4 revenue $1.16 billion; the base layer for multi-omics and precision medicine. | Market cap about $21.79 billion. | Important data layer, but AI direct monetization weaker than software platforms |
| 10x Genomics | TXG / Nasdaq | Public | Single-cell/spatial omics | Chromium, Visium, Xenium ecosystem | Tools-type indirect beneficiary | 2025 revenue $642.8 million; spatial omics is an important data foundation for target and mechanism modeling. | Market cap about $2.75 billion; P/S about 4.3x. | A mid-to-long-term data pick-and-shovel vendor |
| Siemens + Dotmatics | SIE / Germany | Public + M&A asset | R&D software platform | Dotmatics/Luma | Platform-type beneficiary | Siemens acquired Dotmatics at $5.1 billion EV; Dotmatics 2025 revenue projected at $300 million+, Adj. EBITDA margin 40%+. | The deal itself is the valuation anchor | R&D-software-layer M&A validation |
| Generate:Biomedicines | GENB / Nasdaq | Public | AI protein design + pipeline | Generate Platform | AI-native biotech | Collaboration with Novartis carries total potential value $1 billion+, upfront $65 million; raised $400 million in its IPO. | Market cap about $844 million. | Worth tracking, but clinical validation is still early |
| Isomorphic Labs | Undisclosed / UK | Private | Protein structure + small-molecule design | IsoDDE/AlphaFold-derived capabilities | AI-native biotech | Lilly/Novartis collaborations carry total potential value approaching $3 billion; raised $2.1 billion in 2026. | Valuation undisclosed | Private-market leader, in the preclinical-to-clinical transition |
| Insilico Medicine | 3696.HK / HK | Public | AI small molecules + pipeline | Pharma.AI | AI-native biotech | 2025 software collaborations reached 13 of the world's top 20 pharma companies; new collaboration with Lilly has a $115 million upfront and total potential value up to $2.75 billion; completed its HK IPO in 2025. | Valuation needs further validation | A core China/global AI-biotech sample |
| Iambic Therapeutics | Undisclosed / US | Private | AI small molecules | AI platform + candidate drugs | AI-native biotech | New funding of $100 million+ in 2025, plus a collaboration with Takeda. | Undisclosed | Worth tracking closely |
| Xaira Therapeutics | Undisclosed / US | Private | End-to-end AI-biotech | Virtual cell + therapeutics | AI-native biotech | Launch capital of $1 billion+ in 2024. | Undisclosed | A top-tier private-market roster, but revenue and clinical are long-dated |
| Cradle | Undisclosed / Netherlands/Switzerland | Private | AI protein-engineering software | Protein engineering copilot | Platform-type beneficiary | 2024/2025 Series B $73 million, total funding $100 million+, already used by top pharma. | Undisclosed | A protein-design-software sample |
| Benchling | Undisclosed / US | Private | R&D operating system | ELN/LIMS/R&D Cloud | Platform-type beneficiary | Covers 200,000+ scientists and over half of the world's top 50 biopharma; 2021 valuation $6.1 billion. | High valuation, financials undisclosed | Long-term core infrastructure |
| BenchSci | Undisclosed / Canada | Private | Disease-biology AI | ASCEND, BEKG | Platform-type beneficiary | Used by 16/20 top pharma; 2025 collaborations with Sanofi, Thermo, Merck; 2023 funding $95 million. | Undisclosed | A quality preclinical-AI software target |
| Opentrons | Undisclosed / US | Private | Automated labs | OT-2/Flex | Tools-type beneficiary | AI-enabled lab automation, focused on lab automation. | Undisclosed | Benefits from automation adoption, but depends on enterprise willingness to pay |
| WuXi AppTec | 603259.SH / 2359.HK | Public | CRO/CDMO | Integrated drug-discovery and manufacturing platform | Indirect beneficiary: AI improves discovery and new-molecule service efficiency | 2025 revenue RMB 45.46 billion, backlog RMB 58 billion, new-molecule business share over 30%. | A/H valuation needs further validation | The most trackable indirect beneficiary in the China chain |
| Tigermed | 300347.SZ / 3347.HK | Public | Clinical CRO/DCT/PV | CTRM, DCT, AI translation and PV | Indirect beneficiary | Has launched "Medical Intelligent Q&A" and an "intelligent medical translation platform," and is advancing RBQM and DCT. | Valuation needs further validation | An AI-clinical-tools beneficiary |
| BGI Genomics | 300676.SZ | Public | Multi-omics/testing | Multi-omics big data | Indirect beneficiary | 2025H1 single-cell-related business grew notably; multi-omics big data covers 100+ countries. | Valuation needs further validation | A data-layer beneficiary, but AI monetization is still indirect |

Company Tiering and Investment Priority
Tier A: Core direct beneficiaries of AI drug discovery
Veeva, IQVIA, Certara, Schrödinger, Tempus, Benchling, Dotmatics/Siemens. Rationale: they have either formed clear subscription/platform revenue, or command high-value clinical/R&D data networks, or are already embedded in pharma's core workflows.
Tier B: Clear beneficiaries, but with valuation, clinical or commercialization risk
Recursion, AbCellera, Generate, Insilico, Iambic, 10x Genomics, WuXi AppTec. Rationale: strong platforms, but either high cash burn, insufficient clinical validation, or still-volatile revenue.
Tier C: AI mainly used for efficiency; weak near-term financial upside
Thermo Fisher, Danaher, Illumina, Tigermed, BGI Genomics. Rationale: AI is an enhancer rather than a primary revenue source; benefits are more indirect.
Tier D: Strong narrative but still insufficient evidence of real benefit
Some general-purpose foundation-model companies, pure target-discovery companies, and early AI-biotechs with no repeat pharma collaborations / no clinical advancement / no revenue disclosure. Rationale: model capability has not yet translated into stable revenue or clinical value.
Tier E: Traditional links that may be compressed by AI and automation
Low-automation, low-data-accumulation, low-technology-content manual discovery outsourcing and documentation/translation/monitoring processes. Rationale: what AI and automation replace first are standardized, repetitive, low-differentiation processes.
Scoring Model and Key-Company Ranking
Positive scoring model
Direct exposure to AI drug-discovery revenue or pipeline value: 20%
Barriers across data, models and the experimental loop: 20%
Pharma-collaboration and customer quality: 15%
Pipeline quality and clinical validation: 15%
Financial quality and cash runway: 10%
Market size and growth upside: 10%
Valuation reasonableness: 10%
Reverse-risk model
Insufficient clinical validation: 25%
Insufficient revenue durability: 20%
Cash burn and financing risk: 20%
Platform effectiveness not yet proven: 15%
Risk of pharma building in-house or open-source substitution: 10%
Overvaluation: 10%
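The two weighting schemes above can be read as plain weighted averages. The sketch below shows that reading under stated assumptions: the report does not disclose its per-factor sub-scores or their scale, so the 0-100 scale, the dictionary keys and every sub-score value here are hypothetical placeholders. Only the percentage weights come from the report.

```python
# Percentage weights copied from the report's two scoring models.
POSITIVE_WEIGHTS = {
    "direct_exposure": 20,
    "data_model_loop_barriers": 20,
    "collaboration_customer_quality": 15,
    "pipeline_clinical_validation": 15,
    "financial_quality_runway": 10,
    "market_size_growth": 10,
    "valuation_reasonableness": 10,
}
RISK_WEIGHTS = {
    "insufficient_clinical_validation": 25,
    "insufficient_revenue_durability": 20,
    "cash_burn_financing": 20,
    "platform_unproven": 15,
    "in_house_or_open_source": 10,
    "overvaluation": 10,
}


def weighted_score(sub_scores: dict, weights: dict) -> float:
    """Combine sub-scores (assumed 0-100) into a 0-100 total using percent weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(sub_scores) != set(weights):
        raise ValueError("sub-scores must cover exactly the weighted factors")
    return sum(sub_scores[key] * weight for key, weight in weights.items()) / 100


# Hypothetical sub-scores for an illustrative company (NOT taken from the report).
example_positive = {
    "direct_exposure": 90,
    "data_model_loop_barriers": 80,
    "collaboration_customer_quality": 85,
    "pipeline_clinical_validation": 70,
    "financial_quality_runway": 90,
    "market_size_growth": 80,
    "valuation_reasonableness": 80,
}
example_risk = {
    "insufficient_clinical_validation": 20,
    "insufficient_revenue_durability": 15,
    "cash_burn_financing": 10,
    "platform_unproven": 25,
    "in_house_or_open_source": 30,
    "overvaluation": 40,
}

positive_total = weighted_score(example_positive, POSITIVE_WEIGHTS)
risk_total = weighted_score(example_risk, RISK_WEIGHTS)
print(f"positive total: {positive_total:.2f}, risk total: {risk_total:.2f}")
```

Keeping the positive and reverse-risk totals separate, rather than netting one against the other, matches how the ranking reports them as two columns.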
| Company | Positive Total Score | Commercialization Risk Score | Brief Comment |
|---|---|---|---|
| Veeva | 84 | 18 | One of the steadiest AI beneficiaries at the workflow and compliance layer. |
| IQVIA | 82 | 22 | Strong clinical/RWD data network, with a valuation that is not extreme. |
| Certara | 81 | 24 | The clearest biosimulation business model, regulator-friendly. |
| Tempus | 79 | 37 | Scarce data assets, strong growth, but valuation and integration execution need validation. |
| Schrödinger | 77 | 33 | Strong software authenticity; platform value exceeds near-term pipeline. |
| Thermo Fisher | 74 | 20 | A tools pick-and-shovel vendor; AI is a bonus, not the main story. |
| Dotmatics/Siemens | 73 | 21 | M&A has already validated the value of R&D software assets. |
| Recursion | 72 | 61 | Strong platform and partnerships, but high cash burn and clinical uncertainty. |
| 10x Genomics | 70 | 35 | High long-term value in single-cell/spatial omics, short-term financial volatility. |
| WuXi AppTec | 69 | 28 | The clearest indirect AI beneficiary in the China chain. |
| AbCellera | 67 | 58 | Quality antibody platform, but revenue and clinical validation are still insufficient. |
| Generate | 66 | 56 | One of the leaders in protein design, still needs clinical and revenue continuity. |
| Insilico | 65 | 57 | Very strong BD, but the secondary market needs sustained clinical and revenue delivery. |
| Illumina | 64 | 32 | Strong data-foundation value, but AI direct monetization is an indirect benefit. |
| Tigermed | 60 | 34 | DCT/PV/AI translation bring efficiency gains; upside leans toward operational improvement. |

Deep Analysis of Key Public Companies
To control length, the following covers only the public samples most worth further study, focusing on direct exposure, platform barriers, revenue validation, valuation and risk.
| Company | Track | Core AI Product/Platform/Pipeline | Commercialization Stage | Pharma Collaboration/Customers/Clinical Validation | AI Impact on Revenue and Profit | Data/Model/Experiment/Customer Barriers | Valuation and Financial Observations | Future Catalysts | Research Conclusion |
|---|---|---|---|---|---|---|---|---|---|
| Recursion | Phenotype + generation + platform + pipeline | Recursion OS, phenomap, proprietary pipeline | Platform + pipeline hybrid | Roche/Genentech, Sanofi; partnership receipts already realized; pipeline advancing. | Current revenue still mainly from partnerships, not software ARR; large losses. | Strong data, CRISPR phenotype maps; strong experimental loop; mid-high customer barriers. | High P/S, plenty of cash but high burn. | More milestones, pipeline clinical updates, Exscientia integration synergies. | High upside / high risk / worth continued tracking |
| Schrödinger | Computational chemistry software | Maestro, LiveDesign, physics platform | Mature software + optional pipeline | Broad software customer base; TuneLab integration with Lilly reinforces platform position. | Software revenue approaching $200 million with high gross margin; margins dragged by pipeline investment. | Strong algorithms, workflow and customer habits; experimental loop weaker than wet-lab platforms. | Market cap has pulled back, P/S below many AI-biotechs. | Software ACV recovery, partnership revenue, pipeline readouts. | High-certainty platform / worth deep study |
| Certara | Biosimulation/regulatory | Simcyp, Phoenix, etc. | Fully commercialized | 2,600+ customers, deep regulatory scenarios. | AI mainly enhances the installed platform; revenue stability is relatively strong. | High barriers in regulatory acceptance, historical databases, professional services. | Relatively low P/S, positive EBITDA, ample cash. | Software growth recovery, new-product adoption. | High certainty / valuation relatively reasonable |
| AbCellera | Antibody discovery + proprietary | Antibody platform, proprietary ABCL635/575/688/386 | Transitioning from platform to clinical biotech | 104 partner programs, two assets in the clinic. | Historical partnership revenue is volatile; proprietary investment increases losses. | Strong data + wet-lab + manufacturing platform. | Cash and securities about $534 million, supporting a longer runway. | ABCL635/575 data, more partner milestones. | Strong platform / clinical and commercialization need ongoing validation |
| Tempus AI | Clinical data/precision medicine | Data and application platform, AI diagnostics | Strong commercialization | Diagnostics, pharma, applications in a three-in-one; hospital network and Japan JV. | AI is already directly driving revenue scale, but margins are still improving. | Extremely strong clinical-molecular data and workflow barriers. | Rapid growth corresponds to a higher valuation, but not extreme. | Quality of data-business growth, path to profitability. | High growth / valuation needs digesting / worth focused study |
| Veeva | R&D and compliance industry cloud | Veeva AI Agents, Vault | High maturity | Deeply embedded in life-science workflows. | AI is more likely to lift ARPU and customer stickiness than to be a standalone new track. | Extremely high industry-process and compliance barriers. | High-quality SaaS, valuation not cheap but understandable. | AI Agents rollout pace, Clinical Data launch. | Tier A platform winner |
| IQVIA | Clinical/real-world data/CRO | IQVIA AI, RWD, site selection | High maturity | Globalized data, CRO and analytics network. | AI improves delivery efficiency and customer value; near-term financial upside is steady. | Extremely high data and execution-network barriers. | Relatively moderate valuation. | Increased pharma DCT/RWE budgets. | Tier A, high certainty, an underrated AI beneficiary |
| Thermo Fisher | Tools + clinical infra | Instruments, automation, PPD/Clario | High maturity | Expanding clinical and digitalization through M&A. | AI is a demand enhancer, not a primary revenue source. | Strong installed base, consumables and cross-selling. | A large pick-and-shovel vendor, P/S not extreme. | Clario integration and automation penetration. | Tools-type beneficiary |
| Danaher | Tools/bioprocessing | Cytiva/Beckman/software assets | High maturity | Life-science and biopharma infrastructure. | Mostly reflected in R&D and manufacturing efficiency. | Strong process and customer lock-in. | More of a defensive beneficiary. | Recovery in bioprocessing and lab-automation demand. | Indirect beneficiary / long-term allocation research subject |
| Illumina | NGS data infrastructure | Sequencing platforms | Mature | The base for precision medicine and multi-omics. | AI mainly stimulates downstream application demand. | High data/ecosystem value, but pricing and geopolitical risk exist. | More like infrastructure than a direct AI beneficiary. | New-platform penetration, policy environment. | Tier C, worth tracking but not a top pure-AI pick |
| 10x Genomics | Single-cell/spatial omics | Chromium/Visium/Xenium | Mid maturity | A high-value research data source. | AI demand transmits to demand for high-quality omics data. | High single-cell/spatial data barriers. | Financial upside affected by research budgets. | Spatial-biology adoption. | A high-upside name among pick-and-shovel vendors |
| Siemens | R&D software platform | Dotmatics/Luma | Post-M&A integration | Validates life-science software value at $5.1 billion. | Directly benefits from R&D software platform expansion. | High software platform + multimodal data barriers. | The acquisition price itself is a valuation coordinate. | Integration and cross-selling. | Watch the software-platform M&A logic |
| Generate:Biomedicines | AI protein design | Generate Platform, GB-0895, etc. | Early clinicalization | Novartis, Amgen collaborations; already public. | Current partnership value is relatively high; proprietary clinical work more decisive for valuation. | Protein generation + wet-lab + pipeline. | High volatility post-IPO, depends on clinical. | Phase III progression. | High upside / high risk / worth continued tracking |
| Insilico Medicine | AI small molecules + pipeline | Pharma.AI, ISM series | Commercialization and clinical in parallel | Lilly, Sanofi, Exelixis, Menarini, etc. | Triple structure of software/partnership/pipeline; realization higher than most private peers. | End-to-end platform, strong BD capability. | Important as a HK-listed AI-biotech sample. | More clinical data, partnership wins. | A core AI-native biotech research subject |
| WuXi AppTec | Integrated CRO/CDMO | Drug-discovery-to-manufacturing platform | Strong commercialization | Order and new-molecule business growth. | AI is reflected in efficiency and customer acquisition, not a standalone revenue item. | Integrated execution capability. | Both valuation and policy factors coexist in the China chain. | International-project recovery, AI-service upgrades. | A core China indirect-beneficiary sample |
| Tigermed | Clinical CRO/DCT | DCT, CTRM, AI translation/PV | Commercialized | DCT and clinical tech services already in commercial sales. | AI efficiency gains benefit margins and delivery. | Hospital network and execution capability. | Weaker direct AI upside. | DCT project ramp. | A China clinical-AI beneficiary, but leans toward operational improvement |

Risks, Expectation Gaps and Final Conclusions
Which companies have already fully priced in AI expectations. Judging by the current market narrative and valuation structure, Recursion, AbCellera, some high-heat AI-native biotechs, and general-model companies that still lack stable revenue are more prone to "premature capitalization of platform value," and their valuations will be more fragile if future clinical delivery falls short. Conversely, for Certara, IQVIA and some life-science R&D software assets, the secondary market has not yet fully reflected AI's potential as an efficiency and category-expansion engine.
Which companies may still have expectation gaps.
Certara: likely the most underrated, mainly because the market still sees it as "professional software + services" rather than as one of the decision platforms with the highest AI + regulatory acceptance in drug discovery.
IQVIA: its combination of RWD + CRO + AI is closer to large pharma's budget center than most "single-point AI companies."
Schrödinger: the market is more sensitive to its pipeline volatility and easily underrates the industry-gateway nature of its software platform.
Veeva: outsiders easily see it as traditional SaaS, but the embedding depth of life-science-industry-specific AI agents may let it keep winning incremental budget.
10x Genomics: if spatial/single-cell data becomes the "new base" for AI target discovery and patient stratification, its data-infrastructure value will be re-rated.
Which traditional companies are most likely to benefit. Thermo Fisher, Danaher, Illumina, 10x, IQVIA, Veeva, WuXi AppTec, Tigermed. These companies benefit not because "they will build the best models themselves" but because AI will increase demand for high-quality data, automation, compliance workflows, clinical execution networks, omics infrastructure and R&D software.
Which traditional companies may be disrupted. The most likely to face pressure are: first, low-tech services that rely on large amounts of manual screening, manual documentation and manual monitoring; second, discovery outsourcing without proprietary data and software control; third, low-end experimental processes lacking automation capability. AI does not have to replace all CROs to compress their profit pools; merely raising R&D efficiency inside pharma and at leading platforms will change pricing power.
Systemic risk checklist.
Commercialization below expectations: customers treat AI as a pilot rather than a core procurement.
Clinical failure: the platform improves early efficiency but does not improve human translation.
Platform effectiveness cannot be proven: only individual success stories exist, with no replication at scale.
Insufficient revenue durability: partnership revenue is highly dependent on milestones.
Cash burn and refinancing risk: especially for the platform + pipeline hybrid model.
Risk of large pharma building in-house and open-source substitution: especially the structure-prediction and general-model layers.
Data bias and data-sovereignty risk: constraints on RWD, EHR and cross-border data use.
Lab automation is capex-heavy and slow to deploy.
Regulatory uncertainty: the use of AI in organization-level GxP processes and regulatory filings still needs more standardization.
Final Conclusions
The importance of AI drug discovery and life-science platforms within the AI industry chain lies not in how many "star drugs" it will produce first, but in its being one of the few scenarios that can genuinely embed AI into high-margin, heavily regulated, long-cycle, decision-intensive industry workflows. It is slower than most general-purpose AI applications, but once it enters core workflows, stickiness is stronger and switching costs are higher.
The five sub-tracks most worth watching
Life-science R&D software and industry cloud
Clinical data/RWD/patient-recruitment platforms
Biosimulation and computational chemistry
Protein/antibody-design platforms with a wet-lab loop
Automated labs and R&D data infrastructure.
The ten public companies most worth deep study
Veeva, IQVIA, Certara, Schrödinger, Tempus, Recursion, Thermo Fisher, Danaher, 10x Genomics, WuXi AppTec.
The ten private/primary-market companies most worth tracking
Isomorphic Labs, Iambic, Xaira, Cradle, Benchling, BenchSci, BigHat, PathAI, EvolutionaryScale, Profluent.
The five points most easily misunderstood by the market
An AlphaFold/foundation-model breakthrough does not mean the profit pool will stay at the model layer.
AI shortening early discovery time does not mean it will equally shorten approval and time to market.
A several-billion-dollar collaboration headline does not equal real revenue.
A "platform company" without recurring revenue and customer embedding may, in the end, be just a high-valuation biotech.
Data assets and the experimental loop are usually more important than the model itself.
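The gap between a collaboration headline and realized revenue can be made concrete by dividing the disclosed upfront payment by the headline total. The sketch below uses two deals cited in the company table; the names `DEALS`, `upfront_share` and `shares` are assumptions introduced for illustration, and treating the upfront as the only near-certain cash is a simplification (milestones may or may not be paid).

```python
# (upfront, headline total potential value) in USD millions, from the company table.
DEALS = {
    "Insilico / Lilly": (115.0, 2_750.0),
    # Headline is reported as "$1 billion+", so the share below is an upper bound.
    "Generate / Novartis": (65.0, 1_000.0),
}


def upfront_share(upfront_musd: float, headline_musd: float) -> float:
    """Return the upfront payment as a fraction of the headline deal value."""
    if headline_musd <= 0:
        raise ValueError("headline value must be positive")
    return upfront_musd / headline_musd


shares = {deal: upfront_share(up, total) for deal, (up, total) in DEALS.items()}

for deal, share in shares.items():
    print(f"{deal}: upfront is about {share:.1%} of headline value")
```

On these figures the upfront is a single-digit percentage of the headline in both cases, which is why milestones triggered matter more than totals announced.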
The metrics most worth tracking over the next 6–12 months
Phase II / pivotal PoC data from AI-native biotechs
Newly triggered near-term milestones and add-on deals within large-pharma collaborations
ARR, software-revenue growth, customer retention and large-customer expansion at software platforms
Large M&A among R&D data platforms / automation platforms
Cash balance, net cash burn, and implied runway.
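Implied runway, the last metric in this list, is usually read as cash balance divided by net cash burn. A minimal sketch follows, assuming a constant annual burn; the cash figure is Recursion's "about $754 million" from the company table, while the burn scenarios are hypothetical placeholders, since this report marks cash burn as needing further validation.

```python
def implied_runway_months(cash_musd: float, annual_net_burn_musd: float) -> float:
    """Return months of runway, assuming net burn stays constant."""
    if annual_net_burn_musd <= 0:
        raise ValueError("net burn must be positive for a finite runway")
    return cash_musd / annual_net_burn_musd * 12


CASH_MUSD = 754.0  # Recursion cash, per the report's company table

# HYPOTHETICAL annual net burn scenarios in USD millions (not disclosed in the report).
hypothetical_annual_burn = [300.0, 400.0, 500.0]

runway = {burn: implied_runway_months(CASH_MUSD, burn) for burn in hypothetical_annual_burn}

for burn, months in runway.items():
    print(f"burn ${burn:.0f}M/yr -> about {months:.1f} months of runway")
```

Running the calculation across several burn levels, rather than a single point estimate, shows how sensitive refinancing timing is to spending for platform + pipeline hybrids.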
Platform-type winners: Veeva, IQVIA, Certara, Schrödinger, Benchling, Dotmatics/Siemens, Tempus.
AI-native biotech challengers: Isomorphic, Insilico, Recursion, Generate, Iambic, Xaira, AbCellera.
Pick-and-shovel vendors: Thermo Fisher, Danaher, 10x, Illumina, Opentrons, WuXi AppTec.
Pseudo-beneficiary or high-risk profiles: Companies that only showcase model capability, with no repeat customers / no clinical advancement / no data loop / no disclosure of revenue quality.
Open Questions and Limitations
This report has tried to prioritize public information as of 2026-05-19, but several items remain not fully disclosed or in need of further validation:
Most private companies lack verifiable ARR, gross margins and cash burn;
Some European, Japanese, Korean and Indian companies do not separately disclose AI-drug-research revenue contributions.
For a narrower and deeper next step, I would suggest prioritizing any one of the following directions:
AI protein design, AI small-molecule discovery, automated labs, AI clinical trials, AI + CRO, AI pharma R&D platforms, multi-omics AI, AI-native biotech valuation.
This report is based on public information and does not constitute investment advice. Markets carry risk; invest with caution.