
Training AI Scientists from First Principles

AI-literate biology meets biology-literate AI

Part 1: How We Train AI Now

Most AI systems today learn biology through the rearview mirror — trained on massive datasets like gene expression matrices, clinical annotations, or large corpora scraped from PubMed and bioRxiv. Many commercial platforms now advertise LLMs “fine-tuned on scientific literature” or “optimized for omics.”

But even the smartest postdoc wouldn’t be trained this way.

No real scientist learns discovery by memorizing 10,000 papers. We read them. We wrestle with them. But discovery comes from elsewhere — from curiosity, from contradictions, from watching cells do something strange under the microscope.

Scientific literacy matters, but only if we remember what it’s for. It’s not about recitation. It’s about recognition: of principles, of patterns, of possibilities. The best papers don’t just inform; they provoke. Like great music, they don’t teach you to copy. They make you want to play.

In my experience, the best scientists learn by:

Watching cells behave (I spent hours infecting macrophages with Salmonella and imaging them under the confocal — not for data, just for insight. That practice alone sparked more ideas than any review article ever could.)
Following surprising results (Professor Denise Monack, my postdoc advisor at Stanford, once told me: if it looks weird, look again — the best papers start that way)
Asking childlike, open questions
Making honest mistakes — and being allowed to follow them
Spotting patterns no one else noticed

I still remember how Professor Denise Monack gave me that early freedom during my postdoc. She didn’t just allow me to explore — she encouraged it. I’d spend late hours at the scope just watching infected cells, pulling patterns out of behavior. Professor Manuel Amieva, a master of confocal microscopy, taught me to see beauty in detail — not just in structure, but in change. That kind of mentorship didn’t just teach me tools — it taught me how to think.

In a really good lab, you’re not forced into a matrix of someone else’s work. You’re invited to explore the system yourself. And that’s the mindset we need — for both human and AI scientists.

Biology isn’t a language task. It’s a process. And yet, we’re training machines as if success in science is about citation density and curve fitting.

Optimization-driven learning: Models are typically designed to find correlations, not reasons. The emphasis is on prediction accuracy, not insight generation.
Few-shot or zero-shot generalization in biology is still hard: Most models can’t meaningfully reason about a new experimental setup without explicit retraining or guidance.

The further we move into this age of data atlases and cell taxonomies, the more we risk mistaking volume for insight. Having access to every dataset doesn’t make you a scientist. Knowing what to ask when something surprises you — that’s the job.

Part 2: What If We Flipped the Frame?

Let’s imagine an AI that learns biology not from citations, but from curiosity.

Let it:

Watch cells grow, divide, and die — no explanations
Observe unannotated videos of tissue development
See what changes when we perturb a system (add LPS, remove oxygen, apply pressure)
Notice structure and symmetry in RNA folding
Watch neuron–immune cell co-cultures — and infer interaction patterns without prompts

Don’t tell it what’s happening. Let it guess. Let it ask. Let it wonder.

This isn’t just about better data. It’s about better instincts:

Patterns over papers
Observation before explanation
Hypothesis as an emergent property

By building an AI that rediscovers biology, we might just remind ourselves how to rediscover it too.

This is how I help teams navigate discovery in my consulting work: not by drowning in literature, but by asking better questions.

Part 3: Biology Is the Original Algorithm

Here’s what I’ve seen over 15+ years:

Biologists don’t just run protocols. We develop instincts. We learn what looks wrong under the microscope. We sense when a pattern breaks. Sometimes, our best ideas come from a piece of paper, a pen, and silence — not PubMed.

That’s the kind of mind we want to model. That’s where AI-literate biology meets biology-literate AI.

So how would we build that?

Seed cells in culture. Record them. No labels.
Add a stimulus. Record the change. Still no labels.
Watch over time. Let the AI detect health, sickness, division, death.
Let it guess what happened. Let it build its own ontology.
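To make the last step a little more concrete, here is a minimal sketch of "let it build its own ontology": group recorded cells by how they behave over time without ever telling the model what the groups are. It rests on loud assumptions. Each cell is assumed to have already been segmented and tracked into a list of per-frame areas (real imaging pipelines need that work first). The two summary features are invented for illustration. Plain k-means stands in for a real self-supervised model. The function names and the synthetic "growing vs. shrinking" data are hypothetical, not a real dataset or anyone's published method.

```python
import math
import random
from statistics import mean

# Hypothetical illustration: each "track" is a list of per-frame cell areas
# from an unlabeled time-lapse. No labels are ever passed to the model.


def track_features(areas):
    """Summarize one track as (relative growth, relative variability).

    Both features are assumptions chosen for illustration only.
    """
    growth = (areas[-1] - areas[0]) / areas[0]
    m = mean(areas)
    spread = math.sqrt(mean((a - m) ** 2 for a in areas)) / m
    return (growth, spread)


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means, a stand-in for 'letting the AI build its own ontology'."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(mean(dim) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return labels, centers


def discover_groups(tracks, k=2):
    """Return an unnamed group index per track; naming the groups is left to a human."""
    features = [track_features(t) for t in tracks]
    labels, _ = kmeans(features, k)
    return labels


def synthetic_tracks(seed=1):
    """Toy data, not real measurements: 5 cells that grow, 5 that shrink."""
    rng = random.Random(seed)
    growing = [[100 * (1 + 0.10 * t) + rng.uniform(-2, 2) for t in range(10)] for _ in range(5)]
    shrinking = [[100 * (1 - 0.08 * t) + rng.uniform(-2, 2) for t in range(10)] for _ in range(5)]
    return growing + shrinking


if __name__ == "__main__":
    print(discover_groups(synthetic_tracks()))
```

The groups come back as bare integers, not names. That is deliberate: the model only says "these cells behave alike," and someone still has to look at each group and ask what it means.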

Later:

Feed it directed evolution data. Or data from stress-induced bacterial mutagenesis.
Supply pattern-rich, human-light data. Not Big Omics. Just small truths, time series, and change.

We don’t just want to teach AI what biology is. We want to let it feel its way into discovery.

Part 4: What Makes a Real Scientist?

Science is 90% failure. Or maybe 100% learning.

If we want to build an AI Scientist, we shouldn’t just teach it to optimize. We should teach it to follow surprising leads. To sit with contradictions. To embrace ambiguity.

In Brazilian Jiu-Jitsu, they say: you don’t win or lose — you win or you learn.

That’s biology. That’s discovery.

If we want machines to think like scientists, we should start by asking:

Can it wonder?
Can it notice?
Can it get curious before it gets trained?

That’s how the best scientists start. And maybe that’s how our smartest machines should too.

🔁 I write about science, discovery, and curiosity at kortmannbioadvisory.com. Let me know what you’d feed an AI to make it think like a biologist.

Published: May 17, 2025

— Jens