
Training AI Scientists from First Principles

AI-literate biology meets biology-literate AI

Part 1: How We Train AI Now

Most AI systems today learn biology through the rearview mirror — trained on massive datasets like gene expression matrices, clinical annotations, or large corpora scraped from PubMed and bioRxiv. Many commercial platforms now advertise LLMs “fine-tuned on scientific literature” or “optimized for omics.”

But you wouldn’t train even the smartest postdoc this way.

No real scientist learns discovery by memorizing 10,000 papers. We read them. We wrestle with them. But discovery comes from elsewhere — from curiosity, from contradictions, from watching cells do something strange under the microscope.

Scientific literacy matters — but only if we remember what it’s for. It’s not about recitation. It’s about recognition: of principles, of patterns, of possibilities. The best papers don’t just inform — they provoke. Like hearing great music, they don’t teach you to copy — they make you want to play.

In my experience, the best scientists learn by:

I still remember how Professor Denise Monack gave me that early freedom during my postdoc. She didn’t just allow me to explore — she encouraged it. I’d spend late hours at the scope just watching infected cells, pulling patterns out of behavior. Professor Manuel Amieva, a master of confocal microscopy, taught me to see beauty in detail — not just in structure, but in change. That kind of mentorship didn’t just teach me tools — it taught me how to think.

In a really good lab, you’re not forced into a matrix of someone else’s work. You’re invited to explore the system yourself. And that’s the mindset we need — for both human and AI scientists.

Biology isn’t a language task. It’s a process. And yet we’re training machines as if success in science were about citation density and curve fitting.

The further we move into this age of data atlases and cell taxonomies, the more we risk mistaking volume for insight. Having access to every dataset doesn’t make you a scientist. Knowing what to ask when something surprises you — that’s the job.

Part 2: What If We Flipped the Frame?

Let’s imagine an AI that learns biology not from citations, but from curiosity.

Let it:

Don’t tell it what’s happening. Let it guess. Let it ask. Let it wonder.

This isn’t just about better data. It’s about better instincts:

By building an AI that rediscovers biology, we might just remind ourselves how to rediscover it too.

This is how I help teams navigate discovery in my consulting work: not by drowning in literature, but by asking better questions.

Part 3: Biology Is the Original Algorithm

Here’s what I’ve seen over 15+ years:

Biologists don’t just run protocols. We develop instincts. We learn what looks wrong under the microscope. We sense when a pattern breaks. Sometimes, our best ideas come from a piece of paper, a pen, and silence — not PubMed.

That’s the kind of mind we want to model. That’s where AI-literate biology meets biology-literate AI.

So how would we build that?

Later:

We don’t just want to teach AI what biology is. We want to let it feel its way into discovery.

Part 4: What Makes a Real Scientist?

Science is 90% failure. Or maybe 100% learning.

If we want to build an AI Scientist, we shouldn’t just teach it to optimize. We should teach it to follow surprising leads. To sit with contradictions. To embrace ambiguity.

In Brazilian Jiu-Jitsu, they say: you don’t win or lose — you win or you learn.

That’s biology. That’s discovery.

If we want machines to think like scientists, we should start by asking:

That’s how the best scientists start. And maybe that’s how our smartest machines should too.

🔁 I write about science, discovery, and curiosity at kortmannbioadvisory.com. Let me know what you’d feed an AI to make it think like a biologist.

Published: May 17, 2025

— Jens