<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Peter K. Koo on Genomics x AI</title><link>https://genomicsxai.github.io/authors/peter-k.-koo/</link><description>Recent content in Peter K. Koo on Genomics x AI</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 22 May 2026 17:30:02 -0400</lastBuildDate><atom:link href="https://genomicsxai.github.io/authors/peter-k.-koo/index.xml" rel="self" type="application/rss+xml"/><item><title>Fine-tuning AlphaGenome in native JAX/Haiku</title><link>https://genomicsxai.github.io/blogs/2026-003/</link><pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate><guid>https://genomicsxai.github.io/blogs/2026-003/</guid><description>&lt;img src="https://genomicsxai.github.io/" alt="Featured image of post Fine-tuning AlphaGenome in native JAX/Haiku" /&gt;&lt;aside class="summary-box"&gt;
 &lt;h2 class="summary-box__title"&gt;Summary&lt;/h2&gt;
 &lt;div class="summary-box__body"&gt;
 &lt;p&gt;This post introduces &lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;alphagenome-ft&lt;/a&gt;, a lightweight Python package for fine-tuning &lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; using native JAX/Haiku.&lt;/p&gt;
&lt;p&gt;We highlight workflows for&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;adding custom prediction heads&lt;/li&gt;
&lt;li&gt;differing fine-tuning strategies&lt;/li&gt;
&lt;li&gt;freezing/unfreezing parameters&lt;/li&gt;
&lt;li&gt;attribution approaches&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here we focus on general workflows applicable to genome-scale assays and custom heads. For fine-tuning the encoder for short sequences such as MPRA, see this &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-002/" target="_blank" rel="noopener"
 &gt;post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code&lt;/strong&gt;:
&lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;AlphaGenome fine-tuning utilities&lt;/a&gt;&lt;/p&gt;

 &lt;/div&gt;
&lt;/aside&gt;

&lt;hr&gt;
&lt;h2 id="motivation"&gt;Motivation
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; is a foundation sequence-to-function model trained on genome-scale data. Its native JAX/Haiku implementation is powerful but can be cumbersome to modify for custom tasks (we learned this the hard way!). Researchers often want to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Train a new head on a novel assay&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Apply low-rank adapters for efficient backbone updates&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fine-tune the full model progressively&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Freeze certain components for stability&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Perform attribution analyses to gain insight into learned &lt;strong&gt;cis&lt;/strong&gt;-regulatory logic&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To help with this, we developed &lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;alphagenome-ft&lt;/a&gt;, a lightweight wrapper that supports all of the above without modifying the original AlphaGenome codebase.&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; AlphaGenome is a deep learning model that predicts functional genomic signals (e.g., accessibility, transcription, binding) directly from DNA sequence. We call such models sequence-to-function (seq2func) models.&lt;/p&gt;

 &lt;/blockquote&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note 2:&lt;/em&gt; JAX is Google&amp;rsquo;s numerical computing library and Haiku is DeepMind&amp;rsquo;s neural network library built on top of it; together they fill a role similar to PyTorch but are optimized for large-scale accelerator workloads.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h2 id="but-wait-why-fine-tune-alphagenome"&gt;But wait, why fine-tune AlphaGenome?
&lt;/h2&gt;&lt;p&gt;Foundation sequence models like AlphaGenome are trained on diverse genome-scale assays, allowing them to learn general regulatory sequence features. However, most research questions involve &lt;strong&gt;specific cell types, assays, perturbations, or organisms&lt;/strong&gt; that differ from the original training distribution.&lt;/p&gt;
&lt;p&gt;First check whether AlphaGenome already has an output track that matches, or closely resembles, your cell type and assay of interest; that might be enough! Otherwise, fine-tuning on your own cell type or assay is an option.&lt;/p&gt;
&lt;p&gt;Fine-tuning adapts the pretrained model to these new contexts while preserving the regulatory knowledge already encoded in the backbone.&lt;/p&gt;
&lt;p&gt;Benefits of fine-tuning include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Improved performance with limited data&lt;/strong&gt; - Leverages pretrained regulatory features instead of learning from scratch.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stability and efficiency via frozen parameters&lt;/strong&gt; - Freezing the backbone while training a new head reduces overfitting, lowers compute cost (this matters because AlphaGenome is a BIG model with roughly 450M parameters), and prevents catastrophic forgetting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Parameter-efficient adaptation&lt;/strong&gt; - Methods such as adapters or partial unfreezing allow targeted updates without retraining the full model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Faster experimentation cycles&lt;/strong&gt; - New assays or prediction targets can be incorporated with minimal engineering effort.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Preservation of biological priors&lt;/strong&gt; - Retains learned sequence motifs and regulatory grammar that remain relevant across assays and cell types.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, many workflows begin by training a task-specific head with the backbone frozen, then progressively unfreezing components if additional capacity is needed.&lt;/p&gt;
&lt;h2 id="fine-tuning-with-shorter-sequence-windows"&gt;Fine-tuning with shorter sequence windows
&lt;/h2&gt;&lt;p&gt;By default, AlphaGenome is trained on ~1 million base-pair (1 Mb) input sequences, allowing the model to capture long-range regulatory interactions. However, during fine-tuning you are not required to use the full 1 Mb context.&lt;/p&gt;
&lt;p&gt;If your downstream task does not depend strongly on ultra-long-range interactions, you can fine-tune using shorter input windows (e.g., 32 kb). This reduces memory usage, increases batch size flexibility, and can substantially speed up training.&lt;/p&gt;
&lt;p&gt;This is particularly useful when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The signal of interest is predominantly local (as is the case for functional outputs such as chromatin accessibility)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You are adapting to assays with shorter effective regulatory range&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You want faster experimentation cycles&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Importantly, this is different from encoder-only fine-tuning used for very short sequences (e.g., ~200–300 bp MPRA constructs). In that setting, only the convolutional encoder is used, bypassing the transformer and decoder entirely. This is covered in &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-002/" target="_blank" rel="noopener"
 &gt;another post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here, we are still using the full model stack (encoder → transformer → decoder), but operating on a reduced genomic window.&lt;/p&gt;
&lt;p&gt;In practice, reducing input length is a pragmatic trade-off between computational efficiency and long-range regulatory context (i.e. performance).&lt;/p&gt;
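&lt;p&gt;As a rough illustration of what a reduced window means in practice, the sketch below centers a fixed-width window on a locus of interest and counts the resulting output positions at 1 bp and 128 bp resolution. The helper names and the exact power-of-two window sizes (2**15 for 32 kb, 2**20 for 1 Mb) are assumptions made for this example; they are not part of alphagenome-ft.&lt;/p&gt;

```python
# Hypothetical helpers for illustration; not part of alphagenome-ft.
FULL_CONTEXT = 2 ** 20   # ~1 Mb, assumed to be exactly 1,048,576 bp
SHORT_CONTEXT = 2 ** 15  # 32 kb, assumed to be exactly 32,768 bp
BIN_SIZE = 128           # transformer embedding resolution in bp


def centered_window(center, width, chrom_length):
    """Return (start, end) of a window of `width` bp centered on `center`.

    The window is shifted, not shrunk, when it would run off either end
    of the chromosome, so the model always sees `width` bases.
    """
    if width > chrom_length:
        raise ValueError("window is longer than the chromosome")
    start = center - width // 2
    start = max(0, min(start, chrom_length - width))
    return start, start + width


def output_positions(width, resolution=1):
    """Number of output positions for a window at a given resolution."""
    if width % resolution != 0:
        raise ValueError("width must be a multiple of the resolution")
    return width // resolution


window = centered_window(center=1_000_000, width=SHORT_CONTEXT, chrom_length=50_000_000)
print(window)
print(output_positions(SHORT_CONTEXT, BIN_SIZE), output_positions(FULL_CONTEXT, BIN_SIZE))
```

&lt;p&gt;Under these assumptions a 32 kb window yields 256 bins at 128 bp resolution versus 8192 for the full 1 Mb context, which gives a feel for why memory use and runtime drop so sharply.&lt;/p&gt;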
&lt;h2 id="package-key-features"&gt;Package key features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Custom prediction heads – easily register predefined, template, or fully custom heads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flexible parameter freezing – freeze backbone, individual modules, or heads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Seamless integration – works with pretrained AlphaGenome weights&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Parameter inspection – explore and count model parameters&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attribution analysis – gradient-based or in silico mutagenesis (ISM) methods&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Native JAX/Haiku – fully compatible with original AlphaGenome pipelines&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AlphaGenome fine-tuning workflows schematic&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-003/alphagenome_ft.png" class="image-link" data-pswp-width="5209" data-pswp-height="3125"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-003/alphagenome_ft.png" width="600px" height="359"loading="lazy"
			alt="alphagenome_ft schematic"
			title="Schematic of alphagenome-ft. alphagenome-ft enables fine-tuning of AlphaGenome (architecture shown) from different, modular stages of the model (the encoder - for short sequences, the transformer - for 128 base-pair resolution, and the decoder - 1 base-pair resolution). You can control what parts of the model are frozen or free to update and you can calculate attributions, all in native JAX/Haiku." data-title-escaped="Schematic of alphagenome-ft. alphagenome-ft enables fine-tuning of AlphaGenome (architecture shown) from different, modular stages of the model (the encoder - for short sequences, the transformer - for 128 base-pair resolution, and the decoder - 1 base-pair resolution). You can control what parts of the model are frozen or free to update and you can calculate attributions, all in native JAX/Haiku."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;hr&gt;
&lt;h2 id="usage"&gt;Usage
&lt;/h2&gt;&lt;p&gt;If these features don&amp;rsquo;t win you over, let&amp;rsquo;s walk through how easy it is to use:&lt;/p&gt;
&lt;h3 id="installation"&gt;Installation
&lt;/h3&gt;&lt;p&gt;alphagenome-ft wraps AlphaGenome and AlphaGenome Research and is available through &lt;a class="link" href="https://pypi.org/project/alphagenome/" target="_blank" rel="noopener"
 &gt;pip&lt;/a&gt;. Installation requires two steps:&lt;/p&gt;
&lt;div class="code-block" data-lang="python"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Step 1: Install AlphaGenome and Research&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;deepmind&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;alphagenome&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;deepmind&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;alphagenome_research&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Step 2: Install alphagenome-ft&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;alphagenome&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ft&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Python ≥ 3.11 is required. All other dependencies (JAX, Haiku, optax, etc.) are handled automatically.&lt;/p&gt;
&lt;h3 id="quick-start-adding-new-heads"&gt;Quick Start: Adding new heads
&lt;/h3&gt;&lt;p&gt;There are two main ways to add heads to AlphaGenome with the package (see the figure above for architecture references):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Predefined heads&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Use existing AlphaGenome head types, e.g., rna_seq, atac, chip_tf:&lt;/p&gt;
&lt;div class="code-block" data-lang="python"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome_ft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;get_predefined_head_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;register_predefined_head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;create_model_with_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;rna_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_predefined_head_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;rna_seq&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_tracks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;register_predefined_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;K562_rna_seq&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rna_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_model_with_heads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;all_folds&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;K562_rna_seq&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freeze_except_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;K562_rna_seq&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;#Now ready to train!&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Custom heads and reference templates&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our template heads give guidance on accessing different embeddings, which correspond to different biological resolutions - base-pair (bp) precision, regional regulatory context, and short-sequence feature extraction:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;StandardHead – 1bp embeddings (decoder output)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TransformerHead – 128bp embeddings (transformer output)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;EncoderOnlyHead – CNN encoder output, &amp;lt;1 kb sequences (encoder output)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Template heads are a guide to setting up your own custom head rather than a definitive &amp;lsquo;best&amp;rsquo;/&amp;lsquo;standard&amp;rsquo; option. You should update them with your own layer and loss function choices to fit your data.&lt;/p&gt;
&lt;div class="code-block" data-lang="python"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome_ft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CustomHeadConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CustomHeadType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;register_custom_head&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;register_custom_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s1"&gt;&amp;#39;my_head&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StandardHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;CustomHeadConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CustomHeadType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GENOME_TRACKS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;rna_seq&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;num_tracks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;hr&gt;
&lt;h3 id="workflows"&gt;Workflows
&lt;/h3&gt;&lt;p&gt;Four complete workflows are provided in our &lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;GitHub repository&lt;/a&gt;, covering:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Heads-only fine-tuning (frozen backbone)&lt;/li&gt;
&lt;li&gt;LoRA-style adapters (parameter-efficient fine-tuning)&lt;/li&gt;
&lt;li&gt;Full-model fine-tuning&lt;/li&gt;
&lt;li&gt;Encoder-only (MPRA / short sequences)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the dedicated &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-002/" target="_blank" rel="noopener"
 &gt;MPRA post&lt;/a&gt; for a full walkthrough of encoder-only fine-tuning (it&amp;rsquo;s great, even though I may be slightly biased as the one who wrote it &amp;hellip;).&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; &lt;a class="link" href="https://arxiv.org/abs/2106.09685" target="_blank" rel="noopener"
 &gt;Low-rank adapters (LoRA)&lt;/a&gt; enable parameter-efficient fine-tuning by learning small update matrices instead of modifying the full backbone.&lt;/p&gt;

 &lt;/blockquote&gt;
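&lt;p&gt;To make the parameter savings concrete, here is a minimal pure-Python sketch of a low-rank update for a single weight matrix. The frozen weight W is left untouched and only two small matrices A and B are trained, giving an effective weight W + (alpha / rank) · B A as described in the LoRA paper. The layer sizes, rank, and function names are illustrative assumptions; this is not how alphagenome-ft implements its adapters.&lt;/p&gt;

```python
import random


def lora_param_counts(d_in, d_out, rank):
    """Trainable parameter counts for full fine-tuning vs a rank-`rank` adapter."""
    full = d_in * d_out
    adapter = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, adapter


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / rank) * B @ A without modifying the frozen W."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy sizes (assumed for illustration only).
d_in, d_out, rank = 8, 6, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, so the update starts at zero

full, adapter = lora_param_counts(d_in, d_out, rank)
print(f"full: {full} parameters, adapter: {adapter} parameters")
```

&lt;p&gt;The saving grows with layer size: for a 1024 x 1024 matrix and rank 8, the adapter trains 16,384 values instead of 1,048,576.&lt;/p&gt;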
&lt;p&gt;If unsure where to start, we recommend training a task-specific head with the backbone frozen, then progressively unfreezing components if additional capacity is needed.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Some extra functionality you might be interested in:&lt;/p&gt;
&lt;h3 id="parameter-management-and-checkpoints"&gt;Parameter management and checkpoints
&lt;/h3&gt;&lt;p&gt;alphagenome-ft allows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modular freezing: encoder, transformer, decoder&lt;/li&gt;
&lt;li&gt;Freezing all heads except one: &lt;code&gt;model.freeze_except_head('my_head')&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Saving checkpoints (heads-only or full model)&lt;/li&gt;
&lt;li&gt;Loading with custom head registration&lt;/li&gt;
&lt;/ul&gt;
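&lt;p&gt;Conceptually, freezing amounts to splitting the parameter tree into a trainable part and a frozen part, and only handing the former to the optimizer. The sketch below shows that idea on a flat dictionary keyed by module path. The module names, shapes, and the &lt;code&gt;partition_params&lt;/code&gt; helper are hypothetical and do not reflect alphagenome-ft internals.&lt;/p&gt;

```python
from math import prod

# Hypothetical parameter shapes keyed by module path (illustration only).
param_shapes = {
    "encoder/conv_0/w": (15, 4, 768),
    "transformer/block_0/attention/w": (1536, 1536),
    "decoder/conv_0/w": (5, 1536, 768),
    "head/my_head/linear/w": (768, 1),
    "head/my_head/linear/b": (1,),
    "head/other_head/linear/w": (768, 4),
}


def partition_params(shapes, is_trainable):
    """Split a flat parameter dict into (trainable, frozen) by a predicate on the path."""
    trainable = {k: v for k, v in shapes.items() if is_trainable(k)}
    frozen = {k: v for k, v in shapes.items() if not is_trainable(k)}
    return trainable, frozen


def count_params(shapes):
    """Total number of scalar parameters across all arrays."""
    return sum(prod(shape) for shape in shapes.values())


# Equivalent in spirit to freezing everything except one head.
trainable, frozen = partition_params(
    param_shapes, lambda path: path.startswith("head/my_head/")
)
print(sorted(trainable))
print(count_params(trainable), "trainable of", count_params(param_shapes))
```

&lt;p&gt;Swapping the predicate (for example, matching paths that start with &lt;code&gt;decoder/&lt;/code&gt;) gives the same kind of split for progressive unfreezing.&lt;/p&gt;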
&lt;h2 id="attribution-analysis"&gt;Attribution analysis
&lt;/h2&gt;&lt;p&gt;After training, alphagenome-ft also supports:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DeepSHAP-like attributions - using dinucleotide shuffled reference sequences&lt;/li&gt;
&lt;li&gt;Gradient × Input&lt;/li&gt;
&lt;li&gt;Gradient&lt;/li&gt;
&lt;li&gt;In silico mutagenesis (ISM)&lt;/li&gt;
&lt;/ul&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; Attribution methods highlight which nucleotides drive predictions in these models, helping reveal regulatory motifs and sequence grammar learned by the model.&lt;/p&gt;

 &lt;/blockquote&gt;
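&lt;p&gt;For intuition, in silico mutagenesis can be sketched in a few lines: substitute every position with each alternative base and record how much the prediction changes relative to the reference. The &lt;code&gt;toy_model&lt;/code&gt; below is a made-up stand-in that simply counts matches to the AP-1 consensus; a real analysis would call the fine-tuned model instead.&lt;/p&gt;

```python
BASES = "ACGT"
AP1_SITES = ("TGACTCA", "TGAGTCA")  # TGAsTCA with s = C or G


def toy_model(seq):
    """Stand-in predictor (hypothetical): number of AP-1-like sites in `seq`."""
    k = len(AP1_SITES[0])
    return float(sum(seq[i:i + k] in AP1_SITES for i in range(len(seq) - k + 1)))


def in_silico_mutagenesis(seq, predict):
    """Return a list of {base: delta} dicts, one per position.

    delta = predict(mutant) - predict(reference); the reference base is
    included with a delta of 0.0 so every position has four entries.
    """
    ref_score = predict(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for base in BASES:
            if base == ref_base:
                row[base] = 0.0
            else:
                mutant = seq[:i] + base + seq[i + 1:]
                row[base] = predict(mutant) - ref_score
        effects.append(row)
    return effects


seq = "ACGTTGACTCAGGCA"
effects = in_silico_mutagenesis(seq, toy_model)
# Summarise each position by its most damaging substitution.
worst = [min(row.values()) for row in effects]
print(worst)
```

&lt;p&gt;ISM needs three extra forward passes per position, so it is far more expensive than gradient-based methods on long inputs; that cost is the usual reason to prefer gradients for genome-scale windows.&lt;/p&gt;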
&lt;p&gt;You can also visualise contributions with the &lt;code&gt;plot_attribution_map&lt;/code&gt; or &lt;code&gt;plot_sequence_logo&lt;/code&gt; functions! Below is an example attribution map from fine-tuning AlphaGenome&amp;rsquo;s encoder on &lt;a class="link" href="https://www.nature.com/articles/s41588-022-01048-5" target="_blank" rel="noopener"
 &gt;fly STARR-seq data&lt;/a&gt;; see our &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-002/" target="_blank" rel="noopener"
 &gt;MPRA post&lt;/a&gt; for more details.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-003/sequence_logo_gradient_x_input.png" class="image-link" data-pswp-width="2982" data-pswp-height="430"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-003/sequence_logo_gradient_x_input.png" width="900px" height="129"loading="lazy"
			alt="attribution map"
			title="Gradient x Input attribution map for AlphaGenome encoder only fine-tuning on [fly STARR-seq data](https://www.nature.com/articles/s41588-022-01048-5) - See our [MPRA post](https://genomicsxai.github.io/blogs/2026-002/) for more details on this fine-tuning. We can see the model highlights an AP-1 motif around position ~230, consistent with known enhancer regulatory logic." data-title-escaped="Gradient x Input attribution map for AlphaGenome encoder only fine-tuning on [fly STARR-seq data](https://www.nature.com/articles/s41588-022-01048-5) - See our [MPRA post](https://genomicsxai.github.io/blogs/2026-002/) for more details on this fine-tuning. We can see the model highlights an AP-1 motif around position ~230, consistent with known enhancer regulatory logic."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;The map recovers the AP-1 motif (&lt;code&gt;TGAsTCA&lt;/code&gt;) at roughly position 230; AP-1 is a known regulator of &lt;a class="link" href="https://www.nature.com/articles/s41588-022-01048-5/figures/2" target="_blank" rel="noopener"
 &gt;developmental genes in flies&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="implications"&gt;Implications
&lt;/h2&gt;&lt;p&gt;To take a step back, what do we get with alphagenome-ft? AlphaGenome becomes flexible enough for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rapid adaptation to new tasks&lt;/li&gt;
&lt;li&gt;Modular freezing/unfreezing for stability&lt;/li&gt;
&lt;li&gt;Genome-scale or perturbation assays&lt;/li&gt;
&lt;li&gt;Downstream interpretability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AlphaGenome can now be adapted as easily as modern vision and language foundation models — opening the door to rapid regulatory genomics experimentation. So if you think AlphaGenome could be useful if applied to your research, take a look at our package!&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;alphagenome-ft brings foundation-model-style transfer learning workflows to regulatory genomics.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="compute-requirements"&gt;Compute requirements
&lt;/h2&gt;&lt;p&gt;A full analysis of the runtime requirements for fine-tuning AlphaGenome is given in &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-005/" target="_blank" rel="noopener"
 &gt;a separate blog post&lt;/a&gt;, but in short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tuning with a frozen backbone (i.e. head-only fine-tuning) fits on a single mid-range GPU, peaking at around 14-27 GB of VRAM (batch size 1).&lt;/li&gt;
&lt;li&gt;Full fine-tuning needs at least 76.1 GB of VRAM, so an H100/H200 GPU.&lt;/li&gt;
&lt;/ul&gt;
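&lt;p&gt;A back-of-the-envelope calculation helps put these numbers in context. Assuming roughly 450M parameters (the figure quoted in this post), 32-bit floats, and an Adam-style optimizer that keeps two moment estimates per trainable parameter, the weights, gradients, and optimizer state alone account for only a few GB. This suggests that most of the measured footprint comes from elsewhere, chiefly activations, which grow with sequence length and batch size. The dtype, optimizer, and head size below are assumptions, not details confirmed by the benchmark.&lt;/p&gt;

```python
def training_state_gb(n_params, n_trainable, bytes_per_value=4, optimizer_slots=2):
    """Rough memory (GB) for weights, gradients and optimizer state.

    Activations are deliberately excluded. `optimizer_slots=2` models an
    Adam-style optimizer with two moment buffers per trainable parameter.
    """
    weights = n_params * bytes_per_value
    grads = n_trainable * bytes_per_value
    opt_state = n_trainable * bytes_per_value * optimizer_slots
    return (weights + grads + opt_state) / 1e9


N_PARAMS = 450_000_000   # approximate size quoted in this post
HEAD_PARAMS = 1_000_000  # hypothetical size for a small custom head

full = training_state_gb(N_PARAMS, n_trainable=N_PARAMS)
head_only = training_state_gb(N_PARAMS + HEAD_PARAMS, n_trainable=HEAD_PARAMS)
print(f"full fine-tuning: {full:.1f} GB, head-only: {head_only:.1f} GB (excluding activations)")
```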
&lt;hr&gt;
&lt;h2 id="code-and-tutorials"&gt;Code and tutorials
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;Source code &amp;amp; utilities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Colab notebooks: &lt;a class="link" href="https://colab.research.google.com/github/genomicsxai/alphagenome_ft/blob/main/notebooks/finetune_encoder_only_mpra.ipynb" target="_blank" rel="noopener"
 &gt;Encoder Fine-tuning (MPRA)&lt;/a&gt; | &lt;a class="link" href="https://colab.research.google.com/github/genomicsxai/alphagenome_ft/blob/main/notebooks/finetune_rna_head_only.ipynb" target="_blank" rel="noopener"
 &gt;Heads-only Fine-tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-005/" target="_blank" rel="noopener"
 &gt;Benchmarking AlphaGenome on NVIDIA GPUs: latency, memory, and feasibility across sequence lengths&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AlphaGenome&lt;/strong&gt; is a powerful sequence-to-function foundation model, but adapting it natively in JAX/Haiku can be cumbersome.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;alphagenome-ft&lt;/strong&gt; provides a lightweight wrapper for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;adding custom or predefined prediction heads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;freezing and unfreezing specific modules&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;parameter-efficient fine-tuning (e.g., adapters)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;running attribution analyses&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most workflows can start by training a task-specific head with the backbone frozen, then progressively unfreezing if needed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This enables rapid adaptation to new assays while preserving pretrained regulatory knowledge.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to fine-tune AlphaGenome without modifying its core codebase, alphagenome-ft is designed to make that process modular, efficient, and reproducible!&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Avsec, Ž. et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature 649 (2026).&lt;/li&gt;
&lt;li&gt;Alan Murphy, Peter Koo. &amp;ldquo;Adapting AlphaGenome to MPRA data.&amp;rdquo; Genomics × AI Blog, 20 February 2026. &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-002/" target="_blank" rel="noopener"
 &gt;https://genomicsxai.github.io/blogs/2026-002/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hu, E. J. et al. LoRA: Low-rank adaptation of large language models (2021), &lt;a class="link" href="https://arxiv.org/abs/2106.09685" target="_blank" rel="noopener"
 &gt;https://arxiv.org/abs/2106.09685&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;de Almeida, B. P., Reiter, F., Pagani, M. &amp;amp; Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54 (2022).&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Adapting AlphaGenome to MPRA data</title><link>https://genomicsxai.github.io/blogs/2026-002/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate><guid>https://genomicsxai.github.io/blogs/2026-002/</guid><description>&lt;img src="https://genomicsxai.github.io/" alt="Featured image of post Adapting AlphaGenome to MPRA data" /&gt;&lt;aside class="summary-box"&gt;
 &lt;h2 class="summary-box__title"&gt;Summary&lt;/h2&gt;
 &lt;div class="summary-box__body"&gt;
 &lt;p&gt;This post provides a high-level overview of how to use the &lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; and &lt;a class="link" href="https://www.nature.com/articles/s41592-021-01252-x" target="_blank" rel="noopener"
 &gt;Enformer&lt;/a&gt; repositories to extract modular convolutional encoders for short sequences — including links to the GitHub repositories — and summarises the results we achieved on perturbation assays.&lt;/p&gt;
&lt;p&gt;Foundation sequence-to-function models like AlphaGenome and Enformer are trained on ~1 Mb genomic windows to predict thousands of regulatory tracks. We show that their most transferable component is the convolutional encoder that learns local cis-regulatory grammar.&lt;/p&gt;
&lt;p&gt;By extracting this encoder from the long-range transformer and decoder modules, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;achieve state-of-the-art performance on &lt;a class="link" href="https://www.nature.com/articles/s41586-024-08430-9" target="_blank" rel="noopener"
 &gt;lentiMPRA&lt;/a&gt;, &lt;a class="link" href="https://www.nature.com/articles/s41588-022-01048-5" target="_blank" rel="noopener"
 &gt;STARR-seq&lt;/a&gt;, and &lt;a class="link" href="http://www.genomeinterpretation.org/cagi5-regulation-saturation.html" target="_blank" rel="noopener"
 &gt;CAGI5&lt;/a&gt; benchmarks&lt;/li&gt;
&lt;li&gt;reduce inference cost by ~500×&lt;/li&gt;
&lt;li&gt;generalise across assays, species, and architectures&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This reframes foundation genomics models as modular regulatory representation engines, reusable for short perturbation sequences (100–300 bp) and regulatory design workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt;
&lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;AlphaGenome fine-tuning utilities&lt;/a&gt; |
&lt;a class="link" href="https://github.com/Al-Murphy/alphagenome_FT_MPRA" target="_blank" rel="noopener"
 &gt;Full analysis and experiments&lt;/a&gt;&lt;/p&gt;

 &lt;/div&gt;
&lt;/aside&gt;

&lt;hr&gt;
&lt;h2 id="motivation"&gt;Motivation
&lt;/h2&gt;&lt;p&gt;Foundation-scale sequence-to-function models have rapidly advanced regulatory genomics. Architectures like &lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; and &lt;a class="link" href="https://www.nature.com/articles/s41592-021-01252-x" target="_blank" rel="noopener"
 &gt;Enformer&lt;/a&gt; predict thousands of regulatory tracks across large genomic contexts and achieve impressive genome-wide accuracy (hence the term generalists).&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; sequence-to-function (seq2func) models learn a direct mapping from DNA sequence to one or more experimentally measured molecular readouts from assays such as chromatin accessibility, transcription factor binding, or gene expression.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;These models also keep growing in parameter count, receptive field, and number of predicted tasks. If you&amp;rsquo;re skeptical, just look at this selection of recent models:&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-002/generalists_genomic_ai_recep_field_tasks_params_bp_res_tasks.png" class="image-link" data-pswp-width="8067" data-pswp-height="5031"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-002/generalists_genomic_ai_recep_field_tasks_params_bp_res_tasks.png" width="600px" height="374" loading="lazy"
			alt="The Landscape of seq2func models by genomic receptive field and task breadth"
			title="The Landscape of seq2func models by genomic receptive field and task breadth. Shown is the number of prediction tasks versus the input receptive field for representative generalist seq2func models. Marker size is proportional to the reported parameter count. A red marker edge indicates models that produce base-pair–aligned predictions." data-title-escaped="The Landscape of seq2func models by genomic receptive field and task breadth. Shown is the number of prediction tasks versus the input receptive field for representative generalist seq2func models. Marker size is proportional to the reported parameter count. A red marker edge indicates models that produce base-pair–aligned predictions."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;But many real experimental workflows don’t look like the genome. Perturbation assays, including MPRAs, enhancer design screens, and synthetic element optimisation, evaluate short (~100–300 bp) sequences outside their native context. Applying megabase-scale predictors to such data introduces unnecessary padding, compute overhead, and arbitrary assumptions about flanking sequence, none of which is satisfactory.&lt;/p&gt;
&lt;p&gt;We asked a simple question:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;What if we treated these models as reusable regulatory feature extractors instead of end-to-end predictors?&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="the-key-idea-modular-regulatory-encoders"&gt;The key idea: modular regulatory encoders
&lt;/h2&gt;&lt;p&gt;Modern seq2func models like AlphaGenome can be decomposed into three functional components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Sequence encoder - learns motifs, spacing rules, and local regulatory syntax (e.g. convolutions and pooling)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Long-range context module - models distal regulatory dependencies (e.g. transformers)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Task decoder - predicts assay-specific outputs&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For short perturbation sequences assayed in isolation — such as MPRA constructs that test &lt;em&gt;cis&lt;/em&gt;-regulatory activity outside their native chromosomal context — long-range genomic interactions are largely absent, so distal context modeling is often unnecessary. The encoder, however, contains rich regulatory representations learned from genome-scale supervision. We extract and reuse this encoder - see the image below:&lt;/p&gt;
&lt;p&gt;&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-002/modular_generalists_manuscript.png" class="image-link" data-pswp-width="8855" data-pswp-height="3125"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-002/modular_generalists_manuscript.png" width="1000px" height="352" loading="lazy"
			alt="Generalist seq2func models as modular regulatory encoders"
			title="Generalist seq2func models as modular regulatory encoders. Left, AlphaGenome&amp;#39;s U-Net architecture with encoder, long-range context integration (transformer), and decoder modules. Right, proposed modular view in which the pretrained encoder is extracted as a reusable cis-regulatory representation module and fine-tuned on short, variable-length perturbation sequences such as MPRA constructs, while the transformer and decoder remain in the full stack for tasks requiring long-range context." data-title-escaped="Generalist seq2func models as modular regulatory encoders. Left, AlphaGenome&amp;amp;#39;s U-Net architecture with encoder, long-range context integration (transformer), and decoder modules. Right, proposed modular view in which the pretrained encoder is extracted as a reusable cis-regulatory representation module and fine-tuned on short, variable-length perturbation sequences such as MPRA constructs, while the transformer and decoder remain in the full stack for tasks requiring long-range context."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;strong&gt;Encoder intuition&lt;/strong&gt; - In these models, the encoder progressively downsamples the input sequence through convolution and pooling operations, similar to how image CNNs compress spatial resolution while increasing feature richness. As a result, the encoder outputs a sequence of embeddings where each position summarises regulatory features over a window of roughly 128 bp rather than a single nucleotide. This resolution is sufficient to capture motif combinations and local regulatory syntax while keeping representations compact and computationally efficient.&lt;/p&gt;

 &lt;/blockquote&gt;
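&lt;p&gt;To make the resolution arithmetic concrete, here is a toy NumPy sketch. It assumes, purely for illustration, that the 128-fold downsampling comes from seven rounds of 2x max pooling. The function names &lt;code&gt;max_pool_1d&lt;/code&gt; and &lt;code&gt;toy_encoder&lt;/code&gt; are hypothetical and the convolutions are left out, so this is not the actual AlphaGenome encoder, only a picture of how sequence length shrinks.&lt;/p&gt;

```python
import numpy as np


def max_pool_1d(x, width=2):
    """Non-overlapping max pooling along the position axis of an (L, C) array."""
    length, channels = x.shape
    usable = (length // width) * width  # drop a trailing remainder, if any
    return x[:usable].reshape(usable // width, width, channels).max(axis=1)


def toy_encoder(onehot, num_stages=7):
    """Stand-in for the conv + pool stages: each stage halves the length.

    Seven halvings give a 128-fold reduction (2 ** 7 == 128). This is an
    illustrative assumption, not the real architecture.
    """
    x = onehot.astype(np.float32)
    for _ in range(num_stages):
        x = max_pool_1d(x, width=2)
    return x


rng = np.random.default_rng(0)
seq_len = 256                                          # a short MPRA-sized input
onehot = np.eye(4)[rng.integers(0, 4, size=seq_len)]   # shape (256, 4)
embeddings = toy_encoder(onehot)
print(onehot.shape, embeddings.shape)                  # (256, 4) (2, 4)
```

&lt;p&gt;Under this assumption a 256 bp construct yields just two embedding positions, which is why pooling or flattening them for a small head is cheap.&lt;/p&gt;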
&lt;p&gt;Although AlphaGenome was trained on ~1 megabase genomic windows, we show that its convolutional encoder can be repurposed for much shorter sequences. This reflects a division of labor within the architecture: the encoder captures local regulatory grammar, while the transformer and decoder handle long-range integration and base-resolution track prediction. By isolating the encoder, we retain the reusable representation module while discarding machinery designed for distal genomic context. That is exactly what is needed for MPRAs and other tasks centered on local regulatory activity, such as chromatin accessibility prediction.&lt;/p&gt;
&lt;h3 id="what-we-do"&gt;What we do:
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;isolate the convolutional encoder&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;adapt positional handling for short inputs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pool encoder embeddings&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;attach a lightweight regression head&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;optionally fine-tune the encoder or keep it frozen&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This allows direct training on short sequences while preserving pretrained regulatory features! We applied this to AlphaGenome and Enformer (the latter to show that the approach generalises).&lt;/p&gt;
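&lt;p&gt;The pooling and head steps in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the package implementation: &lt;code&gt;init_mlp&lt;/code&gt; and &lt;code&gt;head_forward&lt;/code&gt; are hypothetical names, the layer widths are toy values, and random arrays stand in for real encoder embeddings.&lt;/p&gt;

```python
import numpy as np


def init_mlp(rng, sizes):
    """Random weights for a small MLP; `sizes` lists layer widths, input first."""
    return [
        (rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def head_forward(params, embeddings, pooling="flatten"):
    """Map encoder embeddings of shape (B, T, C) to one scalar per sequence."""
    if pooling == "flatten":
        x = embeddings.reshape(embeddings.shape[0], -1)  # (B, T * C)
    elif pooling == "mean":
        x = embeddings.mean(axis=1)                      # (B, C)
    else:
        raise ValueError(f"unknown pooling: {pooling}")
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i != len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x[:, 0]


rng = np.random.default_rng(0)
batch, positions, channels = 8, 2, 16  # toy sizes, not AlphaGenome's
embeddings = rng.normal(size=(batch, positions, channels))
params = init_mlp(rng, [positions * channels, 32, 1])
preds = head_forward(params, embeddings)
print(preds.shape)  # (8,)
```

&lt;p&gt;In this sketch, flattening keeps positional information but ties the head to a fixed number of embedding positions, whereas mean pooling accepts any length at the cost of discarding position.&lt;/p&gt;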
&lt;hr&gt;
&lt;h2 id="why-this-helps"&gt;Why this helps
&lt;/h2&gt;&lt;h3 id="practical-advantages"&gt;Practical advantages:
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;supports variable-length inputs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;removes megabase padding overhead&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;standardises comparisons across architectures&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;dramatically reduces inference cost: in our testing, the encoder-only model ran roughly 500-fold faster than full AlphaGenome&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="conceptual-advantage"&gt;Conceptual advantage:
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;separates regulatory representation learning from task-specific prediction&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="performance-on-mpra-and-starr-seq"&gt;Performance on MPRA and STARR-seq
&lt;/h2&gt;&lt;p&gt;Before I get into the how of doing this, let me convince you that it&amp;rsquo;s worthwhile — we evaluated modular encoders on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a class="link" href="https://www.nature.com/articles/s41586-024-08430-9" target="_blank" rel="noopener"
 &gt;lentiMPRA&lt;/a&gt; constructs (HepG2, K562, WTC11)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a class="link" href="https://www.nature.com/articles/s41588-022-01048-5" target="_blank" rel="noopener"
 &gt;STARR-seq&lt;/a&gt; enhancer activity in Drosophila&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;achieved state-of-the-art accuracy on both tasks (subplots a-b below)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AlphaGenome encoder probing remained strong across species&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enformer benefited more from fine-tuning — perhaps its encoder learned less cis-regulatory logic&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AlphaGenome required minimal adaptation, as its pretrained encoder already captures transferable signal&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; probing means the pretrained encoder is frozen and only the added head is updated, whereas fine-tuning updates everything (encoder and head).&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;This supports the idea that genome-scale training learns reusable regulatory structure. The performance results:&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-002/lenti_starr_res.png" class="image-link" data-pswp-width="6303" data-pswp-height="2761"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-002/lenti_starr_res.png" width="900px" height="394" loading="lazy"
			alt="Benchmark on lentiMPRA and STARR-seq"
			title="Benchmark on lentiMPRA and STARR-seq. Test-set Pearson correlation for (left) lentiMPRA and (right) STARR-seq. We compared against best-in-class models [MPRALegNet](https://www.nature.com/articles/s41586-024-08430-9), [DeepSTARR](https://www.nature.com/articles/s41588-022-01048-5), [DREAM-RNN](https://www.nature.com/articles/s41587-024-02414-w), and AlphaGenome (AG). We applied encoder extraction and fine-tuning to Enformer (Enf. MPRA) and AlphaGenome (AG MPRA), evaluated with probing (head-only) or encoder fine-tuning." data-title-escaped="Benchmark on lentiMPRA and STARR-seq. Test-set Pearson correlation for (left) lentiMPRA and (right) STARR-seq. We compared against best-in-class models [MPRALegNet](https://www.nature.com/articles/s41586-024-08430-9), [DeepSTARR](https://www.nature.com/articles/s41588-022-01048-5), [DREAM-RNN](https://www.nature.com/articles/s41587-024-02414-w), and AlphaGenome (AG). We applied encoder extraction and fine-tuning to Enformer (Enf. MPRA) and AlphaGenome (AG MPRA), evaluated with probing (head-only) or encoder fine-tuning."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;hr&gt;
&lt;h2 id="what-matters-when-adapting-encoders"&gt;What matters when adapting encoders?
&lt;/h2&gt;&lt;p&gt;To understand which design choices matter most, we ran a hyperparameter sweep, which revealed the following:&lt;/p&gt;
&lt;h3 id="most-important-choices"&gt;Most important choices
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;deeper MLP heads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;flattening encoder embeddings&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="less-important-choices"&gt;Less important choices
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;optimiser choice&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;weight decay&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;learning rate schedule&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Progressive unfreezing also provided modest gains, with a benefit from earlier encoder updates. The results of this sweep are at the end of the post. Note that we used the sweep as a starting point for an iterative greedy search over hyperparameters to find a local optimum for each lentiMPRA cell line.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="transfer-to-regulatory-variant-prediction-cagi5"&gt;Transfer to regulatory variant prediction (CAGI5)
&lt;/h2&gt;&lt;p&gt;We next evaluated all models on the &lt;a class="link" href="http://www.genomeinterpretation.org/cagi5-regulation-saturation.html" target="_blank" rel="noopener"
 &gt;CAGI5 benchmark&lt;/a&gt;, which provides experimentally measured effects of thousands of regulatory variants, making it a standard test of how well models predict functional impact beyond the training assay.&lt;/p&gt;
&lt;p&gt;Key findings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;MPRA fine-tuning improved performance (using matched cell types with lentiMPRA models)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;frozen encoder probing generalised better out-of-distribution&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;task-specific fine-tuning can introduce assay bias: full fine-tuning, unlike probing, led the models to overfit the lentiMPRA data and so perform worse on CAGI5&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;a smaller aggregation window improved pretrained AlphaGenome&amp;rsquo;s performance (more on this below)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This may reflect a trade-off between specialisation and generalisation, although better regularisation might keep the overfitting in check even with the larger number of free parameters. The results:&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-002/cagi5_augmentation_comparison.png" class="image-link" data-pswp-width="17962" data-pswp-height="5721"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-002/cagi5_augmentation_comparison.png" width="900px" height="286" loading="lazy"
			alt="Zero-shot CAGI5 performance for HepG2 and K562 variants"
			title="Zero-shot CAGI5 performance for HepG2 and K562 variants; right, high-confidence SNP subset. Dark blue denotes a single prediction per variant whereas light blue is random shift and reverse complement augmentation. We compare against MPRALegNet and AlphaGenome (AG). We applied encoder extraction and fine-tuning to Enformer (Enf. MPRA) and AlphaGenome (AG MPRA), evaluated with probing (head-only) or encoder fine-tuning." data-title-escaped="Zero-shot CAGI5 performance for HepG2 and K562 variants; right, high-confidence SNP subset. Dark blue denotes a single prediction per variant whereas light blue is random shift and reverse complement augmentation. We compare against MPRALegNet and AlphaGenome (AG). We applied encoder extraction and fine-tuning to Enformer (Enf. MPRA) and AlphaGenome (AG MPRA), evaluated with probing (head-only) or encoder fine-tuning."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;hr&gt;
&lt;h3 id="a-technical-note-improving-base-alphagenomes-performance-on-cagi5"&gt;A Technical Note: Improving Base AlphaGenome&amp;rsquo;s Performance on CAGI5
&lt;/h3&gt;&lt;p&gt;When we tested against the AlphaGenome model before any fine-tuning on MPRA data, we noticed something interesting.&lt;/p&gt;
&lt;p&gt;Aligning the aggregation window for the chromatin accessibility tracks (DNase HepG2 and K562) with the size of the MPRA assay (central 384 bp) improved zero-shot prediction by 25% relative to AlphaGenome’s original protocol (central 501 bp)!&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-002/cagi5_central_mask_comparison.png" class="image-link" data-pswp-width="13442" data-pswp-height="5721"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-002/cagi5_central_mask_comparison.png" width="700px" height="297" loading="lazy"
			alt="Central aggregation approach AlphaGenome"
			title="Differing AlphaGenome&amp;#39;s mask size for CAGI5 benchmark on HepG2 and K562 variants; right, high-confidence SNP subset. Pretrained AlphaGenome performance when using our approach of aggregating the central 384 base-pairs for DNase HepG2 and K562 tracks versus the protocol outlined in AlphaGenome&amp;#39;s original publication (central 501 base-pairs). The smaller window led to much improved performance but still below that after fine-tuning on MPRA data (our approach). Performance is measured as Pearson correlation between predicted and observed activity." data-title-escaped="Differing AlphaGenome&amp;amp;#39;s mask size for CAGI5 benchmark on HepG2 and K562 variants; right, high-confidence SNP subset. Pretrained AlphaGenome performance when using our approach of aggregating the central 384 base-pairs for DNase HepG2 and K562 tracks versus the protocol outlined in AlphaGenome&amp;amp;#39;s original publication (central 501 base-pairs). The smaller window led to much improved performance but still below that after fine-tuning on MPRA data (our approach). Performance is measured as Pearson correlation between predicted and observed activity."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;So I would advise testing different aggregation windows if you are using AlphaGenome in this manner. Or just use our extracted encoder approach, which boosted performance by another 10%!&lt;/p&gt;
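&lt;p&gt;For readers who want to try different window sizes, the idea of central-window aggregation can be sketched as follows. This is a toy NumPy illustration and not the scoring code used in our analysis or in AlphaGenome: the function names, the log2 fold-change score with a pseudocount, and the simulated tracks are all assumptions made for the example.&lt;/p&gt;

```python
import numpy as np


def central_window_sum(track, width):
    """Sum a 1D track over `width` positions centred on the middle of the track."""
    centre = track.shape[0] // 2
    start = centre - width // 2
    return track[start:start + width].sum()


def variant_score(ref_track, alt_track, width):
    """Log2 fold change of the aggregated signal (alt vs ref), with a pseudocount."""
    ref = central_window_sum(ref_track, width)
    alt = central_window_sum(alt_track, width)
    return np.log2(alt + 1.0) - np.log2(ref + 1.0)


rng = np.random.default_rng(0)
ref_track = rng.gamma(2.0, 1.0, size=2048)  # toy predicted DNase signal
alt_track = ref_track.copy()
alt_track[1000:1050] *= 1.5                 # toy variant effect near the centre

# Compare the two window sizes discussed above on the same pair of tracks.
for width in (384, 501):
    print(width, round(float(variant_score(ref_track, alt_track, width)), 4))
```

&lt;p&gt;A wider window adds more unchanged flanking signal to both sums, so the same local effect gives a smaller fold change. That is one plausible reason why matching the window to the assay size can help.&lt;/p&gt;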
&lt;hr&gt;
&lt;h2 id="what-transfers--and-why"&gt;What transfers — and why?
&lt;/h2&gt;&lt;p&gt;So let&amp;rsquo;s take a step back: what do our results show?&lt;/p&gt;
&lt;p&gt;They highlight that encoder representations learned under genome-scale multitask supervision retain regulatory signal that transfers across:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;assays&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;perturbation regimes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;species (the STARR-seq data are from fly, whereas AlphaGenome was trained on human and mouse, which is pretty cool!)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This transfer was observed across distinct architectures (AlphaGenome and Enformer), suggesting that &lt;strong&gt;the modular encoder perspective is broadly applicable&lt;/strong&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="implications-for-regulatory-design-workflows"&gt;Implications for regulatory design workflows
&lt;/h2&gt;&lt;p&gt;So what? Encoder-only predictors have several advantages over their generalist parents. They enable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;rapid scoring of candidate constructs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;iterative design → score → optimise loops&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;compute-efficient large-scale screening&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Seq2func foundation models can therefore function as reusable regulatory representation engines inside perturbation pipelines — think of synthetic biology DNA design, where these models could help accelerate synthetic enhancer and promoter development (see &lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/39322281/" target="_blank" rel="noopener"
 &gt;this work&lt;/a&gt; for example).&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="open-questions"&gt;Open questions
&lt;/h2&gt;&lt;p&gt;So what didn&amp;rsquo;t we explore here?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Which encoder layers contribute most to transfer?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How stable are representations across assays and species?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Can modular encoders accelerate generative regulatory design?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these would be really interesting future directions.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="takeaway--the-tldr"&gt;Takeaway — the TL;DR
&lt;/h2&gt;&lt;p&gt;Foundation seq2func models are typically used as monolithic predictors.&lt;/p&gt;
&lt;p&gt;A modular view reveals something more useful:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Their encoders are transferable regulatory representation modules.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;Extracting and adapting these representations enables efficient perturbation modeling, fair cross-model comparison, and scalable regulatory design workflows.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="code"&gt;Code
&lt;/h2&gt;&lt;p&gt;Finally, how can you use this approach?&lt;/p&gt;
&lt;p&gt;This analysis uses the native JAX/Haiku AlphaGenome wrapper package, which is available from the &lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;Genomics x AI community GitHub&lt;/a&gt; (see &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-003/" target="_blank" rel="noopener"
 &gt;our post on this&lt;/a&gt;) and all code to run the analysis is &lt;a class="link" href="https://github.com/Al-Murphy/alphagenome_FT_MPRA" target="_blank" rel="noopener"
 &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below is a minimal script. If you would prefer to run it yourself on lentiMPRA data, see our &lt;a class="link" href="https://colab.research.google.com/github/genomicsxai/alphagenome_ft/blob/main/notebooks/finetune_encoder_only_mpra.ipynb" target="_blank" rel="noopener"
 &gt;Colab notebook&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="tutorial"&gt;Tutorial
&lt;/h3&gt;&lt;h3 id="1-model-initialisation"&gt;1. Model initialisation
&lt;/h3&gt;&lt;div class="code-block" data-lang="python"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dna_output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome_ft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;CustomHeadConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;CustomHeadType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;register_custom_head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;create_model_with_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 1. Register an encoder-only head&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;register_custom_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EncoderOnlyHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;CustomHeadConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CustomHeadType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GENOME_TRACKS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dna_output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OutputType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RNA_SEQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;num_tracks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 2. Create a model that uses encoder output only&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_model_with_heads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;all_folds&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;use_encoder_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← CRITICAL for encoder-only mode&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 3. Optionally freeze backbone to start with heads-only finetuning&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freeze_except_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Now ready to train!&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;use_encoder_output=True&lt;/code&gt; bypasses the transformer/decoder stack and exposes encoder features at ~128 bp resolution&lt;/li&gt;
&lt;li&gt;&lt;code&gt;templates.EncoderOnlyHead&lt;/code&gt; applies a simple MLP on top of these embeddings&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-training-loop"&gt;2. Training loop
&lt;/h3&gt;&lt;p&gt;For MPRA-like data, you will typically have &lt;strong&gt;short sequences and scalar or low-dimensional outputs&lt;/strong&gt; (e.g. log expression).&lt;/p&gt;
&lt;p&gt;You can either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use your own data loader and a custom training loop with &lt;code&gt;model.create_loss_fn_for_head&lt;/code&gt;, or&lt;/li&gt;
&lt;li&gt;Follow the more complete MPRA scripts in the external repository.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Minimal example with a custom loop:&lt;/p&gt;
&lt;div class="code-block" data-lang="python"&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;jax&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;jax.numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;jnp&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;optax&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome_ft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CustomHead&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;alphagenome_ft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_optimizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Suppose you have: sequences_onehot: (B, L, 4), targets: (B, 1)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;loss_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_loss_fn_for_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_optimizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;trainable_head_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;heads_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_sequences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_targets&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;loss_inner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;preds_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;current_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;batch_sequences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;batch_sequences&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# organism_index&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;negative_strand_mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;batch_sequences&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],),&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;strand_reindexing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_metadata&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strand_reindexing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;preds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;preds_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mpra_head&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;loss_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loss_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;preds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;targets&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_targets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;organism_index&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;loss_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;loss&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_and_grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss_inner&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_opt_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;grads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;new_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apply_updates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_opt_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
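&lt;p&gt;Because &lt;code&gt;train_step&lt;/code&gt; is functional (it returns new parameters and optimizer state rather than mutating them), the caller has to thread those values through the loop. A minimal sketch of one epoch follows. The helper name &lt;code&gt;run_epoch&lt;/code&gt; and the &lt;code&gt;batches&lt;/code&gt; iterable of &lt;code&gt;(batch_sequences, batch_targets)&lt;/code&gt; pairs are hypothetical and not part of the package.&lt;/p&gt;

```python
from typing import Any, Callable, Iterable, Tuple


def run_epoch(
    step_fn: Callable[..., Tuple[Any, Any, Any]],
    params: Any,
    state: Any,
    opt_state: Any,
    batches: Iterable[Tuple[Any, Any]],
) -> Tuple[Any, Any, float]:
    """Hypothetical helper: apply step_fn to every batch and return the
    final params, final opt_state and the mean loss over the epoch."""
    total_loss, n_batches = 0.0, 0
    for batch_sequences, batch_targets in batches:
        # step_fn follows the train_step signature above: it returns
        # (new_params, new_opt_state, loss) and mutates nothing.
        params, opt_state, loss = step_fn(
            params, state, opt_state, batch_sequences, batch_targets
        )
        total_loss += float(loss)
        n_batches += 1
    mean_loss = total_loss / max(n_batches, 1)
    return params, opt_state, mean_loss
```

&lt;p&gt;Passing &lt;code&gt;train_step&lt;/code&gt; as &lt;code&gt;step_fn&lt;/code&gt; gives one pass over the data. Wrapping it in &lt;code&gt;jax.jit&lt;/code&gt; first is a common choice, but whether the closure over &lt;code&gt;model&lt;/code&gt; compiles cleanly is an assumption to check in your own setup.&lt;/p&gt;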
&lt;hr&gt;
&lt;h2 id="conclusion--bridging-genome-scale-models-and-perturbation-assays"&gt;Conclusion — bridging genome-scale models and perturbation assays
&lt;/h2&gt;&lt;p&gt;Foundation sequence-to-function models are built for megabase context and genome-wide prediction. We show that their most transferable asset is much smaller: the convolutional encoder that learns &lt;em&gt;cis&lt;/em&gt;-regulatory grammar.&lt;/p&gt;
&lt;p&gt;By isolating this module, we:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;repurpose genome-scale pretrained representations for 100–300 bp perturbation sequences&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;eliminate unnecessary long-range context machinery in assays that isolate regulatory elements&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;achieve state-of-the-art MPRA, STARR-seq and CAGI5 benchmark performance&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;reduce inference cost by orders of magnitude&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;generalise the approach across architectures&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Despite being trained on ~1 Mb inputs, the AlphaGenome encoder adapts cleanly to sequences of ≥128 bp, matching the scale at which &lt;em&gt;cis&lt;/em&gt;-regulatory logic operates in perturbation assays.&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;This reframes foundation genomics models not as monolithic predictors, but as modular regulatory representation engines that can be embedded directly into perturbation, design, and variant-effect workflows.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;h3 id="code-1"&gt;Code
&lt;/h3&gt;&lt;p&gt;Implementation and reproducible experiments:
&lt;a class="link" href="https://github.com/Al-Murphy/alphagenome_FT_MPRA" target="_blank" rel="noopener"
 &gt;https://github.com/Al-Murphy/alphagenome_FT_MPRA&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;AlphaGenome encoder fine-tuning utilities:
&lt;a class="link" href="https://github.com/genomicsxai/alphagenome_ft" target="_blank" rel="noopener"
 &gt;https://github.com/genomicsxai/alphagenome_ft&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hyperparameter-sweep-results"&gt;Hyperparameter sweep results
&lt;/h2&gt;&lt;h3 id="stage-1"&gt;Stage 1
&lt;/h3&gt;&lt;p&gt;Stage 1 was a hyperparameter sweep for lentiMPRA with a frozen encoder (probing regime). The performance shown is batch-averaged Pearson R on the &lt;strong&gt;validation set&lt;/strong&gt;; because it is averaged over batches rather than computed over the whole set, it will often be lower than the whole-set value. Note that the optimal hyperparameters were used as a starting point for a cell type-specific iterative greedy search.&lt;/p&gt;
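&lt;p&gt;To make the distinction between batch-averaged and whole-set Pearson R concrete, here is a small NumPy sketch on synthetic data (not the lentiMPRA data). The function names and the batch size of 32 are illustrative assumptions, not the values used in the sweep.&lt;/p&gt;

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def batch_averaged_pearson(preds, targets, batch_size):
    """Mean of the per-batch Pearson R values.

    Assumes every batch holds at least two distinct values; in this sketch
    batch_size divides the number of samples exactly.
    """
    scores = [
        pearson_r(preds[i:i + batch_size], targets[i:i + batch_size])
        for i in range(0, len(preds), batch_size)
    ]
    return float(np.mean(scores))


# Synthetic predictions: targets plus Gaussian noise (illustration only).
rng = np.random.default_rng(0)
targets = rng.normal(size=4096)
preds = targets + rng.normal(scale=0.7, size=4096)

whole_set_r = pearson_r(preds, targets)
batch_avg_r = batch_averaged_pearson(preds, targets, batch_size=32)
print(f"whole-set R: {whole_set_r:.4f}, batch-averaged R: {batch_avg_r:.4f}")
```

&lt;p&gt;The sample correlation on a small batch is noisier than the whole-set estimate and tends to be pulled slightly toward zero, so the two numbers are close but not interchangeable.&lt;/p&gt;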
&lt;p&gt;We varied the prediction head architecture and training hyperparameters while keeping encoder weights fixed. Note that no reverse complement or random shift augmentations were used for this benchmark. nl-X-Y denotes a two-layer multilayer perceptron head with hidden dimensions X and Y; nl-X denotes a single hidden layer of size X; pool-flatten uses global pooling followed by flattening; pool-center extracts the central token representation; do-p indicates dropout rate p applied to the head; wd-1eK indicates weight decay of $10^{-K}$; lr-plateau and lr-cosine denote ReduceLROnPlateau and cosine annealing learning rate schedules, respectively; opt-adamw indicates the AdamW optimiser; act-gelu replaces the default activation with GELU. The baseline used a single multilayer perceptron head of size 1024 with sum pooling, the Adam optimiser and ReLU activation, and no dropout, weight decay or learning rate plateau. Performance is reported as Pearson correlation on the held-out test fold for HepG2, K562, and WTC11, with average performance and rank across cell types.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Hyperparameter&lt;/th&gt;
 &lt;th&gt;HepG2&lt;/th&gt;
 &lt;th&gt;K562&lt;/th&gt;
 &lt;th&gt;WTC11&lt;/th&gt;
 &lt;th&gt;Average&lt;/th&gt;
 &lt;th&gt;Rank&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;pool-flatten&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8536&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8253&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.7727&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8172&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-512-256&lt;/td&gt;
 &lt;td&gt;0.8495&lt;/td&gt;
 &lt;td&gt;0.8239&lt;/td&gt;
 &lt;td&gt;0.7698&lt;/td&gt;
 &lt;td&gt;0.8144&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-256-256&lt;/td&gt;
 &lt;td&gt;0.8501&lt;/td&gt;
 &lt;td&gt;0.8216&lt;/td&gt;
 &lt;td&gt;0.7697&lt;/td&gt;
 &lt;td&gt;0.8138&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-512-512&lt;/td&gt;
 &lt;td&gt;0.8482&lt;/td&gt;
 &lt;td&gt;0.8234&lt;/td&gt;
 &lt;td&gt;0.7694&lt;/td&gt;
 &lt;td&gt;0.8137&lt;/td&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-128&lt;/td&gt;
 &lt;td&gt;0.8498&lt;/td&gt;
 &lt;td&gt;0.8209&lt;/td&gt;
 &lt;td&gt;0.7676&lt;/td&gt;
 &lt;td&gt;0.8128&lt;/td&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pool-center&lt;/td&gt;
 &lt;td&gt;0.8476&lt;/td&gt;
 &lt;td&gt;0.8205&lt;/td&gt;
 &lt;td&gt;0.7666&lt;/td&gt;
 &lt;td&gt;0.8116&lt;/td&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;do-0.5&lt;/td&gt;
 &lt;td&gt;0.8482&lt;/td&gt;
 &lt;td&gt;0.8194&lt;/td&gt;
 &lt;td&gt;0.7670&lt;/td&gt;
 &lt;td&gt;0.8115&lt;/td&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-256&lt;/td&gt;
 &lt;td&gt;0.8479&lt;/td&gt;
 &lt;td&gt;0.8180&lt;/td&gt;
 &lt;td&gt;0.7645&lt;/td&gt;
 &lt;td&gt;0.8101&lt;/td&gt;
 &lt;td&gt;8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;do-0.1&lt;/td&gt;
 &lt;td&gt;0.8477&lt;/td&gt;
 &lt;td&gt;0.8190&lt;/td&gt;
 &lt;td&gt;0.7636&lt;/td&gt;
 &lt;td&gt;0.8101&lt;/td&gt;
 &lt;td&gt;9&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-2048&lt;/td&gt;
 &lt;td&gt;0.8467&lt;/td&gt;
 &lt;td&gt;0.8159&lt;/td&gt;
 &lt;td&gt;0.7674&lt;/td&gt;
 &lt;td&gt;0.8100&lt;/td&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;do-0.4&lt;/td&gt;
 &lt;td&gt;0.8470&lt;/td&gt;
 &lt;td&gt;0.8179&lt;/td&gt;
 &lt;td&gt;0.7641&lt;/td&gt;
 &lt;td&gt;0.8097&lt;/td&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wd-1e6&lt;/td&gt;
 &lt;td&gt;0.8466&lt;/td&gt;
 &lt;td&gt;0.8152&lt;/td&gt;
 &lt;td&gt;0.7670&lt;/td&gt;
 &lt;td&gt;0.8096&lt;/td&gt;
 &lt;td&gt;12&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-512&lt;/td&gt;
 &lt;td&gt;0.8452&lt;/td&gt;
 &lt;td&gt;0.8169&lt;/td&gt;
 &lt;td&gt;0.7661&lt;/td&gt;
 &lt;td&gt;0.8094&lt;/td&gt;
 &lt;td&gt;13&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;do-0.2&lt;/td&gt;
 &lt;td&gt;0.8471&lt;/td&gt;
 &lt;td&gt;0.8166&lt;/td&gt;
 &lt;td&gt;0.7644&lt;/td&gt;
 &lt;td&gt;0.8094&lt;/td&gt;
 &lt;td&gt;14&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;do-0.3&lt;/td&gt;
 &lt;td&gt;0.8458&lt;/td&gt;
 &lt;td&gt;0.8172&lt;/td&gt;
 &lt;td&gt;0.7637&lt;/td&gt;
 &lt;td&gt;0.8089&lt;/td&gt;
 &lt;td&gt;15&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wd-1e4&lt;/td&gt;
 &lt;td&gt;0.8459&lt;/td&gt;
 &lt;td&gt;0.8145&lt;/td&gt;
 &lt;td&gt;0.7647&lt;/td&gt;
 &lt;td&gt;0.8084&lt;/td&gt;
 &lt;td&gt;16&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;-&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;baseline-default&lt;/td&gt;
 &lt;td&gt;0.8458&lt;/td&gt;
 &lt;td&gt;0.8150&lt;/td&gt;
 &lt;td&gt;0.7639&lt;/td&gt;
 &lt;td&gt;0.8082&lt;/td&gt;
 &lt;td&gt;17&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;-&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;opt-adamw&lt;/td&gt;
 &lt;td&gt;0.8458&lt;/td&gt;
 &lt;td&gt;0.8150&lt;/td&gt;
 &lt;td&gt;0.7635&lt;/td&gt;
 &lt;td&gt;0.8081&lt;/td&gt;
 &lt;td&gt;19&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;nl-1024&lt;/td&gt;
 &lt;td&gt;0.8458&lt;/td&gt;
 &lt;td&gt;0.8150&lt;/td&gt;
 &lt;td&gt;0.7635&lt;/td&gt;
 &lt;td&gt;0.8081&lt;/td&gt;
 &lt;td&gt;19&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wd-1e5&lt;/td&gt;
 &lt;td&gt;0.8459&lt;/td&gt;
 &lt;td&gt;0.8152&lt;/td&gt;
 &lt;td&gt;0.7632&lt;/td&gt;
 &lt;td&gt;0.8081&lt;/td&gt;
 &lt;td&gt;19&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;act-gelu&lt;/td&gt;
 &lt;td&gt;0.8431&lt;/td&gt;
 &lt;td&gt;0.8168&lt;/td&gt;
 &lt;td&gt;0.7576&lt;/td&gt;
 &lt;td&gt;0.8058&lt;/td&gt;
 &lt;td&gt;21&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
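&lt;p&gt;As a rough illustration of two of the pooling choices compared above, the sketch below applies sum pooling (the baseline) and centre-token pooling (pool-center) to a dummy encoder output. The &lt;code&gt;(batch, tokens, channels)&lt;/code&gt; layout and the function names are assumptions made for this example; pool-flatten is left out because its exact definition is not spelled out here.&lt;/p&gt;

```python
import numpy as np


def sum_pool(embeddings):
    """Sum over the token axis: (batch, tokens, channels) -> (batch, channels)."""
    return embeddings.sum(axis=1)


def center_pool(embeddings):
    """Keep only the central token: (batch, tokens, channels) -> (batch, channels)."""
    center = embeddings.shape[1] // 2
    return embeddings[:, center, :]


# Hypothetical encoder output: 2 sequences, 5 tokens, 3 channels.
embeddings = np.arange(2 * 5 * 3, dtype=np.float32).reshape(2, 5, 3)

pooled_sum = sum_pool(embeddings)        # shape (2, 3)
pooled_center = center_pool(embeddings)  # shape (2, 3)
print(pooled_sum.shape, pooled_center.shape)
```

&lt;p&gt;Either pooled array would then be fed to the multilayer perceptron head. Sum pooling uses every token, whereas centre pooling relies on the encoder having already aggregated context into the middle position.&lt;/p&gt;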
&lt;h3 id="stage-2"&gt;Stage 2
&lt;/h3&gt;&lt;p&gt;Stage 2 was a hyperparameter sweep for lentiMPRA with encoder unfreezing (fine-tuning regime). The performance shown is batch-averaged Pearson R on the &lt;strong&gt;validation set&lt;/strong&gt;; because it is averaged over batches rather than computed over the whole set, it will often be lower than the whole-set value. Note that the optimal choices from this sweep were not adopted, so as to preserve the optimal performance of the Stage 1 (frozen-encoder) models.&lt;/p&gt;
&lt;p&gt;Starting from the best Stage 1 configuration, we varied the unfreezing schedule. s2-s1epN denotes unfreezing the encoder after N epochs of head-only training; s2-baseline denotes the default unfreezing schedule used in the main experiments (unfreezing triggered by a validation loss plateau). The baseline used a single multilayer perceptron head of size 1024 with sum pooling, the Adam optimiser and ReLU activation, and no dropout, weight decay or learning rate plateau. All models used reverse complement and random shift augmentations. Performance is reported as Pearson correlation on the held-out test fold for HepG2, K562, and WTC11, with average performance and rank across cell types.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Hyperparameter&lt;/th&gt;
 &lt;th&gt;HepG2&lt;/th&gt;
 &lt;th&gt;K562&lt;/th&gt;
 &lt;th&gt;WTC11&lt;/th&gt;
 &lt;th&gt;Average&lt;/th&gt;
 &lt;th&gt;Rank&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-s1ep1&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8720&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8437&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.7754&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;0.8304&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-s1ep2&lt;/td&gt;
 &lt;td&gt;0.8709&lt;/td&gt;
 &lt;td&gt;0.8432&lt;/td&gt;
 &lt;td&gt;0.7731&lt;/td&gt;
 &lt;td&gt;0.8291&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-s1ep3&lt;/td&gt;
 &lt;td&gt;0.8689&lt;/td&gt;
 &lt;td&gt;0.8417&lt;/td&gt;
 &lt;td&gt;0.7706&lt;/td&gt;
 &lt;td&gt;0.8271&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-s1ep5&lt;/td&gt;
 &lt;td&gt;0.8695&lt;/td&gt;
 &lt;td&gt;0.8396&lt;/td&gt;
 &lt;td&gt;0.7691&lt;/td&gt;
 &lt;td&gt;0.8261&lt;/td&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-s1ep4&lt;/td&gt;
 &lt;td&gt;0.8686&lt;/td&gt;
 &lt;td&gt;0.8413&lt;/td&gt;
 &lt;td&gt;0.7668&lt;/td&gt;
 &lt;td&gt;0.8256&lt;/td&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;&amp;mdash;&amp;mdash;-&lt;/td&gt;
 &lt;td&gt;&amp;mdash;-&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;s2-baseline-es&lt;/td&gt;
 &lt;td&gt;0.8624&lt;/td&gt;
 &lt;td&gt;0.8362&lt;/td&gt;
 &lt;td&gt;0.7688&lt;/td&gt;
 &lt;td&gt;0.8225&lt;/td&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
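&lt;p&gt;The two kinds of unfreezing schedule compared above can be sketched in plain Python. This is an illustration under stated assumptions, not the code used for the sweep: the function names, &lt;code&gt;patience&lt;/code&gt; and &lt;code&gt;min_delta&lt;/code&gt; are hypothetical, and the real plateau criterion may differ.&lt;/p&gt;

```python
def encoder_trainable_fixed(epoch, n_head_only_epochs):
    """s2-s1epN style: train the head only for N epochs, then unfreeze.

    Epochs are 0-indexed, so the encoder is trainable from index N onward.
    """
    return epoch >= n_head_only_epochs


def unfreeze_epoch_on_plateau(val_losses, patience=2, min_delta=0.0):
    """Plateau style: return the 0-indexed epoch at which to unfreeze.

    The encoder is unfrozen once validation loss has failed to improve by
    more than min_delta for `patience` consecutive epochs. Returns None if
    no plateau occurs. patience and min_delta are hypothetical defaults.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch + 1  # unfreeze starting with the next epoch
    return None


# Made-up validation losses from head-only training (illustration only).
history = [1.00, 0.80, 0.79, 0.80, 0.81, 0.70]
plateau_epoch = unfreeze_epoch_on_plateau(history, patience=2)
fixed_flags = [encoder_trainable_fixed(e, 1) for e in range(3)]
print(plateau_epoch, fixed_flags)
```

&lt;p&gt;With a fixed schedule the switch point is known in advance, which makes runs easier to compare; a plateau trigger adapts to how quickly the head converges.&lt;/p&gt;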
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Avsec, Ž. et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature 649 (2026).&lt;/li&gt;
&lt;li&gt;Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18 (2021).&lt;/li&gt;
&lt;li&gt;Agarwal, V. et al. Massively parallel characterization of transcriptional regulatory elements. Nature 639 (2025).&lt;/li&gt;
&lt;li&gt;de Almeida, B. P., Reiter, F., Pagani, M. &amp;amp; Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the &lt;em&gt;de novo&lt;/em&gt; design of synthetic enhancers. Nat. Genet. 54 (2022).&lt;/li&gt;
&lt;li&gt;The Critical Assessment of Genome Interpretation Consortium. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol. 25 (2024).&lt;/li&gt;
&lt;li&gt;Rafi, A. M. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat. Biotechnol. 43 (2025).&lt;/li&gt;
&lt;li&gt;Lal, A., Garfield, D., Biancalani, T. &amp;amp; Eraslan, G. Designing realistic regulatory DNA with autoregressive language models. Genome Res. 34 (2024).&lt;/li&gt;
&lt;li&gt;Alan Murphy, Masayuki Nagai, Alejandro Buendia, Anshul Kundaje, Peter Koo. &amp;ldquo;Fine-tuning AlphaGenome in native JAX/Haiku.&amp;rdquo; Genomics × AI Blog, 25 February 2026. &lt;a class="link" href="https://genomicsxai.github.io/blogs/2026-003/" target="_blank" rel="noopener"
 &gt;https://genomicsxai.github.io/blogs/2026-003/&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;</description></item></channel></rss>