<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ntv3 on Genomics x AI</title><link>https://genomicsxai.github.io/tags/ntv3/</link><description>Recent content in Ntv3 on Genomics x AI</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 25 Jun 2026 02:09:07 -0400</lastBuildDate><atom:link href="https://genomicsxai.github.io/tags/ntv3/index.xml" rel="self" type="application/rss+xml"/><item><title>Benchmarking seq2func models on distal enhancer effects with CRISPRi screens</title><link>https://genomicsxai.github.io/blogs/2026-007/</link><pubDate>Thu, 25 Jun 2026 00:00:00 +0000</pubDate><guid>https://genomicsxai.github.io/blogs/2026-007/</guid><description>&lt;aside class="summary-box"&gt;
 &lt;h2 class="summary-box__title"&gt;Summary&lt;/h2&gt;
 &lt;div class="summary-box__body"&gt;
 &lt;p&gt;We benchmark four sequence-to-function genomic deep learning models — &lt;a class="link" href="https://www.nature.com/articles/s41592-021-01252-x" target="_blank" rel="noopener"
 &gt;Enformer&lt;/a&gt;, &lt;a class="link" href="https://www.nature.com/articles/s41588-024-02053-6" target="_blank" rel="noopener"
 &gt;Borzoi&lt;/a&gt;, &lt;a class="link" href="https://www.biorxiv.org/content/10.64898/2025.12.22.695963v1" target="_blank" rel="noopener"
 &gt;NTv3&lt;/a&gt;, and &lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; — zero-shot on two K562 CRISPRi enhancer-knockdown screens (&lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/31784727/" target="_blank" rel="noopener"
 &gt;Fulco et al., 2019&lt;/a&gt; and &lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/30612741/" target="_blank" rel="noopener"
 &gt;Gasperini et al., 2019&lt;/a&gt;), extending the in-silico CRISPRi setup from &lt;a class="link" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9" target="_blank" rel="noopener"
 &gt;Karollus et al., 2023&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Three things stand out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AlphaGenome leads on both screens&lt;/strong&gt; (Pearson&amp;rsquo;s r = 0.67 on Fulco et al., 0.45 on Gasperini et al.), with Borzoi a close second on Fulco et al. and a more distant second on Gasperini et al.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;All four models systematically underpredict knockdown magnitude&lt;/strong&gt;, and the gap to experimental measurements widens with enhancer-to-TSS distance — distal cis-regulatory element (CRE) effects remain difficult to predict.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AlphaGenome&amp;rsquo;s RNA-Seq head (with GENCODE-exon aggregation) beats its CAGE head&lt;/strong&gt; on the larger screen — which output track you choose matters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Code&lt;/strong&gt;: &lt;a class="link" href="https://github.com/Al-Murphy/seq2func_crispri_eval" target="_blank" rel="noopener"
 &gt;seq2func_crispri_eval&lt;/a&gt;&lt;/p&gt;

 &lt;/div&gt;
&lt;/aside&gt;

&lt;hr&gt;
&lt;h2 id="motivation"&gt;Motivation
&lt;/h2&gt;&lt;p&gt;Sequence-to-function (seq2func) models keep getting bigger, longer-context, and more capable. &lt;a class="link" href="https://www.nature.com/articles/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;AlphaGenome&lt;/a&gt; is the current state of the art, succeeding &lt;a class="link" href="https://www.nature.com/articles/s41592-021-01252-x" target="_blank" rel="noopener"
 &gt;Enformer&lt;/a&gt; (2021) and &lt;a class="link" href="https://www.nature.com/articles/s41588-024-02053-6" target="_blank" rel="noopener"
 &gt;Borzoi&lt;/a&gt; (2024). DNA language models like &lt;a class="link" href="https://www.biorxiv.org/content/10.64898/2025.12.22.695963v1" target="_blank" rel="noopener"
 &gt;NTv3&lt;/a&gt; now compete in the same space by post-training on the same functional genomic tracks.&lt;/p&gt;
&lt;p&gt;AlphaGenome was launched as a substantial step forward, with most of its zero-shot evaluation focused on single-nucleotide variant (SNV) effects. Two related weaknesses are especially well-documented for this class of model, both originally demonstrated on Enformer: predicting expression across &lt;strong&gt;personalised genomes&lt;/strong&gt; (e.g. &lt;a class="link" href="https://www.nature.com/articles/s41588-023-01524-6" target="_blank" rel="noopener"
 &gt;Sasse et al.&lt;/a&gt;), and predicting &lt;strong&gt;distal CRE effect magnitudes&lt;/strong&gt; (how much knocking down a far-away enhancer changes its target gene&amp;rsquo;s expression). Both involve sequence perturbations the model wasn&amp;rsquo;t directly trained on, and both are where seq2func models historically struggle. Performance on these tasks is what tells us whether a model&amp;rsquo;s apparent advance translates to the questions biologists actually care about.&lt;/p&gt;
&lt;p&gt;The personalised genome question is how well a model predicts gene expression for a specific individual from their own genome. It&amp;rsquo;s hard because the variants that distinguish one individual from the reference are sparse and small in effect, easy to lose against the much stronger reference signal a sequence model is trained to predict. &lt;a class="link" href="https://www.biorxiv.org/content/10.64898/2026.02.01.702969v1.full" target="_blank" rel="noopener"
 &gt;Tu, 2026&lt;/a&gt; and &lt;a class="link" href="https://www.biorxiv.org/content/10.1101/2025.08.05.668750v2.full" target="_blank" rel="noopener"
 &gt;Shen, 2025&lt;/a&gt; recently revisited it: AlphaGenome improves over Enformer on this task, but still falls well short of useful accuracy. The needle moves, but there&amp;rsquo;s a long way to go.&lt;/p&gt;
&lt;p&gt;The distal-CRE magnitude question — much bigger perturbations, disabling a whole regulatory element — has received little attention since &lt;a class="link" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9" target="_blank" rel="noopener"
 &gt;Karollus et al., 2023&lt;/a&gt;. They defined an &lt;em&gt;in-silico&lt;/em&gt; CRISPRi benchmark on &lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/31784727/" target="_blank" rel="noopener"
 &gt;Fulco 2019&lt;/a&gt; and &lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/30612741/" target="_blank" rel="noopener"
 &gt;Gasperini 2019&lt;/a&gt; and ran it on Enformer and Basenji2, finding both substantially undershoot real enhancer effects. Borzoi, NTv3, and AlphaGenome have all followed Enformer, claiming gains from longer contexts and broader predictive tasks — yet none have been evaluated on Karollus&amp;rsquo;s magnitude benchmark.&lt;/p&gt;
&lt;p&gt;A related but easier task — enhancer-gene linking — has been tested more widely: in the original Enformer paper (Figure 1a), in AlphaGenome&amp;rsquo;s evaluation on the &lt;a class="link" href="https://www.biorxiv.org/content/10.1101/2023.11.09.563812v1" target="_blank" rel="noopener"
 &gt;ENCODE-rE2G CRISPRi dataset&lt;/a&gt; (Figure 1b), and in the &lt;a class="link" href="https://www.nature.com/articles/s41467-025-65077-4" target="_blank" rel="noopener"
 &gt;DNALONGBENCH&lt;/a&gt; benchmark. Linking is binary classification: does enhancer X regulate gene Y? A model succeeds by sorting interacting from non-interacting pairs above some threshold. Magnitude prediction is regression: how much does Y drop when X is disabled? It demands ranked and calibrated effect sizes across a continuous range, committing the model to a quantitative theory of enhancer-promoter regulation rather than a topological one. AlphaGenome leads on linking and notes that even it underestimates the impact of very distal enhancers — but doesn&amp;rsquo;t run the magnitude-prediction benchmark.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/enf_ag_crispr_orig_pprs.png" class="image-link" data-pswp-width="7209" data-pswp-height="4730"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/enf_ag_crispr_orig_pprs.png" width="900px" height="590"loading="lazy"
			alt="Figure 1"
			title="Figure 1: Enhancer-gene linking performance from the original Enformer and AlphaGenome publications. (a) Adapted from Avsec et al. 2021 (Enformer): enhancer–gene pair classification performance (CRISPRi-validated versus non-validated candidate enhancers), stratified by relative distance, measured by auPRC on two CRISPRi datasets for different methods, models, and contribution scores. (b) Adapted from Avsec et al. 2025 (AlphaGenome): zero-shot enhancer-gene linking performance on the ENCODE-rE2G CRISPRi dataset (auPRC), stratified by enhancer-to-TSS distance." data-title-escaped="Figure 1: Enhancer-gene linking performance from the original Enformer and AlphaGenome publications. (a) Adapted from Avsec et al. 2021 (Enformer): enhancer–gene pair classification performance (CRISPRi-validated versus non-validated candidate enhancers), stratified by relative distance, measured by auPRC on two CRISPRi datasets for different methods, models, and contribution scores. (b) Adapted from Avsec et al. 2025 (AlphaGenome): zero-shot enhancer-gene linking performance on the ENCODE-rE2G CRISPRi dataset (auPRC), stratified by enhancer-to-TSS distance."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;We aim to close this gap. We re-run the Karollus benchmark on AlphaGenome, Borzoi, and NTv3, and also on Enformer to anchor against Karollus&amp;rsquo;s original numbers. The result is a measure of progress on distal-CRE prediction across the last four years of seq2func releases, not just a test of AlphaGenome&amp;rsquo;s generalisation abilities.&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; CRISPRi (CRISPR interference) uses a catalytically dead Cas9 fused to a repressive domain to silence a target locus without cutting DNA. When the target is a distal enhancer, you can read out which genes go down in expression — giving you a measured enhancer → gene effect size.&lt;/p&gt;

 &lt;/blockquote&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; &amp;ldquo;seq2func&amp;rdquo; = sequence-to-function. A model that takes raw DNA as input and predicts functional genomic signals (RNA-seq, CAGE, ATAC, ChIP, etc.) as output. Enformer, Borzoi, and AlphaGenome were all trained end-to-end as seq2func models. NTv3 is a hybrid: first pretrained as a genomic language model (gLM) via masked language modeling — predicting hidden bases from their surrounding sequence across many species&amp;rsquo; genomes — then post-trained on functional assays to produce seq2func outputs.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="the-benchmark"&gt;The benchmark
&lt;/h2&gt;&lt;p&gt;We test on two K562 CRISPRi enhancer screens:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/31784727/" target="_blank" rel="noopener"
 &gt;Fulco et al., 2019&lt;/a&gt;&lt;/strong&gt; — ~60 validated enhancer–gene pairs from K562 CRISPRi-FlowFISH. Small but high-confidence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a class="link" href="https://pubmed.ncbi.nlm.nih.gov/30612741/" target="_blank" rel="noopener"
 &gt;Gasperini et al., 2019&lt;/a&gt;&lt;/strong&gt; — ~440 high-confidence significant pairs from pooled K562 CRISPRi at scale.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For each pair, the procedure is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build a sequence window of the model&amp;rsquo;s native context (196 kb for Enformer, 524 kb for Borzoi, 1 Mb for NTv3 and AlphaGenome), centred on the gene&amp;rsquo;s TSS.&lt;/li&gt;
&lt;li&gt;Score the wild-type window.&lt;/li&gt;
&lt;li&gt;Dinucleotide-shuffle a 2 kb slice covering the enhancer. Repeat N = 50 times, average the model&amp;rsquo;s prediction.&lt;/li&gt;
&lt;li&gt;Aggregate the model output over K562 RNA-Seq tracks in a TSS-centred 640 bp window. For AlphaGenome we additionally use the mean signal across GENCODE exons of the target gene (their paper&amp;rsquo;s preferred RNA-Seq score).&lt;/li&gt;
&lt;li&gt;Score &lt;code&gt;pred_delta = (WT − mean(shuffle)) / WT&lt;/code&gt; per pair and correlate with the measured fractional knockdown.&lt;/li&gt;
&lt;/ol&gt;
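&lt;p&gt;Steps 3 to 5 above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the linked repository: &lt;code&gt;pred_delta&lt;/code&gt; follows the formula in step 5, while the helper names, the toy scores, and the hand-rolled Pearson function are assumptions made for the example.&lt;/p&gt;

```python
from math import sqrt
from statistics import mean


def pred_delta(wt_score, shuffle_scores):
    """Predicted fractional knockdown: (WT - mean(shuffle)) / WT."""
    return (wt_score - mean(shuffle_scores)) / wt_score


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented for illustration): one wild-type score, a few
# shuffled-enhancer scores, and a measured knockdown per enhancer-gene pair.
pairs = [
    {"wt": 10.0, "shuffles": [8.0, 8.5, 7.5], "measured": 0.40},
    {"wt": 4.0, "shuffles": [3.8, 3.9, 4.0], "measured": 0.10},
    {"wt": 20.0, "shuffles": [19.0, 19.5, 18.5], "measured": 0.25},
]

predicted = [pred_delta(p["wt"], p["shuffles"]) for p in pairs]
measured = [p["measured"] for p in pairs]
r = pearson_r(predicted, measured)
```

&lt;p&gt;In the real benchmark the wild-type and shuffle scores would come from a model&amp;rsquo;s aggregated track output rather than hard-coded numbers, and the shuffle list would hold N = 50 entries per pair.&lt;/p&gt;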

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note&lt;/em&gt;: This is a marginalisation procedure — by averaging predictions across many randomised backgrounds in which the enhancer&amp;rsquo;s motif content is destroyed, we isolate the enhancer&amp;rsquo;s contribution from the surrounding sequence context (the same idea underlies Global Importance Analysis (GIA); &lt;a class="link" href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8118286/" target="_blank" rel="noopener"
 &gt;Koo et al., 2021&lt;/a&gt;). The background choice matters: N-replacement is out-of-distribution for the model and can produce strange predictions; drawing random bases changes local G+C content, biasing on composition rather than motif loss; even a plain base shuffle destroys dinucleotide frequencies like CpG counts, themselves a learned regulatory signal. Dinucleotide shuffling (Altschul–Erickson) destroys motif content while keeping both single-base and dinucleotide composition intact — the standard background for in-silico CRISPRi.&lt;/p&gt;

 &lt;/blockquote&gt;
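&lt;p&gt;To make the dinucleotide-preserving idea concrete, here is a small pure-Python sketch. It is a simplified variant written for illustration, not the benchmark&amp;rsquo;s implementation: it always keeps each base&amp;rsquo;s final successor in place, whereas the full Altschul–Erickson procedure also randomises that choice, so this version samples only a subset of the valid shuffles. The function names and the example sequence are assumptions.&lt;/p&gt;

```python
import random
from collections import Counter


def dinuc_shuffle(seq, rng):
    """Return a shuffled copy of seq with identical dinucleotide counts."""
    if not seq:
        return seq
    # For every base, list the bases that follow it, in order of appearance.
    successors = {}
    for a, b in zip(seq, seq[1:]):
        successors.setdefault(a, []).append(b)
    # Shuffle each list but keep its final entry in place: this guarantees the
    # walk below can consume every transition exactly once.
    for followers in successors.values():
        last = followers.pop()
        rng.shuffle(followers)
        followers.append(last)
    # Walk from the original first base, taking each base's next unused successor.
    used = {base: 0 for base in successors}
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        current = out[-1]
        out.append(successors[current][used[current]])
        used[current] += 1
    return "".join(out)


def dinuc_counts(seq):
    """Count overlapping dinucleotides in seq."""
    return Counter(zip(seq, seq[1:]))


rng = random.Random(0)
enhancer = "ACGTTGCACGCGATATCGGCTA"  # toy stand-in for a 2 kb enhancer slice
shuffled = dinuc_shuffle(enhancer, rng)
```

&lt;p&gt;Comparing &lt;code&gt;dinuc_counts(enhancer)&lt;/code&gt; with &lt;code&gt;dinuc_counts(shuffled)&lt;/code&gt; confirms that CpG and every other dinucleotide count is unchanged, while the order of bases, and therefore any motif, is scrambled.&lt;/p&gt;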
&lt;p&gt;This protocol is a direct extension of &lt;a class="link" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9" target="_blank" rel="noopener"
 &gt;Karollus 2023&lt;/a&gt; — most of it is bit-for-bit faithful to theirs. We reuse their Fulco et al. evaluation tables (&lt;code&gt;ziga_additional_columns.tsv&lt;/code&gt; + &lt;code&gt;enhancer_knockdown_effects.tsv&lt;/code&gt; from their &lt;a class="link" href="https://zenodo.org/records/7613255" target="_blank" rel="noopener"
 &gt;Zenodo release&lt;/a&gt;) with the same merge keys and validated-pair filtering, their TSS / enhancer / strand conventions, the same Enformer DeepMind checkpoint, and their K562 CAGE readout, central-bin aggregation, and Pearson/Spearman correlation framework. For Gasperini we apply the same protocol to the Cell 2019 high-confidence pairs (after hg19 → hg38 liftover). The model-specific adjustments — per-architecture aggregation conventions like 5×128 bp bins for Enformer, 20×32 bp for Borzoi, exon-mean for AlphaGenome RNA-Seq — are architecture-driven; the evaluation harness around them is the Karollus harness. Four deliberate departures from Karollus are flagged in &lt;a class="link" href="#limitations" &gt;Limitations&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Each model is evaluated using its canonical published inference setup, with no test-time augmentations applied (TTA is explored separately for AlphaGenome below). Inference compute per pair therefore isn&amp;rsquo;t uniform: Borzoi runs four forward passes (one per fold), while the other three run one — we compare models as their authors released them rather than artificially restrict Borzoi to a single fold. Enformer uses the &lt;a class="link" href="https://github.com/lucidrains/enformer-pytorch" target="_blank" rel="noopener"
 &gt;lucidrains PyTorch port&lt;/a&gt; of the DeepMind weights — same weights as Karollus used, different framework wrapper. Borzoi predictions come from the &lt;a class="link" href="https://huggingface.co/johahi" target="_blank" rel="noopener"
 &gt;Flashzoi&lt;/a&gt; community PyTorch port, evaluated as the original paper&amp;rsquo;s 4-fold ensemble (predictions averaged across the four published fold checkpoints). AlphaGenome uses the &lt;a class="link" href="https://github.com/genomicsxai/alphagenome-pytorch" target="_blank" rel="noopener"
 &gt;PyTorch port&lt;/a&gt; loaded with the all-fold distilled checkpoint, a single model trained to reproduce the multi-fold ensemble&amp;rsquo;s behaviour. NTv3 is the &lt;a class="link" href="https://huggingface.co/InstaDeepAI/NTv3_650M_post" target="_blank" rel="noopener"
 &gt;InstaDeepAI HuggingFace release&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="results"&gt;Results
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;AlphaGenome leads, with Borzoi close behind on Fulco et al.&lt;/strong&gt; Figure 2 shows predicted vs. measured fractional knockdown for all four models on each screen, with points coloured by enhancer-to-TSS distance.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/crispri_fig1.png" class="image-link" data-pswp-width="6303" data-pswp-height="3542"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/crispri_fig1.png" width="900px" height="505"loading="lazy"
			alt="Figure 2"
			title="Figure 2: Predicted vs. measured fractional knockdown for the four models on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Points coloured by enhancer-to-TSS distance. AlphaGenome achieves the highest Pearson and Spearman correlations on both screens; on Gasperini, Enformer effectively fails and NTv3 trails the leaders by a wide margin. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown." data-title-escaped="Figure 2: Predicted vs. measured fractional knockdown for the four models on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Points coloured by enhancer-to-TSS distance. AlphaGenome achieves the highest Pearson and Spearman correlations on both screens; on Gasperini, Enformer effectively fails and NTv3 trails the leaders by a wide margin. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;The headline numbers (left two columns: each model on every pair its own receptive field can reach; right two columns: every model restricted to Enformer&amp;rsquo;s 196 kb receptive field, so all four score the same set of pairs):&lt;/p&gt;
&lt;table&gt;
	&lt;thead&gt;
			&lt;tr&gt;
					&lt;th&gt;Model&lt;/th&gt;
					&lt;th style="text-align: right"&gt;Fulco — full&lt;/th&gt;
					&lt;th style="text-align: right"&gt;Gasperini — full&lt;/th&gt;
					&lt;th style="text-align: right"&gt;Fulco — Enformer RF&lt;/th&gt;
					&lt;th style="text-align: right"&gt;Gasperini — Enformer RF&lt;/th&gt;
			&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
			&lt;tr&gt;
					&lt;td&gt;Enformer&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.29 / 0.19&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.04 / 0.04&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.29 / 0.19&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.03 / 0.04&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;Borzoi&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.66 / 0.49&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.34 / 0.33&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.66 / 0.49&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.34 / 0.35&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;NTv3&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.34 / 0.09&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.12 / 0.14&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.34 / 0.09&lt;/td&gt;
					&lt;td style="text-align: right"&gt;0.13 / 0.17&lt;/td&gt;
			&lt;/tr&gt;
			&lt;tr&gt;
					&lt;td&gt;AlphaGenome&lt;/td&gt;
					&lt;td style="text-align: right"&gt;&lt;strong&gt;0.67 / 0.54&lt;/strong&gt;&lt;/td&gt;
					&lt;td style="text-align: right"&gt;&lt;strong&gt;0.45 / 0.45&lt;/strong&gt;&lt;/td&gt;
					&lt;td style="text-align: right"&gt;&lt;strong&gt;0.67 / 0.54&lt;/strong&gt;&lt;/td&gt;
					&lt;td style="text-align: right"&gt;&lt;strong&gt;0.46 / 0.48&lt;/strong&gt;&lt;/td&gt;
			&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Values are Pearson&amp;rsquo;s &lt;em&gt;r&lt;/em&gt; / Spearman&amp;rsquo;s ρ. The number of pairs (N) in the &amp;ldquo;full&amp;rdquo; columns varies by model because larger-context models score extra distal pairs that Enformer can&amp;rsquo;t see (Fulco et al. N = 62–63, Gasperini et al. N = 352–438). In the &amp;ldquo;Enformer RF&amp;rdquo; (receptive field) columns N is uniform (Fulco et al. N = 62, Gasperini et al. N = 352), because every model is restricted to the pairs Enformer can score.&lt;/p&gt;
&lt;p&gt;The matched-RF columns rule out the simple &amp;ldquo;they just see more sequence&amp;rdquo; explanation for AlphaGenome and Borzoi&amp;rsquo;s lead: Fulco et al. barely moves (only one distal pair drops), and on Gasperini et al. the rankings are unchanged — AlphaGenome even nudges up slightly (Pearson 0.45 → 0.46, Spearman 0.45 → 0.48).&lt;/p&gt;
&lt;p&gt;A few takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AlphaGenome outperforms all other models&lt;/strong&gt; on both screens, though essentially tied with Borzoi on Fulco et al. — see the small-sample caveat in &lt;a class="link" href="#limitations" &gt;Limitations&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enformer and NTv3 struggle on Gasperini et al.&lt;/strong&gt; Enformer&amp;rsquo;s shorter 196 kb context excludes ~20% of Gasperini et al. pairs, but even on the closer pairs, it manages near-zero correlation. NTv3 has a 1 Mb context, the same as AlphaGenome, but still trails the leaders by a wide margin — so context length isn&amp;rsquo;t the explanation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On the same set of pairs, AlphaGenome still leads.&lt;/strong&gt; Of Gasperini et al.&amp;rsquo;s pairs, 21 fall within AlphaGenome&amp;rsquo;s 1 Mb context but outside Borzoi&amp;rsquo;s 524 kb, so they contribute to AlphaGenome&amp;rsquo;s correlation but not Borzoi&amp;rsquo;s (all of the tested Fulco et al. pairs are within Borzoi&amp;rsquo;s context). To make the comparison like-for-like, we restrict AlphaGenome to the pairs Borzoi can also score: Pearson&amp;rsquo;s r moves from 0.45 to 0.44 and Spearman&amp;rsquo;s ρ stays at 0.45, essentially the same result.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Gasperini et al. ceiling is low across the board.&lt;/strong&gt; Even AlphaGenome leaves substantial variance unexplained. Some of this is likely measurement-side: Gasperini et al.&amp;rsquo;s scaled screen uses a high-MOI single-cell pooled design (median 28 gRNAs per cell), and the statistical and trans-perturbation challenges of high-MOI screens are well-documented (&lt;a class="link" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02545-2" target="_blank" rel="noopener"
 &gt;Barry et al., 2021&lt;/a&gt;). Fulco et al.&amp;rsquo;s CRISPRi-FlowFISH targets one enhancer at a time with a bulk-population readout, which we&amp;rsquo;d expect to give more precise per-pair effect sizes. We can&amp;rsquo;t separate measurement noise from model error from these correlations alone, so attributing the gap is interpretive — but it&amp;rsquo;s at least consistent with the per-screen difference we see. Either way, the headline framing is that AlphaGenome improves CRISPRi prediction but a large gap remains, especially for distal cis-regulatory elements (CREs).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="predictions-shrink-knockdowns-dont"&gt;Predictions shrink, knockdowns don&amp;rsquo;t
&lt;/h3&gt;&lt;p&gt;Figure 3 plots observed and predicted effect against enhancer-to-TSS distance, to more closely match &lt;a class="link" href="https://link.springer.com/article/10.1186/s13059-023-02899-9/figures/5" target="_blank" rel="noopener"
 &gt;Karollus et al.&amp;rsquo;s plotting style&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/crispri_fig2.png" class="image-link" data-pswp-width="6303" data-pswp-height="2761"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/crispri_fig2.png" width="900px" height="394"loading="lazy"
			alt="Figure 3"
			title="Figure 3: Observed (grey) and predicted (blue) effect (y-axis) vs. enhancer-to-TSS distance (x-axis). Predictions are systematically smaller than observed knockdowns across all four models, and the gap widens at distal distances." data-title-escaped="Figure 3: Observed (grey) and predicted (blue) effect (y-axis) vs. enhancer-to-TSS distance (x-axis). Predictions are systematically smaller than observed knockdowns across all four models, and the gap widens at distal distances."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;p&gt;Two patterns hold across all four models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Predictions are systematically smaller in magnitude than observed knockdowns.&lt;/strong&gt; Even where the rank ordering is right (good Spearman), the predicted effect sizes are compressed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The gap grows with distance.&lt;/strong&gt; Models capture some of the distance-decay biology, but the slope is shallower than the data demands — most obviously for Enformer and NTv3, less so for AlphaGenome.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A short aside: AlphaGenome&amp;rsquo;s RNA-Seq head, aggregated as the mean signal across the target gene&amp;rsquo;s GENCODE exons (their paper&amp;rsquo;s preferred approach), beats AlphaGenome&amp;rsquo;s CAGE head on Gasperini et al. (Pearson&amp;rsquo;s r 0.45 vs. 0.39). On Fulco et al. the difference is small (0.67 vs. 0.66). So if you&amp;rsquo;re benchmarking a new model on these screens, the choice of output head and aggregation deserves care: here it moves Pearson&amp;rsquo;s r by 0.06 on Gasperini et al.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig2.png" class="image-link" data-pswp-width="4917" data-pswp-height="3542"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig2.png" width="900px" height="648"loading="lazy"
			alt="Figure 4"
			title="Figure 4: Predicted vs. measured fractional knockdown for Enformer and AlphaGenome CAGE track and RNA-Seq track on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Points coloured by enhancer-to-TSS distance. AlphaGenome&amp;#39;s RNA-Seq head with GENCODE-exon aggregation beats its CAGE head on Gasperini et al. and is comparable on Fulco et al. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown." data-title-escaped="Figure 4: Predicted vs. measured fractional knockdown for Enformer and AlphaGenome CAGE track and RNA-Seq track on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Points coloured by enhancer-to-TSS distance. AlphaGenome&amp;amp;#39;s RNA-Seq head with GENCODE-exon aggregation beats its CAGE head on Gasperini et al. and is comparable on Fulco et al. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;h3 id="scoring-it-two-ways"&gt;Scoring it two ways
&lt;/h3&gt;&lt;p&gt;A second aside: we computed the predicted effect two ways. Figure 2 used the linear &lt;code&gt;pred_delta = (WT − mean(shuffle)) / WT&lt;/code&gt;. We also computed log&lt;sub&gt;2&lt;/sub&gt;(WT / mean(shuffle)) against −log&lt;sub&gt;2&lt;/sub&gt;(1 − y_delta) — a log fold-change scaling that de-emphasises a few high-end outliers. Figure 5 shows the log&lt;sub&gt;2&lt;/sub&gt; version. Spearman is identical between the two scorings (rank-invariant); Pearson can shift noticeably, especially on Gasperini, and AlphaGenome and Borzoi narrowly swap order on Fulco — though the broad picture (AlphaGenome and Borzoi well above NTv3) is robust.&lt;/p&gt;
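&lt;p&gt;The rank-invariance follows from a simple identity: because &lt;code&gt;pred_delta = 1 − mean(shuffle) / WT&lt;/code&gt;, the log score log&lt;sub&gt;2&lt;/sub&gt;(WT / mean(shuffle)) equals −log&lt;sub&gt;2&lt;/sub&gt;(1 − pred_delta), the same monotone transform applied to the measured knockdown. The sketch below checks this on toy numbers; the function names and values are invented for illustration and are not taken from the benchmark code.&lt;/p&gt;

```python
from math import isclose, log2


def linear_score(wt, shuffle_mean):
    """Fractional change: (WT - mean(shuffle)) / WT."""
    return (wt - shuffle_mean) / wt


def log2_score(wt, shuffle_mean):
    """Log fold-change: log2(WT / mean(shuffle))."""
    return log2(wt / shuffle_mean)


def ranks(values):
    """Rank positions (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank
    return out


# Toy (WT, mean shuffle) scores, invented for illustration.
toy = [(10.0, 8.0), (4.0, 3.9), (20.0, 19.0), (6.0, 1.5)]
linear = [linear_score(wt, s) for wt, s in toy]
logged = [log2_score(wt, s) for wt, s in toy]

# log2(WT / S) is the same as -log2(1 - pred_delta) for every pair.
identity_holds = all(
    isclose(lg, -log2(1 - lin)) for lin, lg in zip(linear, logged)
)

# A monotone transform leaves the ordering, and hence Spearman, untouched.
same_ranks = ranks(linear) == ranks(logged)
```

&lt;p&gt;Pearson, by contrast, depends on the actual spacing of the values, which is why it can move between the two scorings even though the ordering does not.&lt;/p&gt;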
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig1.png" class="image-link" data-pswp-width="4917" data-pswp-height="3542"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig1.png" width="900px" height="648"loading="lazy"
			alt="Figure 5"
			title="Figure 5: Predicted vs. measured knockdown on the log2 alignment scale (log2(WT / mean(shuffle)) vs. −log2(1 − y_delta)) for Borzoi, NTv3, and AlphaGenome on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Enformer omitted. Points coloured by enhancer-to-TSS distance. The broad ranking matches Figure 2 (AlphaGenome and Borzoi well above NTv3), although Pearson can shift on Gasperini and AlphaGenome and Borzoi narrowly swap order on Fulco. Spearman&amp;#39;s ρ is identical between the two scorings. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown." data-title-escaped="Figure 5: Predicted vs. measured knockdown on the log2 alignment scale (log2(WT / mean(shuffle)) vs. −log2(1 − y_delta)) for Borzoi, NTv3, and AlphaGenome on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019. Enformer omitted. Points coloured by enhancer-to-TSS distance. The broad ranking matches Figure 2 (AlphaGenome and Borzoi well above NTv3), although Pearson can shift on Gasperini and AlphaGenome and Borzoi narrowly swap order on Fulco. Spearman&amp;amp;#39;s ρ is identical between the two scorings. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;h3 id="test-time-augmentation-stabilising-alphagenome"&gt;Test-time augmentation: stabilising AlphaGenome
&lt;/h3&gt;&lt;p&gt;Karollus et al.&amp;rsquo;s original benchmark applies &lt;strong&gt;test-time augmentations (TTA)&lt;/strong&gt; — for each sequence, run the model on 3 small offsets (−43, 0, +43 bp) crossed with both orientations (forward + reverse-complement), then average the 6 predictions. The 43 bp shift isn&amp;rsquo;t arbitrary: Enformer&amp;rsquo;s output is binned at 128 bp, and 128 / 3 ≈ 43, so the three offsets sample the TSS at roughly even thirds within its bin, averaging out where exactly the landmark falls. Genomic seq2func models have small but real positional sensitivity — predictions wobble across shifts — and TTA reliably damps this. Past work has shown the same effect in adjacent settings (&lt;a class="link" href="https://www.nature.com/articles/s42256-022-00570-9" target="_blank" rel="noopener"
 &gt;Toneyan et al., 2022&lt;/a&gt;).&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;&lt;em&gt;Side note:&lt;/em&gt; Test-time augmentation (TTA) means running the model on slightly perturbed versions of the same input and averaging the outputs, so the final prediction is less dependent on incidental properties like exact bin alignment or input orientation.&lt;/p&gt;

 &lt;/blockquote&gt;
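&lt;p&gt;The 6-pass recipe (3 offsets × 2 orientations) can be sketched as follows. This is a minimal illustration under stated assumptions: &lt;code&gt;predict&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; are hypothetical callables standing in for a model wrapper and a genome reader, and the toy genome and GC-content &amp;ldquo;model&amp;rdquo; exist only so the example runs end to end.&lt;/p&gt;

```python
from statistics import mean

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tta_predict(predict, fetch, start, end, offsets=(-43, 0, 43)):
    """Average a scalar prediction over shifted and reverse-complemented inputs.

    predict(seq) and fetch(start, end) are hypothetical callables standing in
    for a model wrapper and a genome reader.
    """
    scores = []
    for offset in offsets:
        seq = fetch(start + offset, end + offset)
        scores.append(predict(seq))                      # forward strand
        scores.append(predict(reverse_complement(seq)))  # reverse complement
    return mean(scores)


# Toy stand-ins so the sketch runs end to end.
genome = "ACGT" * 100
calls = []


def fetch(start, end):
    return genome[start:end]


def predict(seq):
    # Pretend "model": returns the GC fraction of its input.
    calls.append(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


score = tta_predict(predict, fetch, start=100, end=300)
```

&lt;p&gt;With a real model the wrapper would also need to track where the TSS lands in each shifted or reversed output before aggregating, a detail this scalar toy deliberately leaves out.&lt;/p&gt;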
&lt;p&gt;The headline benchmark (Figures 2–5) doesn&amp;rsquo;t use TTA, so all four models are compared on equal footing. To see what stabilising AlphaGenome&amp;rsquo;s predictions buys on this task, we ran a separate analysis applying Karollus et al.&amp;rsquo;s 6-pass recipe to AlphaGenome alone. Figure 6 shows the result: TTA modestly improves AlphaGenome&amp;rsquo;s correlations on both screens, lifting Pearson&amp;rsquo;s r on Fulco from 0.67 to 0.69 and Spearman&amp;rsquo;s ρ on Gasperini from 0.45 to 0.47; the other metrics are essentially unchanged.&lt;/p&gt;
&lt;figure&gt;&lt;a href="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig3.png" class="image-link" data-pswp-width="4917" data-pswp-height="3542"&gt;
		&lt;img src="https://genomicsxai.github.io/blogs/2026-007/crispri_suppfig3.png" width="900px" height="648"loading="lazy"
			alt="Figure 6"
			title="Figure 6: AlphaGenome predicted vs. measured fractional knockdown on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019, without test time augmentations (TTA) (middle) versus with 6-pass TTA (3 shifts × 2 orientations, averaged; right). Points coloured by enhancer-to-TSS distance. TTA modestly improves AlphaGenome&amp;#39;s correlations on both screens — most visibly Fulco Pearson (0.67 → 0.69) and Gasperini Spearman (0.45 → 0.47); the other metrics are essentially unchanged. Enformer without TTA included for context. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown." data-title-escaped="Figure 6: AlphaGenome predicted vs. measured fractional knockdown on (a) Fulco et al., 2019 and (b) Gasperini et al., 2019, without test time augmentations (TTA) (middle) versus with 6-pass TTA (3 shifts × 2 orientations, averaged; right). Points coloured by enhancer-to-TSS distance. TTA modestly improves AlphaGenome&amp;amp;#39;s correlations on both screens — most visibly Fulco Pearson (0.67 → 0.69) and Gasperini Spearman (0.45 → 0.47); the other metrics are essentially unchanged. Enformer without TTA included for context. The dashed x = y line marks perfect prediction (predicted = observed); points falling below it indicate the model underestimates the experimental knockdown."&gt;
		&lt;/a&gt;&lt;/figure&gt;&lt;h3 id="does-ensembling-help"&gt;Does ensembling help?
&lt;/h3&gt;&lt;p&gt;On the matched receptive-field (RF) set used in the headline table, a simple equal-weight average of the four models&amp;rsquo; predictions gives a mixed result: Pearson&amp;rsquo;s r rises on Fulco (0.70, vs AlphaGenome&amp;rsquo;s 0.67) but falls on Gasperini (0.42, vs 0.45). On these two screens, equal-weight ensembling helps where the models are comparably strong (Fulco); where one model dominates (AlphaGenome on Gasperini), averaging in the weaker models appears to dilute the predictive signal.&lt;/p&gt;
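&lt;p&gt;For readers who want to try this on their own predictions, an equal-weight ensemble is just a per-pair mean across models. The sketch below is illustrative only: the helper names and the two-model toy data are made up, and the numbers are synthetic, not benchmark results.&lt;/p&gt;

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def equal_weight_ensemble(predictions_by_model):
    """Average predictions pair-by-pair, giving every model the same weight."""
    columns = list(predictions_by_model.values())
    return [sum(values) / len(values) for values in zip(*columns)]


# Synthetic numbers for illustration only, not benchmark results.
observed = [-0.60, -0.35, -0.20, -0.05]
predictions_by_model = {
    "model_a": [-0.30, -0.20, -0.05, -0.02],
    "model_b": [-0.10, -0.25, -0.15, 0.00],
}
ensemble = equal_weight_ensemble(predictions_by_model)
r_ensemble = pearson(observed, ensemble)
```

&lt;p&gt;Comparing &lt;code&gt;r_ensemble&lt;/code&gt; with each model&amp;rsquo;s own correlation against &lt;code&gt;observed&lt;/code&gt; reproduces the kind of comparison described above.&lt;/p&gt;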
&lt;hr&gt;
&lt;h2 id="limitations"&gt;Limitations
&lt;/h2&gt;&lt;p&gt;A few things to flag about our analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Differences from Karollus 2023.&lt;/strong&gt; We cross-checked our Enformer reimplementation against Karollus&amp;rsquo;s original code line-by-line with an LLM (&lt;a class="link" href="https://www.anthropic.com/" target="_blank" rel="noopener"
 &gt;Claude Opus 4.7&lt;/a&gt;). Our Enformer numbers don&amp;rsquo;t exactly match their published ones, but the qualitative pattern is the same: Enformer underpredicts distal enhancer effects. Four deliberate departures from their approach may explain some of this gap:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Window construction.&lt;/strong&gt; Karollus et al. read precomputed &lt;code&gt;sequence_start&lt;/code&gt;/&lt;code&gt;sequence_end&lt;/code&gt; values from a fixed table, designed so that both TSS and enhancer sit inside Enformer&amp;rsquo;s central crop. We build each window on the fly, strictly TSS-centred, so that all models are compared on the same footing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TTA only for AlphaGenome.&lt;/strong&gt; Karollus et al. apply TTA (6 forward passes per sequence) uniformly across all evaluated models. Our headline four-model comparison (Figures 2–5) does not apply TTA to any model; each model is evaluated using its standard published setup: a single forward pass for Enformer, NTv3, and AlphaGenome (the distilled checkpoint), and the 4-fold ensemble for Borzoi. We separately apply Karollus et al.&amp;rsquo;s 6-pass TTA recipe to AlphaGenome (Figure 6) to characterise the stability gain, but we don&amp;rsquo;t apply it to Enformer, Borzoi, or NTv3.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No N-replacement control.&lt;/strong&gt; Karollus also runs an all-N enhancer replacement (replacing every base with the &amp;ldquo;N&amp;rdquo; wildcard) as a stronger knockout than shuffling. We deliberately don&amp;rsquo;t: while reference genomes do contain N regions (gaps, centromeres, hard-masked repeats), these are typically excluded or under-weighted in training pipelines for these models, so a 2 kb block of N embedded in an otherwise-normal genic context is anomalous input. Predictions there reflect how the model handles unfamiliar input, not how it responds to motif loss, which is the actual question. The dinucleotide shuffle keeps the model in-distribution and isolates the motif-loss signal cleanly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Linear vs. log&lt;sub&gt;2&lt;/sub&gt; observed effect.&lt;/strong&gt; Karollus uses log&lt;sub&gt;2&lt;/sub&gt;(1 + fraction_change); we report the linear fractional change and the log&lt;sub&gt;2&lt;/sub&gt; observed effect. Spearman is identical because log&lt;sub&gt;2&lt;/sub&gt;(1 + x) is monotonic and therefore preserves ranks; Pearson&amp;rsquo;s r differs by a small log-vs-linear distortion.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These methodological differences may shift absolute Pearson values, though broad model rankings should be unaffected.&lt;/p&gt;
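&lt;p&gt;The linear-vs-log point in the last bullet is easy to check numerically. The sketch below uses synthetic, tie-free values (not benchmark data) and hypothetical helper names, and it assumes the log&lt;sub&gt;2&lt;/sub&gt;(1 + x) transform is applied to the observed effect only. Ranks are unchanged by the transform, so Spearman is unchanged, while Pearson&amp;rsquo;s r moves.&lt;/p&gt;

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks 1..n for tie-free data (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic fractional changes (negative = knockdown), illustration only.
observed = [-0.80, -0.55, -0.30, -0.12, -0.05]
predicted = [-0.40, -0.35, -0.10, -0.08, -0.01]
observed_log2 = [math.log2(1 + value) for value in observed]

rho_linear = spearman(observed, predicted)
rho_log2 = spearman(observed_log2, predicted)
r_linear = pearson(observed, predicted)
r_log2 = pearson(observed_log2, predicted)
```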
&lt;p&gt;&lt;strong&gt;Small Fulco sample (n ≈ 63).&lt;/strong&gt; Fulco et al.&amp;rsquo;s set of ~60 validated enhancer–gene pairs is small enough that correlation estimates carry wide confidence intervals: for r ≈ 0.65 at n = 63, the 95% CI extends roughly 0.15 on either side of the estimate. Numerical differences of roughly 0.05–0.10 or less (e.g., AlphaGenome 0.67 vs Borzoi 0.66 on Fulco Pearson) should be read as essentially tied, not as a meaningful ordering. Gasperini&amp;rsquo;s ~440 pairs give tighter estimates, so the model gaps there are more reliable.&lt;/p&gt;
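&lt;p&gt;One standard way to approximate such an interval is the Fisher z-transform. The sketch below is an assumption about method (the analysis above does not depend on it), and &lt;code&gt;pearson_ci&lt;/code&gt; is a hypothetical helper name. The r and n values are the approximate figures quoted in this post.&lt;/p&gt;

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for Pearson's r via the Fisher z-transform."""
    z = math.atanh(r)                         # map r to the z scale
    half_width = z_crit / math.sqrt(n - 3)    # standard error is 1/sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


fulco_lo, fulco_hi = pearson_ci(0.65, 63)
gasperini_lo, gasperini_hi = pearson_ci(0.45, 440)

print(f"Fulco-sized sample:     [{fulco_lo:.2f}, {fulco_hi:.2f}]")
print(f"Gasperini-sized sample: [{gasperini_lo:.2f}, {gasperini_hi:.2f}]")
```

&lt;p&gt;For the Fulco-sized case this gives an interval of roughly 0.48 to 0.77, which is slightly asymmetric around 0.65 but consistent with the rough 0.15 figure above; the Gasperini-sized interval is noticeably narrower.&lt;/p&gt;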
&lt;p&gt;&lt;strong&gt;K562 only.&lt;/strong&gt; Both screens are in K562, and K562 is heavily represented in every model&amp;rsquo;s training data. None of these numbers say anything about how the models would do on cell types under-represented in training. We&amp;rsquo;d expect the gap between models, and between predictions and truth, to widen there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In-silico ≠ real CRISPRi.&lt;/strong&gt; Dinucleotide shuffling destroys motif content but doesn&amp;rsquo;t capture chromatin context changes, dCas9 occupancy, or 3D genome reorganisation. It&amp;rsquo;s a motif-loss proxy, not a full simulation. Karollus et al. discuss this, and it is worth keeping in mind when reading the numbers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CRISPRi isn&amp;rsquo;t purely cis.&lt;/strong&gt; Knocking down an enhancer can affect other nearby genes, or the target gene&amp;rsquo;s own regulatory partners, whose altered expression shifts the cellular context against which we measure the target. Part of any measured enhancer effect is downstream of these indirect (trans) effects, which a sequence-only model predicting from a TSS-centred window can&amp;rsquo;t capture.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="implications"&gt;Implications
&lt;/h2&gt;&lt;p&gt;The benchmark tests seq2func models on experimentally validated distal enhancer-promoter interactions. AlphaGenome&amp;rsquo;s gains over Enformer on distal CRE effect-magnitude prediction are substantial, and most pronounced on the larger Gasperini et al. screen. That&amp;rsquo;s real progress. But the headline finding — that even AlphaGenome underestimates the magnitude of distal-enhancer effects — converges with what AlphaGenome&amp;rsquo;s own paper observed on the easier enhancer-gene linking task on ENCODE-rE2G: distal CREs remain hard from two distinct evaluation angles. Fully benchmarking seq2func models on distal CRE effect magnitudes will require follow-up CRISPRi screens in cell lines beyond K562 — without them, we can&amp;rsquo;t tell which gains generalise.&lt;/p&gt;
&lt;p&gt;Echoing &lt;a class="link" href="https://www.biorxiv.org/content/10.64898/2026.02.01.702969v1.full" target="_blank" rel="noopener"
 &gt;Tu, 2026&lt;/a&gt; and &lt;a class="link" href="https://www.biorxiv.org/content/10.1101/2025.08.05.668750v2.full" target="_blank" rel="noopener"
 &gt;Shen, 2025&lt;/a&gt; from a different angle: AlphaGenome improves CRISPRi effect prediction, but the remaining gap — especially for distal CREs — is still an open problem for the field.&lt;/p&gt;
&lt;p&gt;The repo is set up so that adding a fifth model takes one new script, &lt;code&gt;scripts/test_&amp;lt;dataset&amp;gt;_&amp;lt;newmodel&amp;gt;.py&lt;/code&gt;, that writes a CSV with &lt;code&gt;y_delta&lt;/code&gt; and &lt;code&gt;pred_delta&lt;/code&gt; columns. If you&amp;rsquo;ve got a model you want to test, we&amp;rsquo;d love to see it!&lt;/p&gt;
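&lt;p&gt;As a rough guide to the output side of such a script, the sketch below writes a CSV with the two columns named above. Everything else is an assumption: &lt;code&gt;write_results&lt;/code&gt; and the file name are hypothetical, the values are placeholders, and the existing scripts in the repo may write additional columns, so check them before relying on this layout.&lt;/p&gt;

```python
import csv
import tempfile
from pathlib import Path


def write_results(rows, out_path):
    """Write (y_delta, pred_delta) pairs to a CSV with a header row.

    y_delta is the measured effect and pred_delta the model's prediction.
    The function name and signature are hypothetical, not the repo's API.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["y_delta", "pred_delta"])
        writer.writeheader()
        for y_delta, pred_delta in rows:
            writer.writerow({"y_delta": y_delta, "pred_delta": pred_delta})
    return out_path


# Placeholder values written to a temporary directory, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    path = write_results(
        [(-0.42, -0.10), (-0.15, -0.03)],
        Path(tmp) / "fulco_newmodel.csv",
    )
    lines = path.read_text().splitlines()
    header = lines[0]
    n_lines = len(lines)
```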
&lt;hr&gt;
&lt;h2 id="code-and-links"&gt;Code and links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/Al-Murphy/seq2func_crispri_eval" target="_blank" rel="noopener"
 &gt;Source code &amp;amp; evaluation scripts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9" target="_blank" rel="noopener"
 &gt;Karollus et al., 2023 — the benchmark this extends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://zenodo.org/records/7613255" target="_blank" rel="noopener"
 &gt;SequenceModelBenchmark Zenodo (Karollus tables)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Models: &lt;a class="link" href="https://github.com/lucidrains/enformer-pytorch" target="_blank" rel="noopener"
 &gt;Enformer (PyTorch)&lt;/a&gt; · &lt;a class="link" href="https://github.com/calico/borzoi" target="_blank" rel="noopener"
 &gt;Borzoi&lt;/a&gt; (original) · &lt;a class="link" href="https://huggingface.co/johahi" target="_blank" rel="noopener"
 &gt;Flashzoi&lt;/a&gt; (Borzoi PyTorch port, 4-fold ensemble — used here) · &lt;a class="link" href="https://huggingface.co/InstaDeepAI/NTv3_650M_post" target="_blank" rel="noopener"
 &gt;NTv3&lt;/a&gt; · &lt;a class="link" href="https://github.com/genomicsxai/alphagenome-pytorch" target="_blank" rel="noopener"
 &gt;AlphaGenome PyTorch port&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. &lt;em&gt;Nature Methods&lt;/em&gt;, 18, 1196–1203 (2021). &lt;a class="link" href="https://doi.org/10.1038/s41592-021-01252-x" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41592-021-01252-x&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Linder, J. et al. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. &lt;em&gt;Nature Genetics&lt;/em&gt;, 57, 949–961 (2025). &lt;a class="link" href="https://doi.org/10.1038/s41588-024-02053-6" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41588-024-02053-6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Boshar, S. et al. A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction. &lt;em&gt;bioRxiv&lt;/em&gt; (2025). &lt;a class="link" href="https://doi.org/10.64898/2025.12.22.695963" target="_blank" rel="noopener"
 &gt;https://doi.org/10.64898/2025.12.22.695963&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Avsec, Ž. et al. Advancing regulatory variant effect prediction with AlphaGenome. &lt;em&gt;Nature&lt;/em&gt;, 649, 1206–1218 (2026). &lt;a class="link" href="https://doi.org/10.1038/s41586-025-10014-0" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41586-025-10014-0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Karollus, A., Mauermeier, T., Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. &lt;em&gt;Genome Biology&lt;/em&gt;, 24, 56 (2023). &lt;a class="link" href="https://doi.org/10.1186/s13059-023-02899-9" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1186/s13059-023-02899-9&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. &lt;em&gt;Nature Genetics&lt;/em&gt;, 51, 1664–1669 (2019). &lt;a class="link" href="https://doi.org/10.1038/s41588-019-0538-0" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41588-019-0538-0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. &lt;em&gt;Cell&lt;/em&gt;, 176, 377–390.e19 (2019). &lt;a class="link" href="https://doi.org/10.1016/j.cell.2018.11.029" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1016/j.cell.2018.11.029&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tu, X. et al. A modality gap in personal-genome prediction by sequence-to-function models. &lt;em&gt;bioRxiv&lt;/em&gt; (2026). &lt;a class="link" href="https://doi.org/10.64898/2026.02.01.702969" target="_blank" rel="noopener"
 &gt;https://doi.org/10.64898/2026.02.01.702969&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Shen, L. AlphaGenome enhances personal gene expression prediction but retains key limitations. &lt;em&gt;bioRxiv&lt;/em&gt; (2025). &lt;a class="link" href="https://doi.org/10.1101/2025.08.05.668750" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1101/2025.08.05.668750&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sasse, A., Ng, B., Spiro, A.E. et al. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. &lt;em&gt;Nature Genetics&lt;/em&gt;, 55, 2060–2064 (2023). &lt;a class="link" href="https://doi.org/10.1038/s41588-023-01524-6" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41588-023-01524-6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Koo, P.K., Majdandzic, A., Ploenzke, M., Anand, P., Paul, S.B. Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. &lt;em&gt;PLoS Computational Biology&lt;/em&gt;, 17(5), e1008925 (2021). &lt;a class="link" href="https://doi.org/10.1371/journal.pcbi.1008925" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1371/journal.pcbi.1008925&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Toneyan, S., Tang, Z., Koo, P.K. Evaluating deep learning for predicting epigenomic profiles. &lt;em&gt;Nature Machine Intelligence&lt;/em&gt;, 4, 1088–1100 (2022). &lt;a class="link" href="https://doi.org/10.1038/s42256-022-00570-9" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s42256-022-00570-9&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gschwind, A.R. et al. An encyclopedia of enhancer-gene regulatory interactions in the human genome. &lt;em&gt;bioRxiv&lt;/em&gt; (2023). &lt;a class="link" href="https://doi.org/10.1101/2023.11.09.563812" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1101/2023.11.09.563812&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Cheng, W. et al. DNALONGBENCH: a benchmark suite for long-range DNA prediction tasks. &lt;em&gt;Nature Communications&lt;/em&gt;, 16, 10108 (2025). &lt;a class="link" href="https://doi.org/10.1038/s41467-025-65077-4" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1038/s41467-025-65077-4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Barry, T., Wang, X., Morris, J.A. et al. SCEPTRE improves calibration and sensitivity in single-cell CRISPR screen analysis. &lt;em&gt;Genome Biology&lt;/em&gt;, 22, 344 (2021). &lt;a class="link" href="https://doi.org/10.1186/s13059-021-02545-2" target="_blank" rel="noopener"
 &gt;https://doi.org/10.1186/s13059-021-02545-2&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</description></item></channel></rss>