IDP-ELM: Accurate and Fast Prediction of Intrinsically Disordered Protein
IDP-ELM[1] predicts intrinsically disordered regions (IDRs) and their functions directly from the amino-acid sequence, with no structure or multiple-sequence alignment (MSA) required. By combining multiple protein language models with ensemble learning, it reaches state-of-the-art accuracy (0.8469 AUC on nonredundant CAID) at a speed of about 0.8 s per sequence. It predicts three per-residue tracks: disorder (IDR), disordered flexible linkers (DFL), and disordered protein binding (DP).
Paste one or more sequences in FASTA format or upload a FASTA file, then submit. Each run opens a dedicated, bookmarkable result page with a per-residue disorder profile.
Up to 2,000 residues per job. For longer sequences, please contact us.
Accepts .fasta, .fa, .txt, or .seq.
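Before submitting, it can help to check locally that the input parses as FASTA and fits the size limit. The following is a minimal sketch, not the server's own validation code. The function names are hypothetical, and it assumes the 2,000-residue limit applies to the summed length of all sequences in a job; the page does not say whether the limit is counted per sequence or per job total.

```python
MAX_RESIDUES = 2000  # per-job limit stated on this page


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def total_residues(records):
    """Sum the sequence lengths over all records."""
    return sum(len(seq) for _, seq in records)


def fits_job_limit(records, limit=MAX_RESIDUES):
    # Assumption: the limit is on the total residue count of the job.
    return 0 < total_residues(records) <= limit


# Toy input with made-up sequences, for illustration only.
records = parse_fasta(">seq1 toy example\nMSEQNNTEMT\nFQIQRIYTKD\n>seq2\nMKV\n")
print(total_residues(records), fits_job_limit(records))
```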
Predictions currently run on a CPU server, so a job may take from a few seconds to a couple of minutes depending on length. The result page shows live progress, can be bookmarked or shared, and is retained for one week.
Note on the served ensemble. The published IDP-ELM combines nine protein language models. Because this public server has limited compute resources, this deployment ensembles only the five smaller models, which keeps the accuracy impact small while greatly reducing the computational cost.
How IDP-ELM Works
For each protein language model (PLM), the per-residue representations feed a BiLSTM that predicts secondary structure; its logits are concatenated with the representations and passed to a BiGRU that predicts disorder (IDR); the IDR logits are in turn fed, with the representations, to a further BiGRU that predicts the IDR functions (DFL and DP). The outputs of the per-PLM predictors are then averaged by ensemble learning.
Figure 1: The IDP-ELM predictor. Each PLM drives a BiLSTM → BiGRU → BiGRU cascade for secondary structure, disorder, and disorder functions; predictions from all PLMs are combined by an averaging (ensemble) layer.
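The final averaging step can be illustrated with a short sketch. It assumes an unweighted mean over per-residue probabilities, one list per PLM-specific predictor; the page states only that the outputs are averaged, so whether the published model averages probabilities or logits, or applies weights, is not confirmed here. The function name is hypothetical.

```python
def average_ensemble(per_model_probs):
    """Average per-residue scores across models (unweighted mean).

    per_model_probs: list of equal-length lists, one per PLM-specific predictor.
    """
    if not per_model_probs:
        raise ValueError("need predictions from at least one model")
    length = len(per_model_probs[0])
    if any(len(p) != length for p in per_model_probs):
        raise ValueError("all models must score the same number of residues")
    n_models = len(per_model_probs)
    # zip(*...) groups the scores that different models gave to the same residue.
    return [sum(column) / n_models for column in zip(*per_model_probs)]


# Made-up scores from two models for a three-residue sequence.
print(average_ensemble([[0.0, 1.0, 0.5], [1.0, 1.0, 0.5]]))
```

An unweighted mean lets a deployment drop or add models without retraining a combiner, which is consistent with serving a subset of the published ensemble.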
Performance
On the nonredundant CAID[2] benchmark, IDP-ELM outperforms existing disorder predictors across AUC, F1 and MCC while needing only the sequence as input — no MSA generation, which is the slow, sometimes-impossible step for other methods.
Figure 2: Top-10 IDP predictors on the nonredundant CAID test set (AUC, F1, MCC). IDP-ELM achieves the best overall performance among MSA-free and MSA-based methods.
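For readers unfamiliar with the threshold-based metrics named above, the sketch below computes F1 and MCC from binary per-residue labels using their standard definitions. It is illustrative only: the labels are made up, the function names are hypothetical, and how the benchmark aggregates residues across proteins is not specified on this page.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up labels: 1 = disordered, 0 = ordered.
truth = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(f1(truth, pred), mcc(truth, pred))
```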
Case Studies
Beyond aggregate metrics, IDP-ELM recovers disordered regions that other predictors miss — capturing the bulk of an IDR with few false positives, even for proteins that resist crystallisation.
Figure 3: Exemplary visualisations. Structures (AlphaFold2[3], rendered in PyMOL[4]) are coloured by disorder (red) and order (green). IDP-ELM closely matches the experimentally annotated disordered regions where several top predictors fail.
Predicted Tracks
Each residue receives three probabilities (0–1). A value above 0.5 indicates the residue is predicted to belong to that class.
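As an illustration of the 0.5 cutoff, the sketch below turns one track of per-residue probabilities into contiguous predicted regions. The function name and the 1-based, inclusive coordinate convention are assumptions for this example, not part of the server's output format.

```python
THRESHOLD = 0.5  # cutoff stated on this page


def predicted_regions(probs, threshold=THRESHOLD):
    """Return 1-based inclusive (start, end) spans where prob > threshold."""
    regions = []
    start = None
    for i, p in enumerate(probs, start=1):
        if p > threshold:
            if start is None:
                start = i  # a new region opens at this residue
        elif start is not None:
            regions.append((start, i - 1))  # region ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(probs)))  # region runs to the sequence end
    return regions


# Made-up probabilities for a five-residue sequence.
print(predicted_regions([0.9, 0.8, 0.2, 0.5, 0.7]))
```

A value of exactly 0.5 is not counted as positive here, following the "above 0.5" wording.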
[1] Xu, S.; Onoda, A. Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning. J. Chem. Inf. Model. 2024, 64 (7), 2901–2911. DOI: 10.1021/acs.jcim.3c01202
[2] Necci, M.; Piovesan, D.; CAID Predictors; DisProt Curators; Tosatto, S. C. E. Critical Assessment of Protein Intrinsic Disorder Prediction. Nat. Methods 2021, 18 (5), 472–481. DOI: 10.1038/s41592-021-01117-3
[3] Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589. DOI: 10.1038/s41586-021-03819-2
[4] The PyMOL Molecular Graphics System, Version 2.0; Schrödinger, LLC.