IDP-ELM: Accurate and Fast Prediction of Intrinsically Disordered Protein

IDP-ELM^[1] predicts intrinsically disordered regions (IDRs) and their functions directly from the amino-acid sequence, with no structure or multiple-sequence alignment (MSA) required. By combining multiple protein language models with ensemble learning, it reaches state-of-the-art accuracy (0.8469 AUC on nonredundant CAID) at a speed of about 0.8 s per sequence. It predicts three per-residue tracks: disorder (IDR), disordered flexible linkers (DFL), and disordered protein binding (DP).


Run a Prediction

Paste one or more sequences in FASTA format or upload a FASTA file, then submit. Each run opens a dedicated, bookmarkable result page with a per-residue disorder profile.

Sequence (FASTA): up to 2,000 residues per job. For longer sequences, please contact us.

Upload FASTA file: accepts .fasta, .fa, .txt, or .seq.
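The input limits above can be checked locally before submitting. The following is a minimal sketch, not the server's own validation code: the function names `parse_fasta` and `check_job` are hypothetical, and it assumes the 2,000-residue limit applies to the total across all sequences in a job.

```python
ALLOWED_EXTENSIONS = (".fasta", ".fa", ".txt", ".seq")  # as listed on this page
MAX_RESIDUES = 2000  # per-job limit; assumed to be the total over all sequences


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_job(filename, text):
    """Raise ValueError if the upload would break the stated limits."""
    if not filename.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError(f"unsupported file extension: {filename}")
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    total = sum(len(seq) for _, seq in records)
    if total > MAX_RESIDUES:
        raise ValueError(f"{total} residues exceeds the {MAX_RESIDUES}-residue limit")
    return records
```

Calling `check_job("demo.fa", open("demo.fa").read())` would return the parsed records or raise a `ValueError` describing which limit was exceeded.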

Predictions currently run on a CPU server, so a job may take from a few seconds to a couple of minutes depending on length. The result page shows live progress, can be bookmarked or shared, and is retained for one week.

Note on the served ensemble. The published IDP-ELM combines nine protein language models. Because this public server has limited compute resources, this deployment ensembles only the five smaller models, which keeps the accuracy impact small while greatly reducing the computational cost.

How IDP-ELM Works

For each protein language model (PLM), the per-residue representations feed a BiLSTM that predicts secondary structure; its logits are concatenated with the representations and passed to a BiGRU that predicts disorder (IDR); the IDR logits are in turn fed, with the representations, to a further BiGRU that predicts the IDR functions (DFL and DP). The outputs of the per-PLM predictors are then averaged by ensemble learning.

Figure 1: The IDP-ELM predictor. Each PLM drives a BiLSTM → BiGRU → BiGRU cascade for secondary structure, disorder, and disorder functions; predictions from all PLMs are combined by an averaging (ensemble) layer.
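The data flow described above can be sketched in a few lines. This is only an illustration of the wiring, not the published implementation: the three `*_model` arguments are placeholder callables standing in for the trained BiLSTM and BiGRU networks, the names `run_cascade` and `average_tracks` are hypothetical, and the ensemble step is assumed to be an unweighted mean of per-residue scores.

```python
def concat(features, logits):
    """Concatenate two per-residue feature lists, residue by residue."""
    return [f + l for f, l in zip(features, logits)]


def run_cascade(reps, ss_model, idr_model, func_model):
    """Wire one PLM's representations through the three-stage cascade.

    reps holds one feature list per residue. Each *_model is a placeholder
    callable that maps per-residue features to per-residue logits.
    """
    ss_logits = ss_model(reps)                           # stage 1: secondary structure
    idr_logits = idr_model(concat(reps, ss_logits))      # stage 2: disorder (IDR)
    func_logits = func_model(concat(reps, idr_logits))   # stage 3: DFL and DP
    return idr_logits, func_logits


def average_tracks(per_plm_tracks):
    """Combine one track across PLMs by an unweighted per-residue mean."""
    n = len(per_plm_tracks)
    return [sum(values) / n for values in zip(*per_plm_tracks)]
```

Each later stage sees the original representations again alongside the previous stage's logits, so the disorder predictor can use secondary-structure evidence without losing the raw PLM features.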

Performance

On the nonredundant CAID^[2] benchmark, IDP-ELM outperforms existing disorder predictors across AUC, F1 and MCC while needing only the sequence as input. It skips MSA generation, which is a slow and sometimes impossible step for MSA-based methods.

Figure 2: Top-10 IDP predictors on the nonredundant CAID test set (AUC, F1, MCC). IDP-ELM achieves the best overall performance among MSA-free and MSA-based methods.
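For readers unfamiliar with the three metrics, the sketch below computes them from per-residue labels using their standard definitions. It is not the CAID evaluation code; the function names are hypothetical, and details such as how the benchmark chooses a decision threshold are not covered here.

```python
import math


def confusion(labels, preds):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(labels, preds):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, _, fp, fn = confusion(labels, preds)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(labels, preds):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    tp, tn, fp, fn = confusion(labels, preds)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This pairwise form is quadratic in the number of
    residues, which is fine for illustration but slow on large benchmarks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

AUC is computed from the raw scores, while F1 and MCC need binary calls, so they depend on the threshold used to turn scores into predictions.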

Case Studies

Beyond aggregate metrics, IDP-ELM recovers disordered regions that other predictors miss — capturing the bulk of an IDR with few false positives, even for proteins that resist crystallisation.

Figure 3: Exemplary visualisations. Structures (AlphaFold2^[3], rendered in PyMOL^[4]) are coloured by disorder (red) and order (green). IDP-ELM closely matches the experimentally annotated disordered regions where several top predictors fail.

Predicted Tracks

Each residue receives three probabilities (0–1). A value above 0.5 indicates the residue is predicted to belong to that class.

IDR: intrinsically disordered region
DFL: disordered flexible linker
DP: disordered protein binding
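The 0.5 cut-off can be applied to any of the three tracks to turn per-residue probabilities into contiguous regions. The sketch below shows one way to do this; the function name `call_regions` is hypothetical and the server's result page may report regions differently.

```python
THRESHOLD = 0.5  # a residue is assigned to the class when its probability is above this


def call_regions(probs, threshold=THRESHOLD):
    """Return 1-based inclusive (start, end) spans where prob > threshold."""
    regions = []
    start = None
    for i, p in enumerate(probs, start=1):
        if p > threshold:
            if start is None:
                start = i  # a new region opens at this residue
        elif start is not None:
            regions.append((start, i - 1))  # the region closed at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(probs)))  # region runs to the end of the sequence
    return regions
```

The comparison is strict, so a residue scoring exactly 0.5 is not assigned to the class, matching the "above 0.5" wording.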

References

[1] Xu, S.; Onoda, A. Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning. J. Chem. Inf. Model. 2024, 64 (7), 2901–2911. DOI: 10.1021/acs.jcim.3c01202
[2] Necci, M.; Piovesan, D.; CAID Predictors; DisProt Curators; Tosatto, S. C. E. Critical Assessment of Protein Intrinsic Disorder Prediction. Nat. Methods 2021, 18 (5), 472–481. DOI: 10.1038/s41592-021-01117-3
[3] Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589. DOI: 10.1038/s41586-021-03819-2
[4] The PyMOL Molecular Graphics System, Version 2.0; Schrödinger, LLC.

Please contact shijie.xu@ees.hokudai.ac.jp for any questions.

Changelog

2026-05-28: Rebuilt the web interface with a live result page and per-residue disorder profiles.
2024-05-28: Web server updated.
2023-09-12: First release.