IDP-ELM: Accurate and Fast Prediction of Intrinsically Disordered Protein

IDP-ELM[1] predicts intrinsically disordered regions (IDRs) and their functions directly from the amino-acid sequence, with no structure or multiple-sequence alignment (MSA) required. By combining multiple protein language models with ensemble learning, it reaches state-of-the-art accuracy (0.8469 AUC on the nonredundant CAID test set) at roughly 0.8 s per sequence. It predicts three per-residue tracks: disorder (IDR), disordered flexible linkers (DFL), and disordered protein binding (DP).

Run a Prediction

Paste one or more sequences in FASTA format or upload a FASTA file, then submit. Each run opens a dedicated, bookmarkable result page with a per-residue disorder profile.

Up to 2,000 residues per job. For longer sequences, please contact us.
File uploads accept .fasta, .fa, .txt, or .seq.
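
The input rules above can be checked locally before submitting. The following is a minimal sketch, not the server's own validation code: the function names are hypothetical, and it assumes the 2,000-residue limit applies to the combined length of all sequences in a job.

```python
MAX_RESIDUES = 2000  # per-job limit stated on this page
ALLOWED_EXTENSIONS = (".fasta", ".fa", ".txt", ".seq")


def has_allowed_extension(filename):
    """True if the file name ends with one of the accepted extensions."""
    return filename.lower().endswith(ALLOWED_EXTENSIONS)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_job(text):
    """Return the total residue count, or raise if the job is too large."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    # Assumption: the limit is counted over all sequences in the job.
    total = sum(len(seq) for _, seq in records)
    if total > MAX_RESIDUES:
        raise ValueError(f"{total} residues exceeds the {MAX_RESIDUES}-residue limit")
    return total


demo = ">p1\nMKT\nAAG\n>p2\nmsse\n"
print(parse_fasta(demo))  # [('p1', 'MKTAAG'), ('p2', 'MSSE')]
print(check_job(demo))    # 10
```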

Predictions currently run on a CPU server, so a job may take from a few seconds to a couple of minutes depending on length. The result page shows live progress, can be bookmarked or shared, and is retained for one week.

How IDP-ELM Works

Each protein language model (PLM) drives its own three-stage cascade. First, the per-residue representations feed a BiLSTM that predicts secondary structure. Its logits are concatenated with the representations and passed to a BiGRU that predicts disorder (IDR). The IDR logits are in turn concatenated with the representations and fed to a further BiGRU that predicts the IDR functions (DFL and DP). Finally, the outputs of the per-PLM predictors are averaged to form the ensemble prediction.

Figure 1: The IDP-ELM predictor. Each PLM drives a BiLSTM → BiGRU → BiGRU cascade for secondary structure, disorder, and disorder functions; predictions from all PLMs are combined by an averaging (ensemble) layer.
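
The data flow of the cascade can be sketched in PyTorch as below. This illustrates the wiring described above and is not the released IDP-ELM implementation. The class name, hidden size, three secondary-structure classes, linear output heads, sigmoid outputs, and embedding widths are all assumptions, and random tensors stand in for real PLM representations.

```python
import torch
from torch import nn


class PerPLMCascade(nn.Module):
    """BiLSTM -> BiGRU -> BiGRU cascade for one PLM (hypothetical sizes)."""

    def __init__(self, embed_dim, hidden=64, n_ss=3, n_func=2):
        super().__init__()
        # Stage 1: secondary structure from the PLM representations.
        self.ss_rnn = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.ss_head = nn.Linear(2 * hidden, n_ss)
        # Stage 2: disorder (IDR) from representations + secondary-structure logits.
        self.idr_rnn = nn.GRU(embed_dim + n_ss, hidden, batch_first=True, bidirectional=True)
        self.idr_head = nn.Linear(2 * hidden, 1)
        # Stage 3: IDR functions (DFL, DP) from representations + IDR logits.
        self.func_rnn = nn.GRU(embed_dim + 1, hidden, batch_first=True, bidirectional=True)
        self.func_head = nn.Linear(2 * hidden, n_func)

    def forward(self, reps):
        # reps: (batch, length, embed_dim)
        ss_logits = self.ss_head(self.ss_rnn(reps)[0])
        idr_in = torch.cat([reps, ss_logits], dim=-1)
        idr_logits = self.idr_head(self.idr_rnn(idr_in)[0])
        func_in = torch.cat([reps, idr_logits], dim=-1)
        func_logits = self.func_head(self.func_rnn(func_in)[0])
        # Per-residue probabilities in the order IDR, DFL, DP.
        return torch.sigmoid(torch.cat([idr_logits, func_logits], dim=-1))


def ensemble_average(per_plm_probs):
    """Average the per-PLM probability tensors residue by residue."""
    return torch.stack(per_plm_probs, dim=0).mean(dim=0)


torch.manual_seed(0)
length = 50
embed_dims = [32, 48]  # hypothetical embedding widths for two PLMs
models = [PerPLMCascade(d) for d in embed_dims]
with torch.no_grad():
    probs = [m(torch.randn(1, length, d)) for m, d in zip(models, embed_dims)]
    tracks = ensemble_average(probs)
print(tracks.shape)  # torch.Size([1, 50, 3])
```

Each PLM gets its own cascade because embedding widths differ between models. Only the final per-residue probabilities share a shape, which is what makes simple averaging possible.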

Performance

On the nonredundant CAID[2] benchmark, IDP-ELM outperforms existing disorder predictors on AUC, F1 and MCC while needing only the sequence as input. It skips MSA generation, which is the slow and sometimes impossible step for MSA-based methods.

Figure 2: Top-10 IDP predictors on the nonredundant CAID test set (AUC, F1, MCC). IDP-ELM achieves the best overall performance among MSA-free and MSA-based methods.
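
For readers unfamiliar with the three metrics, the sketch below computes them for a toy set of per-residue labels and scores. The data are invented for illustration, the function names are hypothetical, and the 0.5 cut-off used for F1 and MCC is an assumption. This is not the CAID evaluation pipeline.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = disordered residue, 0 = ordered residue.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s > 0.5 else 0 for s in scores]

print(f1_score(y_true, y_pred))  # 0.75
print(mcc(y_true, y_pred))       # 0.5
print(auc(y_true, scores))       # 0.9375
```

AUC is computed from the raw scores and does not depend on a threshold, whereas F1 and MCC depend on the chosen cut-off.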

Case Studies

Beyond aggregate metrics, IDP-ELM recovers disordered regions that other predictors miss — capturing the bulk of an IDR with few false positives, even for proteins that resist crystallisation.

Figure 3: Exemplary visualisations. Structures (AlphaFold2[3], rendered in PyMOL[4]) are coloured by disorder (red) and order (green). IDP-ELM closely matches the experimentally annotated disordered regions where several top predictors fail.

Predicted Tracks

Each residue receives three probabilities (0–1). A value above 0.5 indicates the residue is predicted to belong to that class.

IDR: intrinsically disordered region
DFL: disordered flexible linker
DP: disordered protein binding
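
The 0.5 rule can be applied to a downloaded track to obtain residue ranges. The sketch below assumes a track is available as a plain list of probabilities, one per residue. The function names and the example values are hypothetical.

```python
THRESHOLD = 0.5  # a value above this marks the residue as belonging to the class


def call_residues(probs, threshold=THRESHOLD):
    """Return one boolean per residue: True if the probability exceeds the threshold."""
    return [p > threshold for p in probs]


def segments(calls):
    """Return 1-based inclusive (start, end) ranges of consecutive True calls."""
    regions, start = [], None
    for i, flag in enumerate(calls, start=1):
        if flag and start is None:
            start = i  # a new region begins here
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(calls)))  # region runs to the last residue
    return regions


# Invented IDR probabilities for an 8-residue toy sequence.
idr = [0.91, 0.84, 0.62, 0.48, 0.12, 0.07, 0.55, 0.73]
print(segments(call_residues(idr)))  # [(1, 3), (7, 8)]
```

The same two functions apply unchanged to the DFL and DP tracks.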

References

Please contact shijie.xu@ees.hokudai.ac.jp for any questions.
