We've set up an online database that contains GREMLIN predictions for protein domains from the Pfam protein family database. The intent of this database is to give you an idea of whether GREMLIN might be useful for your gene (once you have figured out which protein family it might belong to). If there are enough homologous sequences to perform the analysis, you can then either use these precomputed predictions or submit your sequence to our online server.

Predictions made for protein families (Pfam database)

  • Full list = lists every Pfam family, along with its number of sequences
  • Predictions (All) = lists the subset of Pfam families for which we made predictions
  • Predictions (No Structure) = lists only the Pfam families for which there is no structure
    (using Pfam_id: 1 or PF00001)
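
The two accepted forms of a Pfam identifier (a bare number such as 1, or a full accession such as PF00001) refer to the same family. As a rough illustration only, the sketch below shows one way such a query could be normalized to the accession form. The function name normalize_pfam_id is hypothetical, and the server's actual parsing is not described here; the only assumption is that accessions are "PF" followed by a zero-padded five-digit number.

```python
import re


def normalize_pfam_id(query: str) -> str:
    """Turn '1', 'pf1' or 'PF00001' into the accession form 'PF00001'.

    Hypothetical helper: assumes accessions are 'PF' plus five digits.
    """
    cleaned = query.strip().upper()
    match = re.fullmatch(r"(?:PF)?(\d{1,5})", cleaned)
    if match is None:
        raise ValueError(f"not a recognizable Pfam identifier: {query!r}")
    # Zero-pad the numeric part to five digits.
    return f"PF{int(match.group(1)):05d}"


if __name__ == "__main__":
    for example in ("1", "PF00001", "pf72"):
        print(example, "->", normalize_pfam_id(example))
```

Anything that does not fit the pattern raises a ValueError rather than being guessed at.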




  • Alignments
    • Pfam HMM(s) were used as seeds to enrich the alignment with additional sequences using HHblits and the UniProt database. The alignments were filtered (coverage 75%, identity 90%). Sites that had more than 75% gaps were removed.
  • Predictions
    • GREMLIN (with no priors) was used to measure co-evolution.
    • Note: we only made predictions for Pfams with a length greater than 50 and at least 10 sequences per length (relaxed to 5 sequences per length for those that have no structure).
    • Predictions will be updated yearly to reflect new Pfams and the availability of more sequences.
  • PDB homolog
    • HHsearch was used to locate global alignments; the top hit was used to calculate HH_delta.
      • HH_delta > 0.5 indicates that the input alignment differs significantly from the alignment generated for the PDB.
    • From these, minimal-distance contacts (< 5 Å) were extracted, taking into account the overall probability (and HHscore) and the per-site alignment probability.
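
To make the alignment filters and the prediction cutoffs above more concrete, here is a minimal sketch in Python. It is not the pipeline we ran: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings with "-" as the gap character, "coverage" is taken to mean the fraction of non-gap positions in a sequence, and "sequences per length" is taken to mean the number of sequences divided by the alignment length. The 90% identity filter is left out to keep the example short.

```python
from typing import List

GAP = "-"  # assumed gap character


def coverage(seq: str) -> float:
    """Fraction of aligned positions in a sequence that are not gaps."""
    return sum(c != GAP for c in seq) / len(seq)


def filter_by_coverage(alignment: List[str], min_cov: float = 0.75) -> List[str]:
    """Keep sequences that cover at least min_cov of the alignment."""
    return [seq for seq in alignment if coverage(seq) >= min_cov]


def drop_gappy_columns(alignment: List[str], max_gap_frac: float = 0.75) -> List[str]:
    """Remove sites (columns) in which more than max_gap_frac of entries are gaps."""
    n_seqs = len(alignment)
    keep = [
        j
        for j in range(len(alignment[0]))
        if sum(seq[j] == GAP for seq in alignment) / n_seqs <= max_gap_frac
    ]
    return ["".join(seq[j] for j in keep) for seq in alignment]


def eligible_for_prediction(n_seqs: int, length: int, has_structure: bool) -> bool:
    """Apply the cutoffs: length > 50 and >= 10 sequences per length
    (>= 5 sequences per length when no structure is available)."""
    if length <= 50:
        return False
    min_seqs_per_length = 10.0 if has_structure else 5.0
    return n_seqs / length >= min_seqs_per_length


if __name__ == "__main__":
    toy = ["ACD-", "A-D-", "A---", "AC--"]
    print(drop_gappy_columns(toy))
    print(eligible_for_prediction(n_seqs=300, length=60, has_structure=False))
```

Under these assumptions, a family of length 60 with 300 sequences (5 sequences per length) would qualify only if it has no known structure.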

