Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

EnzML: multi-label prediction of enzyme classes using InterPro signatures.

5 October 2012 - 8:34am — Jano.van.Hemert

Title	EnzML: multi-label prediction of enzyme classes using InterPro signatures.
Publication Type	Journal Article
Year of Publication	2012
Authors	De Ferrari, L, Aitken, S, van Hemert, J, Goryanin, I
Journal Title	BMC Bioinformatics
Volume	13
Pages	61
Journal Date	2012
ISSN	1471-2105
Abstract	BACKGROUND: Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. RESULTS: We present EnzML, a multi-label classification method that can efficiently account also for proteins with multiple enzymatic functions: 50,000 in UniProt. EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by the dataset size, we compared the cross-evaluation results of smaller datasets, either constructed at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. CONCLUSIONS: InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN).
DOI	10.1186/1471-2105-13-61
Alternate Journal	BMC Bioinformatics
PubMed ID	22533924
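
The abstract reports results as subset accuracy, which it defines as an exact match of all correct Enzyme Commission classes of a protein. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code (the paper used the Mulan library). The function name `subset_accuracy` and the toy EC label sets are assumptions made purely for illustration.

```python
def subset_accuracy(true_labels, predicted_labels):
    """Return the fraction of instances whose predicted label set
    matches the true label set exactly (no missing or extra labels)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")
    exact = sum(
        1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p)
    )
    return exact / len(true_labels)


# Hypothetical toy data: one label set per protein (illustrative only).
# An empty set stands for a protein with no enzymatic function.
true_ec = [{"1.1.1.1"}, {"2.7.1.1", "2.7.1.2"}, set(), {"3.4.21.4"}]
pred_ec = [{"1.1.1.1"}, {"2.7.1.1"}, set(), {"3.4.21.4"}]

score = subset_accuracy(true_ec, pred_ec)
print(score)
```

The second protein counts as a miss even though one of its two classes was predicted correctly, which is why subset accuracy is a strict measure for multi-label problems.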