TY  - JOUR
T1  - EnzML: multi-label prediction of enzyme classes using InterPro signatures.
JF  - BMC Bioinformatics
Y1  - 2012
A1  - De Ferrari, Luna
A1  - Aitken, Stuart
A1  - van Hemert, Jano
A1  - Goryanin, Igor
AB  - BACKGROUND: Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. RESULTS: We present EnzML, a multi-label classification method that can efficiently account also for proteins with multiple enzymatic functions: 50,000 in UniProt. EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by the dataset size, we compared the cross-evaluation results of smaller datasets, either constructed at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. CONCLUSIONS: InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN).
VL  - 13
ER  - 

TY  - CONF
T1  - A model of social collaboration in Molecular Biology knowledge bases
T2  - Proceedings of the 6th Conference of the European Social Simulation Association (ESSA'09)
Y1  - 2009
A1  - De Ferrari, Luna
A1  - Aitken, Stuart
A1  - van Hemert, Jano
A1  - Goryanin, Igor
AB  - Manual annotation of biological data cannot keep up with data production. Open annotation models using wikis have been proposed to address this problem. In this empirical study we analyse 36 years of knowledge collection by 738 authors in two Molecular Biology wikis (EcoliWiki and WikiPathways) and two knowledge bases (OMIM and Reactome). We first investigate authorship metrics (authors per entry and edits per author) which are power-law distributed in Wikipedia and we find they are heavy-tailed in these four systems too. We also find surprising similarities between the open (editing open to everyone) and the closed systems (expert curators only). Secondly, to discriminate between driving forces in the measured distributions, we simulate the curation process and find that knowledge overlap among authors can drive the number of authors per entry, while the time the users spend on the knowledge base can drive the number of contributions per author.
JF  - Proceedings of the 6th Conference of the European Social Simulation Association (ESSA'09)
PB  - European Social Simulation Association
ER  - 

TY  - CONF
T1  - WikiSim: simulating knowledge collection and curation in structured wikis.
T2  - Proceedings of the 2008 International Symposium on Wikis in Porto, Portugal
Y1  - 2008
A1  - De Ferrari, Luna
A1  - Aitken, Stuart
A1  - van Hemert, Jano
A1  - Goryanin, Igor
AB  - The aim of this work is to model quantitatively one of the main properties of wikis: how high quality knowledge can emerge from the individual work of independent volunteers. The approach chosen is to simulate knowledge collection and curation in wikis. The basic model represents the wiki as a set of true/false values, added and edited at each simulation round by software agents (users) following a fixed set of rules. The resulting WikiSim simulations already manage to reach distributions of edits and user contributions very close to those reported for Wikipedia. WikiSim can also span conditions not easily measurable in real-life wikis, such as the impact of various amounts of user mistakes. WikiSim could be extended to model wiki software features, such as discussion pages and watch lists, while monitoring the impact they have on user actions and consensus, and their effect on knowledge quality. The method could also be used to compare wikis with other curation scenarios based on centralised editing by experts. The future challenges for WikiSim will be to find appropriate ways to evaluate and validate the models and to keep them simple while still capturing relevant properties of wiki systems.
JF  - Proceedings of the 2008 International Symposium on Wikis in Porto, Portugal
PB  - ACM
CY  - New York, NY, USA
ER  - 

TY  - CHAP
T1  - COBrA and COBrA-CT: Ontology Engineering Tools
T2  - Anatomy Ontologies for Bioinformatics: Principles and Practice
Y1  - 2007
A1  - Aitken, Stuart
A1  - Chen, Yin
ED  - Burger, Albert
ED  - Davidson, Duncan
ED  - Baldock, Richard
AB  - COBrA is a Java-based ontology editor for bio-ontologies and anatomies that differs from other editors by supporting the linking of concepts between two ontologies, and providing sophisticated analysis and verification functions. In addition to the Gene Ontology and Open Biology Ontologies formats, COBrA can import and export ontologies in the Semantic Web formats RDF, RDFS and OWL. COBrA is being re-engineered as a Protégé plug-in, and complemented by an ontology server and a tool for the management of ontology versions and collaborative ontology development. We describe both the original COBrA tool and the current developments in this chapter.
JF  - Anatomy Ontologies for Bioinformatics: Principles and Practice
PB  - Springer
SN  - ISBN-10:1846288843
UR  - http://www.amazon.ca/Anatomy-Ontologies-Bioinformatics-Principles-Practice/dp/1846288843
ER  - 

TY  - JOUR
T1  - OBO Explorer: An Editor for Open Biomedical Ontologies in OWL
JF  - Bioinformatics
Y1  - 2007
A1  - Aitken, Stuart
A1  - Chen, Yin
A1  - Bard, Jonathan
AB  - To clarify the semantics, and take advantage of tools and algorithms developed for the Semantic Web, a mapping from the Open Biomedical Ontologies (OBO) format to the Web Ontology Language (OWL) has been established. We present an ontology editor that allows end users to work directly with this OWL representation of OBO format ontologies.
PB  - Oxford Journals
UR  - http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btm593?
ER  -