TY - BOOK
T1 - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
T2 - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
Y1 - 2013
A1 - Atkinson, Malcolm P.
A1 - Baxter, Robert M.
A1 - Peter Brezany
A1 - Oscar Corcho
A1 - Michelle Galea
A1 - Parsons, Mark
A1 - Snelling, David
A1 - van Hemert, Jano
KW - Big Data
KW - Data Intensive
KW - data mining
KW - Data Streaming
KW - Databases
KW - Dispel
KW - Distributed Computing
KW - Knowledge Discovery
KW - Workflows
AB - With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections. Emphasising data-intensive thinking and interdisciplinary collaboration, The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation.
Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book: * Outlines the concepts and rationale for implementing data-intensive computing in organisations * Covers from the ground up problem-solving strategies for data analysis in a data-rich world * Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language DISPEL * Features in-depth case studies in customer relations, environmental hazards, seismology, and more * Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering * Includes sample program snippets throughout the text as well as additional materials on a companion website. The DATA Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.
JF - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
PB - John Wiley & Sons Inc.
SN - 978-1-118-39864-7
ER -

TY - CHAP
T1 - Data-Intensive Analysis
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
A1 - van Hemert, Jano
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - data mining
KW - Data-Analysis Experts
KW - Data-Intensive Analysis
KW - Knowledge Discovery
AB - Part II, "Data-intensive Knowledge Discovery", focuses on the needs of data-analysis experts. It illustrates the problem-solving strategies appropriate for a data-rich world, without delving into the details of underlying technologies.
It should engage and inform data-analysis specialists, such as statisticians, data miners, image analysts, bio-informaticians or chemo-informaticians, and generate ideas pertinent to their application areas. Chapter 5, "Data-intensive Analysis", introduces a set of common problems that data-analysis experts often encounter, by means of a set of scenarios of increasing levels of complexity. The scenarios typify knowledge discovery challenges and the presented solutions provide practical methods: a starting point for readers addressing their own data challenges.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Data-Intensive Components and Usage Patterns
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data Analysis
KW - data mining
KW - Data-Intensive Components
KW - Registry
KW - Workflow Libraries
KW - Workflow Sharing
AB - Chapter 7, "Data-intensive components and usage patterns", provides a systematic review of the components that are commonly used in knowledge discovery tasks as well as common patterns of component composition. That is, it introduces the processing elements from which knowledge discovery solutions are built and common composition patterns for delivering trustworthy information. It reflects on how these components and patterns are evolving in a data-intensive context.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CONF
T1 - Matching Spatial Regions with Combinations of Interacting Gene Expression Patterns
T2 - Communications in Computer and Information Science
Y1 - 2008
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - M. Elloumi
ED - et al.
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - The Edinburgh Mouse Atlas aims to capture in-situ gene expression patterns in a common spatial framework. In this study, we construct a grammar to define spatial regions by combinations of these patterns. Combinations are formed by applying operators to curated gene expression patterns from the atlas, thereby resembling gene interactions in a spatial context. The space of combinations is searched using an evolutionary algorithm with the objective of finding the best match to a given target pattern. We evaluate the method by testing its robustness and the statistical significance of the results it finds.
JF - Communications in Computer and Information Science
PB - Springer Verlag
ER -

TY - CONF
T1 - Mining spatial gene expression data for association rules
T2 - Lecture Notes in Bioinformatics
Y1 - 2007
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - S. Hochreiter
ED - R. Wagner
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - We analyse data from the Edinburgh Mouse Atlas Gene-Expression Database (EMAGE), which is a high-quality data source for spatio-temporal gene expression patterns. Using a novel process whereby generated patterns are used to probe spatially-mapped gene expression domains, we are able to get unbiased results as opposed to using annotations based on predefined anatomy regions. We describe two processes to form association rules based on spatial configurations: one associates spatial regions, the other associates genes.
JF - Lecture Notes in Bioinformatics
PB - Springer Verlag
UR - http://dx.doi.org/10.1007/978-3-540-71233-6_6
ER -

TY - CONF
T1 - Adaptive Genetic Programming Applied to New and Existing Simple Regression Problems
T2 - Springer Lecture Notes on Computer Science
Y1 - 2001
A1 - Eggermont, J.
A1 - van Hemert, J. I.
ED - J. Miller
ED - Tomassini, M.
ED - P. L. Lanzi
ED - C. Ryan
ED - A. G. B. Tettamanzi
ED - W. B. Langdon
KW - data mining
AB - In this paper we continue our study on adaptive genetic programming. We use Stepwise Adaptation of Weights to boost performance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression problems from literature. Also, we propose a model for randomly generating polynomials which we then use to further test all three GP variants.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 9-783540-418993
ER -

TY - CONF
T1 - Evolutionary Computation in Constraint Satisfaction and Machine Learning --- An abstract of my PhD.
T2 - Proceedings of the Brussels Evolutionary Algorithms Day (BEAD-2001)
Y1 - 2001
A1 - van Hemert, J. I.
ED - Anne Defaweux
ED - Bernard Manderick
ED - Tom Lenaerts
ED - Johan Parent
ED - Piet van Remortel
KW - constraint satisfaction
KW - data mining
JF - Proceedings of the Brussels Evolutionary Algorithms Day (BEAD-2001)
PB - Vrije Universiteit Brussel (VUB)
ER -

TY - CONF
T1 - Stepwise Adaptation of Weights for Symbolic Regression with Genetic Programming
T2 - Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'00)
Y1 - 2000
A1 - Eggermont, J.
A1 - van Hemert, J. I.
ED - van den Bosch, A.
ED - H. Weigand
KW - data mining
KW - genetic programming
AB - In this paper we continue our study of the Stepwise Adaptation of Weights (SAW) technique. Previous studies on constraint satisfaction and data classification have indicated that SAW is a promising technique to boost the performance of evolutionary algorithms. Here we use SAW to boost performance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression problems.
JF - Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'00)
PB - BNVKI, Dutch and the Belgian AI Association
ER -

TY - CONF
T1 - Adapting the Fitness Function in GP for Data Mining
T2 - Springer Lecture Notes on Computer Science
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - R. Poli
ED - P. Nordin
ED - W. B. Langdon
ED - T. C. Fogarty
KW - data mining
KW - genetic programming
AB - In this paper we describe how the Stepwise Adaptation of Weights (SAW) technique can be applied in genetic programming. The SAW-ing mechanism was originally developed for and successfully used in EAs for constraint satisfaction problems. Here we identify the very basic underlying ideas behind SAW-ing and point out how it can be used for different types of problems. In particular, SAW-ing is well suited for data mining tasks where the fitness of a candidate solution is composed of 'local scores' on data records. We evaluate the power of the SAW-ing mechanism on a number of benchmark classification data sets. The results indicate that extending the GP with the SAW-ing feature increases its performance when different types of misclassifications are not weighted differently, but leads to worse results when they are.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 3-540-65899-8
ER -

TY - CONF
T1 - Comparing genetic programming variants for data classification
T2 - Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'99)
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - E. Postma
ED - M. Gyssens
KW - classification
KW - data mining
KW - genetic programming
AB - This article is a combined summary of two papers written by the authors. Binary data classification problems (with exactly two disjoint classes) form an important application area of machine learning techniques, in particular genetic programming (GP). In this study we compare a number of different variants of GP applied to such problems whereby we investigate the effect of two significant changes in a fixed GP setup in combination with two different evolutionary models.
JF - Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'99)
PB - BNVKI, Dutch and the Belgian AI Association
ER -

TY - CONF
T1 - A comparison of genetic programming variants for data classification
T2 - Springer Lecture Notes on Computer Science
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - D. J. Hand
ED - J. N. Kok
ED - M. R. Berthold
KW - classification
KW - data mining
KW - genetic programming
AB - In this paper we report the results of a comparative study on different variations of genetic programming applied to binary data classification problems. The first genetic programming variant weights data records for calculating the classification error and modifies the weights during the run. Hereby the algorithm defines its own fitness function in an on-line fashion, giving higher weights to 'hard' records. Another novel feature we study is the atomic representation, where 'Booleanization' of data is not performed at the root, but at the leaves of the trees, and only Boolean functions are used in the trees' body. As a third aspect we look at generational and steady-state models in combination with both features.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 3-540-66332-0
ER -