TY - BOOK
T1 - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
T2 - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
Y1 - 2013
A1 - Atkinson, Malcolm P.
A1 - Baxter, Robert M.
A1 - Brezany, Peter
A1 - Corcho, Oscar
A1 - Galea, Michelle
A1 - Parsons, Mark
A1 - Snelling, David
A1 - van Hemert, Jano
KW - Big Data
KW - Data Intensive
KW - data mining
KW - Data Streaming
KW - Databases
KW - Dispel
KW - Distributed Computing
KW - Knowledge Discovery
KW - Workflows
AB - With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections. Emphasising data-intensive thinking and interdisciplinary collaboration, The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book: outlines the concepts and rationale for implementing data-intensive computing in organisations; covers, from the ground up, problem-solving strategies for data analysis in a data-rich world; introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language (DISPEL); features in-depth case studies in customer relations, environmental hazards, seismology, and more; showcases successful applications in areas ranging from astronomy and the humanities to transport engineering; and includes sample program snippets throughout the text as well as additional materials on a companion website. The DATA Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.
JF - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
PB - John Wiley & Sons Inc.
SN - 978-1-118-39864-7
ER -
TY - CHAP
T1 - Data-Intensive Analysis
T2 - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
Y1 - 2013
A1 - Corcho, Oscar
A1 - van Hemert, Jano
ED - Atkinson, Malcolm
ED - Baxter, Rob
ED - Brezany, Peter
ED - Corcho, Oscar
ED - Galea, Michelle
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - data mining
KW - Data-Analysis Experts
KW - Data-Intensive Analysis
KW - Knowledge Discovery
AB - Part II, "Data-intensive Knowledge Discovery", focuses on the needs of data-analysis experts. It illustrates the problem-solving strategies appropriate for a data-rich world, without delving into the details of underlying technologies. It should engage and inform data-analysis specialists, such as statisticians, data miners, image analysts, bio-informaticians or chemo-informaticians, and generate ideas pertinent to their application areas. Chapter 5, "Data-intensive Analysis", introduces a set of common problems that data-analysis experts often encounter, by means of scenarios of increasing complexity. The scenarios typify knowledge discovery challenges, and the presented solutions provide practical methods that give readers a starting point for addressing their own data challenges.
JF - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
PB - John Wiley & Sons Ltd.
ER -
TY - CHAP
T1 - Data-Intensive Components and Usage Patterns
T2 - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
Y1 - 2013
A1 - Corcho, Oscar
ED - Atkinson, Malcolm
ED - Baxter, Rob
ED - Brezany, Peter
ED - Corcho, Oscar
ED - Galea, Michelle
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data Analysis
KW - data mining
KW - Data-Intensive Components
KW - Registry
KW - Workflow Libraries
KW - Workflow Sharing
AB - Chapter 7, "Data-intensive Components and Usage Patterns", provides a systematic review of the components that are commonly used in knowledge discovery tasks, as well as common patterns of component composition. That is, it introduces the processing elements from which knowledge discovery solutions are built and the common composition patterns for delivering trustworthy information. It reflects on how these components and patterns are evolving in a data-intensive context.
JF - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
PB - John Wiley & Sons Ltd.
ER -
TY - CONF
T1 - Matching Spatial Regions with Combinations of Interacting Gene Expression Patterns
T2 - Communications in Computer and Information Science
Y1 - 2008
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - Elloumi, M.
ED - et al.
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - The Edinburgh Mouse Atlas aims to capture in-situ gene expression patterns in a common spatial framework. In this study, we construct a grammar to define spatial regions by combinations of these patterns. Combinations are formed by applying operators to curated gene expression patterns from the atlas, thereby resembling gene interactions in a spatial context. The space of combinations is searched using an evolutionary algorithm with the objective of finding the best match to a given target pattern. We evaluate the method by testing its robustness and the statistical significance of the results it finds.
JF - Communications in Computer and Information Science
PB - Springer Verlag
ER -
TY - CONF
T1 - Mining spatial gene expression data for association rules
T2 - Lecture Notes in Bioinformatics
Y1 - 2007
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - Hochreiter, S.
ED - Wagner, R.
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - We analyse data from the Edinburgh Mouse Atlas Gene-Expression Database (EMAGE), which is a high-quality data source for spatio-temporal gene expression patterns. Using a novel process whereby generated patterns are used to probe spatially-mapped gene expression domains, we are able to obtain unbiased results, as opposed to using annotations based on predefined anatomy regions. We describe two processes to form association rules based on spatial configurations: one associates spatial regions, the other associates genes.
JF - Lecture Notes in Bioinformatics
PB - Springer Verlag
UR - http://dx.doi.org/10.1007/978-3-540-71233-6_6
ER -
TY - CONF
T1 - Adaptive Genetic Programming Applied to New and Existing Simple Regression Problems
T2 - Springer Lecture Notes on Computer Science
Y1 - 2001
A1 - Eggermont, J.
A1 - van Hemert, J. I.
ED - Miller, J.
ED - Tomassini, M.
ED - Lanzi, P. L.
ED - Ryan, C.
ED - Tettamanzi, A. G. B.
ED - Langdon, W. B.
KW - data mining
AB - In this paper we continue our study on adaptive genetic programming. We use Stepwise Adaptation of Weights to boost performance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression problems from the literature. Also, we propose a model for randomly generating polynomials, which we then use to further test all three GP variants.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 978-3-540-41899-3
ER -
TY - CONF
T1 - Evolutionary Computation in Constraint Satisfaction and Machine Learning: An abstract of my PhD
T2 - Proceedings of the Brussels Evolutionary Algorithms Day (BEAD-2001)
Y1 - 2001
A1 - van Hemert, J. I.
ED - Defaweux, Anne
ED - Manderick, Bernard
ED - Lenaerts, Tom
ED - Parent, Johan
ED - van Remortel, Piet
KW - constraint satisfaction
KW - data mining
JF - Proceedings of the Brussels Evolutionary Algorithms Day (BEAD-2001)
PB - Vrije Universiteit Brussel (VUB)
ER -
TY - CONF
T1 - Stepwise Adaptation of Weights for Symbolic Regression with Genetic Programming
T2 - Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'00)
Y1 - 2000
A1 - Eggermont, J.
A1 - van Hemert, J. I.
ED - van den Bosch, A.
ED - Weigand, H.
KW - data mining
KW - genetic programming
AB - In this paper we continue our study of the Stepwise Adaptation of Weights (SAW) technique. Previous studies on constraint satisfaction and data classification have indicated that SAW is a promising technique to boost the performance of evolutionary algorithms. Here we use SAW to boost the performance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression problems.
JF - Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'00)
PB - BNVKI, Dutch and the Belgian AI Association
ER -
TY - CONF
T1 - Adapting the Fitness Function in GP for Data Mining
T2 - Springer Lecture Notes on Computer Science
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - Poli, R.
ED - Nordin, P.
ED - Langdon, W. B.
ED - Fogarty, T. C.
KW - data mining
KW - genetic programming
AB - In this paper we describe how the Stepwise Adaptation of Weights (SAW) technique can be applied in genetic programming. The SAW-ing mechanism was originally developed for, and successfully used in, EAs for constraint satisfaction problems. Here we identify the basic ideas underlying SAW-ing and point out how it can be used for different types of problems. In particular, SAW-ing is well suited for data mining tasks where the fitness of a candidate solution is composed of 'local scores' on data records. We evaluate the power of the SAW-ing mechanism on a number of benchmark classification data sets. The results indicate that extending the GP with the SAW-ing feature increases its performance when different types of misclassifications are not weighted differently, but leads to worse results when they are.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 3-540-65899-8
ER -
TY - CONF
T1 - Comparing genetic programming variants for data classification
T2 - Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'99)
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - Postma, E.
ED - Gyssens, M.
KW - classification
KW - data mining
KW - genetic programming
AB - This article is a combined summary of two papers written by the authors. Binary data classification problems (with exactly two disjoint classes) form an important application area of machine learning techniques, in particular genetic programming (GP). In this study we compare a number of different variants of GP applied to such problems, investigating the effect of two significant changes in a fixed GP setup in combination with two different evolutionary models.
JF - Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence (BNAIC'99)
PB - BNVKI, Dutch and the Belgian AI Association
ER -
TY - CONF
T1 - A comparison of genetic programming variants for data classification
T2 - Springer Lecture Notes on Computer Science
Y1 - 1999
A1 - Eggermont, J.
A1 - Eiben, A. E.
A1 - van Hemert, J. I.
ED - Hand, D. J.
ED - Kok, J. N.
ED - Berthold, M. R.
KW - classification
KW - data mining
KW - genetic programming
AB - In this paper we report the results of a comparative study on different variations of genetic programming applied to binary data classification problems. The first genetic programming variant weights data records for calculating the classification error and modifies the weights during the run. In this way the algorithm defines its own fitness function in an on-line fashion, giving higher weights to 'hard' records. Another novel feature we study is the atomic representation, where 'Booleanization' of data is not performed at the root but at the leaves of the trees, and only Boolean functions are used in the body of the trees. As a third aspect we look at generational and steady-state models in combination with both features.
JF - Springer Lecture Notes on Computer Science
PB - Springer-Verlag, Berlin
SN - 3-540-66332-0
ER -