TY - BOOK T1 - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business T2 - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya) Y1 - 2013 A1 - Atkinson, Malcolm P. A1 - Baxter, Robert M. A1 - Peter Brezany A1 - Oscar Corcho A1 - Michelle Galea A1 - Parsons, Mark A1 - Snelling, David A1 - van Hemert, Jano KW - Big Data KW - Data Intensive KW - data mining KW - Data Streaming KW - Databases KW - Dispel KW - Distributed Computing KW - Knowledge Discovery KW - Workflows AB - With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections. Emphasising data-intensive thinking and interdisciplinary collaboration, The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book: * Outlines the concepts and rationale for implementing data-intensive computing in organisations * Covers from the ground up problem-solving strategies for data analysis in a data-rich world * Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language DISPEL * Features in-depth case studies in customer relations, environmental hazards, seismology, and more * Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering * Includes sample program snippets throughout the text as well as additional materials on a companion website The DATA Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing. JF - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya) PB - John Wiley & Sons Inc. SN - 978-1-118-39864-7 ER - TY - CHAP T1 - Data-Intensive Analysis T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Oscar Corcho A1 - van Hemert, Jano ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - data mining KW - Data-Analysis Experts KW - Data-Intensive Analysis KW - Knowledge Discovery AB - Part II: "Data-intensive Knowledge Discovery", focuses on the needs of data-analysis experts. It illustrates the problem-solving strategies appropriate for a data-rich world, without delving into the details of underlying technologies. It should engage and inform data-analysis specialists, such as statisticians, data miners, image analysts, bio-informaticians or chemo-informaticians, and generate ideas pertinent to their application areas. Chapter 5: "Data-intensive Analysis", introduces a set of common problems that data-analysis experts often encounter, by means of a set of scenarios of increasing levels of complexity. The scenarios typify knowledge discovery challenges and the presented solutions provide practical methods; a starting point for readers addressing their own data challenges. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Data-Intensive Components and Usage Patterns T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Oscar Corcho ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data Analysis KW - data mining KW - Data-Intensive Components KW - Registry KW - Workflow Libraries KW - Workflow Sharing AB - Chapter 7: "Data-intensive components and usage patterns", provides a systematic review of the components that are commonly used in knowledge discovery tasks as well as common patterns of component composition. That is, it introduces the processing elements from which knowledge discovery solutions are built and common composition patterns for delivering trustworthy information. It reflects on how these components and patterns are evolving in a data-intensive context. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - The Data-Intensive Survival Guide T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Malcolm Atkinson ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data-Analysis Experts KW - Data-Intensive Architecture KW - Data-intensive Computing KW - Data-Intensive Engineers KW - Datascopes KW - Dispel KW - Domain Experts KW - Intellectual Ramps KW - Knowledge Discovery KW - Workflows AB - Chapter 3: "The data-intensive survival guide", presents an overview of all of the elements of the proposed data-intensive strategy. Sufficient detail is presented for readers to understand the principles and practice that we recommend. It should also provide a good preparation for readers who choose to sample later chapters. It introduces three professional viewpoints: domain experts, data-analysis experts, and data-intensive engineers. Success depends on a balanced approach that develops the capacity of all three groups. A data-intensive architecture provides a flexible framework for that balanced approach. This enables the three groups to build and exploit data-intensive processes that incrementally step from data to results. A language is introduced to describe these incremental data processes from all three points of view. The chapter introduces ‘datascopes’ as the productized data-handling environments and ‘intellectual ramps’ as the ‘on ramps’ for the highways from data to knowledge. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Data-Intensive Thinking with DISPEL T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Malcolm Atkinson ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data-Intensive Machines KW - Data-Intensive Thinking, Data-intensive Computing KW - Dispel KW - Distributed Computing KW - Knowledge Discovery AB - Chapter 4: "Data-intensive thinking with DISPEL", engages the reader with technical issues and solutions, by working through a sequence of examples, building up from a sketch of a solution to a large-scale data challenge. It uses the DISPEL language extensively, introducing its concepts and constructs. It shows how DISPEL may help designers, data-analysts, and engineers develop solutions to the requirements emerging in any data-intensive application domain. The reader is taken through simple steps initially, this then builds to conceptually complex steps that are necessary to cope with the realities of real data providers, real data, real distributed systems, and long-running processes. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Inc. ER - TY - CHAP T1 - Definition of the DISPEL Language T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Paul Martin A1 - Yaikhom, Gagarine ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data Streaming KW - Data-intensive Computing KW - Dispel AB - Chapter 10: "Definition of the DISPEL language", describes the novel aspects of the DISPEL language: its constructs, capabilities, and anticipated programming style. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business T3 - {Parallel and Distributed Computing, series editor Albert Y. Zomaya} PB - John Wiley & Sons Inc. ER - TY - CHAP T1 - The Digital-Data Challenge T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Malcolm Atkinson A1 - Parsons, Mark ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Big Data KW - Data-intensive Computing, Knowledge Discovery KW - Digital Data KW - Digital-Data Revolution AB - Part I: Strategies for success in the digital-data revolution, provides an executive summary of the whole book to convince strategists, politicians, managers, and educators that our future data-intensive society requires new thinking, new behavior, new culture, and new distribution of investment and effort. This part will introduce the major concepts so that readers are equipped to discuss and steer their organization’s response to the opportunities and obligations brought by the growing wealth of data. It will help readers understand the changing context brought about by advances in digital devices, digital communication, and ubiquitous computing. Chapter 1: The digital-data challenge, will help readers to understand the challenges ahead in making good use of the data and introduce ideas that will lead to helpful strategies. A global digital-data revolution is catalyzing change in the ways in which we live, work, relax, govern, and organize. This is a significant change in society, as important as the invention of printing or the industrial revolution, but more challenging because it is happening globally at lnternet speed. Becoming agile in adapting to this new world is essential. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - The Digital-Data Revolution T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Malcolm Atkinson ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data KW - Information KW - Knowledge KW - Knowledge Discovery KW - Social Impact of Digital Data KW - Wisdom, Data-intensive Computing AB - Chapter 2: "The digital-data revolution", reviews the relationships between data, information, knowledge, and wisdom. It analyses and quantifies the changes in technology and society that are delivering the data bonanza, and then reviews the consequential changes via representative examples in biology, Earth sciences, social sciences, leisure activity, and business. It exposes quantitative details and shows the complexity and diversity of the growing wealth of data, introducing some of its potential benefits and examples of the impediments to successfully realizing those benefits. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - DISPEL Development T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Adrian Mouat A1 - Snelling, David ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Diagnostics KW - Dispel KW - IDE KW - Libraries KW - Processing Elements AB - Chapter 11: "DISPEL development", describes the tools and libraries that a DISPEL developer might expect to use. The tools include those needed during process definition, those required to organize enactment, and diagnostic aids for developers of applications and platforms. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Inc. ER - TY - CHAP T1 - DISPEL Enactment T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Chee Sun Liew A1 - Krause, Amrey A1 - Snelling, David ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data Streaming KW - Data-Intensive Engineering KW - Dispel KW - Workflow Enactment AB - Chapter 12: "DISPEL enactment", describes the four stages of DISPEL enactment. It is targeted at the data-intensive engineers who implement enactment services. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Inc. ER - TY - CHAP T1 - Foreword T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Tony Hey ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Big Data KW - Data-intensive Computing, Knowledge Discovery JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Platforms for Data-Intensive Analysis T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Snelling, David ED - Malcolm Atkinson ED - Baxter, Robert M. ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data-Intensive Engineering KW - Data-Intensive Systems KW - Dispel KW - Distributed Systems AB - Part III: "Data-intensive engineering", is targeted at technical experts who will develop complex applications, new components, or data-intensive platforms. The techniques introduced may be applied very widely; for example, to any data-intensive distributed application, such as index generation, image processing, sequence comparison, text analysis, and sensor-stream monitoring. The challenges, methods, and implementation requirements are illustrated by making extensive use of DISPEL. Chapter 9: "Platforms for data-intensive analysis", gives a reprise of data-intensive architectures, examines the business case for investing in them, and introduces the stages of data-intensive workflow enactment. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Preface T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Malcolm Atkinson ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Big Data, Data-intensive Computing, Knowledge Discovery AB - Who should read the book and why. The structure and conventions used. Suggested reading paths for different categories of reader. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Problem Solving in Data-Intensive Knowledge Discovery T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Oscar Corcho A1 - van Hemert, Jano ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data-Analysis Experts KW - Data-Intensive Analysis KW - Design Patterns for Knowledge Discovery KW - Knowledge Discovery AB - Chapter 6: "Problem solving in data-intensive knowledge discovery", on the basis of the previous scenarios, this chapter provides an overview of effective strategies in knowledge discovery, highlighting common problem-solving methods that apply in conventional contexts, and focusing on the similarities and differences of these methods. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - CHAP T1 - Sharing and Reuse in Knowledge Discovery T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business Y1 - 2013 A1 - Oscar Corcho ED - Malcolm Atkinson ED - Rob Baxter ED - Peter Brezany ED - Oscar Corcho ED - Michelle Galea ED - Parsons, Mark ED - Snelling, David ED - van Hemert, Jano KW - Data-Intensive Analysis KW - Knowledge Discovery KW - Ontologies KW - Semantic Web KW - Sharing AB - Chapter 8: "Sharing and re-use in knowledge discovery", introduces more advanced knowledge discovery problems, and shows how improved component and pattern descriptions facilitate re-use. This supports the assembly of libraries of high level components well-adapted to classes of knowledge discovery methods or application domains. The descriptions are made more powerful by introducing notations from the semantic Web. JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business PB - John Wiley & Sons Ltd. ER - TY - JOUR T1 - Data-Intensive Architecture for Scientific Knowledge Discovery JF - Distributed and Parallel Databases Y1 - 2012 A1 - Atkinson, Malcolm P. A1 - Chee Sun Liew A1 - Michelle Galea A1 - Paul Martin A1 - Krause, Amrey A1 - Adrian Mouat A1 - Oscar Corcho A1 - Snelling, David KW - Knowledge discovery, workflow management system AB - This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of application domains, and support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, imaging processing and seismology. VL - 30 UR - http://dx.doi.org/10.1007/s10619-012-7105-3 IS - 5 ER - TY - JOUR T1 - Validation and mismatch repair of workflows through typed data streams JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences Y1 - 2011 A1 - Yaikhom, Gagarine A1 - Malcolm Atkinson A1 - van Hemert, Jano A1 - Oscar Corcho A1 - Krause, Amy AB - The type system of a language guarantees that all of the operations on a set of data comply with the rules and conditions set by the language. While language typing is a fundamental requirement for any programming language, the typing of data that flow between processing elements within a workflow is currently being treated as optional. In this paper, we introduce a three-level type system for typing workflow data streams. These types are parts of the Data Intensive System Process Engineering Language programming language, which empowers users with the ability to validate the connections inside a workflow composition, and apply appropriate data type conversions when necessary. Furthermore, this system enables the enactment engine in carrying out type-directed workflow optimizations. VL - 369 IS - 1949 ER -