Mining Interesting Stuff
Extract patterns that best explain your data

About

MIST is a collection of algorithms for mining the most interesting patterns from a dataset, that is, only those patterns that are most relevant to a data analyst doing exploratory data analysis.

Algorithms

We provide algorithms for mining three different types of pattern: itemsets (sets of data attributes), sequences (subsequences of data attributes), and API calls (subsequences of API calls).



About MIST

MIST is a collection of tools designed to mine the most interesting patterns from a given dataset, specifically those patterns that are useful for a data analyst performing exploratory data analysis. Unlike frequent pattern mining algorithms, which return huge numbers of highly redundant patterns, our algorithms are designed from the ground up to mine only those patterns that are the most interesting, greatly reducing redundancy.

To achieve this, we define a probabilistic model of transactions and apply a statistical inference algorithm to efficiently infer the interesting patterns directly from the database. For more technical details, please see the papers accompanying the algorithms below.
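
To make the redundancy of frequent pattern mining concrete, here is a minimal sketch of a brute-force frequent itemset miner run on a toy database. It is not one of the MIST algorithms and does not use the probabilistic model described above; the function name frequent_itemsets, the support threshold and the toy transactions are all assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force baseline (hypothetical helper, not part of MIST).

    Returns a dict mapping every itemset (as a sorted tuple) that occurs
    in at least `min_support` transactions to its support count.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[candidate] = count
    return result


# Toy database: a single underlying pattern, repeated three times.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"milk"},
]

patterns = frequent_itemsets(transactions, min_support=3)
for itemset, support in patterns.items():
    print(itemset, support)
```

On this toy database the frequent miner reports every non-empty subset of {bread, butter, jam}, seven patterns in total, even though a single itemset explains all of them. This is the kind of redundancy that mining only the most interesting patterns is meant to avoid.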


Algorithms

We provide algorithms for mining three different types of pattern:

Itemsets

Interesting Itemset Miner mines sets of attributes from data.

This is an implementation of the algorithm from our PKDD paper.

View details and code »

Sequences

Interesting Sequence Miner mines sequences of attributes from data.

This is an implementation of the algorithm from our KDD paper.

View details and code »

API Calls

Probabilistic API Miner mines sequences of API calls from data.

This is an implementation of the algorithm from our FSE paper.

View details and code »


Datasets

The datasets used in our papers are available in the datasets/ subdirectory in the source code for each algorithm (see above).


Team Members

  • Jaroslav Fowkes is a Postdoc at the University of Edinburgh and member of the machine learning group. His research focuses on developing novel statistical methods for exploratory data analysis as well as natural language processing techniques for the analysis of program source code text.

  • Charles Sutton is a lecturer (= US Assistant Professor) at the University of Edinburgh and member of the machine learning group. His research aims to develop new statistical methods for interactive machine learning and for handling data about the operation and performance of large-scale computer systems.


This work was supported by the Engineering and Physical Sciences Research Council (Grant Number EP/K024043/1).