Naturalize

Learn coding conventions from data to improve the stylistic consistency of your codebase

About Naturalize

Naturalize suggests renaming or reformatting changes that are consistent with your codebase. It replaces a surprising name or formatting choice with one that is preferred in similar contexts elsewhere in its training codebase.

Tool Documentation

Documentation for the tools of the Java Naturalize framework. Here you will find how to use each Naturalize tool, along with the configuration options for tailoring Naturalize to your needs.

Download Tools

Download the Java Naturalize tools here.


About Naturalize

Naturalize is a language-agnostic framework for learning coding conventions from a codebase and then exploiting this information to suggest better identifier names and formatting changes in the code.

A brief text on the importance of coding conventions can be found in Green's essay "How To Write Unmaintainable Code". We have used this tool to suggest pull requests to some open source projects, and received responses such as:

Wow, that's a pretty cool tool!

Very interesting project, and thanks for the contribution!

Tool Documentation

styleprofile

styleprofile is a tool that lets you profile a set of files and retrieve suggestions for alternative identifier names. Its usage is:

		usage: styleprofile FILE1, FILE2, ...
		 -a                             Rename all identifiers. Default
		 -c,--codebasedir <DIRECTORY>   Codebase to use to train renamer. This option is mutually exclusive with -l
		 -l,--languagemodel <FILE>      Use this pretrained language model.
		 -m,--methods                   Rename method calls.
		 -t,--types                     Rename types.
		 -v,--variables                 Rename variables.
		

styleprofile accepts either a pre-trained language model (-l) or a codebase (-c), on which it trains a language model. It then uses that language model to suggest renamings. The flags control which kinds of identifiers receive suggestions: -m for method calls, -t for types and -v for variables. By default (-a), all identifiers are considered.

Note that if the files being profiled are inside the -c directory, they are automatically excluded from training.
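The train-then-rename loop described above can be illustrated with a toy trigram model. This is a simplified sketch, not Naturalize's implementation: it assumes the code is already tokenized, uses add-one smoothing, and substitutes every token equal to the old name (ignoring scoping). TrigramRenamer and its methods are hypothetical names chosen for this example. As the note above says, the file being renamed should not itself be part of the training data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy trigram renamer: a hypothetical sketch, not Naturalize's implementation.
class TrigramRenamer {
    private final Map<String, Integer> trigramCounts = new HashMap<>();
    private final Map<String, Integer> prefixCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    // Counts each trigram and its two-token prefix in one tokenized training file.
    void train(List<String> tokens) {
        vocabulary.addAll(tokens);
        for (int i = 0; i + 2 < tokens.size(); i++) {
            String prefix = tokens.get(i) + " " + tokens.get(i + 1);
            prefixCounts.merge(prefix, 1, Integer::sum);
            trigramCounts.merge(prefix + " " + tokens.get(i + 2), 1, Integer::sum);
        }
    }

    // Add-one smoothed log probability of a token sequence; the extra +1 in the
    // vocabulary size reserves probability mass for unseen tokens.
    double logProb(List<String> tokens) {
        double vocabSize = vocabulary.size() + 1;
        double total = 0.0;
        for (int i = 0; i + 2 < tokens.size(); i++) {
            String prefix = tokens.get(i) + " " + tokens.get(i + 1);
            int tri = trigramCounts.getOrDefault(prefix + " " + tokens.get(i + 2), 0);
            int pre = prefixCounts.getOrDefault(prefix, 0);
            total += Math.log((tri + 1) / (pre + vocabSize));
        }
        return total;
    }

    // Substitutes each candidate for every occurrence of oldName and keeps the
    // name (the current one included) that makes the file most probable.
    String bestName(List<String> tokens, String oldName, List<String> candidates) {
        String best = oldName;
        double bestScore = logProb(tokens);
        for (String candidate : candidates) {
            List<String> renamed = new ArrayList<>(tokens.size());
            for (String token : tokens) {
                renamed.add(token.equals(oldName) ? candidate : token);
            }
            double score = logProb(renamed);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

Under this toy model, training on a loop that uses i as its counter and then asking bestName about an identical loop that uses x favours i, because the trigrams around i have been seen before while those around x have not.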

Example

For example, assuming that the project folder junit exists in your current directory, executing the command styleprofile -c ./junit ./junit/src/test/java/junit/tests/runner/TextRunnerTest.java returns:
=========================================================
Suggestions for junit/src/test/java/junit/tests/runner/TextRunnerTest.java
=========================================================
package junit.tests.runner;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;

import junit.framework.TestCase;
import junit.framework.TestResult;
import junit.framework.TestSuite;

public class TextRunnerTest extends TestCase {

	public void testFailure() throws Exception {
		execTest("junit.tests.framework.Failure", false);
	}

	public void testSuccess() throws Exception {
		execTest("junit.tests.framework.Success", true);
	}

	public void testError() throws Exception {
		execTest("junit.tests.BogusDude", false);
	}

	void execTest(String testClass, boolean success) throws Exception {
		String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
		String cp = System.getProperty("java.class.path");
		//use -classpath for JDK 1.1.7 compatibility
		String[] cmd = {java, "-classpath", cp, "junit.textui.TestRunner", testClass};
		Process p = Runtime.getRuntime().exec(cmd);
		InputStream i = p.getInputStream();
		while ((i.read()) != -1)
			; //System.out.write(b);
		assertTrue((p.waitFor() == 0) == success);
		if (success) {
			assertTrue(p.exitValue() == 0);
		} else {
			assertFalse(p.exitValue() == 0);
		}
	}

	public void testRunReturnsResult() {
		PrintStream oldOut = System.out;
		System.setOut(new PrintStream(
				new OutputStream() {
					@Override
					public void write(int arg0) throws IOException {
					}
				}
		));
		try {
			TestResult result = junit.textui.TestRunner.run(new TestSuite());
			assertTrue(result.wasSuccessful());
		} finally {
			System.setOut(oldOut);
		}
	}


}
-------------------------------------------
1.'arg0' (12.65%) -> {waitDuration(62.48%), }
2.'i' (18.07%) -> {input(81.93%), }
3.'success' (3.74%) -> {actual(11.20%), handleException(6.69%), description(6.33%), condition(5.88%), wait(5.84%) }
4.'oldOut' (17.68%) -> {oldPrintStream(46.03%), }	
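To post-process this output (for example, from a script), the suggestion lines can be parsed with regular expressions. The sketch below assumes every line has the form shown above: a rank, the current name in single quotes with a percentage, and a brace-enclosed list of alternative names with percentages. Reading the first percentage as the model's score for the current name and the bracketed ones as scores for the alternatives is our interpretation, not documented behaviour. SuggestionLine is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one styleprofile suggestion line, e.g.
// 2.'i' (18.07%) -> {input(81.93%), }
class SuggestionLine {
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\d+)\\.'([^']+)' \\(([\\d.]+)%\\) -> \\{(.*)\\}\\s*");
    private static final Pattern ALTERNATIVE = Pattern.compile("([\\w$]+)\\(([\\d.]+)%\\)");

    final int rank;
    final String identifier;
    final double currentPercent;            // percentage printed after the current name
    final Map<String, Double> alternatives; // suggested name -> percentage, in printed order

    private SuggestionLine(int rank, String identifier, double currentPercent,
                           Map<String, Double> alternatives) {
        this.rank = rank;
        this.identifier = identifier;
        this.currentPercent = currentPercent;
        this.alternatives = alternatives;
    }

    static SuggestionLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a suggestion line: " + line);
        }
        // Collect every name(percent%) pair inside the braces, keeping output order.
        Map<String, Double> alternatives = new LinkedHashMap<>();
        Matcher a = ALTERNATIVE.matcher(m.group(4));
        while (a.find()) {
            alternatives.put(a.group(1), Double.parseDouble(a.group(2)));
        }
        return new SuggestionLine(Integer.parseInt(m.group(1)), m.group(2),
                Double.parseDouble(m.group(3)), alternatives);
    }
}
```

Parsing line 2 above yields the identifier i with 18.07 and a single alternative, input, with 81.93.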
	

buildlm

buildlm is a utility that builds a language model from a codebase so that you can reuse it later. Note that devstyle uses this model only when the -l option is given. Its usage is:

		usage: buildlm
		 -n               n-gram n parameter. The size of n.
		 -o,--output      File to output the serialized n-gram model.
		 -t,--trainDir    The directory containing the training files.		
	

buildlm outputs a serialized n-gram language model that can be used with the rest of the tools. For example, to build a 5-gram model, run java -jar buildlm.jar -n 5 -t /dir/of/codebase/to/train -o myLm.ser
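Conceptually, building an n-gram model starts with counting n-grams over the tokens of the training code and serializing the result. The sketch below shows only that core idea for a single, already-tokenized file; it omits tokenization and smoothing. It uses standard Java serialization as a stand-in (the pre-built model offered for download is a Kryo-serialized object), so its files are not compatible with the real tools. NGramCounts is a hypothetical class.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;

// Hypothetical sketch of the core of an n-gram model builder: count, save, load.
class NGramCounts {
    // Counts every contiguous window of n tokens, keyed by its tokens joined with spaces.
    static HashMap<String, Integer> count(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        HashMap<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            counts.merge(String.join(" ", tokens.subList(i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    // Writes the counts with standard Java serialization (not Kryo).
    static void save(HashMap<String, Integer> counts, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(counts);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back counts written by save().
    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<String, Integer>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, count(tokens, 5) plays the role of the -n 5 setting, and save(counts, new File("myLm.ser")) that of -o myLm.ser.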

stylish?

stylish? wraps styleprofile as a pre-commit check: it checks whether a given set of files has any high-confidence suggestions that warrant aborting the commit. If the changes are natural enough according to the framework, naturalizecheck exits with code 0; otherwise it exits with a non-zero code.
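The pass/fail contract above can be expressed as a small function that maps suggestions to an exit code. This is a sketch under assumptions: the confidence scale (0 to 1), the threshold, and the CommitCheck and Suggestion names are hypothetical, since the actual criterion for a high-confidence suggestion is not documented here. Git aborts a commit whenever the pre-commit hook exits with a non-zero status, which is why the exit code alone is enough.

```java
import java.util.List;

// Hypothetical sketch of a pre-commit decision: fail if any suggestion is confident enough.
class CommitCheck {
    static final class Suggestion {
        final String identifier;
        final String suggestedName;
        final double confidence; // assumed to lie in [0, 1]

        Suggestion(String identifier, String suggestedName, double confidence) {
            this.identifier = identifier;
            this.suggestedName = suggestedName;
            this.confidence = confidence;
        }
    }

    // Returns 0 (commit proceeds) when no suggestion reaches the threshold,
    // and 1 (commit aborted) otherwise, reporting the first offending suggestion.
    static int exitCode(List<Suggestion> suggestions, double threshold) {
        for (Suggestion s : suggestions) {
            if (s.confidence >= threshold) {
                System.err.println("Consider renaming '" + s.identifier + "' to '"
                        + s.suggestedName + "' (confidence " + s.confidence + ")");
                return 1;
            }
        }
        return 0;
    }
}
```

A command-line entry point built on this sketch would end with System.exit(CommitCheck.exitCode(suggestions, threshold)).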

When called from the command line, naturalizecheck has the following usage:

	usage: naturalizecheck FILE1, FILE2, ...
	 -a                             Check all identifiers. Default
	 -c,--codebasedir <DIRECTORY>   Use this codebase to use to train language model. This option is mutually exclusive with -l
	 -l,--languagemodel <FILE>     Use this pretrained language model given by the file
	 -m,--methods                   Check method calls.
	 -t,--types                     Check types.
	 -v,--variables                 Check variables.
	

Git pre-commit hook

Along with the runnable JAR file, we ship a bash pre-commit script for Git repositories. You have to set the NATURALIZE_LOCATION variable and, if needed, put any additional flags in the NATURALIZE_OPTIONS environment variable. Save the script as /path/to/repo/.git/hooks/pre-commit and make it executable.

The pre-commit script uses the code in the Git repository as its training set.
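In essence, the hook runs naturalizecheck on the staged files while training on the repository itself. The Java sketch below only assembles such a command line and is not the shipped bash script. It assumes that NATURALIZE_LOCATION is the path to the runnable JAR, that NATURALIZE_OPTIONS holds whitespace-separated flags such as -v -m, that files are passed as separate arguments, and that only .java files are checked. PreCommitCommand is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assemble the command a pre-commit hook could run.
class PreCommitCommand {
    // location: assumed path to the runnable JAR (NATURALIZE_LOCATION)
    // options:  assumed whitespace-separated extra flags (NATURALIZE_OPTIONS), may be null
    // repoRoot: the repository, used as the training codebase via -c
    static List<String> build(String location, String options, String repoRoot,
                              List<String> stagedFiles) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", location, "-c", repoRoot));
        if (options != null && !options.trim().isEmpty()) {
            cmd.addAll(Arrays.asList(options.trim().split("\\s+")));
        }
        for (String file : stagedFiles) {
            if (file.endsWith(".java")) { // only Java sources are passed to the checker
                cmd.add(file);
            }
        }
        return cmd;
    }
}
```

In a real hook, the staged files could come from git diff --cached --name-only --diff-filter=ACM, and the list could be started with new ProcessBuilder(cmd).inheritIO().start(), using the value returned by waitFor() as the hook's exit code.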

devstyle

devstyle is an Eclipse plugin that brings Naturalize suggestions into the Eclipse IDE through a "Naturalize" option in the context menu.

Advanced Tool Configuration Options

The tools come with most parameters preset to defaults. You can override some of them by placing a default.properties file in the same directory as the JAR files.
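A default.properties file that overrides built-in values maps naturally onto java.util.Properties with a defaults table. The sketch below shows that layering only. The actual property keys the tools read are not listed in this documentation, so any key you pass in (such as example.ngram.n below) is a made-up placeholder, and ToolConfig is a hypothetical class.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: layer an optional default.properties over built-in defaults.
class ToolConfig {
    // Returns properties that read from jarDir/default.properties when it exists
    // and fall back to builtIn for any key the file does not set.
    static Properties load(File jarDir, Properties builtIn) {
        Properties props = new Properties(builtIn);
        File file = new File(jarDir, "default.properties");
        if (file.isFile()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }
}
```

For example, with a built-in example.ngram.n=5 and a default.properties containing example.ngram.n=3, getProperty("example.ngram.n") returns 3; without the file it returns 5. At runtime, the JAR's location is available from ToolConfig.class.getProtectionDomain().getCodeSource().getLocation(), a URL that must be converted to a File.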

Download Tools

Here is the list of the tools that you can download:

Name             Size     Type
styleprofile     32 MB    Runnable JAR file
buildlm          32 MB    Runnable JAR file
stylish?         32 MB    Runnable JAR file
devstyle         4.8 MB   JAR file (Eclipse plugin)
genrule          ? MB     JAR file
Git commit hook  1 KB     Bash script

Paper-specific Data

Data object                                   Size       Type
List of Evaluation Projects                   1 KB       Text file
Pre-built 5-gram LM for identifier renaming   536.1 MB   Kryo serialized object
GitHub Java Corpus                            -          List of projects and SHAs, or .tar.gz
Needless Diversity (Human Evaluation)         28 KB      Text file
Links to submitted GitHub pull requests       290 bytes  Text file

Team Members

Miltos Allamanis
is a PhD student and a Microsoft Research PhD Scholar at the University of Edinburgh under the supervision of Charles Sutton. He is pursuing a PhD on Statistical Natural Language Processing for Programming Language Text.
Earl Barr
is a lecturer (= US Assistant Professor) at UCL and a member of CREST whose research interests lie in the areas of software engineering, programming languages, computer security and systems.
Christian Bird
is a Researcher at Microsoft Research whose research interests lie in the areas of empirical software engineering, open source software communities, social networks, communication and collaboration in software engineering, and software tools.
Charles Sutton
is a lecturer (= US Assistant Professor) at the University of Edinburgh and a member of the machine learning group. His research aims at new statistical machine learning methods designed to handle data about the operation and performance of large-scale computer systems, with the ultimate goal of improving techniques for developing, managing, and debugging computer systems.

Related Links

A set of links related to Naturalize and coding conventions that may be useful to Naturalize readers.