Learn coding conventions from data to improve the stylistic consistency of your codebase
Naturalize suggests renaming or reformatting changes that are consistent with your codebase. It unifies a surprising name or formatting with a name or formatting choice preferred in similar contexts elsewhere in its training codebase.
A set of documentation notes about the Java Naturalize framework tools. Here you will find the usage of the different Naturalize tools, along with any configuration options to parametrize Naturalize for your needs.
Naturalize is a language-agnostic framework for learning coding conventions from a codebase and then exploiting this information to suggest better identifier names and formatting changes in the code.
A brief text on the importance of coding conventions can be found in Green's essay How To Write Unmaintainable Code. We have used this tool to suggest pull requests to some open source projects an
Wow, that's a pretty cool tool!
Very interesting project, and thanks for the contribution!
styleprofile is a tool that lets you profile a set of files and retrieve suggestions for alternative names. Its usage is:
    usage: styleprofile FILE1, FILE2, ...
     -a                            Rename all identifiers. Default
     -c,--codebasedir <DIRECTORY>  Codebase to use to train renamer. This option is mutually exclusive with -l
     -l,--languagemodel <FILE>     Use this pretrained language model.
     -m,--methods                  Rename method calls.
     -t,--types                    Rename types.
     -v,--variables                Rename variables.
styleprofile accepts either a pre-trained language model or a codebase; in the latter case it trains a language model on that codebase. It then uses the trained language model for renaming. The types of identifiers for which suggestions are made are controlled by the flags -m for method calls, -t for types and -v for variables. Note that if the files under consideration are included in the -c directory, they will be automatically excluded from the training data.
For example, assuming that the project folder junit exists in your current directory, running

    styleprofile -c ./junit ./junit/src/test/java/junit/tests/runner/TextRunnerTest.java

returns
    =========================================================
    Suggestions for junit/src/test/java/junit/tests/runner/TextRunnerTest.java
    =========================================================
    package junit.tests.runner;

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.PrintStream;

    import junit.framework.TestCase;
    import junit.framework.TestResult;
    import junit.framework.TestSuite;

    public class TextRunnerTest extends TestCase {

        public void testFailure() throws Exception {
            execTest("junit.tests.framework.Failure", false);
        }

        public void testSuccess() throws Exception {
            execTest("junit.tests.framework.Success", true);
        }

        public void testError() throws Exception {
            execTest("junit.tests.BogusDude", false);
        }

        void execTest(String testClass, boolean success) throws Exception {
            String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
            String cp = System.getProperty("java.class.path");
            //use -classpath for JDK 1.1.7 compatibility
            String[] cmd = {java, "-classpath", cp, "junit.textui.TestRunner", testClass};
            Process p = Runtime.getRuntime().exec(cmd);
            InputStream i = p.getInputStream();
            while ((i.read()) != -1)
                ; //System.out.write(b);
            assertTrue((p.waitFor() == 0) == success);
            if (success) {
                assertTrue(p.exitValue() == 0);
            } else {
                assertFalse(p.exitValue() == 0);
            }
        }

        public void testRunReturnsResult() {
            PrintStream oldOut = System.out;
            System.setOut(new PrintStream(
                    new OutputStream() {
                        @Override
                        public void write(int arg0) throws IOException {
                        }
                    }
            ));
            try {
                TestResult result = junit.textui.TestRunner.run(new TestSuite());
                assertTrue(result.wasSuccessful());
            } finally {
                System.setOut(oldOut);
            }
        }
    }
    -------------------------------------------
    1.'arg0' (12.65%) -> {waitDuration(62.48%), }
    2.'i' (18.07%) -> {input(81.93%), }
    3.'success' (3.74%) -> {actual(11.20%), handleException(6.69%), description(6.33%), condition(5.88%), wait(5.84%) }
    4.'oldOut' (17.68%) -> {oldPrintStream(46.03%), }
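If you want to post-process these suggestions in your own tooling, the numbered lines at the end of the output can be parsed with a regular expression. The following is a minimal sketch, assuming the line format is exactly what the sample output above shows; `SuggestionLineParser` and its nested classes are hypothetical names and are not part of Naturalize. The sketch keeps the percentages as printed and does not interpret them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Naturalize): parses one suggestion line. */
final class SuggestionLineParser {

    /** One proposed replacement name with the percentage printed next to it. */
    static final class Alternative {
        public final String name;
        public final double percent;

        Alternative(String name, double percent) {
            this.name = name;
            this.percent = percent;
        }
    }

    /** The current identifier, its percentage, and the proposed alternatives. */
    static final class Suggestion {
        public final String identifier;
        public final double percent;
        public final List<Alternative> alternatives;

        Suggestion(String identifier, double percent, List<Alternative> alternatives) {
            this.identifier = identifier;
            this.percent = percent;
            this.alternatives = alternatives;
        }
    }

    // Matches a whole line such as: 2.'i' (18.07%) -> {input(81.93%), }
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\d+\\.'([^']+)'\\s*\\(([0-9.]+)%\\)\\s*->\\s*\\{(.*)\\}\\s*$");

    // Matches each name(percent%) entry inside the braces.
    private static final Pattern ALTERNATIVE = Pattern.compile(
            "([A-Za-z_$][\\w$]*)\\(([0-9.]+)%\\)");

    /** Returns the parsed suggestion, or null if the line is not a suggestion line. */
    static Suggestion parse(String line) {
        Matcher lineMatcher = LINE.matcher(line);
        if (!lineMatcher.matches()) {
            return null;
        }
        List<Alternative> alternatives = new ArrayList<>();
        Matcher altMatcher = ALTERNATIVE.matcher(lineMatcher.group(3));
        while (altMatcher.find()) {
            alternatives.add(new Alternative(
                    altMatcher.group(1), Double.parseDouble(altMatcher.group(2))));
        }
        return new Suggestion(
                lineMatcher.group(1), Double.parseDouble(lineMatcher.group(2)), alternatives);
    }

    public static void main(String[] args) {
        Suggestion s = parse("2.'i' (18.07%) -> {input(81.93%), }");
        // prints "i -> input"
        System.out.println(s.identifier + " -> " + s.alternatives.get(0).name);
    }
}
```

Lines that do not match the pattern, such as the echoed source code, yield `null`, so the parser can be applied to every line of the output.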
buildlm is a utility tool that allows you to build your own language model from a codebase in order to use it later. Note that devstyle will use this model only when the -l option is used. Its usage is:
    usage: buildlm
     -n n-gram n parameter. The size of n.
     -o,--outputFile to output the serialized n-gram model.
     -t,--trainDir The directory containing the training files.
buildlm outputs a serialized n-gram language model that can be used with the rest of the tools. For example, to run it, type

    java -jar buildlm.jar -n 5 -t /dir/of/codebase/to/train -o myLm.ser
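To give a feel for what the -n parameter controls, the sketch below counts every run of n consecutive tokens in a token sequence, which is the raw material an n-gram model is estimated from. This is only an illustration of the general idea and makes no claim about how buildlm tokenizes code or stores its model; `NgramCountSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: counts token n-grams. Not Naturalize's implementation. */
final class NgramCountSketch {

    /** Counts every run of n consecutive tokens in the given sequence. */
    static Map<List<String>, Integer> countNgrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Copy the window so the map key does not depend on the input list.
            List<String> gram = new ArrayList<>(tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A hand-written token sequence standing in for tokenized source code.
        List<String> tokens = List.of("int", "i", "=", "0", ";", "int", "j", "=", "0", ";");
        System.out.println(countNgrams(tokens, 3));
    }
}
```

A larger n lets the model condition on more surrounding tokens, at the cost of sparser counts and a larger serialized model.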
stylish? is a wrapper around styleprofile, intended as a pre-commit script: it checks a given set of files for high-confidence suggestions that warrant aborting the commit. If the given set of changes is natural enough according to the framework, naturalizecheck exits with code 0; otherwise it exits with a non-zero return code.
When called from the command line, naturalizecheck has the following usage:
    usage: naturalizecheck FILE1, FILE2, ...
     -a                            Check all identifiers. Default
     -c,--codebasedir <DIRECTORY>  Use this codebase to use to train language model. This option is mutually exclusive with -l
     -l,--languagemodel <FILE>     Use this pretrained language model given by the file
     -m,--methods                  Check method calls.
     -t,--types                    Check types.
     -v,--variables                Check variables.
Along with the Runnable JAR file, we ship a bash pre-commit script for Git repositories. The user has to set the NATURALIZE_LOCATION variable and, if needed, any additional flags in the NATURALIZE_OPTIONS environment variable. The pre-commit script needs to be placed at /path/to/repo/.git/hooks/pre-commit. The pre-commit script assumes that the training set is the code included in the Git repository.
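If you would rather call the checker from your own build tooling than from the shipped bash hook, the exit-code convention described above is all you need. The following is a rough sketch under several assumptions: the JAR path is supplied by the caller, the JAR is launched with `java -jar` as in the buildlm example, options precede the files as in the styleprofile example, and each file is passed as a separate argument. `NaturalizeCheckRunner` is a hypothetical name and is not shipped with Naturalize.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper (not shipped with Naturalize) around the checker JAR. */
final class NaturalizeCheckRunner {

    /**
     * Builds "java -jar JAR -c CODEBASE_DIR FILE...".
     * Assumption: options come before the files, one argument per file.
     */
    static List<String> buildCommand(String jarPath, String codebaseDir, List<String> files) {
        List<String> command = new ArrayList<>(
                Arrays.asList("java", "-jar", jarPath, "-c", codebaseDir));
        command.addAll(files);
        return command;
    }

    /** Exit code 0 means the changes were natural enough; anything else aborts. */
    static boolean shouldAbortCommit(int exitCode) {
        return exitCode != 0;
    }

    /** Runs the command, forwarding its output, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: NaturalizeCheckRunner JAR CODEBASE_DIR FILE...");
            System.exit(2);
        }
        List<String> files = Arrays.asList(args).subList(2, args.length);
        int exitCode = run(buildCommand(args[0], args[1], files));
        System.exit(shouldAbortCommit(exitCode) ? 1 : 0);
    }
}
```

Passing the repository root as the codebase directory mirrors the assumption made by the shipped pre-commit script, which trains on the code in the Git repository.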
devstyle is an Eclipse plugin that allows you to get suggestions in the Eclipse IDE. In the context menu, you will get a "Naturalize" option.
The tools come with most of their parameters preset to defaults. However, you may change some of the parameters by adding a default.properties file in the directory of the jar files.
n in the n-gram LM used. Default is 5.

Here is the list of the tools that you can download:
Download Name | Size | Type | |
---|---|---|---|
styleprofile | 32 MB | Runnable JAR file | Download... |
buildlm | 32 MB | Runnable JAR file | Download... |
stylish? | 32 MB | Runnable JAR file | Download... |
devstyle | 4.8 MB | JAR file (Eclipse Plugin) | Download... |
genrule | MB | JAR file | Download... |
Git commit hook | 1 KB | Bash Script | Download... |
Data Object | Size | Type | |
---|---|---|---|
List of Evaluation Projects | 1 KB | Text File | Download... |
Pre-built 5-gram LM for identifier renaming | 536.1 MB | Kryo Serialized Object | Download... |
GitHub Java Corpus | - | List of projects and SHAs or .tar.gz | Download... |
Needless Diversity (Human Evaluation) | 28 KB | Text File | Download... |
Links to submitted GitHub pull requests | 290 bytes | Text File | Download... |
A set of links related to Naturalize and coding conventions that may be useful to Naturalize readers.