Graphlr: indexing antlr3 generated Java AST through a Neo4j graph - codecentric AG Blog

:

While working on my Sonar fork which allows to simulate refactoring without actually touching source files I have once again realized what a PITA it is to traverse the antlr-generated Abstract Syntax Tree (AST) for Java. The mechanism is absolutely cool, no doubt. But the final AST representation is not intuitive, and the corresponding traversal code always looks ugly.

While intensively working with Neo4j, I just asked myself: wouldn’t it be nice to use it as index for the Java-AST? You only need to jump to a relevant node and still can use the classic AST-traversal to get details out of it. Or you could wire the whole AST through an accompanying graph and thus use the graph to traverse the whole AST.

So, here comes Graphlr as the first result. It allows traversing the antlr3 generated Java AST using Neo4j. It’s of course open source and available through the same licence antlr is. Graphlr is nothing more than an antlr grammar/parser/lexer file based on the original one for Java and extending it with, well, code building the Neo4j graph. The graph is a sort of index, if you like, allowing quick jumps into the code spots you’re interested in.

Let’s look at an example, though I assume that you’re familiar with antlr, especially antlr3.

package org.pbit.graphlr;
 
import java.util.List;
 
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.Tree;
 
 
public class GraphlrTest {
    public static void main(String[] args) throws Exception {
        try {
            CharStream input = new ANTLRFileStream("/data/workspace/Graphlr/src/org/pbit/graphlr/GraphlrTest.java");
            GraphlrJavaLexer lex = new GraphlrJavaLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lex);
            GraphlrJavaParser parser = new GraphlrJavaParser(tokens);
            parser.compilationUnit();
 
            //find compilation units
            List<Tree> l = parser.runCypher("start ret=node:node_auto_index(type = 'class') return ret");
            for (Tree t : l) {
                System.out.println("Compilation unit found: " + t.getText()); 
            }
 
            //find methods of a particular class
            l = parser.runCypher("start n=node:node_auto_index(name = 'GraphlrTest') match n-[:IMPLEMENTS]->ret return ret");
            for (Tree t : l) {
                System.out.println("GraphlrTest implements: " + t.getText()); 
            }
        } catch(Throwable t) {
            t.printStackTrace();
        }
    }
 
    protected void dummy () {}
}
 
final class Dummy {
 
}

package org.pbit.graphlr;import java.util.List;import org.antlr.runtime.ANTLRFileStream; import org.antlr.runtime.CharStream; import org.antlr.runtime.CommonTokenStream; import org.antlr.runtime.tree.Tree;public class GraphlrTest { public static void main(String[] args) throws Exception { try { CharStream input = new ANTLRFileStream("/data/workspace/Graphlr/src/org/pbit/graphlr/GraphlrTest.java"); GraphlrJavaLexer lex = new GraphlrJavaLexer(input); CommonTokenStream tokens = new CommonTokenStream(lex); GraphlrJavaParser parser = new GraphlrJavaParser(tokens); parser.compilationUnit(); //find compilation units List<Tree> l = parser.runCypher("start ret=node:node_auto_index(type = 'class') return ret"); for (Tree t : l) { System.out.println("Compilation unit found: " + t.getText()); } //find methods of a particular class l = parser.runCypher("start n=node:node_auto_index(name = 'GraphlrTest') match n-[:IMPLEMENTS]->ret return ret"); for (Tree t : l) { System.out.println("GraphlrTest implements: " + t.getText()); } } catch(Throwable t) { t.printStackTrace(); } } protected void dummy () {} }final class Dummy { }

It’s very simple. First, it still works and feels like antlr3. So you use the generated classes to lex and parse your class (I’m doing it with the one including the test, so you can see the relation). Then you start with the compilation unit. From here, you can use available Neo4j nodes as well as indexes to traverse the AST. In my case, I use Cypher.

There are two traversals finding all the top level classes as well as all methods of the main class. That’s basically it.

So, there is still work to be done. Why I started this implementation is the main reason to extend the Neo4j based AST index to allow direct jumps into method calls, variable declaration etc. It’s a bit more tricky with antlr, but still doable.

If you want to check it out or maybe to contribute, feel free to get it from GitHub: https://github.com/pavlobaron/graphlr

Feedback is very welcome.