Parsing JSON with javacc
Although I have never written a great new programming language, I have a little experience with C lexers/parsers, especially lex/yacc and flex/bison, which generate LALR parsers. Now that I'm programming in Java, I've been looking at the parsers available for this language: the most popular tools seem to be the top-down parser generators javacc and antlr. In this post I show how I wrote a simple javacc parser reading a JSON entry (this is just an exercise; it is also easy to write this kind of parser using java.io.StreamTokenizer or java.util.Scanner).
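For comparison, here is a minimal sketch of the java.io.StreamTokenizer route. It only lists the tokens of a JSON-like string with the tokenizer's default settings and builds no object; the class name TokenDump and the WORD/NUMBER/STRING labels are made up for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenDump {
    /** Lists the tokens found by a StreamTokenizer using its default settings. */
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + st.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        // numbers are always delivered as a double
                        out.add("NUMBER(" + st.nval + ")");
                        break;
                    case '"':
                    case '\'':
                        // for quoted strings, sval holds the body without the quotes
                        out.add("STRING(" + st.sval + ")");
                        break;
                    default:
                        // punctuation such as { } [ ] : ,
                        out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("{id:10929,name:\"Bovine Rotavirus\"}"));
    }
}
```

A real parser would still have to turn this token list into maps and arrays, which is the part javacc generates from the grammar.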
First of all, I found that the documentation was very limited and, unlike Bison's, I had the feeling that the javacc tutorial was a kind of "Look at the examples, isn't it cool?". I'm just writing my notebook here, so on my side I won't explain how I think I've understood how javacc works :-).
OK, now let's go back to our JSON grammar and to the content of the javacc file.
The file is called JSONHandler.jj. It contains a java class JSONHandler with a main method that calls the method object() on the input read from stdin. This method parses the JSON stream, transforms it into a java object and echoes it on stderr. The class is delimited by the keywords PARSER_BEGIN and PARSER_END:
PARSER_BEGIN(JSONHandler)
public class JSONHandler {
public static void main(String args[])
{
try
{
JSONHandler parser = new JSONHandler(System.in);
Object o=parser.object();
System.err.println("JAVA OBJECT: "+o);
} catch(Exception err)
{
err.printStackTrace();
}
}
}
PARSER_END(JSONHandler)
Next we declare that the blank characters will be ignored:
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
We then declare the lexical tokens (numbers, identifiers, quoted strings) using regular expressions. As an example, a SIMPLE_QUOTE_LITERAL starts and ends with "\'" and contains any number of escaped C special characters or normal characters.
TOKEN : /* LITERALS */
{
<#LETTER: ["_","a"-"z","A"-"Z"] >
| <#DIGIT: ["0"-"9"] >
| <#SIGN: ["-","+"]>
| <#EXPONENT: ("E"|"e") (<SIGN>)? (<DIGIT>)+ >
| <FLOATING_NUMBER: (<DIGIT>)* "." (<DIGIT>)* (<EXPONENT>)?
| (<DIGIT>)+ (<EXPONENT>) >
| <INT_NUMBER: (<DIGIT>)+ >
| <IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>|"-")* >
| <#ESCAPE_CHAR: "\\" ["n","t","b","r","f","\\","'","\""] >
| <SIMPLE_QUOTE_LITERAL:
"\'"
( (~["\'","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\'"
>
|
<DOUBLE_QUOTE_LITERAL:
"\""
( (~["\"","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\""
>
}
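To get a feeling for what these definitions accept, here is a small sketch that transcribes three of them into java.util.regex patterns and classifies whole strings. It is only an approximation of the javacc token manager, which scans a stream and prefers the longest match; the class TokenKinds and the method kindOf are names invented for this example.

```java
import java.util.regex.Pattern;

final class TokenKinds {
    // Plain-regex transcriptions of three token definitions from the grammar.
    static final Pattern FLOATING_NUMBER =
        Pattern.compile("[0-9]*\\.[0-9]*([Ee][-+]?[0-9]+)?|[0-9]+[Ee][-+]?[0-9]+");
    static final Pattern INT_NUMBER = Pattern.compile("[0-9]+");
    static final Pattern IDENTIFIER = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9-]*");

    /** Returns the name of the token kind matching the whole text, or "NONE". */
    static String kindOf(String text) {
        if (FLOATING_NUMBER.matcher(text).matches()) return "FLOATING_NUMBER";
        if (INT_NUMBER.matcher(text).matches()) return "INT_NUMBER";
        if (IDENTIFIER.matcher(text).matches()) return "IDENTIFIER";
        return "NONE";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"10929", "1e5", "3.14", "organism-id", ".", "-5"}) {
            System.out.println(s + " -> " + kindOf(s));
        }
    }
}
```

Two details show up when playing with it: a lone "." satisfies the FLOATING_NUMBER rule as written, and a leading minus sign is not part of any number token, since SIGN is only used inside EXPONENT.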
As we saw, the parser starts its job by invoking the object method. We expect here a JSON array, a JSON object or another identifier (null, false, true, a number or a string). Each of those choices stores its result in the variable o. This method returns a java.lang.Object.
public Object object():
{Object o;}
{
(
o=array()
| o= map()
| o= identifier()
)
{return o;}
}
JSON arrays are returned as java.util.Vector<Object>. A JSON array starts with "[" and ends with "]"; it contains zero or more JSON values separated by commas. Each time an element of the array is found, it is added to the vector. At the end, the vector is returned as the value of the array.
public Vector<Object> array():
{Vector<Object> vector= new Vector<Object>(); Object o;}
{
"[" ( o= object() {vector.addElement(o);} ("," o=object() {vector.addElement(o);} ) * )? "]"
{
return vector;
}
}
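Since javacc generates a top-down parser, the two productions above can be pictured as two methods calling each other. The following is a rough hand-written sketch of that idea, not the code javacc emits: it works on a list of already separated tokens, only knows integers and arrays, and the class ArraySketch with its helpers peek and expect is invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArraySketch {
    private final List<String> tokens;
    private int pos = 0;

    ArraySketch(List<String> tokens) { this.tokens = tokens; }

    /** Returns the next token without consuming it, or null at end of input. */
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    /** Consumes the next token, which must be the wanted one. */
    private void expect(String wanted) {
        String got = peek();
        if (!wanted.equals(got)) {
            throw new IllegalStateException("expected " + wanted + " but found " + got);
        }
        pos++;
    }

    /** value := array | integer ; one token of lookahead decides the branch. */
    Object value() {
        if ("[".equals(peek())) return array();
        String t = peek();
        if (t == null) throw new IllegalStateException("unexpected end of input");
        pos++;
        return Long.valueOf(t);
    }

    /** array := "[" ( value ( "," value )* )? "]" */
    List<Object> array() {
        List<Object> items = new ArrayList<>();
        expect("[");
        if (!"]".equals(peek())) {
            items.add(value());
            while (",".equals(peek())) {
                pos++; // consume the comma
                items.add(value());
            }
        }
        expect("]");
        return items;
    }

    public static void main(String[] args) {
        List<String> input =
            Arrays.asList("[", "1", ",", "[", "2", ",", "3", "]", ",", "[", "]", "]");
        System.out.println(new ArraySketch(input).value()); // prints [1, [2, 3], []]
    }
}
```

The optional group and the starred group of the grammar become an if and a while, and the nesting of arrays comes for free from the mutual recursion.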
A JSON identifier can be a number, an unquoted string, a quoted string (which needs to be un-escaped), null, true or false. The content of the lexical token is obtained via the Token object, whose class is generated by javacc.
public Object identifier():
{Token t;}
{
(
t=<FLOATING_NUMBER>
{
return new Double(t.image);
}
| t=<INT_NUMBER>
{
return new Long(t.image);
}
| t=<IDENTIFIER>
{
if(t.image.equals("true"))
{
return Boolean.TRUE;
}
else if(t.image.equals("false"))
{
return Boolean.FALSE;
}
else if(t.image.equals("null"))
{
return null;
}
return t.image;
}
| t=<SIMPLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
| t=<DOUBLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
)
}
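The conversions done in these actions can be tried outside of the parser. Here is a minimal sketch, assuming the token image is available as a plain String; the class Images and its three methods are hypothetical names. It uses Long.valueOf and Double.valueOf, which do the same job as the constructors above (those constructors are deprecated since Java 9).

```java
final class Images {
    /** Converts the image of an INT_NUMBER token. */
    static Object ofInt(String image) {
        return Long.valueOf(image);
    }

    /** Converts the image of a FLOATING_NUMBER token. */
    static Object ofFloat(String image) {
        return Double.valueOf(image);
    }

    /** Converts the image of an IDENTIFIER token: keywords first, else the text itself. */
    static Object ofIdentifier(String image) {
        switch (image) {
            case "true":  return Boolean.TRUE;
            case "false": return Boolean.FALSE;
            case "null":  return null;
            default:      return image;
        }
    }

    public static void main(String[] args) {
        System.out.println(ofInt("10929"));        // 10929
        System.out.println(ofFloat("1e5"));        // 100000.0
        System.out.println(ofFloat(".5"));         // 0.5
        System.out.println(ofIdentifier("true"));  // true
        System.out.println(ofIdentifier("label")); // label
    }
}
```

Both valueOf methods throw a NumberFormatException on text they cannot read, for instance Long.valueOf on a value too large for a long, or Double.valueOf(".").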
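JSON objects are returned as java.util.HashMap. A JSON object starts with '{' and ends with '}' and contains zero or more key/value pairs separated by commas. Here the map is passed as an argument to keyValue(), which fills it each time the parser finds a key/value pair.
public HashMap<String,Object> map():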
{HashMap<String,Object> map= new HashMap<String,Object>(); }
{
"{" ( keyValue(map) ("," keyValue(map))*)? "}"
{
return map;
}
}
public void keyValue( HashMap<String,Object> map):
{Object k; Object v;}
{
(k=identifier() ":" v=object())
{
if(k==null) throw new ParseException("null cannot be used as key in object");
if(!(k instanceof String)) throw new ParseException(k.toString()+"("+k.getClass()+") cannot be used as key in object");
map.put(k.toString(),v);
}
}
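The checks made in keyValue can be exercised on their own. The sketch below reproduces them in a plain static method and adds one check that the grammar above does not make: it rejects a key that is already present, whereas map.put silently replaces the previous value. The class KeyCheck and the method putChecked are made-up names, and IllegalArgumentException stands in for the ParseException class that javacc generates.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyCheck {
    /** Stores value under key after validating the key. */
    static void putChecked(Map<String, Object> map, Object key, Object value) {
        if (key == null) {
            throw new IllegalArgumentException("null cannot be used as key in object");
        }
        if (!(key instanceof String)) {
            throw new IllegalArgumentException(
                key + " (" + key.getClass() + ") cannot be used as key in object");
        }
        // Extra check, not in the original grammar: refuse duplicate keys.
        if (map.containsKey(key)) {
            throw new IllegalArgumentException("duplicate key: " + key);
        }
        map.put((String) key, value);
    }

    /** Returns true when the key/value pair was accepted. */
    static boolean accepts(Map<String, Object> map, Object key, Object value) {
        try {
            putChecked(map, key, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(accepts(map, "id", 10929L));   // true
        System.out.println(accepts(map, 9606L, "number")); // false: key is a Long
        System.out.println(accepts(map, null, "null"));    // false: key is null
        System.out.println(accepts(map, "id", 9606L));     // false: duplicate key
    }
}
```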
Compilation:
javacc JSONHandler.jj
Java Compiler Compiler Version 4.0 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file JSONHandler.jj . . .
Parser generated successfully.
javac JSONHandler.java
Testing:
Here is the content of test.json:
{
organisms:[
{
id:10929,
name:"Bovine Rotavirus"
},
{
id:9606,
name:"Homo Sapiens"
}
],
proteins:[
{
label:"NSP3",
description:"Rotavirus Non Structural Protein 3",
organism-id: 10929,
acc: "ACB38353"
},
{
label:"EIF4G",
description:"eukaryotic translation initiation factor 4 gamma",
organism-id: 9606,
acc:"AAI40897"
}
],
interactions:[
{
label:"NSP3 interacts with EIF4G1",
pubmed-id:[77120248,38201627],
proteins:["ACB38353","AAI40897"]
}
]
}
The file is parsed with java JSONHandler < test.json; a java object is returned and printed on the screen:
JAVA OBJECT: {organisms=[{id=10929, name=Bovine Rotavirus}, {id=9606, name=Homo Sapiens}],
proteins=[{description=Rotavirus Non Structural Protein 3, organism-id=10929, label=NSP3,
acc=ACB38353}, {description=eukaryotic translation initiation factor 4 gamma,
organism-id=9606, label=EIF4G, acc=AAI40897}], interactions=[{pubmed-id=[77120248, 38201627],
label=NSP3 interacts with EIF4G1, proteins=[ACB38353, AAI40897]}]}
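The keys in this output do not come in the same order as in test.json because java.util.HashMap gives no guarantee about iteration order. The sketch below, which is not part of the parser, builds a small piece of the same structure by hand with java.util.LinkedHashMap (which keeps insertion order) and shows how the returned Object can be walked with instanceof checks. The class WalkResult and the methods sample and names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WalkResult {
    /** Builds by hand what the parser would return for the "organisms" part of test.json. */
    static Map<String, Object> sample() {
        Map<String, Object> bovine = new LinkedHashMap<>();
        bovine.put("id", 10929L);
        bovine.put("name", "Bovine Rotavirus");
        Map<String, Object> human = new LinkedHashMap<>();
        human.put("id", 9606L);
        human.put("name", "Homo Sapiens");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("organisms", Arrays.asList(bovine, human));
        return root;
    }

    /** Collects the "name" entry of every map found in the array stored under key. */
    static List<Object> names(Map<String, Object> root, String key) {
        List<Object> out = new ArrayList<>();
        Object value = root.get(key);
        if (value instanceof List) {               // Vector is a List too
            for (Object item : (List<?>) value) {
                if (item instanceof Map) {         // HashMap is a Map too
                    out.add(((Map<?, ?>) item).get("name"));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sample();
        System.out.println(root);                     // keys appear in insertion order
        System.out.println(names(root, "organisms")); // [Bovine Rotavirus, Homo Sapiens]
    }
}
```

Because names only relies on the List and Map interfaces, it would work unchanged on the Vector and HashMap instances produced by the grammar.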
Here is the complete javacc source code:
PARSER_BEGIN(JSONHandler)
import java.util.Vector;
import java.util.HashMap;
public class JSONHandler {
public static void main(String args[])
{
try
{
JSONHandler parser = new JSONHandler(System.in);
Object o=parser.object();
System.err.println("JAVA OBJECT: "+o);
} catch(Exception err)
{
err.printStackTrace();
}
}
/** unescape a C string */
private static String unescape(String s)
{
StringBuilder sb= new StringBuilder(s.length());
for(int i=1;i< s.length()-1;++i)
{
if(s.charAt(i)=='\\')
{
if(i+1< s.length()-1)
{
++i;
switch(s.charAt(i))
{
case 'n': sb.append('\n'); break;
case 'r': sb.append('\r'); break;
case '\\': sb.append('\\'); break;
case 'b': sb.append('\b'); break;
case 't': sb.append('\t'); break;
case 'f': sb.append('\f'); break;
case '\'': sb.append('\''); break;
case '\"': sb.append('\"'); break;
default: sb.append(s.charAt(i));
}
}
}
else
{
sb.append(s.charAt(i));
}
}
return sb.toString();
}
}
PARSER_END(JSONHandler)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN : /* LITERALS */
{
<#LETTER: ["_","a"-"z","A"-"Z"] >
| <#DIGIT: ["0"-"9"] >
| <#SIGN: ["-","+"]>
| <#EXPONENT: ("E"|"e") (<SIGN>)? (<DIGIT>)+ >
| <FLOATING_NUMBER: (<DIGIT>)* "." (<DIGIT>)* (<EXPONENT>)?
| (<DIGIT>)+ (<EXPONENT>) >
| <INT_NUMBER: (<DIGIT>)+ >
| <IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>|"-")* >
| <#ESCAPE_CHAR: "\\" ["n","t","b","r","f","\\","'","\""] >
| <SIMPLE_QUOTE_LITERAL:
"\'"
( (~["\'","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\'"
>
|
<DOUBLE_QUOTE_LITERAL:
"\""
( (~["\"","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\""
>
}
public Object object():
{Object o;}
{
(
o=array()
| o= map()
| o= identifier()
)
{return o;}
}
public Object identifier():
{Token t;}
{
(
t=<FLOATING_NUMBER>
{
return new Double(t.image);
}
| t=<INT_NUMBER>
{
return new Long(t.image);
}
| t=<IDENTIFIER>
{
if(t.image.equals("true"))
{
return Boolean.TRUE;
}
else if(t.image.equals("false"))
{
return Boolean.FALSE;
}
else if(t.image.equals("null"))
{
return null;
}
return t.image;
}
| t=<SIMPLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
| t=<DOUBLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
)
}
public Vector<Object> array():
{Vector<Object> vector= new Vector<Object>(); Object o;}
{
"[" ( o=object() {vector.addElement(o);} ("," o=object() {vector.addElement(o);} ) * )? "]"
{
return vector;
}
}
public HashMap<String,Object> map():
{HashMap<String,Object> map= new HashMap<String,Object>(); }
{
"{" ( keyValue(map) ("," keyValue(map))*)? "}"
{
return map;
}
}
public void keyValue( HashMap<String,Object> map):
{Object k; Object v;}
{
(k=identifier() ":" v=object())
{
if(k==null) throw new ParseException("null cannot be used as key in object");
if(!(k instanceof String)) throw new ParseException(k.toString()+"("+k.getClass()+") cannot be used as key in object");
map.put(k.toString(),v);
}
}
That's it. What's next? jjtree is another component of the javacc package and seems to be a promising tool: it builds a tree structure from the parsed input, following the grammar. The nodes of this tree can then be visited just like in a DOM/XML document, which is what is needed to implement a language, but here again the documentation is succinct.
9 comments:
Pierre, you had better use List<Object> and Map<String,Object> as return types, since they are interfaces rather than implementations, and if you don't need thread safety in the parser, ArrayList is faster than Vector.
Thank you egon
Dear Pierre,
I found your post and the information very useful. I was writing a small JSON lib that I would like to publish, and I would like to include a (slightly modified) version of your JavaCC grammar as a parser. So I would like to know: may I reuse your grammar? If so, under which license?
Hi Max,
Feel free to use this code in any way you want !:-) Please, just add a reference to me (Pierre Lindenbaum plindenbaum yahoo fr ) and to this post in the source and/or the README.
Pierre,
I've published the library at:
http://max.berger.name/oss/mjl
and reference you in the source file and as contributor. Thank you very much for your post!
Max
No problem Max, thank you for using this small code.
Pierre
i have a similar grammar setup and i'm having some issues. do you mind helping me out? all of my source code is at:
https://sourceforge.net/projects/javajson/
and i've created a fairly detailed bug with test case here:
https://sourceforge.net/tracker/?func=detail&aid=2733371&group_id=162311&atid=823285
the test case is actually committed into the test folder.
max, i've implemented a json library. you should take a look at it before spending a lot of time on it. maybe you can help me out with it.
https://sourceforge.net/projects/javajson/
Very useful example for using javacc in real life.
Thank you a lot, and hoping to see more about this topic.