First of all, I found the documentation very limited and, unlike with Bison, I had the feeling that the javacc tutorial amounted to "Look at the examples, isn't it cool?". I'm just writing my notebook here, so I won't give a full explanation, only how I think I've understood how javacc works :-).
OK, now let's go back to our JSON grammar and to the content of the javacc file.
The file is called JSONHandler.jj. It contains a Java class JSONHandler whose main method creates a parser reading from stdin and calls its object method. This method parses the JSON stream and transforms it into a Java object, which main then echoes on stderr. The class is delimited by the keywords PARSER_BEGIN and PARSER_END:
PARSER_BEGIN(JSONHandler)
public class JSONHandler {
public static void main(String args[])
{
try
{
JSONHandler parser = new JSONHandler(System.in);
Object o=parser.object();
System.err.println("JAVA OBJECT: "+o);
} catch(Exception err)
{
err.printStackTrace();
}
}
}
PARSER_END(JSONHandler)
Next we declare that the blank characters will be ignored:
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
We then declare the lexical tokens (numbers, unquoted strings, quoted strings) as regular expressions. As an example, a SIMPLE_QUOTE_LITERAL starts and ends with "\'" and contains any number of escaped C special characters or normal characters.
TOKEN : /* LITERALS */
{
<#LETTER: ["_","a"-"z","A"-"Z"] >
| <#DIGIT: ["0"-"9"] >
| <#SIGN: ["-","+"]>
| <#EXPONENT: ("E"|"e") (<SIGN>)? (<DIGIT>)+ >
| <FLOATING_NUMBER: (<DIGIT>)* "." (<DIGIT>)* (<EXPONENT>)?
| (<DIGIT>)+ (<EXPONENT>) >
| <INT_NUMBER: (<DIGIT>)+ >
| <IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>|"-")* >
| <#ESCAPE_CHAR: "\\" ["n","t","b","r","f","\\","'","\""] >
| <SIMPLE_QUOTE_LITERAL:
"\'"
( (~["\'","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\'"
>
|
<DOUBLE_QUOTE_LITERAL:
"\""
( (~["\"","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\""
>
}
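To get a feel for what the number and identifier tokens accept, the definitions above can be approximated with java.util.regex patterns. This is only a sketch: the class TokenSketch and the method classify are hypothetical names, and a real JavaCC lexer scans a stream with longest-match rules instead of testing whole strings.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: regex approximations of the token definitions above. */
final class TokenSketch {
    // (<DIGIT>)* "." (<DIGIT>)* (<EXPONENT>)?  |  (<DIGIT>)+ <EXPONENT>
    static final Pattern FLOATING_NUMBER =
        Pattern.compile("[0-9]*\\.[0-9]*([Ee][-+]?[0-9]+)?|[0-9]+[Ee][-+]?[0-9]+");
    // (<DIGIT>)+
    static final Pattern INT_NUMBER = Pattern.compile("[0-9]+");
    // <LETTER> (<LETTER>|<DIGIT>|"-")*
    static final Pattern IDENTIFIER = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9-]*");

    /** Returns the name of the first token pattern that matches the whole string. */
    static String classify(String s) {
        if (FLOATING_NUMBER.matcher(s).matches()) return "FLOATING_NUMBER";
        if (INT_NUMBER.matcher(s).matches()) return "INT_NUMBER";
        if (IDENTIFIER.matcher(s).matches()) return "IDENTIFIER";
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"3.14", "1e10", "42", "organism-id", ".", "-5"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Under this approximation, a lone "." is accepted as a FLOATING_NUMBER and "-5" matches none of the three patterns, since SIGN is only used inside EXPONENT. Both may be worth tightening if the grammar is reused.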
As we saw, the parser starts its job by invoking the object method. We expect here a JSON array, a JSON object, or an identifier (null, false, true, a number or a string). Whichever choice matches stores its result in the variable o. This method returns a java.lang.Object.
public Object object():
{Object o;}
{
(
o=array()
| o= map()
| o= identifier()
)
{return o;}
}
JSON arrays will be returned as java.util.Vector<Object>. A JSON array starts with "[" and ends with "]"; it contains zero or more values (each parsed by object()) separated by commas. Each time an element of the array is found, it is added to the vector. At the end, the vector is returned as the value of the array.
public Vector<Object> array():
{Vector<Object> vector= new Vector<Object>(); Object o;}
{
"[" ( o= object() {vector.addElement(o);} ("," o=object() {vector.addElement(o);} ) * )? "]"
{
return vector;
}
}
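JavaCC generates a recursive-descent parser, so the array production reads much like a hand-written method would. The following is a minimal hand-written sketch of the same rule, not the code javacc actually emits. It assumes the input has already been split into string tokens, and the class ArraySketch and its helpers peek and expect are hypothetical names; maps are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hand-written sketch of the array() rule over a pre-tokenized input. */
final class ArraySketch {
    private final List<String> tokens; // assumption: lexing is already done
    private int pos = 0;

    ArraySketch(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private void expect(String s) {
        if (!s.equals(peek())) {
            throw new IllegalStateException("expected " + s + " but found " + peek());
        }
        pos++;
    }

    /** object := array | atom */
    Object object() {
        String t = peek();
        if ("[".equals(t)) return array();
        if (t == null || t.equals("]") || t.equals(",")) {
            throw new IllegalStateException("unexpected " + t);
        }
        pos++;
        return t;
    }

    /** array := "[" ( object ( "," object )* )? "]" */
    List<Object> array() {
        List<Object> items = new ArrayList<>();
        expect("[");
        if (!"]".equals(peek())) {          // the optional ( ... )? part
            items.add(object());
            while (",".equals(peek())) {    // the ( "," object )* part
                pos++;
                items.add(object());
            }
        }
        expect("]");
        return items;
    }

    public static void main(String[] args) {
        List<String> input =
            Arrays.asList("[", "1", ",", "[", "2", ",", "3", "]", ",", "x", "]");
        System.out.println(new ArraySketch(input).array()); // prints [1, [2, 3], x]
    }
}
```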
A JSON identifier can be a number, a string, a quoted string (which needs to be unescaped), null, true or false. The content of the lexical token is obtained via the Token object, whose class is generated by javacc.
public Object identifier():
{Token t;}
{
(
t=<FLOATING_NUMBER>
{
return new Double(t.image);
}
| t=<INT_NUMBER>
{
return new Long(t.image);
}
| t=<IDENTIFIER>
{
if(t.image.equals("true"))
{
return Boolean.TRUE;
}
else if(t.image.equals("false"))
{
return Boolean.FALSE;
}
else if(t.image.equals("null"))
{
return null;
}
return t.image;
}
| t=<SIMPLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
| t=<DOUBLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
)
}
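The conversion performed in each branch can be tried outside the parser. The sketch below is a standalone approximation: the class ValueSketch and the method toValue are hypothetical, and the token kind is passed as a plain string instead of coming from a javacc Token. It uses Double.valueOf and Long.valueOf because the Double(String) and Long(String) constructors are deprecated in recent JDKs.

```java
/** Hypothetical helper mirroring the value conversion in identifier(). */
final class ValueSketch {
    /** Maps a token kind and its text to the Java value the parser would return. */
    static Object toValue(String kind, String image) {
        switch (kind) {
            case "FLOATING_NUMBER":
                return Double.valueOf(image);
            case "INT_NUMBER":
                return Long.valueOf(image);
            case "IDENTIFIER":
                if (image.equals("true")) return Boolean.TRUE;
                if (image.equals("false")) return Boolean.FALSE;
                if (image.equals("null")) return null;
                return image; // any other bare word is kept as a string
            default:
                throw new IllegalArgumentException("unknown token kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(toValue("INT_NUMBER", "9606"));
        System.out.println(toValue("FLOATING_NUMBER", "1e3"));
        System.out.println(toValue("IDENTIFIER", "true"));
        System.out.println(toValue("IDENTIFIER", "organism-id"));
    }
}
```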
A JSON object will be returned as a java.util.HashMap. A JSON object starts with '{' and ends with '}'. In this case, we pass the map as an argument to keyValue each time the parser finds a key/value pair.
public HashMap<String,Object> map():
{HashMap<String,Object> map= new HashMap<String,Object>(); }
{
"{" ( keyValue(map) ("," keyValue(map))*)? "}"
{
return map;
}
}
public void keyValue( HashMap<String,Object> map):
{Object k; Object v;}
{
(k=identifier() ":" v=object())
{
if(k==null) throw new ParseException("null cannot be used as key in object");
if(!(k instanceof String)) throw new ParseException(k.toString()+"("+k.getClass()+") cannot be used as key in object");
map.put(k.toString(),v);
}
}
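The key check in keyValue follows from identifier() returning an Object: a bare number, true, false or null would arrive as a Long, Double, Boolean or null instead of a String. Here is a small standalone sketch of that check. The names KeyCheckSketch, isValidKey and putPair are hypothetical, and IllegalArgumentException stands in for the generated ParseException so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper reproducing the key validation done in keyValue(). */
final class KeyCheckSketch {
    /** Only strings (bare words or unescaped quoted literals) are valid keys. */
    static boolean isValidKey(Object k) {
        return k instanceof String; // false for null, Long, Double, Boolean
    }

    /** Stores the pair, or fails with the same messages as the grammar action. */
    static void putPair(Map<String, Object> map, Object k, Object v) {
        if (k == null) {
            throw new IllegalArgumentException("null cannot be used as key in object");
        }
        if (!isValidKey(k)) {
            throw new IllegalArgumentException(
                k + "(" + k.getClass() + ") cannot be used as key in object");
        }
        map.put((String) k, v);
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        putPair(map, "id", 10929L);
        System.out.println(map);
        try {
            putPair(map, 10L, "a numeric key"); // rejected: the key is a Long
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that map.put silently overwrites an earlier value when a key is repeated, which is also what the grammar action does.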
Compilation:
javacc JSONHandler.jj
Java Compiler Compiler Version 4.0 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file JSONHandler.jj . . .
Parser generated successfully.
javac JSONHandler.java
Testing:
Here is the content of test.json
{
organisms:[
{
id:10929,
name:"Bovine Rotavirus"
},
{
id:9606,
name:"Homo Sapiens"
}
],
proteins:[
{
label:"NSP3",
description:"Rotavirus Non Structural Protein 3",
organism-id: 10929,
acc: "ACB38353"
},
{
label:"EIF4G",
description:"eukaryotic translation initiation factor 4 gamma",
organism-id: 9606,
acc:"AAI40897"
}
],
interactions:[
{
label:"NSP3 interacts with EIF4G1",
pubmed-id:[77120248,38201627],
proteins:["ACB38353","AAI40897"]
}
]
}
This file is parsed, a *java* object is returned and printed on screen.
java JSONHandler < test.json
JAVA OBJECT: {organisms=[{id=10929, name=Bovine Rotavirus}, {id=9606, name=Homo Sapiens}],
proteins=[{description=Rotavirus Non Structural Protein 3, organism-id=10929, label=NSP3,
acc=ACB38353}, {description=eukaryotic translation initiation factor 4 gamma,
organism-id=9606, label=EIF4G, acc=AAI40897}], interactions=[{pubmed-id=[77120248, 38201627],
label=NSP3 interacts with EIF4G1, proteins=[ACB38353, AAI40897]}]}
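Since the parser returns a plain Object, walking the result takes a few casts. The sketch below builds by hand a small part of the structure printed above, instead of calling the generated parser, and then extracts the organism names. NavigateSketch, sample and organismNames are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

/** Hypothetical example: navigating the Map/Vector tree returned by the parser. */
final class NavigateSketch {
    /** Builds by hand the "organisms" part of the object shown in the output. */
    static Object sample() {
        Map<String, Object> rotavirus = new HashMap<>();
        rotavirus.put("id", 10929L);
        rotavirus.put("name", "Bovine Rotavirus");
        Map<String, Object> human = new HashMap<>();
        human.put("id", 9606L);
        human.put("name", "Homo Sapiens");
        Vector<Object> organisms = new Vector<>();
        organisms.add(rotavirus);
        organisms.add(human);
        Map<String, Object> root = new HashMap<>();
        root.put("organisms", organisms);
        return root;
    }

    /** Casts its way down the tree and collects every organism name. */
    @SuppressWarnings("unchecked")
    static List<String> organismNames(Object root) {
        Map<String, Object> top = (Map<String, Object>) root;
        List<Object> organisms = (List<Object>) top.get("organisms");
        List<String> names = new ArrayList<>();
        for (Object o : organisms) {
            Map<String, Object> organism = (Map<String, Object>) o;
            names.add((String) organism.get("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(organismNames(sample())); // prints [Bovine Rotavirus, Homo Sapiens]
    }
}
```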
Here is the complete javacc source code:
PARSER_BEGIN(JSONHandler)
import java.util.Vector;
import java.util.HashMap;
public class JSONHandler {
public static void main(String args[])
{
try
{
JSONHandler parser = new JSONHandler(System.in);
Object o=parser.object();
System.err.println("JAVA OBJECT: "+o);
} catch(Exception err)
{
err.printStackTrace();
}
}
/** unescape a C string */
private static String unescape(String s)
{
StringBuilder sb= new StringBuilder(s.length());
for(int i=1;i< s.length()-1;++i)
{
if(s.charAt(i)=='\\')
{
if(i+1< s.length()-1)
{
++i;
switch(s.charAt(i))
{
case 'n': sb.append('\n'); break;
case 'r': sb.append('\r'); break;
case '\\': sb.append('\\'); break;
case 'b': sb.append('\b'); break;
case 't': sb.append('\t'); break;
case 'f': sb.append('\f'); break;
case '\'': sb.append('\''); break;
case '\"': sb.append('\"'); break;
default: sb.append(s.charAt(i));
}
}
}
else
{
sb.append(s.charAt(i));
}
}
return sb.toString();
}
}
PARSER_END(JSONHandler)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
}
TOKEN : /* LITERALS */
{
<#LETTER: ["_","a"-"z","A"-"Z"] >
| <#DIGIT: ["0"-"9"] >
| <#SIGN: ["-","+"]>
| <#EXPONENT: ("E"|"e") (<SIGN>)? (<DIGIT>)+ >
| <FLOATING_NUMBER: (<DIGIT>)* "." (<DIGIT>)* (<EXPONENT>)?
| (<DIGIT>)+ (<EXPONENT>) >
| <INT_NUMBER: (<DIGIT>)+ >
| <IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>|"-")* >
| <#ESCAPE_CHAR: "\\" ["n","t","b","r","f","\\","'","\""] >
| <SIMPLE_QUOTE_LITERAL:
"\'"
( (~["\'","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\'"
>
|
<DOUBLE_QUOTE_LITERAL:
"\""
( (~["\"","\\","\n","\r"])
| <ESCAPE_CHAR>
)*
"\""
>
}
public Object object():
{Object o;}
{
(
o=array()
| o= map()
| o= identifier()
)
{return o;}
}
public Object identifier():
{Token t;}
{
(
t=<FLOATING_NUMBER>
{
return new Double(t.image);
}
| t=<INT_NUMBER>
{
return new Long(t.image);
}
| t=<IDENTIFIER>
{
if(t.image.equals("true"))
{
return Boolean.TRUE;
}
else if(t.image.equals("false"))
{
return Boolean.FALSE;
}
else if(t.image.equals("null"))
{
return null;
}
return t.image;
}
| t=<SIMPLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
| t=<DOUBLE_QUOTE_LITERAL>
{
return unescape(t.image);
}
)
}
public Vector<Object> array():
{Vector<Object> vector= new Vector<Object>(); Object o;}
{
"[" ( o=object() {vector.addElement(o);} ("," o=object() {vector.addElement(o);} ) * )? "]"
{
return vector;
}
}
public HashMap<String,Object> map():
{HashMap<String,Object> map= new HashMap<String,Object>(); }
{
"{" ( keyValue(map) ("," keyValue(map))*)? "}"
{
return map;
}
}
public void keyValue( HashMap<String,Object> map):
{Object k; Object v;}
{
(k=identifier() ":" v=object())
{
if(k==null) throw new ParseException("null cannot be used as key in object");
if(!(k instanceof String)) throw new ParseException(k.toString()+"("+k.getClass()+") cannot be used as key in object");
map.put(k.toString(),v);
}
}
That's it. What's next? jjtree is another component of the javacc package and seems to be a promising tool: it builds a tree structure from the grammar. The nodes of this tree can then be visited just like in a DOM/XML document, and a language can be implemented that way but, here again, the documentation is succinct.
Pierre, you would do better to use List<Object> and Map<String,Object> as return types, since they are interfaces rather than implementations, and if you don't need thread safety in the parsing, ArrayList is faster than Vector.
Thank you egon
Dear Pierre,
I found your post and the information very useful. I was writing a small JSON lib that I would like to publish, and I would like to include a (slightly modified) version of your JavaCC grammar as a parser. So I would like to know: may I reuse your grammar? If so, under which license?
Hi Max,
Feel free to use this code in any way you want! :-) Please just add a reference to me (Pierre Lindenbaum plindenbaum yahoo fr) and to this post in the source and/or the README.
Pierre,
I've published the library at:
http://max.berger.name/oss/mjl
and referenced you in the source file and as a contributor. Thank you very much for your post!
Max
No problem Max, thank you for using this small code.
Pierre
i have a similar grammar setup and i'm having some issues. do you mind helping me out? all of my source code is at:
https://sourceforge.net/projects/javajson/
and i've created a fairly detailed bug with a test case here:
https://sourceforge.net/tracker/?func=detail&aid=2733371&group_id=162311&atid=823285
the test case is actually committed into the test folder.
max, i've implemented a json library. you should take a look at it before spending a lot of time on yours. maybe you can help me out with it.
https://sourceforge.net/projects/javajson/
Very useful example of using javacc in real life.
Thank you a lot, and hoping to see more about this topic.