SpiderMonkey is intended to be used as an interpreter for executing JavaScript, but it is also possible to use SpiderMonkey simply as a parser for the language. This can be useful if you are writing a tool that needs to work with JavaScript code. For example, the JSCoverage tool uses this technique.
This document demonstrates how to use SpiderMonkey to parse JavaScript. Note that some of the SpiderMonkey APIs used in this document are not considered public and are subject to change at any time; this document applies to version 1.60 of SpiderMonkey.
The following header files must be included:
#include <jsapi.h> #include <jsparse.h> #include <jsscan.h>
The first step is to initialize the JavaScript engine:
JSRuntime * runtime; JSContext * context; JSObject * global; runtime = JS_NewRuntime(8L * 1024L * 1024L); if (runtime == NULL) { fprintf(stderr, "cannot create runtime"); exit(EXIT_FAILURE); } context = JS_NewContext(runtime, 8192); if (context == NULL) { fprintf(stderr, "cannot create context"); exit(EXIT_FAILURE); } global = JS_NewObject(context, NULL, NULL, NULL); if (global == NULL) { fprintf(stderr, "cannot create global object"); exit(EXIT_FAILURE); } if (! JS_InitStandardClasses(context, global)) { fprintf(stderr, "cannot initialize standard classes"); exit(EXIT_FAILURE); }
The SpiderMonkey parser can receive JavaScript code from a
FILE *
input stream:
JSTokenStream * token_stream; JSParseNode * node; token_stream = js_NewFileTokenStream(context, NULL, stdin); if (token_stream == NULL) { fprintf(stderr, "cannot create token stream from file\n"); exit(EXIT_FAILURE); } node = js_ParseTokenStream(context, global, token_stream); if (node == NULL) { fprintf(stderr, "parse error in file\n"); exit(EXIT_FAILURE); }
The result of the call to js_ParseTokenStream
is a parse tree.
The format of the parse tree is documented in
jsparse.h
.
The nodes of the tree are struct JSParseNode
objects. This
structure has the following fields:
pn_type
TOK_FUNCTION
for function definitions,
TOK_IF
for if
statements,
TOK_WHILE
for while
statements, and so on.
All possible values are defined in
jsscan.h
.
pn_u
unary
struct
which has a JSParseNode *
field
named kid
.
binary
struct
which has two JSParseNode *
fields named left
and right
.
ternary
struct
which has three JSParseNode *
fields named kid1
, kid2
and kid3
.
list
struct
which has a JSParseNode *
field
named head
which points to a linked list of
JSParseNode
objects. For each element of the list, the
pn_next
field points to the next element of the list or
NULL
for the tail of the list.
func
name
struct
which has a JSAtom *
field
named atom
and a JSParseNode *
field named
expr
.
dval
double
value.
pn_arity
pn_u
is valid (although in most cases this can be determined from
the pn_type
field). The value is one of the following:
PN_UNARY
PN_BINARY
PN_TERNARY
PN_LIST
PN_FUNC
PN_NAME
PN_NULLARY
pn_next
pn_op
pn_pos
node->pn_pos.begin.lineno
gives the line number of the first character in the node.
node->pn_pos.end.lineno
gives the line number of the last character in the node.
node->pn_pos.begin.index
gives the column number of the first character in the node.
node->pn_pos.end.index
gives the column number of the last character in the node.
For example, suppose that the variable node
points to a node
representing a while
statement; then the
node->pn_type
field is TOK_WHILE
, the
node->pn_arity
field is PN_BINARY
, and the
node->pn_u.binary
field is used. The
node->pn_u.binary.left
field points to a JSParseNode
which
represents the condition of the while
statement, and the
node->pn_u.binary.right
field points to a JSParseNode
which
represents the body of the while
statement.
It is possible to refer to fields in the pn_u
union using a shorter form,
thanks to some macros defined in
jsparse.h
.
For example, the node->pn_u.binary.left
field has the shorter form
node->pn_left
.
We now have sufficient information to write code to display an outline of a parse tree:
const char * TOKENS[81] = { "EOF", "EOL", "SEMI", "COMMA", "ASSIGN", "HOOK", "COLON", "OR", "AND", "BITOR", "BITXOR", "BITAND", "EQOP", "RELOP", "SHOP", "PLUS", "MINUS", "STAR", "DIVOP", "UNARYOP", "INC", "DEC", "DOT", "LB", "RB", "LC", "RC", "LP", "RP", "NAME", "NUMBER", "STRING", "OBJECT", "PRIMARY", "FUNCTION", "EXPORT", "IMPORT", "IF", "ELSE", "SWITCH", "CASE", "DEFAULT", "WHILE", "DO", "FOR", "BREAK", "CONTINUE", "IN", "VAR", "WITH", "RETURN", "NEW", "DELETE", "DEFSHARP", "USESHARP", "TRY", "CATCH", "FINALLY", "THROW", "INSTANCEOF", "DEBUGGER", "XMLSTAGO", "XMLETAGO", "XMLPTAGC", "XMLTAGC", "XMLNAME", "XMLATTR", "XMLSPACE", "XMLTEXT", "XMLCOMMENT", "XMLCDATA", "XMLPI", "AT", "DBLCOLON", "ANYNAME", "DBLDOT", "FILTER", "XMLELEM", "XMLLIST", "RESERVED", "LIMIT", }; const int NUM_TOKENS = sizeof(TOKENS) / sizeof(TOKENS[0]); void print_tree(JSParseNode * root, int indent) { if (root == NULL) { return; } printf("%*s", indent, ""); if (root->pn_type >= NUM_TOKENS) { printf("UNKNOWN"); } else { printf("%s: starts at line %d, column %d, ends at line %d, column %d", TOKENS[root->pn_type], root->pn_pos.begin.lineno, root->pn_pos.begin.index, root->pn_pos.end.lineno, root->pn_pos.end.index); } printf("\n"); switch (root->pn_arity) { case PN_UNARY: print_tree(root->pn_kid, indent + 2); break; case PN_BINARY: print_tree(root->pn_left, indent + 2); print_tree(root->pn_right, indent + 2); break; case PN_TERNARY: print_tree(root->pn_kid1, indent + 2); print_tree(root->pn_kid2, indent + 2); print_tree(root->pn_kid3, indent + 2); break; case PN_LIST: { JSParseNode * p; for (p = root->pn_head; p != NULL; p = p->pn_next) { print_tree(p, indent + 2); } } break; case PN_FUNC: case PN_NAME: case PN_NULLARY: break; default: fprintf(stderr, "Unknown node type\n"); exit(EXIT_FAILURE); break; } }For example, consider this JavaScript program:
var x, y; x = 1; y = x * 2;The program has the following parse tree:
LC: starts at line 0, column 0, ends at line 3, column 10 VAR: starts at line 1, column 0, ends at line 1, column 8 NAME: starts at line 1, column 4, ends at line 1, column 5 NAME: starts at line 1, column 7, ends at line 1, column 8 SEMI: starts at line 2, column 0, ends at line 2, column 5 ASSIGN: starts at line 2, column 0, ends at line 2, column 5 NAME: starts at line 2, column 0, ends at line 2, column 1 NUMBER: starts at line 2, column 4, ends at line 2, column 5 SEMI: starts at line 3, column 0, ends at line 3, column 9 ASSIGN: starts at line 3, column 0, ends at line 3, column 9 NAME: starts at line 3, column 0, ends at line 3, column 1 STAR: starts at line 3, column 4, ends at line 3, column 9 NAME: starts at line 3, column 4, ends at line 3, column 5 NUMBER: starts at line 3, column 8, ends at line 3, column 9
Memory used by SpiderMonkey can be freed when no longer needed:
JS_DestroyContext(context); JS_DestroyRuntime(runtime);
You can download the complete code for displaying the parse tree. You will requires SpiderMonkey version 1.60 to compile the code.
To compile the example on Linux, using GCC:
gcc -DXP_UNIX -I/path/to/spidermonkey/headers -o example example.c -L/path/to/spidermonkey/library -ljs -lm
To compile the example on Windows, using Microsoft's command line compiler:
cl.exe /DXP_WIN /I\path\to\spidermonkey\headers example.c /MD /link /libpath:\path\to\spidermonkey\library js32.lib
Detailed documentation for all the different types of parse nodes is located in
jsparse.h
and will not be repeated here. The following sections merely attempt to clarify
the details for some types of nodes.
PN_NAME
nodes
Nodes with a pn_arity
of PN_NAME
have a field
node->pn_atom
of type JSAtom *
. This data type can
converted to JSString *
. The JSString
data type is
defined in jsstr.h
to represent an array of 16-bit
jschar
characters.
#include <jsatom.h> ... JSAtom * atom = node->pn_atom; JSString * string = ATOM_TO_STRING(atom); for (int i = 0; i < string->length; i++) { /* assuming that this atom contains only ASCII characters */ char c = string->chars[i]; putc(c); }
PN_FUNC
(TOK_FUNC
) nodes
If you need to obtain the name of a function or its parameters,
see the js_DecompileFunction
function in
jsopcode.c
for an example. The following code shows roughly what you need to do:
JSObject * object = ATOM_TO_OBJECT(node->pn_funAtom); JSFunction * function = (JSFunction *) JS_GetPrivate(context, object); /* function name */ if (function->atom) { /* function->atom is a JSAtom * containing the function name */ ... } /* function parameters */ JSAtom ** params = malloc(function->nargs * sizeof(JSAtom *)); for (int i = 0; i < function->nargs; i++) { /* initialize to NULL for sanity check */ params[i] = NULL; } JSScope * scope = OBJ_SCOPE(object); for (JSScopeProperty * scope_property = SCOPE_LAST_PROP(scope); scope_property != NULL; scope_property = scope_property->parent) { if (scope_property->getter != js_GetArgument) { continue; } params[(uint16) scope_property->shortid] = JSID_TO_ATOM(scope_property->id); } for (int i = 0; i < function->nargs; i++) { assert(params[i] != NULL); /* params[i] is a JSAtom * containing the function name */ ... } free(params);
TOK_ASSIGN
nodes
A TOK_ASSIGN
is not a simple assignment if pn_op
has
one of the following values:
pn_op |
Operator |
---|---|
JSOP_ADD |
+= |
JSOP_SUB |
-= |
JSOP_MUL |
*= |
JSOP_DIV |
/= |
JSOP_MOD |
%= |
JSOP_LSH |
<<= |
JSOP_RSH |
>>= |
JSOP_URSH |
>>>= |
JSOP_BITAND |
&= |
JSOP_BITOR |
|= |
JSOP_BITXOR |
^= |
TOK_BITOR
, TOK_BITXOR
, TOK_BITAND
,
TOK_EQOP
, TOK_RELOP
, TOK_SHOP
,
TOK_PLUS
, TOK_MINUS
, TOK_STAR
,
TOK_DIVOP
nodes
These nodes usually have a pn_arity
of PN_BINARY
,
but an expression like 1 + 2 + 3
will be represented as a single
node with a pn_arity
of PN_LIST
.
TOK_INC
, TOK_DEC
nodes
The pn_op
value is one of the following:
pn_op |
Operator |
---|---|
JSOP_INCNAME , JSOP_INCPROP , JSOP_INCELEM |
prefix increment |
JSOP_DECNAME , JSOP_DECPROP , JSOP_DECELEM |
prefix decrement |
JSOP_NAMEINC , JSOP_PROPINC , JSOP_ELEMINC |
postfix increment |
JSOP_NAMEDEC , JSOP_PROPDEC , JSOP_ELEMDEC |
postfix decrement |
TOK_STRING
nodes
For a node of type TOK_STRING
, pn_atom
contains the
text of the string, without quotes and with any escape characters already applied. For
example, the string literal "1\\2\n"
is represented by a node
whose pn_atom
contains four characters: 1
,
\
, 2
, and a newline character.
TOK_OBJECT
nodes
Nodes with a pn_op
value of JSOP_REGEXP
represent
regular expression literals. The text of the regular expression can be
obtained with the following code:
JSObject * object = ATOM_TO_OBJECT(node->pn_atom); jsval result; js_regexp_toString(context, object, 0, NULL, &result); JSString * string = JSVAL_TO_STRING(result);
TOK_INSTANCEOF
nodes
These nodes have a pn_arity
of PN_BINARY
, with
pn_left
representing the expression to the left of instanceof
and
pn_right
representing the expression to the right of instanceof
.
TOK_IN
nodes
These nodes have a pn_arity
of PN_BINARY
, with
pn_left
representing the expression to the left of in
and
pn_right
representing the expression to the right of in
.