SpiderMonkey is intended to be used as an interpreter for executing JavaScript, but it is also possible to use SpiderMonkey simply as a parser for the language. This can be useful if you are writing a tool that needs to work with JavaScript code. For example, the JSCoverage tool uses this technique.
This document demonstrates how to use SpiderMonkey to parse JavaScript. Note that some of the SpiderMonkey APIs used in this document are not considered public and are subject to change at any time; this document applies to version 1.60 of SpiderMonkey.
The following header files must be included:
#include <jsapi.h> #include <jsparse.h> #include <jsscan.h>
The first step is to initialize the JavaScript engine:
JSRuntime * runtime;
JSContext * context;
JSObject * global;
runtime = JS_NewRuntime(8L * 1024L * 1024L);
if (runtime == NULL) {
fprintf(stderr, "cannot create runtime");
exit(EXIT_FAILURE);
}
context = JS_NewContext(runtime, 8192);
if (context == NULL) {
fprintf(stderr, "cannot create context");
exit(EXIT_FAILURE);
}
global = JS_NewObject(context, NULL, NULL, NULL);
if (global == NULL) {
fprintf(stderr, "cannot create global object");
exit(EXIT_FAILURE);
}
if (! JS_InitStandardClasses(context, global)) {
fprintf(stderr, "cannot initialize standard classes");
exit(EXIT_FAILURE);
}
The SpiderMonkey parser can receive JavaScript code from a
FILE * input stream:
JSTokenStream * token_stream;
JSParseNode * node;
token_stream = js_NewFileTokenStream(context, NULL, stdin);
if (token_stream == NULL) {
fprintf(stderr, "cannot create token stream from file\n");
exit(EXIT_FAILURE);
}
node = js_ParseTokenStream(context, global, token_stream);
if (node == NULL) {
fprintf(stderr, "parse error in file\n");
exit(EXIT_FAILURE);
}
The result of the call to js_ParseTokenStream is a parse tree.
The format of the parse tree is documented in
jsparse.h.
The nodes of the tree are struct JSParseNode objects. This
structure has the following fields:
pn_type
TOK_FUNCTION for function definitions,
TOK_IF for if statements,
TOK_WHILE for while statements, and so on.
All possible values are defined in
jsscan.h.
pn_u
unary
struct which has a JSParseNode * field
named kid.
binary
struct which has two JSParseNode *
fields named left and right.
ternary
struct which has three JSParseNode *
fields named kid1, kid2 and kid3.
list
struct which has a JSParseNode * field
named head which points to a linked list of
JSParseNode objects. For each element of the list, the
pn_next field points to the next element of the list or
NULL for the tail of the list.
func
name
struct which has a JSAtom * field
named atom and a JSParseNode * field named
expr.
dval
double
value.
pn_arity
pn_u is valid (although in most cases this can be determined from
the pn_type field). The value is one of the following:
PN_UNARY
PN_BINARY
PN_TERNARY
PN_LIST
PN_FUNC
PN_NAME
PN_NULLARY
pn_next
pn_op
pn_pos
node->pn_pos.begin.lineno gives the line number of the first character in the node.
node->pn_pos.end.lineno gives the line number of the last character in the node.
node->pn_pos.begin.index gives the column number of the first character in the node.
node->pn_pos.end.index gives the column number of the last character in the node.
For example, suppose that the variable node points to a node
representing a while statement; then the
node->pn_type field is TOK_WHILE, the
node->pn_arity field is PN_BINARY, and the
node->pn_u.binary field is used. The
node->pn_u.binary.left field points to a JSParseNode which
represents the condition of the while statement, and the
node->pn_u.binary.right field points to a JSParseNode which
represents the body of the while statement.
It is possible to refer to fields in the pn_u union using a shorter form,
thanks to some macros defined in
jsparse.h.
For example, the node->pn_u.binary.left field has the shorter form
node->pn_left.
We now have sufficient information to write code to display an outline of a parse tree:
const char * TOKENS[81] = {
"EOF", "EOL", "SEMI", "COMMA", "ASSIGN", "HOOK", "COLON", "OR", "AND",
"BITOR", "BITXOR", "BITAND", "EQOP", "RELOP", "SHOP", "PLUS", "MINUS", "STAR",
"DIVOP", "UNARYOP", "INC", "DEC", "DOT", "LB", "RB", "LC", "RC", "LP", "RP",
"NAME", "NUMBER", "STRING", "OBJECT", "PRIMARY", "FUNCTION", "EXPORT",
"IMPORT", "IF", "ELSE", "SWITCH", "CASE", "DEFAULT", "WHILE", "DO", "FOR",
"BREAK", "CONTINUE", "IN", "VAR", "WITH", "RETURN", "NEW", "DELETE",
"DEFSHARP", "USESHARP", "TRY", "CATCH", "FINALLY", "THROW", "INSTANCEOF",
"DEBUGGER", "XMLSTAGO", "XMLETAGO", "XMLPTAGC", "XMLTAGC", "XMLNAME",
"XMLATTR", "XMLSPACE", "XMLTEXT", "XMLCOMMENT", "XMLCDATA", "XMLPI", "AT",
"DBLCOLON", "ANYNAME", "DBLDOT", "FILTER", "XMLELEM", "XMLLIST", "RESERVED",
"LIMIT",
};
const int NUM_TOKENS = sizeof(TOKENS) / sizeof(TOKENS[0]);
void print_tree(JSParseNode * root, int indent) {
if (root == NULL) {
return;
}
printf("%*s", indent, "");
if (root->pn_type >= NUM_TOKENS) {
printf("UNKNOWN");
}
else {
printf("%s: starts at line %d, column %d, ends at line %d, column %d",
TOKENS[root->pn_type],
root->pn_pos.begin.lineno, root->pn_pos.begin.index,
root->pn_pos.end.lineno, root->pn_pos.end.index);
}
printf("\n");
switch (root->pn_arity) {
case PN_UNARY:
print_tree(root->pn_kid, indent + 2);
break;
case PN_BINARY:
print_tree(root->pn_left, indent + 2);
print_tree(root->pn_right, indent + 2);
break;
case PN_TERNARY:
print_tree(root->pn_kid1, indent + 2);
print_tree(root->pn_kid2, indent + 2);
print_tree(root->pn_kid3, indent + 2);
break;
case PN_LIST:
{
JSParseNode * p;
for (p = root->pn_head; p != NULL; p = p->pn_next) {
print_tree(p, indent + 2);
}
}
break;
case PN_FUNC:
case PN_NAME:
case PN_NULLARY:
break;
default:
fprintf(stderr, "Unknown node type\n");
exit(EXIT_FAILURE);
break;
}
}
For example, consider this JavaScript program:
var x, y; x = 1; y = x * 2;The program has the following parse tree:
LC: starts at line 0, column 0, ends at line 3, column 10
VAR: starts at line 1, column 0, ends at line 1, column 8
NAME: starts at line 1, column 4, ends at line 1, column 5
NAME: starts at line 1, column 7, ends at line 1, column 8
SEMI: starts at line 2, column 0, ends at line 2, column 5
ASSIGN: starts at line 2, column 0, ends at line 2, column 5
NAME: starts at line 2, column 0, ends at line 2, column 1
NUMBER: starts at line 2, column 4, ends at line 2, column 5
SEMI: starts at line 3, column 0, ends at line 3, column 9
ASSIGN: starts at line 3, column 0, ends at line 3, column 9
NAME: starts at line 3, column 0, ends at line 3, column 1
STAR: starts at line 3, column 4, ends at line 3, column 9
NAME: starts at line 3, column 4, ends at line 3, column 5
NUMBER: starts at line 3, column 8, ends at line 3, column 9
Memory used by SpiderMonkey can be freed when no longer needed:
JS_DestroyContext(context); JS_DestroyRuntime(runtime);
You can download the complete code for displaying the parse tree. You will requires SpiderMonkey version 1.60 to compile the code.
To compile the example on Linux, using GCC:
gcc -DXP_UNIX -I/path/to/spidermonkey/headers -o example example.c -L/path/to/spidermonkey/library -ljs -lm
To compile the example on Windows, using Microsoft's command line compiler:
cl.exe /DXP_WIN /I\path\to\spidermonkey\headers example.c /MD /link /libpath:\path\to\spidermonkey\library js32.lib
Detailed documentation for all the different types of parse nodes is located in
jsparse.h
and will not be repeated here. The following sections merely attempt to clarify
the details for some types of nodes.
PN_NAME nodes
Nodes with a pn_arity of PN_NAME have a field
node->pn_atom of type JSAtom *. This data type can
converted to JSString *. The JSString data type is
defined in jsstr.h to represent an array of 16-bit
jschar characters.
#include <jsatom.h>
...
JSAtom * atom = node->pn_atom;
JSString * string = ATOM_TO_STRING(atom);
for (int i = 0; i < string->length; i++) {
/* assuming that this atom contains only ASCII characters */
char c = string->chars[i];
putc(c);
}
PN_FUNC (TOK_FUNC) nodes
If you need to obtain the name of a function or its parameters,
see the js_DecompileFunction function in
jsopcode.c
for an example. The following code shows roughly what you need to do:
JSObject * object = ATOM_TO_OBJECT(node->pn_funAtom);
JSFunction * function = (JSFunction *) JS_GetPrivate(context, object);
/* function name */
if (function->atom) {
/* function->atom is a JSAtom * containing the function name */
...
}
/* function parameters */
JSAtom ** params = malloc(function->nargs * sizeof(JSAtom *));
for (int i = 0; i < function->nargs; i++) {
/* initialize to NULL for sanity check */
params[i] = NULL;
}
JSScope * scope = OBJ_SCOPE(object);
for (JSScopeProperty * scope_property = SCOPE_LAST_PROP(scope);
scope_property != NULL;
scope_property = scope_property->parent) {
if (scope_property->getter != js_GetArgument) {
continue;
}
params[(uint16) scope_property->shortid] = JSID_TO_ATOM(scope_property->id);
}
for (int i = 0; i < function->nargs; i++) {
assert(params[i] != NULL);
/* params[i] is a JSAtom * containing the function name */
...
}
free(params);
TOK_ASSIGN nodes
A TOK_ASSIGN is not a simple assignment if pn_op has
one of the following values:
pn_op |
Operator |
|---|---|
JSOP_ADD |
+= |
JSOP_SUB |
-= |
JSOP_MUL |
*= |
JSOP_DIV |
/= |
JSOP_MOD |
%= |
JSOP_LSH |
<<= |
JSOP_RSH |
>>= |
JSOP_URSH |
>>>= |
JSOP_BITAND |
&= |
JSOP_BITOR |
|= |
JSOP_BITXOR |
^= |
TOK_BITOR, TOK_BITXOR, TOK_BITAND,
TOK_EQOP, TOK_RELOP, TOK_SHOP,
TOK_PLUS, TOK_MINUS, TOK_STAR,
TOK_DIVOP nodes
These nodes usually have a pn_arity of PN_BINARY,
but an expression like 1 + 2 + 3 will be represented as a single
node with a pn_arity of PN_LIST.
TOK_INC, TOK_DEC nodes
The pn_op value is one of the following:
pn_op |
Operator |
|---|---|
JSOP_INCNAME, JSOP_INCPROP, JSOP_INCELEM |
prefix increment |
JSOP_DECNAME, JSOP_DECPROP, JSOP_DECELEM |
prefix decrement |
JSOP_NAMEINC, JSOP_PROPINC, JSOP_ELEMINC |
postfix increment |
JSOP_NAMEDEC, JSOP_PROPDEC, JSOP_ELEMDEC |
postfix decrement |
TOK_STRING nodes
For a node of type TOK_STRING, pn_atom contains the
text of the string, without quotes and with any escape characters already applied. For
example, the string literal "1\\2\n" is represented by a node
whose pn_atom contains four characters: 1,
\, 2, and a newline character.
TOK_OBJECT nodes
Nodes with a pn_op value of JSOP_REGEXP represent
regular expression literals. The text of the regular expression can be
obtained with the following code:
JSObject * object = ATOM_TO_OBJECT(node->pn_atom); jsval result; js_regexp_toString(context, object, 0, NULL, &result); JSString * string = JSVAL_TO_STRING(result);
TOK_INSTANCEOF nodes
These nodes have a pn_arity of PN_BINARY, with
pn_left representing the expression to the left of instanceof and
pn_right representing the expression to the right of instanceof.
TOK_IN nodes
These nodes have a pn_arity of PN_BINARY, with
pn_left representing the expression to the left of in and
pn_right representing the expression to the right of in.