Browse Source

Code generator rewrite

This is a complete rewrite of the PEG.js code generator. Its goals are:

  1. Allow optimizing the generated parser code for code size as well as
     for parsing speed.

  2. Prepare ground for future optimizations and big features (like
     incremental parsing).

  2. Replace the old template-based code-generation system with
     something more lightweight and flexible.

  4. General code cleanup (structure, style, variable names, ...).

New Architecture
----------------

The new code generator consists of two steps:

  * Bytecode generator -- produces bytecode for an abstract virtual
    machine

  * JavaScript generator -- produces JavaScript code based on the
    bytecode

The abstract virtual machine is stack-based. Originally I wanted to make
it register-based, but it turned out that all the code related to it
would be more complex and the bytecode itself would be longer (because
of explicit register specifications in instructions). The only downsides
of the stack-based approach seem to be few small inefficiencies (see
e.g. the |NIP| instruction), which seem to be insignificant.

The new generator allows optimizing for parsing speed or code size (you
can choose using the |optimize| option of the |PEG.buildParser| method
or the --optimize/-o option on the command-line).

When optimizing for size, the JavaScript generator emits the bytecode
together with its constant table and a generic bytecode interpreter.
Because the interpreter is small and the bytecode and constant table
grow only slowly with size of the grammar, the resulting parser is also
small.

When optimizing for speed, the JavaScript generator just compiles the
bytecode into JavaScript. The generated code is relatively efficient, so
the resulting parser is fast.

Internal Identifiers
--------------------

As a small bonus, all internal identifiers visible to user code in the
initializer, actions and predicates are prefixed by |peg$|. This lowers
the chance that identifiers in user code will conflict with the ones
from PEG.js. It also makes using any internals in user code ugly, which
is a good thing. This solves GH-92.

Performance
-----------

The new code generator improved parsing speed and parser code size
significantly. The generated parsers are now:

  * 39% faster when optimizing for speed

  * 69% smaller when optimizing for size (without minification)

  * 31% smaller when optimizing for size (with minification)

(Parsing speed was measured using the |benchmark/run| script. Code size
was measured by generating parsers for examples in the |examples|
directory and adding up the file sizes. Minification was done by |uglify
--ascii| in version 1.3.4.)

Final Note
----------

This is just a beginning! The new code generator lays a foundation upon
which many optimizations and improvements can (and will) be made.

Stay tuned :-)
redux
David Majda 9 years ago
parent
commit
fe1ca481ab
  1. 5
      Makefile
  2. 2
      README.md
  3. 8
      benchmark/index.css
  4. 5
      benchmark/index.html
  5. 3
      benchmark/index.js
  6. 19
      benchmark/run
  7. 19
      bin/pegjs
  8. 4
      lib/compiler.js
  9. 46
      lib/compiler/opcodes.js
  10. 4
      lib/compiler/passes.js
  11. 230
      lib/compiler/passes/allocate-registers.js
  12. 594
      lib/compiler/passes/generate-bytecode.js
  13. 867
      lib/compiler/passes/generate-code.js
  14. 950
      lib/compiler/passes/generate-javascript.js
  15. 5253
      lib/parser.js
  16. 14
      lib/utils.js
  17. 5
      package.json
  18. 309
      spec/compiler/passes/allocate-registers.spec.js
  19. 674
      spec/compiler/passes/generate-bytecode.spec.js
  20. 7
      spec/generated-parser.spec.js
  21. 2
      spec/index.html

5
Makefile

@ -8,8 +8,9 @@ PEGJS_VERSION = `cat $(VERSION_FILE)`
MODULES = utils \
grammar-error \
parser \
compiler/passes/allocate-registers \
compiler/passes/generate-code \
compiler/opcodes \
compiler/passes/generate-bytecode \
compiler/passes/generate-javascript \
compiler/passes/remove-proxy-rules \
compiler/passes/report-left-recursion \
compiler/passes/report-missing-rules \

2
README.md

@ -108,6 +108,8 @@ object to `PEG.buildParser`. The following options are supported:
* `output` — if set to `"parser"`, the method will return generated parser
object; if set to `"source"`, it will return parser source code as a string
(default: `"parser"`)
* `optimize`— selects between optimizing the generated parser for parsing
speed (`"speed"`) or code size (`"size"`) (default: `"speed"`)
Using the Parser
----------------

8
benchmark/index.css

@ -24,11 +24,11 @@ table tr.total td.parse-speed .value { font-size: 175%; }
a, a:visited { color: #3d586c; }
#options {
width: 46em;
margin: 2em auto; border-radius: .5em; -moz-border-radius: .5em; padding: .5em 1em;
text-align: center;
width: 45em;
margin: 2em auto; border-radius: .5em; -moz-border-radius: .5em; padding: .5em 1.5em;
background-color: #f0f0f0;
}
#options #run-count { width: 3em; }
#options #cache { margin-left: 2em; }
#options #run { width: 5em; margin-left: 2em; }
#options label[for=optimize] { margin-left: 2em; }
#options #run { float:right; width: 5em; }

5
benchmark/index.html

@ -13,6 +13,11 @@
<input type="text" id="run-count" value="10"> times
<input type="checkbox" id="cache">
<label for="cache">Use results cache</label>
<label for="optimize">Optimize:</label>
<select id="optimize">
<option value="speed">Speed</option>
<option value="size">Size</option>
</select>
<input type="button" id="run" value="Run">
</div>

3
benchmark/index.js

@ -63,7 +63,8 @@ $("#run").click(function() {
}
var options = {
cache: $("#cache").is(":checked"),
cache: $("#cache").is(":checked"),
optimize: $("#optimize").val()
};
Runner.run(benchmarks, runCount, options, {

19
benchmark/run

@ -81,6 +81,8 @@ function printHelp() {
util.puts("Options:");
util.puts(" -n, --run-count <n> number of runs (default: 10)");
util.puts(" --cache make tested parsers cache results");
util.puts(" -o, --optimize <goal> select optimization for speed or size (default:");
util.puts(" speed)");
}
function exitSuccess() {
@ -111,7 +113,10 @@ function nextArg() {
/* Main */
var runCount = 10;
var options = { };
var options = {
cache: false,
optimize: "speed"
};
while (args.length > 0 && isOption(args[0])) {
switch (args[0]) {
@ -131,6 +136,18 @@ while (args.length > 0 && isOption(args[0])) {
options.cache = true;
break;
case "-o":
case "--optimize":
nextArg();
if (args.length === 0) {
abort("Missing parameter of the -o/--optimize option.");
}
if (args[0] !== "speed" && args[0] !== "size") {
abort("Optimization goal must be either \"speed\" or \"size\".");
}
options.optimize = args[0];
break;
case "-h":
case "--help":
printHelp();

19
bin/pegjs

@ -29,6 +29,8 @@ function printHelp() {
util.puts(" parser will be allowed to start parsing");
util.puts(" from (default: the first rule in the");
util.puts(" grammar)");
util.puts(" -o, --optimize <goal> select optimization for speed or size (default:");
util.puts(" speed)");
util.puts(" -v, --version print version information and exit");
util.puts(" -h, --help print help and exit");
}
@ -71,8 +73,9 @@ function readStream(inputStream, callback) {
/* This makes the generated parser a CommonJS module by default. */
var exportVar = "module.exports";
var options = {
cache: false,
output: "source"
cache: false,
output: "source",
optimize: "speed"
};
while (args.length > 0 && isOption(args[0])) {
@ -100,6 +103,18 @@ while (args.length > 0 && isOption(args[0])) {
.map(function(s) { return s.trim() });
break;
case "-o":
case "--optimize":
nextArg();
if (args.length === 0) {
abort("Missing parameter of the -o/--optimize option.");
}
if (args[0] !== "speed" && args[0] !== "size") {
abort("Optimization goal must be either \"speed\" or \"size\".");
}
options.optimize = args[0];
break;
case "-v":
case "--version":
printVersion();

4
lib/compiler.js

@ -11,8 +11,8 @@ module.exports = {
"reportMissingRules",
"reportLeftRecursion",
"removeProxyRules",
"allocateRegisters",
"generateCode"
"generateBytecode",
"generateJavascript"
],
/*

46
lib/compiler/opcodes.js

@ -0,0 +1,46 @@
/* Bytecode instruction opcodes. */
module.exports = {
/* Stack Manipulation */
PUSH: 0, // PUSH c
PUSH_CURR_POS: 1, // PUSH_CURR_POS
POP: 2, // POP
POP_CURR_POS: 3, // POP_CURR_POS
POP_N: 4, // POP_N n
NIP: 5, // NIP
NIP_CURR_POS: 6, // NIP_CURR_POS
APPEND: 7, // APPEND
WRAP: 8, // WRAP n
TEXT: 9, // TEXT
/* Conditions and Loops */
IF: 10, // IF t, f
IF_ERROR: 11, // IF_ERROR t, f
IF_NOT_ERROR: 12, // IF_NOT_ERROR t, f
WHILE_NOT_ERROR: 13, // WHILE_NOT_ERROR b
/* Matching */
MATCH_ANY: 14, // MATCH_ANY a, f, ...
MATCH_STRING: 15, // MATCH_STRING s, a, f, ...
MATCH_STRING_IC: 16, // MATCH_STRING_IC s, a, f, ...
MATCH_REGEXP: 17, // MATCH_REGEXP r, a, f, ...
ACCEPT_N: 18, // ACCEPT_N n
ACCEPT_STRING: 19, // ACCEPT_STRING s
FAIL: 20, // FAIL e
/* Calls */
REPORT_SAVED_POS: 21, // REPORT_SAVED_POS p
REPORT_CURR_POS: 22, // REPORT_CURR_POS
CALL: 23, // CALL f, n, pc, p1, p2, ..., pN
/* Rules */
RULE: 24, // RULE r
/* Failure Reporting */
SILENT_FAILS_ON: 25, // SILENT_FAILS_ON
SILENT_FAILS_OFF: 26 // SILENT_FAILS_FF
};

4
lib/compiler/passes.js

@ -9,6 +9,6 @@ module.exports = {
reportMissingRules: require("./passes/report-missing-rules"),
reportLeftRecursion: require("./passes/report-left-recursion"),
removeProxyRules: require("./passes/remove-proxy-rules"),
allocateRegisters: require("./passes/allocate-registers"),
generateCode: require("./passes/generate-code")
generateBytecode: require("./passes/generate-bytecode"),
generateJavascript: require("./passes/generate-javascript")
};

230
lib/compiler/passes/allocate-registers.js

@ -1,230 +0,0 @@
var utils = require("../../utils");
/*
* Allocates registers that the generated code for each node will use to store
* match results and parse positions. For "action", "semantic_and" and
* "semantic_or" nodes it also computes visibility of labels at the point of
* action/predicate code execution and a mapping from label names to registers
* that will contain the labeled values.
*
* The following will hold after running this pass:
*
* * All nodes except "grammar" and "rule" nodes will have a |resultIndex|
* property. It will contain an index of a register that will store a match
* result of the expression represented by the node in generated code.
*
* * Some nodes will have a |posIndex| property. It will contain an index of a
* register that will store a saved parse position in generated code.
*
* * All "rule" nodes will contain a |registerCount| property. It will contain
* the number of registers that will be used by code generated for the
* rule's expression.
*
* * All "action", "semantic_and" and "semantic_or" nodes will have a |params|
* property. It will contain a mapping from names of labels visible at the
* point of action/predicate code execution to registers that will contain
* the labeled values.
*/
module.exports = function(ast) {
/*
* Register allocator that allocates registers from an unlimited
* integer-indexed pool. It allows allocating and releaseing registers in any
* order. It also supports reference counting (this simplifies tracking active
* registers when they store values passed to action/predicate code).
* Allocating a register allways uses the first free register (the one with
* the lowest index).
*/
var registers = (function() {
var refCounts = []; // reference count for each register that was
// allocated at least once
return {
alloc: function() {
var i;
for (i = 0; i < refCounts.length; i++) {
if (refCounts[i] === 0) {
refCounts[i] = 1;
return i;
}
}
refCounts.push(1);
return refCounts.length - 1;
},
use: function(index) {
refCounts[index]++;
},
release: function(index) {
refCounts[index]--;
},
maxIndex: function() {
return refCounts.length - 1;
},
reset: function() {
refCounts = [];
}
};
})();
/*
* Manages mapping of label names to indices of registers that will store the
* labeled values as long as they are in scope.
*/
var vars = (function(registers) {
var envs = []; // stack of nested environments
return {
beginScope: function() {
envs.push({});
},
endScope: function() {
var env = envs.pop(), name;
for (name in env) {
registers.release(env[name]);
}
},
add: function(name, index) {
envs[envs.length - 1][name] = index;
registers.use(index);
},
buildParams: function() {
var env = envs[envs.length - 1], params = {}, name;
for (name in env) {
params[name] = env[name];
}
return params;
}
};
})(registers);
function savePos(node, f) {
node.posIndex = registers.alloc();
f();
registers.release(node.posIndex);
}
function reuseResult(node, subnode) {
subnode.resultIndex = node.resultIndex;
}
function allocResult(node, f) {
node.resultIndex = registers.alloc();
f();
registers.release(node.resultIndex);
}
function scoped(f) {
vars.beginScope();
f();
vars.endScope();
}
function nop() {}
function computeExpressionScoped(node) {
scoped(function() { compute(node.expression); });
}
function computeExpressionScopedReuseResult(node) {
reuseResult(node, node.expression);
computeExpressionScoped(node);
}
function computeExpressionScopedAllocResult(node) {
allocResult(node.expression, function() { computeExpressionScoped(node); });
}
function computeExpressionScopedReuseResultSavePos(node) {
savePos(node, function() { computeExpressionScopedReuseResult(node); });
}
function computeParams(node) {
node.params = vars.buildParams();
}
var compute = utils.buildNodeVisitor({
grammar:
function(node) {
utils.each(node.rules, compute);
},
rule:
function(node) {
registers.reset();
computeExpressionScopedAllocResult(node);
node.registerCount = registers.maxIndex() + 1;
},
named:
function(node) {
reuseResult(node, node.expression);
compute(node.expression);
},
choice:
function(node) {
utils.each(node.alternatives, function(alternative) {
reuseResult(node, alternative);
scoped(function() {
compute(alternative);
});
});
},
action:
function(node) {
savePos(node, function() {
reuseResult(node, node.expression);
scoped(function() {
compute(node.expression);
computeParams(node);
});
});
},
sequence:
function(node) {
savePos(node, function() {
utils.each(node.elements, function(element) {
element.resultIndex = registers.alloc();
compute(element);
});
utils.each(node.elements, function(element) {
registers.release(element.resultIndex);
});
});
},
labeled:
function(node) {
vars.add(node.label, node.resultIndex);
computeExpressionScopedReuseResult(node);
},
text: computeExpressionScopedReuseResultSavePos,
simple_and: computeExpressionScopedReuseResultSavePos,
simple_not: computeExpressionScopedReuseResultSavePos,
semantic_and: computeParams,
semantic_not: computeParams,
optional: computeExpressionScopedReuseResult,
zero_or_more: computeExpressionScopedAllocResult,
one_or_more: computeExpressionScopedAllocResult,
rule_ref: nop,
literal: nop,
"class": nop,
any: nop
});
compute(ast);
};

594
lib/compiler/passes/generate-bytecode.js

@ -0,0 +1,594 @@
var utils = require("../../utils"),
op = require("../opcodes");
/* Generates bytecode.
*
* Instructions
* ============
*
* Stack Manipulation
* ------------------
*
* [0] PUSH c
*
* stack.push(consts[c]);
*
* [1] PUSH_CURR_POS
*
* stack.push(currPos);
*
* [2] POP
*
* stack.pop();
*
* [3] POP_CURR_POS
*
* currPos = stack.pop();
*
* [4] POP_N n
*
* stack.pop(n);
*
* [5] NIP
*
* value = stack.pop();
* stack.pop();
* stack.push(value);
*
* [6] NIP_CURR_POS
*
* value = stack.pop();
* currPos = stack.pop();
* stack.push(value);
*
* [8] APPEND
*
* value = stack.pop();
* array = stack.pop();
* array.push(value);
* stack.push(array);
*
* [9] WRAP n
*
* stack.push(stack.pop(n));
*
* [10] TEXT
*
* stack.pop();
* stack.push(input.substring(stack.top(), currPos));
*
* Conditions and Loops
* --------------------
*
* [11] IF t, f
*
* if (stack.top()) {
* interpret(ip + 3, ip + 3 + t);
* } else {
* interpret(ip + 3 + t, ip + 3 + t + f);
* }
*
* [12] IF_ERROR t, f
*
* if (stack.top() === null) {
* interpret(ip + 3, ip + 3 + t);
* } else {
* interpret(ip + 3 + t, ip + 3 + t + f);
* }
*
* [13] IF_NOT_ERROR t, f
*
* if (stack.top() !== null) {
* interpret(ip + 3, ip + 3 + t);
* } else {
* interpret(ip + 3 + t, ip + 3 + t + f);
* }
*
* [14] WHILE_NOT_ERROR b
*
* while(stack.top() !== null) {
* interpret(ip + 2, ip + 2 + b);
* }
*
* Matching
* --------
*
* [15] MATCH_ANY a, f, ...
*
* if (input.length > currPos) {
* interpret(ip + 3, ip + 3 + a);
* } else {
* interpret(ip + 3 + a, ip + 3 + a + f);
* }
*
* [16] MATCH_STRING s, a, f, ...
*
* if (input.substr(currPos, consts[s].length) === consts[s]) {
* interpret(ip + 4, ip + 4 + a);
* } else {
* interpret(ip + 4 + a, ip + 4 + a + f);
* }
*
* [17] MATCH_STRING_IC s, a, f, ...
*
* if (input.substr(currPos, consts[s].length).toLowerCase() === consts[s]) {
* interpret(ip + 4, ip + 4 + a);
* } else {
* interpret(ip + 4 + a, ip + 4 + a + f);
* }
*
* [18] MATCH_REGEXP r, a, f, ...
*
* if (consts[r].test(input.charAt(currPos))) {
* interpret(ip + 4, ip + 4 + a);
* } else {
* interpret(ip + 4 + a, ip + 4 + a + f);
* }
*
* [19] ACCEPT_N n
*
* stack.push(input.substring(currPos, n));
* currPos += n;
*
* [20] ACCEPT_STRING s
*
* stack.push(consts[s]);
* currPos += consts[s].length;
*
* [21] FAIL e
*
* stack.push(null);
* fail(consts[e]);
*
* Calls
* -----
*
* [22] REPORT_SAVED_POS p
*
* reportedPos = stack[p];
*
* [23] REPORT_CURR_POS
*
* reportedPos = currPos;
*
* [25] CALL f, n, pc, p1, p2, ..., pN
*
* value = consts[f](stack[p1], ..., stack[pN]);
* stack.pop(n);
* stack.push(value);
*
* Rules
* -----
*
* [26] RULE r
*
* stack.push(parseRule(r));
*
* Failure Reporting
* -----------------
*
* [27] SILENT_FAILS_ON
*
* silentFails++;
*
* [28] SILENT_FAILS_OFF
*
* silentFails--;
*/
module.exports = function(ast, options) {
var consts = [];
function addConst(value) {
var index = utils.indexOf(consts, function(c) { return c === value; });
return index === -1 ? consts.push(value) - 1 : index;
}
function addFunctionConst(params, code) {
return addConst(
"function(" + params.join(", ") + ") {" + code + "}"
);
}
function buildSequence() {
return Array.prototype.concat.apply([], arguments);
}
function buildCondition(condCode, thenCode, elseCode) {
return condCode.concat(
[thenCode.length, elseCode.length],
thenCode,
elseCode
);
}
function buildLoop(condCode, bodyCode) {
return condCode.concat([bodyCode.length], bodyCode);
}
function buildCall(functionIndex, delta, env, sp) {
var params = utils.map( utils.values(env), function(p) { return sp - p; });
return [op.CALL, functionIndex, delta, params.length].concat(params);
}
function buildSimplePredicate(expression, negative, context) {
var emptyStringIndex = addConst('""'),
nullIndex = addConst('null');
return buildSequence(
[op.PUSH_CURR_POS],
[op.SILENT_FAILS_ON],
generate(expression, {
sp: context.sp + 1,
env: { },
action: null
}),
[op.SILENT_FAILS_OFF],
buildCondition(
[negative ? op.IF_ERROR : op.IF_NOT_ERROR],
buildSequence(
[op.POP],
[negative ? op.POP : op.POP_CURR_POS],
[op.PUSH, emptyStringIndex]
),
buildSequence(
[op.POP],
[negative ? op.POP_CURR_POS : op.POP],
[op.PUSH, nullIndex]
)
)
);
}
function buildSemanticPredicate(code, negative, context) {
var functionIndex = addFunctionConst(utils.keys(context.env), code),
emptyStringIndex = addConst('""'),
nullIndex = addConst('null');
return buildSequence(
[op.REPORT_CURR_POS],
buildCall(functionIndex, 0, context.env, context.sp),
buildCondition(
[op.IF],
buildSequence(
[op.POP],
[op.PUSH, negative ? nullIndex : emptyStringIndex]
),
buildSequence(
[op.POP],
[op.PUSH, negative ? emptyStringIndex : nullIndex]
)
)
);
}
function buildAppendLoop(expressionCode) {
return buildLoop(
[op.WHILE_NOT_ERROR],
buildSequence([op.APPEND], expressionCode)
);
}
var generate = utils.buildNodeVisitor({
grammar: function(node) {
utils.each(node.rules, generate);
node.consts = consts;
},
rule: function(node) {
node.bytecode = generate(node.expression, {
sp: -1, // stack pointer
env: { }, // mapping of label names to stack positions
action: null // action nodes pass themselves to children here
});
},
named: function(node, context) {
var nameIndex = addConst(utils.quote(node.name));
/*
* The code generated below is slightly suboptimal because |FAIL| pushes
* to the stack, so we need to stick a |POP| in front of it. We lack a
* dedicated instruction that would just report the failure and not touch
* the stack.
*/
return buildSequence(
[op.SILENT_FAILS_ON],
generate(node.expression, context),
[op.SILENT_FAILS_OFF],
buildCondition([op.IF_ERROR], [op.FAIL, nameIndex], [])
);
},
choice: function(node, context) {
function buildAlternativesCode(alternatives, context) {
return buildSequence(
generate(alternatives[0], {
sp: context.sp,
env: { },
action: null
}),
alternatives.length > 1
? buildCondition(
[op.IF_ERROR],
buildSequence(
[op.POP],
buildAlternativesCode(alternatives.slice(1), context)
),
[]
)
: []
);
}
return buildAlternativesCode(node.alternatives, context);
},
action: function(node, context) {
var env = { },
emitCall = node.expression.type !== "sequence"
|| node.expression.elements.length === 0;
expressionCode = generate(node.expression, {
sp: context.sp + (emitCall ? 1 : 0),
env: env,
action: node
}),
functionIndex = addFunctionConst(utils.keys(env), node.code);
return emitCall
? buildSequence(
[op.PUSH_CURR_POS],
expressionCode,
buildCondition(
[op.IF_NOT_ERROR],
buildSequence(
[op.REPORT_SAVED_POS, 1],
buildCall(functionIndex, 1, env, context.sp + 2)
),
[]
),
buildCondition([op.IF_ERROR], [op.NIP_CURR_POS], [op.NIP])
)
: expressionCode;
},
sequence: function(node, context) {
var emptyArrayIndex, nullIndex;
function buildElementsCode(elements, context) {
var processedCount, functionIndex;
if (elements.length > 0) {
processedCount = node.elements.length - elements.slice(1).length;
return buildSequence(
generate(elements[0], context),
buildCondition(
[op.IF_NOT_ERROR],
buildElementsCode(elements.slice(1), {
sp: context.sp + 1,
env: context.env,
action: context.action
}),
buildSequence(
processedCount > 1 ? [op.POP_N, processedCount] : [op.POP],
[op.POP_CURR_POS],
[op.PUSH, nullIndex]
)
)
);
} else {
if (context.action) {
functionIndex = addFunctionConst(
utils.keys(context.env),
context.action.code
);
return buildSequence(
[op.REPORT_SAVED_POS, node.elements.length],
buildCall(
functionIndex,
node.elements.length,
context.env,
context.sp
),
buildCondition([op.IF_ERROR], [op.NIP_CURR_POS], [op.NIP])
);
} else {
return buildSequence([op.WRAP, node.elements.length], [op.NIP]);
}
}
}
if (node.elements.length > 0) {
nullIndex = addConst('null');
return buildSequence(
[op.PUSH_CURR_POS],
buildElementsCode(node.elements, {
sp: context.sp + 1,
env: context.env,
action: context.action
})
);
} else {
emptyArrayIndex = addConst('[]');
return [op.PUSH, emptyArrayIndex];
}
},
labeled: function(node, context) {
context.env[node.label] = context.sp + 1;
return generate(node.expression, {
sp: context.sp,
env: { },
action: null
});
},
text: function(node, context) {
return buildSequence(
[op.PUSH_CURR_POS],
generate(node.expression, {
sp: context.sp + 1,
env: { },
action: null
}),
buildCondition([op.IF_NOT_ERROR], [op.TEXT], []),
[op.NIP]
);
},
simple_and: function(node, context) {
return buildSimplePredicate(node.expression, false, context);
},
simple_not: function(node, context) {
return buildSimplePredicate(node.expression, true, context);
},
semantic_and: function(node, context) {
return buildSemanticPredicate(node.code, false, context);
},
semantic_not: function(node, context) {
return buildSemanticPredicate(node.code, true, context);
},
optional: function(node, context) {
var emptyStringIndex = addConst('""');
return buildSequence(
generate(node.expression, {
sp: context.sp,
env: { },
action: null
}),
buildCondition(
[op.IF_ERROR],
buildSequence([op.POP], [op.PUSH, emptyStringIndex]),
[]
)
);
},
zero_or_more: function(node, context) {
var emptyArrayIndex = addConst('[]');
expressionCode = generate(node.expression, {
sp: context.sp + 1,
env: { },
action: null
});
return buildSequence(
[op.PUSH, emptyArrayIndex],
expressionCode,
buildAppendLoop(expressionCode),
[op.POP]
);
},
one_or_more: function(node, context) {
var emptyArrayIndex = addConst('[]');
nullIndex = addConst('null');
expressionCode = generate(node.expression, {
sp: context.sp + 1,
env: { },
action: null
});
return buildSequence(
[op.PUSH, emptyArrayIndex],
expressionCode,
buildCondition(
[op.IF_NOT_ERROR],
buildSequence(buildAppendLoop(expressionCode), [op.POP]),
buildSequence([op.POP], [op.POP], [op.PUSH, nullIndex])
)
);
},
rule_ref: function(node) {
return [op.RULE, utils.indexOfRuleByName(ast, node.name)];
},
literal: function(node) {
var stringIndex, expectedIndex;
if (node.value.length > 0) {
stringIndex = addConst(node.ignoreCase
? utils.quote(node.value.toLowerCase())
: utils.quote(node.value)
);
expectedIndex = addConst(utils.quote(utils.quote(node.value)));
/*
* For case-sensitive strings the value must match the beginning of the
* remaining input exactly. As a result, we can use |ACCEPT_STRING| and
* save one |substr| call that would be needed if we used |ACCEPT_N|.
*/
return buildCondition(
node.ignoreCase
? [op.MATCH_STRING_IC, stringIndex]
: [op.MATCH_STRING, stringIndex],
node.ignoreCase
? [op.ACCEPT_N, node.value.length]
: [op.ACCEPT_STRING, stringIndex],
[op.FAIL, expectedIndex]
);
} else {
stringIndex = addConst('""');
return [op.PUSH, stringIndex];
}
},
"class": function(node) {
var regexp, regexpIndex, expectedIndex;
if (node.parts.length > 0) {
regexp = '/^['
+ (node.inverted ? '^' : '')
+ utils.map(node.parts, function(part) {
return part instanceof Array
? utils.quoteForRegexpClass(part[0])
+ '-'
+ utils.quoteForRegexpClass(part[1])
: utils.quoteForRegexpClass(part);
}).join('')
+ ']/' + (node.ignoreCase ? 'i' : '');
} else {
/*
* IE considers regexps /[]/ and /[^]/ as syntactically invalid, so we
* translate them into euqivalents it can handle.
*/
regexp = node.inverted ? '/^[\\S\\s]/' : '/^(?!)/';
}
regexpIndex = addConst(regexp);
expectedIndex = addConst(utils.quote(node.rawText));
return buildCondition(
[op.MATCH_REGEXP, regexpIndex],
[op.ACCEPT_N, 1],
[op.FAIL, expectedIndex]
);
},
any: function(node) {
var expectedIndex = addConst(utils.quote("any character"));
return buildCondition(
[op.MATCH_ANY],
[op.ACCEPT_N, 1],
[op.FAIL, expectedIndex]
);
}
});
generate(ast);
};

867
lib/compiler/passes/generate-code.js

@ -1,867 +0,0 @@
var utils = require("../../utils");
/* Generates the parser code. */
module.exports = function(ast, options) {
options = utils.clone(options);
utils.defaults(options, {
cache: false,
allowedStartRules: [ast.startRule]
});
/*
* Codie 1.1.0
*
* https://github.com/dmajda/codie
*
* Copyright (c) 2011-2012 David Majda
* Licensend under the MIT license.
*/
var Codie = (function(undefined) {
function stringEscape(s) {
function hex(ch) { return ch.charCodeAt(0).toString(16).toUpperCase(); }
/*
* ECMA-262, 5th ed., 7.8.4: All characters may appear literally in a
* string literal except for the closing quote character, backslash,
* carriage return, line separator, paragraph separator, and line feed.
* Any character may appear in the form of an escape sequence.
*
* For portability, we also escape escape all control and non-ASCII
* characters. Note that "\0" and "\v" escape sequences are not used
* because JSHint does not like the first and IE the second.
*/
return s
.replace(/\\/g, '\\\\') // backslash
.replace(/"/g, '\\"') // closing double quote
.replace(/\x08/g, '\\b') // backspace
.replace(/\t/g, '\\t') // horizontal tab
.replace(/\n/g, '\\n') // line feed
.replace(/\f/g, '\\f') // form feed
.replace(/\r/g, '\\r') // carriage return
.replace(/[\x00-\x07\x0B\x0E\x0F]/g, function(ch) { return '\\x0' + hex(ch); })
.replace(/[\x10-\x1F\x80-\xFF]/g, function(ch) { return '\\x' + hex(ch); })
.replace(/[\u0180-\u0FFF]/g, function(ch) { return '\\u0' + hex(ch); })
.replace(/[\u1080-\uFFFF]/g, function(ch) { return '\\u' + hex(ch); });
}
function push(s) { return '__p.push(' + s + ');'; }
function pushRaw(template, length, state) {
function unindent(code, level, unindentFirst) {
return code.replace(
new RegExp('^.{' + level +'}', "gm"),
function(str, offset) {
if (offset === 0) {
return unindentFirst ? '' : str;
} else {
return "";
}
}
);
}
var escaped = stringEscape(unindent(
template.substring(0, length),
state.indentLevel(),
state.atBOL
));
return escaped.length > 0 ? push('"' + escaped + '"') : '';
}
var Codie = {
/* Codie version (uses semantic versioning). */
VERSION: "1.1.0",
/*
* Specifies by how many characters do #if/#else and #for unindent their
* content in the generated code.
*/
indentStep: 2,
/* Description of #-commands. Extend to define your own commands. */
commands: {
"if": {
params: /^(.*)$/,
compile: function(state, prefix, params) {
return ['if(' + params[0] + '){', []];
},
stackOp: "push"
},
"else": {
params: /^$/,
compile: function(state) {
var stack = state.commandStack,
insideElse = stack[stack.length - 1] === "else",
insideIf = stack[stack.length - 1] === "if";
if (insideElse) { throw new Error("Multiple #elses."); }
if (!insideIf) { throw new Error("Using #else outside of #if."); }
return ['}else{', []];
},
stackOp: "replace"
},
"for": {
params: /^([a-zA-Z_][a-zA-Z0-9_]*)[ \t]+in[ \t]+(.*)$/,
init: function(state) {
state.forCurrLevel = 0; // current level of #for loop nesting
state.forMaxLevel = 0; // maximum level of #for loop nesting
},
compile: function(state, prefix, params) {
var c = '__c' + state.forCurrLevel, // __c for "collection"
l = '__l' + state.forCurrLevel, // __l for "length"
i = '__i' + state.forCurrLevel; // __i for "index"
state.forCurrLevel++;
if (state.forMaxLevel < state.forCurrLevel) {
state.forMaxLevel = state.forCurrLevel;
}
return [
c + '=' + params[1] + ';'
+ l + '=' + c + '.length;'
+ 'for(' + i + '=0;' + i + '<' + l + ';' + i + '++){'
+ params[0] + '=' + c + '[' + i + '];',
[params[0], c, l, i]
];
},
exit: function(state) { state.forCurrLevel--; },
stackOp: "push"
},
"end": {
params: /^$/,
compile: function(state) {
var stack = state.commandStack, exit;
if (stack.length === 0) { throw new Error("Too many #ends."); }
exit = Codie.commands[stack[stack.length - 1]].exit;
if (exit) { exit(state); }
return ['}', []];
},
stackOp: "pop"
},
"block": {
params: /^(.*)$/,
compile: function(state, prefix, params) {
var x = '__x', // __x for "prefix",
n = '__n', // __n for "lines"
l = '__l', // __l for "length"
i = '__i'; // __i for "index"
/*
* Originally, the generated code used |String.prototype.replace|, but
* it is buggy in certain versions of V8 so it was rewritten. See the
* tests for details.
*/
return [
x + '="' + stringEscape(prefix.substring(state.indentLevel())) + '";'
+ n + '=(' + params[0] + ').toString().split("\\n");'
+ l + '=' + n + '.length;'
+ 'for(' + i + '=0;' + i + '<' + l + ';' + i + '++){'
+ n + '[' + i +']=' + x + '+' + n + '[' + i + ']+"\\n";'
+ '}'
+ push(n + '.join("")'),
[x, n, l, i]
];
},
stackOp: "nop"
}
},
/*
* Compiles a template into a function. When called, this function will
* execute the template in the context of an object passed in a parameter and
* return the result.
*/
template: function(template) {
var stackOps = {
push: function(stack, name) { stack.push(name); },
replace: function(stack, name) { stack[stack.length - 1] = name; },
pop: function(stack) { stack.pop(); },
nop: function() { }
};
function compileExpr(state, expr) {
state.atBOL = false;
return [push(expr), []];
}
function compileCommand(state, prefix, name, params) {
var command, match, result;
command = Codie.commands[name];
if (!command) { throw new Error("Unknown command: #" + name + "."); }
match = command.params.exec(params);
if (match === null) {
throw new Error(
"Invalid params for command #" + name + ": " + params + "."
);
}
result = command.compile(state, prefix, match.slice(1));
stackOps[command.stackOp](state.commandStack, name);
state.atBOL = true;
return result;
}
var state = { // compilation state
commandStack: [], // stack of commands as they were nested
atBOL: true, // is the next character to process at BOL?
indentLevel: function() {
return Codie.indentStep * this.commandStack.length;
}
},
code = '', // generated template function code
vars = ['__p=[]'], // variables used by generated code
name, match, result, i;
/* Initialize state. */
for (name in Codie.commands) {
if (Codie.commands[name].init) { Codie.commands[name].init(state); }
}
/* Compile the template. */
while ((match = /^([ \t]*)#([a-zA-Z_][a-zA-Z0-9_]*)(?:[ \t]+([^ \t\n][^\n]*))?[ \t]*(?:\n|$)|#\{([^}]*)\}/m.exec(template)) !== null) {
code += pushRaw(template, match.index, state);
result = match[2] !== undefined && match[2] !== ""
? compileCommand(state, match[1], match[2], match[3] || "") // #-command
: compileExpr(state, match[4]); // #{...}
code += result[0];
vars = vars.concat(result[1]);
template = template.substring(match.index + match[0].length);
}
code += pushRaw(template, template.length, state);
/* Check the final state. */
if (state.commandStack.length > 0) { throw new Error("Missing #end."); }
/* Sanitize the list of variables used by commands. */
vars.sort();
for (i = 0; i < vars.length; i++) {
if (vars[i] === vars[i - 1]) { vars.splice(i--, 1); }
}
/* Create the resulting function. */
return new Function("__v", [
'__v=__v||{};',
'var ' + vars.join(',') + ';',
'with(__v){',
code,
'return __p.join("").replace(/^\\n+|\\n+$/g,"");};'
].join(''));
}
};
return Codie;
})();
var templates = (function() {
var name,
templates = {},
sources = {
grammar: [
'(function(){',
' /*',
' * Generated by PEG.js 0.7.0.',
' *',
' * http://pegjs.majda.cz/',
' */',
' ',
/* This needs to be in sync with |subclass| in utils.js. */
' function subclass(child, parent) {',
' function ctor() { this.constructor = child; }',
' ctor.prototype = parent.prototype;',
' child.prototype = new ctor();',
' }',
' ',
/* This needs to be in sync with |quote| in utils.js. */
' function quote(s) {',
' /*',
' * ECMA-262, 5th ed., 7.8.4: All characters may appear literally in a',
' * string literal except for the closing quote character, backslash,',
' * carriage return, line separator, paragraph separator, and line feed.',
' * Any character may appear in the form of an escape sequence.',
' *',
' * For portability, we also escape escape all control and non-ASCII',
' * characters. Note that "\\0" and "\\v" escape sequences are not used',
' * because JSHint does not like the first and IE the second.',
' */',
' return \'"\' + s',
' .replace(/\\\\/g, \'\\\\\\\\\') // backslash',
' .replace(/"/g, \'\\\\"\') // closing quote character',
' .replace(/\\x08/g, \'\\\\b\') // backspace',
' .replace(/\\t/g, \'\\\\t\') // horizontal tab',
' .replace(/\\n/g, \'\\\\n\') // line feed',
' .replace(/\\f/g, \'\\\\f\') // form feed',
' .replace(/\\r/g, \'\\\\r\') // carriage return',
' .replace(/[\\x00-\\x07\\x0B\\x0E-\\x1F\\x80-\\uFFFF]/g, escape)',
' + \'"\';',
' }',
' ',
' var result = {',
' /*',
' * Parses the input with a generated parser. If the parsing is successful,',
' * returns a value explicitly or implicitly specified by the grammar from',
' * which the parser was generated (see |PEG.buildParser|). If the parsing is',
' * unsuccessful, throws |PEG.parser.SyntaxError| describing the error.',
' */',
' parse: function(input) {',
' var parseFunctions = {',
' #for rule in options.allowedStartRules',
' #{string(rule) + ": parse_" + rule + (rule !== options.allowedStartRules[options.allowedStartRules.length - 1] ? "," : "")}',
' #end',
' };',
' ',
' var options = arguments.length > 1 ? arguments[1] : {},',
' startRule;',
' ',
' if (options.startRule !== undefined) {',
' startRule = options.startRule;',
' ',
' if (parseFunctions[startRule] === undefined) {',
' throw new Error("Can\'t start parsing from rule " + quote(startRule) + ".");',
' }',
' } else {',
' startRule = #{string(options.allowedStartRules[0])};',
' }',
' ',
' var pos = 0;',
' var reportedPos = 0;',
' var cachedReportedPos = 0;',
' var cachedReportedPosDetails = { line: 1, column: 1, seenCR: false };',
' var reportFailures = 0;', // 0 = report, anything > 0 = do not report
' var rightmostFailuresPos = 0;',
' var rightmostFailuresExpected = [];',
' #if options.cache',
' var cache = {};',
' #end',
' ',
/* This needs to be in sync with |padLeft| in utils.js. */
' function padLeft(input, padding, length) {',
' var result = input;',
' ',
' var padLength = length - input.length;',
' for (var i = 0; i < padLength; i++) {',
' result = padding + result;',
' }',
' ',
' return result;',
' }',
' ',
/* This needs to be in sync with |escape| in utils.js. */
' function escape(ch) {',
' var charCode = ch.charCodeAt(0);',
' var escapeChar;',
' var length;',
' ',
' if (charCode <= 0xFF) {',
' escapeChar = \'x\';',
' length = 2;',
' } else {',
' escapeChar = \'u\';',
' length = 4;',
' }',
' ',
' return \'\\\\\' + escapeChar + padLeft(charCode.toString(16).toUpperCase(), \'0\', length);',
' }',
' ',
' function computeReportedPosDetails() {',
' function advanceCachedReportedPos() {',
' var ch;',
' ',
' for (; cachedReportedPos < reportedPos; cachedReportedPos++) {',
' ch = input.charAt(cachedReportedPos);',
' if (ch === "\\n") {',
' if (!cachedReportedPosDetails.seenCR) { cachedReportedPosDetails.line++; }',
' cachedReportedPosDetails.column = 1;',
' cachedReportedPosDetails.seenCR = false;',
' } else if (ch === "\\r" || ch === "\\u2028" || ch === "\\u2029") {',
' cachedReportedPosDetails.line++;',
' cachedReportedPosDetails.column = 1;',
' cachedReportedPosDetails.seenCR = true;',
' } else {',
' cachedReportedPosDetails.column++;',
' cachedReportedPosDetails.seenCR = false;',
' }',
' }',
' }',
' ',
' if (cachedReportedPos !== reportedPos) {',
' if (cachedReportedPos > reportedPos) {',
' cachedReportedPos = 0;',
' cachedReportedPosDetails = { line: 1, column: 1, seenCR: false };',
' }',
' advanceCachedReportedPos();',
' }',
' ',
' return cachedReportedPosDetails;',
' }',
' ',
' function text() {',
' return input.substring(reportedPos, pos);',
' }',
' ',
' function offset() {',
' return reportedPos;',
' }',
' ',
' function line() {',
' return computeReportedPosDetails().line;',
' }',
' ',
' function column() {',
' return computeReportedPosDetails().column;',
' }',
' ',
' function matchFailed(failure) {',
' if (pos < rightmostFailuresPos) {',
' return;',
' }',
' ',
' if (pos > rightmostFailuresPos) {',
' rightmostFailuresPos = pos;',
' rightmostFailuresExpected = [];',
' }',
' ',
' rightmostFailuresExpected.push(failure);',
' }',
' ',
' #for rule in node.rules',
' #block emit(rule)',
' ',
' #end',
' ',
' function cleanupExpected(expected) {',
' expected.sort();',
' ',
' var lastExpected = null;',
' var cleanExpected = [];',
' for (var i = 0; i < expected.length; i++) {',
' if (expected[i] !== lastExpected) {',
' cleanExpected.push(expected[i]);',
' lastExpected = expected[i];',
' }',
' }',
' return cleanExpected;',
' }',
' ',
' #if node.initializer',
' #block emit(node.initializer)',
' #end',
' ',
' var result = parseFunctions[startRule]();',
' ',
' /*',
' * The parser is now in one of the following three states:',
' *',
' * 1. The parser successfully parsed the whole input.',
' *',
' * - |result !== null|',
' * - |pos === input.length|',
' * - |rightmostFailuresExpected| may or may not contain something',
' *',
' * 2. The parser successfully parsed only a part of the input.',
' *',
' * - |result !== null|',
' * - |pos < input.length|',
' * - |rightmostFailuresExpected| may or may not contain something',
' *',
' * 3. The parser did not successfully parse any part of the input.',
' *',
' * - |result === null|',
' * - |pos === 0|',
' * - |rightmostFailuresExpected| contains at least one failure',
' *',
' * All code following this comment (including called functions) must',
' * handle these states.',
' */',
' if (result === null || pos !== input.length) {',
' reportedPos = Math.max(pos, rightmostFailuresPos);',
' var found = reportedPos < input.length ? input.charAt(reportedPos) : null;',
' var reportedPosDetails = computeReportedPosDetails();',
' ',
' throw new this.SyntaxError(',
' cleanupExpected(rightmostFailuresExpected),',
' found,',
' reportedPos,',
' reportedPosDetails.line,',
' reportedPosDetails.column',
' );',
' }',
' ',
' return result;',
' }',
' };',