Reserved word detection as it was implemented in the JavaScript example
grammar had two big downsides:
1. It required changes in ordering of choices in some rules in order
not to trigger the detection prematurely. One of the changes was
already implemented (in the |Statement| rule, see the diff), but
apparently more were needed (the grammar didn't parse inputs like
|true| or |function f() {}|). And I'm not 100% sure that would be
the end of it (maybe deeper structural changes would be needed).
2. It made error messages confusing. Consider the following example:
    var a = @;
Instead of reporting:
    Expected ... but "@" found.
the generated parser reported:
    Reserved word "var" can't be used as an identifier.
This was because the parser parsed the statement first as
|VariableStatement| and when this failed, it tried to parse it as
|ExpressionStatement|, triggering the reserved word detection.
Because of these downsides, I decided to remove reserved word detection
from the JavaScript example grammar.
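The second downside can be reproduced outside the grammar. The following is a minimal hand-written sketch, not the generated parser: the function names, the RESERVED set, and the tiny statement syntax it accepts are all assumptions made for illustration. It only shows how an ordered choice falling through to a second alternative can surface the reserved word message.

```javascript
// Hypothetical miniature of: Statement = VariableStatement / ExpressionStatement
const RESERVED = new Set(["var", "return", "function"]);

function identifier(input, pos) {
  const m = /^[A-Za-z_$][\w$]*/.exec(input.slice(pos));
  if (!m) return null;
  // Eager reserved word detection: reports an error instead of just failing.
  if (RESERVED.has(m[0])) {
    throw new Error(`Reserved word "${m[0]}" can't be used as an identifier.`);
  }
  return { name: m[0], end: pos + m[0].length };
}

// Accepts only the shape: var <identifier> = <identifier>;
function variableStatement(input) {
  if (!input.startsWith("var ")) return null;
  const name = identifier(input, 4);
  if (!name || !input.startsWith(" = ", name.end)) return null;
  const init = identifier(input, name.end + 3);
  if (!init || input[init.end] !== ";") return null;
  return { type: "VariableStatement", name: name.name, init: init.name };
}

// Accepts only the shape: <identifier>;
function expressionStatement(input) {
  const id = identifier(input, 0);
  if (!id || input[id.end] !== ";") return null;
  return { type: "ExpressionStatement", name: id.name };
}

// Ordered choice: the second alternative runs only after the first one fails.
function statement(input) {
  return variableStatement(input) || expressionStatement(input);
}

function errorMessageFor(input) {
  try {
    statement(input);
    return null;
  } catch (e) {
    return e.message;
  }
}
```

In this sketch, errorMessageFor("var a = @;") yields the reserved word message because the first alternative fails quietly at "@" and the second one then sees "var" as an identifier candidate.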
Fixes a problem where statements starting with a reserved word produced
errors like this:
    Reserved word "return" can't be used as an identifier.
The problem was caused by the wrong ordering of choices in the
|Statement| rule together with aggressive reserved word detection in
the |Identifier| rule.
This is a complete rewrite of the CSS example grammar. It is now based
on CSS 2.1 *including the errata* and the generated parser builds a
nicer syntax tree. There are also a number of cleanups, formatting
changes, naming changes, and bug fixes.
Besides this, the rewrite reflects how I write grammars today (as
opposed to a few years ago) and what style I would recommend to others.
This is a complete rewrite of the JavaScript example grammar. It is now
based on ECMA-262, 5.1 Edition and the generated parser builds a syntax
tree compatible with Mozilla SpiderMonkey Parser API. There are also a
number of cleanups, formatting changes, naming changes, and bug fixes.
Besides this, the rewrite reflects how I write grammars today (as
opposed to a few years ago) and what style I would recommend to others.
This is a complete rewrite of the JSON example grammar. It is now based
on RFC 7159 instead of an informal description at the JSON website.
Besides this, the rewrite reflects how I write grammars today (as
opposed to a few years ago) and what style I would recommend to others.
This is a complete rewrite of the arithmetics example grammar. It now
allows whitespace between tokens, supports "-" and "/" operators, and
gets the operator associativity right. Also, rule names now match the
usual conventions (term, factor, ...).
Besides this, the rewrite reflects how I write grammars today (as
opposed to a few years ago) and what style I would recommend to others.
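To illustrate what "gets the operator associativity right" means, here is a small hand-written evaluator sketch. It is not the example grammar itself: the function names are hypothetical, and it assumes non-negative integer operands and parentheses only. Each precedence level folds its operands from left to right, so "-" and "/" associate to the left.

```javascript
const ADDITIVE = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
]);
const MULTIPLICATIVE = new Map([
  ["*", (a, b) => a * b],
  ["/", (a, b) => a / b],
]);

function evaluate(source) {
  let pos = 0;

  function skipWhitespace() {
    while (pos < source.length && /\s/.test(source[pos])) pos++;
  }

  function integer() {
    const m = /^\d+/.exec(source.slice(pos));
    if (!m) throw new SyntaxError(`Expected integer at offset ${pos}`);
    pos += m[0].length;
    return parseInt(m[0], 10);
  }

  // factor = "(" expression ")" / integer
  function factor() {
    skipWhitespace();
    if (source[pos] !== "(") return integer();
    pos++;
    const value = expression();
    skipWhitespace();
    if (source[pos] !== ")") throw new SyntaxError(`Expected ")" at offset ${pos}`);
    pos++;
    return value;
  }

  // operand (operator operand)*, folded left to right for left associativity.
  function leftAssociative(operand, operators) {
    let result = operand();
    for (;;) {
      skipWhitespace();
      const apply = operators.get(source[pos]);
      if (!apply) return result;
      pos++;
      result = apply(result, operand());
    }
  }

  function term() {
    return leftAssociative(factor, MULTIPLICATIVE);
  }

  function expression() {
    return leftAssociative(term, ADDITIVE);
  }

  const value = expression();
  skipWhitespace();
  if (pos !== source.length) throw new SyntaxError(`Unexpected input at offset ${pos}`);
  return value;
}
```

With this sketch, evaluate("8 - 3 - 2") gives 3, that is (8 - 3) - 2, whereas a right-associative reading would give 7.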
It's not necessary to parse |parts| in the |integer| and |float| rules
into an integer/float value. Everywhere these rules are used, the result
is converted back into a string anyway.
Before this commit, the |?| operator returned an empty string upon
unsuccessful match. This commit changes the returned value to |null|. It
also updates the PEG.js grammar and the example grammars, which used the
value returned by |?| quite often.
Returning |null| is possible because it no longer indicates a match
failure.
I expect that this change will simplify many real-world grammars, as an
empty string is almost never desirable as a return value (except in some
lexer-level rules) and it is often translated into |null| or some other
value in action code.
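As a rough illustration of the new behavior, the sketch below models |?| with hand-written combinators. The names literal, optional, and signedInteger, and the result shape { ok, value, pos }, are assumptions made for this example and do not reflect the code PEG.js generates.

```javascript
// Each parser takes (input, pos) and returns { ok: true, value, pos } or { ok: false }.
function literal(text) {
  return (input, pos) =>
    input.startsWith(text, pos)
      ? { ok: true, value: text, pos: pos + text.length }
      : { ok: false };
}

// Models the |?| operator: it always succeeds, and an unsuccessful match of
// the operand yields null instead of an empty string.
function optional(parser) {
  return (input, pos) => {
    const result = parser(input, pos);
    return result.ok ? result : { ok: true, value: null, pos };
  };
}

const optionalSign = optional(literal("-"));

// Hypothetical action for: sign:"-"? digits:[0-9]+
function signedInteger(input) {
  const sign = optionalSign(input, 0);
  const digits = input.slice(sign.pos);
  if (!/^\d+$/.test(digits)) return { ok: false };
  const magnitude = parseInt(digits, 10);
  // The action can test for null directly; no "" to null translation is needed.
  return {
    ok: true,
    value: sign.value === null ? magnitude : -magnitude,
    pos: input.length,
  };
}
```

Here optionalSign("7", 0).value is null and optionalSign("-7", 0).value is "-".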
Implements part of #198.
Using a special value to indicate match failure instead of |null| allows
actions to return |null| as a regular value. This simplifies e.g. the
JSON parser.
Note the special value is internal and intentionally undocumented. This
means that there is currently no official way to trigger a match
failure from an action. This is a temporary state which will be fixed
soon.
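The idea can be sketched as follows. This is an illustration only: the FAILED marker and the helper names are hypothetical stand-ins, since the real special value is internal and undocumented.

```javascript
// Hypothetical stand-in for the internal failure marker. Any unique object
// that an action cannot produce by accident would do.
const FAILED = {};

// Matches the whole input against a literal and runs the action on success.
function matchLiteral(input, text, action) {
  return input === text ? action() : FAILED;
}

// Ordered choice over the three JSON keywords.
function parseJsonKeyword(input) {
  const alternatives = [
    () => matchLiteral(input, "null", () => null),
    () => matchLiteral(input, "true", () => true),
    () => matchLiteral(input, "false", () => false),
  ];
  for (const alternative of alternatives) {
    const result = alternative();
    // Only the marker means failure, so null and false are regular values.
    if (result !== FAILED) return result;
  }
  return FAILED;
}
```

Because failure is detected by identity with the marker, parseJsonKeyword("null") can return a real null while parseJsonKeyword("nope") still reports failure.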
The negative performance impact (see below) is probably caused by
changing a lot of comparisons against |null| (which likely check the value
against a fixed constant representing |null| in the interpreter) to
comparisons against the special value (which likely check the value
against another value in the interpreter).
Implements part of #198.
Speed impact
------------
Before: 1146.82 kB/s
After: 1031.25 kB/s
Difference: -10.08%
Size impact
-----------
Before: 950817 b
After: 973269 b
Difference: 2.36%
(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
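For reference, the "Difference" figures above are consistent with a plain relative change, (after - before) / before. The helper below is a hypothetical sketch and not the /tools/impact script itself.

```javascript
// Relative change in percent; negative means the value went down.
function relativeDifference(before, after) {
  return ((after - before) / before) * 100;
}

const speedImpact = relativeDifference(1146.82, 1031.25).toFixed(2); // "-10.08"
const sizeImpact = relativeDifference(950817, 973269).toFixed(2);    // "2.36"
```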
JavaScript allows one to skip (elide) elements in array literals. It
also allows a trailing comma, which doesn't imply an element elision.
For example, an array literal:
    [,,,]
contains three elided elements (one before each comma) and a trailing
comma.
The example JavaScript parser handled elided elements incorrectly and
just threw them away. This commit fixes this behavior and inserts |null| in
the AST for each elided element. This is in line with how SpiderMonkey's
JavaScript parser (the |Reflect.parse| API), Esprima and Acorn behave.
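The intended result can be sketched with a deliberately simplified stand-in for the grammar rules. The function name is hypothetical, and it assumes elements are non-negative integers with no whitespace or nesting. It returns the |elements| list with |null| for each elision and ignores a trailing comma.

```javascript
function parseArrayElements(source) {
  if (!/^\[[\d,]*\]$/.test(source)) {
    throw new SyntaxError("Unsupported array literal: " + source);
  }
  const body = source.slice(1, -1);
  if (body === "") return [];
  const parts = body.split(",");
  // A trailing comma leaves one empty part at the end; it is not an elision.
  if (parts[parts.length - 1] === "") parts.pop();
  // Every remaining empty part is an elided element.
  return parts.map((part) => (part === "" ? null : Number(part)));
}
```

For example, parseArrayElements("[,,,]") returns [null, null, null], which matches [,,,].length being 3 in JavaScript itself, and parseArrayElements("[1,2,]") returns [1, 2].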
Based on a patch by @fpirsch:
https://github.com/dmajda/pegjs/pull/177
Makes the |ArrayLiteral| and |ElementList| rules more in line with the
ECMAScript grammar.
Based on a patch by @fpirsch:
https://github.com/dmajda/pegjs/pull/177
We couldn't return |null| in the |value| rule of the JSON example
parser because that would mean parse failure. So until now, we just
returned |"null"| (a string).
This was obviously stupid, so this commit changes the |value| rule to
return a special object instead that is converted to |null| later.
Based on patches by Patrick Logan (GH-91) and Jakub Vrána (GH-191).
Fix automatic semicolon insertion in var statements without
initialisers.
    var i
    i = 1;
is valid but was not accepted by the parser, whereas
    var i = 2
    i = 3;
is valid and was accepted by the parser, as it should be.
With this fix, both are accepted.
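That both snippets are valid JavaScript can be checked against a real engine. The sketch below uses the standard Function constructor to compile each snippet; the helper name is hypothetical.

```javascript
// Compiles the snippet as a function body and returns the final value of i.
// A SyntaxError would be thrown here if the engine rejected the snippet.
function finalValueOfI(snippet) {
  return new Function(snippet + "\nreturn i;")();
}

const withoutInitialiser = finalValueOfI("var i\ni = 1;");  // 1
const withInitialiser = finalValueOfI("var i = 2\ni = 3;"); // 3
```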
Labeled expressions lead to more maintainable code and also will allow
certain optimizations (we can ignore results of expressions not passed
to the actions).
This does not speed up the benchmark suite execution on V8 in a
statistically significant way.
Detailed results (benchmark suite totals):
---------------------------------
 Test #      Before       After
---------------------------------
      1  28.43 kB/s  28.46 kB/s
      2  28.38 kB/s  28.56 kB/s
      3  28.22 kB/s  28.58 kB/s
      4  28.76 kB/s  28.55 kB/s
      5  28.57 kB/s  28.48 kB/s
---------------------------------
Average  28.47 kB/s  28.53 kB/s
---------------------------------
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.55 Safari/533.4
I'll introduce labelled expressions shortly and I want to use ":" as a
label-expression separator. This change avoids conflict between the two
meanings of ":". (What would e.g. "foo: 'bar'" mean? Rule "foo"
matching string "bar", or string "bar" labelled "foo"?)