The "unescaped" rule was created by mechanically translating original
RFC 7159 rule:
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
into:
unescaped = [\x20-\x21\x23-\x5B\x5D-\u10FFFF]
However, this mechanical translation was incorrect as PEG.js grammars
don't have 6-digit Unicode escape sequences. Sequence "\u10FFFF" was
interpreted as "\u10FF" followed by two "F" characters.
This commit rewrites the "unescaped" rule into a form which, while not
being a mechanical translation of the original rule, matches the same
characters in the whole Unicode range. It also macthes textual
description of string representation in RFC 7159:
All Unicode characters may be placed within the quotation marks,
except for the characters that must be escaped: quotation mark,
reverse solidus, and the control characters (U+0000 through U+001F).
Fixes#417.
In the past year I worked on various grammars where first/rest or
head/tail were used as labels for parts of lists. I found I associate
head/tail with a list immediately, while in case of first/rest I have to
"parse" grammar rules for a while before understanding their structure.
Moreover, I tend to assume that rest is a list of the same thigs as
first, but I don't have such assumption in case of head/tail. This
assumption was in conflict with the grammar structure.
I'm not sure how much these observations are applicable to others, but I
decided to act on them and switch from first/rest to head/tail.
This is a complete rewrite of the JSON example grammar. It is now based
on RFC 7159 instead of an informal description at the JSON website.
Beside this, the rewrite reflects how I write grammars today (as opposed
to few years ago) and what style I would recommend to others.
Using a special value to indicate match failure instead of |null| allows
actions to return |null| as a regular value. This simplifies e.g. the
JSON parser.
Note the special value is internal and intentionally undocumented. This
means that there is currently no official way how to trigger a match
failure from an action. This is a temporary state which will be fixed
soon.
The negative performance impact (see below) is probably caused by
changing lot of comparisons against |null| (which likely check the value
against a fixed constant representing |null| in the interpreter) to
comparisons against the special value (which likely check the value
against another value in the interpreter).
Implements part of #198.
Speed impact
------------
Before: 1146.82 kB/s
After: 1031.25 kB/s
Difference: -10.08%
Size impact
-----------
Before: 950817 b
After: 973269 b
Difference: 2.36%
(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
We couldn't return |null| in the |value| rule of the JSON example
parser because that would mean parse failure. So until now, we just
returned |"null"| (a string).
This was obviously stupid, so this commit changes the |value| rule to
return a special object instead that is converted to |null| later.
Based on patches by Patrick Logan (GH-91) and Jakub Vrána (GH-191).
Labeled expressions lead to more maintainable code and also will allow
certain optimizations (we can ignore results of expressions not passed
to the actions).
This does not speed up the benchmark suite execution statistically
significantly on V8.
Detailed results (benchmark suite totals):
---------------------------------
Test # Before After
---------------------------------
1 28.43 kB/s 28.46 kB/s
2 28.38 kB/s 28.56 kB/s
3 28.22 kB/s 28.58 kB/s
4 28.76 kB/s 28.55 kB/s
5 28.57 kB/s 28.48 kB/s
---------------------------------
Average 28.47 kB/s 28.53 kB/s
---------------------------------
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.55 Safari/533.4
I'll introduce labelled expressions shortly and I want to use ":" as a
label-expression separator. This change avoids conflict between the two
meanings of ":". (What would e.g. "foo: 'bar'" mean? Rule "foo"
matching string "bar", or string "bar" labelled "foo"?)