Simplify regexps that specify ranges of characters to escape with "\xXX"
and "\uXXXX" in various escaping functions. Until now, these regexps
were (mostly) mutually exclusive with more selective regexps applied
before them, but this became a maintenance headache. I decided to
abandon the exclusivity, which allowed to simplify these regexps (at the
cost of introducing an ordering dependency).
Change how found strings are escaped when building syntax error
messages:
* Do not escape non-ASCII characters (U+0100-U+FFFF). They are
typically more readable in their raw form.
* Escape DEL (U+007F). It is a control character.
* Escape NUL (U+0000) as "\0", not "\x00".
* Do not use less known shortcut escape sequences ("\b", "\f"), only the
well-known ones ("\0", "\t", "\n", "\r").
These changes mirror expectation escaping changes done in
4fe682794d.
Part of work on #428.
The wrapping functions are also generated by PEG.js, so the comment
should be above them to mark them as such. This shouldn't cause any
problems technically.
Introduce two ways of specifying parser dependencies: the "dependencies"
option of PEG.buildParser and the -d/--dependency CLI option. Specified
dependencies are translated into AMD dependencies and Node.js's
"require" calls when generating an UMD parser.
Part of work on #362.
Extract "generateWrapper" code which generates code of the intro and the
returned parser object into helper functions. This is pure refactoring,
generated parser code is exactly the same as before.
This change will make it easier to modifiy "generateWrapper" to produce
UMD modules.
Part of work on #362.
Code which was at the toplevel of the "generateJS" function in the code
generator is now split into "generateToplevel" (which genreates parser
toplevel code) and "generateWrapper" (which generates a wrapper around
it). This is pure refactoring, generated parser code is exactly the same
as before.
This change will make it easier to modifiy the code genreator to produce
UMD modules.
Part of work on #362.
Instead of testing arguments.length to see whether an optional parameter
was passed to a function, compare its value to "undefined". This
approach has two advantages:
* It is in line with handling of default parameters in ES6.
* Optional parameters are actually spelled out in the parameter
list.
There is also one important disadvantage, namely that it's impossible to
pass "undefined" as an optional parameter value. This required a small
change in two tests.
Additional notes:
* Default parameter values are set in assignments immediately
after the function header. This reflects the fact that these
assignments really belong to the parameter list (which is where they
are in ES6).
* Parameter values are checked against "void 0" in places where
"undefined" can potentially be redefiend.
Before this commit, generated parsers considered the following character
sequences as newlines:
Sequence Description
------------------------------
"\n" Unix
"\r" Old Mac
"\r\n" Windows
"\u2028" line separator
"\u2029" paragraph separator
This commit limits the sequences only to "\n" and "\r\n". The reason is
that nobody uses Unicode newlines or "\r" in practice.
A positive side effect of the change is that newline-handling code
became simpler (and likely faster).
The expectation deduplication algorithm called |Array.prototype.splice|
to eliminate each individual duplication, which was slow. This caused
problems with grammar/input combinations that generated a lot of
expecations (see #377 for an example).
This commit replaces the algorithm with much faster one, eliminating the
problem.
The |found| property wasn't very useful as it mostly contained just one
character or |null| (the exception being syntax errors triggered by
|error| or |expected|). Similarly, the "but XXX found" part of the error
message (based on the |found| property) wasn't much useful and was
redundant in presence of location info.
For these reasons, this commit removes the |found| property and
corresponding part of the error message from syntax errors. It also
modifies error location info slightly to cover a range of 0 characters,
not 1 character (except when the error is triggered by |error| or
|expected|). This corresponds more precisely to the actual situation.
Fixes#372.