- Revert ES6 changes to arithmetics.pegjs
- Use Array#forEach instead of for..of
- Don't use native Array#find & Array#findIndex
- Added util/arrays.js (find & findIndex)
- Use Function instead of eval
- use value plucking
- remove helpers not needed now
- types in OPS_* are now returned by *Operator
- RESERVED_WORDS is now a `Object<Identifier,true>`
- use ES2015+ JavaScript
- cleanup source code
All comments stored in the `comments` property of the `grammar` node.
Comments extracted only if the `extractComments` options set to `true` when you generate parser.
This property is object with mapping start offset of comment to comment object, that looks like:
```js
{
text: 'text in the comment, just after // or /* and before */',
multiline: true|false,// true for /**/ comments, false for // comments
location: location()
}
```
* Split 'consts' collection by content types into:
- literals: for literal expressions, like `"a"`
- classes: for character class expressions, like `[a]`
- expectations: for constants describing expected values when parse failed
- functions: for constants with function code
* Move any JavaScript code generation from 'generateBytecode' to 'generateJs'.
* Rename opcode 'MATCH_REGEXP' to 'MATCH_CLASS' (name reflects purpose, not implementation).
* Replace 'PUSH' opcode with 'PUSH_EMPTY_STRING' opcode because it is only used with empty strings
Before this commit, the PEG.js parser always created the AST using a plain JavaSctript object, but allthough simple and effective for the job, this method sacrificies performance slightly.
From now on the parser shall call a Node creator. This should help with performance, as well as in the future move some AST helpers into the new AST functions.
Before this commit error looks like (for input `start = break:'a'`)
> Expected "!", "$", "&", "(", "*", "+", ".", "/", "/*", "//", ";", "?", character class, code block, comment, end of line, identifier, literal, or whitespace but ":" found.
After this error looks like
> Label can't be a reserved word "break".
Currently, an open brace without a corresponding brace will emit this confusing error message:
> Expected "!", "$", "&", "(", "*", "+", ".", "/", "/*", "//", ";", "?", character class, code block, comment, end of line, identifier, literal, or whitespace but "{" found.
This change adds an error case to the grammar to make it clear what the problem is.
Because arrow functions work rather differently than normal functions (a
bad design mistake if you ask me), I decided to be conservative with the
conversion.
I converted:
* event handlers
* callbacks
* arguments to Array.prototype.map & co.
* small standalone lambda functions
I didn't convert:
* functions assigned to object literal properties (the new shorthand
syntax would be better here)
* functions passed to "describe", "it", etc. in specs (because Jasmine
relies on dynamic "this")
See #442.
This is purely a mechanical change, not taking advantage of block scope
of "let" and "const". Minimizing variable scope will come in the next
commit.
In general, "var" is converted into "let" and "const" is used only for
immutable variables of permanent character (generally spelled in
ALL_CAPS). Using it for any immutable variable regardless on its
permanence would feel confusing.
Any code which is not transpiled and needs to run in ES6 environment
(examples, code in grammars embedded in specs, ...) is kept unchanged.
This is also true for code generated by PEG.js.
See #442.
I no longer think that using raw literal texts in error messages is the
right thing to do. The main reason is that it couples error messages
with details of the grammar such as use of single or double quotes in
literals. A better solution is coming in the next commit.
This reverts commit 69a0f769fc.
Labels in expressions like "(a:"a")" or "(a:"a" b:"b" c:"c")" were
visible to the outside despite being wrapped in parens. This commit
makes them invisible, as they should be.
Note this required introduction of a new "group" AST node, whose purpose
is purely to provide label scope isolation. This was necessary because
"label" and "sequence" nodes don't (and can't!) provide this isolation
themselves.
Part of a fix of #396.
In the past year I worked on various grammars where first/rest or
head/tail were used as labels for parts of lists. I found I associate
head/tail with a list immediately, while in case of first/rest I have to
"parse" grammar rules for a while before understanding their structure.
Moreover, I tend to assume that rest is a list of the same thigs as
first, but I don't have such assumption in case of head/tail. This
assumption was in conflict with the grammar structure.
I'm not sure how much these observations are applicable to others, but I
decided to act on them and switch from first/rest to head/tail.
This is mostly done for consistency with the JavaScript example grammar,
from which the |Identifier| rule is taken from. See the previous commit
for details.
Instead of matching segments between blocks character by character,
match them as a whole. Also align the style with other similar rules
(e.g. the comment ones).
Before this commit, line continuations in character classes contributed
an empty string to the list of characters and character ranges matched
by a class. While this didn't lead to a buggy behavior with the current
code generator, the AST was wrong and the problem could have caused bugs
later.
This commit fixes the problem.
Semantic predicates are kind of |PrimaryExpression|, not kind of
|PrefixedExpression|. Therefore I extracted a rule for them and
referenced it from the |PrimaryExpression|.
Initializer and rules are now separated in a similar way as JavaScript
statements -- either by a semicolon or a line terminator, possibly with
whitespace and comments mixed in.
One consequence is that the grammars like this are now illegal:
foo = "a" bar = "b"
A semicolon needs to be inserted between the rules:
foo = "a";bar = "b"
I consider this a good change as the now-illegal syntax was somewhat
confusing.
This makes the |Primary| rule a bit more tidy. Also, matching the |.|
character really belongs to the lexical part of the grammar, next to
literals and character classes.
* Rename the |Action| rule to |CodeBlock| (it better describes what
the rule matches).
* Implement the rule in a simpler way and move it after more basic
lexical elements.
This change has two side effects:
* Label names can no longer be JavaScript reserved words.
* |$| is allowed again in label names. However, because of the
preference rules, names starting with it will be usually parsed as a
text operator followed by another identifier (denoting a rule
reference or label name).
Before this commit, whitespace was handled at the lexical level by
making tokens consume any whitespace coming after them. This was
accomplished by appending |__| to every token rule.
This commit changes whitespace handling to be more explicit. Tokens no
longer consume whitespace coming after them and syntactic rules have to
cope with it. While this slightly complicates the syntactic grammar, I
think it's a cleaner way. Moreover, it is what JavaScript example
grammar does.
One small side-effect of thich change is that the grammar is now
stand-alone (it doesn't require utils.js anymore).