73 Commits (5a833bd982f8f43f8df7234d0f177030db53b053)

Author SHA1 Message Date
Futago-za Ryuu 0ed8c6f89a Rewrote command line tool
- Split into 3 files: "peg.js", "options.js" and "usage.txt"
- Rewrote arguments parser and helpers to be more precise
- Any arguments after "--" will be passed to "options['--']" now
- Added negation options: "--no-cache" and "--no-trace"
- Added "bare" to accepted module formats
- Added 2 aliases for "--extra-options-file": "-c" and "--config"
- Added short options: "-a", "-f" and "-p"
- Reformatted help text in "usage.txt"
- Updated documentation related to command line options and normal options
- Changed "bin" field in "package.json" from "bin/pegjs" to "bin/peg"
- Added documentation note about command line options that are repeated
- Updated gulpfile.js, replacing "bin/pegjs" with "bin/*.js"

See #429, which was what I intended to fix/solve, but instead pushed back and did this instead.
7 years ago
fatfisz 9c60380f86 Add info about es to the cmd tool 8 years ago
fatfisz c541911c04 Extract formats into array
Previously there was a very long condition.
8 years ago
fatfisz e3b7f0c3a9 Change "esm" to "es" 8 years ago
fatfisz aab928de91 Add support for ES modules 8 years ago
David Majda 65afb7fd5d Remove unnecessary parens around arrow function parameters 8 years ago
David Majda 45b62d66d2 Whitespace fixes 8 years ago
David Majda 47bc456dcd Remove useless variable reference 8 years ago
David Majda ec3f7f5bb0 Remove extra indentation 8 years ago
David Majda ff7193776e Avoid aligning "="
The only exception left are instances where aligning "=" helps to
express symmetry between lines.

See #443.
8 years ago
David Majda 400a3cfa3c Avoid aligning object keys
The only exception left are objects representing a mapping with simple
keys and values -- essentially tables written as object literals.

See #443.
8 years ago
David Majda 6294bb5b13 Use only "//" comments
See #443.
8 years ago
David Majda 516023546d Use one var/let/const per variable (for initialized variables)
Use one var/let/const per variable, but only for initialized variables.
Uninitialized variables are still grouped into one var/let/const
declaration as I don't see any value in separating them. This approach
reflects the fact that initialized and uninitialized var/let/const
declarations are really two different things.

See #443.
8 years ago
David Majda 7f01db2fb8 Get rid of for-in loops
The for-in statement in JavaScript iterates also over inherited
properties. This is typically not desired and requires adding a
check using Object.prototype.hasOwnProperty inside the loop.

This commit replaces all for-in statements and related checks inside
them with iteration over Object.keys(...). The iteration is performed
using either Array.prototype.forEach of a plain for loop.
8 years ago
David Majda 6fa8ad63f9 Replace some functions with arrow functions
Because arrow functions work rather differently than normal functions (a
bad design mistake if you ask me), I decided to be conservative with the
conversion.

I converted:

  * event handlers
  * callbacks
  * arguments to Array.prototype.map & co.
  * small standalone lambda functions

I didn't convert:

  * functions assigned to object literal properties (the new shorthand
    syntax would be better here)
  * functions passed to "describe", "it", etc. in specs (because Jasmine
    relies on dynamic "this")

See #442.
8 years ago
David Majda bdf91b5941 Replace "var" with "let" & "const"
This is purely a mechanical change, not taking advantage of block scope
of "let" and "const". Minimizing variable scope will come in the next
commit.

In general, "var" is converted into "let" and "const" is used only for
immutable variables of permanent character (generally spelled in
ALL_CAPS). Using it for any immutable variable regardless on its
permanence would feel confusing.

Any code which is not transpiled and needs to run in ES6 environment
(examples, code in grammars embedded in specs, ...) is kept unchanged.
This is also true for code generated by PEG.js.

See #442.
8 years ago
David Majda d346d2a66d Replace objects.keys with Object.keys
See #441.
8 years ago
David Majda 8003edafc9 Rename the "node" module format to "commonjs"
Parsers generated in this format use module.exports, so they are not
strictly CommonJS, but this is a common extension and the original name
would be confusing once Node.js implements ES2015 modules.
8 years ago
David Majda 8962dcfd16 Rename the "global" module format to "globals"
I think the new name is more widely used when describing the pattern.
8 years ago
David Majda 75cd17ed58 bin/pegjs: Implement the --format option 9 years ago
David Majda d83e4d5a48 bin/pegjs: Generate parsers in "node" format
I think the "node" format is what most people want these days.

An option to override will be added in the next commit.
9 years ago
David Majda 3672eff31c bin/pegjs: Order peg.generate options alphabetically 9 years ago
David Majda 0a2217d3da bin/pegjs: Order options in a switch statement alphabetically 9 years ago
David Majda ff330a0d4b bin/pegjs: Order options in help text alphabetically 9 years ago
David Majda a57431955e bin/pegjs: Use the -o/--output option to specify the output file
This is more traditional compiler interface. Its main advantage against
specifying the output file as a second argument (which is what bin/pegjs
used until now) is that input and output files can't be mixed up.

Part of #370.
9 years ago
David Majda 9bf7c0c5ff bin/pegjs: Remove detailed instructions from the help text
They don't belong there.

Part of #370.
9 years ago
David Majda 35b3971366 bin/pegjs: Rename the -o option to -O
This will make room for -o to mean --output instead of --optimize. Also,
-O is more traditional option name for describing optimization config
than -o.

Part of #370.
9 years ago
David Majda 1c14a2c8f2 bin/pegjs: Allow using "-" to mean standard input and output
Part of #370.
9 years ago
David Majda f4504a93fe Rename the "buildParser" function to "generate"
In most places, we talk about "generating a parser", not "building a
parser", which the function name should reflect. Also, mentioning a
parser in the name is not necessary as in case of a parser generator
it's pretty clear what is generated.
9 years ago
David Majda 0847a69643 Rename the "PEG" variable to "peg"
So far, PEG.js was exported in a "PEG" global variable when no module
loader was detected. The same variable name was also conventionally used
when requiring it in Node.js or otherwise referring to it. This was
reflected in various places in the code, documentation, examples, etc.

This commit changes the variable name to "peg" and fixes all relevant
occurrences. The main reason for the change is that in Node.js, modules
are generally referred to by lower-case variable names, so "PEG" was
sticking out when used in Node.js projects.
9 years ago
David Majda f390c7cf45 ESLint: Disable no-console in bin/.eslintrc.json, not bin/pegjs
The less clutter in JavaScript files themselves, the better.
9 years ago
David Majda 810567d865 UMD parsers: Allow specifying parser dependencies
Introduce two ways of specifying parser dependencies: the "dependencies"
option of PEG.buildParser and the -d/--dependency CLI option. Specified
dependencies are translated into AMD dependencies and Node.js's
"require" calls when generating an UMD parser.

Part of work on #362.
9 years ago
David Majda a0a57cd22d UMD parsers: Make bin/pegjs generate UMD parsers
Part of work on #362.
9 years ago
David Majda 6a04067a76 bin/pegjs: Do not overwrite extension-less files
Running bin/pegjs with one argument which was an extension-less file
name caused the file to be overwritten. This was because internal
extension rewriting logic didn't handle this case corectly.

This commit changes the logic from regexp-based to path.extname-based,
fixing the problem. The new code generates file names like this:

  Input file name     Output file name
  ------------------------------------
  grammar.ext         grammar.js
  grammar.ext1.ext2   grammar.ext1.js
  grammar.            grammar.js
  grammar             grammar.js

Fixes #405.
9 years ago
David Majda e61c23c634 ESLint: Set environments better
Instead of setting ESLint environment to "node" globally, set it on
per-directory basis using separate .eslintrc.json files:

  Directory   Environment
  -----------------------
  bin         node
  lib         commonjs
  spec        jasmine

It was impossible to use this approach for the "benchmark" directory
which contains a mix of files used in various environments. For
benchmark/run, the environment is set inline. For the other files, as
well as spec/helpers.js, the globals are declared manually (it is
impossible to express how these files are used just by a list of
environments).

Fixes #408.
9 years ago
David Majda c8d23e5471 Fix ESLint errors in bin/pegjs
Fix the following errors:

   12:3  error  Unexpected console statement       no-console
   16:3  error  Unexpected console statement       no-console
   17:3  error  Unexpected console statement       no-console
   18:3  error  Unexpected console statement       no-console
   19:3  error  Unexpected console statement       no-console
   20:3  error  Unexpected console statement       no-console
   21:3  error  Unexpected console statement       no-console
   22:3  error  Unexpected console statement       no-console
   23:3  error  Unexpected console statement       no-console
   24:3  error  Unexpected console statement       no-console
   25:3  error  Unexpected console statement       no-console
   26:3  error  Unexpected console statement       no-console
   27:3  error  Unexpected console statement       no-console
   28:3  error  Unexpected console statement       no-console
   29:3  error  Unexpected console statement       no-console
   30:3  error  Unexpected console statement       no-console
   31:3  error  Unexpected console statement       no-console
   32:3  error  Unexpected console statement       no-console
   33:3  error  Unexpected console statement       no-console
   34:3  error  Unexpected console statement       no-console
   35:3  error  Unexpected console statement       no-console
   36:3  error  Unexpected console statement       no-console
   37:3  error  Unexpected console statement       no-console
   38:3  error  Unexpected console statement       no-console
   39:3  error  Unexpected console statement       no-console
   40:3  error  Unexpected console statement       no-console
   41:3  error  Unexpected console statement       no-console
   42:3  error  Unexpected console statement       no-console
   43:3  error  Unexpected console statement       no-console
   44:3  error  Unexpected console statement       no-console
   56:3  error  Unexpected console statement       no-console
  232:9  error  "inputStream" is already defined   no-redeclare
  240:9  error  "outputStream" is already defined  no-redeclare
9 years ago
David Majda de1704f007 Replace |util.{puts,error}| by |console.{log,error}|
The |util.puts| and |util.error| functions are deprecated in Node.js
0.12.x.

Based on a pull request by Jan Stránský (@burningtree):

  https://github.com/pegjs/pegjs/pull/334
10 years ago
Arlo Breault 12c169e7b5 Convert PEG.js code to strict mode
* Issues #323
10 years ago
David Majda 065f4e1b75 Improve location info in syntax errors
Replace |line|, |column|, and |offset| properties of |SyntaxError| with
the |location| property. It contains an object similar to the one
returned by the |location| function available in action code:

  {
    start: { offset: 23, line: 5, column: 6 },
    end:   { offset: 25, line: 5, column: 8 }
  }

For syntax errors produced in the middle of the input, |start| refers to
the first unparsed character and |end| refers to the character behind it
(meaning the span is 1 character). This corresponds to the portion of
the input in the |found| property.

For syntax errors produced the end of the input, both |start| and |end|
refer to a character past the end of the input (meaning the span is 0
characters).

For syntax errors produced by calling |expected| or |error| functions in
action code the location info is the same as the |location| function
would return.
10 years ago
David Majda da57118a43 Implement basic support for tracing
Parsers can now be generated with support for tracing using the --trace
CLI option or a boolean |trace| option to |PEG.buildParser|. This makes
them trace their progress, which can be useful for debugging. Parsers
generated with tracing support are called "tracing parsers".

When a tracing parser executes, by default it traces the rules it enters
and exits by writing messages to the console. For example, a parser
built from this grammar:

  start = a / b
  a = "a"
  b = "b"

will write this to the console when parsing input "b":

  1:1 rule.enter start
  1:1 rule.enter   a
  1:1 rule.fail    a
  1:1 rule.enter   b
  1:2 rule.match   b
  1:2 rule.match start

You can customize tracing by passing a custom *tracer* to parser's
|parse| method using the |tracer| option:

  parser.parse(input, { trace: tracer });

This will replace the built-in default tracer (which writes to the
console) by the tracer you supplied.

The tracer must be an object with a |trace| method. This method is
called each time a tracing event happens. It takes one argument which is
an object describing the tracing event.

Currently, three events are supported:

  * rule.enter -- triggered when a rule is entered
  * rule.match -- triggered when a rule matches successfully
  * rule.fail  -- triggered when a rule fails to match

These events are triggered in nested pairs -- for each rule.enter event
there is a matching rule.match or rule.fail event.

The event object passed as an argument to |trace| contains these
properties:

  * type   -- event type
  * rule   -- name of the rule the event is related to
  * offset -- parse position at the time of the event
  * line   -- line at the time of the event
  * column -- column at the time of the event
  * result -- rule's match result (only for rule.match event)

The whole tracing API is somewhat experimental (which is why it isn't
documented properly yet) and I expect it will evolve over time as
experience is gained.

The default tracer is also somewhat bare-bones. I hope that PEG.js user
community will develop more sophisticated tracers over time and I'll be
able to integrate their best ideas into the default tracer.
10 years ago
David Majda 95fd64ec15 .jshintrc: Add the "forin" option & fix fallout
Also added few missing |hasOwnProperty| calls that JSHint didn't detect
because it only looks whether there is an |if| statement wrapping the
loop body.
11 years ago
David Majda f22d7aabb5 Fix JSHint errors in bin/pegjs
Fixes the following JSHint errors:

  bin/pegjs: line 66, col 14, 'extraOptions' used out of scope.
  bin/pegjs: line 70, col 19, 'extraOptions' used out of scope.
  bin/pegjs: line 71, col 20, 'extraOptions' used out of scope.
  bin/pegjs: line 80, col 10, Wrap the /regexp/ literal in parens to disambiguate the slash operator.
  bin/pegjs: line 128, col 43, Missing semicolon.
  bin/pegjs: line 128, col 45, Don't make functions within a loop.
  bin/pegjs: line 150, col 13, Redefinition of 'module'.
  bin/pegjs: line 217, col 34, Expected '===' and instead saw '=='.
  bin/pegjs: line 243, col 44, 'source' used out of scope.
  bin/pegjs: line 243, col 61, 'source' used out of scope.
11 years ago
David Majda 851681d663 Implement the --extra-options and --extra-options-file options
These are mainly useful to pass additional options to plugins.
12 years ago
David Majda d013016717 bin/pegjs: Fix help wrapping
All help text should be wrapped at column 80.
12 years ago
David Majda 2dc39bb779 bin/pegjs: Output just the parser source if --export-var is empty
This will make embedding generated parsers into other files easier.

Based on a patch by Glen Huang:

  https://github.com/dmajda/pegjs/pull/143
12 years ago
David Majda e1af175af8 Plugin API: Implement the --plugin option
Implements part of GH-106.
12 years ago
David Majda fe1ca481ab Code generator rewrite
This is a complete rewrite of the PEG.js code generator. Its goals are:

  1. Allow optimizing the generated parser code for code size as well as
     for parsing speed.

  2. Prepare ground for future optimizations and big features (like
     incremental parsing).

  2. Replace the old template-based code-generation system with
     something more lightweight and flexible.

  4. General code cleanup (structure, style, variable names, ...).

New Architecture
----------------

The new code generator consists of two steps:

  * Bytecode generator -- produces bytecode for an abstract virtual
    machine

  * JavaScript generator -- produces JavaScript code based on the
    bytecode

The abstract virtual machine is stack-based. Originally I wanted to make
it register-based, but it turned out that all the code related to it
would be more complex and the bytecode itself would be longer (because
of explicit register specifications in instructions). The only downsides
of the stack-based approach seem to be few small inefficiencies (see
e.g. the |NIP| instruction), which seem to be insignificant.

The new generator allows optimizing for parsing speed or code size (you
can choose using the |optimize| option of the |PEG.buildParser| method
or the --optimize/-o option on the command-line).

When optimizing for size, the JavaScript generator emits the bytecode
together with its constant table and a generic bytecode interpreter.
Because the interpreter is small and the bytecode and constant table
grow only slowly with size of the grammar, the resulting parser is also
small.

When optimizing for speed, the JavaScript generator just compiles the
bytecode into JavaScript. The generated code is relatively efficient, so
the resulting parser is fast.

Internal Identifiers
--------------------

As a small bonus, all internal identifiers visible to user code in the
initializer, actions and predicates are prefixed by |peg$|. This lowers
the chance that identifiers in user code will conflict with the ones
from PEG.js. It also makes using any internals in user code ugly, which
is a good thing. This solves GH-92.

Performance
-----------

The new code generator improved parsing speed and parser code size
significantly. The generated parsers are now:

  * 39% faster when optimizing for speed

  * 69% smaller when optimizing for size (without minification)

  * 31% smaller when optimizing for size (with minification)

(Parsing speed was measured using the |benchmark/run| script. Code size
was measured by generating parsers for examples in the |examples|
directory and adding up the file sizes. Minification was done by |uglify
--ascii| in version 1.3.4.)

Final Note
----------

This is just a beginning! The new code generator lays a foundation upon
which many optimizations and improvements can (and will) be made.

Stay tuned :-)
12 years ago
David Majda 3333cdd18d Position tracking: Kill the |trackLineAndColumn| option
Getting rid of the |trackLineAndColumn| simplifies the code generator
(by unifying two paths in the code).

The |line| and |column| functions currently always compute all the
position info from scratch, which is horribly ineffective. This will be
improved in later commit(s).
12 years ago
David Majda 05a6bad989 Kill the |toSource| method, introduce the |output| option
Before this commit, |PEG.buildParser| always returned a parser object.
The only way to get its source code was to call the |toSource| method on
it. While this method worked for parsers produced by |PEG.buildParser|
directly, it didn't work for parsers instantiated by executing their
source code. In other words, it was unreliable.

This commit remvoes the |toSource| method on generated parsers and
introduces a new |output| option to |PEG.buildParser|. It allows callers
to specify whether they want to get back the parser object
(|options.output === "parser"|) or its source code (|options.output ===
"source"|). This is much better and more reliable API.
12 years ago
David Majda 208cc33930 Allowed start rules must be specified explicitly
Before this commit, generated parser were able to start parsing from any
rule. This was nice, but it made rule code inlining impossible.

Since this commit, the list of allowed start rules has to be specified
explicitly using the |allowedStartRules| option of the |PEG.buildParser|
method (or the --allowed-start-rule option on the command-line). These
rules will be excluded from inlining when it's implemented.
12 years ago