54 Commits (9446e07f6acba905481e2e71c0a37774e479be70)

Author SHA1 Message Date
David Majda 75cd17ed58 bin/pegjs: Implement the --format option 8 years ago
David Majda d83e4d5a48 bin/pegjs: Generate parsers in "node" format
I think the "node" format is what most people want these days.

An option to override will be added in the next commit.
8 years ago
David Majda 3672eff31c bin/pegjs: Order peg.generate options alphabetically 9 years ago
David Majda 0a2217d3da bin/pegjs: Order options in a switch statement alphabetically 9 years ago
David Majda ff330a0d4b bin/pegjs: Order options in help text alphabetically 9 years ago
David Majda a57431955e bin/pegjs: Use the -o/--output option to specify the output file
This is a more traditional compiler interface. Its main advantage over
specifying the output file as a second argument (which is what bin/pegjs
used until now) is that input and output files can't be mixed up.

Part of #370.
9 years ago
David Majda 9bf7c0c5ff bin/pegjs: Remove detailed instructions from the help text
They don't belong there.

Part of #370.
9 years ago
David Majda 35b3971366 bin/pegjs: Rename the -o option to -O
This will make room for -o to mean --output instead of --optimize. Also,
-O is a more traditional option name for specifying optimization settings
than -o.

Part of #370.
9 years ago
David Majda 1c14a2c8f2 bin/pegjs: Allow using "-" to mean standard input and output
Part of #370.
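Below is a minimal sketch of the "-" convention. It assumes Node.js's built-in fs module. The helpers openInput and openOutput are hypothetical names for illustration and are not the actual bin/pegjs code.

```javascript
const fs = require("fs");

// Hypothetical helpers: "-" selects the standard streams, and any other
// name is treated as a file path.
function openInput(name) {
  return name === "-" ? process.stdin : fs.createReadStream(name);
}

function openOutput(name) {
  return name === "-" ? process.stdout : fs.createWriteStream(name);
}
```

With this, `bin/pegjs -` reads the grammar from a pipe exactly as it would read it from a file.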
9 years ago
David Majda f4504a93fe Rename the "buildParser" function to "generate"
In most places, we talk about "generating a parser", not "building a
parser", which the function name should reflect. Also, mentioning a
parser in the name is not necessary, as in the case of a parser generator
it's pretty clear what is generated.
9 years ago
David Majda 0847a69643 Rename the "PEG" variable to "peg"
Until now, PEG.js was exported in a "PEG" global variable when no module
loader was detected. The same variable name was also conventionally used
when requiring it in Node.js or otherwise referring to it. This was
reflected in various places in the code, documentation, examples, etc.

This commit changes the variable name to "peg" and fixes all relevant
occurrences. The main reason for the change is that in Node.js, modules
are generally referred to by lower-case variable names, so "PEG" was
sticking out when used in Node.js projects.
9 years ago
David Majda f390c7cf45 ESLint: Disable no-console in bin/.eslintrc.json, not bin/pegjs
The less clutter in JavaScript files themselves, the better.
9 years ago
David Majda 810567d865 UMD parsers: Allow specifying parser dependencies
Introduce two ways of specifying parser dependencies: the "dependencies"
option of PEG.buildParser and the -d/--dependency CLI option. Specified
dependencies are translated into AMD dependencies and Node.js's
"require" calls when generating a UMD parser.

Part of work on #362.
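The translation described above can be sketched as follows. Several details are assumptions for illustration:

* the { variableName: moduleId } shape of the dependencies map,
* the umdHeader name,
* the exact wrapper layout.

The wrapper PEG.js actually generates may differ. For instance, this sketch omits any browser-global branch.

```javascript
// Hypothetical sketch: build the opening of a UMD wrapper whose factory
// receives the declared dependencies as parameters.
function umdHeader(dependencies) {
  const vars = Object.keys(dependencies);
  const ids = vars.map((v) => JSON.stringify(dependencies[v]));
  const requires = ids.map((id) => `require(${id})`);

  return [
    "(function(root, factory) {",
    '  if (typeof define === "function" && define.amd) {',
    `    define([${ids.join(", ")}], factory);`, // AMD dependencies
    '  } else if (typeof module === "object" && module.exports) {',
    `    module.exports = factory(${requires.join(", ")});`, // Node.js requires
    "  }",
    `})(this, function(${vars.join(", ")}) {`,
  ].join("\n");
}
```

Both branches pass the dependencies to the factory in the same order. That is what lets parser code refer to them by their variable names regardless of the module loader.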
9 years ago
David Majda a0a57cd22d UMD parsers: Make bin/pegjs generate UMD parsers
Part of work on #362.
9 years ago
David Majda 6a04067a76 bin/pegjs: Do not overwrite extension-less files
Running bin/pegjs with a single argument that was an extension-less file
name caused the file to be overwritten. This was because the internal
extension-rewriting logic didn't handle this case correctly.

This commit changes the logic from regexp-based to path.extname-based,
fixing the problem. The new code generates file names like this:

  Input file name     Output file name
  ------------------------------------
  grammar.ext         grammar.js
  grammar.ext1.ext2   grammar.ext1.js
  grammar.            grammar.js
  grammar             grammar.js
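The mapping in the table can be reproduced with Node.js's path.extname by stripping the last extension, if any, and appending ".js". The outputFileName helper is a hypothetical name; this is a sketch of the technique, not the committed code.

```javascript
const path = require("path");

// Hypothetical helper: derive the output file name from the input file name.
// path.extname returns "" for "grammar" and "." for "grammar.", so both
// cases end up as "grammar.js" instead of overwriting the input.
function outputFileName(inputFile) {
  const ext = path.extname(inputFile);
  return inputFile.slice(0, inputFile.length - ext.length) + ".js";
}
```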

Fixes #405.
9 years ago
David Majda e61c23c634 ESLint: Set environments better
Instead of setting ESLint environment to "node" globally, set it on
per-directory basis using separate .eslintrc.json files:

  Directory   Environment
  -----------------------
  bin         node
  lib         commonjs
  spec        jasmine

It was impossible to use this approach for the "benchmark" directory,
which contains a mix of files used in various environments. For
benchmark/run, the environment is set inline. For the other files, as
well as spec/helpers.js, the globals are declared manually (it is
impossible to express how these files are used just by a list of
environments).

Fixes #408.
9 years ago
David Majda c8d23e5471 Fix ESLint errors in bin/pegjs
Fix the following errors:

   12:3  error  Unexpected console statement       no-console
   16:3  error  Unexpected console statement       no-console
   17:3  error  Unexpected console statement       no-console
   18:3  error  Unexpected console statement       no-console
   19:3  error  Unexpected console statement       no-console
   20:3  error  Unexpected console statement       no-console
   21:3  error  Unexpected console statement       no-console
   22:3  error  Unexpected console statement       no-console
   23:3  error  Unexpected console statement       no-console
   24:3  error  Unexpected console statement       no-console
   25:3  error  Unexpected console statement       no-console
   26:3  error  Unexpected console statement       no-console
   27:3  error  Unexpected console statement       no-console
   28:3  error  Unexpected console statement       no-console
   29:3  error  Unexpected console statement       no-console
   30:3  error  Unexpected console statement       no-console
   31:3  error  Unexpected console statement       no-console
   32:3  error  Unexpected console statement       no-console
   33:3  error  Unexpected console statement       no-console
   34:3  error  Unexpected console statement       no-console
   35:3  error  Unexpected console statement       no-console
   36:3  error  Unexpected console statement       no-console
   37:3  error  Unexpected console statement       no-console
   38:3  error  Unexpected console statement       no-console
   39:3  error  Unexpected console statement       no-console
   40:3  error  Unexpected console statement       no-console
   41:3  error  Unexpected console statement       no-console
   42:3  error  Unexpected console statement       no-console
   43:3  error  Unexpected console statement       no-console
   44:3  error  Unexpected console statement       no-console
   56:3  error  Unexpected console statement       no-console
  232:9  error  "inputStream" is already defined   no-redeclare
  240:9  error  "outputStream" is already defined  no-redeclare
9 years ago
David Majda de1704f007 Replace |util.{puts,error}| by |console.{log,error}|
The |util.puts| and |util.error| functions are deprecated in Node.js
0.12.x.

Based on a pull request by Jan Stránský (@burningtree):

  https://github.com/pegjs/pegjs/pull/334
9 years ago
Arlo Breault 12c169e7b5 Convert PEG.js code to strict mode
* Issues #323
10 years ago
David Majda 065f4e1b75 Improve location info in syntax errors
Replace |line|, |column|, and |offset| properties of |SyntaxError| with
the |location| property. It contains an object similar to the one
returned by the |location| function available in action code:

  {
    start: { offset: 23, line: 5, column: 6 },
    end:   { offset: 25, line: 5, column: 8 }
  }

For syntax errors produced in the middle of the input, |start| refers to
the first unparsed character and |end| refers to the position right after
it (meaning the span is 1 character). This corresponds to the portion of
the input in the |found| property.

For syntax errors produced at the end of the input, both |start| and |end|
refer to a character past the end of the input (meaning the span is 0
characters).

For syntax errors produced by calling the |expected| or |error| functions
in action code, the location info is the same as what the |location|
function would return.
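The span rules above can be sketched like this. The names computePosition and syntaxErrorLocation are hypothetical. The sketch treats only "\n" as a line break, which is a simplification, and it is not the PEG.js implementation.

```javascript
// Hypothetical helper: translate an offset into { offset, line, column },
// with 1-based lines and columns as in the example object above.
// Simplification: only "\n" counts as a line break.
function computePosition(input, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset; i++) {
    if (input[i] === "\n") {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { offset, line, column };
}

// A syntax error in the middle of the input spans 1 character; one at the
// end of the input spans 0 characters (start and end coincide).
function syntaxErrorLocation(input, offset) {
  const endOffset = offset < input.length ? offset + 1 : offset;
  return {
    start: computePosition(input, offset),
    end: computePosition(input, endOffset),
  };
}
```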
10 years ago
David Majda da57118a43 Implement basic support for tracing
Parsers can now be generated with support for tracing using the --trace
CLI option or a boolean |trace| option to |PEG.buildParser|. This makes
them trace their progress, which can be useful for debugging. Parsers
generated with tracing support are called "tracing parsers".

When a tracing parser executes, by default it traces the rules it enters
and exits by writing messages to the console. For example, a parser
built from this grammar:

  start = a / b
  a = "a"
  b = "b"

will write this to the console when parsing input "b":

  1:1 rule.enter start
  1:1 rule.enter   a
  1:1 rule.fail    a
  1:1 rule.enter   b
  1:2 rule.match   b
  1:2 rule.match start

You can customize tracing by passing a custom *tracer* to the parser's
|parse| method using the |tracer| option:

  parser.parse(input, { tracer: tracer });

This will replace the built-in default tracer (which writes to the
console) by the tracer you supplied.

The tracer must be an object with a |trace| method. This method is
called each time a tracing event happens. It takes one argument which is
an object describing the tracing event.

Currently, three events are supported:

  * rule.enter -- triggered when a rule is entered
  * rule.match -- triggered when a rule matches successfully
  * rule.fail  -- triggered when a rule fails to match

These events are triggered in nested pairs -- for each rule.enter event
there is a matching rule.match or rule.fail event.

The event object passed as an argument to |trace| contains these
properties:

  * type   -- event type
  * rule   -- name of the rule the event is related to
  * offset -- parse position at the time of the event
  * line   -- line at the time of the event
  * column -- column at the time of the event
  * result -- rule's match result (only for rule.match event)
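Here is a minimal custom tracer sketch. It relies only on the interface described above: an object with a |trace| method receiving event objects with |type|, |rule|, |line| and |column|. The name createIndentingTracer is hypothetical, and its output format only loosely imitates the default tracer.

```javascript
// Hypothetical tracer: records one indented line per event, using the fact
// that events arrive in nested rule.enter / (rule.match | rule.fail) pairs.
function createIndentingTracer() {
  const lines = [];
  let depth = 0;

  return {
    lines,
    trace(event) {
      // rule.match and rule.fail close the pair opened by rule.enter.
      if (event.type === "rule.match" || event.type === "rule.fail") {
        depth--;
      }
      const indent = "  ".repeat(depth);
      lines.push(`${event.line}:${event.column} ${indent}${event.type} ${event.rule}`);
      if (event.type === "rule.enter") {
        depth++;
      }
    },
  };
}
```

It would be passed to |parse| via the |tracer| option. Afterwards, its |lines| array holds one entry per event.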

The whole tracing API is somewhat experimental (which is why it isn't
documented properly yet) and I expect it will evolve over time as
experience is gained.

The default tracer is also somewhat bare-bones. I hope that the PEG.js
user community will develop more sophisticated tracers over time and I'll
be able to integrate their best ideas into the default tracer.
10 years ago
David Majda 95fd64ec15 .jshintrc: Add the "forin" option & fix fallout
Also added a few missing |hasOwnProperty| calls that JSHint didn't detect
because it only checks whether there is an |if| statement wrapping the
loop body.
11 years ago
David Majda f22d7aabb5 Fix JSHint errors in bin/pegjs
Fixes the following JSHint errors:

  bin/pegjs: line 66, col 14, 'extraOptions' used out of scope.
  bin/pegjs: line 70, col 19, 'extraOptions' used out of scope.
  bin/pegjs: line 71, col 20, 'extraOptions' used out of scope.
  bin/pegjs: line 80, col 10, Wrap the /regexp/ literal in parens to disambiguate the slash operator.
  bin/pegjs: line 128, col 43, Missing semicolon.
  bin/pegjs: line 128, col 45, Don't make functions within a loop.
  bin/pegjs: line 150, col 13, Redefinition of 'module'.
  bin/pegjs: line 217, col 34, Expected '===' and instead saw '=='.
  bin/pegjs: line 243, col 44, 'source' used out of scope.
  bin/pegjs: line 243, col 61, 'source' used out of scope.
11 years ago
David Majda 851681d663 Implement the --extra-options and --extra-options-file options
These are mainly useful to pass additional options to plugins.
12 years ago
David Majda d013016717 bin/pegjs: Fix help wrapping
All help text should be wrapped at column 80.
12 years ago
David Majda 2dc39bb779 bin/pegjs: Output just the parser source if --export-var is empty
This will make embedding generated parsers into other files easier.

Based on a patch by Glen Huang:

  https://github.com/dmajda/pegjs/pull/143
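The behavior above amounts to a simple conditional. The name formatParserOutput is hypothetical, and the exact output layout is an assumption rather than necessarily what bin/pegjs writes.

```javascript
// Hypothetical helper: assign the parser source to exportVar, or emit the
// bare source when exportVar is empty so it can be embedded elsewhere.
function formatParserOutput(parserSource, exportVar) {
  return exportVar !== ""
    ? exportVar + " = " + parserSource + ";\n"
    : parserSource;
}
```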
12 years ago
David Majda e1af175af8 Plugin API: Implement the --plugin option
Implements part of GH-106.
12 years ago
David Majda fe1ca481ab Code generator rewrite
This is a complete rewrite of the PEG.js code generator. Its goals are:

  1. Allow optimizing the generated parser code for code size as well as
     for parsing speed.

  2. Prepare the ground for future optimizations and big features (like
     incremental parsing).

  3. Replace the old template-based code-generation system with
     something more lightweight and flexible.

  4. General code cleanup (structure, style, variable names, ...).

New Architecture
----------------

The new code generator consists of two steps:

  * Bytecode generator -- produces bytecode for an abstract virtual
    machine

  * JavaScript generator -- produces JavaScript code based on the
    bytecode

The abstract virtual machine is stack-based. Originally I wanted to make
it register-based, but it turned out that all the code related to it
would be more complex and the bytecode itself would be longer (because
of explicit register specifications in instructions). The only downsides
of the stack-based approach seem to be a few small inefficiencies (see
e.g. the |NIP| instruction), which appear to be insignificant.

The new generator allows optimizing for parsing speed or code size (you
can choose between them using the |optimize| option of the
|PEG.buildParser| method or the --optimize/-o option on the command line).

When optimizing for size, the JavaScript generator emits the bytecode
together with its constant table and a generic bytecode interpreter.
Because the interpreter is small and the bytecode and constant table
grow only slowly with the size of the grammar, the resulting parser is
also small.

When optimizing for speed, the JavaScript generator just compiles the
bytecode into JavaScript. The generated code is relatively efficient, so
the resulting parser is fast.
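To illustrate the size-optimized mode, here is a toy stack-based bytecode interpreter in the spirit described above: bytecode plus a constant table, run by a generic loop. The opcodes, their encoding and the names run, OP and FAILED are hypothetical, and much simpler than the real PEG.js instruction set.

```javascript
// Toy stack machine (hypothetical opcodes, not the PEG.js instruction set).
const FAILED = {};

const OP = {
  MATCH_STRING: 0,    // MATCH_STRING c: push consts[c] and advance, or push FAILED
  JUMP_IF_MATCHED: 1, // JUMP_IF_MATCHED n: skip the next n slots if top !== FAILED
  POP: 2,             // POP: discard the top of the stack
};

function run(bytecode, consts, input) {
  const stack = [];
  let pos = 0;
  let ip = 0;

  while (ip < bytecode.length) {
    switch (bytecode[ip]) {
      case OP.MATCH_STRING: {
        const literal = consts[bytecode[ip + 1]];
        if (input.startsWith(literal, pos)) {
          stack.push(literal);
          pos += literal.length;
        } else {
          stack.push(FAILED);
        }
        ip += 2;
        break;
      }
      case OP.JUMP_IF_MATCHED:
        ip += 2 + (stack[stack.length - 1] !== FAILED ? bytecode[ip + 1] : 0);
        break;
      case OP.POP:
        stack.pop();
        ip += 1;
        break;
      default:
        throw new Error("Unknown opcode: " + bytecode[ip]);
    }
  }

  return { result: stack.pop(), pos };
}

// Constant table and bytecode for the choice expression "a" / "b".
const consts = ["a", "b"];
const bytecode = [
  OP.MATCH_STRING, 0,    // try "a"
  OP.JUMP_IF_MATCHED, 3, // on success, skip the 3 slots of the alternative
  OP.POP,                // drop the FAILED marker
  OP.MATCH_STRING, 1,    // try "b"
];
```

When optimizing for speed, the same bytecode would instead be compiled into straight-line JavaScript (roughly an if/else over the two literals). No interpreter loop then runs at parse time.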

Internal Identifiers
--------------------

As a small bonus, all internal identifiers visible to user code in the
initializer, actions and predicates are prefixed by |peg$|. This lowers
the chance that identifiers in user code will conflict with the ones
from PEG.js. It also makes using any internals in user code ugly, which
is a good thing. This solves GH-92.

Performance
-----------

The new code generator improved parsing speed and parser code size
significantly. The generated parsers are now:

  * 39% faster when optimizing for speed

  * 69% smaller when optimizing for size (without minification)

  * 31% smaller when optimizing for size (with minification)

(Parsing speed was measured using the |benchmark/run| script. Code size
was measured by generating parsers for examples in the |examples|
directory and adding up the file sizes. Minification was done by |uglify
--ascii| in version 1.3.4.)

Final Note
----------

This is just the beginning! The new code generator lays a foundation upon
which many optimizations and improvements can (and will) be made.

Stay tuned :-)
12 years ago
David Majda 3333cdd18d Position tracking: Kill the |trackLineAndColumn| option
Getting rid of the |trackLineAndColumn| option simplifies the code
generator (by unifying two paths in the code).

The |line| and |column| functions currently always compute all the
position info from scratch, which is horribly inefficient. This will be
improved in later commit(s).
12 years ago
David Majda 05a6bad989 Kill the |toSource| method, introduce the |output| option
Before this commit, |PEG.buildParser| always returned a parser object.
The only way to get its source code was to call the |toSource| method on
it. While this method worked for parsers produced by |PEG.buildParser|
directly, it didn't work for parsers instantiated by executing their
source code. In other words, it was unreliable.

This commit removes the |toSource| method on generated parsers and
introduces a new |output| option to |PEG.buildParser|. It allows callers
to specify whether they want to get back the parser object
(|options.output === "parser"|) or its source code (|options.output ===
"source"|). This is a much better and more reliable API.
12 years ago
David Majda 208cc33930 Allowed start rules must be specified explicitly
Before this commit, generated parsers were able to start parsing from any
rule. This was nice, but it made rule code inlining impossible.

Since this commit, the list of allowed start rules has to be specified
explicitly using the |allowedStartRules| option of the |PEG.buildParser|
method (or the --allowed-start-rule option on the command-line). These
rules will be excluded from inlining when it's implemented.
12 years ago
David Majda 8f71c07cec Implement the "--cache" command-line option 13 years ago
David Majda 58cc5b739d Implement "--track-line-and-column" command-line option 13 years ago
David Majda a0898388fb /bin/pegjs: Avoid calling |process.openStdin|
While |process.openStdin| is not officially deprecated, it's no longer
documented and just using |process.stdin| and resuming it seems to be
the official way.
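Below is a minimal sketch of that approach in Node.js. The name readStream is a hypothetical helper and not the bin/pegjs function.

```javascript
// Read an entire readable stream (e.g. process.stdin) as UTF-8 text and pass
// the result to `callback` once the stream ends.
function readStream(stream, callback) {
  let input = "";
  stream.setEncoding("utf8");
  stream.on("data", (chunk) => {
    input += chunk;
  });
  stream.on("end", () => {
    callback(input);
  });
  // Explicitly resume, as described above; in current Node.js versions,
  // attaching a "data" listener also starts the flow of data.
  stream.resume();
}

// Usage: readStream(process.stdin, (text) => { /* generate the parser */ });
```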
13 years ago
David Majda de256105eb /bin/pegjs: Don't close standard output
Avoids "Error: process.stdout cannot be closed" error when invoked
without file arguments.
13 years ago
David Majda fb5028eb90 Use |util| module instead of |sys|
|sys| emits a warning in Node.js 0.6.x.
13 years ago
David Majda c90e7f369b Fix regexp for detecting command-line options in /bin/pegjs
Closes GH-51.
13 years ago
David Majda dcf904c392 bin/pegjs: Default parser variable name is "module.exports"
The previous default name was "exports.parser". This meant that to use
the generated parser in Node.js, you had to use code like this:

  var parser = require("./my-cool-parser").parser;
  parser.parse(...);

Now you can shorten it a bit:

  var parser = require("./my-cool-parser");
  parser.parse(...);

The shorter version makes sense since no other objects except the parser
are exported from the module.
14 years ago
David Majda d5caaa7877 Nicer messages in command-line mode on read/write errors 14 years ago
David Majda 957b96c1b5 Add check for missing parameter of the -e/--export-var option. 14 years ago
David Majda d0c074e2f8 Small style fixes 14 years ago
David Majda 814ce7d9db Switch command-line mode backend from Rhino to Node 14 years ago
David Majda 4d68812b65 Fix usage description 14 years ago
David Majda 977d1d20c7 Fix wrong version reported by "bin/pegjs --version"
DRY: Now the version is stored only in the VERSION file.
14 years ago
David Majda a12a24fca1 Make parsers generated by /bin/pegjs CommonJS modules by default 14 years ago
David Majda e59f3ba338 Split the source code into several files, introduce build system
The source code is now in the src directory. The library needs to be
built using "rake", which creates the lib/peg.js file by combining the
source files.
14 years ago
David Majda 917cf1cf2a Start rule of the grammar is now implicitly its first rule
Before this change, the start rule was the one named "start" and there
was an option to override that. This is now impossible.

The goal of this change is to contain all information for the parser
generation in the grammar itself.

In the future, some override directive for the start rule (like Bison's
"%start") may be added to the grammar.
15 years ago
David Majda 81eced29b2 Whitespace fixes 15 years ago
David Majda 08635b658b Make bin/pegjs work when called via a symlink
A similar issue exists on Windows too (it has supported symlinks since
Vista), but I could not find how to dereference symlinks from batch files,
so I did not fix it. I guess this does not matter much given how rarely
symlinks are used in the Windows world.

Closes #1.
15 years ago
David Majda e63f64a3d5 Make the generated parsers standalone (no runtime is required).
This also speeds up the benchmark suite execution by 7.83 % on V8.

Detailed results (benchmark suite totals):

---------------------------------
 Test #     Before       After
---------------------------------
      1   26.17 kB/s   28.16 kB/s
      2   26.05 kB/s   28.16 kB/s
      3   25.99 kB/s   28.10 kB/s
      4   26.13 kB/s   28.11 kB/s
      5   26.14 kB/s   28.07 kB/s
---------------------------------
Average   26.10 kB/s   28.14 kB/s
---------------------------------

Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.7 Safari/533.2
15 years ago