25 Commits (64eb5faf54c679058f44f2ff9cf7d81dca9f6af1)

Author SHA1 Message Date
David Majda 2263a30034 Update version to 0.8.0 11 years ago
David Majda 4fe0167a70 Convert CHANGELOG to Markdown
* Convert CHANGELOG to Markdown.
  * Improve formatting a bit.
  * Add links to GitHub issues.
  * Fix typos.
11 years ago
David Majda a449f12efe Require Node.js >= 0.8.0 11 years ago
David Majda b3c6a997b0 Use JSHint 2.3.0 11 years ago
David Majda 1ea9a5f340 Use UglifyJS 2.4.7
The |uglifyjs| call had to be adapted because the options changed
significantly in version 2.
11 years ago
David Majda 2f2152204a Refine error handling further
Before this commit, the |expected| and |error| functions didn't halt the
parsing immediately, but triggered a regular match failure. After they
were called, the parser could backtrack, try other branches, and only
if no other branch succeeded, it triggered an exception with information
possibly based on parameters passed to the |expected| or |error|
function (this depended on positions where failures in other branches
have occurred).

While nice in theory, this solution didn't work well in practice. There
were at least two problems:

  1. The expression an action is attached to could easily trigger a match
     failure later in the input than the action itself. This resulted in
     the action-triggered failure being shadowed by the
     expression-triggered one.

     Consider the following example:

       integer = digits:[0-9]+ {
         var result = parseInt(digits.join(""), 10);

         if (result % 2 === 0) {
           error("The number must be an odd integer.");
           return;
         }

         return result;
       }

     Given input "2", the |[0-9]+| expression would record a match
     failure at position 1 (an unsuccessful attempt to parse yet another
     digit after "2"). However, a failure triggered by the |error| call
     would occur at position 0.

     This problem could have been solved by silencing match failures in
     action expressions, but that would lead to severe performance
     problems (yes, I tried and measured). Other possible solutions are
     hacks which I didn't want to introduce into PEG.js.

  2. Triggering a match failure in action code could have led to
     unexpected backtracking.

     Consider the following example:

       class = "[" (charRange / char)* "]"

       charRange = begin:char "-" end:char {
         if (begin.data.charCodeAt(0) > end.data.charCodeAt(0)) {
           error("Invalid character range: " + begin + "-" + end + ".");
         }

         // ...
       }

       char = [a-zA-Z0-9_\-]

     Given input "[b-a]", the |charRange| rule would fail, but the
     parser would try the |char| rule and succeed repeatedly, resulting
     in "b-a" being parsed as a sequence of three |char|'s, which it is
     not.

     This problem could have been solved by using negative predicates,
     but that would complicate the grammar and still wouldn't get rid of
     unintuitive behavior.

Given these problems, I decided to change the semantics of the |expected|
and |error| functions. They don't interact with the regular match failure
mechanism anymore, but they cause an immediate parse failure by
throwing an exception. I think this is more intuitive behavior with fewer
harmful side effects.
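
The difference between the two semantics can be sketched with a small
hand-written model of the |class| grammar above. This is only a toy under
stated assumptions: the function names, the FAILED sentinel and the
|throwOnError| flag are hypothetical and are not PEG.js's generated code.

```javascript
// Toy recursive-descent model of the grammar above (hypothetical names).
const FAILED = Symbol("failed");

// char = [a-zA-Z0-9_\-]
function parseChar(input, pos) {
  return /^[a-zA-Z0-9_\-]$/.test(input.charAt(pos))
    ? { value: input.charAt(pos), pos: pos + 1 }
    : FAILED;
}

// charRange = begin:char "-" end:char { ... }
function parseCharRange(input, pos, throwOnError) {
  const begin = parseChar(input, pos);
  if (begin === FAILED || input.charAt(begin.pos) !== "-") return FAILED;
  const end = parseChar(input, begin.pos + 1);
  if (end === FAILED) return FAILED;
  if (begin.value.charCodeAt(0) > end.value.charCodeAt(0)) {
    const message =
      "Invalid character range: " + begin.value + "-" + end.value + ".";
    if (throwOnError) throw new Error(message); // new semantics: halt at once
    return FAILED; // old semantics: an ordinary match failure
  }
  return { value: [begin.value, end.value], pos: end.pos };
}

// class = "[" (charRange / char)* "]"
function parseClass(input, throwOnError) {
  if (input.charAt(0) !== "[") return FAILED;
  const parts = [];
  let pos = 1;
  for (;;) {
    let result = parseCharRange(input, pos, throwOnError);
    if (result === FAILED) result = parseChar(input, pos); // ordered choice
    if (result === FAILED) break;
    parts.push(result.value);
    pos = result.pos;
  }
  return input.charAt(pos) === "]" && pos + 1 === input.length ? parts : FAILED;
}
```

With the old semantics, parseClass("[b-a]", false) backtracks into |char|
and returns three separate characters. With the new semantics,
parseClass("[b-a]", true) throws as soon as the invalid range is seen.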

The disadvantage of the new approach is that one can't backtrack from an
action-triggered error. I don't see this as a big deal, as I think this
will rarely be needed and one can always use a semantic predicate as a
workaround.

Speed impact
------------
Before:     993.84 kB/s
After:      998.05 kB/s
Difference: 0.42%

Size impact
-----------
Before:     1019968 b
After:      975434 b
Difference: -4.37%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
11 years ago
David Majda af701dcf80 Error handling: Implement the |expected| function
The |expected| function allows users to report regular match failures
inside actions.

If the |expected| function is called, and the reported match failure
turns out to be the cause of a parse error, the error message reported
by the parser will be in the usual "Expected ... but found ..." format
with the description specified in the |expected| call used as part of
the message.
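
As a rough sketch of the idea, the model below records match failures by
position and builds the message from the descriptions recorded at the
farthest position. The names |createFailureTracker| and |buildMessage| are
hypothetical, and the exact message wording is an assumption modeled on the
format quoted above, not a copy of PEG.js's output.

```javascript
// Hypothetical helper: joins descriptions into an "Expected ..." message.
function buildMessage(expectedDescriptions, found) {
  const descs = expectedDescriptions.slice().sort();
  const expectedText = descs.length > 1
    ? descs.slice(0, -1).join(", ") + " or " + descs[descs.length - 1]
    : descs[0];
  const foundText = found === null ? "end of input" : JSON.stringify(found);
  return "Expected " + expectedText + " but found " + foundText + ".";
}

// Hypothetical tracker: keeps only the failures at the farthest position.
function createFailureTracker() {
  let maxPos = -1;
  let expected = [];
  return {
    // Models one match failure; a call such as expected("odd integer") in an
    // action is assumed to record a failure at the action's position.
    fail(pos, description) {
      if (pos < maxPos) return;
      if (pos > maxPos) {
        maxPos = pos;
        expected = [];
      }
      if (!expected.includes(description)) expected.push(description);
    },
    message(input) {
      const found = maxPos < input.length ? input.charAt(maxPos) : null;
      return buildMessage(expected, found);
    }
  };
}

const tracker = createFailureTracker();
tracker.fail(0, "odd integer");
const report = tracker.message("2");
```

In this model a failure recorded farther in the input replaces the earlier
ones, so the description passed to |expected| only appears in the message
when its failure is the one at the farthest position.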

Implements part of #198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
11 years ago
Andrei Neculau 7dc9a9ae76 Upgrade jasmine and jasmine-node 11 years ago
David Majda 3b3798fa39 Merge lib/compiler/passes.js into lib/compiler.js
It didn't make sense to have the passes in a separate file.
12 years ago
David Majda fe1ca481ab Code generator rewrite
This is a complete rewrite of the PEG.js code generator. Its goals are:

  1. Allow optimizing the generated parser code for code size as well as
     for parsing speed.

  2. Prepare ground for future optimizations and big features (like
     incremental parsing).

  3. Replace the old template-based code-generation system with
     something more lightweight and flexible.

  4. General code cleanup (structure, style, variable names, ...).

New Architecture
----------------

The new code generator consists of two steps:

  * Bytecode generator -- produces bytecode for an abstract virtual
    machine

  * JavaScript generator -- produces JavaScript code based on the
    bytecode

The abstract virtual machine is stack-based. Originally I wanted to make
it register-based, but it turned out that all the code related to it
would be more complex and the bytecode itself would be longer (because
of explicit register specifications in instructions). The only downsides
of the stack-based approach seem to be a few small inefficiencies (see
e.g. the |NIP| instruction), which seem to be insignificant.

The new generator allows optimizing for parsing speed or code size (you
can choose using the |optimize| option of the |PEG.buildParser| method
or the --optimize/-o option on the command-line).

When optimizing for size, the JavaScript generator emits the bytecode
together with its constant table and a generic bytecode interpreter.
Because the interpreter is small and the bytecode and constant table
grow only slowly with size of the grammar, the resulting parser is also
small.

When optimizing for speed, the JavaScript generator just compiles the
bytecode into JavaScript. The generated code is relatively efficient, so
the resulting parser is fast.
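
The two modes can be illustrated with a toy stack machine. Everything here
is an assumption made for illustration: the opcodes, the constant table and
the function names are hypothetical and much simpler than the real PEG.js
bytecode. Only the |NIP| behavior (drop the element below the top of the
stack) follows the usual stack-machine meaning of that name.

```javascript
// Hypothetical opcodes; the real instruction set differs.
const op = { PUSH: 0, NIP: 1, CONCAT: 2 };

// Constant table plus a flat bytecode array that indexes into it.
const consts = ["a", "b", "c"];
const bytecode = [op.PUSH, 0, op.PUSH, 1, op.NIP, op.PUSH, 2, op.CONCAT];

// Size-oriented mode: ship the bytecode with one generic interpreter.
function interpret(bc, constants) {
  const stack = [];
  let ip = 0;
  while (ip < bc.length) {
    switch (bc[ip]) {
      case op.PUSH:
        stack.push(constants[bc[ip + 1]]);
        ip += 2;
        break;
      case op.NIP:
        stack.splice(-2, 1); // remove the element below the top
        ip += 1;
        break;
      case op.CONCAT: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        ip += 1;
        break;
      }
      default:
        throw new Error("Invalid opcode: " + bc[ip]);
    }
  }
  return stack;
}

// Speed-oriented mode: translate each instruction into plain JavaScript.
function compile(bc, constants) {
  const lines = ["var stack = [];"];
  let ip = 0;
  while (ip < bc.length) {
    switch (bc[ip]) {
      case op.PUSH:
        lines.push("stack.push(" + JSON.stringify(constants[bc[ip + 1]]) + ");");
        ip += 2;
        break;
      case op.NIP:
        lines.push("stack.splice(-2, 1);");
        ip += 1;
        break;
      case op.CONCAT:
        lines.push("stack.push(stack.splice(-2, 2).join(\"\"));");
        ip += 1;
        break;
      default:
        throw new Error("Invalid opcode: " + bc[ip]);
    }
  }
  lines.push("return stack;");
  return new Function(lines.join("\n"));
}
```

Both interpret(bytecode, consts) and compile(bytecode, consts)() leave the
same single value on the stack. The interpreter is shared by all programs,
while the compiled function has no dispatch loop at run time.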

Internal Identifiers
--------------------

As a small bonus, all internal identifiers visible to user code in the
initializer, actions and predicates are prefixed by |peg$|. This lowers
the chance that identifiers in user code will conflict with the ones
from PEG.js. It also makes using any internals in user code ugly, which
is a good thing. This solves GH-92.
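
A minimal sketch of the kind of clash the prefix avoids, assuming a toy
generator that pastes user initializer code into the same scope as one
internal variable. The |buildToyParser| function and the |peg$pos| name are
hypothetical and only stand in for the real generated code.

```javascript
// Hypothetical generator: user initializer code shares a scope with one
// parser-internal variable, whose name is chosen by the caller.
function buildToyParser(internalName, initializerCode) {
  const source = [
    "var " + internalName + " = 0;", // parser-internal state
    initializerCode,                 // user code, inserted verbatim
    "return " + internalName + ";"
  ].join("\n");
  return new Function(source);
}

// The user declares a variable without knowing about parser internals.
const userInitializer = "var pos = 'user value';";

// Unprefixed internal name: the user's declaration overwrites it.
const clobbered = buildToyParser("pos", userInitializer)();

// Prefixed internal name: the user's variable lives alongside it.
const intact = buildToyParser("peg$pos", userInitializer)();
```

Here |clobbered| ends up holding the user's string, while |intact| keeps
the internal value.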

Performance
-----------

The new code generator improved parsing speed and parser code size
significantly. The generated parsers are now:

  * 39% faster when optimizing for speed

  * 69% smaller when optimizing for size (without minification)

  * 31% smaller when optimizing for size (with minification)

(Parsing speed was measured using the |benchmark/run| script. Code size
was measured by generating parsers for examples in the |examples|
directory and adding up the file sizes. Minification was done by |uglify
--ascii| in version 1.3.4.)

Final Note
----------

This is just a beginning! The new code generator lays a foundation upon
which many optimizations and improvements can (and will) be made.

Stay tuned :-)
12 years ago
David Majda dd2216da7e Fix versions of development dependencies
This ensures a stable environment for development, CI, browser builds,
etc.
12 years ago
David Majda 32e372be92 package.json: Formatting 12 years ago
David Majda 0519d7e3ce Git repo npmization: Make the repo a npm package
Includes:

  * Moving the source code from /src to /lib.
  * Adding an explicit file list to package.json.
  * Updating the Makefile.
  * Updating the spec and benchmark suites and their READMEs.

Part of a fix for GH-32.
12 years ago
David Majda a2672e0b48 Make "npm test" work
This will be useful for Travis CI integration.
12 years ago
David Majda adfeb87c82 Do not preprocess package.json
Before this commit, package.json in the project root directory was
preprocessed in order to insert correct version into it. This made it
invalid JSON and thus unusable for npm purposes.

This commit makes package.json valid JSON by hardcoding the version
into it. I think that introducing this small duplication is outweighed by
being able to use npm in the project root directory. For example, it is now
possible to make the "npm test" command work and introduce Travis CI
integration.
12 years ago
David Majda c27b96051a Jasmine: Initial infrastructure
This is the first of many commits that gradually convert PEG.js's test
suite from QUnit to Jasmine, cleaning it up on the way.

The main reason for the change is that Jasmine allows nested contexts,
which make it possible to structure the tests better than in QUnit.
Moreover, the tests needed to be cleaned up a bit.
13 years ago
David Majda bc5abfef5c Replace Jakefile with Makefile
Doing scripting tasks in JavaScript is painful.
13 years ago
David Majda fa1523b651 Update version of Node.js and development dependencies in package.json
The new versions are the ones I test with.
13 years ago
David Majda c7f99019c2 Add "jake hint" task that checks all JavaScript files using JSHint
This currently outputs many issues. These will be fixed in subsequent
commits.
13 years ago
David Majda bafb8655f7 Clean up package.json
The engine's and dependencies' versions are the ones I've tested with.
Lower versions will probably work too, but I don't want to spend more
time testing now so I'll play it safe.
14 years ago
David Majda 69044e9d0b Add "dist" Jakefile task that prepares the distribution files 14 years ago
David Majda aca15d6f36 Change Node.js package name from "peg" to "pegjs"
The only place where we use the name without "js" is the library
filename (peg.js) and consequently the module name (PEG).
14 years ago
David Majda db32ff2d0d Change version to 0.6.0pre 14 years ago
David Majda 1e57bf778d Require Node.js 0.4 or higher
This is not strictly necessary now, but I won't test PEG.js with lower
versions, so I can't guarantee correct functionality.
14 years ago
David Majda 595d3adb82 Add package.json for installing as Node package 14 years ago