53 Commits (9446e07f6acba905481e2e71c0a37774e479be70)

Author SHA1 Message Date
David Majda 9446e07f6a Use uglify-js 2.7.0 8 years ago
David Majda a8d01e1834 Use eslint 3.2.0
Note the update required disabling the "no-control-regex" rule, which
started reporting following errors:

  /Users/dmajda/Programming/PEG.js/pegjs/lib/compiler/js.js
    22:16  error  Unexpected control character in regular expression  no-control-regex
    27:16  error  Unexpected control character in regular expression  no-control-regex
    28:16  error  Unexpected control character in regular expression  no-control-regex
    51:16  error  Unexpected control character in regular expression  no-control-regex
    52:16  error  Unexpected control character in regular expression  no-control-regex

  /Users/dmajda/Programming/PEG.js/pegjs/lib/parser.js
    76:16  error  Unexpected control character in regular expression  no-control-regex
    77:16  error  Unexpected control character in regular expression  no-control-regex
    90:16  error  Unexpected control character in regular expression  no-control-regex
    91:16  error  Unexpected control character in regular expression  no-control-regex
8 years ago
David Majda e6c3e7180f Use browserify 13.1.0 8 years ago
David Majda 67d85f2de8 Align names of compiler passes that detect infinite loops
Rename compiler passes as follows:

  reportLeftRecursion -> reportInfiniteRecursion
  reportInfiniteLoops -> reportInfiniteRepetition

This reflects the fact that both passes detect different ways of causing
the same problem (possible infinite loop when parsing).
8 years ago
David Majda 9717dc3417 Talk about "undefined rules" instead of "missing rules" internally
The new terminology is more precise and in line with commonly used
programming languages.

The change involves mainly renaming related compiler pass and files
associated with it.
8 years ago
David Majda eb67cbedb4 Report duplicate labels as errors
Resolves #270.
9 years ago
David Majda f9bc0fe619 package.json: Sort files alphabetically 9 years ago
David Majda eb5875bc6a Report duplicate rule definitions as errors
Based on a pull request by Futago-za Ryuu (@futagoza):

  https://github.com/pegjs/pegjs/pull/329

Resolves #318.
9 years ago
David Majda 930877c3ba Move lib/compiler.js to lib/compiler/index.js
This makes "compiler" a regular Node.js module.
9 years ago
David Majda c50ad15461 Use http-server to serve specs and benchmarks to the browser
Previously, the instructions recommended using Python's SimpleHTTPServer
module, which created unnecessary dependency on Python.
9 years ago
David Majda 768ece28e6 Use ESLint instead of JSHint
Implement the swap and change various directives in the source code. The
"make hint" target becomes "make lint".

The change leads to quite some errors being reported by ESLint. These
will be fixed in subsequent commits.

Note the configuration enables just the recommended rules. Later I plan
to enable more rules to enforce the coding standard. The configuration
also sets the environment to "node", which is far from ideal as the
codebase contains a mix of CommonJS, Node.js and browser code. I hope to
clean this up at some point.
9 years ago
David Majda 3a80933430 package.json: Update the "engine" field
* Use just 0.10 instead of 0.10.0 (it's shorter).

  * Remove space after >= (it is apparently conventionally not used in
    the npm world).
9 years ago
David Majda 3796cbfbad package.json: Reformat to formatting used by npm
This makes it possible to do e.g. "npm install --save" without a lot of
whitespace noise in the resulting diff.
9 years ago
Joseph Frazier 0d8c045823 Use browserify for building the browser version
This resolves https://github.com/pegjs/pegjs/issues/373 and,
since `browserify` produces a UMD bundle (due to `--standalone PEG`),
addresses the first part of https://github.com/pegjs/pegjs/issues/362,

> 1. Making PEG.js itself UMD module.

This also adds a MAIN_FILE variable to the Makefile, as specified by
7fe3aeb999 (commitcomment-13817973)
9 years ago
David Majda efb420479e Tweak package.json 9 years ago
David Majda 48fe4d6580 Rename generate-javascript.js to generate-js.js
Short & sweet.
9 years ago
David Majda 575883586f Rename javascript.js to js.js
Short & sweet.
9 years ago
David Majda 20a4fb2e7f Update version to 0.9.0 9 years ago
David Majda ae407572e6 Use jshint 2.8.0 9 years ago
David Majda 9cea4502f0 Use uglify-js 2.4.24 9 years ago
David Majda 698dba299a Use jasmine-node 1.14.5 9 years ago
David Majda 9eb62051be package.json: Add the "license" attribute
Based on a pull request by Peter deHaan (@pdehaan):

  https://github.com/pegjs/pegjs/pull/338
9 years ago
Arlo Breault c71f723b3f Run `make hint` before testing 10 years ago
David Majda d7fc0b5c3b Implement infinite loop detection
Fixes #26.
10 years ago
David Majda fb7de36051 Update website URL
PEG.js website was moved from http://pegjs.majda.cz/ to http://pegjs.org/.
10 years ago
David Majda 178d56699a Update GitHub project URLs
See https://groups.google.com/d/msg/pegjs/4a6zWKQSG6U/n8Pm257Lz6wJ.

I didn't update CHANGELOG.md as I consider issue URLs there historical artifacts
;-)
10 years ago
David Majda 4a3b9cbb8d Require Node.js >= 0.10.0
Travis CI builds with Node.js 0.8.x started to fail:

  https://travis-ci.org/dmajda/pegjs/jobs/26691570

Rather than investigating what's wrong I decided to stop supporting Node
0.8.x. Node.js 0.10.x is here for over a year, which should be enough
time for everyone to upgrade in the fast-paced Node.js world.
11 years ago
David Majda 5adad3ae12 Utility functions cleanup: Split lib/utils.js
Split lib/utils.js into multiple files. Some of the functions were
generic, these were moved into files in lib/utils. Other funtions were
specific for the compiler, these were moved to files in lib/compiler.

This commit only moves functions around -- there is no renaming and
cleanup performed. Both will come later.
11 years ago
David Majda 2263a30034 Update version to 0.8.0 11 years ago
David Majda 4fe0167a70 Convert CHANGELOG to Markdown
* Convert CHANGELOG to Markdown.
  * Improve formatting a bit.
  * Add links to GitHub issues
  * Fix typos.
11 years ago
David Majda a449f12efe Require Node.js >= 0.8.0 11 years ago
David Majda b3c6a997b0 Use JSHint 2.3.0 11 years ago
David Majda 1ea9a5f340 Use UglifyJS 2.4.7
The |uglifyjs| call had to be adapted because the options changed
significantly in version 2.
11 years ago
David Majda 2f2152204a Refine error handling further
Before this commit, the |expected| and |error| functions didn't halt the
parsing immediately, but triggered a regular match failure. After they
were called, the parser could backtrack, try another branches, and only
if no other branch succeeded, it triggered an exception with information
possibly based on parameters passed to the |expected| or |error|
function (this depended on positions where failures in other branches
have occurred).

While nice in theory, this solution didn't work well in practice. There
were at least two problems:

  1. Action expression could have easily triggered a match failure later
     in the input than the action itself. This resulted in the
     action-triggered failure to be shadowed by the expression-triggered
     one.

     Consider the following example:

       integer = digits:[0-9]+ {
         var result = parseInt(digits.join(""), 10);

         if (result % 2 === 0) {
           error("The number must be an odd integer.");
           return;
         }

         return result;
       }

     Given input "2", the |[0-9]+| expression would record a match
     failure at position 1 (an unsuccessful attempt to parse yet another
     digit after "2"). However, a failure triggered by the |error| call
     would occur at position 0.

     This problem could have been solved by silencing match failures in
     action expressions, but that would lead to severe performance
     problems (yes, I tried and measured). Other possible solutions are
     hacks which I didn't want to introduce into PEG.js.

  2. Triggering a match failure in action code could have lead to
     unexpected backtracking.

     Consider the following example:

       class = "[" (charRange / char)* "]"

       charRange = begin:char "-" end:char {
         if (begin.data.charCodeAt(0) > end.data.charCodeAt(0)) {
           error("Invalid character range: " + begin + "-" + end + ".");
         }

         // ...
       }

       char = [a-zA-Z0-9_\-]

     Given input "[b-a]", the |charRange| rule would fail, but the
     parser would try the |char| rule and succeed repeatedly, resulting
     in "b-a" being parsed as a sequence of three |char|'s, which it is
     not.

     This problem could have been solved by using negative predicates,
     but that would complicate the grammar and still wouldn't get rid of
     unintuitive behavior.

Given these problems I decided to change the semantics of the |expected|
and |error| functions. They don't interact with regular match failure
mechanism anymore, but they cause and immediate parse failure by
throwing an exception. I think this is more intuitive behavior with less
harmful side effects.

The disadvantage of the new approach is that one can't backtrack from an
action-triggered error. I don't see this as a big deal as I think this
will be rarely needed and one can always use a semantic predicate as a
workaround.

Speed impact
------------
Before:     993.84 kB/s
After:      998.05 kB/s
Difference: 0.42%

Size impact
-----------
Before:     1019968 b
After:      975434 b
Difference: -4.37%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
11 years ago
David Majda af701dcf80 Error handling: Implement the |expected| function
The |expected| function allows users to report regular match failures
inside actions.

If the |expected| function is called, and the reported match failure
turns out to be the cause of a parse error, the error message reported
by the parser will be in the usual "Expected ... but found ..." format
with the description specified in the |expected| call used as part of
the message.

Implements part of #198.

Speed impact
------------
Before:     1146.82 kB/s
After:      1031.25 kB/s
Difference: -10.08%

Size impact
-----------
Before:     950817 b
After:      973269 b
Difference: 2.36%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
11 years ago
Andrei Neculau 7dc9a9ae76 Upgrade jasmine and jasmine-node 11 years ago
David Majda 3b3798fa39 Merge lib/compiler/passes.js into lib/compiler.js
It didn't make sense to have the passes in a separate file.
12 years ago
David Majda fe1ca481ab Code generator rewrite
This is a complete rewrite of the PEG.js code generator. Its goals are:

  1. Allow optimizing the generated parser code for code size as well as
     for parsing speed.

  2. Prepare ground for future optimizations and big features (like
     incremental parsing).

  2. Replace the old template-based code-generation system with
     something more lightweight and flexible.

  4. General code cleanup (structure, style, variable names, ...).

New Architecture
----------------

The new code generator consists of two steps:

  * Bytecode generator -- produces bytecode for an abstract virtual
    machine

  * JavaScript generator -- produces JavaScript code based on the
    bytecode

The abstract virtual machine is stack-based. Originally I wanted to make
it register-based, but it turned out that all the code related to it
would be more complex and the bytecode itself would be longer (because
of explicit register specifications in instructions). The only downsides
of the stack-based approach seem to be few small inefficiencies (see
e.g. the |NIP| instruction), which seem to be insignificant.

The new generator allows optimizing for parsing speed or code size (you
can choose using the |optimize| option of the |PEG.buildParser| method
or the --optimize/-o option on the command-line).

When optimizing for size, the JavaScript generator emits the bytecode
together with its constant table and a generic bytecode interpreter.
Because the interpreter is small and the bytecode and constant table
grow only slowly with size of the grammar, the resulting parser is also
small.

When optimizing for speed, the JavaScript generator just compiles the
bytecode into JavaScript. The generated code is relatively efficient, so
the resulting parser is fast.

Internal Identifiers
--------------------

As a small bonus, all internal identifiers visible to user code in the
initializer, actions and predicates are prefixed by |peg$|. This lowers
the chance that identifiers in user code will conflict with the ones
from PEG.js. It also makes using any internals in user code ugly, which
is a good thing. This solves GH-92.

Performance
-----------

The new code generator improved parsing speed and parser code size
significantly. The generated parsers are now:

  * 39% faster when optimizing for speed

  * 69% smaller when optimizing for size (without minification)

  * 31% smaller when optimizing for size (with minification)

(Parsing speed was measured using the |benchmark/run| script. Code size
was measured by generating parsers for examples in the |examples|
directory and adding up the file sizes. Minification was done by |uglify
--ascii| in version 1.3.4.)

Final Note
----------

This is just a beginning! The new code generator lays a foundation upon
which many optimizations and improvements can (and will) be made.

Stay tuned :-)
12 years ago
David Majda dd2216da7e Fix versions of development dependencies
This ensures stable environment for development, CI, browser builds,
etc.
12 years ago
David Majda 32e372be92 package.json: Formatting 12 years ago
David Majda 0519d7e3ce Git repo npmization: Make the repo a npm package
Includes:

  * Moving the source code from /src to /lib.
  * Adding an explicit file list to package.json
  * Updating the Makefile.
  * Updating the spec and benchmark suites and their READMEs.

Part of a fix for GH-32.
12 years ago
David Majda a2672e0b48 Make "npm test" work
This is will be useful for Travis CI integration
12 years ago
David Majda adfeb87c82 Do not preprecess package.json
Before this commit, package.json in the project root directory was
preprocessed in order to insert correct version into it. This made it
invalid JSON and thus unusable for npm purposes.

This commit makes package.json a valid JSON by hardcoding the version
into it. I think that introducing this small duplicity is outweighted by
being able to use npm in project root directory. For example, it is now
possible to make the "npm test" command work and introduce Travis CI
integration.
12 years ago
David Majda c27b96051a Jasmine: Initial infrastructure
This is the first of many commits that gradually convert PEG.js's test
suite from QUnit to Jasmine, cleaning it up on the way.

Main reason for the change is that Jasmine allows nested contexts,
allowing to structure the tests in a better way than QUnit. Moreover,
the tests needed to be cleaned up a bit.
13 years ago
David Majda bc5abfef5c Replace Jakefile with Makefile
Doing scripting tasks in JavaScript is painful.
13 years ago
David Majda fa1523b651 Update version of Node.js and development dependencies in package.json
The new versions are the ones I test with.
13 years ago
David Majda c7f99019c2 Add "jake hint" task that checks all javaScript files using JSHint
This currently outputs many issues. These will be fixed in subsequent
commits.
13 years ago
David Majda bafb8655f7 Clean up package.json
The engine's and dependencies' versions are the ones I've tested with.
Lower version will probably work too, but I don't want to spend more
time testing now so I'll play it safe.
14 years ago
David Majda 69044e9d0b Add "dist" Jakefile task that prepares the distribution files 14 years ago
David Majda aca15d6f36 Change Node.js pacakge name to from "peg" to "pegjs"
The only place where we use the name without "js" is the library
filename (peg.js) and consequently the module name (PEG).
14 years ago