pegjs/Makefile
David Majda 2f2152204a Refine error handling further
Before this commit, the |expected| and |error| functions didn't halt the
parsing immediately, but triggered a regular match failure. After they
were called, the parser could backtrack, try other branches, and only
if no other branch succeeded did it throw an exception, with information
possibly based on the parameters passed to the |expected| or |error|
function (this depended on the positions where failures in other
branches had occurred).

While nice in theory, this solution didn't work well in practice. There
were at least two problems:

  1. The expression an action is attached to could easily have triggered
     a match failure later in the input than the action itself. This
     resulted in the action-triggered failure being shadowed by the
     expression-triggered one.

     Consider the following example:

       integer = digits:[0-9]+ {
         var result = parseInt(digits.join(""), 10);

         if (result % 2 === 0) {
           error("The number must be an odd integer.");
           return;
         }

         return result;
       }

     Given input "2", the |[0-9]+| expression would record a match
     failure at position 1 (an unsuccessful attempt to parse yet another
     digit after "2"). However, a failure triggered by the |error| call
     would occur at position 0.

     This problem could have been solved by silencing match failures in
     action expressions, but that would lead to severe performance
     problems (yes, I tried and measured). Other possible solutions are
     hacks which I didn't want to introduce into PEG.js.

  2. Triggering a match failure in action code could have led to
     unexpected backtracking.

     Consider the following example:

       class = "[" (charRange / char)* "]"

       charRange = begin:char "-" end:char {
         if (begin.charCodeAt(0) > end.charCodeAt(0)) {
           error("Invalid character range: " + begin + "-" + end + ".");
         }

         // ...
       }

       char = [a-zA-Z0-9_\-]

     Given input "[b-a]", the |charRange| rule would fail, but the
     parser would then try the |char| rule and succeed repeatedly,
     resulting in "b-a" being parsed as a sequence of three |char|s,
     which it is not.

     This problem could have been solved by using negative predicates,
     but that would complicate the grammar and still wouldn't get rid of
     unintuitive behavior.
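The shadowing in problem 1 follows from how failures are reported: only
failures at the rightmost position reached are kept. A minimal sketch of
that bookkeeping, assuming this rightmost-failure rule, replays the first
example under the old semantics. |FailureTracker| and |parseInteger| are
hypothetical names for illustration, not part of PEG.js.

```javascript
// Hypothetical sketch: failures are reported only at the rightmost position
// reached, as assumed for PEG.js before this commit. Not PEG.js API.
class FailureTracker {
  constructor() {
    this.pos = -1;      // rightmost position where a failure was recorded
    this.messages = []; // failures recorded at exactly that position
  }

  fail(pos, message) {
    if (pos < this.pos) {
      return; // shadowed by a failure further to the right
    }
    if (pos > this.pos) {
      this.pos = pos; // a new rightmost failure discards the older ones
      this.messages = [];
    }
    this.messages.push(message);
  }
}

// Replays: integer = digits:[0-9]+ { ... error("The number must be an odd integer.") ... }
// with the old |error| modeled as an ordinary match failure at the action's start.
function parseInteger(input) {
  const tracker = new FailureTracker();
  let pos = 0;

  // [0-9]+ consumes digits, then records a failure where it tried to match one more.
  while (pos < input.length && /[0-9]/.test(input[pos])) {
    pos++;
  }
  tracker.fail(pos, "expected [0-9]");
  if (pos === 0) {
    return { ok: false, failure: tracker };
  }

  const result = parseInt(input.slice(0, pos), 10);
  if (result % 2 === 0) {
    tracker.fail(0, "The number must be an odd integer."); // old |error|
    return { ok: false, failure: tracker };
  }
  return { ok: true, value: result };
}

const outcome = parseInteger("2");
console.log(outcome.failure.pos, outcome.failure.messages.join("; "));
// prints: 1 expected [0-9]
```

With input "2" the tracker ends up holding only the [0-9] failure at
position 1; the odd-number message recorded at position 0 is discarded.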

Given these problems, I decided to change the semantics of the |expected|
and |error| functions. They no longer interact with the regular match
failure mechanism; instead, they cause an immediate parse failure by
throwing an exception. I think this is more intuitive behavior with fewer
harmful side effects.
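The difference between the two semantics can be sketched with a small
hand-written backtracking parser for the |class| grammar from problem 2.
This is a hypothetical illustration, not PEG.js-generated code:
|parseClass| and its |throwOnError| flag are made-up names, with
|throwOnError| = false modeling the old behavior (an ordinary match
failure) and true modeling the new one (an exception).

```javascript
// Hypothetical hand-written parser (not PEG.js output) for:
//   class     = "[" (charRange / char)* "]"
//   charRange = begin:char "-" end:char { error(...) if begin > end }
//   char      = [a-zA-Z0-9_\-]
// throwOnError = false models the old |error| (ordinary match failure),
// throwOnError = true models the new one (immediate exception).
function parseClass(input, throwOnError) {
  let pos = 0;

  function parseChar() {
    const ch = input[pos];
    if (ch !== undefined && /[a-zA-Z0-9_\-]/.test(ch)) {
      pos++;
      return ch;
    }
    return null; // ordinary match failure
  }

  function parseCharRange() {
    const start = pos;
    const begin = parseChar();
    if (begin !== null && input[pos] === "-") {
      pos++;
      const end = parseChar();
      if (end !== null) {
        if (begin.charCodeAt(0) <= end.charCodeAt(0)) {
          return begin + "-" + end;
        }
        if (throwOnError) {
          // New semantics: abort the whole parse, no backtracking past this.
          throw new Error("Invalid character range: " + begin + "-" + end + ".");
        }
        // Old semantics: treat it as a match failure and fall through.
      }
    }
    pos = start; // backtrack so the next alternative starts from scratch
    return null;
  }

  if (input[pos] !== "[") return null;
  pos++;

  const items = [];
  for (;;) {
    let item = parseCharRange(); // ordered choice: charRange / char
    if (item === null) item = parseChar();
    if (item === null) break;
    items.push(item);
  }

  if (input[pos] !== "]") return null;
  pos++;
  return pos === input.length ? items : null;
}

console.log(JSON.stringify(parseClass("[b-a]", false))); // prints: ["b","-","a"]
try {
  parseClass("[b-a]", true);
} catch (e) {
  console.log(e.message); // prints: Invalid character range: b-a.
}
```

Under the old semantics "[b-a]" is silently accepted as three |char|s;
under the new semantics the same input stops at the first invalid range.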

The disadvantage of the new approach is that one can't backtrack from an
action-triggered error. I don't see this as a big deal, as I think this
will rarely be needed, and one can always use a semantic predicate as a
workaround.

Speed impact
------------
Before:     993.84 kB/s
After:      998.05 kB/s
Difference: 0.42%

Size impact
-----------
Before:     1019968 b
After:      975434 b
Difference: -4.37%

(Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
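Both differences read as relative changes against the "Before" value.
Assuming that is how /tools/impact computes them (|percentChange| is a
hypothetical helper), the reported figures can be checked as follows:

```javascript
// Relative change in percent, assuming Difference = (after - before) / before.
function percentChange(before, after) {
  return ((after - before) / before) * 100;
}

const speed = percentChange(993.84, 998.05); // kB/s
const size = percentChange(1019968, 975434); // size, in the units reported above

console.log(speed.toFixed(2) + "%"); // prints: 0.42%
console.log(size.toFixed(2) + "%");  // prints: -4.37%
```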
2013-12-06 21:43:27 +01:00


# ===== Variables =====

PEGJS_VERSION = `cat $(VERSION_FILE)`

# ===== Modules =====

# Order matters -- dependencies must be listed before modules dependent on them.
MODULES = utils \
          grammar-error \
          parser \
          compiler/opcodes \
          compiler/passes/generate-bytecode \
          compiler/passes/generate-javascript \
          compiler/passes/remove-proxy-rules \
          compiler/passes/report-left-recursion \
          compiler/passes/report-missing-rules \
          compiler \
          peg

# ===== Directories =====

SRC_DIR = src
LIB_DIR = lib
BIN_DIR = bin
BROWSER_DIR = browser
SPEC_DIR = spec
BENCHMARK_DIR = benchmark
NODE_MODULES_DIR = node_modules
NODE_MODULES_BIN_DIR = $(NODE_MODULES_DIR)/.bin

# ===== Files =====

PARSER_SRC_FILE = $(SRC_DIR)/parser.pegjs
PARSER_OUT_FILE = $(LIB_DIR)/parser.js
BROWSER_FILE_DEV = $(BROWSER_DIR)/peg-$(PEGJS_VERSION).js
BROWSER_FILE_MIN = $(BROWSER_DIR)/peg-$(PEGJS_VERSION).min.js
VERSION_FILE = VERSION

# ===== Executables =====

JSHINT = $(NODE_MODULES_BIN_DIR)/jshint
UGLIFYJS = $(NODE_MODULES_BIN_DIR)/uglifyjs
JASMINE_NODE = $(NODE_MODULES_BIN_DIR)/jasmine-node
PEGJS = $(BIN_DIR)/pegjs
BENCHMARK_RUN = $(BENCHMARK_DIR)/run

# ===== Targets =====

# Default target
all: browser

# Generate the grammar parser
parser:
	$(PEGJS) $(PARSER_SRC_FILE) $(PARSER_OUT_FILE)

# Build the browser version of the library
browser:
	mkdir -p $(BROWSER_DIR)
	rm -f $(BROWSER_FILE_DEV)
	rm -f $(BROWSER_FILE_MIN)
	# The following code is inspired by CoffeeScript's Cakefile.
	echo '/*' >> $(BROWSER_FILE_DEV)
	echo " * PEG.js $(PEGJS_VERSION)" >> $(BROWSER_FILE_DEV)
	echo ' *' >> $(BROWSER_FILE_DEV)
	echo ' * http://pegjs.majda.cz/' >> $(BROWSER_FILE_DEV)
	echo ' *' >> $(BROWSER_FILE_DEV)
	echo ' * Copyright (c) 2010-2012 David Majda' >> $(BROWSER_FILE_DEV)
	echo ' * Licensed under the MIT license' >> $(BROWSER_FILE_DEV)
	echo ' */' >> $(BROWSER_FILE_DEV)
	echo 'var PEG = (function(undefined) {' >> $(BROWSER_FILE_DEV)
	echo ' var modules = {' >> $(BROWSER_FILE_DEV)
	echo ' define: function(name, factory) {' >> $(BROWSER_FILE_DEV)
	echo ' var dir = name.replace(/(^|\/)[^/]+$$/, "$$1"),' >> $(BROWSER_FILE_DEV)
	echo ' module = { exports: {} };' >> $(BROWSER_FILE_DEV)
	echo '' >> $(BROWSER_FILE_DEV)
	echo ' function require(path) {' >> $(BROWSER_FILE_DEV)
	echo ' var name = dir + path,' >> $(BROWSER_FILE_DEV)
	echo ' regexp = /[^\/]+\/\.\.\/|\.\//;' >> $(BROWSER_FILE_DEV)
	echo '' >> $(BROWSER_FILE_DEV)
	echo " /* Can't use /.../g because we can move backwards in the string. */" >> $(BROWSER_FILE_DEV)
	echo ' while (regexp.test(name)) {' >> $(BROWSER_FILE_DEV)
	echo ' name = name.replace(regexp, "");' >> $(BROWSER_FILE_DEV)
	echo ' }' >> $(BROWSER_FILE_DEV)
	echo '' >> $(BROWSER_FILE_DEV)
	echo ' return modules[name];' >> $(BROWSER_FILE_DEV)
	echo ' }' >> $(BROWSER_FILE_DEV)
	echo '' >> $(BROWSER_FILE_DEV)
	echo ' factory(module, require);' >> $(BROWSER_FILE_DEV)
	echo ' this[name] = module.exports;' >> $(BROWSER_FILE_DEV)
	echo ' }' >> $(BROWSER_FILE_DEV)
	echo ' };' >> $(BROWSER_FILE_DEV)
	echo '' >> $(BROWSER_FILE_DEV)
	for module in $(MODULES); do \
	  echo " modules.define(\"$$module\", function(module, require) {" >> $(BROWSER_FILE_DEV); \
	  sed -e 's/^/ /' lib/$$module.js >> $(BROWSER_FILE_DEV); \
	  echo ' });' >> $(BROWSER_FILE_DEV); \
	  echo '' >> $(BROWSER_FILE_DEV); \
	done
	echo ' return modules["peg"]' >> $(BROWSER_FILE_DEV)
	echo '})();' >> $(BROWSER_FILE_DEV)
	$(UGLIFYJS) --ascii -o $(BROWSER_FILE_MIN) $(BROWSER_FILE_DEV)

# Remove browser version of the library (created by "browser")
browserclean:
	rm -rf $(BROWSER_DIR)

# Run the spec suite
spec:
	$(JASMINE_NODE) --verbose $(SPEC_DIR)

# Run the benchmark suite
benchmark:
	$(BENCHMARK_RUN)

# Run JSHint on the source
hint:
	$(JSHINT) \
	  `find $(LIB_DIR) -name '*.js'` \
	  `find $(SPEC_DIR) -name '*.js' -and -not -path '$(SPEC_DIR)/vendor/*'` \
	  $(BENCHMARK_DIR)/*.js \
	  $(BENCHMARK_RUN) \
	  $(PEGJS)

.PHONY: all parser browser browserclean spec benchmark hint
.SILENT: all parser browser browserclean spec benchmark hint
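
The |browser| target echoes a tiny module loader into the generated file.
Its |require| resolves a relative path against the directory of the
requiring module and then repeatedly strips "segment/../" and "./"
pieces before looking the result up in |modules|. A sketch of just that
path logic, with the two regular expressions copied from the recipe
(after Make's $$ is unescaped to $); |dirOf| and |resolveModule| are
hypothetical names, not functions that exist in the generated file:

```javascript
// Path logic of the loader echoed by the "browser" target, with Make's "$$"
// already unescaped to "$". dirOf/resolveModule are hypothetical names.
function dirOf(name) {
  // "compiler/passes/generate-bytecode" -> "compiler/passes/"; "peg" -> ""
  return name.replace(/(^|\/)[^/]+$/, "$1");
}

function resolveModule(fromName, path) {
  const regexp = /[^\/]+\/\.\.\/|\.\//; // matches "segment/../" or "./"
  let name = dirOf(fromName) + path;

  // Remove one match at a time: a removal can expose a new "segment/../"
  // further left, which a single /.../g pass would not revisit.
  while (regexp.test(name)) {
    name = name.replace(regexp, "");
  }
  return name;
}

console.log(resolveModule("compiler/passes/generate-bytecode", "../opcodes"));
// prints: compiler/opcodes
```

The loop matters for paths like "a/b/../../c": removing "b/../" first
exposes "a/../", which only a second, separate replacement removes.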