pegjs

Commit Graph

Author	SHA1	Message	Date
David Majda	fe1ca481ab	Code generator rewrite This is a complete rewrite of the PEG.js code generator. Its goals are: 1. Allow optimizing the generated parser code for code size as well as for parsing speed. 2. Prepare ground for future optimizations and big features (like incremental parsing). 3. Replace the old template-based code-generation system with something more lightweight and flexible. 4. General code cleanup (structure, style, variable names, ...). New Architecture ---------------- The new code generator consists of two steps: * Bytecode generator -- produces bytecode for an abstract virtual machine * JavaScript generator -- produces JavaScript code based on the bytecode The abstract virtual machine is stack-based. Originally I wanted to make it register-based, but it turned out that all the code related to it would be more complex and the bytecode itself would be longer (because of explicit register specifications in instructions). The only downsides of the stack-based approach seem to be a few small inefficiencies (see e.g. the \|NIP\| instruction), which seem to be insignificant. The new generator allows optimizing for parsing speed or code size (you can choose using the \|optimize\| option of the \|PEG.buildParser\| method or the --optimize/-o option on the command-line). When optimizing for size, the JavaScript generator emits the bytecode together with its constant table and a generic bytecode interpreter. Because the interpreter is small and the bytecode and constant table grow only slowly with the size of the grammar, the resulting parser is also small. When optimizing for speed, the JavaScript generator just compiles the bytecode into JavaScript. The generated code is relatively efficient, so the resulting parser is fast. Internal Identifiers -------------------- As a small bonus, all internal identifiers visible to user code in the initializer, actions and predicates are prefixed by \|peg$\|. 
This lowers the chance that identifiers in user code will conflict with the ones from PEG.js. It also makes using any internals in user code ugly, which is a good thing. This solves GH-92. Performance ----------- The new code generator improved parsing speed and parser code size significantly. The generated parsers are now: * 39% faster when optimizing for speed * 69% smaller when optimizing for size (without minification) * 31% smaller when optimizing for size (with minification) (Parsing speed was measured using the \|benchmark/run\| script. Code size was measured by generating parsers for examples in the \|examples\| directory and adding up the file sizes. Minification was done by \|uglify --ascii\| in version 1.3.4.) Final Note ---------- This is just a beginning! The new code generator lays a foundation upon which many optimizations and improvements can (and will) be made. Stay tuned :-)	12 years ago
David Majda	bea6b1fde7	Implement the \|text\| function When called inside an action, the \|text\| function returns the text matched by the action's expression. It can also be called inside an initializer or a predicate, where it returns an empty string. The \|text\| function will be useful mainly in cases where one needs a structured representation of the input and simultaneously the raw text. Until now, the only way to get the raw text in these cases was to painfully build it from the structured representation. Fixes GH-131.	12 years ago
David Majda	cab6521690	Test \|offset\|, \|line\| and \|column\| in the initializer Add a test verifying that the \|offset\|, \|line\| and \|column\| functions are visible and properly initialized inside the initializer. See GH-132.	12 years ago
David Majda	c54483bb17	Text nodes: Use text nodes in examples/javascript.pegjs	12 years ago
David Majda	faaf9b6be1	Text nodes: Use text nodes in examples/css.pegjs	12 years ago
David Majda	d0dfe46550	Text nodes: Use text nodes in examples/json.pegjs	12 years ago
David Majda	9ec6b6aa57	Text nodes: Use text nodes in examples/arithmetics.pegjs	12 years ago
David Majda	f0a6bc92cc	Text nodes: Use text nodes in PEG.js grammar	12 years ago
David Majda	5e146fce38	Text nodes: Implement text nodes Implement a new syntax to extract matched strings from expressions. For example, instead of: identifier = first:[a-zA-Z_] rest:[a-zA-Z0-9_]* { return first + rest.join(""); } you can now just write: identifier = $([a-zA-Z_] [a-zA-Z0-9_]*) This is useful mostly for "lexical" rules at the bottom of many grammars. Note that structured match results are still built for the expressions prefixed by "$", they are just ignored. I plan to optimize this later (sometime after the code generator rewrite).	12 years ago
David Majda	af20f024c7	Text nodes: Disallow the "$" character in identifiers The "$" character will mark text nodes in the future.	12 years ago
David Majda	4e46a6e46e	Rebuild src/parser.js (forgotten in the previous commit)	12 years ago
David Majda	28860e88df	Position tracking: Cache position info computed by \|line\| and \|column\| Cache the last reported position info. If the position advances, the code uses the cache and only computes the difference. If the position goes back, the cache is simply dropped.	12 years ago
David Majda	3333cdd18d	Position tracking: Kill the \|trackLineAndColumn\| option Getting rid of the \|trackLineAndColumn\| simplifies the code generator (by unifying two paths in the code). The \|line\| and \|column\| functions currently always compute all the position info from scratch, which is horribly ineffective. This will be improved in later commit(s).	12 years ago
David Majda	da8c455640	Position tracking: Make \|offset\|, \|line\| and \|column\| functions This will allow computing position data lazily and getting rid of the \|trackLineAndColumn\| option without affecting performance of generated parsers that don't use position data.	12 years ago
David Majda	da9ab1bf17	Remove "make build" from tools/impact There is no "build" target anymore. This was forgotten in `0519d7e3ce`.	12 years ago
David Majda	203243b884	README.md: Add link to the Trello board	12 years ago
David Majda	bc9a2528ef	Add backslash forgotten in the previous commit	12 years ago
David Majda	1988110a28	Fix code generated for classes starting with "\^" Before this commit, incorrect regexps were produced for classes starting with "\^". For example, this grammar: start = [\^a] didn't match "a" because the generated regexp inside the parser was /^[^a]/, not /^[\^a]/ as it should be. This commit fixes the issue by escaping "^" in \|quoteForRegexpClass\|. Fixes GH-125.	12 years ago
David Majda	ff819cc579	Fix whitespace	12 years ago
David Majda	05a6bad989	Kill the \|toSource\| method, introduce the \|output\| option Before this commit, \|PEG.buildParser\| always returned a parser object. The only way to get its source code was to call the \|toSource\| method on it. While this method worked for parsers produced by \|PEG.buildParser\| directly, it didn't work for parsers instantiated by executing their source code. In other words, it was unreliable. This commit removes the \|toSource\| method on generated parsers and introduces a new \|output\| option to \|PEG.buildParser\|. It allows callers to specify whether they want to get back the parser object (\|options.output === "parser"\|) or its source code (\|options.output === "source"\|). This is a much better and more reliable API.	12 years ago
David Majda	3629d880d3	Make sure the \|options\| param passed to passes is always an object Pass code can be simpler as a result.	12 years ago
David Majda	ee1a0b5810	Add compiled examples to .gitignore Based on patch by Pavel Lang (GH-96).	12 years ago
David Majda	dd2216da7e	Fix versions of development dependencies This ensures stable environment for development, CI, browser builds, etc.	12 years ago
David Majda	51e126882b	Assume development dependencies are installed locally This is compatible with what "npm install" does and allows for isolated development environment.	12 years ago
David Majda	32e372be92	package.json: Formatting	12 years ago
David Majda	0519d7e3ce	Git repo npmization: Make the repo a npm package Includes: * Moving the source code from /src to /lib. * Adding an explicit file list to package.json * Updating the Makefile. * Updating the spec and benchmark suites and their READMEs. Part of a fix for GH-32.	12 years ago
David Majda	4cda79951a	Git repo npmization: Compose PEG.js from Node.js modules PEG.js source code becomes a set of Node.js modules that include each other as needed. The distribution version is built by bundling these modules together, wrapping them inside a bit of boilerplate code that makes \|module.exports\| and \|require\| work. Part of a fix for GH-32.	12 years ago
David Majda	c6cf129635	Git repo npmization: Do not use @VERSION Once the Git repository is an npm package, there will be no preprocessing step and thus no @VERSION substitution. Let's get rid of it. Part of a fix for GH-32.	12 years ago
David Majda	d742ca5dc6	Makefile: Small reordering Define \|PEGJS_VERSION\| before it is used. While defining it after its first use was OK technically, it made the code a tiny bit harder to read.	12 years ago
David Majda	a7584fa878	Rebuild src/parser.js (forgotten in the previous commit)	12 years ago
David Majda	277fb23411	Setup prototype chain for \|SyntaxError\| in generated parsers correctly	12 years ago
David Majda	143924357b	Setup prototype chain for \|PEG.GrammarError\| correctly	12 years ago
David Majda	428fe294cf	Change \|PEG.GrammarError\| name Change the value of the \|name\| property of \|PEG.GrammarError\| instances from "PEG.GrammarError" to just "GrammarError". This better reflects the fact that PEG.js can get required under a different name than "PEG".	12 years ago
David Majda	12398ada9a	Implement Travis CI integration	12 years ago
David Majda	a2672e0b48	Make "npm test" work This will be useful for Travis CI integration	12 years ago
David Majda	adfeb87c82	Do not preprocess package.json Before this commit, package.json in the project root directory was preprocessed in order to insert the correct version into it. This made it invalid JSON and thus unusable for npm purposes. This commit makes package.json valid JSON by hardcoding the version into it. I think that introducing this small duplication is outweighed by being able to use npm in the project root directory. For example, it is now possible to make the "npm test" command work and introduce Travis CI integration.	12 years ago
David Majda	b1db42e1b4	Merge pull request #115 from fpirsch/patch-1 Changed "arguments" to "args" in a few places.	12 years ago
David Majda	df1ecb1313	Fix typo found by Almad also in the generator	12 years ago
David Majda	710bee256a	Merge pull request #113 from Almad/master Grammar typo	12 years ago
David Majda	e5e9ce2778	README.md: Wrap lines at column 80	12 years ago
David Majda	406ac0a288	Fix banner typo	12 years ago
fpirsch	fa05142292	Update examples/javascript.pegjs Changed "arguments" to "args" in several places to avoid shadowing "arguments", which is not allowed by Google Closure Compiler.	12 years ago
Almad	030ac3d6f9	Grammar typo	12 years ago
David Majda	208cc33930	Allowed start rules must be specified explicitly Before this commit, generated parsers were able to start parsing from any rule. This was nice, but it made rule code inlining impossible. Since this commit, the list of allowed start rules has to be specified explicitly using the \|allowedStartRules\| option of the \|PEG.buildParser\| method (or the --allowed-start-rule option on the command-line). These rules will be excluded from inlining when it's implemented.	12 years ago
David Majda	6a1ec7631f	Do not modify \|options\| passed to \|PEG.buildParser\| Modifying \|options\| can lead to subtle bugs.	12 years ago
David Majda	75a78c083c	Fix typo in testcase description	12 years ago
David Majda	e97c501072	README.md: Add wiki link	12 years ago
David Majda	edb547958e	README.md: Fix project website link	12 years ago
David Majda	a4df483159	s/Modelled/Modeled/ "modelled" is a British variant, "modeled" a US one. PEG.js officially uses American English. Based on pull request by John Gietzen: https://github.com/dmajda/pegjs/pull/102	12 years ago
David Majda	98ff2eb83f	Allow passing options to the parser This commit replaces the \|startRule\| parameter of the \|parse\| method in generated parsers with more generic \|options\| -- an options object. This options object can be used to pass custom options to the parser because it is visible as the \|options\| variable inside parser code. The start rule can now be specified as the \|startRule\| option. This means you have to replace all calls like: parser.parse("input", "myStartRule"); with parser.parse("input", { startRule: "myStartRule" }); Closes GH-37.	12 years ago
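
The character-class bug described in commit 1988110a28 can be reproduced with plain JavaScript regexps. The snippet below is only a minimal sketch of the idea: the helper names `escapeForRegexpClass` and `buildClassRegexp` are hypothetical and are not taken from the PEG.js source (the commit itself refers to `quoteForRegexpClass`, whose actual implementation is not shown in this log).

```javascript
// Hypothetical helper (an assumption, not the PEG.js code): escape the
// characters that have a special meaning inside a regexp character class.
function escapeForRegexpClass(chars) {
  return chars.replace(/[\\\]\^\-]/g, "\\$&");
}

// Hypothetical helper: build an anchored regexp matching one character
// out of the given set, the way a generated parser might test a class.
function buildClassRegexp(chars) {
  return new RegExp("^[" + escapeForRegexpClass(chars) + "]");
}

// Without escaping, the leading "^" negates the class: /^[^a]/
const unescaped = new RegExp("^[" + "^a" + "]");

// With escaping, "^" is an ordinary member of the class: /^[\^a]/
const escaped = buildClassRegexp("^a");

console.log(unescaped.test("a")); // false: the class means "anything but a"
console.log(escaped.test("a"));   // true
console.log(escaped.test("^"));   // true
```

The unescaped pattern silently inverts the meaning of the class, which is why the grammar `start = [\^a]` failed to match "a" before the fix.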


505 Commits (fe1ca481abc7ee5a499a26eed226f06c9c2024d5) All Branches Search