pegjs

Commit Graph

Author	SHA1	Message	Date
David Majda	16f38f6380	Drop Node 0.10.x & 0.12.x support See #442.	8 years ago
David Majda	8003edafc9	Rename the "node" module format to "commonjs" Parsers generated in this format use module.exports, so they are not strictly CommonJS, but this is a common extension and the original name would be confusing once Node.js implements ES2015 modules.	8 years ago
David Majda	8962dcfd16	Rename the "global" module format to "globals" I think the new name is more widely used when describing the pattern.	8 years ago
David Majda	2a8544d86c	README.md: Remove io.js from the list of compatible environments	8 years ago
David Majda	75cd17ed58	bin/pegjs: Implement the --format option	8 years ago
David Majda	db9920e3ca	peg.generate: Implement { format: "global" }	8 years ago
David Majda	61c11ee1b4	peg.generate: Implement { format: "amd" }	8 years ago
David Majda	f633f697c9	peg.generate: Implement { format: "node" }	8 years ago
David Majda	e8be76ee3a	Don't expose the "parser" variable in parser code The "parser" variable allowed access to the parser object. Among other things, this made it possible to invoke the parser recursively using "parser.parse". One problem with the "parser" variable is that it bakes in the idea that the parser is an object, not a module. While this is true now, it won't necessarily be in the future, when parsers may be exported as ES6 modules. Also, people tend to use parsers as modules even today, e.g. like this: var parse = require("parser").parse; var result = parse(...); Such usage broke the "parser" variable (as it was implemented). For these reasons I decided to remove the "parser" variable. If someone needs to do tricks like recursive invocation of the parser, they must pass the parser or the "parse" function itself using options. Related to #433.	8 years ago
David Majda	c0e2bd218c	README.md: Describe the --optimize command-line option	8 years ago
David Majda	da1395e21e	README.md: Describe the --dependency command-line option	8 years ago
David Majda	6bf19ae6f8	README.md: Order command-line options alphabetically	8 years ago
David Majda	01aa32615b	README.md: Describe the "trace" peg.generate option	8 years ago
David Majda	f41535224d	README.md: Order peg.generate options alphabetically	8 years ago
David Majda	a57431955e	bin/pegjs: Use the -o/--output option to specify the output file This is a more traditional compiler interface. Its main advantage over specifying the output file as a second argument (which is what bin/pegjs used until now) is that input and output files can't be mixed up. Part of #370.	8 years ago
David Majda	6b60896216	Revert "Remove info about found string from syntax errors" This reverts commit `25ab98027d`. Part of work on #428.	9 years ago
David Majda	138405d89d	Add syntax highlighting to code blocks in README.md files	9 years ago
David Majda	45de51a881	Consistently talk about generating (not building) a parser	9 years ago
David Majda	f4504a93fe	Rename the "buildParser" function to "generate" In most places, we talk about "generating a parser", not "building a parser", which the function name should reflect. Also, mentioning a parser in the name is not necessary, as in the case of a parser generator it's pretty clear what is generated.	9 years ago
David Majda	0847a69643	Rename the "PEG" variable to "peg" So far, PEG.js was exported in a "PEG" global variable when no module loader was detected. The same variable name was also conventionally used when requiring it in Node.js or otherwise referring to it. This was reflected in various places in the code, documentation, examples, etc. This commit changes the variable name to "peg" and fixes all relevant occurrences. The main reason for the change is that in Node.js, modules are generally referred to by lower-case variable names, so "PEG" was sticking out when used in Node.js projects.	9 years ago
David Majda	810567d865	UMD parsers: Allow specifying parser dependencies Introduce two ways of specifying parser dependencies: the "dependencies" option of PEG.buildParser and the -d/--dependency CLI option. Specified dependencies are translated into AMD dependencies and Node.js's "require" calls when generating a UMD parser. Part of work on #362.	9 years ago
David Majda	a0a57cd22d	UMD parsers: Make bin/pegjs generate UMD parsers Part of work on #362.	9 years ago
David Majda	b87268ade6	UMD parsers: Allow generating parsers in UMD format from the API Introduce new "format" and "exportVar" options to PEG.buildParser which together allow generating parsers in UMD format. Part of work on #362.	9 years ago
David Majda	a89aa11779	README.md: Mention that AMD loader will be used in the browser	9 years ago
David Majda	ce44c62f14	Support passing custom location info to "error" and "expected" Based on a pull request by Konstantin (@YemSalat): https://github.com/pegjs/pegjs/pull/391 Resolves #390.	9 years ago
David Majda	4d85464ac4	README.md: Fix npm & Bower badges to show PEG.js version Based on a pull request by Daniel Baird (@DanielBaird): https://github.com/pegjs/pegjs/pull/419	9 years ago
David Majda	d56b43bb54	README.md: Add badges Based on a pull request by Adrien Becchis (@AdrieanKhisbe): https://github.com/pegjs/pegjs/pull/392	9 years ago
David Majda	25ab98027d	Remove info about found string from syntax errors The |found| property wasn't very useful as it mostly contained just one character or |null| (the exception being syntax errors triggered by |error| or |expected|). Similarly, the "but XXX found" part of the error message (based on the |found| property) wasn't very useful and was redundant in the presence of location info. For these reasons, this commit removes the |found| property and the corresponding part of the error message from syntax errors. It also modifies error location info slightly to cover a range of 0 characters, not 1 character (except when the error is triggered by |error| or |expected|). This corresponds more precisely to the actual situation. Fixes #372.	9 years ago
David Majda	4466265763	README.md: Remove link to Trello board Trello board was replaced by development roadmap in the wiki.	9 years ago
David Majda	6ff005786c	Talk about "consuming input", not "advancing parser position" It's shorter, less technical, and more understandable.	9 years ago
David Majda	091e60112c	Consistently use "matched text" to describe matched part of the input	9 years ago
David Majda	cee0d6a60a	README.md: Update the "Compatibility" section * Added io.js. * Added Edge. * Spelled out IE.	9 years ago
David Majda	122d7b0737	README.md: Mention there is no backtracking for *, +, and ? Based on a pull request by Jak Wings (@jakwings): https://github.com/pegjs/pegjs/pull/333	9 years ago
David Majda	065f4e1b75	Improve location info in syntax errors Replace the |line|, |column|, and |offset| properties of |SyntaxError| with the |location| property. It contains an object similar to the one returned by the |location| function available in action code: { start: { offset: 23, line: 5, column: 6 }, end: { offset: 25, line: 5, column: 8 } } For syntax errors produced in the middle of the input, |start| refers to the first unparsed character and |end| refers to the character after it (meaning the span is 1 character). This corresponds to the portion of the input in the |found| property. For syntax errors produced at the end of the input, both |start| and |end| refer to a character past the end of the input (meaning the span is 0 characters). For syntax errors produced by calling the |expected| or |error| functions in action code, the location info is the same as what the |location| function would return.	10 years ago
David Majda	4f7145e360	Improve location info available in action code Replace |line|, |column|, and |offset| functions with the |location| function. It returns an object like this: { start: { offset: 23, line: 5, column: 6 }, end: { offset: 25, line: 5, column: 8 } } In actions, |start| refers to the position at the beginning of action's expression and |end| refers to the position after the end of action's expression. This allows one to easily add location info e.g. to AST nodes created in actions. In predicates, both |start| and |end| refer to the current position. Fixes #246.	10 years ago
David Majda	da57118a43	Implement basic support for tracing Parsers can now be generated with support for tracing using the --trace CLI option or a boolean |trace| option to |PEG.buildParser|. This makes them trace their progress, which can be useful for debugging. Parsers generated with tracing support are called "tracing parsers". When a tracing parser executes, by default it traces the rules it enters and exits by writing messages to the console. For example, a parser built from this grammar: start = a / b a = "a" b = "b" will write this to the console when parsing input "b": 1:1 rule.enter start 1:1 rule.enter a 1:1 rule.fail a 1:1 rule.enter b 1:2 rule.match b 1:2 rule.match start You can customize tracing by passing a custom tracer to parser's |parse| method using the |tracer| option: parser.parse(input, { tracer: tracer }); This will replace the built-in default tracer (which writes to the console) by the tracer you supplied. The tracer must be an object with a |trace| method. This method is called each time a tracing event happens. It takes one argument which is an object describing the tracing event. Currently, three events are supported: * rule.enter -- triggered when a rule is entered * rule.match -- triggered when a rule matches successfully * rule.fail -- triggered when a rule fails to match These events are triggered in nested pairs -- for each rule.enter event there is a matching rule.match or rule.fail event. The event object passed as an argument to |trace| contains these properties: * type -- event type * rule -- name of the rule the event is related to * offset -- parse position at the time of the event * line -- line at the time of the event * column -- column at the time of the event * result -- rule's match result (only for rule.match event) The whole tracing API is somewhat experimental (which is why it isn't documented properly yet) and I expect it will evolve over time as experience is gained. The default tracer is also somewhat bare-bones. I hope that PEG.js user community will develop more sophisticated tracers over time and I'll be able to integrate their best ideas into the default tracer.	10 years ago
David Majda	cc8edd8892	README.md: Fix typo Based on a pull request by Julien Valéry: https://github.com/pegjs/website/pull/14	10 years ago
David Majda	fb7de36051	Update website URL PEG.js website was moved from http://pegjs.majda.cz/ to http://pegjs.org/.	10 years ago
David Majda	2dedce52d6	Add info about the Bower package maintainer	10 years ago
David Majda	9a822528f9	Add Bower installation instructions	10 years ago
David Majda	178d56699a	Update GitHub project URLs See https://groups.google.com/d/msg/pegjs/4a6zWKQSG6U/n8Pm257Lz6wJ. I didn't update CHANGELOG.md as I consider issue URLs there historical artifacts ;-)	10 years ago
David Majda	4a3b9cbb8d	Require Node.js >= 0.10.0 Travis CI builds with Node.js 0.8.x started to fail: https://travis-ci.org/dmajda/pegjs/jobs/26691570 Rather than investigating what's wrong I decided to stop supporting Node 0.8.x. Node.js 0.10.x has been available for over a year, which should be enough time for everyone to upgrade in the fast-paced Node.js world.	11 years ago
David Majda	5a02bca34d	Clarify initializer documentation Make it clear that there is only one initializer in the whole grammar. The previous formulation could have been understood to mean that there can be an initializer for every rule in the grammar. Fixes #82.	11 years ago
David Majda	39084496ca	Expose the parser object in action/predicate code The action/predicate code didn't have access to the parser object. This was mostly a side effect of actions/predicates being implemented as nested functions, in which |this| is a reference to the global object (an ugly JavaScript quirk). The initializer, being implemented differently, had access to the parser object via |this|, but this was not documented. Because having access to the parser object can be useful, this commit introduces a new |parser| variable which holds a reference to it, is visible in action/predicate/initializer code, and is properly documented. See also: https://groups.google.com/forum/#!topic/pegjs/Na7YWnz6Bmg	11 years ago
David Majda	a449f12efe	Require Node.js >= 0.8.0	11 years ago
David Majda	2f2152204a	Refine error handling further Before this commit, the |expected| and |error| functions didn't halt the parsing immediately, but triggered a regular match failure. After they were called, the parser could backtrack, try other branches, and only if no other branch succeeded, it triggered an exception with information possibly based on parameters passed to the |expected| or |error| function (this depended on positions where failures in other branches occurred). While nice in theory, this solution didn't work well in practice. There were at least two problems: 1. Action expression could have easily triggered a match failure later in the input than the action itself. This resulted in the action-triggered failure being shadowed by the expression-triggered one. Consider the following example: integer = digits:[0-9]+ { var result = parseInt(digits.join(""), 10); if (result % 2 === 0) { error("The number must be an odd integer."); return; } return result; } Given input "2", the |[0-9]+| expression would record a match failure at position 1 (an unsuccessful attempt to parse yet another digit after "2"). However, a failure triggered by the |error| call would occur at position 0. This problem could have been solved by silencing match failures in action expressions, but that would lead to severe performance problems (yes, I tried and measured). Other possible solutions are hacks which I didn't want to introduce into PEG.js. 2. Triggering a match failure in action code could have led to unexpected backtracking. Consider the following example: class = "[" (charRange / char)* "]" charRange = begin:char "-" end:char { if (begin.data.charCodeAt(0) > end.data.charCodeAt(0)) { error("Invalid character range: " + begin + "-" + end + "."); } // ... 
} char = [a-zA-Z0-9_\-] Given input "[b-a]", the |charRange| rule would fail, but the parser would try the |char| rule and succeed repeatedly, resulting in "b-a" being parsed as a sequence of three |char|'s, which it is not. This problem could have been solved by using negative predicates, but that would complicate the grammar and still wouldn't get rid of unintuitive behavior. Given these problems I decided to change the semantics of the |expected| and |error| functions. They don't interact with the regular match failure mechanism anymore, but they cause an immediate parse failure by throwing an exception. I think this is more intuitive behavior with less harmful side effects. The disadvantage of the new approach is that one can't backtrack from an action-triggered error. I don't see this as a big deal as I think this will be rarely needed and one can always use a semantic predicate as a workaround. Speed impact ------------ Before: 993.84 kB/s After: 998.05 kB/s Difference: 0.42% Size impact ----------- Before: 1019968 b After: 975434 b Difference: -4.37% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)	11 years ago
David Majda	5460a881af	Error handling: Implement the |error| function The |error| function allows users to report custom match failures inside actions. If the |error| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be exactly the one specified in the |error| call. Implements part of #198. Speed impact ------------ Before: 999.83 kB/s After: 1000.84 kB/s Difference: 0.10% Size impact ----------- Before: 1017212 b After: 1019968 b Difference: 0.27% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)	11 years ago
David Majda	af701dcf80	Error handling: Implement the |expected| function The |expected| function allows users to report regular match failures inside actions. If the |expected| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be in the usual "Expected ... but found ..." format with the description specified in the |expected| call used as part of the message. Implements part of #198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)	11 years ago
David Majda	1b2279e026	Error handling: Make predicates always return |undefined| After making the |?| operator return |null| instead of an empty string in the previous commit, empty strings were still returned from predicates. This didn't make much sense. Return value of a predicate is unimportant (if you have one in hand, you already know the predicate succeeded) and one could even argue that predicates shouldn't return any value at all. The closest thing to "return no value" in JavaScript is returning |undefined|, so I decided to make predicates return exactly that. Implements part of #198.	11 years ago
David Majda	86769a6c5c	Error handling: Make |?| return |null| on unsuccessful match Before this commit, the |?| operator returned an empty string upon unsuccessful match. This commit changes the returned value to |null|. It also updates the PEG.js grammar and the example grammars, which used the value returned by |?| quite often. Returning |null| is possible because it no longer indicates a match failure. I expect that this change will simplify many real-world grammars, as an empty string is almost never desirable as a return value (except some lexer-level rules) and it is often translated into |null| or some other value in action code. Implements part of #198.	11 years ago
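
The tracing API described in commit da57118a43 above can be illustrated with a small custom tracer. What follows is a minimal sketch, not PEG.js code: `makeCollectingTracer` is a hypothetical helper, and the events are replayed by hand from the example in that commit message rather than produced by a real parser. The `offset` and `result` values are assumptions, since the commit message shows only line and column for each event.

```javascript
// Hypothetical helper (not part of PEG.js): builds a tracer that records
// indented lines instead of writing to the console. The only thing the commit
// message requires of a tracer is a `trace(event)` method.
function makeCollectingTracer() {
  var lines = [];
  var depth = 0;

  return {
    lines: lines,
    trace: function (event) {
      // rule.match and rule.fail close the nesting opened by rule.enter.
      if (event.type !== "rule.enter") {
        depth--;
      }
      lines.push(
        new Array(depth + 1).join("  ") +
          event.line + ":" + event.column + " " + event.type + " " + event.rule
      );
      if (event.type === "rule.enter") {
        depth++;
      }
    }
  };
}

// Events replayed by hand for the grammar `start = a / b`, `a = "a"`,
// `b = "b"` and the input "b". Offsets and results are assumed values.
var events = [
  { type: "rule.enter", rule: "start", offset: 0, line: 1, column: 1 },
  { type: "rule.enter", rule: "a", offset: 0, line: 1, column: 1 },
  { type: "rule.fail", rule: "a", offset: 0, line: 1, column: 1 },
  { type: "rule.enter", rule: "b", offset: 0, line: 1, column: 1 },
  { type: "rule.match", rule: "b", offset: 1, line: 1, column: 2, result: "b" },
  { type: "rule.match", rule: "start", offset: 1, line: 1, column: 2, result: "b" }
];

var tracer = makeCollectingTracer();
events.forEach(function (event) {
  tracer.trace(event);
});

// Prints the six events, with the events for rules a and b indented one level.
console.log(tracer.lines.join("\n"));
```

With a real tracing parser, an object like this would presumably be handed to `parse` through the `tracer` option that the commit message names, replacing the default console tracer.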

96 Commits (5c40fff136ef9225a51685711ed884cfffeb2e4d)
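
Commits 4f7145e360 and 065f4e1b75 above describe a `location` object with `start` and `end` positions. As a rough sketch of how calling code might consume such an object, the two helpers below format it and compute its span. Both `formatLocation` and `spanLength` are hypothetical names introduced here for illustration, not part of PEG.js; the sample object is copied from the commit messages.

```javascript
// Hypothetical helper: renders a location object as "line:column-line:column".
function formatLocation(location) {
  var start = location.start;
  var end = location.end;
  return start.line + ":" + start.column + "-" + end.line + ":" + end.column;
}

// Hypothetical helper: number of characters covered by the location.
// Per the commit messages, this is 0 for errors at the end of the input.
function spanLength(location) {
  return location.end.offset - location.start.offset;
}

// Sample object taken verbatim from the commit messages.
var sample = {
  start: { offset: 23, line: 5, column: 6 },
  end: { offset: 25, line: 5, column: 8 }
};

console.log(formatLocation(sample)); // prints "5:6-5:8"
console.log(spanLength(sample)); // prints 2
```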