lexer.conditions[K].rules grows with each call to parse #52

Open
ericprud opened this issue Oct 6, 2019 · 7 comments

ericprud commented Oct 6, 2019

The lexer code in setInput switches the .rules to 1-indexed:

for (var k in conditions) {
  var spec = conditions[k];
  var rule_ids = spec.rules;
  var len = rule_ids.length;
  var rule_regexes = new Array(len + 1); // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple! 
  var rule_new_ids = new Array(len + 1);

  for (var i = 0; i < len; i++) {
    var idx = rule_ids[i];
    var rule_re = rules[idx];
    rule_regexes[i + 1] = rule_re;
    rule_new_ids[i + 1] = idx;
  }

  spec.rules = rule_new_ids;
  spec.__rule_regexes = rule_regexes;
  spec.__rule_count = len;
}

If you generate a parser and use it a second time, .rules ends up with an undefined at index [1].
"foo".match(undefined) matches (an undefined pattern is treated as the empty regex, which matches the empty string at index 0), so you end up in an endless loop where undefined matches but doesn't advance the cursor.
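The growth described above can be reproduced outside the generated parser. This is a minimal sketch, not the generated code itself: `unfoldConditions` is a hypothetical name wrapping the loop quoted from setInput, and the two-rule table is made up for illustration.

```javascript
// Hypothetical stand-in for the unfolding loop in setInput (the name is an assumption).
function unfoldConditions(rules, conditions) {
  for (const k in conditions) {
    const spec = conditions[k];
    const ruleIds = spec.rules;
    const len = ruleIds.length;
    const ruleRegexes = new Array(len + 1); // slot 0 is unused (1-based indexing)
    const ruleNewIds = new Array(len + 1);

    for (let i = 0; i < len; i++) {
      const idx = ruleIds[i];
      ruleRegexes[i + 1] = rules[idx];
      ruleNewIds[i + 1] = idx;
    }

    spec.rules = ruleNewIds;
    spec.__rule_regexes = ruleRegexes;
    spec.__rule_count = len;
  }
}

// Made-up rule table, for illustration only.
const rules = [/^a/, /^b/];
const conditions = { INITIAL: { rules: [0, 1] } };

unfoldConditions(rules, conditions);
const countAfterFirst = conditions.INITIAL.__rule_count;

// Second call: the already shifted .rules is shifted again.
unfoldConditions(rules, conditions);
const countAfterSecond = conditions.INITIAL.__rule_count;
const firstRegexAfterSecond = conditions.INITIAL.__rule_regexes[1];

// An undefined pattern behaves like the empty regex: it matches "" at index 0.
const m = "foo".match(firstRegexAfterSecond);
console.log(countAfterFirst, countAfterSecond, firstRegexAfterSecond, m[0].length, m.index);
```

The zero-length match at index 0 is what keeps the cursor from advancing.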

I found a terrible and a maybe-not-so-bad work-around.

terrible - ignore leading undefines

      for (var i = 1; i <= len; i++) {
        tempMatch = regexes[i] ? this._input.match(regexes[i]) : null;

pros: one line to change
cons: the arrays still grow by one undefined on each call

maybe-not-so-bad - s/len \+ 1/len/ and unshift a placeholder into the rules

        for (var k in conditions) {
          var spec = conditions[k];
          var rule_ids = spec.rules;
          var len = rule_ids.length;
          var rule_regexes = new Array(len);             // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple! 
          var rule_new_ids = new Array(len);

          for (var i = 0; i < len; i++) {
            var idx = rule_ids[i];
            var rule_re = rules[idx];
            rule_regexes[i] = rule_re;
            rule_new_ids[i] = idx;
          }

          spec.rules = rule_new_ids;
          spec.__rule_regexes = rule_regexes;
          spec.__rule_count = len;
        }

and in the constructor, prefix each rules block with a null placeholder:

Object.keys(lexer.conditions).forEach(c => lexer.conditions[c].rules.unshift(null))

I did this just after

    conditions: {
      'INITIAL': {
        rules: [...]

and was able to use the parser a zillion times (well, north of 2000, at least).
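A rough sketch of why this second work-around stops the growth: once the placeholder is in slot 0 and the loop no longer shifts by one, running the loop again reproduces the same arrays. `unfoldInPlace` is a hypothetical name for the modified loop, and the rule table is made up.

```javascript
// Hypothetical stand-in for the modified loop (no len + 1, no i + 1).
function unfoldInPlace(rules, conditions) {
  for (const k in conditions) {
    const spec = conditions[k];
    const ruleIds = spec.rules;
    const len = ruleIds.length;
    const ruleRegexes = new Array(len);
    const ruleNewIds = new Array(len);

    for (let i = 0; i < len; i++) {
      const idx = ruleIds[i];
      ruleRegexes[i] = rules[idx]; // rules[null] is undefined, so slot 0 stays empty
      ruleNewIds[i] = idx;
    }

    spec.rules = ruleNewIds;
    spec.__rule_regexes = ruleRegexes;
    spec.__rule_count = len;
  }
}

// Made-up rule table, for illustration only.
const rules2 = [/^a/, /^b/];
const conditions2 = { INITIAL: { rules: [0, 1] } };

// One-time prefix, as in the constructor line above.
Object.keys(conditions2).forEach(c => conditions2[c].rules.unshift(null));

for (let n = 0; n < 2000; n++) {
  unfoldInPlace(rules2, conditions2);
}

const spec2 = conditions2.INITIAL;
console.log(spec2.rules.length, spec2.__rule_regexes[1] === rules2[0], spec2.__rule_count);
```

Note that in this sketch `__rule_count` comes out as 3, because `len` now counts the placeholder slot as well; whether that matters depends on how the matching loop uses the count.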

@GerHobbelt GerHobbelt added the bug label Oct 27, 2019
@GerHobbelt
Owner

Good catch. Hm. Better to fix this in the code generator itself so that my (already hacky ;-) ) shift-by-1 in there is not needed any more.

@GerHobbelt GerHobbelt self-assigned this Oct 27, 2019
@ericprud
Author

ericprud commented Nov 3, 2019

What, you don't like my terrible solution?!

Yeah, I considered diving into the generator but thought I'd need your guidance in order to make it reasonably productive.

@ericprud
Author

/me politely pestering

@GerHobbelt
Owner

GerHobbelt commented Nov 26, 2019 via email

@ericprud
Author

here's another friendly pester in case the holidays bring a little time to pick through backlogs...

if you want someone to pair with, i won't be a ton of help at first but i'm keenly interested in the product.

@ericprud
Author

another pester to remind you that you have a fan base excited to use your parser

@GerHobbelt
Owner

This should be fixed in the upcoming 0.7.0 release, as the lexer kernel code has been fixed by checking if the lexer decompression work has already been done (the this.__decompressed flag...):

    setInput: function lexer_setInput(input, yy) {
        this.yy = yy || this.yy || {};

        // also check if we've fully initialized the lexer instance,
        // including expansion work to be done to go from a loaded
        // lexer to a usable lexer:
        if (!this.__decompressed) {
          // step 1: decompress the regex list:
            let rules = this.rules;
            for (var i = 0, len = rules.length; i < len; i++) {
                var rule_re = rules[i];

            // compression: is the RE an xref to another RE slot in the rules[] table?
                if (typeof rule_re === 'number') {
                    rules[i] = rules[rule_re];
                }
            }

          // step 2: unfold the conditions[] set to make these ready for use:
            let conditions = this.conditions;
            for (let k in conditions) {
                let spec = conditions[k];

                let rule_ids = spec.rules;

                var len = rule_ids.length;
                let rule_regexes = new Array(len + 1);            // slot 0 is unused; we use a 1-based index approach here to keep the hottest code in `lexer_next()` fast and simple!
                let rule_new_ids = new Array(len + 1);

                for (var i = 0; i < len; i++) {
                    let idx = rule_ids[i];
                    var rule_re = rules[idx];
                    rule_regexes[i + 1] = rule_re;
                    rule_new_ids[i + 1] = idx;
                }

                spec.rules = rule_new_ids;
                spec.__rule_regexes = rule_regexes;
                spec.__rule_count = len;
            }

            this.__decompressed = true;
        }
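The effect of the guard can be checked in isolation. The following is a stripped-down sketch, assuming only the two steps quoted above; `toyLexer` and its three-entry rule table (with one xref) are made up for illustration and are not the generated lexer.

```javascript
// Toy object mimicking only the guarded part of setInput (an assumption, not the real kernel).
const toyLexer = {
  rules: [/^a/, /^b/, 0], // the last entry is an xref to slot 0
  conditions: { INITIAL: { rules: [0, 1, 2] } },
  __decompressed: false,

  setInput(input) {
    this._input = input;

    if (!this.__decompressed) {
      // step 1: resolve numeric xrefs to the regex they point at
      const rules = this.rules;
      for (let i = 0; i < rules.length; i++) {
        if (typeof rules[i] === 'number') {
          rules[i] = rules[rules[i]];
        }
      }

      // step 2: unfold each condition into 1-based arrays
      for (const k in this.conditions) {
        const spec = this.conditions[k];
        const ruleIds = spec.rules;
        const len = ruleIds.length;
        const ruleRegexes = new Array(len + 1); // slot 0 is unused
        const ruleNewIds = new Array(len + 1);

        for (let i = 0; i < len; i++) {
          ruleRegexes[i + 1] = rules[ruleIds[i]];
          ruleNewIds[i + 1] = ruleIds[i];
        }

        spec.rules = ruleNewIds;
        spec.__rule_regexes = ruleRegexes;
        spec.__rule_count = len;
      }

      this.__decompressed = true;
    }
    return this;
  }
};

// Repeated calls only do the expansion once.
toyLexer.setInput("a");
toyLexer.setInput("b");
toyLexer.setInput("ab");

const toySpec = toyLexer.conditions.INITIAL;
console.log(toySpec.rules.length, toySpec.__rule_count);
```

Because the flag lives on the lexer instance, the expansion runs once per instance rather than once per call to parse.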
