Thursday, December 27, 2018

My tokenizer rule on filenames

The tokenizer recognizes end of statement as either a semicolon or a newline. After end of statement, filenames are first class. If an equals sign or left parenthesis is encountered, then the set of filename characters changes and filenames will be split along the forward slash character. I can force Default to live with that. So:

Load /home/so/xchars;     // works fine
Load = /home/so/;         // fails; attempts the divide
Load = "/home/so ";       // works fine
Load (/home/so);          // fails; filename treated as argument expression

./a.out;               // Works in default mode, console will pipe to execl if all else fails

In the last case, both the piping and the execl fallback can be turned off.
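
Here is a minimal sketch of the filename rule in Python, just to pin the behavior down. It is not the actual tokenizer: the function name `tokenize` and the token layout are made up for illustration, comments are not handled, and quoted text is taken verbatim with no escape processing. At the start of each statement a slash is an ordinary filename character; once an equals sign or left parenthesis shows up, the slash becomes its own token until the statement ends.

```python
def tokenize(text):
    """Split text into statements; each statement is a list of tokens.

    Hypothetical sketch: filenames are first class until '=' or '(' is
    seen, after which '/' and ')' are split out as separate tokens.
    """
    statements, tokens, word = [], [], ""
    in_expression = False  # reset at every end of statement

    def flush_word():
        # Move the word being built (if any) onto the token list.
        nonlocal word
        if word:
            tokens.append(word)
            word = ""

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Quoted text is one token in either mode.
            flush_word()
            end = text.find('"', i + 1)
            if end == -1:
                end = len(text)  # unterminated quote: take the rest
            tokens.append(text[i + 1:end])
            i = end + 1
            continue
        if ch in ";\n":
            # End of statement: back to filename mode.
            flush_word()
            if tokens:
                statements.append(tokens)
                tokens = []
            in_expression = False
        elif ch in " \t":
            flush_word()
        elif ch in "=(":
            # Start of an expression: filename characters change.
            flush_word()
            tokens.append(ch)
            in_expression = True
        elif in_expression and ch in "/)":
            flush_word()
            tokens.append(ch)
        else:
            word += ch
        i += 1

    flush_word()
    if tokens:
        statements.append(tokens)
    return statements
```

Feeding it `Load /home/so/xchars;` keeps the path whole, while `Load = /home/so/;` comes back with the slashes split out, which is the "attempts the divide" case above.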

In addition, the tokenizer should recognize standard escape sequences inside quoted text, unless told not to.
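
A rough sketch of what that could look like, in Python. The names `unescape`, `ESCAPES` and the `decode` flag are hypothetical, and the escape table is just the common C-style subset, which is an assumption on my part rather than a settled list. Unknown escapes are passed through untouched.

```python
# Assumed subset of C-style escapes; the real list may differ.
ESCAPES = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "0": "\0",
    "\\": "\\",
    '"': '"',
    "'": "'",
}


def unescape(quoted, decode=True):
    """Decode backslash escapes in the body of a quoted token.

    With decode=False the text is returned untouched, which models the
    "unless told not to" switch.
    """
    if not decode:
        return quoted
    out = []
    i = 0
    while i < len(quoted):
        ch = quoted[i]
        if ch == "\\" and i + 1 < len(quoted):
            nxt = quoted[i + 1]
            # Unknown escapes are kept verbatim, backslash included.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash.
            out.append(ch)
            i += 1
    return "".join(out)
```

So `unescape(r"a\tb")` yields an actual tab between the letters, and `unescape(r"a\tb", decode=False)` hands back the four characters as typed.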

Our architecture has created the need for a smart tokenizer with command-line token rules good for both the syntax engine and the Linux file format. All the syntax engines should agree on what ends a statement and what starts an expression inside a statement. The tokenizer makes a quick check on the mode, which is default at the start, then duplicates code to minimize steps. There should be no speed impact. Otherwise, the syntax engine needs to configure the tokenizer. Default uses the tokenizer's default mode.

I am spending the day working on just the tokenizer, and not for the first time. Pull it out and blast it with text strings until all the possibilities work. I expect another 29 to 50 lines of code on the tokenizer, if I count what we need to set the console token controls. It is going to be a tad slower with an extra check per char.
