Wednesday, December 26, 2018

Tokenizer and filename!!

The new bug introduced by the new Default architecture. The console loop really needs to prohibit certain chars from names, but that list grows large. So what about the dots and dashes in file names? Hmmm... Now I am back to pondering.

Some things I know: everyone will want the dot as part of a name, file or variable, no problem there. Separating the dot is optional, and the default is that dots are parts of names. I would prefer leaving slashes out of the picture; and I hate quotes, especially when filenames are considered self-quoting.

Still pondering. On the command line, file names are first class objects and don't need quoting. In the syntax engines they are always quoted. So in default mode, Default would treat:
ls /home
as ls divided by home, and attempt that. IPython has similar problems, trying to embed a syntax into the command line. I am thinking of a smart tokenizer, one that knows when to tokenize the filename and when not to.
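To make the problem concrete, here is a minimal sketch of what a plain syntax-style tokenizer does to that line. The regular expression and the name `naive_tokenize` are my own stand-ins for illustration, not the actual Default tokenizer.

```python
import re

# Hypothetical rule: a token is either a run of name characters
# or one single non-space operator character.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")

def naive_tokenize(line):
    """Split a line into name tokens and one-character operator tokens."""
    return TOKEN_RE.findall(line)

print(naive_tokenize("ls /home"))  # ['ls', '/', 'home']
```

The slash comes out as its own token, which is exactly how the line ends up read as a division.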

Tokenize rule

All the new apps will follow the standard tokenizer rule on this; they want their arguments in neat linear arrays. Syntax engines and general purpose tokenizers can cooperate, via tokenizer control words and some common sense.

This violation I remember well:

gcc -fmax-errors=4 ./myfile.c

That causes an equals sign to appear before the file name in a sequence, too difficult for the tokenizer. Note the dash is another problem. The distinguisher is the equals sign, if both engines and snippets could at least start there: leave out the equals unless you really mean it.

The other healer is the syntax mode the console loop is in, because syntax engines can load tokenizer controls, or turn the tokenizer off. But if the syntax engines shut the tokenizer off, then we get a host of complaints from snippet authors. We need an intelligent method so filenames are always first class, without conflict.

A problem, not a show stopper, and we can attack it little by little as it becomes a problem. Default says the tokenizer can use the equals flip flop. Tokenizer recognizes start of statement and will, upon seeing the equals, begin tokenizing filenames until end of statement. Tokenizer agrees; the two bots met and hashed it out. That plus whatever tokenizer controls we can make available.
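A rough sketch of how the equals flip flop might behave, under my own assumptions: one call handles one statement, words are separated by whitespace, and the function name `flipflop_tokenize` and the operator rule are hypothetical, not the real Default code.

```python
import re

# Hypothetical syntax-mode rule: names versus single operator characters.
SYNTAX_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")

def flipflop_tokenize(statement):
    """Tokenize one statement; the flip flop resets at start of statement."""
    tokens = []
    syntax_mode = False  # off: dots, dashes and slashes stay inside names
    for word in statement.split():
        if syntax_mode:
            # After the equals: break names apart at operator characters.
            tokens.extend(SYNTAX_RE.findall(word))
        elif "=" in word:
            # First equals in the statement flips the tokenizer on.
            head, _, tail = word.partition("=")
            if head:
                tokens.append(head)
            tokens.append("=")
            syntax_mode = True
            tokens.extend(SYNTAX_RE.findall(tail))
        else:
            # Before any equals: the whole word is a first class name.
            tokens.append(word)
    return tokens

print(flipflop_tokenize("ls /home"))   # ['ls', '/home']
print(flipflop_tokenize("x = a/b"))    # ['x', '=', 'a', '/', 'b']
print(flipflop_tokenize("gcc -fmax-errors=4 ./myfile.c"))
```

The first two lines behave. The gcc line still gets its file name chopped up because the equals arrives first, which is why snippets should leave out the equals unless they really mean it.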
