Saturday, December 22, 2018

Tokenized source code

Every interpreter or compiler has to examine each character at least once when it is dealing with human source code.  Once tokenized, the bots can pass the code around directly.  Tokenizers end up doing heavier duty than expected, and hold more spaghetti than the rest of the system, which works from standard, consistent forms.

This brings up a point: ultimately the console gets tasked with tokenizing, and the console is not part of a syntax engine.  The tokenizer I use has two string arrays that identify token chars, and both are configurable. One array is the list of operators; the other is an array of the non-alphanumeric characters allowed in a user name.  The default settings are the Linux command-line standard, and when I get a good definition of that I will announce it; right now it fluctuates.
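
Roughly, the two-array scheme looks like this in C. A sketch, not my actual code; the operator set and name characters shown are placeholders for the configurable defaults:

    #include <ctype.h>
    #include <string.h>

    /* Both arrays are configurable at run time; contents here are guesses. */
    static const char *operators  = "|&;<>()$`";   /* placeholder operator set */
    static const char *name_chars = "_-./";        /* non-alphanumerics allowed in a name */

    static int is_name_char(char c) {
        return isalnum((unsigned char)c) || strchr(name_chars, c) != NULL;
    }

    /* Copy one token into tok; return the pointer just past it. */
    static const char *next_token(const char *p, char *tok, size_t max) {
        size_t n = 0;
        while (*p == ' ' || *p == '\t') p++;       /* skip blanks */
        if (*p == '\0') { tok[0] = '\0'; return p; }
        if (strchr(operators, *p)) {
            tok[n++] = *p++;                       /* one-char operator token */
        } else if (is_name_char(*p)) {
            while (*p && is_name_char(*p) && n + 1 < max)
                tok[n++] = *p++;                   /* name token */
        } else {
            tok[n++] = *p++;                       /* unknown char passes through alone */
        }
        tok[n] = '\0';
        return p;
    }

Swapping in a different operator set or name-character set is just replacing a string, which is the whole point of keeping both arrays configurable.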

The point on tokenized source code is that you can store and recall it as binary and skip the tokenizer step.
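
Once the stream is an array of fixed-size records, store and recall is one fwrite and one fread, with no character scanning on the way back in. A sketch, with an illustrative record layout:

    #include <stdio.h>

    /* Illustrative layout; a string-valued token would store an offset
       into a string pool here, not a raw pointer. */
    struct token { short type; short len; long value; };

    /* Write count tokens as one binary block; 0 on success. */
    int save_tokens(const char *path, struct token *t, size_t count) {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        fwrite(t, sizeof *t, count, f);
        return fclose(f);
    }

    /* Read tokens back; returns how many were recovered. */
    size_t load_tokens(const char *path, struct token *t, size_t max) {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        size_t n = fread(t, sizeof *t, max, f);
        fclose(f);
        return n;
    }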

My Xchars interface is a good example. It is based on each caller acquiring ids to identify the screen widget being modified:

Xchars interface:

get_id
get_window   id back_color fore_color
get_rect     id x1 y1 x2 y2
set_string   id string
set_color    id color
del_string   id
set_font     id font
del_screen   id

Every object is preceded by a verb and an id. Verbs are limited to [get, del, set].
Objects are limited to [window, rect, string, color, font, x, y].
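
The grammar is small enough to check against two tables. A sketch, with the function and constant names being mine:

    #include <string.h>

    enum { NVERBS = 3, NOBJECTS = 7 };
    static const char *verbs[NVERBS]     = { "get", "del", "set" };
    static const char *objects[NOBJECTS] = { "window", "rect", "string",
                                             "color", "font", "x", "y" };

    /* "set_string" splits at the underscore into verb 2, object 2.
       Returns 0 on a known command, -1 otherwise. */
    static int parse_cmd(const char *cmd, int *verb, int *obj) {
        const char *us = strchr(cmd, '_');
        if (!us) return -1;
        for (int v = 0; v < NVERBS; v++) {
            if ((size_t)(us - cmd) != strlen(verbs[v])) continue;
            if (strncmp(cmd, verbs[v], us - cmd) != 0) continue;
            for (int o = 0; o < NOBJECTS; o++)
                if (strcmp(us + 1, objects[o]) == 0) {
                    *verb = v; *obj = o; return 0;
                }
        }
        return -1;
    }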

All arguments shall be string pointers in an array of longs. The call on the bus is:

Xchars cmd id arg1 arg2 ...
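
On the caller side, packing one call looks about like this. bus_send and the wrapper name are stand-ins, not a real API:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the bus: takes the module alias and the long array. */
    extern int bus_send(const char *module, long *args, int count);

    int xchars_set_string(long id, const char *text) {
        char idbuf[32];
        snprintf(idbuf, sizeof idbuf, "%ld", id);
        long args[3];
        args[0] = (long)(intptr_t)"set_string";  /* cmd  */
        args[1] = (long)(intptr_t)idbuf;         /* id   */
        args[2] = (long)(intptr_t)text;          /* arg1 */
        return bus_send("Xchars", args, 3);
    }

Every slot in the array is a string pointer, even the id, which keeps the bus format uniform no matter which verb/object pair is being sent.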

The name Xchars is an alias for entry. The loadable exposes entry in the code and the module name in the file position, so the console loop can obtain a degree of namespace for the syntax engines.


Mouse and keyboard messages come later, unless someone else jumps on the job.  If someone wants to take charge, just call it Xchars and I am on board. Keep the idea of a simple snippet. If you want to enhance it, make a higher-level snippet, and keep all your snippets compatible with the universal interface.
