Sunday, December 23, 2018

Tokenizers have lots of spaghetti

Here is the one I made, and it is a mess.

It has to know which characters are operators, which are the quote characters, which special characters are allowed in filenames, and the comment character can change. This tokenizer does what Tcsh did not: it does not need spaces between tokens, it handles that problem. This is not the end of the tokenizer story; the new PowerShells will all rely on the shell tokenizer if they integrate with the command line.

#include <ctype.h>   /* isspace, isalnum */
#include <string.h>  /* strchr */

/* Terminate the current token at *a, then skip ahead until the char b (or end of string). */
#define flush_line(a,b) {*a=0; do a++; while(*a && (*a != b));}

/* PBase, DefaultBase, TokeEnd, CommentChar, and SUCCESS come from the surrounding shell code. */
int argtoks(char *src, const char *op_puncts, const char *name_puncts, const char *pair_puncts) {
    PBase p = &DefaultBase;
    char *op = 0;
    char *end = 0;
    char ch = 0;
    while (*src) {
        while (*src && isspace(*src)) src++;        /* skip valid arg separator */
        if (!*src) { if (ch) *end = 0; break; }
        op = strchr(op_puncts, *src);               /* Is it a special? */
        if (*src != CommentChar) {
            if (op) {  /* no op on first char (This is an error because of ++x) */
                p->Data[TokeEnd].s = op;
                if (ch && !strchr(op_puncts, ch)) *end = 0;
                if (strchr(pair_puncts, *src)) {
                    char close = *src;              /* save the pair char before flush_line clobbers it */
                    flush_line(src, close);         /* skip to the matching pair char, e.g. a closing quote */
                }
                ch = *op;
                src++;
                end = src;
            }
            else {
                p->Data[TokeEnd].s = src;
                if (ch && !strchr(op_puncts, ch)) *end = 0;
                while (*src && (isalnum(*src) || strchr(name_puncts, *src)))
                    src++;                          /* while valid arg char */
                ch = *(src - 1);
                end = src;
            }
            TokeEnd++;
        }
        else { flush_line(src, '\n'); }             /* comment: discard to end of line */
    }
    return (SUCCESS);
}
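
Here is a rough sketch of how argtoks might be driven. The input line and the three punct strings below are only a guess at what the shell would pass in; the real operator, filename, and pair sets, and the value of CommentChar, live elsewhere in the program.

/* Hypothetical driver; the tables here are assumptions, not the shell's real ones. */
char line[] = "cat notes.txt|grep -i todo # quick search";   /* assumes CommentChar is '#' */
argtoks(line,
        "|&<>;\"'",    /* op_puncts: operator and quote characters */
        "._-/",        /* name_puncts: extra characters allowed inside filenames */
        "\"'");        /* pair_puncts: characters that open a paired span like a quote */
/* After the call, p->Data[0..TokeEnd-1].s point at the separated tokens,
   even though there are no spaces around the | operator. */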
