Sunday, November 4, 2018

I have to abandon my Virtual c IDE

My IDE is not set up for high-speed IO; gcc is. So I am on the gcc debugger with this project from now on, and you can see that, absent a workable IDE, I had no choice but to rig up a macro shell. Also, under MinGW with gcc, I can get internet IO enabled, something my IDE would have trouble with.

But the macro processor gives me a powerful tool to interrogate the machine state and control it, greatly simplifying the debug process. One can identify a problem pretty close to its source, then just debug that part with gdb.

I also discovered that Microsoft stole my favorite words, CreateCursor and SetCursor. I think they do stuff with cursors. So I see that, like ncurses, I will be cursing in the code to avoid name conflicts. That is true in general: when folks are bothering me, cursing helps.

Scraper update

I updated the web scraper file on the right; the comments are expansive and explanatory. The total setup can read a hundred pages nightly, classify them by a wordlist hierarchy, and code them in wordlist 'IDs'. There are strategies to keep your wordlists pruned and up to date. Wordlists make great algebra; they order and scale text nicely.
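To make the idea of coding a page by wordlist ID concrete, here is a minimal sketch. It is not the scraper's code: the WordList layout, the names count_hits and classify_text, the flat (non-hierarchical) set of lists, and the whitespace tokenizer are all assumptions made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a wordlist is an ID plus a NULL-terminated word array. */
typedef struct {
    int id;
    const char *const *words;
} WordList;

/* Count how many whitespace-separated tokens of text appear in the list.
   Matching is exact and case-sensitive to keep the sketch short. */
int count_hits(const char *text, const WordList *wl) {
    int hits = 0;
    const char *s = text;
    assert(text != NULL && wl != NULL);
    while (*s) {
        while (*s && isspace((unsigned char)*s)) s++;    /* skip blanks */
        const char *start = s;
        while (*s && !isspace((unsigned char)*s)) s++;   /* scan one token */
        size_t len = (size_t)(s - start);
        if (len == 0) break;                             /* only trailing blanks left */
        for (size_t k = 0; wl->words[k] != NULL; k++) {
            if (strlen(wl->words[k]) == len &&
                strncmp(wl->words[k], start, len) == 0) {
                hits++;
                break;
            }
        }
    }
    return hits;
}

/* Return the ID of the list with the most hits, or -1 if nothing matched.
   Ties go to the earlier list. */
int classify_text(const char *text, const WordList *lists, size_t n) {
    int best_id = -1, best_hits = 0;
    for (size_t j = 0; j < n; j++) {
        int h = count_hits(text, &lists[j]);
        if (h > best_hits) { best_hits = h; best_id = lists[j].id; }
    }
    return best_id;
}
```

Given one list of sports words with ID 7 and one list of money words with ID 9, a page about bank rates would come back coded as 9. A real setup would walk a hierarchy of lists rather than a flat array.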

The optimizing function wants each wordlist to be equally innovative over the large (and closed) set of web pages. I can do that. I have a join machine. Here it is, a first pass at Fetch-or-not with Step-or-Skip through the plain text structure of a web page. The algorithm iterates through the marked tag stack, and it has pointers to skip; the skip pointers were set when the source was scraped. But the algorithm needs the original source: a stack index goes with only one HTML page, which must remain unaltered.

The MEM module uses inline pointers: an int-sized array, with all arithmetic done on array indices. The ops cost is the same.
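For readers who have not seen the inline pointer style, here is a minimal sketch of the general idea, not the MEM module itself. The cell layout (a value followed by the index of the next cell) and the names mem, NIL, mem_alloc, and list_sum are assumptions invented for the example.

```c
#include <assert.h>

#define MEM_SIZE 64
#define NIL (-1)            /* plays the role of a null pointer */

static int mem[MEM_SIZE];   /* one int-sized array holds everything */
static int mem_top = 0;     /* next free slot */

/* Allocate a two-int cell [value, next]. Returns its index, or NIL if full. */
int mem_alloc(int value, int next) {
    if (mem_top + 2 > MEM_SIZE) return NIL;
    int cell = mem_top;
    mem[cell] = value;
    mem[cell + 1] = next;   /* the "pointer" is stored inline as an index */
    mem_top += 2;
    return cell;
}

/* Walk the chain with index arithmetic instead of pointer dereferences. */
int list_sum(int head) {
    int sum = 0;
    for (int i = head; i != NIL; i = mem[i + 1]) {
        assert(i >= 0 && i + 1 < MEM_SIZE);   /* index must stay inside mem */
        sum += mem[i];
    }
    return sum;
}
```

Each mem[i] access is a base address plus a scaled index, which is roughly the same work as following a pointer. The indices also stay valid if the whole array is copied or written to disk, which raw pointers would not.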


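/* stack, intodex (entry count), key, and the Done/Skip/StepBit/FetchBit/SkipBit
   constants are globals defined elsewhere; key must be big enough for the text. */
int StepFetchSkip(int i, int offset, int method) {
  char *p;
  char *q;
  while (i < intodex) {
    if (stack[i].code == Done)
      return(-1);
    if (stack[i].code != Skip) {
      p = stack[i].loc + offset;
      q = stack[i+1].loc + offset;
      do { q--; } while (q > p && *q && *q != '<');  // q moved to open of post text bracket
      if (method & FetchBit) {   // bitwise test of the flag, not logical &&
        strncpy(key, p, q - p);  // pass the pointer p, not the char *p
        key[q - p] = '\0';       // strncpy does not terminate a partial copy
      }
      if (method & SkipBit)      // join already got a match
        break;
      else if (method & StepBit)
        return(i + 1);
    }
    i++;  // try the next entry (also avoids spinning when no Step/Skip bit is set)
  }
  printf("Emit done\n");
  return(0);
}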
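
To show the Step and Fetch flags in action without the rest of the join machine, here is a cut-down, self-contained sketch of the same walk. It is not the function above: the Tag layout, the code and flag values, and the name step_fetch are assumptions for illustration, and the SkipBit branch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { Text = 0, Skip = 1, Done = 2 };   /* assumed tag codes */
enum { StepBit = 1, FetchBit = 2 };      /* assumed method flags, one bit each */

/* Assumed layout: loc points just past the '>' of a tag in the page source. */
typedef struct { int code; const char *loc; } Tag;

/* Starting at entry i, find the first entry that is not marked Skip.
   With FetchBit set, copy its text (up to the next '<') into out.
   With StepBit set, return the index to resume from.
   Returns -1 at the Done entry or when the stack runs out. */
int step_fetch(const Tag *stack, int n, int i, int method,
               char *out, size_t cap) {
    assert(cap > 0);
    while (i < n) {
        if (stack[i].code == Done) return -1;
        if (stack[i].code != Skip) {
            assert(i + 1 < n);               /* the stack must end with Done */
            const char *p = stack[i].loc;
            const char *q = stack[i + 1].loc;
            do { q--; } while (q > p && *q != '<');  /* back up to the next tag's '<' */
            if (q < p) q = p;                /* empty run: nothing to copy */
            if (method & FetchBit) {         /* bitwise test of a single flag */
                size_t len = (size_t)(q - p);
                if (len >= cap) len = cap - 1;
                memcpy(out, p, len);
                out[len] = '\0';
            }
            if (method & StepBit) return i + 1;
        }
        i++;
    }
    return -1;
}
```

On a page like `<h1>Title</h1><p>Body</p>` with the closing h1 entry marked Skip, repeated calls with StepBit | FetchBit would hand back "Title", then "Body", then -1. The page buffer is never modified, which matches the requirement that the source stay unaltered while a stack index refers to it.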
