Sunday, November 4, 2018

Integrating web scraper and join

My web text collector retains the DOM structure, at least the parts not collapsed because they were ignored.  The index on the original web page has relative pointers back into the HTML.  I have Step and Skip already set up because of the universal rule: all computer data in linear memory becomes Step and Skip.
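
To make the rule concrete, it comes down to something like this (a sketch; the names here are illustrative, not my real types):

typedef struct {
 int offset;   /* relative pointer back into the original page buffer */
 int span;     /* how many index entries this block covers            */
} Entry;

int StepNext(int i)                 { return i + 1; }           /* next entry           */
int SkipPast(const Entry *t, int i) { return i + t[i].span; }   /* hop the whole block  */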

I doubt it will take more than 50 lines of code to make my text scraper an attachment, and then I will have a machine cross-referencing web pages with plain text words, from word lists in Step and Skip form.

I stepped over and wrote the few lines of code needed to traverse the text, about 20, plus the copy-and-paste boilerplate for the method switch.  Easy; I add in the dll interface, another piece of boilerplate.

I do not get dll hell; all new and improved web scrape tools look the same, like a directed graph. There is no interface change across versions, and, absent changes in Match, new and old versions are compatible.

I have this capable macro shell, access to the plain text web structure across the web, and a Lazy J search scripting tool for advanced searching and card list intersections. The raw, untested code is below; it works off the stack of pointers created when the DOM was decomposed into Ignore and Collect nodes.  I actually descended the DOM, creating an Ignore/Collect tagged set of pointers.  So I add the code and report the EvalScrape routine back to join.  During the Init cursor method, the user, or another graph, has produced a file name; internet IO is not enabled yet.
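
The tagged stack and the cursor look roughly like this (a sketch; the code below only touches code, loc, current, and state, the rest is written out just so it hangs together):

#define MAX_NODES 4096
#define Null 0   /* the shell's generic "nothing happened" return; an assumption here */

enum { Step, Skip, Fetch, Append, Init, Set };   /* the cursor methods in the switch below */
enum { Ignore, Collect, Done };                  /* node tags set while descending the DOM */

typedef struct Cursor {
 int current;   /* index of the node the cursor sits on */
 int state;     /* offset added to loc on a Skip        */
} Cursor, *PCursor;

typedef struct {
 int code;          /* Ignore, Collect, or Done at the end of the stack   */
 int loc;           /* where Skip jumps to                                */
 const char *text;  /* relative pointer back into the collected page text */
} ScrapeNode;

ScrapeNode stack[MAX_NODES];   /* filled when the DOM was decomposed */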

My macro machine allows files= "file1 file2 file3", or I can go edit the environment save file, adding the line with the quotes removed: files file1 file2 file3.  Then I write the utility GetCursorFile, which reads the file into a big buffer, and the cursor will point to the first character.  Here is where piping between console commands would help, but I don't have it.  Instead, I have the editor, and I can lay out all my cursors in slots and establish the slot order. I use a static map of the cursor stack, good enough.
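
GetCursorFile is nothing special; something like this, minus the error reporting I will eventually need (a sketch, the buffer handling is the gist):

#include <stdio.h>
#include <stdlib.h>

/* Read the whole file into one big buffer; the cursor will point
   at the first character.  Caller frees the buffer. */
char *GetCursorFile(const char *name, long *size) {
 FILE *f = fopen(name, "rb");
 if (!f) return NULL;
 fseek(f, 0, SEEK_END);
 *size = ftell(f);
 rewind(f);
 char *buf = malloc(*size + 1);
 if (buf && fread(buf, 1, *size, f) != (size_t)*size) { free(buf); buf = NULL; }
 if (buf) buf[*size] = '\0';
 fclose(f);
 return buf;
}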


The EvalScrape code:
int EvalScrape(PCursor self, int method, void *data) {
 int i = self->current;        /* index into the Ignore/Collect stack */
 int offset = self->state;
 switch(method) {
  case Step:                   /* move to the next node */
   if(stack[i].code == Done)
    return(Done);
   self->current++;
   break;
  case Skip:                   /* jump past the current block */
   if(stack[i].code == Done)
    return(Done);
   self->current = stack[i].loc + offset;
   break;
  case Fetch:
   // strcpy the node text into data
   break;
  case Append:
   // input only
   break;
  case Init:
  case Set:
  default:
   break;
 }
 return(Null);
}
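
Driving it from the shell side is the usual cursor loop; roughly this, where the node check and the Fetch buffer are assumptions about how it gets wired to join:

/* Sketch: walk the scrape cursor, grabbing text from Collect nodes
   and hopping Ignore blocks, until the Done marker. */
void WalkCollect(PCursor c) {
 char word[256];
 while (stack[c->current].code != Done) {
  if (stack[c->current].code == Collect) {
   EvalScrape(c, Fetch, word);   /* copy the node text out (still a stub above) */
   /* hand word off to join / Match here */
   EvalScrape(c, Step, NULL);
  } else {
   EvalScrape(c, Skip, NULL);    /* jump past the whole ignored block */
  }
 }
}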
