Sunday, October 21, 2018

My join machine is almost useful

I have it compiling and running under Linux, and I can join an arbitrary list of words against a raw text file.

This is relatively new for a Linux utility. It means that I can check raw text against many word lists and classify the text, that is, determine what the text is talking about. I do this all from the command line and the console, running various word lists against some raw text. Next up: memory graphs and stacking joins.

My method.
I create the following batch file:

set LAZYJ 0 science.txt 1000
set TEXT 1 grammar.txt 10000
set CONSOLE 2
mode 2 All
join
set LAZYJ 0 food.txt 1000
join
set LAZYJ 0 generalwords.txt 1000
join

In this batch file I set the left side (slot 0) to LazyJ, which is a comma separated list of words. The right side (slot 1) is the raw text being searched. The console (slot 2) is set as output, so results appear on the screen. Each join will spew out any words from the current word list that match words in the raw text, and the batch runs three word lists in turn. The output is LazyJ compatible, and in a day or so I will be feeding that into MEM, the memory attachment, and doing further searches.

Notice I set the mode to All, meaning collect all matches. This is a simple batch file and a crude command interface, but it has utility already: the ability to cross reference lists of words against raw text with little setup, even from the keyboard. This is set intersection, just the beginning.
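
To make the set intersection idea concrete, here is a minimal sketch in plain C. It is not the join machine's actual code. The function names join_all and list_contains are made up for illustration, and the sketch assumes the word list is a single comma separated string with no spaces around the commas, that words in the raw text are runs of letters, and that matching ignores case. It reports every occurrence, which is my reading of mode All.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the token tok[0..len) equals one
   entry of the comma separated list, ignoring case. */
static int list_contains(const char *list, const char *tok, size_t len)
{
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len) {
            size_t i = 0;
            while (i < n && tolower((unsigned char)p[i]) ==
                            tolower((unsigned char)tok[i]))
                i++;
            if (i == n)
                return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Hypothetical join: copies every word of text that appears in list
   into out, comma separated. Returns the number of matches, or -1 if
   out is too small to hold them. */
int join_all(const char *list, const char *text, char *out, size_t out_size)
{
    size_t used = 0;
    int matches = 0;
    const char *p = text;

    assert(list != NULL && text != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    while (*p != '\0') {
        /* Skip anything that is not a letter. */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;
        const char *start = p;
        while (isalpha((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        if (len > 0 && list_contains(list, start, len)) {
            size_t need = len + (matches > 0 ? 1 : 0);
            if (used + need + 1 > out_size)
                return -1;
            if (matches > 0)
                out[used++] = ',';
            memcpy(out + used, start, len);
            used += len;
            out[used] = '\0';
            matches++;
        }
    }
    return matches;
}
```

Calling join_all with a word list, some raw text and an output buffer fills the buffer with the matching words, again comma separated, so the result could be fed back in as the left side of another join. The scan is linear in the list for every word of text, which is fine for lists of a thousand words; a sorted list or hash table would be the obvious next step.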


Word lists


In elementary school, the word list is what we use to teach reading. Word lists are all over the web; the join machine can cross reference those lists with raw text and learn to read just like humans do. That is the core principle, just like IBM Watson.


The join machine will be everywhere, on all the servers. Search bots can then take their word lists with them and do complicated searches right there, on the server. Everyone needs it.

A quick point about word files

I can create a file manager attachment. It will open a LazyJ graph of files, as directed by the match process. When join asks for a fetch, it will open the file named in the key value and pass along the words in it. I can automate this if I have a 'file' opcode in LazyJ, as in:

(file1:,file2:,...)

I use a colon in the example. Or I can make a simple attachment. Or I can make a smart attachment, one that reads its own output from the cross and determines which path to select on its graph of word lists, each node expanding into another join.

I am best off getting version 1.0 done, with the basic attachments, just enough to manually run recursive search lists. The MEM attachment, for example, works fine standalone and in pass through, but it has not really been final tested. That one is key because we want to pass through a million graphs per session.

Development:
I am now using Notepad++ and gcc under MinGW for final test and adjustment. My development is still on the free Virtual C system, which I find easy for plain C.
