Monday, November 21, 2011

More on list processing using SQL and the G machine

Consider the case where the client has a blob of words, organized as a linear graph of triples. This blob may have come from a 10,000 word article, or the news, or from anywhere; and it has been pre-processed to remove offending punctuation marks.

Step one, load the word list into result.
Now, to remove commonly used words, the SQL does:

insert into result select self.key, NewLink(result.link), result.rowid
from self, other
where self.rowid = NextSet() and Match(self.key = other.key);

where my syntax is likely horribly wrong. But note the use of function calls back to G. These function calls cause G to alter the pointers based upon the match conditions. Here is what I mean.

Start with the original list:

Cats,do,sleep,on,the,couch,mostly

When the match is a deselect the list becomes:

Cats.do,sleep.on.the,couch,mostly

The deselected words are relinked as descents on the kept words, not thrown away. The tops of the descents are the remaining set of key words. SQL can continue to select over them, skipping the deselected words. The NextSet() function jumps from the top of one set element to the top of the next. Thus, the usual method is to start with the bare-bones list and run it over successive list filters, swapping words out, descending some words. SQL cannot change order until it decides to swap the table back and re-read it with sequential selects, back to result.
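A minimal sketch of that relinking, in Python rather than in G itself. The names relink, next_set and render are made up for illustration, and the rule that a deselected word hangs under the kept word just before it is an assumption read off the example above (a leading deselected word, with nothing to hang on, simply stays on top).

```python
def relink(words, stop_words):
    """Group words so each kept word carries the deselected words that follow it."""
    groups = []  # each group is [top, descent, descent, ...]
    for word in words:
        if word.lower() in stop_words and groups:
            groups[-1].append(word)  # descend under the last kept word
        else:
            groups.append([word])    # start a new top-level set element
    return groups


def next_set(groups):
    """Stand-in for NextSet(): visit only the top of each set element."""
    for group in groups:
        yield group[0]


def render(groups):
    """Show descents with '.' and set elements with ',' as in the post."""
    return ",".join(".".join(group) for group in groups)


words = "Cats,do,sleep,on,the,couch,mostly".split(",")
groups = relink(words, {"do", "on", "the"})
print(render(groups))          # Cats.do,sleep.on.the,couch,mostly
print(list(next_set(groups)))  # ['Cats', 'sleep', 'couch', 'mostly']
```

Nothing is deleted here: the deselected words are still reachable inside each group, so a later filter could promote them back to the top.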

But the callback must be designed properly, such that the various weightings of properties are obvious; hence the matches work as planned. There is match but select, match any, match select, match copy to result. If the industry really gets to a common consensus on operators, then the functional result of queries becomes much faster, and our search sequences work as planned.
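The callback idea can be tried out with SQLite's application-defined functions, which let a query call back into the host program. The sketch below uses Python's sqlite3 module; the three-argument Match function and the deselected list are assumptions for illustration (standing in for G altering its pointers), not the gfun interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table self (key text)")   # the word list
conn.execute("create table other (key text)")  # the common words
words = ["Cats", "do", "sleep", "on", "the", "couch", "mostly"]
conn.executemany("insert into self (key) values (?)", [(w,) for w in words])
conn.executemany("insert into other (key) values (?)",
                 [("do",), ("on",), ("the",)])

deselected = []  # host-side record of what the callback was told to descend


def match(rowid, key, hit):
    """Hypothetical Match callback: note a hit on the host side, return it to SQL."""
    if hit:
        deselected.append((rowid, key))
    return hit


# Register the Python function so SQL can call it with three arguments.
conn.create_function("Match", 3, match)

kept = [row[0] for row in conn.execute(
    "select self.key from self "
    "where not Match(self.rowid, self.key, "
    "  exists (select 1 from other where other.key = self.key)) "
    "order by self.rowid")]

print(kept)
print(sorted(deselected))
```

The query only decides which rows come back; what happens to the deselected words is left entirely to the callback, which is the part an agreed set of operators would have to pin down.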

For example, the wildcard. If everyone agrees on the raw, most powerful wild card, then G can skip over entire search sequences when it sees one, and not even activate SQL.

What is the wild card?
I say the most raw form of it is the * character in common use, which I previously noted is also the multiply.
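A rough sketch of that short cut, assuming the wild card is the single character * and that it matches everything. The search and fake_sql names are invented for illustration; fake_sql stands in for a real query and records each time it is activated.

```python
WILDCARD = "*"


def search(pattern, words, run_sql):
    """Return matches for pattern, skipping the SQL step for the raw wild card."""
    if pattern == WILDCARD:
        return list(words)          # everything matches, no query needed
    return run_sql(pattern, words)  # otherwise hand the work to SQL


sql_calls = []  # every pattern that actually reached the SQL stand-in


def fake_sql(pattern, words):
    """Stand-in for a real query: exact match only."""
    sql_calls.append(pattern)
    return [w for w in words if w == pattern]


tops = ["Cats", "sleep", "couch", "mostly"]
print(search("*", tops, fake_sql))      # ['Cats', 'sleep', 'couch', 'mostly']
print(search("couch", tops, fake_sql))  # ['couch']
print(sql_calls)                        # ['couch']
```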

So I went ahead and dumped the latest code to the right there. It has the gfun completely separated out; that is the one the industry will debate: what is a good ontology of calls, match standards, and pointer formats to make a sound "table", or even codes for dfun ations.
