Wednesday, December 14, 2011

Attributes, entities, sparse matrices entropy

In Google's view, there is a large set of attributes (color, height, material...) and a large number of entities (houses, cars, boats...), and the client can connect any attribute to any entity: a huge square matrix. This is the most general model. How does one encode the sparse matrix so one does not store every empty cell? Huffman encoding is the general term: encode entries in the sparse matrix by the number of times each appears, the click-throughs. Huffman encoding, done right, also captures patterns, which the raw sparse matrix loses. Google built a general randomizer, row/column pair mangling. Find mangled combinations, then trim the byte sizes down and alter some bytes until you think you have the smallest index for the sparse entries, something like that.
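A minimal sketch of the idea: store only the non-empty cells of the attribute/entity matrix, then Huffman-code the cells by click-through frequency so the hottest cells get the shortest codes. The attributes, entities, and counts here are invented for illustration; this is not Google's actual scheme.

```python
import heapq
from itertools import count

# Sparse attribute/entity matrix: keep only non-empty cells.
# (attribute, entity) -> click-through count. All values invented.
clicks = {
    ("color", "car"): 40,
    ("price", "car"): 25,
    ("height", "house"): 20,
    ("material", "house"): 10,
    ("color", "boat"): 5,
}

def huffman_codes(freqs):
    """Build a Huffman code: frequent cells get short bit strings."""
    tiebreak = count()  # keeps heap entries comparable on freq ties
    heap = [(f, next(tiebreak), {cell: ""}) for cell, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {cell: "0" + code for cell, code in c1.items()}
        merged.update({cell: "1" + code for cell, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes(clicks)
# The most-clicked cell ends up with the shortest bit string.
shortest = min(codes, key=lambda cell: len(codes[cell]))
print(shortest, codes[shortest])
```

The empty cells never appear in the index at all, and the code lengths mirror the click-through distribution, which is the entropy argument in miniature.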

Attribute, any to any, the general case. But humans don't do that; the data was originally encoded properly by the humans. This is all about having the computer simulate human encoding, which was damn near perfect, from the human point of view.

What car? The damn car in the corner of the lot, come over, it has a great price.

The human had that perfectly encoded. The idea is not to lose the original encoding, but to speed the thing up two orders of magnitude. But the entire car-salesman experience is a nested graph: the whole car lot, office, cars, garage, the whole thing is a nested subgraph of the local business district. It's the 80/20; don't lose the 80 just to generalize the 20. Capture the original semantics, they are about the best schema we have. The best row/column index is to execute the few lines of grammar that get the attribute and entity together, either by pattern match, or simply by nesting values to definitions.
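A hedged sketch of "nest values to definitions": the lot as a nested subgraph of the business district, where a few lines of grammar walk the nesting to join entity and attribute without ever touching the empty cells. The structure and names are illustrative assumptions, not a real schema.

```python
# The salesman's world as a nested subgraph of the business district.
# All keys and values are invented for illustration.
district = {
    "car_lot": {
        "office": {"staff": 2},
        "garage": {"lifts": 1},
        "corner": {"car": {"color": "red", "price": 4999}},
    },
    "bank": {"tellers": 3},
}

def resolve(graph, path):
    """The few lines of grammar: walk the nesting to join an
    entity to its attribute."""
    node = graph
    for step in path:
        node = node[step]
    return node

# "The damn car in the corner of the lot... it has a great price."
print(resolve(district, ["car_lot", "corner", "car", "price"]))  # → 4999
```

The human's original encoding (lot contains corner contains car) is the index; nothing sparse needs to be enumerated.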

Graphs stretch; I think this is the whole mystery. The car salesman brings his graph, or a nested order schema for his car-business graph, to the banking graph. From the standpoint of the banking official, my best database tool is the maximum-entropy pattern encoder. Tell me the most common rank-five small-business graph in my community? Extraordinary information: it tells him the maximum-entropy debt proportions he can deliver. So graphs roam, looking for matching patterns. No other way to do it. That means we apply Groovy's Law and make the graphs move at very high speeds.
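The banker's question can be sketched as a frequency count over small graph shapes. Here each business graph is reduced to its set of labeled edges and the most common shape wins; the businesses and edges are invented, and a real encoder would canonicalize the graphs first.

```python
from collections import Counter

# Each small-business graph as a frozenset of labeled edges.
# Two lots share a shape; the shop does not. All data invented.
businesses = [
    frozenset({("owner", "lot"), ("lot", "cars"), ("lot", "office")}),
    frozenset({("owner", "lot"), ("lot", "cars"), ("lot", "office")}),
    frozenset({("owner", "shop"), ("shop", "stock")}),
]

pattern_counts = Counter(businesses)
most_common_graph, n = pattern_counts.most_common(1)[0]
print(n, sorted(most_common_graph))
```

The count over shapes is exactly the distribution the banker needs to set his maximum-entropy proportions.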

Match Points:
That is the key. If the data geek can find a match point in the syntax, plus some defined micro actions regarding sqlite3 or DLL methods, then whatever solution the geek has can be done in graph. Match points and micro actions: unpatentable, I stole the idea from sqlite3, and both codes are free. I think I have it; I still await a challenger.
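A minimal sketch of a match point bound to a micro action, using Python's stdlib `re` and `sqlite3`: a regex over the incoming syntax is the match point, and a tiny sqlite3 insert is the micro action it fires. The table, pattern, and sentence are invented for illustration.

```python
import re
import sqlite3

# One table joining attributes to entities. Schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (attribute TEXT, entity TEXT)")

def micro_insert(m):
    """Micro action: file the matched attribute/entity pair."""
    db.execute("INSERT INTO entities VALUES (?, ?)", (m["attr"], m["ent"]))

# Match points: (regex over the syntax, micro action to fire).
match_points = [
    (re.compile(r"(?P<attr>\w+) of the (?P<ent>\w+)"), micro_insert),
]

def consume(text):
    for pattern, action in match_points:
        m = pattern.search(text)
        if m:
            action(m)  # match point found: fire the micro action

consume("price of the car")
print(db.execute("SELECT attribute, entity FROM entities").fetchall())
# [('price', 'car')]
```

The table of match points is the graph's edge of contact with the outside syntax; everything else is a handful of micro actions.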

I am right, or really REGEX is right: graph convolution.
