Monday, February 13, 2012

Huffman lookup trees

http://berkeley.intel-research.net/sylvia/pht.pdf
I am selectig a generic distributed hash that the market might like to include with the sematic machine. Once we get the concept that the web consists of graphs steppng alng each other in binary operation, then we see, right away, graphs will do this ore efficiently with distributed indexing. Look at that paper, he is developing a Huffman code for indexing, prefix indexing.

What unit do they count when they develop their prefixes? If I look at one web site all the time, give me the first prefix, and it will stand for that web site: one bit, one website. If I mostly look at one, but often another, I need two bits, but web site one gets one bit, x1, and web site two gets one of 10 or 00; the index can tell the difference by looking at the first bit, counting right to left. This makes a big difference when the bits stand for something. It makes a huge difference if the bits stand for the number of miles to deliver a paper postcard; those bits give me a much more accurate picture of cost.
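Here is a minimal sketch of that idea in Python: standard Huffman coding over access counts, so the site I visit all the time gets a one-bit code and the rare ones get longer codes. The site names and frequencies are invented for illustration, and the codes here read left to right rather than right to left as in the paragraph above.

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman prefix codes from site access frequencies.
    Frequent sites get short codes, rare sites get longer ones."""
    # Heap entries: (frequency, tiebreaker, {site: partial code}).
    heap = [(f, i, {site: ""}) for i, (site, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single site still needs one bit: one bit, one website.
        _, _, codes = heap[0]
        return {site: "1" for site in codes}
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, extending their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# One site visited most of the time, two visited rarely:
codes = huffman_codes({"site-a": 8, "site-b": 1, "site-c": 1})
```

The frequent site ends up with a single bit; the two rare ones share the other first bit and are told apart by their second bit, exactly the trade described above.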

What is the computer cost here? The number of hops to get the data. If I look at a web site every five minutes, just keep the most recent copy on my computer with an expiration date; that is one hop, one bit.
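A minimal sketch of that "most recent copy with an expiration date" idea; the `fetch` function and the TTL value are placeholders I made up, not anything from the paper.

```python
import time

class ExpiringCache:
    """Keep the most recent copy locally with an expiration date:
    a fresh hit costs zero network hops, a miss costs one fetch."""

    def __init__(self, ttl_seconds, fetch):
        self.ttl = ttl_seconds
        self.fetch = fetch   # caller-supplied function that actually gets the page
        self.store = {}      # url -> (expires_at, content)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(url)
        if entry and entry[0] > now:
            return entry[1]              # still fresh: no hop
        content = self.fetch(url)        # expired or missing: one hop
        self.store[url] = (now + self.ttl, content)
        return content
```

With a five-minute TTL, looking at the same site every few minutes touches the network only once per expiry window.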

But the number of hops also comes into play for very fast lookups. The Huffman prefixing idea allows you to traverse a tree to convert from Huffman codes to real-world quantities. And in the world of bits the two costs coincide: the bot looks at the first bit, and from that makes a real hop (real in the computer sense). The bot makes the jump, then takes a look at the second bit, at the second location! Huge gain: intelligent indexing made possible because bots can step along the net.
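The bit-at-a-time traversal can be sketched as a walk down a binary trie, where following each bit stands in for one hop to the node holding the next level of the index. The toy trie and site names below are my own illustration, matching the one-bit/two-bit split described earlier.

```python
def lookup(root, code):
    """Walk a binary trie one bit at a time. Each step stands in
    for one hop to another node in a distributed index."""
    node = root
    hops = 0
    for bit in code:
        node = node[bit]   # in a real distributed index, a network hop
        hops += 1
    return node["value"], hops

# Toy three-site trie: "1" -> the frequent site,
# "00" and "01" -> the two rarer sites.
trie = {
    "1": {"value": "site-a"},
    "0": {
        "0": {"value": "site-b"},
        "1": {"value": "site-c"},
    },
}
```

Looking up the one-bit code costs one hop; the two-bit codes cost two, so the hop count tracks the code length exactly.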

Also, the other cost is hops in disk repositioning, the second leading cause of delay. A seek from table to table in sqlite3 often causes a big move of the disk head, an actual mechanical thingy.

But our semantic machines, our personal Watsons, want to use and maintain distributed indexing. They will do so by graph hopping and modifying their own tables. I'll figure out the method that makes this very efficient, simple, and submerged. Not invent one, but find one on the web.
