Sunday, October 9, 2011

Searching the web for a semantic grammar

For the search engine, searching is often a search for missing key words.  The user is often exploring a sub topic of a well formed network within a category.  The query:

fermion boson

will generally turn up articles about those specific sub atomic particles, because the search engine has figured out the real query:

physics.standard_model.(fermions, bosons)

The search engine discovered that the query path includes physics and standard_model.  How did the search engine discover those hidden words?  Mainly due to Wiki, where many physics searches end up.  Going by the click count for similar physics searches, the search engine was simply dominated by Wiki.  Wiki has a fairly formal, minimum redundancy semantic grammar; they enforce the proper style in links and references.

If the search engine desires a canonical form for a linked set of key words, then it can do pretty well by looking at the anchor text.  If it did so in physics searches, then it would quickly derive the Wiki grammar.

When we click through on some link in the results of a web search, we confirm the search engine's choice of canonical form.  The engine and the user are negotiating a canonical form to minimize key strokes.  It is Huffman encoding: the network segment that makes up a search form is a Huffman symbol.
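To make the Huffman idea concrete, here is a minimal sketch that treats each canonical search form as a symbol and uses click-through counts as weights. The forms other than the fermion/boson one and all of the counts are made up for illustration, and `huffman_codes` is my own helper name, not anything a search engine is known to run.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return {symbol: bit string} for a dict of symbol -> positive weight."""
    if not weights:
        return {}
    if len(weights) == 1:
        # A single symbol still needs one bit.
        return {next(iter(weights)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


# Hypothetical click-through counts per canonical search form.
clicks = {
    "physics.standard_model.(fermions, bosons)": 50,
    "physics.condensed_matter.fermions": 20,
    "physics.statistics.bose_einstein": 5,
    "music.bands.boson": 3,
}
codes = huffman_codes(clicks)
for form in sorted(codes, key=lambda f: len(codes[f])):
    print(codes[form], form)
```

The form we click on most gets the shortest code, which is the same thing as saying it costs the fewest key strokes to reach.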

There are also standard reduction algorithms that will strip a page of everything but information bearing words in the main text.  That gives the engine a highly encoded set which it can order by probability of occurrence.  This gives the engine another, large set of key words in addition to anchor text.  So a search path may be described in some semantic graph notation, with substitution sets of key words as possible solutions.
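A rough sketch of what such a reduction might look like: drop the common filler words and order what is left by how often it occurs. The tiny stop word list and the name `reduce_text` are assumptions of mine; a real reducer would use a much larger list and probably stemming.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop word list (an assumption, not a standard one).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "but", "by", "for", "in", "is",
    "it", "not", "of", "on", "or", "that", "the", "this", "to", "too", "with",
})


def reduce_text(text, top_n=10):
    """Return the top_n (word, count) pairs among information bearing words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


page = "The boson and the fermion: a boson is not a fermion, but a boson is a particle."
key_words = reduce_text(page, top_n=2)
print(key_words)
```

Dividing each count by the total number of kept words would turn the ordering into the probability of occurrence mentioned above.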

So, give the humans some control over the bots: let the humans teach the bots in a comfortable fashion, and the bot keeps query sets of key words, tuned to the individual human.

For security we can do this entirely within the browser, using a search interface that decodes the returned pages from standard market search engines.  Then, using Web SQL within the browser along with text reduction software, the user's search personality can be captured locally as linked lists of key words.

I should convene the staff at Imagisoft. OK, we met and here is our product description:

Search Gadget is a widget that runs in your browser and acts as a layer between you and the web search engine. We give the user a few mild settings to control search word extraction, but silently, underneath the user, Search Gadget is maintaining maximally selected sets of key words that comprise canonical search forms. Search Gadget learns the user's search personality, and stores that personality as a semantic net of key words in the browser local store.

Later when the user types an ad hoc term, the local Search Gadget will respond with likely and completed formal search queries, which it then submits to the web.
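As a sketch of that completion step, assume the learned personality is just a table of canonical forms with a weight for how often each one was confirmed. The table contents, the prefix matching rule, and the names `keywords` and `complete_query` are all hypothetical choices for illustration.

```python
import re


def keywords(form):
    """Split a canonical form such as 'a.b.(c, d)' into its key words."""
    return re.findall(r"[a-z_]+", form.lower())


def complete_query(typed, forms, limit=3):
    """Return stored forms in which every typed term prefixes some key word."""
    terms = typed.lower().split()
    if not terms:
        return []
    hits = []
    for form, weight in forms.items():
        words = keywords(form)
        if all(any(w.startswith(t) for w in words) for t in terms):
            hits.append((weight, form))
    # Highest weight first; ties are broken alphabetically.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [form for _, form in hits[:limit]]


# Hypothetical learned search personality: canonical form -> confirmation count.
learned = {
    "physics.standard_model.(fermions, bosons)": 12,
    "physics.condensed_matter.fermions": 4,
    "cooking.bread.sourdough": 9,
}
suggestions = complete_query("fermion boson", learned)
print(suggestions)
```

Typing only "fermion" would return both physics forms, most confirmed first, and the user's next click would adjust the weights.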

The data store code is simple; we have dealt with semantic nets in WebSql when we stored and recovered DOM trees in nested order. So, under user control, Search Gadget transforms DOM trees, in nested order, into private key word lists, in semantic nets: a graph transformation.
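For illustration only, here is one way a key word tree could be kept in nested order in a SQL table. I am assuming the usual nested set layout (a left and right number per node) and using Python's `sqlite3` as a stand-in for Web SQL in the browser; the table name `net` and the helpers `store_tree` and `path_to` are made up for this sketch.

```python
import sqlite3


def store_tree(conn, tree):
    """Store a (word, [children]) tree with nested set left/right numbers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS net (word TEXT, lft INTEGER, rgt INTEGER)"
    )
    counter = 0

    def walk(node):
        nonlocal counter
        word, children = node
        counter += 1
        lft = counter  # number assigned on the way down
        for child in children:
            walk(child)
        counter += 1  # number assigned on the way back up
        conn.execute("INSERT INTO net VALUES (?, ?, ?)", (word, lft, counter))

    walk(tree)
    conn.commit()


def path_to(conn, word):
    """Return the key words from the root down to `word`."""
    rows = conn.execute(
        "SELECT a.word FROM net a, net n "
        "WHERE n.word = ? AND a.lft <= n.lft AND a.rgt >= n.rgt "
        "ORDER BY a.lft",
        (word,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
store_tree(conn, ("physics", [("standard_model", [("fermions", []), ("bosons", [])])]))
print(".".join(path_to(conn, "bosons")))
```

With this layout the full query path for any stored key word comes back from a single select, with no recursion needed.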

The main code we need to add is text reduction; I will look around.
