Sunday, October 30, 2011

High speed, accurate retrieval in corporate ontologies

The goal is to make the median corporate document accessible with two to three search words. That means rapid forward traversal of large corporate meshes, matching a client-supplied ontology to contiguous portions of the large mesh. The grammar of the ontology has a simple form:
(a,b,c.f...).(x.k.j,y,z..).(k,..)...

This is a two-operator (so far) algebra with a distributive property and a partial associative property (partial ordering). The a, b, c, ... are partially ordered elements of a keyword class (which may be subclassed). The two operators are predicates on the one forward link. The one forward link may also be a terminal: a URL or RDF pointer.
So we have the basic store of a node:

(keyword_subclass,link_subclass)
We also include any proprietary extension fields vendors may add, including click-through counts.
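
As a rough illustration of that node store, here is a minimal Python sketch. The class name, the field names, the "and"/"or" spelling of the two predicates, and the URL-prefix test for a terminal are all assumptions made for this example, not part of any existing engine.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One (keyword_subclass, link_subclass) pair plus vendor extensions."""
    keyword: str                 # element of the keyword class
    predicate: str               # assumed spelling of the two operators: "and" / "or"
    link: Optional[str] = None   # forward link: internal node reference or terminal
    extensions: dict = field(default_factory=dict)  # vendor fields, e.g. click counts

    def is_terminal(self) -> bool:
        # Assumption: a terminal is recognised by its URL scheme.
        return self.link is not None and self.link.startswith(("http://", "https://"))


leaf = Node("budget", "or", "http://example.com/budget.pdf", {"clicks": 12})
inner = Node("finance", "and", "node:7")
```

Here `leaf` carries an external terminal and a click-through count, while `inner` points at another node of the same ontology segment.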

If the link does not point to an external URL terminal, then it points to a node of the ontology segment in which it is contained (with the and/or predicate). Ontology segments are meant to be stored in separate object stores, in nested compact order for fast searching. So the internal node pointer may be relative to the current object store or may reference an absolute object store. The ontology machine runs in the background, optimizing the placement of the ontology among object stores to, again, maximize search entropy.
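
The relative versus absolute pointer can be sketched in a few lines of Python. The names `NodeRef` and `resolve`, and the use of plain lists as object stores, are hypothetical choices for illustration only.

```python
from typing import NamedTuple, Optional


class NodeRef(NamedTuple):
    offset: int                  # position of the node inside an object store
    store: Optional[str] = None  # None means "relative to the current store"


def resolve(ref, current_store, stores):
    """Return (store_name, node) for a relative or absolute internal pointer."""
    store_name = ref.store if ref.store is not None else current_store
    return store_name, stores[store_name][ref.offset]


# Two toy object stores, each a compact list of nodes.
stores = {"hr": ["payroll", "benefits"], "legal": ["contracts"]}
relative_hit = resolve(NodeRef(1), "hr", stores)
absolute_hit = resolve(NodeRef(0, "legal"), "hr", stores)
```

A relative reference stays inside the store being searched, which keeps the common case cheap; an absolute reference names the store explicitly, which is what lets the background optimizer move segments between stores.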

So we can see why an embedded sqlite, accessed via the native programming interface, would be very popular with large search vendors. As an open source project, the industry could generate an extensible engine, getting the right mix of proprietary extensions to a basic ontology engine.
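
To make the embedded sqlite idea concrete, here is a small sketch using Python's standard `sqlite3` module in place of the native C interface. The table layout, the column names, and the ordering by click-through count are assumptions for this example.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory object store holding (keyword, predicate, link, clicks) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, keyword TEXT NOT NULL, "
        "predicate TEXT NOT NULL, link TEXT, clicks INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE INDEX node_keyword ON node (keyword)")
    conn.executemany(
        "INSERT INTO node (keyword, predicate, link, clicks) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def lookup(conn, keyword):
    """Return the forward links for a keyword, most clicked first."""
    cur = conn.execute(
        "SELECT link FROM node WHERE keyword = ? ORDER BY clicks DESC, id",
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_store([
    ("budget", "or", "http://example.com/a", 3),
    ("budget", "or", "http://example.com/b", 7),
    ("travel", "and", "http://example.com/c", 1),
])
links = lookup(conn, "budget")
```

The `clicks` column stands in for the kind of proprietary extension field a vendor might add on top of the basic (keyword, link) pair.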

At least that is how the Imagisoft systems imagine it.
