Tuesday 10 August 2010

Visual document search with Ontolica Preview


I recently had the pleasure of installing some Ontolica products for a client. The Ontolica search webparts for SharePoint are a well-known product, but perhaps not so well known is Ontolica Preview for SharePoint.

The product provides a visual search for documents and works in conjunction with the standard Ontolica webparts.

Adding the Ontolica Preview webpart to the search results page brings visual search capabilities, which include:

1) indexing of the number of pages in a document
2) thumbnail screenshots of the front and all subsequent pages
3) highlighting of the search term in the screenshot
4) ability to zoom into the document preview
5) ability to read the content that the search term hit without opening the document

This is very cool, and can save a lot of time where clients have a vast number of documents to wade through in search results. The Ontolica webparts, with their extra granularity for fine-tuning search queries, help a lot, but the visual context, and not having to open the document to check a search hit, is really valuable.

Here are a few screenshots of Ontolica Preview in action:













Fig.1. Here you see a new link in the Ontolica Preview webpart to show the document preview. This is configurable: you can opt to show screenshots by default, but as you'd imagine this has a real performance hit on rendering your search results.





















Fig.2. Clicking on "show document preview" opens the preview data window showing thumbnail screenshots of all the pages in the document. This particular document had 42 pages (you can see the scroll bar). Also note the "Most relevant pages:" 5, 7, 19. In the Fig.4 screenshot you'll see that on page 7 the search term "document" is highlighted in the screenshot. How cool is that!


















Fig.3. Here you'll see I've hovered over the thumbnail screenshot to get a slightly larger view. This is readable, but you can zoom too: click on the screenshot and you get the below:






















Fig.4. Look how the search term "document" is highlighted in the screenshot. Here I've clicked on the thumbnail to zoom in and read the document (without ever opening the actual document).

Now this is all great, but what are the drawbacks?

Well, naturally, loading all these screenshots into your search results can be a big performance hit. In these screenshots we've used the Show/Hide feature so that screenshots are only loaded when the user is interested enough in that search hit.

Happily, however, I've not noticed any slowdown in search query response time.

When Ontolica Preview is installed it uses a new database and a new content source to gather these results, so they are separate from your existing results. This allows you to set up additional crawl rules for the preview result set.

Be warned: this particular client had approximately 30,000 documents to index, and I found the gatherer service installed to crawl these was using up to 1 GB of server memory to run the crawl. The Preview database quickly grew to tens of gigabytes, so don't forget to provide plenty of resources for this puppy.
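
For a rough feel of what those numbers mean when sizing a server, here is a minimal back-of-envelope sketch in Python. It is not anything Ontolica provides: the function names are made up for illustration, and the 30 GB figure is only an assumed stand-in for "tens of gigs". Real growth depends on page counts and document types, so treat the result as a ballpark only.

```python
# Back-of-envelope sizing for a preview database.
# All names here are hypothetical; nothing below is part of Ontolica.

MB_PER_GB = 1024


def average_mb_per_document(db_size_gb: float, document_count: int) -> float:
    """Average preview storage per indexed document, in MB."""
    if document_count <= 0:
        raise ValueError("document_count must be positive")
    return db_size_gb * MB_PER_GB / document_count


def projected_db_size_gb(document_count: int, mb_per_document: float) -> float:
    """Projected database size in GB, assuming storage scales linearly."""
    if document_count < 0:
        raise ValueError("document_count must not be negative")
    return document_count * mb_per_document / MB_PER_GB


if __name__ == "__main__":
    # Assumption: 30 GB stands in for "tens of gigs"; 30,000 documents is from the post.
    per_doc = average_mb_per_document(db_size_gb=30, document_count=30_000)
    print(f"Roughly {per_doc:.2f} MB of preview data per document")

    # Hypothetical larger farm with 100,000 documents of similar shape.
    projected = projected_db_size_gb(100_000, per_doc)
    print(f"Projected preview database: about {projected:.0f} GB")
```

Plugging in your own document count and a measured database size from a pilot crawl should give a more trustworthy per-document figure than the assumed one above.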
