Tuesday 10 August 2010

Visual document search with Ontolica Preview


I recently had the pleasure of installing some Ontolica products for a client. The Ontolica search webparts for SharePoint are a well-known product, but perhaps not so well known is Ontolica Preview for SharePoint.

The product provides a visual search for documents and works in conjunction with the standard Ontolica webparts.

Adding the Ontolica Preview webpart to the search results page brings visual search capabilities, which include:

1) indexing of the number of pages in a document
2) thumbnail screenshots of the front and all subsequent pages
3) highlighting of the search term in the screenshot
4) ability to zoom into the document preview
5) ability to read the content that the search term hit without opening the document

This is very cool, and can save a lot of time where clients have a vast number of documents to wade through in search results. The Ontolica webparts, with their extra granularity for fine-tuning search queries, help a lot, but the visual context and not having to open the document to check a search hit is really valuable.

Here are a few screenshots of Ontolica Preview in action:

Fig.1. Here you see a new link in the Ontolica Preview webpart to show the document preview. This is configurable: you can opt to show screenshots by default, but as you'd imagine this has a real performance hit on your search results rendering.

Fig.2. Clicking on "show document preview" opens the preview data window showing thumbnail screenshots of all the pages in the document. This particular document had 42 pages (you can see the scroll bar). Also note the "Most relevant pages:" 5, 7, 19. In the Fig.4 screenshot you'll see that on page 7 the search term "document" is highlighted in the screenshot! How cool is that?

Fig.3. Here you'll see I've hovered over the thumbnail screenshot to get a slightly larger view. This is readable, but you can zoom too: click on the screenshot and you get the below:

Fig.4. Look how the search term "document" is highlighted in the screenshot. Here I've clicked on the thumbnail to zoom in and read the document (without ever opening the actual document).

Now this is all great, but what are the drawbacks?

Well, naturally, loading all these screenshots into your search results can be a big performance hit. In these screenshots we've used the Show/Hide feature so that screenshots are only loaded when the user is interested enough in that search hit.

Happily, though, I've not noticed any slowdown in search query response time.

When Ontolica Preview is installed it uses a new database and a new content source to gather these results, so they are separate from your existing results. This allows you to set up additional crawl rules for the preview result set.

Be warned: this particular client had approximately 30,000 documents to index, and I found the gatherer service installed to crawl these was using up to 1 GB of server memory to run the crawl, while the Preview database quickly grew to tens of gigs. So don't forget to provide plenty of resources for this puppy.

Friday 23 July 2010

Adventures in SharePoint variations - part deux

So to recap: recently I had problems with SharePoint variations. The client reported that pages weren't being propagated to all child variation sites.

After some research (see "Adventures in SharePoint variations") I discovered that the client was only running SP1 of MOSS, that SP2 had significant improvements in the reliability of the variations infrastructure, and that SP2 also included some powerful new stsadm commands to help debug problems with variations.

These include the very aptly named variationsfixuptool, which provides the two functions I used to fix the variations "relationships list" for Compass:

stsadm -o variationsfixuptool -scan -url http://server/sites/pub/vhome/source > C:\report1.html

This does what it says on the tin: it scans through the relationships list and outputs the relationships in a nice HTML table as URL-to-URL relationships. This is much easier to read and understand than the relationships list itself.

Looking at this you can see which pages have failed to deploy to all the variations and focus on fixing the relationships for these pages.
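With a lot of pages, eyeballing the report gets tedious. Here's a minimal sketch of the same check in Python, assuming you've already pulled the report rows into (source page, variation label) pairs. The function name, the pair structure and the sample values are all my own illustration, not a format that variationsfixuptool produces.

```python
from collections import defaultdict


def find_missing_variations(relationships, all_labels):
    """Return {source_page: sorted labels that have no related page}.

    relationships: iterable of (source_page, variation_label) pairs
                   (hypothetical structure, extracted from the report by hand).
    all_labels:    every variation label the page should exist in.
    """
    seen = defaultdict(set)
    for source_page, label in relationships:
        seen[source_page].add(label)

    missing = {}
    for page, labels in seen.items():
        gap = sorted(set(all_labels) - labels)
        if gap:
            missing[page] = gap
    return missing


# Hypothetical sample data, not taken from a real report.
sample = [
    ("/Pages/about.aspx", "fr"),
    ("/Pages/about.aspx", "de"),
    ("/Pages/news.aspx", "fr"),
]
print(find_missing_variations(sample, ["fr", "de"]))
```

Note that a page with no relationships at all never shows up in the pairs, so it won't be reported here; you'd need to compare against a full list of source pages for that.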

But wait, you don't need to fix these relationships by hand. The second cool thing about variationsfixuptool is that it includes an option that will actually recurse through the root variation, spot your missing relationships and fix them for you:

stsadm -o variationsfixuptool -fix -url http://server/sites/pub/vhome/source/sub1 -recurse

I would have been here until the end of time trying to fix relationships without this. I feel so justified in installing SP2.

However, this isn't the end of the story...

Although the relationships are fixed, there is still a lot of damage to repair, not to mention figuring out how this happened in the first place.

When you enable variations on a publishing site template, two timer jobs are created to propagate content out to the variation children.

I initially suspected these as the culprits for why the variations failed in the first place. On the staging environment these hadn't actually run when I'd published a test page.

Running the stsadm -o execadmsvcjobs command seemed to free these up, and I was able to publish pages to variations again on staging. However, on production the timer jobs seemed to have been running recently.

If that's the case, then all I need to do is republish all the pages identified in the initial scan, and with their fixed relationships they should propagate out again as intended.

Now what's the problem with that, you might think? Quite a lot of clicking to do, but hey, it's not difficult to republish everything.

Well, this client has just been on a translation spree. The idea is that when you publish a page in the root site, the page is created in the same place on each child variation in DRAFT mode, ready to be translated into the native culture for that variation. If you then republish the same page from the root variation, the same process occurs and a new version of the page is propagated to the child variation in draft mode, overwriting the previously translated version. This is not something you want to have to tell the client.
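To make that triage concrete, here's a rough Python sketch of how the republish list could be split into pages that are safe to push again and pages that need a human to look first. It assumes you can put together, by whatever means, a record of which variations already hold a translated copy of each page. The function name, the dictionary shape and the sample values are hypothetical, purely for illustration.

```python
def split_by_overwrite_risk(pages_to_republish, translated):
    """Split pages into (safe, risky).

    pages_to_republish: iterable of source page URLs from the scan.
    translated:         {source_page: set of variation labels that already
                        hold a translated copy} (hypothetical record).
    safe:  pages with no known translations, republish freely.
    risky: {page: sorted labels whose translation would be replaced by a draft}.
    """
    safe, risky = [], {}
    for page in pages_to_republish:
        labels = translated.get(page, set())
        if labels:
            risky[page] = sorted(labels)
        else:
            safe.append(page)
    return safe, risky


# Hypothetical sample data.
safe, risky = split_by_overwrite_risk(
    ["/Pages/about.aspx", "/Pages/news.aspx"],
    {"/Pages/news.aspx": {"fr", "de"}},
)
print("Safe to republish:", safe)
print("Check with translators first:", risky)
```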

So there's a lot of hard work ahead assessing which pages need to be republished, but any new pages should propagate as normal now that the relationships are fixed.

As yet I'm still working on how it got into this state in the first place. I only hope the installation of SP2 means it won't happen again...

Thursday 15 July 2010

Adventures in SharePoint variations

I'm currently dealing with a MOSS 2007 site using variations.

They have 29 variations currently running and are adding new variations at a rate of 3 or 4 a year.

Recently on publishing a page to the root site they found the page didn't get published to all the variations. Uh oh.

This isn't going to be a great day.

The workflow that runs to publish the page showed the message "error occurred". I spent some time investigating the logs around the workflow but this didn't really get me anywhere. I checked disk space and the usual suspects, anything that might stop workflows and normal processes running correctly as they had been up to this point.

Nothing.

So I switched the logs to verbose and published a test page from the root site to see if I could follow the messages in the trace log and work out what was happening (I used the SharePoint ULS log viewer to track the relevant messages).
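If you don't have a log viewer to hand, the same kind of filtering can be roughed out in a few lines of Python. This is only a sketch: it does a plain case-insensitive keyword match on each line and assumes nothing about the log's column layout. The function name and the sample lines are made up for illustration and are not real ULS output.

```python
def filter_log_lines(lines, keywords):
    """Yield the lines that contain any of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for line in lines:
        text = line.lower()
        if any(k in text for k in lowered):
            yield line.rstrip("\n")


# Invented sample lines, only to show the filter working.
sample = [
    "10:01:02\tPublishing\tVerbose\tSpawning page to variation label A\n",
    "10:01:03\tTimer\tVerbose\tJob finished\n",
    "10:01:04\tPublishing\tHigh\tCannot create Variation Publishing Page\n",
]
for hit in filter_log_lines(sample, ["variation"]):
    print(hit)
```

With verbose logging switched on the trace log grows quickly, so narrowing to a keyword or two first makes the failing step much easier to spot.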

Sure enough SharePoint was happily publishing the said page to a number of variations but then failed and stopped when it reached a variation it couldn't propagate to.

The error was "Cannot create Variation Publishing Page because the target Publishing Web cannot be determined"

Strange, because that particular site was seemingly available within "Manage Content and Structure".

But this led me to a chilling and dark conclusion.

To understand the chill I experienced, you need to understand how SharePoint propagates pages from the root site to the different variation sites.

SharePoint maintains a list of relationships between pages in the root site and pages in each variation. This information is stored in a hidden list called, not surprisingly, "relationships list".

This list has several thousand items in it. Lots of GUIDs and object IDs, enough to make your vision go blurry trying to read them.

When I checked the relationships list for the country variation that had failed I was immediately suspicious, as there were only a few links assigned to that variation, whereas most of the current variations clearly had a link for seemingly every page in the site.
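That "one variation has far fewer entries than the rest" check is easy to automate once the list gets too big to read. A minimal Python sketch, assuming you've exported one variation label per relationship entry (the export step, the function name, the 50% threshold and the sample counts are all my own assumptions):

```python
from collections import Counter
from statistics import median


def sparse_variations(labels, threshold=0.5):
    """Return the labels whose entry count is below threshold * median count.

    labels: one variation label per relationship entry (hypothetical export).
    """
    counts = Counter(labels)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(label for label, n in counts.items() if n < threshold * typical)


# Hypothetical counts: two healthy variations and one suspiciously thin one.
entries = ["fr"] * 100 + ["de"] * 98 + ["es"] * 3
print(sparse_variations(entries))
```

Using the median as the yardstick means one or two broken variations don't drag down what counts as a normal number of entries.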

This was telling me I had problems with relationships. (Something my friends have been telling me for years.)

I investigated this by installing a widely used tool (as there aren't any others): the SharePoint Variations Editor. This however didn't work for me. Most disappointingly, it correctly found my site but threw an error when trying to enumerate the sites. I didn't spend too much time working out why, as it was a hopeful long shot anyway.

So, having now spent the best part of the day investigating the problem, I turned to Gary Lapointe's stsadm extensions for fixing variations.

http://stsadm.blogspot.com/2008/04/fix-variation-relationships-list.html

I installed the solution and ran the command. But again, failure.

I got the message "Value out of range" or similar. I knew what was coming.

At this point I checked the version of MOSS and found it was only running SP1.

Some googling later I found plenty of evidence that SP2 contained a number of improvements, as well as some stsadm commands to help fix problems with variations and the relationships list.

http://support.microsoft.com/kb/953334

So I'm not sure if this is good news or not, but at least if I get SP2 and the latest cumulative updates installed I can get help from MS.

So installing SP2 it is. I hope it works! Stay tuned for Part 2, where I will hopefully be cheerily informing you how I fixed my relationship troubles, or alternatively crying into my tea at having taken down a client server for a couple of hours for no apparent reason.

Sunday 4 July 2010

Download the conferencing add-in for Microsoft Office Outlook

An Outlook add-in is available that allows you to schedule Live Meetings from Outlook.

Get it here:
http://office.microsoft.com/en-us/help/HA102368901033.aspx?pid=CL100605171033

Co-existence between BPOS Communicator and on-premises Exchange


Office Communications Online integration with on-premises Exchange Server 2007

Office Communications Online will integrate with an on-premises server running Microsoft Exchange Server 2007. Office Communicator will display presence updates and Out-of-Office information from the on-premises Exchange server. Office Communicator will communicate with your on-premises Exchange server using its external Exchange Web Service (EWS) URLs. For more information about EWS URLs, see How to Configure an External Host Name for Outlook Anywhere.

For more information on EWS click here


Microsoft Online Services release notes

Friday 2 July 2010