Archive for the ‘Enterprise Search’ Category

Kristian Norling

Video: Search Analytics in Practice

May 9 - 2012 | Kristian Norling

Search Analytics in Practice from Findwise on Vimeo.

This presentation is about how to use search analytics to improve the search experience. A small investment in time and effort can really improve the search on your intranet or website. You will get practical advice on what metrics to look at and what actions can be taken as a result of the analysis.

Video in Swedish: “Sökanalys i praktiken”.

The presentation was recorded in Gothenburg on the 4th of May 2012.

The presentation featured in the video:

Search Analytics in Practice

Description: Search analytics done right helps to improve the search experience. A small investment in time can make your search function a lot better.  •  About: Search Analytics  •  Accountable Person: Kristian Norling  •  Author:  •  Keywords: Search analytics, search,  • 

Pawel Wroblewski

Architecture of Search Systems and Measuring the Search Effectiveness

April 24 - 2012 | Pawel Wroblewski
Lecture given on the 19th of April 2012 at the Warsaw University of Technology. This is the 9th lecture in the regular master's-level course “Introduction to text mining”.

Keywords: Search, search effectiveness, search architecture  • 

Kristian Norling

Update on The Enterprise Search and Findability Survey

April 12 - 2012 | Kristian Norling

A quick update on the status of the Enterprise Search survey.

We now have well over a hundred respondents. The more respondents, the better the data will be, so please help spread the word. We’d love to have several hundred more. The survey will now be open until the end of April.

But most important of all, if you haven’t already, have a cup of coffee and fill in the survey.

A Few Results from the Survey about Enterprise Search

More than 60% say that the amount of searchable content in their organizations today is less or far less than needed. And 85% say that in three years’ time the amount of searchable content in the organisation will increase or increase significantly.

75% say that it is critical to find the right information to support their organization’s business goals and success. But it is interesting to note that over 70% of the respondents say that users don’t know where to find the right information or what to look for – and about 50% of the respondents say that it is not possible to search more than one source of information from a single search query.

In this context it is interesting that the primary goals for using search in organisations (where the answer is imperative or significant) are to:

  • Improve re-use of information and/or knowledge – 59%
  • Accelerate brokering of people and/or expertise – 55%
  • Increase collaboration – 60%
  • Raise awareness of “What We Know” – 57%
  • and finally to eliminate siloed repositories – 59%

In many organisations search is owned either by IT (60%) or Communication (27%), has no specified budget (38%) and has less than one dedicated person working with it (48%). More than 50% have a search strategy in place or are planning to have one in 2012/13.

I think these numbers are interesting, but they definitely need to be segmented and analyzed further. That will of course be done in the report, which is due to be ready in June.

Svetoslav Marinov

Bryan, Brian, Briane, Bryne, or … what was his name again?

March 21 - 2012 | Svetoslav Marinov

Let the spelling loose …

What do Callie and Kelly have in common (except for the double ‘l’ in the middle)? What about “no” and “know”, or “Caesar’s” and “scissors”, and what about “message” and “massage”? You definitely got it – Callie and Kelly, “no” and “know”, “Caesar’s” and “scissors” sound alike, but are spelled quite differently. “message” and “massage”, on the other hand, differ by only one vowel (“a” vs “e”) but their pronunciation is not at all the same.

It’s a well-known fact for many languages that orthography does not determine the pronunciation of words. English is a classic example: “ghoti” as an alternative spelling of “fish” is commonly attributed to George Bernard Shaw. While phonology often reflects the current state of the development of the language, orthography may lag centuries behind. And while English is notorious for that phenomenon, it is not the only one. Swedish, French and Portuguese, among others, all have their orthography/pronunciation discrepancies.

Phonetic Algorithms

So how do we represent things that sound similar but are spelled differently? It’s not trivial, but for most cases it is not impossible either. Soundex is probably the first algorithm to tackle this problem. It is an example of the so-called phonetic algorithms, which attempt to give the same encoding to strings that are pronounced in a similar fashion. Soundex was designed for English only and has its limits. Double Metaphone (DM) is one of the possible replacements and is relatively successful. Published by Lawrence Philips in 2000 as a successor to his original Metaphone algorithm from 1990, it not only deals with native English names but also takes proper care of the foreign names so omnipresent in the language. What is more, it can output two possible encodings for a given name (hence the “Double” in the name of the algorithm): an anglicised and a native (be that Slavic, Germanic, Greek, Spanish, etc.) version.
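To make the idea of a phonetic encoding concrete before turning to DM, here is a minimal sketch of classic American Soundex in plain Java. The class name is made up for this illustration, and the code is a simplified teaching version, not the Apache Commons Codec implementation.

```java
import java.util.Locale;

class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' marks letters that are not coded.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char ch : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                continue; // ignore anything outside A..Z
            }
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch); // the first letter is kept as it is
                prev = code;
            } else if (ch != 'H' && ch != 'W') {
                // H and W are skipped entirely, so equal codes around them merge.
                if (code != '0' && code != prev) {
                    out.append(code);
                }
                prev = code; // a vowel resets prev, so a repeated code is kept
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Bryan", "Brian", "Briane", "Bryne", "Callie", "Kelly"};
        for (String n : names) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

All four spellings from the title come out as B650, but Callie (C400) and Kelly (K400) do not match, because Soundex always keeps the first letter. That is one example of its limits.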

By relying on DM one can encode all the four names in the title of this post as “PRN”. The name George will get two encodings – JRJ and KRK, the second version reflecting a possible German pronunciation of the name. And a name with Polish origin, like Adamowicz, would also get two encodings – ATMTS and ATMFX, depending on whether you pronounce the “cz” as the English “ch” in “church” or “ts” in “hats”.

The original implementation by Lawrence Philips limited the encoding of a string to 4 characters. However, in most subsequent implementations of the algorithm this limit is parameterized or simply omitted.

Apache Commons Codec has an implementation of DM, among others (Soundex, Metaphone, RefinedSoundex, ColognePhonetic and Caverphone, to name just a few), and here is a tiny example with it:

import org.apache.commons.codec.language.DoubleMetaphone;

public class DM {

    public static void main(String[] args) {
        String s = "Adamowicz";
        DoubleMetaphone dm = new DoubleMetaphone();

        // Default encoding length is 4! Let's make it 10.
        dm.setMaxCodeLen(10);

        // Remember, DM can output 2 possible encodings: the primary one
        // and, when the second argument is true, the alternate one.
        System.out.println("Alternative 1: " + dm.doubleMetaphone(s)
                + "\nAlternative 2: " + dm.doubleMetaphone(s, true));
    }
}

The above code will print out:

Alternative 1: ATMTS

Alternative 2: ATMFX

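It is also relatively straightforward to do phonetic search with Solr. You just need to add a phonetic filter (for example Solr's DoubleMetaphoneFilterFactory, or PhoneticFilterFactory with the DoubleMetaphone encoder) to the analysis chain of the field type used for names in your schema.xml.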

Enhancements

While DM does perform quite well at first sight, it has its limitations. We should remember that it originated from the English language, and although it aims to tackle a variety of non-native borrowings, most of the rules are English-centric. Suppose you work on any of the Scandinavian languages (Swedish, Danish, Norwegian, Icelandic) and one of the names you want to encode is “Örjan”. You would expect “Orjan” and “Örjan” to be encoded alike, yet they get different encodings: ARJN vs RJN. Why is that? One look under the hood (the implementation in DoubleMetaphone.java) will give you the answer:

private static final String VOWELS = "AEIOUY";

So the Scandinavian vowels “ö”, “ä”, “å”, “ø” and “æ” are not present. If we just add these then compile and use the new version of the DM implementation we get the desired output – ARJN for both “Örjan” and “Orjan”.
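If patching and recompiling the library is not an option, a different technique (an assumption for illustration, not what is described above) is to fold the Scandinavian letters to plain ASCII before the name reaches the encoder. The class and method names below are hypothetical. Note that “ø” and “æ” have no Unicode decomposition, so they need explicit mappings; the letters are written as Unicode escapes to keep the source file ASCII-only.

```java
import java.text.Normalizer;

class ScandinavianFolder {

    // Folds Scandinavian letters to ASCII so that a phonetic encoder
    // receives "Orjan" whether the input used O with diaeresis or plain O.
    static String fold(String name) {
        // o-with-stroke and the ae ligature do not decompose under NFD,
        // so they are mapped by hand first.
        String s = name.replace("\u00F8", "o").replace("\u00D8", "O")
                       .replace("\u00E6", "ae").replace("\u00C6", "Ae");
        // NFD splits a letter such as o-with-diaeresis into the base letter
        // plus a combining mark (U+0308).
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop all combining marks, leaving only the base letters.
        return s.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String[] names = {"\u00D6rjan", "\u00C5sa", "S\u00F8ren", "\u00C6gir"};
        for (String n : names) {
            System.out.println(n + " -> " + fold(n));
        }
    }
}
```

The folded string can then be passed to `doubleMetaphone` as usual. The trade-off is that folding throws away distinctions (for instance between “å” and “a”) that a language-specific rule set could still exploit.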

Finally, if you don’t want to use DM or maybe it is really not suitable for your task, you still may use the same principles and create your own encoder by relying on regular expressions for example. Suppose you have a list of bogus product names which are just (mis)spelling variations of some well known names and you want to search for the original name but get back all ludicrous variants. Here is one albeit very naïve way to do it. Given the following names:

CupHoulder

CappHolder

KeepHolder

MacKleena

MackCliiner

MacqQleanAR

Ma’cKcle’an’ar

and with a bunch of regular expressions you can easily encode them as “cphldR” and “mclnR”.

String[] ar = new String[]{"CupHoulder", "CappHolder", "KeepHolder",
        "MacKleena", "MackCliiner", "MacqQleanAR", "Ma'cKcle'an'ar"};

for (String a : ar) {
    a = a.toLowerCase();
    a = a.replaceAll("[ae]r?$", "R");    // final -a, -e, -ar, -er become "R"
    a = a.replaceAll("[aeoiuy']", "");   // drop vowels and apostrophes
    a = a.replaceAll("pp+", "p");        // collapse repeated "p"
    a = a.replaceAll("q|k", "c");        // treat "q" and "k" as "c"
    a = a.replaceAll("cc+", "c");        // collapse repeated "c"
    System.out.println(a);
}

You can now easily find all the ludicrous spellings of “CupHolder” and “MacCleaner”.
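One way to turn the encoder into an actual lookup, sketched here with hypothetical class and method names, is to index every spelling under its code and then encode the query with the same steps:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class NaiveNameIndex {

    // The same regex steps as in the loop above, wrapped in a method.
    static String encode(String name) {
        String a = name.toLowerCase(Locale.ROOT);
        a = a.replaceAll("[ae]r?$", "R");
        a = a.replaceAll("[aeoiuy']", "");
        a = a.replaceAll("pp+", "p");
        a = a.replaceAll("q|k", "c");
        a = a.replaceAll("cc+", "c");
        return a;
    }

    // Maps each code to every spelling that produced it.
    static Map<String, List<String>> index(String... names) {
        Map<String, List<String>> byCode = new LinkedHashMap<>();
        for (String n : names) {
            byCode.computeIfAbsent(encode(n), k -> new ArrayList<>()).add(n);
        }
        return byCode;
    }

    public static void main(String[] args) {
        Map<String, List<String>> idx = index("CupHoulder", "CappHolder", "KeepHolder",
                "MacKleena", "MackCliiner", "MacqQleanAR", "Ma'cKcle'an'ar");
        // Encode the query the same way and look up its code.
        System.out.println(idx.get(encode("CupHolder")));
        System.out.println(idx.get(encode("MacCleaner")));
    }
}
```

In a search engine the same effect is usually achieved by storing the code in a separate field at index time and applying the encoder to the query at search time.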

I hope this blogpost gave you some ideas of how you can use phonetic algorithms and their principles in order to better discover names and entities that sound alike but are spelled unlike. At Findwise we have done a number of enhancements to DM in order to make it work better with Swedish, Danish and Norwegian.

References

You can learn more about Double Metaphone from the following article by the creator of the algorithm:
http://drdobbs.com/cpp/184401251?pgno=2

A German phonetic algorithm is the Kölner Phonetik:
http://de.wikipedia.org/wiki/Kölner_Phonetik

And SfinxBis is a phonetic algorithm based on Soundex and is Swedish specific:
http://www.swami.se/projekt/sfinxbis.68.html

Kristian Norling

Video interview: How to Improve the Search Experience

March 15 - 2012 | Kristian Norling

Video interview with Kristian Norling at the Intrateam Event in Copenhagen 2012. Kristian talks about his former work at VGR and what he thinks is important for improving the search experience.


Watch the video

Description: Improving the search experience on the intranet or your website is important. Kristian gives advice on how to improve the search experience.  •  About: Kristian Norling is interviewed on how to improve the search experience.  •  Accountable Person: Kristian Norling  •  Keywords: search experience intranet intrateam  • 

HakanKjellman

Mobile clients and Enterprise Search – What are the Implications?

March 14 - 2012 | HakanKjellman

As we all know, the smartphone user base is growing explosively. According to www.statcounter.com, internet access from handheld mobile devices has doubled yearly since 2009, adding up to 8.5% of all page views globally in January 2012. And mobile users want to be able to do all the same things that they are able to do on their PC. That includes access to the company’s Enterprise Search solution!

The benefits of the sales force being able to search for vital customer information before a meeting or for field service personnel being able to find documentation quickly are quite obvious. So how can an organization tweak its search solution in order to provide convenient access for the mobile users? And above all, what will it cost?

Well, to answer the last question first: much less than you think. Providing for the mobile user is mainly about creating a new front end/UI. The main bulk of your search solution remains the same; indexing, metadata structure and content publishing, for instance, remain essentially unaffected.

But you do need to provide a quite different UI for the user interaction to work smoothly, considering the specific characteristics of the mobile client, primarily screen size/resolution and text input. The smartphone also has a lot of features that the PC lacks – it is always available, it knows exactly where you are, and it always has a camera, microphone, speaker, possibly a magnetometer and accelerometer, and of course a touchscreen with gestures like pinching and swiping. Many of these features can be quite useful, as the following examples show:

Illustration 1. Google Mobile Voice Search on the iPhone. Courtesy of UX Matters, www.uxmatters.com

  • Google Mobile App for iPhone: in this app, the iPhone senses when the phone is lifted towards the ear and hence knows when to listen for a search command. Since the phone also knows where the user is, a search for “restaurant” automatically generates hits with restaurants in your vicinity.
  • Scanning a Barcode or QR-code: scanning a Barcode or QR-code with your phone is another way of entering a search string. An example could be a product in a store where the customer could open a price-search-engine and scan the QR-code of the product and see where the best price is.

As you can see, there are plenty of opportunities for those who want to be creative. But for the most part, the I/O will still be done via the screen. At UX Matters there is a great article by Greg Nudelman describing the considerations when implementing search for mobile clients and suggestions for various design patterns that can be efficient (see http://www.uxmatters.com/mt/archives/2010/04/design-patterns-for-mobile-faceted-search-part-i.php). I have included a brief summary below together with illustrations courtesy of UX Matters. But first, some general considerations for mobile clients:

  • Device detection: use JavaScript to detect what type of device is accessing your search solution and, if it is a mobile client, display the mobile interface.
  • Native App or Mobile Web App: creating a Mobile Web App is easier and cheaper than creating a native App – for one thing you don’t have to create multiple versions for different operating systems (although you still need to test your solution with different browsers/resolutions). Performance-wise there isn’t a big difference between Native Apps and Web Apps, and mobile browsers are increasingly gaining access to most of the phone’s hardware as well.
  • Authentication: SSO for mobile web applications works the same as for desktop browsers. There are also new solutions currently being launched that enable usage of the company’s existing Active Directory infrastructure. One example is Centrify DirectControl for Mobile, which enables centralized administration within Active Directory of all device security settings, profiles, certificates and restrictions.
  • Use HTML5 instead of Flash: iPhones don’t support Flash, but HTML5 is a very capable alternative.
  • Testing: how the design looks at different resolutions can be tested through various emulators, but it is always advisable to test on a limited set of real smartphones as well.
  • Access needs to be quick and simple: user interaction is more cumbersome on a phone than on a PC. Normally try to avoid solutions that require more than 3 input actions.
  • Menu navigation: links on the right side are normally used to drill down in the menu hierarchy, and links on the left side to move up towards the home screen.
  • Gestures: a very powerful toolbox that can be used in many different ways to create an efficient UI. For example, use “pinch to show more” if you want to expand the summary information of a specific item in the search hit list, or “swipe” to expose the metadata (or whatever action you want to assign to that gesture).
  • Be creative: the mobile client is inherently different from a PC, limited in some ways but more powerful in others. So if you just try to adopt design solutions from the PC and fit them into a mobile UI you are missing out on a lot of powerful design solutions that only make sense on a mobile client and you are definitely not giving the users the best possible search experience. Also, since mobile design is still evolving you don’t need to be limited by conventions and expectations as much as on the PC side – make the most of this freedom to be creative!
  • W3C mobile: for more information about mobile web development, see http://www.w3.org/Mobile/ which also includes a validating scheme to assess the readiness of content for the mobile web

Design patterns for mobile UI (with courtesy of Greg Nudelman/UX Matters)

Mobile faceting can be tricky but by using design patterns like “4 Corners”, “Modal Overlays”, “Watermarks” and “Teaser Design” the UI can become both intuitive and easy to learn as well as providing reasonably powerful functionality. As mentioned, these techniques are summaries from an article written by Greg Nudelman for UX Matters. If you are eager to learn more, feel free to check out Greg’s website and his upcoming workshops focused on mobile design http://www.designcaffeine.com/category/workshops/

4 Corners: instead of stealing scarce real estate by adding faceting options directly on the screen together with the search result, semitransparent buttons are available in each corner enabling the user to bring up a faceting menu by tapping in a corner (see illustration 2).

Modal Overlays: the modal overlay is displayed on top of the original page. The modal overlay works well together with the 4 corners design – tapping a corner opens up the overlay containing faceting functions like filtering and sorting (see illustration 2).

Illustration 2. Four Corners and Modal Overlay patterns. Courtesy of UX Matters, www.uxmatters.com

Watermarks: a great technique for guiding users and showing the possibility of using new functions. The watermarks, possibly animated, show a symbol for the available action, for instance arrows indicating that a swiping gesture could be used (see illustration 3).

Full-Page Refinement Options Pattern: gives the user plenty of refinement options to choose from (see illustration 3).

Illustration 3. Two variations of the Watermark pattern and a Refinement Options pattern. Courtesy of UX Matters, www.uxmatters.com

Teaser Design: show part of the next available content so that the user is aware that there is more content available (see illustration 4).

Illustration 4. Teaser design pattern facilitates the discovery of faceted search filters. Courtesy of UX Matters, www.uxmatters.com

Persistent Status Bar: always maintain a persistent status bar containing the search string together with applied filters in the search result page. This helps the user maintain orientation. Note that all of the illustrations above have a persistent status bar.

Conclusion

Although Best Practices for mobile UI design are still evolving, plenty of progress has already been made and there are several solutions and design patterns to choose from depending on the specific circumstances at hand. So an implementation project need not be rocket science, as long as you learn the right tricks…

Bringing enterprise information to the field, readily available in a mobile handset or tablet, will mobilize your employees. The UI requires rethinking as we have seen. And security needs to be addressed properly to avoid having sensitive data compromised. But other than that, you are ready to go!

Mickel Gronroos

Automated Testing of Enterprise Search

March 8 - 2012 | Mickel Gronroos

Quality assuring an enterprise search solution is challenging, yet important. The challenge is to be able to do continuous follow-up of the quality of the solution during implementation but also after release, when the solution is in production and operated by an operations team. Testing is important, but it is also costly – unless it can be automated.

So what kind of testing is specific for a search application? And what of that can be automated?

The whole idea of Enterprise Search is to provide the right information to the right people at the right time. The information made findable is normally stored in many different information systems, and the information in these systems is constantly changing. In the end, every enterprise search solution operates in a context where the requirements of the end-users and the available content change on a daily basis. In other words, assuring the quality of enterprise search is about assuring the quality of the information and the way that information is accessed by and delivered to the end-users.

During our engagements over the years, we have set routines and developed tools for automated testing of enterprise search. What we specifically want to track in an automated fashion is:

  • Completeness
  • Freshness
  • Access restrictions
  • Metadata quality
  • Performance
  • Relevance

Allow me to take a few moments and describe what this means.

Completeness testing

Completeness tests aim to make sure that the search index is complete – that all information objects (such as web pages and documents) that are supposed to be searchable are really searchable. In addition, completeness testing provides proof that the correct parts of the information objects are indexed for retrieval, e.g. all pages in a multi-page document, as well as titles and other searchable metadata. It is also important to monitor that information that should not be searchable is indeed not indexed, e.g. headers and footers of web pages.
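As a rough illustration, a completeness test can boil down to comparing two sets of identifiers: what the source system says should be searchable and what the index actually contains. The sketch below assumes both lists can be exported as sets of document IDs; the class name and the IDs are made up.

```java
import java.util.Set;
import java.util.TreeSet;

class CompletenessCheck {

    // IDs the source system lists as searchable but the index lacks.
    static Set<String> missing(Set<String> expected, Set<String> indexed) {
        Set<String> diff = new TreeSet<>(expected);
        diff.removeAll(indexed);
        return diff;
    }

    // IDs found in the index that should not be there.
    static Set<String> unexpected(Set<String> expected, Set<String> indexed) {
        Set<String> diff = new TreeSet<>(indexed);
        diff.removeAll(expected);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("doc-1", "doc-2", "doc-3");
        Set<String> indexed = Set.of("doc-1", "doc-3", "doc-9");
        System.out.println("Missing: " + missing(expected, indexed));       // [doc-2]
        System.out.println("Unexpected: " + unexpected(expected, indexed)); // [doc-9]
    }
}
```

Checking that the right parts of each object are indexed (all pages, titles, no headers or footers) needs content-level test queries on top of this ID comparison.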

Freshness testing

Freshness tests aim to make sure that the search index is up to date, i.e. new content that has been added to a source (such as a document management system) becomes searchable, deleted content is removed automatically from the search index and updated content is updated in the search index – all in due time.
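A freshness rule of this kind can be sketched as a comparison of timestamps. The example assumes that the source system exposes a last-modified time, that the index stores the time of the version it holds, and that “in due time” has been agreed as a maximum delay; all names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

class FreshnessCheck {

    // A document is fresh if the index holds the latest version, or if the
    // source changed so recently that indexing is still within the agreed delay.
    static boolean isFresh(Instant sourceModified, Instant indexedVersion,
                           Instant now, Duration maxDelay) {
        if (!indexedVersion.isBefore(sourceModified)) {
            return true; // the index already reflects the latest change
        }
        Duration waiting = Duration.between(sourceModified, now);
        return waiting.compareTo(maxDelay) <= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2012-03-08T12:00:00Z");
        Instant indexed = Instant.parse("2012-03-08T09:00:00Z");
        Duration maxDelay = Duration.ofMinutes(30);

        // Changed 10 minutes ago: not indexed yet, but still within the delay.
        System.out.println(isFresh(Instant.parse("2012-03-08T11:50:00Z"), indexed, now, maxDelay));
        // Changed 2 hours ago and still not indexed: stale.
        System.out.println(isFresh(Instant.parse("2012-03-08T10:00:00Z"), indexed, now, maxDelay));
    }
}
```

Deletions can be tested along the same lines, by checking that a document removed from the source disappears from the index within the same delay.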

Testing access restrictions

If an enterprise search solution provides access to access-controlled information, it is of utmost importance to be able to prove that security is never compromised. Testing access restrictions aims to do precisely that. What one needs to monitor is that existing document-level security works, i.e. that people who should have access to an information object really have access and that people who shouldn’t have access don’t. The tricky part is to monitor that a change in access privileges, for instance in Active Directory or in the access restrictions (the ACL) for a particular document, is reflected in the search index as well, in due time.
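Expressed as test cases, this means pairing a user with a document and an expected outcome, both positive and negative. In the sketch below an in-memory map stands in for the ACLs in the index, and `canFind` stands in for running a query as that user; the groups, document IDs and names are invented for the example.

```java
import java.util.Map;
import java.util.Set;

class AccessCheck {

    // Stand-in for the ACLs stored in the search index: document ID -> allowed groups.
    static final Map<String, Set<String>> INDEX_ACL = Map.of(
            "salary-report", Set.of("hr"),
            "lunch-menu", Set.of("hr", "sales"));

    // Stand-in for "search as this user and see whether the document comes back".
    static boolean canFind(Set<String> userGroups, String docId) {
        for (String group : INDEX_ACL.getOrDefault(docId, Set.of())) {
            if (userGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    // A test case passes when the observed visibility equals the expected one.
    static boolean passes(Set<String> userGroups, String docId, boolean expected) {
        return canFind(userGroups, docId) == expected;
    }

    public static void main(String[] args) {
        Set<String> salesUser = Set.of("sales");
        System.out.println(passes(salesUser, "lunch-menu", true));     // should be findable
        System.out.println(passes(salesUser, "salary-report", false)); // must stay hidden
    }
}
```

To cover changed privileges, the same cases can be re-run after a group membership or ACL is altered in the source, with the expectation flipped once the agreed delay has passed.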

Testing metadata quality

Each information object in the search index contains a set of fields containing metadata and text, e.g. a title, the text body, an author, a timestamp containing last modification date, information on file format, a keywords field and many more.

In an enterprise search setting, many different information models implemented in the source systems need to be harmonized into one common domain model (schema/index profile/information model) in the search index. This means information regarding a creator of an information object in one system and a publisher of an information object in another system can be stored in a common author metadata field in the search index in a common, defined format such as Firstname Lastname regardless of formatting in the source system. Unless you have a common model in the index, you can’t provide features like cross-system filtering with facets.

So how do you track that the metadata in the search index stays in good shape? This is the aim of metadata testing. The test cases provided for metadata testing need to check that the metadata in the search index conforms to the defined domain model and formatting even when the underlying content changes in the source systems.
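A metadata test case can be as simple as validating each indexed document against the rules of the domain model. The sketch below assumes a model with three required fields and an author format of capitalised words separated by single spaces; the field names and the pattern are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MetadataCheck {

    // Assumed domain model: these fields must exist, and author must look
    // like "Firstname Lastname" (capitalised words separated by one space).
    static final List<String> REQUIRED = List.of("title", "author", "modified");
    static final String AUTHOR_FORMAT = "\\p{Lu}\\p{L}+( \\p{Lu}\\p{L}+)+";

    // Returns a description of every rule the document breaks.
    static List<String> problems(Map<String, String> doc) {
        List<String> found = new ArrayList<>();
        for (String field : REQUIRED) {
            String value = doc.get(field);
            if (value == null || value.trim().isEmpty()) {
                found.add("missing " + field);
            }
        }
        String author = doc.get("author");
        if (author != null && !author.matches(AUTHOR_FORMAT)) {
            found.add("badly formatted author: " + author);
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of(
                "title", "Travel policy", "author", "Kristian Norling", "modified", "2012-03-08");
        Map<String, String> bad = Map.of(
                "title", "Travel policy", "author", "NORLING, K.");
        System.out.println(problems(good));
        System.out.println(problems(bad));
    }
}
```

Run against a sample of documents from every source system, a check like this shows quickly when one source starts delivering metadata in a format the common model does not expect.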

Performance testing

Performance testing is probably the easiest type of test you can create and run. In the end you will have a threshold or pain limit in milliseconds within which a query in the enterprise search solution is required to provide an answer, even at peak times with high query loads. Normally you will also monitor things like RAM and processor usage of the software components of your solution, so that automatic alerts can be sent to the maintenance team if the hardware is under too much pressure.
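How the threshold is applied is a design decision. One common choice, assumed here rather than taken from the text above, is to require that a high percentile of the measured response times stays under the limit, so that a single slow query does not fail the test but a slow tail does. The sample values are invented, and the method expects at least one sample.

```java
import java.util.Arrays;

class LatencyCheck {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] millis, double p) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // True if the chosen percentile of the response times is within the limit.
    static boolean withinLimit(long[] millis, double p, long limitMillis) {
        return percentile(millis, p) <= limitMillis;
    }

    public static void main(String[] args) {
        long[] samples = {120, 80, 95, 400, 110}; // response times in ms
        System.out.println("Median: " + percentile(samples, 50));
        System.out.println("95th percentile: " + percentile(samples, 95));
        System.out.println("Within 300 ms: " + withinLimit(samples, 95, 300));
    }
}
```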

Relevance testing

Quality assuring the relevance model of an enterprise search solution is tricky, largely because relevance in a result set is to some extent subjective. However, when implementing search, one does need to set a relevance model that presupposes a set of business rules for what type of content is to be deemed more important than other content. For example, when making documents in a document management system searchable, a typical business rule would be that documents tagged with Status=Approved must always be deemed more important than documents with any other status (such as Preliminary or Deprecated). Another typical rule is that a document for which a query term can be found in the title or in the keywords metadata field is most likely more important than documents where the query term is found elsewhere in the text body.

What it all boils down to is the definition of the business rules for relevance. Once you have defined the rules that govern how the results are to be ranked, you can also create test cases, i.e. associate query terms with information objects that must be returned as top results given these terms.
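Such test cases translate naturally into two kinds of assertions on a ranked hit list: a given document must be among the top results, or one document must rank above another. In this sketch a fixed list stands in for the IDs a real query would return; the IDs and names are hypothetical.

```java
import java.util.List;

class RelevanceCheck {

    // True if docId is among the first n hits of the ranked result list.
    static boolean inTop(List<String> rankedIds, String docId, int n) {
        int limit = Math.min(n, rankedIds.size());
        return rankedIds.subList(0, limit).contains(docId);
    }

    // True if the document that should win is ranked above the other one.
    static boolean rankedAbove(List<String> rankedIds, String winner, String other) {
        int w = rankedIds.indexOf(winner);
        int o = rankedIds.indexOf(other);
        return w >= 0 && (o < 0 || w < o);
    }

    public static void main(String[] args) {
        // Stand-in for the hit list returned for the query "travel policy".
        List<String> hits = List.of("policy-approved", "policy-preliminary", "travel-blog");

        // Business rule: the approved version must be the top hit...
        System.out.println(inTop(hits, "policy-approved", 1));
        // ...and must outrank the preliminary version.
        System.out.println(rankedAbove(hits, "policy-approved", "policy-preliminary"));
    }
}
```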

Automating it all

Once you have defined your test cases for all the above-mentioned types of tests in a test plan, you are ready to automate, i.e. enter the test plan into a test automation framework. The beauty of it all is that you can automate regression testing during the implementation phase of an enterprise search solution, i.e. continuously test that new development does not break parts of the solution that worked as intended before. This is particularly important if you add new information sources to your enterprise search solution, when there is a high risk that the relevance model that worked fine yesterday all of a sudden gets out of order. In addition, after the release of the enterprise search solution, the test automation framework will assist the operations team in monitoring that the solution behaves as expected even after the implementation team has left the building. All in all, this leads to continuously good quality of the solution while lowering the costs for monitoring.

Kristian Norling

Enterprise Search and Findability Survey

March 7 - 2012 | Kristian Norling

A few days ago we launched the “Enterprise Search and Findability Survey”. The survey closes at the end of March.

If you complete the survey you will get the report when it is finished.

Take me to the Survey!

The survey is for people who are responsible for search in their organisations. If you are a search manager, intranet manager, product owner of search, search editor, in-house developer for search, this survey is for you!

The survey aims to help you by finding out your views about Enterprise Search and Findability. The research will help show what business value an Enterprise Search solution can provide.

The survey is structured into five sections, each of which provides a specific perspective on Findability:
• Business
• Organisation
• User
• Information
• Search Technology

More information about the perspectives is provided in each section.

The survey will take approximately 20-30 minutes of your time. If you need a break, you can continue answering the survey at the question where you left off. If you give us your contact information, we will send you the report based on this survey when it is finished. We are aiming to have it finished in June.

The survey results will be presented at Enterprise Search Europe 2012 (London, 30-31 May 2012) and Enterprise Search Summit (New York, 15-16 May 2012).

Pawel Wroblewski

Search Stuffed up with GIS

February 3 - 2012 | Pawel Wroblewski

When I browsed through marketing brochures of GIS (Geographic Information System) vendors, I noticed that the message is quite similar to that of search analytics. In general it refers to the integration of various separate sources into an analysis based on geo-visualizations. I have recently seen a quite nice and powerful combination of search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

It is quite obvious to use a map instead of a simple list of results to visualize what was returned for an entered query. This technique is frequently used in plenty of online search applications, especially in directory services like yellow pages or real estate web sites. The list of things that are required to do this is pretty short:

- geolocalization of items – this means assigning accurate geo-coordinates to location names, addresses, zip codes or whatever is expected to be shown on the map; geolocalization services are offered more or less for free by Google or Bing Maps.

- background map – this is a necessity and is also provided by Google or Bing; there are also plenty of vendors for more specialized mapping applications

- returned results with geo-coordinates as metadata – to put them on the map

Normally this kind of basic GIS visualisation delivers basic map operations like zooming, panning, different views and additionally some more data like traffic, parks, shops etc. Results are usually pins [Bing] or drops [Google].

Querying / filtering with the map

A further step in the integration between search and GIS would be to utilize the map as a tool for defining the search query. One way is to define an area of interest that can be drawn on the map as a circle, rectangle or polygon. In its simplest form, the area of the query could just be the current window view on the map. In such an approach the full-text query is refined to include only results belonging to the defined area.
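For a polygon drawn on the map, the refinement amounts to a point-in-polygon test on each result's coordinates. Below is a minimal sketch using the standard ray-casting method. It treats latitude and longitude as flat coordinates, which is an assumption that only holds for small areas that do not cross the date line, and the class name is made up.

```java
class AreaFilter {

    // Ray casting: count how many polygon edges a ray from the point crosses;
    // an odd count means the point is inside.
    // Each polygon vertex is {latitude, longitude}.
    static boolean contains(double[][] polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double latI = polygon[i][0], lonI = polygon[i][1];
            double latJ = polygon[j][0], lonJ = polygon[j][1];
            // Does the edge span the point's latitude, and does the ray
            // hit it on the side of increasing longitude?
            boolean crosses = (latI > lat) != (latJ > lat)
                    && lon < (lonJ - lonI) * (lat - latI) / (latJ - latI) + lonI;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // A square area of interest drawn by the user.
        double[][] area = {{0, 0}, {0, 10}, {10, 10}, {10, 0}};
        System.out.println(contains(area, 5, 5));  // a hit inside the area
        System.out.println(contains(area, 15, 5)); // a hit outside the area
    }
}
```

For a rectangle or the current map window the test reduces to four comparisons, and many search engines can apply such bounding-box filters directly in the query.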

Apart from map all other query refinement tools should be available as well, like date-time sliders or any kind of navigation and fielded queries.

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, for example in order to see the nearest Chinese restaurants in the neighborhood. I would also categorize as simple geo-spatial analysis the grouping of search results into GIS layers, e.g. density heatmaps or hot spots, using geographical and other information stored in the results' metadata.
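Sorting by distance needs nothing more than a distance function and a comparator. The sketch below uses the haversine formula for great-circle distance with a mean Earth radius of 6371 km; the class names are hypothetical, and cities stand in for search hits so that the coordinates are easy to check.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DistanceSort {

    static final double EARTH_RADIUS_KM = 6371.0;

    // A search hit with coordinates in decimal degrees.
    static final class Hit {
        final String name;
        final double lat;
        final double lon;

        Hit(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns a copy of the hits, nearest to the reference point first.
    static List<Hit> nearestFirst(List<Hit> hits, double refLat, double refLon) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                h -> haversineKm(refLat, refLon, h.lat, h.lon)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("Stockholm", 59.3293, 18.0686),
                new Hit("Oslo", 59.9139, 10.7522),
                new Hit("Copenhagen", 55.6761, 12.5683));
        // Reference point: Gothenburg.
        for (Hit h : nearestFirst(hits, 57.7089, 11.9746)) {
            System.out.printf("%s: %.0f km%n", h.name,
                    haversineKm(57.7089, 11.9746, h.lat, h.lon));
        }
    }
}
```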

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Based on real needs, it could be possible, for example, to refine search results by the area within line of sight from a picked reference point, or to select filtering areas such as those inside the borders of specific cities, districts or countries.

So the idea is to use relevant output from advanced GIS analysis as an input for query refinement. In this way all the power of GIS can be used to get to the unstructured data through a search process.

What kind of applications do you think could take advantage of search stuffed with really advanced GIS? Looking forward to your comments on this post.

Kristian Norling

Text Analytics in Enterprise Search

January 11 - 2012 | Kristian Norling

A presentation given by Daniel Ling at Apache Lucene Eurocon in Barcelona, October 2011.

We think this is the first of many forthcoming presentations.

We also want to get more involved in the community in the future, by doing presentations, sponsoring and contributing code. We hope to bring more news on this subject in the next few weeks. Enjoy the presentation:

Text Analytics in Enterprise Search, Daniel Ling, Findwise, Eurocon 2011 from Lucene Revolution on Vimeo.