Archive for the ‘Search Watch’ Category

Martin Johansson

Data and Search Going Big?

April 25 - 2012 | Martin Johansson

A few enterprise search specialists from Findwise recently attended the Scandinavian Developer Conference 2012. One of the tracks was Big Data, which is very much related to search. It had some interesting talks about how to handle large amounts of data in an efficient way. Special thanks to Theo Hultberg, Jim Webber and Tim Berglund!

The theme was that you should choose a storage system which is well suited for the task. This may seem like an obvious point, but for a long time this was simply ignored; I’m talking about the era of relational databases. Don’t get me wrong, sometimes a relational database is the very best for the job, but in many cases it isn’t.

Data is jagged by nature, i.e. not all objects have the same properties. This is why we shouldn’t force them to fit into a square table; instead, everything should be denormalized! The application accessing the data will be aware of the information structure and will handle it accordingly. This also avoids expensive assembly operations (such as joins) to get the data into the format we want when retrieving it. Why should you split up your data if you are going to assemble it over and over again? Also remember that disk space is cheap, so pre-compute as much as possible. The design of a Big Data system should be governed by how the data will be retrieved.
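To make the denormalization argument concrete, here is a minimal Python sketch. The data, the field names and the `denormalize` and `fetch` helpers are all made up for illustration; the point is only that the "join" happens once at write time, so a read is a single lookup even when the documents are jagged.

```python
# Hypothetical sample data: the names and fields are illustrative only.
authors = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
posts = [
    {"id": 10, "author_id": 1, "title": "Big Data", "tags": ["search"]},
    {"id": 11, "author_id": 2, "title": "GIS"},  # jagged: this post has no tags
]


def denormalize(post, authors):
    """Copy the author's name into the post once, at write time."""
    doc = {key: value for key, value in post.items() if key != "author_id"}
    doc["author_name"] = authors[post["author_id"]]["name"]
    return doc


# Pre-computed store, keyed by how the data will be retrieved (by post id).
store = {post["id"]: denormalize(post, authors) for post in posts}


def fetch(post_id):
    """A read is one lookup; nothing is assembled at query time."""
    return store[post_id]
```

The price is duplicated data and more work when an author changes name, which is exactly the trade-off of cheap disk space against expensive reads.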

Another step away from the relational databases is the relaxation of some of the ACID properties: Atomicity, Consistency, Isolation and Durability. Again, this is along the lines of choosing the components best suited for the system. Decide which properties are a must have and which are not so important.

Relaxing the ACID properties, such as consistency, can give great performance gains. The NoSQL database Cassandra is eventually consistent and its write performance scales linearly up to 288 nodes (and probably even higher) which gives a write performance of over 1 million writes per second!

However, relaxation of these properties is not a new concept in the world of search engines. When indexing a document, it will usually take a number of seconds before it is searchable. This is called eventual consistency, i.e. the state of the search engine will be brought from one valid state to another, within a sufficiently long period of time. Do we really need documents that were just submitted to the search engine to be searchable instantly? Most likely, no. Isolation is another property that is not crucial to a search engine. Since a document in an index doesn’t have any explicit relations to any other documents in the same index, there isn’t a great need for isolation. If two writes for the same document are submitted at the same time, there is probably something wrong in another part of the system.
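A toy illustration of that indexing delay, written in Python. `ToyIndex` is a made-up class and not the API of any real search engine: writes go into a pending buffer and only become visible to queries after an explicit `refresh()`, which stands in for the periodic commit a real engine would do.

```python
class ToyIndex:
    """Toy in-memory index: added documents become searchable only after refresh()."""

    def __init__(self):
        self._pending = {}     # written, but not yet visible to queries
        self._searchable = {}  # visible to queries

    def add(self, doc_id, text):
        # No isolation: the last write for a given id simply wins.
        self._pending[doc_id] = text

    def refresh(self):
        # Move the index from one valid state to the next.
        self._searchable.update(self._pending)
        self._pending.clear()

    def search(self, term):
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, text in self._searchable.items()
            if term in text.lower().split()
        )
```

Between `add()` and `refresh()` a query simply doesn’t see the new document, which is the behaviour most search applications can live with.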

So what does all this mean for search? There is an interesting challenge in storing jagged data in large amounts and then making good use of it. To search in vast amounts of jagged data, you need a lot of query-time field mappings (to make relevant data searchable) … or do you? There is also the issue of retaining a good relevancy model, which is absolutely vital to a search engine. How do you measure the relevance of arbitrary metadata and then weigh it all together? Maybe we need to think in new ways about relevance altogether?
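As a rough picture of what a query-time field mapping with weights could look like, here is a small Python sketch. The field names, the `FIELD_MAP` and `BOOST` tables and the scoring rule are all assumptions made for illustration, not a recommended relevancy model.

```python
# Hypothetical mapping from canonical search fields to the names
# that different (jagged) sources happen to use.
FIELD_MAP = {
    "title": ["title", "subject", "headline"],
    "author": ["author", "creator", "owner"],
}

# Hypothetical weights: a hit in a title counts more than a hit in an author.
BOOST = {"title": 3.0, "author": 1.5}


def score(doc, term):
    """Sum the boost of every canonical field in which the term occurs."""
    term = term.lower()
    total = 0.0
    for canonical, source_fields in FIELD_MAP.items():
        for field in source_fields:
            value = doc.get(field)
            if isinstance(value, str) and term in value.lower().split():
                total += BOOST[canonical]
                break  # count each canonical field at most once
    return total
```

Every new source means another entry in the mapping and another weight to tune by hand, which is the manual labor the questions above are about.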

Whoever can solve these problems in a good way, with a minimum amount of manual labor, is a name we’ll be hearing a lot in the future.

Kristian Norling

Update on The Enterprise Search and Findability Survey

April 12 - 2012 | Kristian Norling

A quick update on the status of the Enterprise Search survey.

We now have well over a hundred respondents. The more respondents, the better the data will be, so please help spread the word. We’d love to have several hundred more. The survey will now be open until the end of April.

But most important of all, if you haven’t already, have a cup of coffee and fill in the survey.

A Few Results from the Survey about Enterprise Search

More than 60% say that the amount of searchable content in their organizations today is less or far less than needed. And 85% say that in three years’ time the amount of searchable content in the organisation will increase or increase significantly.

75% say that it is critical to find the right information to support their organization’s business goals and success. But it is interesting to note that over 70% of the respondents say that users don’t know where to find the right information or what to look for – and about 50% of the respondents say that it is not possible to search more than one source of information from a single search query.

In this context it is interesting that the primary goals for using search in organisations (where the answer is imperative or significant) are to:

  • Improve re-use of information and/or knowledge – 59%
  • Accelerate brokering of people and/or expertise – 55%
  • Increase collaboration – 60%
  • Raise awareness of “What We Know” – 57%
  • and finally to eliminate siloed repositories – 59%

In many organisations search is owned either by IT (60%) or Communication (27%); search has no specified budget (38%) and has fewer than one dedicated person working with search (48%). More than 50% have a search strategy in place or are planning to have one in 2012/13.

These numbers I think are interesting, but definitely need to be segmented and analyzed further. That will of course be done in the report which is due to be ready in June.

Pawel Wroblewski

Search Stuffed up with GIS

February 3 - 2012 | Pawel Wroblewski

When I browsed through marketing brochures of GIS (Geographic Information System) vendors I noticed that the message is quite similar to that of search analytics. In general it refers to the integration of various separate sources into analysis based on geo-visualizations. I have recently seen a quite nice and powerful combination of search and GIS technologies, so I would like to describe it a little. Let us start with the basics.

Search result visualization

Using a map instead of a simple list of results is an obvious way to visualize what was returned for an entered query. This technique is frequently used in plenty of online search applications, especially in directory services like yellow pages or real estate web sites. The list of things required to do this is pretty short:

- geolocalization of items – this means assigning accurate geo-coordinates to location names, addresses, zip codes or whatever is expected to be shown on the map; geolocalization services are provided more or less for free by Google or Bing maps.

- background map – this is a necessity and is also provided by Google or Bing; there are also plenty of vendors for more specialized mapping applications

- returned results with geo-coordinates as metadata – to put them on the map

Normally this kind of basic GIS visualisation delivers basic map operations like zooming, panning, different views and additionally some more data like traffic, parks, shops etc. Results are usually pins [Bing] or drops [Google].

Querying / filtering with the map

A further step in integrating search and GIS would be to use the map as a tool for defining the search query. One way is to create an area of interest that can be drawn on the map as a circle, rectangle or polygon. In its simplest form, the area of the query could just be the current window view on the map. In such an approach the full text query is refined to include only results belonging to the defined area.

Apart from the map, all other query refinement tools should be available as well, like date-time sliders or any kind of navigation and fielded queries.
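The "current window view as the area of query" case is easy to sketch in Python. The helper names and the result layout (`text`, `lat`, `lon`) are assumptions for illustration, and the box is assumed not to cross the 180° meridian.

```python
def in_bbox(lat, lon, bbox):
    """bbox is (south, west, north, east) in degrees; assumed not to cross 180°."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def search_in_area(results, term, bbox):
    """Keep results that match the text query AND lie inside the map window."""
    term = term.lower()
    return [
        r for r in results
        if term in r["text"].lower() and in_bbox(r["lat"], r["lon"], bbox)
    ]
```

A real search engine would typically do this with a geo filter in the index rather than by looping over results, but the refinement logic is the same.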

Simple geo-spatial analysis

Sometimes it is important to sort query results by distance from a reference point, for example in order to see the nearest Chinese restaurants in the neighborhood. I would also categorize as simple geo-spatial analysis the grouping of search results into GIS layers, e.g. density heatmaps or hot spots, using geographical and other information stored in the results’ metadata.
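Sorting by distance from a reference point can be sketched with the haversine formula, which gives the great-circle distance on a sphere. The function names and the result layout are made up for illustration, and the Earth is approximated as a sphere with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere here


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(results, ref_lat, ref_lon):
    """Return the results ordered from nearest to farthest from the reference point."""
    return sorted(
        results,
        key=lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]),
    )
```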

Advanced geo-spatial analysis

More advanced query definition and refinement would involve geo-spatial computations. Based on real needs it could be possible, for example, to refine search results by the area within line of sight from a picked reference point, or to select filtering areas such as those inside the borders of specific cities, districts, countries etc.

So the idea is to use relevant output from advanced GIS analysis as an input for query refinement. In this way all the power of GIS can be used to get to the unstructured data through a search process.
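Filtering by a border can be sketched with the classic ray-casting point-in-polygon test. This is a simplified Python illustration: the function names are made up, coordinates are treated as a flat plane (longitude as x, latitude as y), which is only reasonable for small areas, and points exactly on the border are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_area(results, polygon):
    """Keep only results whose (lon, lat) falls inside the polygon."""
    return [r for r in results if point_in_polygon(r["lon"], r["lat"], polygon)]
```

In practice the polygon would come out of the GIS analysis (a district border, a computed visibility area) and be handed to the search engine as a filter.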

What kind of applications do you think could take advantage of search stuffed with really advanced GIS? Looking forward to your comments on this post.

Caroline Abrahamsson

Gartner and the magic quadrants – crowning the leaders of Enterprise Search

January 25 - 2011 | Caroline Abrahamsson

For years Gartner, the research and advisory company, has been publishing their magic quadrants – and their verdict on everything from ECM systems to Data Warehouse and E-commerce plays a big role in many companies’ decisions when choosing the right tools.
Simply put, the vendors are presented in a matrix measuring the different players by ability to execute (product, overall viability, customer experience etc.) and the completeness of their vision (offering strategy, innovation etc.). The vendors are then positioned as niche players (a rather crowded spot), visionaries, challengers and leaders.

At the end of last year Gartner decided to retire their old “Information Access Quadrant” and introduce “Enterprise Search MarketScope” due to a more mature market. A number of vendors (such as Vivisimo and Recommind) were removed, in order to exclude those whose businesses were not entirely search driven.

The evaluation criteria for MarketScope cover: offering (product) strategy, Innovation, Overall viability (business unit, financial, strategy, and organization), Customer experience, Market understanding and business model.
To summarize: the criteria are to a large extent the same, but the two areas “overall viability” and “customer experience” are weighted higher than the rest. This is most likely a result of the last years’ discussion around user friendly interfaces, easier administration and the fact that some customers have suffered quite badly when vendors do not survive (one example in Northern Europe is the Danish vendor that went bankrupt some time ago).

The yearly fight between the three leaders (Microsoft, Endeca and Autonomy) has been somewhat disrupted, and Microsoft, Endeca and Google are now seen as the leaders.
Microsoft has a very broad product line, which stretches from low-price products with less functionality to Enterprise Search built on the former FAST technology. Endeca follows the same trend; as Gartner puts it, their “products (are) intended to serve organizations seeking to develop general search installations..(..) broadly applicable for a variety of different search challenges”.
In the old quadrant, Google remained a “challenger” for quite some time – but never made it to the “leaders” corner. Ease of administration and “user friendly” are two phrases that keep being repeated. That, in combination with a profit of $7.29 billion during the last quarter of 2010, makes Google a player that can easily continue to develop their Enterprise business.

Gartner's MarketScope for Enterprise Search

 

Autonomy should still not be disregarded; the main reason for it falling a bit behind the three others seems to be conquerable problems with support and pricing transparency. It will be interesting to see how Autonomy chooses to handle these issues during 2011.

In short: the new MarketScope is good reading with quite few surprises. If you wish to get a better understanding of the development going on at the different vendors, start with Gartner and continue to search among our blog posts.

Caroline Abrahamsson

Findability blog: Wrapping up 2010

December 23 - 2010 | Caroline Abrahamsson

Christmas is finally here and at Findwise we are taking a few days off to spend time with family and friends.

During 2010 we’ve delivered more than 25 successful projects and arranged breakfast seminars to talk about customer solutions (based on Microsoft, IBM, Autonomy and open source), meet-ups in a number of cities, networking meetings for in-depth Findability discussions and moving-in parties for our new offices.

At our Findability blog we have been discussing technology and vendor solutions (Microsoft and FAST, Autonomy, IBM, Google and open source), research, conferences, customized solutions and how to find a balance between technology and people.

Some of our posts have resulted in discussions, both on our own blog and in other forums. Please get involved in some of the previous ongoing discussions on “Solr Processing Pipeline”,  “Search and Business Intelligence” or “If a piece of content is never read, does it exist?”  if you have thoughts to share.

Findability blog is taking a break and we will be back with new posts in January.

If you have some spare time during the vacation, some of our customers run their own blogs, and good reading tips within Findability are the blogs run by Kristian Norling (VGR) and Alexandra Larsson (Swedish armed forces).

Merry Christmas and a Happy New Year to you all!

Caroline Abrahamsson

Google instant – can a search engine predict what we want?

September 26 - 2010 | Caroline Abrahamsson

On September 8th Google released their new search experience: Google instant.
If you haven’t seen it yet, there is an introduction on YouTube that is worth spending 1:41 minutes on.

Simply put, Google instant is a new way of displaying results and helping users find information faster. As you type, results will be presented in the background. In most cases it is enough to write two or three characters and the results you expect are already right in front of you.

Google instant in action

The Swedish site Prisjakt has been using this for years, helping the users to get a better precision in their searches.
At Google you have previously been guided by “query suggestion” i.e. you got suggestions of what others have searched for before – a function also used by other search engines such as Bing (called Type Ahead).
Google instant is taking it one step further.

When looking at what the blog community has to say about the new feature, it seems to split the users into two groups: you either hate it or love it.

So, what are the consequences?
From an end-user perspective we will most likely stop typing if something interesting appears that draws our attention. The result?
The search results shown at the very top will generate more traffic, search will be more personalized over time and we will most probably get better at phrasing our queries.

From an advertising perspective, this will most likely affect the way people work with search engine optimization. Some experts, like Steve Rubel, claim Google instant will make SEO irrelevant, whereas others, like Matt Cutts, think it will change people’s behavior in a positive way over time and explain why.

What Google is doing is something that they constantly do: change the way we consume information. So what is the next step?

CNN summarizes what Eric Schmidt, the CEO of Google, says:
“The next step of search is doing this automatically. When I walk down the street, I want my smartphone to be doing searches constantly: ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ‘Did you know … ?’ ” Schmidt said at the IFA consumer electronics event in Berlin, Germany, this week.

“This notion of autonomous search — to tell me things I didn’t know but am probably interested in — is the next great stage, in my view, of search.”

Do you agree? Can we predict what the users want from search? Is this the sort of functionality that we want to use on the web and behind the firewall?

Lina Westerling

Structured and actionable results – there is more to results presentation than blue links

June 22 - 2010 | Lina Westerling

Search patterns are standardized patterns describing search functionality as well as human information seeking behavior. Earlier this year Peter Morville and Jeffery Callender released a book about search patterns.  Morville also gave a presentation based on the book at the IA Summit 2010 (slides, mp3), which my colleague Maria and I attended. Among the patterns Peter Morville mentions my favorite ones are structured and actionable search results.

Structured results
Let us start with structured results. You might have seen that for certain queries you submit on Google, you get a richer results presentation than for other queries. For example, typing the query ‘weather stockholm’ gives a basic weather forecast for the upcoming four days, directly visible in the results list. Other examples include local movie showtimes and stock information. It is even possible to use Google as a calculator or a currency converter by typing in certain kinds of searches. For the curious, here is a list of all google.com search features. Structured results is about offering a more informative presentation of search results than just a title, summary, and possibly some basic metadata. It is also about not presenting all information in the same way, because the information in itself differs. Richer results presentations speed up the process of finding relevant information since the system has already done some pre-processing for the user.

Examples of structured results from Google. Image from http://www.flickr.com/photos/morville/4274340130/sizes/l/in/set-72157623210542674/#cc_license.

Structured metadata is a prerequisite for structured results presentation. Web pages and documents normally come with standard metadata such as date and author, but in some cases they will have to be augmented with additional information in order to create a more useful presentation. Presenting results in a custom way requires some extra development effort, especially if the structure is not initially available. However, I believe it creates much value for the user. Also, this need not be done for all types of content. My advice would be to identify the cases where a more elaborate results presentation would be most useful. Which information is frequently requested by many people and perhaps also difficult to find because it is embedded in pages with lots of text or other content? Search logs and user feedback, in combination with thorough knowledge about the content, provide a key basis for the selection.
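As a small illustration of how structured metadata can drive the presentation, here is a Python sketch. The `type` field, the renderer functions and the hit layout are assumptions made up for this example: hits with a known type get a custom presentation, and everything else falls back to the plain title and summary.

```python
def render_weather(hit):
    """Richer presentation: show the forecast directly in the results list."""
    days = ", ".join(f"{d['day']} {d['temp_c']} C" for d in hit["forecast"])
    return f"Weather in {hit['city']}: {days}"


def render_default(hit):
    """Plain presentation: title and summary only."""
    return f"{hit['title']} | {hit['summary']}"


# Only the content types worth the extra effort get a custom renderer.
RENDERERS = {"weather": render_weather}


def render(hit):
    renderer = RENDERERS.get(hit.get("type"), render_default)
    return renderer(hit)
```

Adding a new structured presentation is then a matter of adding one entry to `RENDERERS`, which fits the advice to start with the few cases where it pays off most.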

Actionable results
Related to structured results are actionable results. Entries in the search results list can be more than just displays of information; they can also be means of performing tasks. Common examples found on the web include printing, saving or sharing the search result directly from the results list. Other examples include adding to shopping cart, commenting and rating. Within the enterprise or organization additional relevant actions could perhaps be checking in or out a document, adding an event to the personal calendar, starting a chat with a co-worker, and so on. As with structured results, it is about identifying the cases where it would add most value. What are the most common tasks and possibly also what tasks are complicated to perform in the source system? Structured and actionable results share the advantage that users do not have to open the actual results web page or download the document to find or do what they need. Speeding up information seeking and other tasks in this way is not only valuable in web search, it can also be very useful within the enterprise or organization. Search results lists in enterprise search solutions still look quite homogeneous and there are lots of opportunities for improvement.
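A minimal Python sketch of the same idea for actions. The content types and action names below are hypothetical; in a real solution they would map to operations in the source systems.

```python
# Hypothetical mapping from content type to the extra actions offered in the list.
ACTIONS_BY_TYPE = {
    "document": ["check_out", "share"],
    "event": ["add_to_calendar"],
    "person": ["start_chat"],
}

# Every result can at least be opened.
DEFAULT_ACTIONS = ["open"]


def actions_for(hit):
    """Return the actions to show next to a search result."""
    return DEFAULT_ACTIONS + ACTIONS_BY_TYPE.get(hit.get("type"), [])
```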

To conclude, there can and should be more to search results presentation than just a snippet. I believe we will benefit from putting focus on the results presentation, and not only on tools surrounding it (filtering for example). After all, the list of results is where the user’s attention is first drawn. What do you think? How can your organization benefit from working with structured and actionable search results? If you are curious about this approach, we would be happy to help you look into what can be done in your organization.

Caroline Abrahamsson

Search in SharePoint 2010

May 15 - 2010 | Caroline Abrahamsson

This week there has been a lot of buzz about Microsoft’s launch of SharePoint 2010 and Office 2010. Since SharePoint 2007 has been the quickest growing server product in the history of Microsoft, the expectations on SharePoint 2010 are tremendous.

Apart from a great deal of possibilities when it comes to content creation, collaboration and networking, easy business intelligence etc., the launch also holds another promise: that of even better search capabilities (with the integration of FAST).

Since Microsoft acquired FAST in 2008, there has been a lot of speculation about what the future SharePoint versions may include in terms of search. And since Microsoft announced that they will drop their Linux and UNIX versions in order to focus on higher innovation speed, Microsoft customers are expecting something more than the regular. At an early stage it was also clear that Microsoft is eager to take market share in the growing market for internet business.

So, simply put, the solutions that Microsoft now provides in terms of search are solutions for Business productivity (where the truly sophisticated search capabilities are available if you have Enterprise CAL licenses, i.e. you pay for the number of users you have) and Internet Sites (where the pricing is based on the number of servers). These can then be used in a number of scenarios, all dependent on the business and end-user needs.
Microsoft has chosen to describe it like this:

  • “Foundation” is, briefly put, basic SharePoint search (Site Search).
  • “Standard” adds collaboration features to the “Foundation” edition and allows it to tie into repositories outside of SharePoint.
  • “Enterprise” adds a number of capabilities, previously only available through FAST licenses, such as contextual search (recognition of departments, names, geographies etc), ability to tag meta data to unstructured content, more scalability etc.

I’m not going to go into detail, rather just conclude that the more Microsoft technology the company or organization already uses, the more benefits it will gain from investing in SharePoint search capabilities.

And just to be clear: non-SharePoint versions (stand-alone) of FAST are still available, even though they are not promoted as intensely as the SharePoint ones.

Apart from Microsoft’s overview above, Microsoft Technet provides a more in-depth description of the features and functionality from both an end-user and administrator point of view.

We look forward to describing the features and functions in more detail in our upcoming customer cases. If you have any questions for our SharePoint or FAST search specialists, don’t hesitate to post them here on the blog. We’ll make sure you get all the answers.

Caroline Abrahamsson

FAST goes Microsoft for real – drops Linux and UNIX versions

February 8 - 2010 | Caroline Abrahamsson

‘Innovation is at the heart of our enterprise search strategy, and a commitment to innovation is what brought FAST and Microsoft together’ says Bjørn Olstad, Microsoft Distinguished Engineer, in his blog post published this Thursday. And furthermore: ‘As a part of that planning process, we have decided that in order to deliver more innovation per release in the future, the 2010 products will be the last to include a search core that runs on Linux and UNIX’.

Caroline Abrahamsson

Roadmap FAST Search: for SharePoint and Internet Business

February 13 - 2009 | Caroline Abrahamsson

In the year since Microsoft acquired FAST, there has been a lot of hush-hush about the Enterprise search roadmap. However, at the yearly FAST forward conference, Microsoft’s press release Microsoft Unveils New Enterprise Search Road Map reached the public.

There are no big surprises, but a lot of interesting details to come.
Briefly speaking Microsoft is focusing on two areas: search to enhance business productivity and search to earn money online.

Here at Findwise we have been working with customers integrating SharePoint and FAST ESP for some time, and ESP certainly adds a lot of value by extending SharePoint’s main strengths: content management and collaboration. Office 14, which will probably see the light early next year, will hopefully add more flexibility to their infrastructure solutions out of the box.
More information about the licensing models is yet to come, and even though FAST will continue to develop ESP as a standalone product (to run on both Unix and Linux), the roadmap ties existing and potential Microsoft customers closer by presenting search as an integrated part of their business productivity offering.

As for FAST Search Internet Business, Microsoft’s target group is companies looking to earn money online. During FAST forward 2007 there was a lot of talk about the future search driven portals, and during the 2008 event about the ability to understand user intent.
Today online consumers have higher expectations when it comes to search and the ability to show related information (such as Amazon’s “people that bought this product also bought”..) as well as showing contextual advertising (related to search terms, geographical location etc) and recommendations will create loyal customers. FAST has quite a few customers using search for strategic online business so one should keep an eye on the release of the new beta version during 2009.

If you read Swedish, Helge Legernes, one of the founders of Findwise, comments in Computer Sweden.