<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Findability blog &#187; Content refinement</title>
	<atom:link href="http://blog.findwise.com/category/content-refinement/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.findwise.com</link>
	<description>The enterprise search and findability blog</description>
	<lastBuildDate>Wed, 09 May 2012 17:59:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Bryan, Brian, Briane, Bryne, or &#8230; what was his name again?</title>
		<link>http://blog.findwise.com/bryan-brian-briane-bryne-or-what-was-his-name-again/</link>
		<comments>http://blog.findwise.com/bryan-brian-briane-bryne-or-what-was-his-name-again/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 13:06:35 +0000</pubDate>
		<dc:creator>Svetoslav Marinov</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Language support]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Text Analytics]]></category>
		<category><![CDATA[Callie]]></category>
		<category><![CDATA[Daitch-Mokotoff Soundex]]></category>
		<category><![CDATA[George Bernard Shaw]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Kelly]]></category>
		<category><![CDATA[Lawrence Philips]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[massage]]></category>
		<category><![CDATA[Metaphone]]></category>
		<category><![CDATA[Pattern matching]]></category>
		<category><![CDATA[phonetic algorithm]]></category>
		<category><![CDATA[phonetic algorithms]]></category>
		<category><![CDATA[phonetic search]]></category>
		<category><![CDATA[Soundex]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.findwise.com/?p=2992</guid>
		<description><![CDATA[Let the spelling loose &#8230; What do Callie and Kelly have in common (except for the double &#8216;l&#8217; in the middle)? What about &#8220;no&#8221; and &#8220;know&#8221;, or &#8220;Caesar&#8217;s&#8221; and &#8220;scissors&#8221; and what about &#8220;message&#8221; and &#8220;massage&#8221;? You definitely got it &#8211; Callie and Kelly, &#8220;no&#8221; and &#8220;know&#8221;, &#8220;Caesar&#8217;s&#8221; and &#8220;scissors&#8221; sound alike, but are spelled quite differently. &#8220;message&#8221; and [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><h2>Let the spelling loose &#8230;</h2>
<p>What do Callie and Kelly have in common (apart from the double &#8216;l&#8217; in the middle)? What about &#8220;no&#8221; and &#8220;know&#8221;, &#8220;Caesar&#8217;s&#8221; and &#8220;scissors&#8221;, or &#8220;message&#8221; and &#8220;massage&#8221;? You got it: Callie and Kelly, &#8220;no&#8221; and &#8220;know&#8221;, and &#8220;Caesar&#8217;s&#8221; and &#8220;scissors&#8221; sound alike but are spelled quite differently. &#8220;Message&#8221; and &#8220;massage&#8221;, on the other hand, differ by only one vowel (&#8220;a&#8221; vs &#8220;e&#8221;), yet they sound clearly different.</p>
<p>It is well known that in many languages orthography does not determine the pronunciation of words. English is a classic example: &#8220;ghoti&#8221; as an alternative spelling of &#8220;fish&#8221; (&#8220;gh&#8221; as in &#8220;tough&#8221;, &#8220;o&#8221; as in &#8220;women&#8221;, &#8220;ti&#8221; as in &#8220;nation&#8221;) is often attributed to George Bernard Shaw. While phonology often reflects the current state of a language, orthography may lag centuries behind. English is notorious for this phenomenon, but it is not the only one: Swedish, French and Portuguese, among others, all have their orthography/pronunciation discrepancies.</p>
<h2>Phonetic Algorithms</h2>
<p>So how do we represent words that sound similar but are spelled differently? It&#8217;s not trivial, but in most cases it is not impossible either. Soundex is probably the first algorithm to tackle this problem. It is an example of the so-called phonetic algorithms, which attempt to give the same encoding to strings that are pronounced in a similar way. Soundex was designed for English only, and it has clear limitations. Double Metaphone (DM) is one of its possible replacements, and a relatively successful one. Lawrence Philips published the original Metaphone in 1990 and Double Metaphone in 2000. DM deals not only with native English names but also takes proper care of the foreign names that are so common in English. What is more, it can output two possible encodings for a given name (hence the &#8220;Double&#8221; in its name): an anglicised version and a native one (be that Slavic, Germanic, Greek, Spanish, etc.).</p>
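<p>For illustration, here is a minimal Java sketch of the classic American Soundex rules: keep the first letter, replace the following consonants with digits, ignore vowels and h, w, y, collapse adjacent identical digits, and pad or cut the result to four characters. The class name SimpleSoundex is hypothetical; Apache Commons Codec ships its own Soundex class, which is the better choice in real code.</p>

```java
import java.util.Locale;

public class SimpleSoundex {

    // Soundex digit for each letter 'a'..'z'; '0' marks letters that get
    // no digit of their own (the vowels a, e, i, o, u plus h, w and y)
    private static final String CODES = "01230120022455012623010202";

    public static String encode(String name) {
        // Keep ASCII letters only, in lower case
        String s = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(s.charAt(0)));
        // Code of the previous letter, so that repeated codes collapse
        char last = CODES.charAt(s.charAt(0) - 'a');
        for (char c : s.substring(1).toCharArray()) {
            char code = CODES.charAt(c - 'a');
            if (code != '0' && code != last) {
                out.append(code);
                if (out.length() == 4) {
                    break;
                }
            }
            // Vowels (and y) separate equal codes; h and w do not
            if (c != 'h' && c != 'w') {
                last = code;
            }
        }
        // Pad with zeros to exactly four characters
        return (out + "000").substring(0, 4);
    }

    public static void main(String[] args) {
        for (String n : new String[]{"Bryan", "Brian", "Briane", "Bryne", "Callie", "Kelly"}) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

<p>Run as is, it encodes Bryan, Brian, Briane and Bryne all as B650, but gives C400 for Callie and K400 for Kelly: because Soundex keeps the first letter as written, names that differ in their first letter never match.</p>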
<p>With DM, all four names in the title of this post are encoded as &#8220;PRN&#8221;. The name George gets two encodings, JRJ and KRK, the second reflecting a possible German pronunciation of the name. A name of Polish origin, like Adamowicz, also gets two encodings, ATMTS and ATMFX, depending on whether you pronounce the &#8220;cz&#8221; like the &#8220;ts&#8221; in &#8220;hats&#8221; or like the English &#8220;ch&#8221; in &#8220;church&#8221;.</p>
<p>The original implementation by Lawrence Philips limited encodings to four characters. In most later implementations of the algorithm this limit is configurable or simply dropped.</p>
<p>Apache Commons Codec implements DM alongside other encoders (Soundex, Metaphone, RefinedSoundex, ColognePhonetic and Caverphone, to name just a few). Here is a tiny example using it:</p>
<pre><code>import org.apache.commons.codec.language.DoubleMetaphone;

public class DM {

    public static void main(String[] args) {
        String s = "Adamowicz";
        DoubleMetaphone dm = new DoubleMetaphone();
        // Default encoding length is 4! Let's make it 10
        dm.setMaxCodeLen(10);
        // Remember, DM can output 2 possible encodings:
        // doubleMetaphone(s) gives the primary one, doubleMetaphone(s, true) the alternate
        System.out.println("Alternative 1: " + dm.doubleMetaphone(s)
                + "\nAlternative 2: " + dm.doubleMetaphone(s, true));
    }
}</code></pre>
<p>The above code will print out:</p>
<p>Alternative 1: ATMTS</p>
<p>Alternative 2: ATMFX</p>
<p>It is also relatively straightforward to do phonetic search with Solr: you just need to add phonetic analysis to the field that holds names in your schema.xml.</p>
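<p>A minimal schema.xml sketch, assuming Solr&#8217;s PhoneticFilterFactory with its DoubleMetaphone encoder. The field and type names (name, name_phonetic, text_phonetic) are hypothetical, and the supported attributes vary between Solr versions, so check the documentation for yours. As far as we know this filter indexes only the primary DM encoding; Solr also provides a DoubleMetaphoneFilterFactory intended to emit the alternate encoding as well.</p>

```xml
&lt;!-- Hypothetical field type for names, indexed with Double Metaphone codes --&gt;
&lt;fieldType name="text_phonetic" class="solr.TextField"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
    &lt;!-- inject="true" keeps the original token next to its phonetic code --&gt;
    &lt;filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;

&lt;field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/&gt;
&lt;copyField source="name" dest="name_phonetic"/&gt;
```

<p>Queries against name_phonetic are analysed the same way, so a search for &#8220;Brian&#8221; is turned into the same code as an indexed &#8220;Bryan&#8221;.</p>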
<h2>Enhancements</h2>
<p>Although DM performs quite well at first sight, it has its limitations. It originated in English, and although it aims to handle a variety of borrowings from other languages, most of its rules are English-centric. Suppose you work with one of the Scandinavian languages (Swedish, Danish, Norwegian, Icelandic) and one of the names you want to encode is &#8220;Örjan&#8221;. You would expect &#8220;Orjan&#8221; and &#8220;Örjan&#8221; to get the same encoding, but they get different ones: ARJN vs RJN. Why is that? One look under the hood (the implementation in DoubleMetaphone.java) gives the answer:</p>
<pre><code>private static final String VOWELS = "AEIOUY";</code></pre>
<p>The Scandinavian vowels &#8220;ö&#8221;, &#8220;ä&#8221;, &#8220;å&#8221;, &#8220;ø&#8221; and &#8220;æ&#8221; are missing. If we add them, recompile and use the modified DM implementation, we get the desired output: ARJN for both &#8220;Örjan&#8221; and &#8220;Orjan&#8221;.</p>
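<p>The effect of the vowel set can be shown in isolation. This is a simplified, hypothetical stand-in for the Double Metaphone rule that writes a word-initial vowel as &#8220;A&#8221;; the helper initialCode and the NORDIC_VOWELS constant are made up for illustration, and the real DoubleMetaphone.java handles initial vowels in its own code.</p>

```java
public class InitialVowel {

    // The vowel set quoted above from DoubleMetaphone.java
    static final String VOWELS = "AEIOUY";

    // Hypothetical extension: the same set plus the Scandinavian vowels
    // A-ring, A-umlaut, O-umlaut, O-slash and AE (written as Unicode
    // escapes so that the source file stays ASCII)
    static final String NORDIC_VOWELS = VOWELS + "\u00C5\u00C4\u00D6\u00D8\u00C6";

    // Returns "A" if the first letter is in the given vowel set,
    // otherwise "" (the letter then contributes nothing to the code)
    static String initialCode(String name, String vowels) {
        char first = Character.toUpperCase(name.charAt(0));
        return vowels.indexOf(first) != -1 ? "A" : "";
    }

    public static void main(String[] args) {
        for (String name : new String[]{"Orjan", "\u00D6rjan"}) {
            System.out.println(name + ": \"" + initialCode(name, VOWELS)
                    + "\" with VOWELS, \"" + initialCode(name, NORDIC_VOWELS)
                    + "\" with NORDIC_VOWELS");
        }
    }
}
```

<p>For &#8220;Örjan&#8221; the unextended set yields an empty prefix, which is why the code starts with R, while the extended set yields the leading A of ARJN.</p>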
<p>Finally, if DM is not suitable for your task, you can still apply the same principles and build your own encoder, for example with regular expressions. Suppose you have a list of bogus product names that are just (mis)spelled variations of some well-known names, and you want to search for the original name but get back all the ludicrous variants. Here is one, admittedly very naïve, way to do it. Given the following names:</p>
<p>CupHoulder</p>
<p>CappHolder</p>
<p>KeepHolder</p>
<p>MacKleena</p>
<p>MackCliiner</p>
<p>MacqQleanAR</p>
<p>Ma&#8217;cKcle&#8217;an&#8217;ar</p>
<p>a handful of regular expressions is enough to encode them as &#8220;cphldR&#8221; and &#8220;mclnR&#8221;.</p>
<pre><code>public class Encoder {

    public static void main(String[] args) {
        String[] ar = new String[]{"CupHoulder", "CappHolder", "KeepHolder",
                "MacKleena", "MackCliiner", "MacqQleanAR", "Ma'cKcle'an'ar"};
        for (String a : ar) {
            a = a.toLowerCase();
            // A final "a", "e", "ar" or "er" becomes "R"
            a = a.replaceAll("[ae]r?$", "R");
            // Drop the remaining vowels and apostrophes
            a = a.replaceAll("[aeoiuy']", "");
            // Collapse "pp", treat "q" and "k" as "c", collapse "cc"
            a = a.replaceAll("pp+", "p");
            a = a.replaceAll("q|k", "c");
            a = a.replaceAll("cc+", "c");
            System.out.println(a);
        }
    }
}</code></pre>
<p>You can now easily find all the ludicrous spellings of &#8220;CupHolder&#8221; and &#8220;MacCleaner&#8221;: encode the query with the same rules and match it against the encoded names.</p>
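<p>The lookup itself can be sketched as follows: encode the query with the same rules and keep every name whose key matches. The class and method names (SpellingVariants, key, variantsOf) are hypothetical, and a real search index would more likely store the keys at indexing time than recompute them for every query.</p>

```java
import java.util.Arrays;
import java.util.Locale;

public class SpellingVariants {

    // The same rewrite rules as the encoder above, in the same order
    static String key(String name) {
        String a = name.toLowerCase(Locale.ROOT);
        a = a.replaceAll("[ae]r?$", "R");
        a = a.replaceAll("[aeoiuy']", "");
        a = a.replaceAll("pp+", "p");
        a = a.replaceAll("q|k", "c");
        a = a.replaceAll("cc+", "c");
        return a;
    }

    // Every name whose key equals the key of the query
    static String[] variantsOf(String query, String[] names) {
        String target = key(query);
        return Arrays.stream(names)
                .filter(name -> key(name).equals(target))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] names = {"CupHoulder", "CappHolder", "KeepHolder",
                "MacKleena", "MackCliiner", "MacqQleanAR", "Ma'cKcle'an'ar"};
        System.out.println(Arrays.toString(variantsOf("CupHolder", names)));
        System.out.println(Arrays.toString(variantsOf("MacCleaner", names)));
    }
}
```

<p>The correct spellings &#8220;CupHolder&#8221; and &#8220;MacCleaner&#8221; reduce to &#8220;cphldR&#8221; and &#8220;mclnR&#8221; respectively, so each query returns its whole group of variants.</p>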
<p>I hope this blog post has given you some ideas about how to use phonetic algorithms and their principles to better discover names and entities that sound alike but are spelled differently. At Findwise we have made a number of enhancements to DM to make it work better with Swedish, Danish and Norwegian.</p>
<h2>References</h2>
<p>You can learn more about Double Metaphone from the following article by the creator of the algorithm:<br />
<a href="http://drdobbs.com/cpp/184401251?pgno=2"> http://drdobbs.com/cpp/184401251?pgno=2</a></p>
<p>A German phonetic algorithm is the Kölner Phonetik:<br />
<a href="http://de.wikipedia.org/wiki/Kölner_Phonetik"> http://de.wikipedia.org/wiki/Kölner_Phonetik</a></p>
<p>SfinxBis is a Swedish-specific phonetic algorithm based on Soundex:<br />
<a href="http://www.swami.se/projekt/sfinxbis.68.html">http://www.swami.se/projekt/sfinxbis.68.html</a></p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/bryan-brian-briane-bryne-or-what-was-his-name-again/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/bryan-brian-briane-bryne-or-what-was-his-name-again/"><meta itemprop="datePublished" content="2012-03-21T14:06:35+00:00"><meta itemprop="dateModified" content="2012-03-27T21:01:15+00:00"><meta itemprop="dateCreated" content="2012-03-21T13:53:24+00:00"><meta itemprop="keywords" content="Callie,Daitch-Mokotoff Soundex,George Bernard Shaw,Java,Kelly,Lawrence Philips,Linguistics,massage,Metaphone,Pattern matching,phonetic algorithm,phonetic algorithms,phonetic search,Soundex,XML"><meta itemprop="wordCount" content="969"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/bryan-brian-briane-bryne-or-what-was-his-name-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr Processing Pipeline</title>
		<link>http://blog.findwise.com/solr-processing-pipeline/</link>
		<comments>http://blog.findwise.com/solr-processing-pipeline/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 13:07:25 +0000</pubDate>
		<dc:creator>Max Charas</dc:creator>
				<category><![CDATA[Connector]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apache Commons Processing Pipeline]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[data processing layer]]></category>
		<category><![CDATA[Prague]]></category>
		<category><![CDATA[Solr REST protocol]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=1952</guid>
		<description><![CDATA[Hi again Internet, For once I have had time to do some thinking. Why is there no powerful data processing layer between the Lucene Connector Framework and Solr? I&#8217;ve been looking into the Apache Commons Processing Pipeline. It seems like a likely candidate to do some cool stuff.  Look at the diagram below. What I&#8217;m thinking [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Hi again Internet,</p>
<p>For once I have had time to do some thinking. Why is there no powerful data processing layer between the <a title="the Lucene Connector Framework" href="http://incubator.apache.org/connectors/" target="_blank">Lucene Connector Framework</a> and Solr? I&#8217;ve been looking into the <a title=" the Apache Commons Processing Pipeline" href="http://commons.apache.org/sandbox/pipeline/" target="_blank">Apache Commons Processing Pipeline</a>. It seems like a likely candidate to do some cool stuff. Look at the diagram below.</p>
<div id="attachment_1953" class="wp-caption aligncenter" style="width: 310px"><a href="http://media.findabilityblog.se/2010/04/Drawing11.jpg"><img itemprop="image" class="size-medium wp-image-1953  " src="http://media.findabilityblog.se/2010/04/Drawing1-300x148.jpg" alt="" width="300" height="148" /></a><p class="wp-caption-text">A schematic drawing of a Solr Pipeline concept. (Click to enlarge)</p></div>
<p>What I have in mind is a transparent Solr pipeline that speaks the Solr REST protocol on both ends. This means that you could use SolrJ, or any other API, to communicate with the pipeline.</p>
<p>Has anyone attempted this before? If you’re interested in chatting about the pipeline, drop me a mail or just grab me at Eurocon in Prague this year.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/solr-processing-pipeline/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/solr-processing-pipeline/"><meta itemprop="datePublished" content="2010-04-19T14:07:25+00:00"><meta itemprop="dateModified" content="2010-04-19T14:07:25+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="Apache Commons Processing Pipeline,API,data processing layer,Prague,Solr REST protocol"><meta itemprop="wordCount" content="133"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/solr-processing-pipeline/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Findwise releases Open Pipeline plugins</title>
		<link>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/</link>
		<comments>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 06:54:57 +0000</pubDate>
		<dc:creator>Karl Jansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[developing document processors]]></category>
		<category><![CDATA[document processing]]></category>
		<category><![CDATA[document processors]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[incomplete solutions]]></category>
		<category><![CDATA[index solutions]]></category>
		<category><![CDATA[job scheduler]]></category>
		<category><![CDATA[Open Pipeline]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[service/product]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1141</guid>
		<description><![CDATA[Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page, and the ones Findwise has created can be downloaded from our Findwise Open Pipeline Plugins page. [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the <a href="http://www.openpipeline.org/plugins/">Open Pipeline Plugins page</a>, and the ones Findwise has created can be downloaded from our <a href="http://www.findwise.se/findwise-open-pipeline">Findwise Open Pipeline Plugins page</a>.</p>
<p><span id="more-1141"></span></p>
<p>OpenPipeline is open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules that distribute documents across a network. It includes a job scheduler and a full point-and-click UI.</p>
<p>Findwise has been using this framework in a number of customer projects with great success. It works particularly well together with Apache Solr, not only because both are open source but, most importantly, because it fills a gap in Solr&#8217;s functionality: an easy-to-use framework for developing document processors and connectors. However, we are not using it only with Solr. We have also built a number of plugins for the Google Search Appliance, and we have started investigating how Open Pipeline can be integrated with the IBM OmniFind search engine as well.</p>
<p>The best thing about this framework is that it is very flexible and customizable but still easy to use AND, maybe most importantly for me as a developer, easy to work with and develop against. Its API is simple yet powerful enough to handle everything you need. And because it is an open source framework, any shortcomings and limitations that we find along the way can be investigated in detail, and a better solution can be proposed to the Open Pipeline team for inclusion in future releases.</p>
<p>In fact, we have already contributed a great deal to the development of the project by using it, testing it and reporting bugs and suggested improvements on its forums. The response from the team has been very good: some of our suggested improvements have already been included, and some are on the way in the new 0.8 version. We are also deepening the collaboration further by signing a contributor agreement, so that we can eventually contribute code as well.</p>
<p>So how do our customers benefit from this?</p>
<p>First, it lets us develop and deliver search and index solutions to our customers more quickly and with better quality. This is because more developers work on the same framework as a base, so the overall code base is used more and tested more, and is thus of better quality. We can also reuse good, well-tested components, so that several customers can share the cost of development and get a better service/product for less money, which is always a good thing, of course!</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/findwise-releases-open-pipeline-plugins/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/findwise-releases-open-pipeline-plugins/"><meta itemprop="datePublished" content="2009-10-09T08:54:57+00:00"><meta itemprop="dateModified" content="2009-10-09T08:54:57+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="API,developing document processors,document processing,document processors,Enterprise Search,IBM,incomplete solutions,index solutions,job scheduler,Open Pipeline,open source software,search engine,service/product"><meta itemprop="wordCount" content="488"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What differentiates a good search engine from a bad one?</title>
		<link>http://blog.findwise.com/what-differentiates-a-good-search-engine-from-a-bad-one/</link>
		<comments>http://blog.findwise.com/what-differentiates-a-good-search-engine-from-a-bad-one/#comments</comments>
		<pubDate>Wed, 28 Nov 2007 10:43:07 +0000</pubDate>
		<dc:creator>Maria Johansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Usability]]></category>
		<category><![CDATA[e-commerce sites]]></category>
		<category><![CDATA[e-commerce sites bad search]]></category>
		<category><![CDATA[i.e. search]]></category>
		<category><![CDATA[intranet search]]></category>
		<category><![CDATA[intranet search solutions]]></category>
		<category><![CDATA[Jared Spool]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search logs]]></category>
		<category><![CDATA[search results]]></category>
		<category><![CDATA[search solution]]></category>
		<category><![CDATA[search vendors]]></category>
		<category><![CDATA[site search]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=52</guid>
		<description><![CDATA[That was one of the questions the UIE research group asked themselves when conducting a study of on-site search. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>That was one of the questions the <a href="http://www.uie.com">UIE</a> research group asked themselves when conducting a study of <a href="http://www.uie.com/brainsparks/2007/11/26/usability-tools-podcast-on-site-search/">on-site search</a>. One of the things they discovered was that the choice of search engine was not as important as the implementation. Most of the big search vendors were found in both the top sites and the bottom sites.</p>
<p>So even though the choice of vendor influences what functionality you can achieve and how much control you have over your content, other things matter too, maybe even more: the best search engine in the world will not work for you unless you configure it properly.</p>
<p><span id="more-52"></span>According to Jared Spool there are four kinds of search results:</p>
<ul>
<li> ‘Match relevant results’ &#8211;  returns the exact thing you were looking for.</li>
<li> ‘Zero results’ – no relevant results found.</li>
<li> ‘Related results’ &#8211;  e.g. you search for a sweater and also get results for a cardigan. (If you know that a cardigan is a type of sweater, you are satisfied. Otherwise you just get frustrated and wonder why you got a result for a cardigan when you searched for a sweater.)</li>
<li> ‘Wacko results’ &#8211; the results seem to have nothing in common with your query.</li>
</ul>
<p>So what did the best sites do, according to Jared Spool and his colleagues?<br />
They returned match relevant results, and they avoided returning zero results.</p>
<p>So how do you achieve that then? We have previously written about the importance of <a href="http://www.findwise.se/?cat=19#jump">content refinement</a> and <a href="http://www.findwise.se/?p=50#jump">information quality</a>. But what do you do when trying to achieve good search results with your search engine? And what if you do not have the time or knowledge to do a proper content tuning process?</p>
<p>The search logs are a good place to start. Use them to identify the 100 most common searches and check the results they return: are they match relevant results? It is also a good idea to look at the searches that return zero results and see whether anything can be done to improve them as well.</p>
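<p>These two checks are easy to script as a first pass. The sketch below assumes a hypothetical log export with one search per line: the query, a tab character and the number of hits. Real search engines log in their own formats, so the parsing would need adapting, and the class and method names are made up for illustration.</p>

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SearchLogReport {

    // The n most frequent queries, most frequent first. Counting into a
    // TreeMap and using the stable sorted() breaks ties alphabetically.
    public static String[] topQueries(String[] logLines, int n) {
        var counts = Arrays.stream(logLines)
                .map(line -> line.split("\t")[0].trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .toArray(String[]::new);
    }

    // Distinct queries that returned zero hits, in alphabetical order
    public static String[] zeroResultQueries(String[] logLines) {
        return Arrays.stream(logLines)
                .map(line -> line.split("\t"))
                .filter(parts -> Integer.parseInt(parts[1].trim()) == 0)
                .map(parts -> parts[0].trim().toLowerCase(Locale.ROOT))
                .distinct()
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] log = {"sweater\t12", "cardigan\t0", "Sweater\t12", "jobs\t3", "cardigan\t0"};
        System.out.println(Arrays.toString(topQueries(log, 100)));
        System.out.println(Arrays.toString(zeroResultQueries(log)));
    }
}
```

<p>Queries are lower-cased before counting so that &#8220;Sweater&#8221; and &#8220;sweater&#8221; are treated as the same search; whether that is right depends on how your search engine normalizes queries.</p>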
<p>Jared Spool and his colleagues at UIE mostly talk about site search for e-commerce sites. For e-commerce sites, bad search results mean lost revenue, while good search results will hopefully increase revenue (provided other things, such as checkout, do not fail). For intranet search the implications are somewhat different.</p>
<p>In intranet search solutions, searches can be more complex, since users are looking for information rather than items. It might not be as easy to just add synonyms or group similar items to achieve better search results. I believe that in such a complex information universe, proper content tuning is the key to success. But looking at the search logs is a good way to start, and my colleagues and I here at Findwise are always happy to help you get the most out of your search solution.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/what-differentiates-a-good-search-engine-from-a-bad-one/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/what-differentiates-a-good-search-engine-from-a-bad-one/"><meta itemprop="datePublished" content="2007-11-28T12:43:07+00:00"><meta itemprop="dateModified" content="2007-11-28T12:43:07+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="e-commerce sites,e-commerce sites bad search,i.e. search,intranet search,intranet search solutions,Jared Spool,Research,search engine,search logs,search results,search solution,search vendors,site search"><meta itemprop="wordCount" content="484"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/what-differentiates-a-good-search-engine-from-a-bad-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search as a tool for information quality assurance</title>
		<link>http://blog.findwise.com/information-quality-assurance-through-search/</link>
		<comments>http://blog.findwise.com/information-quality-assurance-through-search/#comments</comments>
		<pubDate>Thu, 25 Oct 2007 15:22:42 +0000</pubDate>
		<dc:creator>Daniel Johansson</dc:creator>
				<category><![CDATA[Company]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[enterprise search platforms]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=50</guid>
		<description><![CDATA[Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content. This would introduce a phase where super users and information owners have the possibility to go through a quality assurance process across the information silos, before releasing information directly to [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Feedback from stakeholders in ongoing projects has highlighted the real need for a supporting tool to assist in the analysis of large amounts of content.<br />
This would introduce a phase in which super users and information owners can go through a quality assurance process across the information silos before information is released directly to end users.<br />
<span id="more-50"></span><br />
Standard features of enterprise search platforms can deliver great value and save time when extracting essential information. They also make it possible to detect key information objects that would otherwise stay hidden for lack of a holistic view.</p>
<p>Tailored applications can then easily be built on top to support process-specific analysis needs, e.g. through entity extraction (automatic detection and extraction of names, places, dates, etc.) and cross-referencing of unstructured and structured sources. Now is the time to gain control of your enterprise information and turn it into knowledge.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/information-quality-assurance-through-search/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/information-quality-assurance-through-search/"><meta itemprop="datePublished" content="2007-10-25T17:22:42+00:00"><meta itemprop="dateModified" content="2007-10-25T17:22:42+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="enterprise search platforms"><meta itemprop="wordCount" content="152"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/information-quality-assurance-through-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search-driven process to increase content quality</title>
		<link>http://blog.findwise.com/search-driven-process-to-increase-content-quality/</link>
		<comments>http://blog.findwise.com/search-driven-process-to-increase-content-quality/#comments</comments>
		<pubDate>Mon, 09 Jul 2007 07:44:05 +0000</pubDate>
		<dc:creator>Daniel Johansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[enterprise search solution]]></category>
		<category><![CDATA[recent and ongoing search]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search platform]]></category>
		<category><![CDATA[Search-driven process]]></category>
		<category><![CDATA[Tune search quality]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=22</guid>
		<description><![CDATA[Experience from recent and ongoing search and retrieval projects has shown that enterprises gain a better and deeper insight into their content when deploying a new search platform, not only in unstructured content repositories but also in structured sources. As information is indexed and visualized in a more user-friendly way, it doesn’t [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Experience from recent and ongoing search and retrieval projects has shown that enterprises gain a better and deeper insight into their content when deploying a new search platform, not only in unstructured content repositories but also in structured sources. As information is indexed and visualized in a more user-friendly way, it doesn’t take long before the people responsible discover content issues that are brought to light: content that is misplaced or wrongly tagged, documents with poorly defined security information, etc. These issues were previously hidden by the lack of a holistic view of the content. <span id="more-22"></span></p>
<p>It is often said that before deploying an enterprise search solution, an enterprise should get a completely clear picture of all its content. Perhaps this should be reformulated, and an enterprise search solution should also be seen as a supporting tool in the process of improving that content.<br />
Going a step further, the search engine could be allowed to write back to the content sources to enrich stored information and improve its quality and completeness.<br />
Tune search quality and content quality at the same time!</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/search-driven-process-to-increase-content-quality/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/search-driven-process-to-increase-content-quality/"><meta itemprop="datePublished" content="2007-07-09T09:44:05+00:00"><meta itemprop="dateModified" content="2007-07-09T09:44:05+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="enterprise search solution,recent and ongoing search,search engine,search platform,Search-driven process,Tune search quality"><meta itemprop="wordCount" content="195"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/search-driven-process-to-increase-content-quality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

