<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Findability blog &#187; Open source</title>
	<atom:link href="http://blog.findwise.com/category/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.findwise.com</link>
	<description>The enterprise search and findability blog</description>
	<lastBuildDate>Wed, 09 May 2012 17:59:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Open source tools for text analytics</title>
		<link>http://blog.findwise.com/open-source-tools-for-text-analytics/</link>
		<comments>http://blog.findwise.com/open-source-tools-for-text-analytics/#comments</comments>
		<pubDate>Mon, 21 Mar 2011 09:24:58 +0000</pubDate>
		<dc:creator>Daniel Ling</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Open Pipeline]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[central algorithms]]></category>
		<category><![CDATA[charge solutions]]></category>
		<category><![CDATA[document processing]]></category>
		<category><![CDATA[enterprise search architecture]]></category>
		<category><![CDATA[Findwise]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Open source tools]]></category>
		<category><![CDATA[search implementations]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=2482</guid>
		<description><![CDATA[Recently, both clients of Findwise as well as the Enterprise Search community in general are increasingly showing interest in text analytics in order to get a higher business value out of their (often large) volumes of unstructured information. Text Analytics merges techniques from linguistics, computer science, machine learning, statistics and many of the central algorithms [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Recently, both clients of Findwise as well as the Enterprise Search community in general are increasingly showing interest in <a title="text analytics" href="http://en.wikipedia.org/wiki/Text_analytics" target="_blank">text analytics</a> in order to get a higher business value out of their (often large) volumes of unstructured information.</p>
<p>Text Analytics merges techniques from linguistics, computer science, machine learning and statistics, and many of the central algorithms in this field are publicly available as open source tools and packages with easily accessible APIs. While many customers of commercial Enterprise Search solutions, such as Autonomy, IBM Omnifind, Microsoft FAST ESP, etc., have long benefited from some sort of Text Analytics (e.g. Entity Extraction, Keyword Extraction and document summarization), the open source components have now come a long way in providing alternative, free-of-charge solutions with similar performance and feature sets.<br />
As every modern enterprise search architecture today has some kind of document processing that is extensible by additional stages or APIs (for example the Open Pipeline with Solr or the pipeline that comes with Microsoft FAST), the opportunity to plug new text analytics stages into existing search implementations is wide open.</p>
<p>Among the most popular applications of text analytics that have emerged lately are customized entity extraction, sentiment analysis and document classification &#8211; each with a set of open source alternatives (such as <a title="Balie" href="http://balie.sourceforge.net/" target="_blank">Balie</a>, <a title="OpenNLP" href="http://incubator.apache.org/opennlp/" target="_blank">OpenNLP</a> and <a title="GATE" href="http://gate.ac.uk/" target="_blank">GATE</a>) readily available to customize and integrate into your document processing.</p>
<p>Regardless of your industry domain, these techniques open up a wide variety of new ways to interpret your content and discover new trends in your unstructured textual data, be it through sentiment analysis to support decision making, trend analysis or the relevance model of search, entity extraction to navigate your content by entities (such as company name or person), enhancing your texts with metadata tagging, or finding similar and related content.</p>
<p>How are you taking advantage of modern text analytics?</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/open-source-tools-for-text-analytics/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/open-source-tools-for-text-analytics/"><meta itemprop="datePublished" content="2011-03-21T10:24:58+00:00"><meta itemprop="dateModified" content="2011-03-21T10:24:58+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="central algorithms,charge solutions,document processing,enterprise search architecture,Findwise,IBM,machine learning,Microsoft,Open Pipeline,Open source tools,search implementations"><meta itemprop="wordCount" content="316"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/open-source-tools-for-text-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Nutch making use of Open Pipeline</title>
		<link>http://blog.findwise.com/apache-nutch-making-use-of-open-pipeline/</link>
		<comments>http://blog.findwise.com/apache-nutch-making-use-of-open-pipeline/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 16:14:33 +0000</pubDate>
		<dc:creator>Anders Rask</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Findability]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Open Pipeline]]></category>
		<category><![CDATA[search application]]></category>
		<category><![CDATA[university web site]]></category>
		<category><![CDATA[Uppsala University]]></category>
		<category><![CDATA[web crawler]]></category>
		<category><![CDATA[web information]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=2361</guid>
		<description><![CDATA[During the last couple of months I’ve been working on a project for Uppsala University. The project’s goal is to improve the findability on the university web site. The solution that we are working on is based on Apache Nutch 1.1 in conjunction with Apache Solr 1.4. Nutch provides us with a robust web crawler [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>During the last couple of months I’ve been working on a project for <a href="http://www.uu.se/en/">Uppsala University</a>. The project’s goal is to improve the findability on the university web site. The solution that we are working on is based on <a href="http://nutch.apache.org/">Apache Nutch 1.1</a> in conjunction with <a href="http://lucene.apache.org/solr/">Apache Solr 1.4</a>. Nutch provides us with a robust web crawler that scales very well and also gives us a page rank for each page that we can use for relevance tuning. Besides the web information crawled by Nutch, the search application will also be used to search people and organizational information that we index from another source. I thought that I would share some details on how we are using Nutch.</p>
<p>We have made two extensions to Nutch. One is a parser plug-in that can run <a href="http://www.openpipeline.org/">Open Pipeline</a> embedded in it. This was an important extension in order to get better control of the information that we index to Solr and also to be able to reuse our different Open Pipeline components. The main stages of the pipeline are the following:</p>
<ol>
<li>Extract the encoding of a web page</li>
<li>Extract all links from a web page</li>
<li>Extract all headings (hx) from a web page</li>
<li>Remove all tags that don’t contain complete sentences on a web page</li>
<li>Extract text and metadata from different types of documents with <a href="http://tika.apache.org/">Tika</a></li>
<li>Do some metadata mapping and cleaning</li>
<li>Populate facets according to metadata and/or URL</li>
<li>Do static URL ranking</li>
<li>Replace certain common titles with the largest heading of the web page</li>
</ol>
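<p>A minimal sketch of how a few of these stages could be chained, written in Python for brevity rather than as Open Pipeline or Nutch code. The stage names, the document fields (html, title, links, headings) and the list of generic titles are all assumptions made for illustration; the real pipeline is configured differently.</p>

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageScanner(HTMLParser):
    """Collects link targets and heading texts from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.headings: List[Dict[str, object]] = []
        self._open_heading = None  # tag name while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADING_TAGS:
            self._open_heading = tag
            self.headings.append({"level": int(tag[1]), "text": ""})

    def handle_data(self, data):
        if self._open_heading:
            self.headings[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            self._open_heading = None


def extract_links_and_headings(doc: Document) -> Document:
    """Stage: store all links and headings found in the page markup."""
    scanner = PageScanner()
    scanner.feed(str(doc.get("html", "")))
    doc["links"] = scanner.links
    doc["headings"] = [
        {"level": h["level"], "text": h["text"].strip()} for h in scanner.headings
    ]
    return doc


GENERIC_TITLES = {"", "home", "untitled", "start"}  # hypothetical list


def replace_generic_title(doc: Document) -> Document:
    """Stage: swap a generic title for the largest (lowest-level) heading."""
    title = str(doc.get("title", "")).strip().lower()
    headings = doc.get("headings") or []
    if title in GENERIC_TITLES and headings:
        doc["title"] = min(headings, key=lambda h: h["level"])["text"]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

<p>Because every stage takes a document and returns a document, stages can be reordered or reused across projects, which is the property we wanted from running Open Pipeline inside Nutch.</p>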
<p>The other extension we made to Nutch is an indexing filter that makes sure all our metadata fields are indexed to Solr.</p>
<p>So far so good. The fetching, parsing and indexing work well now, and currently our largest challenge is tuning all the different relevance parameters we have, as well as harmonizing the relevance of web information with that of people and organizational information. I will have to get back to you on how that went!</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/apache-nutch-making-use-of-open-pipeline/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/apache-nutch-making-use-of-open-pipeline/"><meta itemprop="datePublished" content="2010-11-11T17:14:33+00:00"><meta itemprop="dateModified" content="2010-11-11T17:14:33+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="Open Pipeline,search application,university web site,Uppsala University,web crawler,web information"><meta itemprop="wordCount" content="328"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/apache-nutch-making-use-of-open-pipeline/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Structure First or Structure Last?</title>
		<link>http://blog.findwise.com/structure-first-or-structure-last/</link>
		<comments>http://blog.findwise.com/structure-first-or-structure-last/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 22:03:07 +0000</pubDate>
		<dc:creator>Max Charas</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[search consultant]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=2340</guid>
		<description><![CDATA[I’d like to share two different development techniques I commonly use when setting up an Apache Solr project. To explain them, I’ll start by introducing the way I used to work. (The wrong way ) The Structure First Technique Since I work as a search consultant I come across a lot of different data sources.  [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>I’d like to share two different development techniques I commonly use when setting up an Apache Solr project. To explain them, I’ll start by introducing the way I used to work. (The wrong way <img itemprop="image" src='http://blog.findwise.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  )</p>
<h3>The Structure First Technique</h3>
<p>Since I work as a search consultant I come across a lot of different data sources.  All of these data sources have at least some structure, some more than others.</p>
<p>My objective as a backend developer was then to first of all figure out how the data source was structured and then design a Solr schema that fit the requirements, both technical and business.</p>
<p>The problem with this was of course that the requirements were quite fuzzy until I actually figured out how the data was structured and even more importantly what the data quality was.</p>
<p>In many cases I would spend a lot of time on extracting a date from the source, converting that to an ISO 8601 date format (supported by Solr), updating the schema with that field and then finally reindexing, only to learn that the date was either not required or had too poor data quality to be used.</p>
<p>My point being that I spent a lot of time designing a schema (and connector) for a source which I, and most others, knew almost nothing about.</p>
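<p>For reference, the date conversion mentioned above can be sketched as follows. Solr expects dates in the UTC form 1995-12-31T23:59:59Z. The list of source formats is hypothetical, and treating timestamps without a time zone as UTC is an assumption that will not hold for every source.</p>

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical source formats; a real source would dictate its own list.
SOURCE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y", "%Y%m%d")


def to_solr_date(raw: str) -> Optional[str]:
    """Return the date in Solr's UTC ISO 8601 form, or None if unparseable."""
    for fmt in SOURCE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        # Assumption: naive source timestamps are already in UTC.
        return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return None
```

<p>Returning None for unparseable values makes the data quality problem visible early: counting the None results tells you whether the field is worth keeping at all.</p>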
<h3>The Structure Last Technique</h3>
<p>OK, so what’s the supposed “right way” of doing this?</p>
<p>In Solr there is a concept called dynamic fields. It allows you to map fields whose names match a certain pattern to a specific type. In the example Solr schema you can find the following section:</p>
<p><em> &lt;!-- uncomment the following to ignore any fields that don't already match an existing </em></p>
<p><em> field name or dynamic field, rather than reporting them as an error. </em></p>
<p><em> alternately, change the type=&quot;ignored&quot; to some other type e.g. &quot;text&quot; if you want </em></p>
<p><em> unknown fields indexed and/or stored by default --&gt; </em></p>
<p><em> &lt;!--dynamicField name=&quot;*&quot; type=&quot;ignored&quot; multiValued=&quot;true&quot; /--&gt;</em></p>
<p>The section above will drop any fields that are not explicitly declared in the schema. But what I usually do to start with is to do the complete opposite. I map all fields to a string type.</p>
<p><em> &lt;dynamicField name=&quot;*&quot; type=&quot;string&quot; multiValued=&quot;true&quot; indexed=&quot;true&quot; stored=&quot;true&quot;/&gt; </em></p>
<p>I start with a minimalist schema that only has an id field and the above stated dynamic field.</p>
<p>With this schema it doesn’t matter what I do, everything is mapped to a string field, exactly as it is entered.</p>
<p>This allows me to focus on getting the data into Solr without caring about what to name the fields or what properties they should have and, most importantly, without even having to declare them at all.</p>
<p>Instead I can focus on getting the data out of the source system and then into Solr. When that’s done I can use Solr’s schema browser to see what fields are high quality, contain a lot of text or are suited to be used as facets and use this information to help out in the requirements process.</p>
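<p>The kind of overview the schema browser gives can be approximated before indexing as well. The sketch below is my own illustration, not Solr functionality: it assumes each document is a dictionary of field name to list of string values (everything is a multi-valued string, as with the dynamic field above), and the thresholds for what counts as a facet candidate are arbitrary.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def profile_fields(docs: Iterable[Dict[str, List[str]]]) -> Dict[str, Dict[str, float]]:
    """Summarise each field: fill rate, distinct values, average length."""
    docs = list(docs)
    values = defaultdict(list)
    filled = defaultdict(int)
    for doc in docs:
        for field, field_values in doc.items():
            non_empty = [v for v in field_values if v.strip()]
            if non_empty:
                filled[field] += 1
                values[field].extend(non_empty)
    report = {}
    for field, vals in values.items():
        report[field] = {
            "fill_rate": filled[field] / len(docs),
            "distinct": len(set(vals)),
            "avg_length": sum(len(v) for v in vals) / len(vals),
        }
    return report


def facet_candidates(report, max_distinct=50, min_fill_rate=0.8):
    """Fields that are well populated and have few distinct values."""
    return sorted(
        field
        for field, stats in report.items()
        if stats["distinct"] <= max_distinct and stats["fill_rate"] >= min_fill_rate
    )
```

<p>Fields with a high fill rate and few distinct values are natural facets, while fields with a long average length are better suited for full-text search.</p>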
<p>The Structure Last Technique lets you be more pragmatic about your requirements.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/structure-first-or-structure-last/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/structure-first-or-structure-last/"><meta itemprop="datePublished" content="2010-10-17T23:03:07+00:00"><meta itemprop="dateModified" content="2010-10-17T23:03:07+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="search consultant"><meta itemprop="wordCount" content="520"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/structure-first-or-structure-last/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Information flow in VGR</title>
		<link>http://blog.findwise.com/information-flow-in-vgr/</link>
		<comments>http://blog.findwise.com/information-flow-in-vgr/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 20:47:04 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Information Architecture]]></category>
		<category><![CDATA[Information management]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[author]]></category>
		<category><![CDATA[content management system]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Kristian Norling]]></category>
		<category><![CDATA[search alert]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[similar solutions]]></category>
		<category><![CDATA[Västra Götaland Regional Council]]></category>
		<category><![CDATA[VGR]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=2327</guid>
		<description><![CDATA[The previous week Kristian Norling from VGR (Västra Götaland Regional Council) posted a really interesting and important blog post about information flow. For those of you who don’t know what VGR has been up to previously, here is a short background. For a number of years the organization has been working to give reality to a [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>The previous week <a title="Kristian Norling, VGR" href="http://se.linkedin.com/in/kristiannorling" target="_blank">Kristian Norling</a> from VGR (<a title="Västra Götalandsregionen Wikipedia" href="http://en.wikipedia.org/wiki/V%C3%A4stra_G%C3%B6taland_Regional_Council" target="_blank">Västra Götaland Regional Council</a>) posted a really interesting and important <a title="Blog post about information flow" href="http://sys64738.se/" target="_blank">blog post</a> about information flow.<br />
For those of you who don’t know what VGR has been up to previously, <a title="VGR search" href="http://findabilityblog.se/how-to-create-better-search-vgr-leads-the-way" target="_blank">here</a> is a short background.</p>
<p>For a number of years the organization has been working to give reality to a model for how information is created, managed, stored and distributed. And perhaps the most important part – integrated.</p>
<div id="attachment_2328" class="wp-caption aligncenter" style="width: 310px"><a rel="attachment wp-att-2328" href="http://findabilityblog.se/information-flow-in-vgr/informationflow/"><img itemprop="image" class="size-medium wp-image-2328" title="Informationflow " src="http://media.findabilityblog.se//2010/10/informationflow1-300x195.jpg" alt="" width="300" height="195" /></a><p class="wp-caption-text">Information flow in VGR</p></div>
<p>So, why is this important?<br />
In order to give your users access to <em>the right</em> information it is essential to get control of the whole information flow i.e. from the time it is created until it reaches the end user. If we lack knowledge about this, it is almost impossible to ensure quality and accuracy.</p>
<p>The fact that we have control also gives us endless possibilities when it comes to distributing the right information at the right time (an old cliché that is finally becoming reality). To sum up: that is what search is all about!</p>
<p>When information is being created VGR uses a <a title="Metadata service used by VGR (in Swedish)" href="http://code.google.com/p/oppna-program-metadata-service/" target="_blank">Metadata service</a> which helps the editors to tag their content by giving keyword suggestions.<br />
In practice this means that the information can be distributed in the way it is intended. News items are, for example, tagged with subject, target group and organizational info (apart from dates, author, expiry date etc., which are automated), meaning that people belonging to specific groups with certain roles will get the news that is important to them.</p>
<p>Once the information is tagged correctly and published it is indexed by search. This is done in a number of different ways: by HTML-crawling, through RSS, by feeding the search engine or through direct indexing.</p>
<p>The information is after this available through search and ready to be distributed to the right target groups.<br />
<a title="Portlets to display atoms and rss feeds (in Swedish)" href="http://code.google.com/p/oppna-program-rss-client/" target="_blank">Portlets</a> are used to give single sign-on access to a number of information systems, and template pages in the WCM (Web Content Management system) use search alerts to give updated information.<br />
Simply put: a search alert for e.g. meeting minutes that contain your department&#8217;s name will give you an overview of all information that concerns it as soon as it is published, regardless of which system it resides in.</p>
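<p>To illustrate the idea of a search alert, here is a minimal sketch. It is not how VGR has implemented it: the SearchAlert class, its fields and the simple substring matching are assumptions made for the example, whereas a real solution would run saved queries against the search engine.</p>

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchAlert:
    owner: str
    required_terms: List[str]  # all terms must occur in the document text

    def matches(self, document: Dict[str, str]) -> bool:
        text = " ".join(document.values()).lower()
        return all(term.lower() in text for term in self.required_terms)


def alerts_for(document: Dict[str, str], alerts: List[SearchAlert]) -> List[str]:
    """Return the owners whose saved alerts match a newly published document."""
    return [alert.owner for alert in alerts if alert.matches(document)]
```

<p>The point is that the alert only looks at the indexed content and its metadata, so it does not matter which source system the document came from.</p>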
<p>Furthermore, the blog post describes VGR’s work with creating short and persistent URLs (through a URL service) and how to “monitor” and “listen to” the information flow (for real-time indexing and distribution) &#8211; areas where we all have things to learn.<br />
Over time Kristian will describe the different parts of the model in detail, so be sure to keep an eye on the <a title="Blog Kristian Norling VGR" href="http://sys64738.se" target="_blank">blog</a>.</p>
<p>What are your thoughts on how to get control of the information flow? Have you been developing similar solutions for part of this?</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/information-flow-in-vgr/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/information-flow-in-vgr/"><meta itemprop="datePublished" content="2010-10-17T21:47:04+00:00"><meta itemprop="dateModified" content="2011-06-30T12:41:58+00:00"><meta itemprop="dateCreated" content="2010-10-17T21:47:04+00:00"><meta itemprop="keywords" content="author,content management system,HTML,Kristian Norling,search alert,search engine,similar solutions,Västra Götaland Regional Council,VGR"><meta itemprop="wordCount" content="463"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/information-flow-in-vgr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr Processing Pipeline</title>
		<link>http://blog.findwise.com/solr-processing-pipeline/</link>
		<comments>http://blog.findwise.com/solr-processing-pipeline/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 13:07:25 +0000</pubDate>
		<dc:creator>Max Charas</dc:creator>
				<category><![CDATA[Connector]]></category>
		<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apache Commons Processing Pipeline]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[data processing layer]]></category>
		<category><![CDATA[Prague]]></category>
		<category><![CDATA[Solr REST protocol]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=1952</guid>
		<description><![CDATA[Hi again Internet, For once I have had time to do some thinking. Why is there no powerful data processing layer between the Lucene Connector Framework and Solr? I’ve been looking into the Apache Commons Processing Pipeline. It seems like a likely candidate to do some cool stuff.  Look at the diagram below. What I’m thinking [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Hi again Internet,</p>
<p>For once I have had time to do some thinking. Why is there no powerful data processing layer between the <a title="the Lucene Connector Framework" href="http://incubator.apache.org/connectors/" target="_blank">Lucene Connector Framework</a> and Solr? I’ve been looking into the <a title=" the Apache Commons Processing Pipeline" href="http://commons.apache.org/sandbox/pipeline/" target="_blank">Apache Commons Processing Pipeline</a>. It seems like a likely candidate to do some cool stuff.  Look at the diagram below.</p>
<div id="attachment_1953" class="wp-caption aligncenter" style="width: 310px"><a href="http://media.findabilityblog.se/2010/04/Drawing11.jpg"><img itemprop="image" class="size-medium wp-image-1953  " src="http://media.findabilityblog.se/2010/04/Drawing1-300x148.jpg" alt="" width="300" height="148" /></a><p class="wp-caption-text">A schematic drawing of a Solr Pipeline concept. (Click to enlarge)</p></div>
<p>What I’m thinking of is to make a transparent Solr pipeline that speaks the Solr REST protocol on each end. This means that you would be able to use SolrJ or any other API to communicate with the pipeline exactly as if it were Solr itself.</p>
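<p>A rough sketch of the core of such a pipeline, in Python for brevity. It leaves out the HTTP parts and only shows the transparent step in the middle: take an update body, run each document through a list of stages, and return a body of the same shape that could be forwarded to Solr. The JSON list-of-documents format, the function names and the two example stages are assumptions made for illustration.</p>

```python
import json
from typing import Callable, Dict, List

SolrDoc = Dict[str, object]
Stage = Callable[[SolrDoc], SolrDoc]


def process_update(body: str, stages: List[Stage]) -> str:
    """Run every document in a JSON update body through the stages.

    The result has the same shape as the input, so it could be forwarded
    unchanged to the real Solr update handler.
    """
    docs = json.loads(body)
    processed = []
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        processed.append(doc)
    return json.dumps(processed)


def trim_strings(doc: SolrDoc) -> SolrDoc:
    """Example stage: strip surrounding whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}


def add_language(doc: SolrDoc) -> SolrDoc:
    # Hypothetical enrichment stage: a real one would detect the language.
    return {**doc, "language": doc.get("language", "unknown")}
```

<p>Since the input and output formats are identical, the client does not need to know whether it is talking to the pipeline or directly to Solr.</p>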
<p>Has anyone attempted this before?  If you’re interested in chatting about the pipeline drop me a mail or just grab me at Eurocon in Prague this year.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/solr-processing-pipeline/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/solr-processing-pipeline/"><meta itemprop="datePublished" content="2010-04-19T14:07:25+00:00"><meta itemprop="dateModified" content="2010-04-19T14:07:25+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="Apache Commons Processing Pipeline,API,data processing layer,Prague,Solr REST protocol"><meta itemprop="wordCount" content="133"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/solr-processing-pipeline/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Solr – the Sunny Side of Search</title>
		<link>http://blog.findwise.com/solr-%e2%80%93-the-sunny-side-of-search/</link>
		<comments>http://blog.findwise.com/solr-%e2%80%93-the-sunny-side-of-search/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 09:32:17 +0000</pubDate>
		<dc:creator>Max Charas</dc:creator>
				<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[large enterprise search platforms]]></category>
		<category><![CDATA[no-name search platforms]]></category>
		<category><![CDATA[Norway]]></category>
		<category><![CDATA[Oslo]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/solr-%e2%80%93-the-sunny-side-of-search</guid>
		<description><![CDATA[When I started working for Findwise two years ago, Apache Solr was one of those no-name search platforms. We could barely get our customers to consider Solr even after proving that the platform would be a perfect match for their business needs. As time passed and the financial crisis hit the world, a few of [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>When I started working for Findwise two years ago, Apache Solr was one of those no-name search platforms. We could barely get our customers to consider Solr even after proving that the platform would be a perfect match for their business needs. As time passed and the financial crisis hit the world, a few of our customers started considering Solr, but then usually for the reason that it was “free” – not for the functionality of the platform.</p>
<p>Things have changed. More and more companies now offer support and training for Solr. It seems that the platform is gaining momentum on the enterprise market.<br />
In fact, I was just in Oslo, Norway to become a certified Lucid Imagination training partner, as the need for training is growing rapidly, even up here in the snow-covered Nordics.</p>
<p>Today we even have customers approaching us asking questions about how, and not if, they should use Solr. I wouldn’t have imagined that two years ago &#8230;</p>
<p>Could this be the year that Solr goes head to head with the large enterprise search platforms?<br />
And where will we be in another two years?</p>
<p>I wish I knew.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/solr-%e2%80%93-the-sunny-side-of-search/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/solr-%e2%80%93-the-sunny-side-of-search/"><meta itemprop="datePublished" content="2010-04-01T10:32:17+00:00"><meta itemprop="dateModified" content="2010-04-01T10:32:17+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="large enterprise search platforms,no-name search platforms,Norway,Oslo"><meta itemprop="wordCount" content="191"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/solr-%e2%80%93-the-sunny-side-of-search/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Faceted Search by LinkedIn</title>
		<link>http://blog.findwise.com/faceted-search-by-linkedin/</link>
		<comments>http://blog.findwise.com/faceted-search-by-linkedin/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 11:00:38 +0000</pubDate>
		<dc:creator>Maria Johansson</dc:creator>
				<category><![CDATA[Interaction Design]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Usability]]></category>
		<category><![CDATA[Daniel Tunkelang]]></category>
		<category><![CDATA[faceted search]]></category>
		<category><![CDATA[John Wang]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[people search]]></category>
		<category><![CDATA[prominent search functionality]]></category>
		<category><![CDATA[reference search]]></category>
		<category><![CDATA[Sara Alpern]]></category>
		<category><![CDATA[search architect]]></category>
		<category><![CDATA[search experience]]></category>
		<category><![CDATA[search interface]]></category>
		<category><![CDATA[similar solution]]></category>

		<guid isPermaLink="false">http://findabilityblog.se/?p=1839</guid>
		<description><![CDATA[My RSS feeds have been buzzing about the LinkedIn faceted search since it was first released from beta in December. So why is the new search at LinkedIn so interesting that people are almost constantly discussing it? I think it’s partly because LinkedIn is a site that is used by most professionals and searching for [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>My RSS feeds have been buzzing about the LinkedIn <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> since it was first released from beta in December. So why is the new search at <a href="http://www.linkedin.com">LinkedIn</a> so interesting that people are almost constantly discussing it? I think it’s partly because LinkedIn is a site that is used by most professionals and searching for people is core functionality on LinkedIn. But the search interface on LinkedIn is also a very good example of faceted search.</p>
<p>I decided to have a closer look into their search. The first thing I realized was just how many different kinds of searches there are on LinkedIn. Not only the obvious people search but also, job, news, forum, group, company, address book, answers and reference search. LinkedIn has managed to integrate search so that it’s the natural way of finding information on the site. People search is the most prominent search functionality but not the only one.</p>
<p>I’ve seen several different people search implementations and they often have a tendency to work more or less like phone books. If you know the name you type it and get the number. And if you’re lucky you can also get the name if you only have the number. There is seldom anyway to search for people with a certain competence or from a geographic area. LinkedIn sets a good example of how searching for people could and should work.</p>
<p>LinkedIn has carefully considered its users: what information they are looking for, how they want it presented and how they need to filter searches in order to find the right people. The details that I personally like are the ability to search within filters for matching options (I worked on a similar solution last year) and how different filters are displayed (or at least shown in a different order) depending on the query the user types. If you want to know more about how the faceted search at LinkedIn was designed, check out the <a href="http://blog.linkedin.com/2010/03/05/designing-linkedin-faceted-search/">blog post</a> by Sara Alpern.</p>
<p>But LinkedIn is not only interesting because of the good search experience. It’s also interesting from a technical perspective. The LinkedIn search is built on open source, which has let them develop everything themselves. For those of you interested in the technology behind the new LinkedIn search, I recommend “<a href="http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/">LinkedIn search a look beneath the hood</a>” by <a href="http://thenoisychannel.com">Daniel Tunkelang</a>, where he links to a presentation by <a href="http://www.linkedin.com/in/javasoze">John Wang</a>, search architect at LinkedIn.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/faceted-search-by-linkedin/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/faceted-search-by-linkedin/"><meta itemprop="datePublished" content="2010-03-12T12:00:38+00:00"><meta itemprop="dateModified" content="2010-03-12T12:00:38+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="Daniel Tunkelang,faceted search,John Wang,LinkedIn,people search,prominent search functionality,reference search,Sara Alpern,search architect,search experience,search interface,similar solution"><meta itemprop="wordCount" content="408"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/faceted-search-by-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to create better search &#8211; VGR leads the way</title>
		<link>http://blog.findwise.com/how-to-create-better-search-vgr-leads-the-way/</link>
		<comments>http://blog.findwise.com/how-to-create-better-search-vgr-leads-the-way/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 22:26:13 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Intranet]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Fredrik Wackå]]></category>
		<category><![CDATA[functionality and solutions]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Kristian Norling]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[search experience]]></category>
		<category><![CDATA[search solution]]></category>
		<category><![CDATA[senior IT-strategist]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[writer]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1328</guid>
		<description><![CDATA[I realise we are a bit late. Fredrik Wackå, a senior IT-strategist, has already written an excellent article on his blog (in Swedish). Among other things, he has interviewed Kristian Norling (at Twitter), who has been working with portal strategies and search for many years at Västra Götalands regionen. However, for all our non-Swedish-speaking [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>I realise we are a bit late. Fredrik Wackå, a senior IT-strategist, has already written an excellent article on <a title="Fredrik Wackås blogg" href="http://www.wpr.se/2010/01/snabbhet-grunden-metadata-forfiningen-nar-vg-regionen-skapade-sokmotor/" target="_blank">his blog</a> (in Swedish). Among other things, he has interviewed <a title="Kristian Norling" href="http://se.linkedin.com/in/kristiannorling" target="_blank">Kristian Norling</a> (at <a title="Kristian Norling at Twitter" href="https://twitter.com/kristiannorling" target="_blank">Twitter</a>), who has been working with portal strategies and search for many years at Västra Götalands regionen.<br />
However, for all our non-Swedish-speaking guests, here is a short summary:</p>
<p>Over the last few months, Findwise has been working on a new search solution for Västra Götalands regionen. The two main goals have been to deliver a search experience that feels both fast and accurate.<br />
The result?<br />
Today, a search at VGR takes about 0.1 to 0.2 seconds, faster than a Google search on the web.</p>
<p>Furthermore, there was a need for context. Large amounts of information require ways to filter and sort – otherwise the users will drown in the result list.<br />
By giving end-users the ability to filter and sort the search results, the solution lets them look for general information within an area as well as quickly narrow down to a specific piece (for example, seeing only the PDF files created in 2009 in two clicks). The filters (and thereby the metadata standard) include:</p>
<p>• Information type<br />
• Where the document resides<br />
• Where it belongs in the organization<br />
• What source it has<br />
• When it was last changed<br />
• Who has written it<br />
• What format it resides in<br />
• Keywords that have been created</p>
<div id="attachment_1329" class="wp-caption alignleft" style="width: 310px"><a href="http://None"><img itemprop="image" class="size-medium wp-image-1329" title="Västra Götalands regionen" src="http://www.findwise.se/wp/wp-content/vgr-300x192.jpg" alt="VGR" width="300" height="192" /></a><p class="wp-caption-text">VGR</p></div>
<p>The search solution also includes a metadata service. Like so many others, VGR has been struggling to get its metadata in place.<br />
Apart from the metadata supported by the system (where <a title="Dublin core" href="http://www.dublincore.org/" target="_blank">Dublin Core</a> is being used), the metadata service does two things:<br />
• Analyses the text content, compares it to a taxonomy and suggests keywords that the writer can use<br />
• Gives the writer the ability to add additional keywords</p>
<p>Apart from this, the end-users will be able to add labels (tags). These will be compared with two lists. If a tag appears in the “whitelist”, it will be published right away; if it is in the “blacklist”, it will be deleted. Anything in between is reviewed before it is published.</p>
<p>To conclude: a lot of effort has been put into creating a good search experience, and VGR continues to deliver functionality and solutions that are light-years ahead of many others. The combination of supporting systems and using the &#8220;collective intelligence&#8221; of the writers and end-users will make it even better over time.<br />
Search is about supporting systems, content and people alike.</p>
<p>Read more on <a title="Fredrik Wackås blogg" href="http://www.wpr.se/2010/01/snabbhet-grunden-metadata-forfiningen-nar-vg-regionen-skapade-sokmotor/" target="_blank">Fredrik Wackå’s blog</a>.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/how-to-create-better-search-vgr-leads-the-way/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/how-to-create-better-search-vgr-leads-the-way/"><meta itemprop="datePublished" content="2010-01-11T23:26:13+00:00"><meta itemprop="dateModified" content="2010-01-11T23:26:13+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="Fredrik Wackå,functionality and solutions,Google,Kristian Norling,PDF,search experience,search solution,senior IT-strategist,Twitter,writer"><meta itemprop="wordCount" content="428"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/how-to-create-better-search-vgr-leads-the-way/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Findwise releases Open Pipeline plugins</title>
		<link>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/</link>
		<comments>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 06:54:57 +0000</pubDate>
		<dc:creator>Karl Jansson</dc:creator>
				<category><![CDATA[Content refinement]]></category>
		<category><![CDATA[Future development]]></category>
		<category><![CDATA[Information quality]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[developing document processors]]></category>
		<category><![CDATA[document processing]]></category>
		<category><![CDATA[document processors]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[incomplete solutions]]></category>
		<category><![CDATA[index solutions]]></category>
		<category><![CDATA[job scheduler]]></category>
		<category><![CDATA[Open Pipeline]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[service/product]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=1141</guid>
		<description><![CDATA[Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page, and the ones Findwise has created can be downloaded from our Findwise Open Pipeline Plugins page. [...]]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the <a href="http://www.openpipeline.org/plugins/">Open Pipeline Plugins page</a>, and the ones Findwise has created can be downloaded from our <a href="http://www.findwise.se/findwise-open-pipeline">Findwise Open Pipeline Plugins page.</a></p>
<p><span id="more-1141"></span></p>
<p>OpenPipeline is open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.</p>
<p>Findwise has been using this framework in a number of customer projects with great success. It works particularly well together with Apache Solr, not only because it is open source but, most importantly, because it fills a gap in Solr&#8217;s functionality &#8211; an easy-to-use framework for developing document processors and connectors. However, we are not using it for Solr only: a number of plugins for the Google Search Appliance have also been made, and we have started investigating how Open Pipeline can be integrated with the IBM Omnifind search engine as well.</p>
<p>The best thing about this framework is that it is very flexible and customizable but still easy to use AND, maybe most importantly for me as a developer, easy to work with and develop against. Its API is simple yet powerful enough to handle everything you need. And because it is an open source framework, any shortcomings and limitations that we find along the way can be investigated in detail, and a better solution can be proposed to the Open Pipeline team for inclusion in future releases.</p>
<p>We have in fact already contributed a great deal to the development of the project by using it, testing it, and reporting bugs and suggesting improvements on their forums. And the response from the team has been very good &#8211; some of our suggested improvements have already been included and some are on the way in the new 0.8 version. We are also in the process of further deepening the collaboration by signing a contributor agreement so that we can eventually contribute code as well.</p>
<p>So how do our customers benefit from this?</p>
<p>First, it lets us develop and deliver search and index solutions to our customers more quickly and with better quality. This is because more developers can work with the same framework as a base, so the overall code base is used more, tested more and is thus of better quality. We can also reuse good, well-tested components, so that several customers can share the costs of development and thus get a better service/product for less money, which is always a good thing of course!</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/findwise-releases-open-pipeline-plugins/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/findwise-releases-open-pipeline-plugins/"><meta itemprop="datePublished" content="2009-10-09T08:54:57+00:00"><meta itemprop="dateModified" content="2009-10-09T08:54:57+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="API,developing document processors,document processing,document processors,Enterprise Search,IBM,incomplete solutions,index solutions,job scheduler,Open Pipeline,open source software,search engine,service/product"><meta itemprop="wordCount" content="488"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/findwise-releases-open-pipeline-plugins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing open source for search</title>
		<link>http://blog.findwise.com/comparing-open-souce-for-search/</link>
		<comments>http://blog.findwise.com/comparing-open-souce-for-search/#comments</comments>
		<pubDate>Mon, 31 Dec 2007 00:05:35 +0000</pubDate>
		<dc:creator>Caroline Abrahamsson</dc:creator>
				<category><![CDATA[Internet search]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[open source solutions]]></category>
		<category><![CDATA[search tools]]></category>

		<guid isPermaLink="false">http://www.findwise.se/?p=53</guid>
		<description><![CDATA[Even Gartner has talked about open source solutions as interesting search tools. For those of you who need an introduction, a slideshow comparing Lucene, Solr and Nutch can be found here.]]></description>
			<content:encoded><![CDATA[<span itemprop="mainContentOfPage"><span itemprop="articleBody"><p>Even Gartner has talked about open source solutions as interesting search tools. For those of you who need an introduction, a slideshow comparing Lucene, Solr and Nutch can be found <a target="_blank" href="http://www.slideshare.net/dnaber/apache-lucene-searching-the-web-and-everything-else-jazoon07/" title="Comparing open source for search">here</a>.</p>
</span></span><div class="schema_property_wrap"></div><meta itemprop="url" content="http://blog.findwise.com/comparing-open-souce-for-search/"><meta itemprop="discussionUrl" content="http://blog.findwise.com/comparing-open-souce-for-search/"><meta itemprop="datePublished" content="2007-12-31T02:05:35+00:00"><meta itemprop="dateModified" content="2007-12-31T02:05:35+00:00"><meta itemprop="dateCreated" content=""><meta itemprop="keywords" content="open source solutions,search tools"><meta itemprop="wordCount" content="31"><meta itemprop="blogPosts" content="http://blog.findwise.com">]]></content:encoded>
			<wfw:commentRss>http://blog.findwise.com/comparing-open-souce-for-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

