There will always be business use cases where a few customizations are needed to achieve the desired results. In this post, we consider a custom requirement for a name search application that involves names of European and American origin. For example, the name "De Vera Michael", which is of European origin, should be returned in search results when someone searches for "devera". This is currently not supported out of the box by any of the Solr analyzers and filters, but we can build our own custom plugin to meet the requirement and add it to the platform.

Solr lets users build custom filters, which is a very powerful feature and a differentiating factor from commercial search engines in the market. It allows custom behavior to be added to both index and search operations by manipulating the index/search pipelines defined in solrconfig.xml. To support the requirement above, we need to intercept the character stream for a given field at index time and modify it, adding the merged tokens we are looking for, before the next stage in the pipeline is called. Because the change is at the core level, it must be executed before the search engine generates tokens from the character stream.
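To make the "merged token" idea concrete, here is a minimal sketch of the text transformation only, written as plain Scala with no Solr dependency. The particle list, the object name `NameMergeSketch` and the method `expand` are assumptions for illustration; the post does not show its own rule set. In a real plugin, logic like this would presumably sit behind a Lucene `CharFilter` registered for the field, with lowercasing left to a later filter in the chain.

```scala
object NameMergeSketch {
  // Hypothetical list of name particles; not taken from the original post.
  val particles: Set[String] = Set("de", "van", "von", "la", "le")

  /** Appends a merged form for every particle followed by another word. */
  def expand(name: String): String = {
    val words = name.trim.split("\\s+").toList.filter(_.nonEmpty)
    // Pair each word with its successor and merge pairs that start with a particle.
    val merged = words.zip(words.drop(1)).collect {
      case (first, second) if particles.contains(first.toLowerCase) => first + second
    }
    (words ++ merged).mkString(" ")
  }
}
```

With this sketch, `NameMergeSketch.expand("De Vera Michael")` yields `De Vera Michael DeVera`, so a downstream tokenizer and lowercase filter would index `devera` alongside the original words.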
Solr has a lot of bells and whistles to use out of the box for building a robust search platform for a company.

The title field is already present in schema.xml with type="text_general", which works fine for us, since it will tokenize individual words (we want to be able to search on coffee, cocoa and sugar). We also add the mscore and fscore field definitions to the fields block of the schema.xml file.

Source: src/main/scala/com/mycompany/solr4extras/funcquery/FuncQueryDataGenerator.scala

package import import ._ import. import .SolrInputDocument object FuncQueryDataGenerator extends App
Here is some Scala/SolrJ code that will generate and populate the data into a vanilla Solr 4.1.0 instance.
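As a rough illustration of the kind of generator described in this post, the following is a minimal sketch that only builds the records in memory and does not talk to Solr. The `Doc` case class, the `id` field, the fixed seed and the exact title layout (word, then mscore, then fscore) are assumptions, not the author's code.

```scala
import scala.util.Random

object DummyDataSketch {
  // Hypothetical record shape; the id field is an assumption.
  case class Doc(id: Int, title: String, mscore: Int, fscore: Int)

  val words: Vector[String] = Vector("coffee", "cocoa", "sugar")

  /** Lazily generates n records with scores drawn uniformly from 1 to 1000. */
  def generate(n: Int, seed: Long = 42L): Iterator[Doc] = {
    val rnd = new Random(seed)
    Iterator.tabulate(n) { i =>
      val mscore = rnd.nextInt(1000) + 1 // nextInt(1000) is 0..999
      val fscore = rnd.nextInt(1000) + 1
      val word = words(rnd.nextInt(words.size))
      // The title carries the scores purely for visual feedback.
      Doc(i, s"$word $mscore $fscore", mscore, fscore)
    }
  }
}
```

Calling `DummyDataSketch.generate(100000)` would produce the full test set. With SolrJ on the classpath, each record could then be copied field by field into a `SolrInputDocument` via `addField` and sent to the server.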
We want to be able to customize our search results based on what a (logged-in) user tells us about himself or herself via their profile. This could be gender, age, ethnicity and a variety of other things. On the content side, we can annotate each document with features that correspond to these profile features. For example, we can assign a document a score that indicates its appeal or information value to males versus females, which would correspond to the profile's gender. So the idea is that if we know the profile is male, we should boost the documents that have a high male appeal score and deboost the ones that have a high female appeal score, and vice versa if the profile is female. This idea can easily be extended to multi-category features such as ethnicity as well. In this post, I will describe a possible implementation that uses Function Queries to rerank search results using male/female appeal document scores.

For testing, I created some dummy data of 100,000 records with three fields: title, mscore and fscore. The mscore and fscore are random integers in the range 1-1000, and the title contains one of three strings, "coffee", "cocoa" or "sugar", plus the mscore and fscore values (primarily for visual feedback).
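One way to express this kind of gender-dependent boost is with Solr's boost query parser, which multiplies the relevance score by a function of document fields. The sketch below only builds the request parameters as a plain map. Using the ratio of the two scores as the boost is an assumption chosen for illustration, not necessarily the function used in this post, and the names `GenderBoostSketch`, `boostFunction` and `params` are hypothetical.

```scala
object GenderBoostSketch {
  sealed trait Gender
  case object Male extends Gender
  case object Female extends Gender

  /** Assumed boost: favor the matching score, penalize the opposite one. */
  def boostFunction(gender: Gender): String = gender match {
    case Male   => "div(mscore,fscore)"
    case Female => "div(fscore,mscore)"
  }

  /** Builds Solr request parameters; the user query is passed via $qq. */
  def params(userQuery: String, gender: Gender): Map[String, String] = Map(
    // $$ is an escaped dollar sign inside an s-interpolated string.
    "q"  -> s"{!boost b=${boostFunction(gender)} v=$$qq}",
    "qq" -> userQuery,
    "fl" -> "title,mscore,fscore,score"
  )
}
```

For a male profile searching for coffee, `GenderBoostSketch.params("title:coffee", GenderBoostSketch.Male)` sets `q` to `{!boost b=div(mscore,fscore) v=$qq}`. Because both scores are at least 1 in the test data, the division is always defined.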
Solr has had support for Function Queries since version 3.1, but until sometime last week, I did not have a use for them. That is probably why, whenever I read about Function Queries, they seemed like a nice idea but not interesting enough to pursue further. Most people get introduced to Function Queries through the bf parameter in the DisMax Query Parser or through the geodist function in Spatial Search. So far, I haven't had the opportunity to personally use either feature in a real application. My introduction to Function Queries was through a problem posed to me by one of my coworkers.