Is it possible to search across different scripts in one step, that is, to search with the same keyword in both Latin and Cyrillic?
Example keywords: (nokia, нокиа). I’m using Solr in one project and I want to be able to run a search like the one I’ve described above without doing two searches and merging the results. Any ideas?
Thanks in advance :)
Do you get both values as input? I don’t know Cyrillic, but can you derive “nokia” from “нокиа”? (I mean, are there rules that would make “н” become “n” and “и” become “i”?) If there’s some kind of correspondence between the letters, you can write a FilterFactory just like the one that accepts accented characters and strips the accents from them.
If there’s no such correspondence between the letters, it depends on how you store these values. Are all values stored in the same field in Solr, or do you have different fields for the Cyrillic and the English values?
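If such a letter-by-letter correspondence exists, it could be sketched in plain Ruby roughly as below. The CYR_TO_LAT table and the to_latin helper are hypothetical names made up for illustration, and the table covers only a handful of letters; a real one would need the whole alphabet and a chosen transliteration standard. The downcase call assumes Ruby 2.4 or later, where it handles non-ASCII letters.

```ruby
# Hypothetical, deliberately incomplete Cyrillic-to-Latin table.
# Some letters map to more than one Latin character.
CYR_TO_LAT = {
  "а" => "a", "и" => "i", "к" => "k", "н" => "n", "о" => "o",
  "ж" => "zh", "ш" => "sh"
}.freeze

# Lowercase the text, then replace every Cyrillic character that has a
# mapping; characters without a mapping are left untouched.
def to_latin(text)
  text.downcase.gsub(/\p{Cyrillic}/) { |ch| CYR_TO_LAT.fetch(ch, ch) }
end

puts to_latin("нокиа") # prints "nokia"
puts to_latin("nokia") # Latin input passes through unchanged
```

A Solr filter would apply the same kind of mapping to each token inside the analyzer chain instead of in application code.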
There is only one input, and yes, I can derive “нокиа” from “nokia”.
Let’s say “nokia”.to_cyr == “нокиа”, and I wish to be able to search with both in one go, example:
It’s a RoR app :)
That’s easy; how you do it depends on how deep into Solr you want to go. You could build a TokenFilter that takes “нокиа” and turns it into “nokia”, and then add this filter to both the index and the search analyzer.
If you don’t want to code in Java and hack into Solr, you can just do it in your Rails models. Here’s an example:
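# inside the model's searchable block
text :name do
  name.to_ascii # transliterates Cyrillic characters into plain ASCII ones; to_ascii is not core Ruby and has to come from a transliteration library
end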
And when searching you’d do the same:
Product.solr_search { |s| s.keywords(query.to_ascii) } # query is an assumed name for the user's search string
This way you’ll do everything with a single search.
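To see why a single search is enough, here is a toy, Solr-free sketch of the same idea: one normalization function runs on both the indexed names and the query. The normalize and search helpers, the sample product names, and the in-memory hash index are all made up for illustration; the character lists cover only the letters used in this example, and downcase on Cyrillic assumes Ruby 2.4 or later.

```ruby
# Stand-in for the analyzer: the same function is applied at index time
# and at query time. Only the letters of this example are mapped.
def normalize(text)
  text.downcase.tr("аикно", "aikno")
end

# Toy inverted index: normalized token => original product names.
names = ["Nokia 3310", "Нокиа 6300", "Samsung"]
index = Hash.new { |hash, key| hash[key] = [] }
names.each do |name|
  normalize(name).split.each { |token| index[token] << name }
end

# Normalize the query the same way, then look up each token once.
def search(index, query)
  normalize(query).split.flat_map { |token| index.fetch(token, []) }.uniq
end

p search(index, "нокиа") # both the Latin and the Cyrillic product
p search(index, "nokia") # same result from the Latin keyword
```

Because both spellings collapse to the same token before they reach the index, there is nothing to merge afterwards.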
Thanks Maurício! I’ve used Solr on a Rails 2 project with great success. I’ve been toying with the idea of trying out other search solutions with Rails 3, but your tutorial makes it enticing to stick with the devil I know.
There’s another article in the oven already, about ElasticSearch. It should probably be published next week; it’s a great solution for full-text search with Rails and big-time competition for Solr. Stay tuned to read it too :)