Most openSUSE users are aware that a new version of the English wiki was released back in July, with the other wikis soon to follow. Among many other changes, the new wiki came with a laundry list of new features. However, users have noticed that one important feature is still missing from the new wiki: a decent search engine.
I mentioned in a previous article that I was working on a replacement for the default search engine. Finally, after overcoming some technical hurdles and getting some servers upgraded, the new search engine is ready to go live. If all goes well, this should happen within the next day. An announcement will be posted on openSUSE News when the new search goes live, but I also wanted to write about some technical details of the Lucene search engine for those who might be interested.
- Performance – Testing shows that the new search engine can process about 20 queries a second on the staging server, with the average search running in about 0.2 seconds. End users will not see searches complete this quickly, because the wiki takes some time to process the results and display the page. Even so, this search can handle much more load, and respond much more quickly, than the default search, which relies on MySQL's built-in search capabilities.
- Suggestions and Fuzzy Searching – Rather than relying on an external dictionary (e.g. aspell), the search engine uses internal algorithms to build and use a suggestion index based on the wiki content. There are a number of advantages to this approach:
  - Suggestions are relevant to the content of the wiki. For example, my last name (Ehle) would be flagged as an error by any standard dictionary. However, a search for my last name with this engine will not generate a suggestion because that word exists in the wiki. Also, a search for “Ehl” will generate a suggestion for “Ehle” because that is the indexed word most similar to the search term. As a bonus, the spelling index is built along with the main index, so new words are automatically added.
  - Suggestions can be performed on phrases as well as words. For example, a search for “novel linux” would produce a suggestion for “novell linux” even though “novel” and “linux” are both valid terms on their own.
  - Suggestions work for any language. It doesn’t matter if a good dictionary is available for the language, or if the wiki uses multiple languages (languages.opensuse.org).
  - This approach allows for fuzzy searches, which are searches that automatically include results for similar terms. The same index that is used for suggestions is leveraged for fuzzy searches, which would not be possible with a spelling dictionary.
- Related Searches – Two articles are considered to be related if they are both referenced in at least one other article. Thus, if Nvidia and ATI are both referenced in an article about video drivers, those two articles are considered to be related to each other, and each will appear in the other’s related search results. This feature will become more useful as articles are added to the wiki.
- Stemming and Synonyms – Stemming lets the search engine use the stems of search terms (“run” in place of “running”) and is available for the more common languages (English, German, Spanish, etc.). Synonym searching lets the search engine use synonyms of the search term (“operating system” in place of “OS”) and is only available for English. Synonym searching is not yet enabled, but it likely will be after some additional testing.
- Indexing – For practical purposes, only full indexing will be performed. For now, the indexes will be built once a day, but this will probably be adjusted as time goes on. A full index takes a little over a minute to build, but this will increase to between 5 and 10 minutes as the old wikis are migrated to the new wiki system.
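For readers who like to see ideas in code, here is a rough Python sketch of two items from the list above: suggestions drawn from the wiki's own vocabulary, and related articles found through shared references. It is only an illustration, not the code the Lucene search engine actually runs. The function names and sample data are made up for this example, and the standard library's `difflib` stands in for the engine's internal similarity algorithm. Phrase suggestions are not covered.

```python
import difflib
import re
from collections import defaultdict
from itertools import combinations


def build_vocabulary(pages):
    """Collect every lowercase word found in the wiki text (hypothetical helper)."""
    vocab = set()
    for text in pages.values():
        vocab.update(re.findall(r"[a-z]+", text.lower()))
    return vocab


def suggest(term, vocab):
    """Return None if the term already exists in the wiki, else the closest
    indexed word, or None when nothing is similar enough."""
    term = term.lower()
    if term in vocab:
        return None  # the word exists in the wiki, so no suggestion is made
    # difflib is a stand-in here; the real engine uses its own algorithms.
    matches = difflib.get_close_matches(term, sorted(vocab), n=1, cutoff=0.6)
    return matches[0] if matches else None


def related_articles(links):
    """links maps an article title to the set of titles it references.
    Two titles are related when some article references both of them."""
    related = defaultdict(set)
    for targets in links.values():
        for a, b in combinations(sorted(targets), 2):
            related[a].add(b)
            related[b].add(a)
    return related


# Made-up sample content, for illustration only.
pages = {
    "Team": "Ehle maintains the wiki search",
    "Video drivers": "Nvidia and ATI video drivers",
}
vocab = build_vocabulary(pages)

links = {
    "Video drivers": {"Nvidia", "ATI"},
    "Hardware": {"Nvidia"},
}
related = related_articles(links)
```

With this sample data, `suggest("Ehle", vocab)` returns `None` because the word is already in the index, `suggest("Ehl", vocab)` returns `"ehle"` (the sketch lowercases everything), and `related["Nvidia"]` contains `"ATI"`. Because the vocabulary is rebuilt from the page text, new words become valid the next time the index is built, which mirrors the point about the spelling index being built along with the main index.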
That is about all I can think of for now. Be sure to watch for the announcement, which will contain information for end users.