Installation:
brew install elasticsearch # or whatever your package manager is called.
Indexing:
django cron reindex_addons # Index all the add-ons.
The reindex job uses Celery to parallelize indexing. Running the job again replaces each existing index item with a fresh document.
The index is maintained incrementally through post_save and post_delete hooks.
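The behaviour described above (a full reindex that overwrites existing items, plus save and delete hooks that keep the index current) can be illustrated with a small in-memory sketch. This is not the project's code: `ToyIndex`, `reindex_all`, `on_save` and `on_delete` are hypothetical names, a plain dict stands in for Elasticsearch, and neither Celery nor Django signals are involved.

```python
class ToyIndex:
    """In-memory stand-in for a search index; documents are keyed by id."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, document):
        # Indexing an existing id overwrites the old document.
        self.docs[doc_id] = document

    def delete(self, doc_id):
        # Deleting a missing id is a no-op.
        self.docs.pop(doc_id, None)


def reindex_all(index, addons):
    """Full reindex: write one document per add-on."""
    for addon in addons:
        index.index(addon["id"], {"name": addon["name"]})


def on_save(index, addon):
    """Hypothetical stand-in for a post_save hook."""
    index.index(addon["id"], {"name": addon["name"]})


def on_delete(index, addon):
    """Hypothetical stand-in for a post_delete hook."""
    index.delete(addon["id"])
```

Because documents are keyed by id, calling `reindex_all` twice leaves one document per add-on, and the hooks only touch the single document that changed.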
Setting up other indexes:
django cron reindex_collections # Index all the collections.
django cron reindex_users # Index all the users.
django cron compatibility_report # Set up the compatibility index.
django index_stats # Index all the update and download counts.
We use a custom analyzer for indexing add-on names since they’re a little different from normal text. To get the same results as our servers, put this in your elasticsearch.yml (a copy is available at scripts/elasticsearch/elasticsearch.yml):
cluster:
  name: wooyeah

path:
  logs: /usr/local/var/log
  data: /usr/local/var/data

index:
  analysis:
    analyzer:
      standardPlusWordDelimiter:
        type: custom
        tokenizer: standard
        filter: [standard, wordDelim, lowercase, stop, dict]
    filter:
      wordDelim:
        type: word_delimiter
        preserve_original: true
      dict:
        type: dictionary_decompounder
        word_list: [cool, iris, fire, bug, flag, fox, grease, monkey, flash, block, forecast, screen, grab, cookie, auto, fill, text, all, so, think, mega, upload, download, video, map, spring, fix, input, clip, fly, lang, up, down, persona, css, html, all, http, ball, firefox, bookmark, chat, zilla, edit, menu, menus, status, bar, with, easy, sync, search, google, time, window, js, super, scroll, title, close, undo, user, inspect, inspector, browser, context, dictionary, mail, button, url, password, secure, image, new, tab, delete, click, name, smart, down, manager, open, query, net, link, blog, this, color, select, key, keys, foxy, translate, word]
If you don’t do this, your search results will be slightly different, but you probably won’t notice.
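To give a feel for why the dict filter matters for add-on names, here is a rough pure-Python approximation of what a dictionary decompounder does: it keeps the original token and also emits every word-list entry found inside it. The function name `decompound` and the short `WORDS` list are illustrative only, and the size limits are assumptions modelled on what Elasticsearch documents as the filter's defaults; the real filter runs inside Elasticsearch and may differ in detail.

```python
def decompound(token, word_list, min_word_size=5,
               min_subword_size=2, max_subword_size=15):
    """Return the token followed by every dictionary word found inside it."""
    # In the analyzer above, the lowercase filter runs before dict;
    # lowercasing here keeps the sketch self-contained.
    token = token.lower()
    words = set(word_list)
    out = [token]
    # Short tokens are passed through without decompounding.
    if len(token) < min_word_size:
        return out
    for start in range(len(token)):
        for size in range(min_subword_size, max_subword_size + 1):
            end = start + size
            if end > len(token):
                break
            part = token[start:end]
            if part in words:
                out.append(part)
    return out


# A small subset of the word_list from the config, for illustration.
WORDS = ["fire", "bug", "fox", "grease", "monkey"]

print(decompound("Firebug", WORDS))
print(decompound("Greasemonkey", WORDS))
```

With this subset, "Firebug" yields `firebug`, `fire` and `bug`, so a search for "fire bug" can still match the add-on even though its name is a single word.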