- Updated the schema config so that the field definition for doc_id (the uniqueKey) includes the multiValued="false" attribute
- Changed the solr_url in settings.py to conform to the multicore standard (e.g. solr_url = 'http://localhost:8983/solr/collection1/' for the default example install of Solr)
- Similarly, when copying the conf files over, make sure you put them into the proper core's folder (i.e. solr/collection1/conf, NOT solr/conf)
- Replaced the included sunburnt with the latest sunburnt (I just renamed the included sunburnt folder to sunburntOLD and copied the new sunburnt folder in). I did this for good measure: so much of Solr had changed that I wanted the sunburnt I was using to interface with it to be as current as possible
- In Clade's lib/taxonomy.py, the sunburnt query in get_docs_for_category is executed with the fields limited to just score=true. This causes a KeyError because doc_id and title are not returned. There is a note in the code to “FIX” it. It may have worked against the old versions of Solr and sunburnt, perhaps because that limiting was not applied properly in those versions, but now you need to change the line to: results = query.field_limit(["doc_id", "title"], score=True).paginate(rows=10).execute() (see: the Sunburnt Docs). That brings back all three expected fields needed to create the tuple that is returned.
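To see why the field limit matters, here is a minimal sketch in which plain dicts stand in for the documents sunburnt returns. The helper name build_doc_tuples and the (doc_id, title, score) ordering are my assumptions for illustration, not Clade's actual code.

```python
# Plain dicts stand in for the result documents sunburnt hands back.
# build_doc_tuples is a hypothetical helper, not Clade's actual code,
# and the (doc_id, title, score) ordering is assumed.

def build_doc_tuples(docs):
    """Turn result documents into (doc_id, title, score) tuples."""
    return [(doc["doc_id"], doc["title"], doc["score"]) for doc in docs]

# Roughly what comes back when only the score is requested.
score_only = [{"score": 1.7}]

# Roughly what comes back once doc_id and title are in the field list.
all_fields = [{"doc_id": "42", "title": "Example", "score": 1.7}]

try:
    build_doc_tuples(score_only)
    failed = False
except KeyError:
    failed = True  # doc_id is missing, so the dict lookup raises

tuples = build_doc_tuples(all_fields)
```

With only the score present the lookup fails on the first missing key, which matches the KeyError described above; with all three fields present the tuples build cleanly.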
Doing the above got Clade working with Solr 4.4 on my Mac the same way I had it working with Solr 3.6 on my Linux machine.
One other thing I have noticed with Clade: re-running the classify script on the same data appears to create dupes in the index. This is not the expected indexing behavior, so I will have to look a little closer at what Clade is doing in that script. Fortunately, just blowing away the Solr data directory clears it out - but that feels a bit heavy-handed.
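A possibly gentler way to clear the index is a delete-by-query sent to the core's update handler. The sketch below only builds the request, using Python 3's standard library; the function name build_delete_all_request is hypothetical, and the URL assumes the default collection1 core mentioned earlier. I have not run this against Clade's index.

```python
import urllib.request


def build_delete_all_request(solr_url):
    """Build (but do not send) a POST that deletes every document in a core."""
    # Solr's XML update format: delete everything matching the query *:*
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update?commit=true",  # commit right away
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Assumed core URL, matching the default example install.
request = build_delete_all_request("http://localhost:8983/solr/collection1/")

# Actually sending it would be: urllib.request.urlopen(request)
```

This empties the core but leaves the data directory and config in place. It does not explain why the dupes appear in the first place.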
I hope this helps someone. Now to start wiring up my Solr instance via Spring Data...