
Support for Elasticsearch beside Database and Sphinx


daFish


Hi there.

Search is one of the most delicate parts of a forum, and members and guests need a good search to find valuable information. While database search is fine for small communities, it is soon outgrown as the community grows, or when fulltext search simply doesn't work the way it should.

Since I'm dealing with Elasticsearch myself for a project, it would be great if IPS could provide an add-on or built-in support for it. According to their announcement, XenForo is going to provide an add-on that adds Elasticsearch support.


Both of you are free to choose not to use it if it were an add-on.
Elasticsearch would provide nice features like faceted search (as on Amazon, where you can filter the set of results), schema-less documents, a JSON REST API, search suggestions and so on.
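To make "faceted search" concrete: with Elasticsearch, a single request can return both the matching posts and per-category counts through an aggregation. A minimal sketch, assuming a recent Elasticsearch query DSL and hypothetical field names `content` and `forum_id`; it only builds the request body and does not talk to a server:

```python
import json


def build_faceted_query(text, forum_id=None):
    """Return an Elasticsearch request body: a full-text match plus a
    terms aggregation that counts hits per forum (the "facets").
    Field names `content` and `forum_id` are hypothetical."""
    query = {"match": {"content": text}}
    if forum_id is not None:
        # Once the user clicks a facet, narrow the results with a filter.
        query = {
            "bool": {
                "must": [query],
                "filter": [{"term": {"forum_id": forum_id}}],
            }
        }
    return {
        "query": query,
        "aggs": {
            # Each bucket in the response is one forum with its hit count.
            "by_forum": {"terms": {"field": "forum_id", "size": 10}},
        },
    }


print(json.dumps(build_faceted_query("disc brakes"), indent=2))
```

The `by_forum` buckets in the response are what a filter sidebar would list; passing `forum_id` on the next request narrows the result set to the clicked facet.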

And the fact that XenForo is offering such a solution shouldn't mean IPS should not.


When CycleChat was on IP.Board 3.1.4 I used Sphinx for search and it did a good job of speeding up search results, but as someone who uses ES with XF I would second your suggestion for an add-on for IPS.

It's not a solution that most small sites would need (or even want), so an add-on would be the best way to provide it. The instant indexing of new content, the speed and the small resource footprint of ES (combined with the reduced MySQL overhead, because MySQL is no longer used for fulltext searching, something it's not good at once you pass 1 million records) would be great for larger sites, and the ability to use stemming in searches makes a big difference to relevance.
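On stemming: Elasticsearch ships built-in language analyzers, and mapping a text field to the `english` analyzer is one way to get stemmed matching. A minimal sketch of an index definition, assuming Elasticsearch 7.x or later (typeless mappings) and hypothetical field names; the body would be sent in a `PUT` request to create an index (the index name is up to you):

```python
import json

# Hypothetical index layout for forum posts. The built-in "english"
# analyzer lowercases tokens, drops English stop words and stems words,
# so different forms of a word (e.g. "post" and "posts") can match.
POSTS_INDEX = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "content": {"type": "text", "analyzer": "english"},
            # Non-text fields are stored as exact values, not analyzed.
            "forum_id": {"type": "integer"},
            "posted_at": {"type": "date"},
        }
    }
}

print(json.dumps(POSTS_INDEX, indent=2))
```

Because analysis happens at index time and at query time with the same analyzer, relevance improves without any changes to the search form itself.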

I'm not plugging my XF board, but just offering it as a "live" example of a site with 1.7+ million posts using ES for search (on a 4 year old Debian server with 8GB RAM): www.cyclechat.net

Judge for yourself whether it is fast and relevant.

Cheers,
Shaun :D


Until this post, I had never even heard of Elasticsearch. Setting that aside, however, I have heard of Lucene (which Elasticsearch and Solr simply reside on top of) and considered it in the past. The main selling point, to me, of Sphinx over Lucene is that Lucene is Java-based, which I'm not a fan of. It means running a Java engine on the server (which may or may not already be there, depending on the server, I'd imagine), and Java applications generally seem to run slower in my personal experience. Sphinx is compiled on the server and does not have this requirement.

I think Lucene is probably an excellent choice in many scenarios, however I don't necessarily think it's better for IP.Board than Sphinx. There are many comparisons online (on stackoverflow alone I've read 4 or 5 recently) that you can check out.

http://stackoverflow.com/questions/1284083/choosing-a-stand-alone-full-text-search-server-sphinx-or-solr
http://stackoverflow.com/questions/737275/comparison-of-full-text-search-engine-lucene-sphinx-postgresql-mysql

Just to show a few. Most of the comparisons I've read basically come down to this: for a web-based application where a database is involved, Sphinx is the superior choice. It reindexes much faster, and has built in database integration because it is designed to pull results from the database directly. Lucene is completely stand alone which has its advantages, but we don't need those advantages in our case.
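The difference between Sphinx pulling rows straight from MySQL and a standalone engine that has to be fed documents can be made concrete. A minimal sketch of the "push" side, assuming Elasticsearch's `_bulk` endpoint format (newline-delimited JSON: an action line, then the document, with a trailing newline); the row fields `pid`, `post`, `forum_id` and the index name `posts` are hypothetical:

```python
import json


def to_bulk_payload(rows, index="posts"):
    """Turn database rows into a newline-delimited JSON body for the
    Elasticsearch _bulk API: one action line followed by the document."""
    lines = []
    for row in rows:
        # Action line: which index the document goes into, and its ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["pid"]}}))
        # Source line: the fields that will actually be searchable.
        lines.append(json.dumps({"content": row["post"], "forum_id": row["forum_id"]}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


rows = [
    {"pid": 1, "post": "Sphinx reindexes quickly", "forum_id": 2},
    {"pid": 2, "post": "Lucene runs on the JVM", "forum_id": 2},
]
print(to_bulk_payload(rows), end="")
```

This payload still has to be sent to the search server over HTTP, which is exactly the extra hop that Sphinx avoids by reading the rows from the database itself.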


Many of the features cited as benefits to using Elasticsearch (or Lucene) have no bearing in our situation.

  • Faceted search - you can replicate this at the application level, and I suspect in many instances you'd almost have to. There are Sphinx solutions to handle this, but we've not seen a huge need or demand.
  • Schema-less documents - great techy term, but what does this mean in real-world usage? Sphinx works just fine while maintaining some sort of schema on the backend.
  • JSON-REST API - we do not submit items to the index via REST, so it's moot. I'd argue that letting Sphinx pull them directly from MySQL is probably significantly more efficient than forcing them into the index over an HTTP connection anyway. This is not a plus over Sphinx IMO, but rather a downside, when the database is already there and available to pull the items from.
  • Suggest search - again, this can be implemented at the software level. The main problem for us is non-Sphinx installations. Allowing type-ahead searches would bring regular fulltext search instances down pretty quickly on a moderately sized site, so we'd have to restrict such a feature to Sphinx only.
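As for replicating faceted search at the application level: once any backend (MySQL fulltext or Sphinx) has returned its hits, the facet counts and the filtering can be done in application code. A minimal sketch, assuming each hit carries a hypothetical `forum` field; the function names are illustrative, not part of IP.Board:

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results fall under each value of `field`; these
    counts are what a faceted-search sidebar would display."""
    return Counter(r[field] for r in results)


def apply_facet(results, field, value):
    """Keep only the results matching the facet value the user clicked."""
    return [r for r in results if r[field] == value]


# Hypothetical hits returned by whichever search backend is active.
hits = [
    {"id": 10, "forum": "Bikes"},
    {"id": 11, "forum": "Commuting"},
    {"id": 12, "forum": "Bikes"},
]
print(facet_counts(hits, "forum").most_common())  # [('Bikes', 2), ('Commuting', 1)]
print(apply_facet(hits, "forum", "Commuting"))
```

The catch is that the counts only cover the hits actually fetched, so this works best when the backend returns the full set of matching IDs rather than a single page of results.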


Once you get past the techie stuff, both search engines accomplish similar goals and both are very well backed, maintained and developed regularly, etc. I feel Sphinx is a better solution for our needs, personally.


Thanks for your extensive answer, Brandon. Even if IPS doesn't consider adding ES in the near future, can I assume that the current product would allow me to develop an add-on that offers ES integration?




Yes, you could certainly develop your own search engine on top of the current infrastructure. "sql" and "sphinx" are drop-ins to the search engine core code; you could develop your own and override the setting that selects which code to load, so that yours is loaded instead.
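The drop-in pattern described here can be sketched in a few lines. This is a Python illustration of the general idea only, not IP.Board's actual (PHP) code: the `SearchEngine` interface, the `ENGINES` registry, `load_engine` and the `"custom"` setting value are all hypothetical, and the add-on engine uses an in-memory inverted index as a stand-in for a real Elasticsearch client:

```python
from typing import Callable, Dict, List, Protocol, Set


class SearchEngine(Protocol):
    """What every drop-in must provide (hypothetical interface)."""

    def search(self, query: str) -> List[int]: ...


class SqlEngine:
    """Stand-in for a built-in "sql" drop-in: a naive substring scan."""

    def __init__(self, posts: Dict[int, str]) -> None:
        self.posts = posts

    def search(self, query: str) -> List[int]:
        q = query.lower()
        return [pid for pid, text in self.posts.items() if q in text.lower()]


class TokenIndexEngine:
    """Stand-in for an add-on engine: builds an inverted index of
    lowercase words up front, then intersects the posting sets."""

    def __init__(self, posts: Dict[int, str]) -> None:
        self.index: Dict[str, Set[int]] = {}
        for pid, text in posts.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(pid)

    def search(self, query: str) -> List[int]:
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)


# The setting value selects which drop-in gets loaded; an add-on only
# needs to register its class and switch the setting to its own key.
ENGINES: Dict[str, Callable[[Dict[int, str]], SearchEngine]] = {
    "sql": SqlEngine,
    "custom": TokenIndexEngine,
}


def load_engine(setting: str, posts: Dict[int, str]) -> SearchEngine:
    return ENGINES[setting](posts)


posts = {1: "Sphinx pulls rows from MySQL", 2: "Elasticsearch is built on Lucene"}
print(load_engine("custom", posts).search("lucene"))  # [2]
```

Because the calling code only ever uses `search()`, swapping engines is a matter of changing one setting, which is what makes an external engine feasible as an add-on.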

Archived

This topic is now archived and is closed to further replies.
