After seeing some success in improving our Solr implementation's XML response times by switching on Tomcat's HTTP gzip compression, I've been comparing the other formats Solr can return. We use SolrNet, an excellent open source .NET Solr client. At the moment it only supports XML responses, but every request sends the "Accept-Encoding: gzip" header as standard, so all you have to do is switch compression on at your server and you get nicely compressed responses. There is talk of supporting javabin de-serialisation, but it's not there yet. I decided to use curl to compare responses of 1000 rows and 10000 rows in four forms: json, javabin, gzip-compressed json and gzip-compressed javabin.

My test setup is a Solr 1.4 instance holding around 11000 records, sitting behind an nginx reverse proxy that handles the gzip compression. As I said, the same could easily be achieved by switching on gzip compression in Apache Tomcat. For the same 10000 records, returned using the q=*:* query, wt=json with HTTP gzip compression gives the smallest response, but only marginally smaller than wt=javabin. It would seem that json compresses very well indeed. You can also see the massive drop that just switching on gzip compression gives to xml.
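To get a rough feel for why a json response shrinks so much under gzip, the effect can be reproduced in memory without a Solr server. The sketch below is not the curl test described above: the class name GzipSizeDemo, the field names and the synthetic documents are all made up for illustration, and real figures will depend entirely on your own data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSizeDemo
{
    // Gzip-compress a byte array in memory and return the compressed bytes.
    public static byte[] Gzip(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream first so that all compressed data is flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Build a synthetic json payload loosely shaped like a Solr response.
    // The field names are hypothetical, not taken from a real schema.
    public static string BuildSampleJson(int rows)
    {
        var sb = new StringBuilder("{\"response\":{\"docs\":[");
        for (int i = 0; i < rows; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append("{\"id\":\"").Append(i)
              .Append("\",\"title\":\"Track ").Append(i)
              .Append("\",\"artist\":\"Example Artist\"}");
        }
        return sb.Append("]}}").ToString();
    }
}
```

Calling GzipSizeDemo.Gzip on the UTF-8 bytes of GzipSizeDemo.BuildSampleJson(1000) and comparing the two lengths shows the kind of reduction you get when field names repeat in every document, which is typical of search results.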

My conclusion would be that, because json is a widely accepted content type with many well-known and fast de-serialising libraries, it would probably be worth implementing json support rather than trying to de-serialise javabin. But this was only a quick test and doesn't take into account how quickly Solr handles serialisation of the documents server-side.

Tag: Solr
sharri.morris@7digital.com
Saturday, July 7, 2012 - 13:03

We have recently been working on an incremental indexer for our Solr-based search implementation. The index was being updated only sporadically because of the time a complete re-index took: about 5 days to create the 13GB of XML, zip it, upload it to the server, unzip it and then re-index. We have created a Windows service which queries a denormalised data structure using NHibernate. We then use SolrNet to create our Solr documents and push them to the server in batches.
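The "push them to the server in batches" step can be sketched as below. This is only an illustration under stated assumptions, not the service's actual code: the names BatchingSketch, InBatches and PushAll are hypothetical, and the pushBatch delegate stands in for whatever sends one batch to Solr (with SolrNet that would presumably be an add of the batch followed by a commit).

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Split a sequence into consecutive batches of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }

    // Hand each batch to a caller-supplied delegate and return the number of batches sent.
    // pushBatch is a hypothetical stand-in for the call that sends documents to Solr.
    public static int PushAll<T>(IEnumerable<T> documents, int batchSize, Action<IList<T>> pushBatch)
    {
        int batches = 0;
        foreach (var batch in InBatches(documents, batchSize))
        {
            pushBatch(batch);
            batches++;
        }
        return batches;
    }
}
```

Because the source sequence is consumed lazily, only one batch of documents is held in memory at a time, which matters when the full data set runs to gigabytes.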

Solr Update Process

sharri.morris@7digital.com
Friday, March 2, 2012 - 11:47

Having read the O’Reilly book “REST in Practice”, I set myself the challenge of using OpenRasta to create a basic RESTful web service. For the first day I decided to concentrate on getting a basic CRUD app working, as outlined in chapter 4. This involved the ability to create, read, update and delete XML representations of Artists stored as physical files. The book describes it as a Level 2 application on Richardson’s maturity model, as it doesn’t make use of hypermedia yet. One reason why OpenRasta is such a good framework for implementing a RESTful service is that it deals with “resources” and their representations. As outlined in “REST in Practice”, a resource is defined as anything accessible via a URI, and OpenRasta deals with this perfectly as it was built to handle this model from the ground up.

The Basic Web Service

sharri.morris@7digital.com
Thursday, February 2, 2012 - 17:05

When bootstrapping a StructureMap registry, you are able to set the "life style" of a particular instance using StructureMap's fluent interface. For example, when using NHibernate, it is essential that you set up ISessionFactory to be a singleton and ISession to be scoped per HTTP request (achievable with StructureMap's HybridHttpOrThreadLocalScoped directive). Example:

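// One ISessionFactory for the whole application.
For<ISessionFactory>()
    .Singleton()
    .Use(SessionFactoryBuilder.BuildFor("MY.DSN.NAME", typeof(TokenMap).Assembly))
    .Named("MyInstanceName");

// One ISession per HTTP request (or per thread when there is no HTTP context).
For<ISession>()
    .HybridHttpOrThreadLocalScoped()
    .Use(context => context.GetInstance<ISessionFactory>("MyInstanceName").OpenSession())
    .Named("MyInstanceName");

It's nice and easy to check that a singleton was created with a unit test like so: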

[TestFixtureSetUp]
public void FixtureSetup()
{
    ObjectFactory.Initialize(ctx => ctx.AddRegistry(new NHibernateRegistry()));
}

[Test]
public void SessionBuilder_should_be_singleton()
{
    // Resolving twice must return the very same instance.
    var sessionBuilder1 = ObjectFactory.GetInstance<ISessionFactory>();
    var sessionBuilder2 = ObjectFactory.GetInstance<ISessionFactory>();
    Assert.That(sessionBuilder1, Is.SameAs(sessionBuilder2));
}

sharri.morris@7digital.com
Wednesday, February 1, 2012 - 15:42

Introduction

We have been using Solr for search for a while. Solr is fantastic, but the way we get our data into Solr is not so good. The DB is checked for new/updated/removed content, which is then written into a jobs table, and that table is checked to see if there are any pending jobs. There are numerous issues with using a DB table as a queue; some for MySQL are listed at:

http://www.engineyard.com/blog/2011/5-subtle-ways-youre-using-mysql-as-a...

To stop using our DB as a queue, I decided to test out setting up and using an AMQP-based message queue. AMQP is an open standard for passing messages via queues. The final goal would be to allow other teams to push high-priority updates or new content directly to the queue rather than having to go through the DB, which can add considerable latency to the system.

For this test RabbitMQ was used, as it has a .NET library, runs on virtually all OSs, and has good language support and good documentation. The documentation can be found at the RabbitMQ site: http://www.rabbitmq.com/

Getting Started

I strongly advise reading these before you start:
http://www.rabbitmq.com/install-windows.html
and