Comparing Solr Response Sizes
After seeing some relative success with our Solr implementation's XML response times by switching on Tomcat's HTTP gzip compression, I've been doing some comparisons between the other formats Solr can return. We use SolrNet, an excellent open source .NET Solr client. At the moment it only supports XML responses, but every request sends the "Accept-Encoding: gzip" header as standard, so all you have to do is enable compression on your server and you get some nicely compressed responses. There is talk of supporting javabin de-serialisation, but it's not there yet. I've decided to compare the following using curl, with 1,000 rows and 10,000 rows: JSON, javabin, gzip-compressed JSON and gzip-compressed javabin.
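The comparisons boil down to requests like these (a sketch assuming Solr is reachable at localhost:8983 with the default /solr/select handler; wc -c just counts the bytes in the response body):

    # 10,000 rows as JSON, uncompressed
    curl -s 'http://localhost:8983/solr/select?q=*:*&rows=10000&wt=json' | wc -c

    # the same request, asking for a gzipped body
    curl -s -H 'Accept-Encoding: gzip' \
        'http://localhost:8983/solr/select?q=*:*&rows=10000&wt=json' | wc -c

    # and the javabin equivalents
    curl -s 'http://localhost:8983/solr/select?q=*:*&rows=10000&wt=javabin' | wc -c
    curl -s -H 'Accept-Encoding: gzip' \
        'http://localhost:8983/solr/select?q=*:*&rows=10000&wt=javabin' | wc -c

curl doesn't decompress unless you ask it to, so the gzipped variants measure the size actually sent over the wire.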
My test setup is a Solr 1.4 instance with around 11,000 records in it, sitting behind an nginx reverse proxy that handles the gzip compression. As I said, the same could easily be achieved by switching on gzip compression in Apache Tomcat. For the same 10,000 records, returned using the query q=*:*, wt=json is the smallest when HTTP gzip compressed, but only marginally smaller than wt=javabin; it would seem that JSON compresses very well indeed. You can also see the massive drop that simply switching on gzip compression gives to XML.
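For anyone reproducing the proxy side, the nginx piece is only a few lines. This is a minimal sketch, with illustrative ports and MIME types; note that javabin usually comes back as application/octet-stream, so it has to be listed explicitly if you want it compressed too:

    server {
        listen 80;

        location /solr/ {
            proxy_pass http://127.0.0.1:8983;

            gzip on;
            # Solr's XML and JSON writers use text/application types;
            # javabin is typically application/octet-stream
            gzip_types text/xml application/xml application/json
                       text/plain application/octet-stream;
            gzip_min_length 1024;  # don't bother compressing tiny responses
        }
    }

The Tomcat route is the compression="on" attribute on the HTTP Connector in server.xml.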
My conclusion would be that, because JSON is a widely accepted content type with many well-known and fast de-serialising libraries, it would probably be worth implementing JSON support rather than trying to de-serialise javabin. But this was only a quick test, and it doesn't take into account how quickly Solr handles serialisation of the documents server-side.
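To illustrate why the JSON route looks approachable from .NET, here is a minimal sketch (not SolrNet's API; the URL and the id field are illustrative) that fetches a gzip-compressed JSON response and walks the documents using Json.NET:

    using System;
    using System.IO;
    using System.Net;
    using Newtonsoft.Json.Linq;

    class SolrJsonSketch
    {
        static void Main()
        {
            var request = (HttpWebRequest)WebRequest.Create(
                "http://localhost:8983/solr/select?q=*:*&rows=10&wt=json");
            // sends the Accept-Encoding: gzip header and transparently
            // inflates the response body
            request.AutomaticDecompression = DecompressionMethods.GZip;

            using (var response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                var json = JObject.Parse(reader.ReadToEnd());
                // Solr's JSON writer puts the matching documents
                // under response.docs
                foreach (var doc in json["response"]["docs"])
                    Console.WriteLine(doc["id"]);
            }
        }
    }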