Why metrics?

Since I joined 7digital I've seen the API grow from a brand new feature, living side by side with the (then abundant) websites, into the main focus of the company. The traffic grew and grew, and it keeps on growing at an accelerating pace, which brings us new challenges. We've brought the agile perspective into play, which has helped us adapt faster and make fewer errors, but:

  • We can do unit tests but they don't bring out the behaviour.
  • We can do integration tests but they won't show the whole flow.
  • We can do smoke tests but they won't show us realistic usage.
  • We can do load tests but they won't have realistic weighting.

Even when we write acceptance criteria we are really being driven by assumptions. Even an experienced developer is just sampling from their previous work, and as we move to a larger number of servers and applications it is not humanly possible to take all the variables into consideration. It is common to hear statements like 'keep an eye on the error log/server log/payments log when releasing this new feature', but when something breaks the questions become 'what was released/when was it released/is it a specific server?'. As the data grows it becomes harder to sample it and draw conclusions quickly enough to feed back without causing issues, especially as agile tends to introduce intermediate solutions whose behaviour may differ from that of the final solution and has not been studied. The truth is that nothing replaces real-life data and statistics, developers' opinions included, and if the issue is a black swan then we need to churn out usable information fast!



Taken from @gregyoung

This has been seen before by other companies; for example, Flickr in their Counting and Timing blog post. See also Building Scalable Websites by Flickr's Cal Henderson. This advice has been followed by other companies, such as Etsy in their Measure Anything, Measure Everything blog post and Shopify in their StatsD blog post.

How to do it?

Deciding to back a winning horse, I picked up the tools used by these companies:

  • StatsD is described as “a network daemon for aggregating statistics (counters and timers), rolling them up, then sending them to graphite”.
  • Graphite is described as “a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested i[...]. The data can then be visualized through graphite's web interfaces.”

How to set these up is covered in several tutorials. I used StatsD's own example C# client to poll our API request log for API users, endpoints used, caching and errors. In the future it would be ideal for the application to talk to StatsD itself instead of running a polling daemon.

Graphite has a lot of usable features. The ones I've used so far include Moving Average, which smooths out spikes in the graphs and makes it easier to see behaviour trends over a short time range, and Sort by Maxima. There are even tools to forecast future behaviour and growth using Holt-Winters forecasting, which companies use to understand future scalability and performance requirements based on data from previous weeks, months or years (seen in this Etsy presentation on Metrics).

How it looks and some findings

Right away I got some usable results. An API client had a bug in their implementation which meant they requested a specific endpoint more often than they would use it. This kind of data can help with debugging and also prevent abuse.
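To make the polling idea a little more concrete, here is a minimal sketch in Python (not the C# client mentioned above) of how one request-log entry could be turned into StatsD counters and timers. The bucket naming scheme, the function names and the fields of the log entry are all assumptions made for illustration; only the plain-text `bucket:value|type` line format and the conventional UDP port 8125 come from StatsD itself.

```python
import socket


def format_counter(bucket, value=1, sample_rate=1.0):
    """Build a StatsD counter line, e.g. 'api.requests:1|c'."""
    line = f"{bucket}:{value}|c"
    if sample_rate < 1.0:
        # StatsD scales sampled counters back up using this rate.
        line += f"|@{sample_rate}"
    return line


def format_timer(bucket, millis):
    """Build a StatsD timer line, e.g. 'api.response_time:320|ms'."""
    return f"{bucket}:{millis}|ms"


def metrics_for_request(api_user, endpoint, status, cache_hit, millis):
    """Turn one (hypothetical) request-log entry into StatsD lines.

    The 'api.<user>.<endpoint>.<metric>' layout is an assumption; dots
    separate levels in Graphite, so names should not contain dots.
    """
    prefix = f"api.{api_user}.{endpoint}"
    lines = [
        format_counter(f"{prefix}.requests"),
        format_timer(f"{prefix}.response_time", millis),
    ]
    if not cache_hit:
        lines.append(format_counter(f"{prefix}.cache_miss"))
    if status >= 500:
        lines.append(format_counter(f"{prefix}.errors"))
    return lines


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric line to a StatsD daemon over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

A polling daemon would then loop over new log entries and call `send_metric` for each line returned by `metrics_for_request`. Because UDP is fire-and-forget, a missing or slow StatsD daemon does not hold up whatever is sending the metrics, which is part of what makes this approach cheap to try.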

Sampled and smoothed usage per endpoint per API user...

Another useful graph is error rates, which might be linked with abuse, deploying new features or other causes.

Error chart, smoothed, with a few spikes, but even those are at a rate of around 0.001%

Here is some useful caching information per endpoint to know how to tune up TTLs or look for stampede behaviour.

Sampled and smoothed Cache Miss per Endpoint
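The "smoothed" series in the graphs above come from Graphite's Moving Average function. As a rough illustration of the idea only (this is a plain trailing average written for this post's example, not Graphite's own implementation, and the function name is an assumption), each point is replaced by the mean of the last few points:

```python
from collections import deque


def moving_average(points, window):
    """Trailing moving average over the last `window` points.

    Early points are averaged over however many values are available,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for value in points:
        if len(recent) == window:
            # The oldest value is about to fall out of the window.
            total -= recent[0]
        recent.append(value)
        total += value
        smoothed.append(total / len(recent))
    return smoothed
```

With a window of three, a single spike such as `[0, 0, 9, 0, 0, 0]` becomes `[0.0, 0.0, 3.0, 3.0, 3.0, 0.0]`: the spike is lower but wider, which is why trends are easier to read while exact peak values are lost.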

Opinion

After you start using live data to provide feedback for your work there is no going back. It is my opinion that analysis of short and long term live results of any type of work should be mandatory as we move out of an environment that is small enough to be maintained exclusively by a team's knowledge.

Tag: 
Agile Development
sharri.morris@7digital.com
Wednesday, February 20, 2013 - 12:53

Somewhere in the 7digital.com web site infrastructure there are classes that override the default controller and view factories (it is an ASP.NET MVC project). Why did we do this? In our opinion, the default project layout is a hindrance to code readability.

The idea is explained by Uncle Bob in his concept of “screaming architecture”: if you glance at the program's folder structure, what is the most blatant thing about it? What is it “screaming about”?

If there's a folder full of controllers, and a folder full of views, and another for models, then it's screaming “I am an ASP.NET MVC project! I do ASP.NET MVC things!”. If there's a folder called “Artists” and another called “Genres”, each containing controllers, views and other classes related to that feature, it's instead saying “I am a music catalogue on the web”.

I personally feel that “screaming architecture” is a very poor name for a very good concept. The architecture isn't having a crisis. It's not running around with its hair on fire shouting “aaargh!!!”. Maybe Uncle Bob has more positive associations with the word “screaming”? With his meaning of “screaming”, every architecture is screaming about something; what matters is what it is screaming about.

sharri.morris@7digital.com
Friday, January 4, 2013 - 10:11

 

We’re primarily driven by meeting 7digital’s goals and objectives

  • Everything we do should be driven by clear business goals and objectives. Where they are lacking we should go and find them.
  • We expect business needs to be provided as problems that need solving with clear expectations and measurables without prejudice towards the implementation.

Release Early and Often; Fail Early and LOUDLY!

  • It’s essential we can respond quickly to changing business requirements. The best measure of our effectiveness in doing so is via frequent predictable releases through a steady rhythm of working. Things need to be easy to change (maintainable) and delivered at a sustainable pace.
  • It’s far preferable to get something into production as soon as possible and develop iteratively based on feedback than to get bogged down in speculative analysis or a fear of not making all the right decisions up front (be that regarding technology choices or requirements).
  • Failures are expected, and welcome. When projects fail, we learn about other routes that might work. When software fails, it tells us about invalid assumptions we’ve made. The earlier and louder the failure, the more valuable that information is.

The best solutions come from everyone working together

sharri.morris@7digital.com
Wednesday, October 17, 2012 - 09:51

Overview

ServiceStack is a comprehensive web framework for .NET that allows you to set up a REST web service quickly and with very little effort. We already use OpenRasta to achieve this same goal within our stack, so I thought it would be interesting to compare the two and see how quickly I could get something up and running. The thing that most interested me initially about ServiceStack was the fact that it claims out-of-the-box support for Memcached, something we already use extensively to cache DTOs, and Redis, the ubiquitous NoSQL key-value store.

Getting cracking

I set myself the task of creating a basic endpoint for accessing 7digital artist, release and track details, whilst taking advantage of ServiceStack’s ability to create a listener from a console window so I didn’t have to waste time attempting to set it up via IIS:

sharri.morris@7digital.com
Tuesday, September 25, 2012 - 16:40

Over the last month we've started using ServiceStack for a couple of our API endpoints (go to the full ServiceStack story here). We're hosting these projects on a Debian Squeeze VM using nginx and Mono. We ran into various problems along the way, which we'll explain, but we also managed to achieve some interesting things; here's a summary. Hopefully you'll find this useful.

Nginx

We're using nginx and FastCGI to host the application. This is good from a systems perspective because our applications can run without root privileges. For the communication between mono-fastcgi and nginx, we are using a Unix socket file instead of proxying through a local port. This makes configuration much easier, as you map applications to files rather than port numbers, so the naming conventions are much more straightforward. (Besides, you may be hit by a memory leak if you don't use Unix socket files.) Furthermore, using files instead of ports has made our life easier for automated deployments because: