7digital software developer Mia Filisch attended the Velocity conference in Amsterdam on October 28th, and was kind enough to share her account of the core takeaways with us here. The recurring theme of security inspired some internal knowledge-sharing sessions she has already started planning, and the diversity of insights made for a productive and informative conference. Her notes follow.

 

Key takeaways from specific sessions:

Docker Tutorial (John Willis)

  • If you haven’t worked much with Docker yet, the slides for this tutorial might be useful to you: they are a general walk-through covering concepts, products, some hints at best practices, and practical exercises for consolidation: https://www.dropbox.com/s/ofxgoout0287ca8/Docker_Training%20-%20Base%20Copy.pdf

  • Be aware it’s pretty long (at Velocity the session took three hours, and that was with him skipping all the exercises), but it really does cover a lot.

 

Using Docker Safely (Adrian Mouat)

 

Tracking Vulnerabilities In Your Node.js Dependencies (Guy Podjarny & Assaf Hefetz)

 

Managing Secrets At Scale (Alex Schoof)

  • Hugely valuable talk, well worth reviewing. Slides are here.

  • Some key considerations:

    • Secrets are everywhere, whether we think of them or not

    • As an industry, we don’t currently tend to manage secrets very well (even when bearing in mind that security is always about trade-offs)

    • Secret management should be considered tier 0 / core infrastructure (should be highly available, have monitoring, alerting and access control)

  • In light of this, Schoof proposed the following core principles of modern secret management:

    1. The set of actors who can do something should be as small as possible
    2. Secrets need to expire: set up efficient, easy ways to do secret rotation - this shouldn't require a deploy (it also implies that secrets shouldn't be in version control)
    3. It should be easier to handle secrets in secure ways than insecure ways
    4. Security of a system is only as strong as its weakest access link
    5. Secrets must be highly available (as they will stop the basic functioning of apps if they aren't)
  • The talk went on to discuss the various aspects of building a secret management system - it was quite interesting, and well worth following along via the slides.

  • Existing services discussed and recommended in the talk were Vault, Keywhiz and CredStash. All of these solutions are still pretty new, so with any of them there’ll probably be quite a bit of tweaking required to get a management system in place that works well.
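As a toy illustration of principle 2 (and of principle 3: making the secure path the easy one), a secret can carry its own expiry so that stale values are refused at read time. This is a minimal Python sketch with entirely hypothetical names, not how any of the tools above actually work:

```python
# Sketch: every secret has a TTL, and the only way to read one is through
# a helper that refuses expired values - forcing rotation to happen.
import time
from dataclasses import dataclass, field

@dataclass
class Secret:
    name: str
    value: str
    ttl_seconds: int
    created_at: float = field(default_factory=time.time)

    def is_expired(self, now=None):
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl_seconds

class SecretStore:
    """Tiny in-memory store; a real one would be highly available,
    monitored, audited and access-controlled (tier 0 infrastructure)."""
    def __init__(self):
        self._secrets = {}

    def put(self, secret):
        # Rotation is just writing a new version - no deploy required.
        self._secrets[secret.name] = secret

    def get(self, name):
        secret = self._secrets[name]
        if secret.is_expired():
            raise RuntimeError(f"secret {name!r} has expired; rotate it")
        return secret.value
```

Because expiry is enforced at read time, a forgotten rotation surfaces as a loud failure rather than a silently stale credential.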

 

Seeing the Invisible: Discovering Operations Expertise (John Allspaw)

  • John Allspaw reveals what he gets up to in his free time: pursuing an MA in “Human Factors and Systems Safety” at Lund University in Sweden (obviously).

  • His own research explores human factors in web engineering, both with respect to understanding catastrophic failures and with respect to understanding the human factors involved in not having catastrophic failures in the face of things potentially going wrong literally all the time. Human Factors and Ergonomics (HFE) research has a long history in areas like aviation, surgery and mining, but it is still relatively under-explored in our industry.

 

Blame, Language, Learning: Tips For Learning From Incidents (Lindsay Holmwood)

  • Good talk on maximising learning and minimising blame when dealing with incidents; an article version of the talk can be found here: http://fractio.nl/2015/10/30/blame-language-sharing/

  • TL;DR: The language we use and views we hold when talking about failure shape the outcome of that discussion, and what we learn for the future.

  • Both “Why” and “How” questions tend to limit the scope of our inquiry into incidents; “What” questions are a much better device for building empathy, and also help focus the analysis on foresight - rather than its less constructive counterpart, hindsight, which more easily falls prey to various cognitive biases and to blameful thinking.

  • Always assume local rationality: “people make what they consider to be the best decision given the information available to them at the time.” - there isn't really a just culture that doesn't revolve around this premise.

 

Alert Overload: Adopting A Microservices Architecture Without Being Overwhelmed With Noise (Sarah Wells)

  • No huge surprises but a good summary on how to set up useful alerts - below are some key points discussed.
  • Focus on business functionality:

    • Look at architecture and decide which parts or relationships are crucial to your core functionalities

    • Decide what it is that you care about for each - speed? errors? throughput? ...

  • Focus on End-to-End - ideally you only want an alert where you actually need to take action

  • Make alerts useful, build with support in mind!

    • readability! (e.g. use spaces rather than camel casing, etc.)

    • add links to more information or useful lookups

    • provide helpful messages

  • If most people filter out most of the email alerts they are getting, you should probably fix your alert system.

  • Have radiators everywhere; things like http://dashing.io/ are great for dashboards.

  • Setting up an alert is part of fixing an issue!

  • Alerts need continuous cultivation, they are never finished.

  • Make sure you would know it if your alert system went down!
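The readability point lends itself to a quick sketch: converting a camel-cased metric name into a human-friendly alert subject with a link to more information attached. This is illustrative Python of my own, not something from the talk:

```python
# Sketch: make alert text readable and actionable - spaces instead of
# camel casing, plus a link to a runbook for whoever is on support.
import re

def humanise(metric_name):
    """Insert a space before each capital letter after the first,
    e.g. 'ArticleReadErrorRate' -> 'Article Read Error Rate'."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", metric_name)

def alert_subject(metric_name, value, runbook_url):
    # A helpful message plus a useful lookup, per the points above.
    return f"ALERT: {humanise(metric_name)} = {value} (runbook: {runbook_url})"
```

The `runbook_url` parameter is a hypothetical stand-in for whatever "more information" link your alerting system can carry.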

 

The Definition Of Normal: An Intro and guide to anomaly detection (Alois Reitbauer)

  • As anomaly detection has a nice role to play in spotting issues early (ideally before anything really bad happens), I was really excited about this talk. It quickly turned out, however, that unless you come from a relatively strong maths/stochastics background (which I don’t), you probably need to rely on other people for the anomaly detection magic - so the following is a more high-level view.

  • Anomalies are defined as events or observations that don’t conform to an expected pattern.

  • As such, the anomaly detection workflow is:

    1. use actual data to define / calculate what is ‘normal’, i.e. define your ‘normal model’
    2. the ‘normal model’ is continuously updated with new data
    3. hypotheses are derived from the ‘normal model’
    4. events are checked against your hypotheses, applying a likeliness judgement
    5. how the event performs against this likeliness judgement translates into whether it is an anomaly or not
  • How do you approach setting the baselines which define your normal model? One thing to bear in mind is that some of them (such as the mean or median) don’t learn very well. The presenter recommended using exponential smoothing instead, since it is both easy to calculate and learns very well.
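As a rough illustration of the workflow above, here is a minimal Python sketch of an exponentially smoothed baseline with a deviation-based anomaly check. The smoothing factor and threshold are arbitrary values of my own, not the presenter's recommendations:

```python
# Sketch: exponential smoothing as a 'normal model' that is continuously
# updated with new data, with incoming events checked against it.

def make_detector(alpha=0.3, threshold=3.0):
    """Return a check(value) function that flags values deviating too far
    from the exponentially smoothed baseline of everything seen so far."""
    state = {"baseline": None, "dev": 0.0}

    def check(value):
        if state["baseline"] is None:       # first observation defines 'normal'
            state["baseline"] = value
            return False
        error = abs(value - state["baseline"])
        # An event is anomalous if it deviates from the baseline by more
        # than `threshold` times the typical (smoothed) deviation.
        anomaly = state["dev"] > 0 and error > threshold * state["dev"]
        # Update the normal model with the new data point (step 2 above).
        state["dev"] = alpha * error + (1 - alpha) * state["dev"]
        state["baseline"] = alpha * value + (1 - alpha) * state["baseline"]
        return anomaly

    return check
```

Smoothing the typical deviation as well as the baseline means the threshold adapts to how noisy the metric normally is, rather than being a fixed absolute number.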

 

A Real-Life Account Of Moving 100% To A Public Cloud (Julien Simon / Antoine Guy)

  • Their company Viadeo moved to AWS for reasons not too dissimilar from our own, and like us they moved over gradually. I just wanted to note down some of the key lessons they learned in the process:

    1. Outline your key objectives!
    2. Plan and build with a temporary hybrid run in mind - be able to roll back etc.
    3. Ahead of the move, have a thorough report of your infrastructure - estimate equivalent cost in cloud; evaluate each for replacement (PaaS, SaaS or leave as is?); identify pain points (tech debt; relevance of moving legacy apps?)
    4. Define a high-level migration plan
    5. Tech is only half the work - identify all stakeholders and their goals; involve Legal/Finance early (especially if you might have to battle early terminations of legacy infrastructure contracts), work on awareness and knowledge transfer across teams
 

WebPageTest using real mobile apps (Steve Souders)

  • WebPageTest.org now offers a few “Real Mobile Networks” test locations - only a handful for the time being, but if they extend this it could be pretty interesting for us when testing client web apps from different locations!

  • Go to www.webpagetest.org > enter web page URL > select one of the “Real Mobile Networks” options.

  • The full talk is less than seven minutes long (!), worth watching if you are interested in some context and caveats: https://youtu.be/fg0L0UXZhkI

 

Further nifty resources:

  • There’s a pretty neat collection of relevant free O’Reilly eBooks collated here.

  • All the keynote talks (including those I already highlighted above) can be watched here; unfortunately it doesn’t currently look like they’ll make the full-length talks available to the public.

  • Slides of all sessions (where speakers have chosen to share them) can be found here.

     

sharri.morris@7digital.com
Thursday, May 8, 2014 - 17:28

Astro Malaysia held its annual GoInnovate Challenge Hackathon on the 10th-12th October at the Malaysian Global Innovation & Creativity Centre (MaGIC).

Hopefuls from all over Malaysia massed together for an exciting challenge set by Astro - to build a radio streaming demo. The demo product was meant to redefine the way we watch, read, listen and play with content, in two unique hacks to be completed within a 48-hour deadline. Astro offered substantial rewards to those whose ideas came out on top!

Day 0: Demo - Friday evening

Attendees ranged from junior developers to start-up teams - as long as you were at least 18 years old, you could take part!

To begin the Hackathon, entrants were fully briefed and given access to the APIs of both 7digital and music metadata company, Gracenote.

7digital’s lead API developer, Marco Bettiolo, flew in to act as Tech Support for the hackathon.

This photo shows Marco presenting a demo of a radio style streaming service he had previously built.

Day 1: Get Building!

According to the brief, hackers had to choose one of two innovative challenges:

sharri.morris@7digital.com
Tuesday, May 6, 2014 - 17:43

Managing session lifecycle is reasonably simple in a web application, with a myriad of ways to implement session-per-request. But when it comes to desktop apps, or Windows services, things are a lot less clear cut.

Our first attempt used NHibernate's "contextual sessions": when we needed a session we opened a new one, bound it to the current thread, did some work, and unbound the session.

We accomplished this with some PostSharp (an AOP framework) magic. A TransactionAttribute would open the session and start a transaction before the method was called, commit the transaction (or rollback if an exception had occurred), and dispose of the session after the method had completed.

It was a neat solution, and it was very easy to slap the attribute on a method and hey presto - instant session! On the other hand it was difficult to test, and to comprehend (if you weren't involved in the first place), and to avoid long transactions we found ourselves re-attaching objects to new sessions.
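The original was C# with PostSharp, but the same around-the-method pattern maps naturally onto a Python decorator. This is an analogous sketch of the approach described, with a stand-in Session class purely for illustration:

```python
# Sketch: open a session and transaction around a method call, commit on
# success, roll back on exception, and always dispose of the session -
# the same shape as the TransactionAttribute described above.
import functools

class Session:
    """Stand-in for an ORM session with transaction support."""
    def __init__(self):
        self.committed = False
        self.rolled_back = False
        self.closed = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

    def close(self):
        self.closed = True

def transactional(func):
    """Wrap a function so it receives a fresh session per call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        session = Session()
        try:
            result = func(session, *args, **kwargs)
            session.commit()
            return result
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()   # dispose of the session in all cases
    return wrapper
```

The convenience is the same as the attribute's - and so are the drawbacks: the session's lifecycle is invisible at the call site, which is exactly what made the original hard to test and to comprehend.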

These concerns made us feel there was a better solution out there, and the next couple of projects provided some inspiration.

sharri.morris@7digital.com
Thursday, August 8, 2013 - 16:04

Last year we published data on the productivity of our development team at 7digital, which you can read about here.

We've completed the productivity report for this year and would again like to share this with you. We've now been collecting data from teams for over 4 years with just under 4,000 data points collected over that time. This report is from April 2012 to April 2013.

New to this year is data on the historical team size (from January 2010), which has allowed us to look at the ratio of items completed to the size of the team and how the team size compares to productivity. There's also some analysis of long term trends over the entire 4 years.

In general the statistics are very positive and show significant improvements in all measurements against the last reported period:

sharri.morris@7digital.com
Friday, July 19, 2013 - 14:55

Blue and green servers. What?

As part of the 7digital web team's automated deployment process, we now have “blue-green servers”. It took a while to do, but it's great for continuously delivering software.

This system is also known as “red/black deployments” but we preferred the blue-green name as “red” might suggest an error or fault state. You could pick any two colours that you like.

How it works is that we have two banks of web servers – the green servers, and the blue servers. Other than the server names, they’re the same. Only one of these banks is live at any one time, but we could put both live if extraordinary load called for it. A new version of the site is deployed to the non-live bank, and then “going live” with the new version consists of flipping a setting on the load balancer to make the non-live bank live and vice-versa.
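The flip described above can be sketched as a tiny state machine. The LoadBalancer class and deploy callback here are hypothetical Python, just to illustrate the idea - our actual setup is a setting on a real load balancer:

```python
# Sketch: two identical banks, one live at a time; deploying targets the
# idle bank, and going live is a single setting flip.

class LoadBalancer:
    def __init__(self):
        self.live_bank = "blue"          # blue starts live; green is idle

    @property
    def idle_bank(self):
        return "green" if self.live_bank == "blue" else "blue"

    def deploy(self, version, deploy_fn):
        """Deploy a new version to the idle bank, then flip it live."""
        deploy_fn(self.idle_bank, version)   # upload/unpack/configure
        old = self.live_bank
        self.live_bank = self.idle_bank      # the flip: instant go-live
        return old                           # old bank kept for rollback

    def rollback(self):
        self.live_bank = self.idle_bank      # flip straight back
```

Because the previous version is still sitting on the now-idle bank, rolling back is just another flip - no re-deploy needed.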

Why?

Why did we do this? Mostly for speed. The previous process of deploying a new site version was getting longer. The deployment script would start with a server, upload a new version of the site to it, unpack the new website files, stop the existing website, configure the new website and start it - then move on to the next server and do the same.