SBA I - Space-based Architecture

Part 1 of 3: Becoming a spaceman

By Julian Browne on June 6, 2007. Filed Under architecture, development

This is the short version of the story of my experience with Space-based Architecture. I was the instigator of one of the most often referred-to commercial implementations of an SBA and, unusually, one outside the financial trading sector, where the approach is more mainstream. So it seems appropriate that I cover not just its implementation, but why it seemed like a good idea in the first place.

I was Head of Architecture & Design at Virgin Mobile in the UK until shortly after it was taken over by NTL:Telewest (now re-branded Virgin Media). Like all operators in the mobile telecommunications space, we had to contend with operational challenges caused by frantic growth in the late nineties, particularly in keeping our customer-facing systems reliable and highly available. The culture in mobile is fast-paced and competitive, which often means that anything that looks and smells like strategic architecture delivering non-functional requirements is hard to do. Development time-scales tend to be short and aggressive and, in Virgin Mobile particularly, very focused on great customer service.

One channel we found hard to exploit was the web. The existing site was fairly dated and hardly sold us to potential customers as the funky consumer champion Virgin businesses like to be. We also knew that a simple skin-job and a bit of Ajax wasn't going to be enough - a few graphics and cool text would not provide a pleasant user experience if our back-end systems were down.

The business sponsor and I kicked various ideas around for a while and finally convinced our board to splash out more cash than they'd originally had in mind on a new order management system (OMS) to sit behind the web front end. The benefit case was fairly easy to draw up: we could make ordering a reliable and predictable process, and we could reuse our new OMS for other channels. Why stop at the web? Once it works and works well, why not re-use it for telesales and high street stores too?

We clearly needed something extensible. We also needed something that would intrinsically support an atomic processing model. Order processing can get quite involved, with credit checks, stock checks, progress updates, warehouse communications, despatch notes and call centre updates. If any one of the legacy systems behind these steps was temporarily out of action, we needed the customer's order to be in a reliable state when that system came back up.

And most of all we needed buckets and buckets of scalability. Anyone who's worked in the mobile sector will know it's a strange place when it comes to transactional throughput. A bank, for example, will have peaks and troughs around consistently high average levels of activity. Paradoxically, this makes design easier because, while the problem may be hard to solve, you have to solve it for every minute of the day (so your mind is focussed, your business is prepared, and the money and desire are there to support you). Not so in mobile: you can have a fairly slow day, followed by a day where order activity goes through the roof. It's not unusual for Christmas orders to be many times a standard day's activity. If there's a promotion going on at the same time (and there nearly always is) you can be in real trouble if your systems can't cope.

It's not just the risk of poor user experience. There's lost revenue and the incalculable impact on your brand to add to the pressures (the 'opportunity window' of users changing mobile operators is a narrow one, and these days brand perception is everything). What we were in effect looking for was a strategic solution that could bend to needs we weren't yet aware of and that could be tactically implemented using the (non-EJB) Java skill base we already had.

Whilst transactional consistency was high on our list, scalability and tolerance to outages in our legacy systems ranked higher. That is to say, without putting data integrity at risk, we needed to make sure we could manage incoming orders in a user-friendly manner even if a satellite application couldn't be contacted.

We examined all the standard approaches: light and heavyweight application servers, third party OMS products, etc, but landed on a space-based architecture based on Gigaspaces, principally by the following logic:

Of course the final architecture adjustments are a little trickier, but then aren't they always? I would certainly advise playing with some simple set-ups first just to get the feel of it, and thoroughly researching the various patterns available.

We chose a fairly straightforward master-worker approach: a master process collects submitted orders from the sales channel (in this case the web, but as the master is simply the provider of an order collection service, there is scope for other channel support) and places them into a space, and a host of workers go to.. er.. work on them.

Workers can be at whatever granularity you need (credit check, stock management, status update, etc) and can operate in the sequence you need them to (this is because it's simple to tell each worker only to look for orders that match a certain pattern in the space). Getting orders in and out of the space safely is part of the API, and when you need to scale you just add another box to the space and off you go.
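To make the pattern-matching idea concrete, here is a toy, in-memory sketch of it. It is written in Python purely for brevity and is not the GigaSpaces API or our production code: the `Space` class, the status names and the two worker stages are all assumptions made up for illustration. A master writes new orders into the space, and each worker takes only the orders whose status matches its template, does its bit, and writes the order back with the next status.

```python
import threading


class Space:
    """Toy in-memory space (illustrative only): entries are dicts,
    templates are partial dicts that must match field for field."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        with self._cond:
            self._entries.append(dict(entry))
            self._cond.notify_all()

    def take(self, template, timeout=None):
        """Remove and return the first matching entry, or None on timeout."""
        def find():
            for i, entry in enumerate(self._entries):
                if all(entry.get(k) == v for k, v in template.items()):
                    return i
            return None

        with self._cond:
            if not self._cond.wait_for(lambda: find() is not None, timeout):
                return None
            return self._entries.pop(find())


def run_worker(space, from_status, to_status, action, stop):
    """Repeatedly take orders in from_status, process them, and write
    them back in to_status so the next worker's template matches."""
    while not stop.is_set():
        order = space.take({"status": from_status}, timeout=0.05)
        if order is None:
            continue
        action(order)
        order["status"] = to_status
        space.write(order)


def process_orders(order_ids):
    """Master: submit orders, let the workers run, collect the results."""
    space = Space()
    stop = threading.Event()
    # Hypothetical stages; the sequence falls out of the status each one matches.
    stages = [
        ("NEW", "CREDIT_OK", lambda o: o.update(credit_checked=True)),
        ("CREDIT_OK", "STOCK_OK", lambda o: o.update(stock_reserved=True)),
    ]
    workers = [
        threading.Thread(target=run_worker, args=(space, src, dst, act, stop))
        for src, dst, act in stages
    ]
    for worker in workers:
        worker.start()
    for order_id in order_ids:
        space.write({"order_id": order_id, "status": "NEW"})
    done = [space.take({"status": "STOCK_OK"}, timeout=5) for _ in order_ids]
    stop.set()
    for worker in workers:
        worker.join()
    return done
```

Calling `process_orders([1, 2, 3])` should hand back three orders, each marked as credit checked and stock reserved. Scaling in this toy means starting more worker threads for a stage; in a real space product it means adding machines. Note that a worker dying between `take` and `write` would lose the order here, which is why a real implementation would normally wrap that pair in a transaction.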

We assumed that legacy systems would be unavailable, rather than available, and designed the process accordingly (avoiding a common fallacy of order management computing: that the systems you depend on will be there when you call them).
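A minimal sketch of what "assume it's down" can look like in a worker, again in Python and again with every name invented for illustration (`LegacyUnavailable`, `FlakyCreditSystem`, the status values and the 500 credit limit are all assumptions, and a plain list stands in for the space). The point is only that an unreachable legacy system is treated as a normal outcome: the order stays where it is, untouched, and gets picked up on a later sweep.

```python
class LegacyUnavailable(Exception):
    """Raised by the hypothetical legacy client when the system can't be reached."""


class FlakyCreditSystem:
    """Stand-in for a legacy credit-check system that is down for a while."""

    def __init__(self, down_for_calls):
        self.remaining_failures = down_for_calls

    def __call__(self, order):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise LegacyUnavailable("credit system offline")
        # Arbitrary illustrative rule: approve orders up to 500.
        return order["amount"] <= 500


def credit_check_pass(space, check_credit):
    """One sweep over the space: advance orders whose check completes,
    and hold (leave as NEW) any order whose check could not be made."""
    for order in space:
        if order["status"] != "NEW":
            continue
        try:
            approved = check_credit(order)
        except LegacyUnavailable:
            # Expected case, not an error: keep the order for a later sweep.
            order["attempts"] = order.get("attempts", 0) + 1
            continue
        order["status"] = "CREDIT_OK" if approved else "REJECTED"


def demo():
    space = [
        {"order_id": 1, "status": "NEW", "amount": 120},
        {"order_id": 2, "status": "NEW", "amount": 900},
    ]
    legacy = FlakyCreditSystem(down_for_calls=2)
    credit_check_pass(space, legacy)  # legacy system is down for both orders
    during_outage = [o["status"] for o in space]
    credit_check_pass(space, legacy)  # legacy system is back
    after_recovery = [o["status"] for o in space]
    return during_outage, after_recovery
```

In `demo()`, both orders survive the outage with their status unchanged and are only decided once the credit system answers. The customer-facing side never has to know the difference, beyond the order taking a little longer to progress.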

There were technical challenges along the way, but nothing we couldn't handle (if I had my time again I would have introduced the ideas and the training much earlier). By far the biggest challenge was not technical at all: it was dealing with the few developers who were against trying anything seen as 'new' or 'different' (depressingly common in teams with an entrenched love of their own suffering).

But would I do it all again? Certainly. Despite project and political annoyances, which let's face it are prevalent everywhere, it became a raging commercial success. I can't quote figures, naturally, but the development paid for itself very quickly, and went on to support a sizeable percentage of the company's business without any downtime.

Famously, when a legacy system did go down for a short time, all the orders that would otherwise have been in jeopardy were held safely and restarted later. And whilst the architecture may only have played a supporting role, the site went on to win best telecom web site of 2006, as voted for by customers.

Not a bad day's work, all in all.

*Please note that the views in this article are entirely my own and not those of Virgin Mobile or Gigaspaces (I am neither employed by nor affiliated with either company). Details that might be seen as commercially sensitive have been intentionally withheld.