Big changes recently at Complaints HQ. The TL;DR is that Complaints East has become Complaints West! The good news about moving cross-country to California is that you end up in California. The bad news is that you end up in California living in a shoe box. Three people in a shoe box doesn't leave much room for full-tower multi-monitor gaming rigs and rack-mounted home labs. As a result, the true star of these pages, the infamous “Big Bertha”, who served faithfully through five generations of SLI… is no more (moment of silence). Fortunately, laptops don't take up too much room, and the Razer Blade 14, 2015 edition, has proved to be a great gaming platform. The GTX970M can handle basically anything you throw at it, at max details and 60fps, as long as you cap the res at 720p. Windows 10 has corrected the annoying non-native-res scaling issue in DirectX on touch screens, so 720p full screen looks great even on a native 1800p panel. Yes, you could take many games to 1080p, but it's hit or miss and you'll often have to sacrifice detail or frames. I'm fanatical about giving up resolution before either of those (especially minimum frames – hence all of the SLI builds over the years), so I've stuck with 720p.

On a 14″ panel this isn't as bad as it sounds, and with details maxed, most games I have on both Windows and PS4 still look better on the Razer despite running at 1080p on the PlayStation. All of this said, there is a storm on the horizon. Unless you've been on an expedition in the Arctic, you know VR is finally a legit thing. I've heard some things. And I've seen some things. And believe me when I say, VR is (Trump accent) gonna be huuuge! The problem, of course, is that a GTX970M doesn't quite cut it for VR. NVIDIA acknowledges this and as a result, for the first time ever, is releasing a desktop GTX980 in MXM. Thing is, I kind of like a laptop to have a hint of portability. The Razer is fantastic in this area. It's about the same form factor (slightly smaller) as a MacBook Retina. Highly portable. In addition, battery life using Optimus is excellent (I've gotten 7 hours on battery doing regular work, and have even gotten about two and a half hours gaming on battery). The GTX980 MXM isn't going to find its way into anything south of a large 17″, and even then isn't going to result in a power-efficient system.

So this left me with a dilemma. Miss out on PC VR (I'll save PS4 VR for a different entry), or try to sell the Razer, wait for the GTX980 MXM release, and switch to a tank of a laptop? There had to be another way, right? I mean, it's 2015. How could it be that the GTX980 was making its way to MXM, yet there were still no true ultra-small-form-factor gaming solutions? As it turns out, the past few generations have really legitimized Mini-ITX as a gaming solution (Mini-ITX?!). In the past I had great success with Shuttle PCs (even as gaming rigs – once building an FX-60 / ATI X1900XTX Shuttle), so it was nice to see an industry-standard small form factor becoming mainstream for gaming builds. Unfortunately, though, the shoe box is a problem. Basically, the absolute dimensional limit I was facing for the case was 7″ x 11″ x 14″. In addition, I wasn't going to bother unless I could manage a full-height, dual-slot, 290mm GPU. And I wanted it delivered by unicorn! In all seriousness, plugging those filters into any case search doesn't yield many choices. And the X factor here is that the design had to be “living room worthy”. Without further ado, here is where the roulette wheel landed:

Let’s go through the full list and I’ll explain my choices:

  • CASE: MSI Z97 Nightblade – absolutely the slickest-looking case that can handle a full-sized GPU while still fitting into 7″ x 11″ x 14″ (at 6.7″ x 10.92″ x 13.46″). The downside is that MSI is still shipping it with a Z97 board, and at first I disqualified the Nightblade for that reason. Then, after a lot of case research and spotting a special at Fry's, I decided I would pick it up and just swap the board for a Z170 Mini-ITX.
  • CPU: Intel Core i7-6700K. With Skylake out, albeit yielding only nominal gains over Devil's Canyon, the 6700K is a no-brainer. I briefly considered X99, but it doesn't make a lot of sense in an SFF ITX build in my opinion; there's not a lot of use for 40 PCI-E lanes when you have one slot. In addition, Z170 provides an M.2 slot, allowing another storage option in the premium real estate of a Mini-ITX.
  • MOBO: MSI Z170I Gaming Pro AC – Since the Nightblade is an MSI package, it seemed fitting that the replacement board should also be an MSI. It doesn't hurt that this round MSI has one of the best Z170 options for Mini-ITX! Admittedly there aren't many choices and they're a bit of a commodity, but the MSI includes an Intel I219-V NIC, USB 3.1, and an M.2 slot underneath.
  • STORAGE: 3 x Samsung 850 EVO 500GB in a mix of mSATA and 2.5″ form factors (2 mSATA, 1 2.5″ SATA) in a “you only live once”, caution-to-the-wind RAID 0 setup. Why the mix? Simple. It goes back to the ITX real estate issue. The Nightblade provides room for two 2.5″ drives, but also includes a custom caddy allowing two mSATA drives to occupy a single 2.5″ slot. The motherboard also provides a bottom-mounted M.2 slot, but you have to be careful here, as it can only handle cards up to 60mm in length; 2260 models are the physical limit. In addition, the M.2 slot can handle either SATA or PCI-E bus cards, but if a SATA card is inserted, two SATA ports on the front of the motherboard (port 5 and port 6) are disabled. And of course M.2 cards in SATA mode are no different from any other SATA SSD, so that tradeoff really isn't worth it. What this means is that only a 2242 or 2260 PCI-E M.2 SSD makes sense, but as of this writing no such part exists. It's a shame, because I really liked the idea of an M.2 boot drive mounted right on the motherboard in super-fast PCI-E. Possibly a future upgrade!
  • RAM: Skylake is a controversial release because early benchmarks showed modest (at best) gains. What has become apparent over time, though, is that Skylake is hungry for bandwidth: the faster the DDR4, the better it performs. Paired with very fast RAM, it can put a 20% hurting on Haswell in many benchmarks. With this in mind, I opted for Corsair Vengeance LPX DDR4-3200 with CL16 latency in 8×8 form factor.
  • GPU: The mother lode here: the EVGA GTX980Ti SC+. Yes, it fits. Yes, the 600W supply is sufficient. It's a beautiful thing!
  • COOLER: Corsair H75. A no-brainer to me. I'm a huge fan of closed-loop water. It's quieter, easier to work with, and less ugly than a giant air tower, and, while not necessarily as efficient as a well-tuned bespoke water configuration (with a large rad), it's a hell of a lot easier. The H75 is marked 1150 on the box but is just fine for 1151.

Ok, with the inventory out of the way, let's get down to it. First things first – the Z97 board has to go. Pulling the board is doable (although the case is very snug). The first step is to lay the case on its solid panel side and find the red tabs on the back. This is a very nice tool-less design: the two tabs slide towards each other with a nice solid motion, unlocking the panel. Once unlocked, it can be slid back and lifted off:


Underneath the vented panel, we find a fan shroud. We also find that our tool-less design has hit a wall: two small screws hold the shroud in place. Incidentally, this is where we will be installing the H75 rad in just a bit:

With the fan shroud out of the way, we can get at the internals. Again, it's a bit cramped, but really nicely laid out, with great cable routing. The first step is to pull and tuck away all of the cables. Four screws hold the standard Z97I AC Mini-ITX board in place:


So the Nightblade really is a standard Mini-ITX case, but with a few caveats:

  • the case has a built in LED bar that connects to the JLED1 header on the Z97i board.  It color cycles between red and white based on the position of the front panel “Turbo” button
  • the Turbo button lead connects to the turbo jumper on the Z97i board
  • there is no standard PC speaker
  • the front panel connects via a 20 pin USB 3.0 header and a 10 pin front panel header for the power switch and power LED.  There is no reset switch.


The Z170I is a standard Mini-ITX board and so drops right in. Important to note: it is missing the JLED1 header, but has a speaker header. Like its Z97 sibling, it features two fan headers. Otherwise the boards are functionally identical, and even the layouts are similar enough that it lends itself well to the Nightblade interior. One caveat is that the USB 3.0 header is located along the top edge of the board rather than the side. It's a minor difference, but enough that the USB 3.0 cable included with the kit does not reach. An easy fix with a USB extension, but a bit of a nuisance:


With the exception of the USB 3.0 header location, the Z170 fills the shoes of the Z97 nicely:


Next up, the 6700K drops in and the H75 115x mounting bracket goes on the back:


Time for the H75 to mount up. The H75 includes two 120mm fans and a 1x120mm rad. The fans should be arranged in a push/pull config, set up as intake to bring cold air from outside the case across the radiator, and mounted on the case's fan shroud:


The CPU block is an effortless install.  Corsair has been at closed loop for a while now and has it really dialed in:


The dual mSATA-to-2.5″ format converter is a really nice custom piece that MSI includes (unfortunately, they only include one). You slot in the two mSATA drives and secure them with a single screw each, attach two data cables and one power cable, and then slot the carriage into the tool-less drive bay on the supplied rails:

For the other 2.5″ bay, rails are included to mount a standard SSD:


And last but not least in the component install process, the GTX980Ti.  Nestled in snugly, but a great fit.  The PSU has flexible power connectors and the slot is fully tool-less (both nice touches):

Before closing her up, all cables are connected to their respective headers.  For the missing PC speaker, I opted to get one of these:


That leaves just the case LED. With no LED header and no turbo header on the Z170I, I decided to wire up the LED permanently. It was a simple matter of wiring up a SATA power lead and converting it to 3-pin:



The final assembled package cleans up quite nicely and runs reasonably quietly. In the upcoming entries I'll cover the BIOS settings, RAID config, and software configuration experience before moving on to Steam streaming and some benchmarking, so stay tuned!

Today I was watching the old 90s movie “The Mask”, based on the comic of the same name. In the movie adaptation, the titular “mask” is an artifact made by the Norse god Loki which imbues its wearer with tremendous supernatural power while amplifying their personality. It's a surprisingly decent adaptation, was a perfect fit for Jim Carrey at his comedic peak, and introduced Cameron Diaz to the world. So there is a lot to like in The Mask, but this isn't really about any of that. Thinking about Loki got me thinking about the theological archetype of “the trickster”.

For anyone not versed in theology or cultural anthropology, a quick primer might be in order. Throughout human history, mankind has created myths and religion to fill in gaps in our knowledge and explain the unexplainable. And in each case, these tales have been told in human terms. Gods and goddesses were, ironically, created in our image as reflections of various facets of the human condition and the natural world, each representing some human quality, or natural force, personified. Love, war, death, fertility, compassion, the elements: everything one would imagine has been represented in nearly every culture's dogma. And among all of these archetypes, the most interesting is “the trickster”.

The trickster is found in every culture. It represents chaos, but more importantly, non-conformity. To attempt to brand this as “good” or “evil” completely misses the point. And this is where I think monotheism has caused tremendous harm. With the introduction of monotheism, all nuance was lost. What became of the trickster? Non-conformity, disruption, “naughtiness” – anything out of sync with the status quo – have no real place in a purely binary system. All of these grey-area behaviors end up assigned to the devil figure.

This type of binary thinking is extremely dangerous, as it takes normal human behaviors and casts divine judgement on them. Instead of being a personified reflection of human behavior, the divine becomes an unreasonable, and unachievable, ideal. After all, would a perfect “god” figure ever have a moment of weakness? Or decide to break a rule? No. The message becomes “you are human and weak, so you break a rule, but you should beg for forgiveness and possibly be redeemed”. But the thing is, rule breaking is at times essential. Part of being human is learning what being human means, and this requires exploration that can be messy. Relegating all human frailty to the archetypal evil, pushing it into the shadows with “the devil”, and standing in denial and judgement leads to the types of repression and abuse that we have seen in nearly all modern monotheistic religions.

Contrary to the notion that polytheism is the primitive and unsophisticated belief system, I think a strong case can be made that polytheism was far healthier and much better aligned to the needs that religion should fill: taking the totality of the human condition and being realistic about it; assigning a divine overseer to the full spectrum of human behavior, and all of its shades of grey, rather than attempting to impose a simplistic and binary ideal.


When Alan Turing first proposed the Imitation Game, he was essentially side-stepping the fundamental existential question of what it means to be sentient. The point of the Imitation Game was whether a machine could be programmed to be sufficiently non-deterministic that it could convince a human operator that it was sentient. This is a critical difference, so it is ironic that over time, Turing's famous “test” has been misinterpreted as a test for “artificial consciousness”. It has now been more than half a century since Turing's masterwork was published in “Mind” and the times, they have a-changed!

Today the press is crowded with clickbait headlines about “Artificial Intelligence” and “Machine Intelligence” at the slightest utterance from one of the Silicon Valley elite. And of course the immediate reaction from the masses is to recall generations of nightmarish dystopian sci-fi and make doom-laden comments about “Skynet”, “WarGames”, “The Matrix”, et al. Lately, though, I've been thinking that this entire line of reasoning – both the press's over-exuberance and the borderline-hysterical reaction to it – is a distraction from what may very well be a vital moment in time.

It is time we return to the roots of Alan Turing's work and ask some significant questions. There is no doubt today that a machine can fool a human; the Turing Test has been passed. There is also, however, no doubt that even the most advanced machine learning algorithms aren't remotely “sentient” – at least not as sentience occurs in higher-level mammals. But I think focusing on this extreme edge case, the most sophisticated organic life forms, is causing us to completely miss a critical existential dilemma. Where do these developments leave the least sophisticated organic life forms?

Take your common, garden-variety honey bee, for example. There is no doubt that a bee is an absolute wonder of nature. It is a brilliant piece of organic programming. And yes, the choice of wording here is very deliberate. For all of their sophistication, bees do very much follow the pattern of pre-programmed determinism. It could be argued that humans possibly do as well, but in higher-order life forms like humans, the branching logic is so complex, and the “program” sufficiently stochastic, that a pattern is almost impossible to identify at anything less than extreme breadth (i.e., “drive to reproduce”). In a bee, things are very different. Bees are born into a role. They perform this role unerringly their entire lives. They do not question it, nor can they. They build the most wondrous structures, achieving mathematical perfection in their hives, but this is all they build. They utilize ultraviolet vision to find and extract pollen over miles of ground, but they do no other exploration with it. They seek pollen. They extract pollen. They return pollen. They eat pollen. They produce honey as a by-product. They protect and nurture the queen. The queen populates the hive. If threatened, they defend. The tragic, and possibly catastrophic, Colony Collapse Disorder proves that if any anomaly is introduced into their environment, it can cause cataclysmic disruption of the entire life cycle.
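The deterministic loop described above can be sketched as a toy state machine. To be clear, the states and transitions here are illustrative shorthand, not entomology; the point is that the “program” is a fixed table with no branching.

```python
# A toy sketch of the bee's "organic program" as a fixed state machine.
# Each state has exactly one successor: no questioning, no exploration.
BEE_PROGRAM = {
    "seek_pollen": "extract_pollen",
    "extract_pollen": "return_pollen",
    "return_pollen": "deposit_pollen",
    "deposit_pollen": "seek_pollen",  # the loop closes and never branches
}

def run_bee(start="seek_pollen", steps=8):
    """Follow the fixed transition table for a number of steps."""
    state, trace = start, []
    for _ in range(steps):
        trace.append(state)
        state = BEE_PROGRAM[state]
    return trace

print(run_bee(steps=4))
# ['seek_pollen', 'extract_pollen', 'return_pollen', 'deposit_pollen']
```

Notice that removing or corrupting any single entry in the table (an “anomaly in the environment”) breaks the entire cycle, which is the Colony Collapse point in miniature.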

So a bee, in many ways, is a little organic robot evolved to keep plants healthy.  It is perfect at its task.  It is unquestionably alive.  We know this because it is organic.  But our definitions don’t extend much beyond that, do they?  As of today, in the 21st century, our formula for “life” looks like this:

                                           IF organic THEN alive

This is by no means facetious. Our only definition of “life” is the above. Now, “sentience”? Or “intelligence”? There we resort to “you know it when you see it”. There are some tests for “self-awareness”, but they are dubious and highly questionable. What does all of this mean? It means that for all of our progress in science and mathematics, we haven't made much progress on some important existential questions. Philosophers, of course, have spent lifetimes focused on these areas, but philosophers produce additional thought as their primary output, rather than anything truly tangible.
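Taken literally, the state of the art can be written in a few lines. This is a deliberately crude sketch (the entities and fields are invented for illustration) whose point is exactly how little our current definitions capture:

```python
# The 21st-century "formula for life", taken at face value.
def is_alive(entity):
    return entity.get("organic", False)  # IF organic THEN alive

def is_sentient(entity):
    # We have no definition to implement.
    raise NotImplementedError("you know it when you see it")

bee = {"organic": True, "behavior": "deterministic"}
watson = {"organic": False, "behavior": "stochastic"}

print(is_alive(bee), is_alive(watson))  # True False
```

The `is_sentient` stub is the honest part: it is the function we have never managed to write.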

So where does this leave the bee?  It’s hard to say.  The bee is certainly alive, but is it truly intelligent?  Is it sentient?  Without being able to define these terms, the answers to those questions are left purely up to the observer, which brings us full circle, back to Mr. Turing.

I think we have reached the point today where it is worth asking: is the most sophisticated machine intelligence (take Watson, as an example) any more or less “intelligent” or “sentient” than a bee? And is it alive? The obvious answers are “no”. But those answers raise the question “why not?”, and the reasoning is almost certainly based on the simple rules above: “I know it when I see it” and “IF organic THEN alive”. It is time to have a serious discussion about how sufficiently sophisticated human-generated code measures up against the least sophisticated nature-generated code. Watson, meet honey bee!


Part the First: Introduction
Momentum in the cloud space continues to accelerate. Trends that have been clearly indicated for a few years now are starting to evolve rapidly. In short: IT is dead, all hail Shadow IT. Dramatic, but perhaps not really accurate. The reality is that we are solidly in the midst of a period of creative destruction. This isn't the shift from physical to virtual; it's the shift from mainframe to distributed systems. What is the proof? There are two indicators that mark a period of creative destruction in technology: the emergence of dramatically new design patterns, and a shift in sphere of control.

The latter is clear. Core IT, which has always struggled in its relationship with the business it serves, is now ceding authority to architects, analysts, and developers in the business lines, who are tied closely to profit centers and the value of whose work can be clearly articulated to the CEO. This is a shift that has been brewing for quite some time, but the emergence of viable public cloud platforms has finally enabled it. The former, the shift in design patterns, flows directly from this more developer-centric IT power structure. In contrast to the shift from physical to virtual, which saw almost no evolution in how applications were built and managed (primarily because it was core IT infrastructure folks who drove the change), the shift from virtual to cloud is bringing revolutionary change.

And if there is any doubt that this is vital change, just step back from technology for a moment and consider: what business wouldn't want a continually right-sized technology footprint, located where you need it when you need it, which yields high-value data while serving customers, all for a cost that scales linearly with usage? That is the promise already being delivered in public cloud by those who have mastered it. The rub, though, is in mastering it. Programmatic control and management of infrastructure (devops) isn't easy, and the tools aren't quite there yet for developers to be able to not worry about it.
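“Programmatic control of infrastructure” sounds abstract, so here is the idea in miniature: declare a desired state, then compute the actions needed to reconcile reality toward it. The provider here is just a dict; real tools (Chef, Puppet, and the cloud APIs themselves) run this same declare-and-reconcile loop against live systems.

```python
# Infrastructure as code in miniature: desired state vs. actual state.
def reconcile(desired, actual):
    """Return the create/destroy actions needed to reach the desired state."""
    actions = []
    for name, count in desired.items():
        delta = count - actual.get(name, 0)
        if delta > 0:
            actions += [("create", name)] * delta
        elif delta < 0:
            actions += [("destroy", name)] * (-delta)
    return actions

desired = {"web": 4, "app": 2}   # what the code declares
actual = {"web": 2, "app": 3}    # what the provider reports
print(reconcile(desired, actual))
# [('create', 'web'), ('create', 'web'), ('destroy', 'app')]
```

The hard part in practice isn't this loop; it's everything around it (failure handling, ordering, drift detection), which is exactly where the tooling is still maturing.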

Part the Second: Historical Context
Before we get to where things are headed, it's worth revisiting how we got here. Looking back at history, “cloud” became a meaningful trend by delivering top-down value. The first flavors of managed service that caused the term to catch on were “Software as a Service” and “Platform as a Service”, both of which are application-first approaches that disintermediate core IT. SaaS obviously brings packaged application functionality directly to the end user, who consumes it with minimal IT intervention. Salesforce is the great example here, causing huge disruption in a space as complex as CRM by simply giving sales folks the tools they needed to do their job, with a billing model they could understand and sell internally to the business.

PaaS sits at the other side of the spectrum and was about giving developers the ability to build and deploy applications without caring about pesky infrastructure components like servers and storage. Google App Engine and Microsoft Azure (and a strong entry from Salesforce in Force.com) blazed the trail here. Ironically, though, most developers weren't quite ready to consume technology in this way: the shift in design thinking hadn't occurred yet, and the platforms had some maturing to do (initial releases were too limiting for current design approaches while not bringing any fully realized alternative). It was at this point that Amazon entered the market with S3 and EC2 (basically storage and virtual machines as a service), and Infrastructure as a Service was born, in turn giving birth to confusing things like “hybrid cloud”. As early players like Google and Microsoft pivoted to also provide IaaS, it looked like maybe cloud would be more evolution than revolution.

Part the Third: Shifting Patterns
Looking deeper, though, it's clear that commodity IaaS is just a stopgap. Even the AWS IaaS portfolio reveals all sorts of services, both vertical and horizontal, that are well beyond commodity infrastructure. While the decade-long shift from physical to virtual brought no real change in how servers were deployed and managed or applications were built – it just adapted existing processes and patterns to a faster deployment model – the shift to cloud has, in just three years, already brought revolutionary change in design patterns and in how resources are managed and allocated. The best way to understand this is to consider how cloud design patterns compare to legacy design patterns. The AWS approach is a bit of a bridge between past and future here, and so provides a very easy to understand example:


A typical 3-tier app is illustrated above. Web, app, data; scale out, scale up; simple stuff. Even in the AWS example, though, which by design maps very closely to a traditional infrastructure approach, there are some dramatic differences. The A and B zones in the diagram represent AWS availability zone diversity – physically discrete data centers. Now note that load balancing, a fairly straightforward capability, is spanning these availability zones. Driving down a layer, we see traditional server entities – the atomic units of the service are still Linux or Windows VMs – but note that they are not only “autoscaling”, but autoscaling across the physically discrete datacenters. The implication of this architecture is that the application actually consists of n web and app nodes across two physical locations, with dynamically managed access and deployment. Moving to the data tier, we see similar disruption. Rather than a standard database instance, we see a data service scaling horizontally across the physical locations. Finally, unstructured data assets aren't sitting in a storage array, but rather in a globally distributed object store.
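The autoscaling-across-zones behavior described above reduces to two decisions: how many nodes do I want, and where do I put them? The sketch below shows both; the thresholds and zone names are invented for illustration, and real autoscaling policies (AWS Auto Scaling among them) are considerably richer.

```python
# A sketch of zone-diverse autoscaling: pick a node count from a load
# metric, then spread the nodes across physically discrete zones.
def desired_nodes(avg_cpu, current, lo=30, hi=70, floor=2, ceil=8):
    """Simple threshold policy: scale out when hot, scale in when idle."""
    if avg_cpu > hi:
        return min(current + 1, ceil)
    if avg_cpu < lo:
        return max(current - 1, floor)
    return current

def spread(n, zones=("zone-a", "zone-b")):
    """Round-robin placement so no single data center holds the service."""
    return [zones[i % len(zones)] for i in range(n)]

n = desired_nodes(avg_cpu=85, current=3)
print(n, spread(n))
# 4 ['zone-a', 'zone-b', 'zone-a', 'zone-b']
```

The `floor=2` with two zones is the “design for failure” instinct: even at minimum scale, the service survives the loss of an entire data center.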

On prem, much of this is very difficult (geographic diversity, autoscaling, a giant object store) and some of it is impossible (capacity on demand, database run as a service). For the N-tier app use case there is no immediate impact to the design pattern (hence Amazon's mindshare success with legacy apps), but the implications of the constructs are clear: if you can dynamically scale infrastructure globally on demand, and maintain service availability, there is no need to limit your architectures based on the operational limits of traditional infrastructure. This is where cloud design patterns, and the notion of “design for failure” (vs. the legacy design approach, where you assume iron-clad, fault-tolerant infrastructure), were born. At this stage none of this is theoretical, nor is it just a Netflix or NASA game; even traditional enterprises have real solutions in production. How do we operationalize all of this, though? That's been the really hard part so far. If there is one advantage to traditional infrastructure, it is that there is deep tribal knowledge and mature tooling to manage it. Cloud platforms really are infrastructure as code, and expect management via API. Old tools haven't caught up, or no longer apply, and new tools have steep learning curves or require moderate development skill. This is why we have seen the rise of devops and the explosion in interest in platforms like Chef, Puppet, Mulesoft, etc. It's a tough problem, though, because ultimately devs don't want to inherit ops (this would count as a huge cloud downside for them), and it's not clear that ops folks can reskill quickly enough, or at all, to transition. In short, there is currently a vacuum, and most folks are betting that the space will shake out quickly and that the tools will evolve before there is a real need to solve this from the customer side. Personally, I see most IT shops investing very cautiously here and “buying operate”, even as they shift real production to cloud, until the directional signals are clearer.

Part the Fourth: The Topic at Hand
So what are the directional signals? Consider: why should we limit ourselves to the constructs of legacy infrastructure if the inherent flexibility of the service can free us from them? The answer is that we shouldn't, and the directional signals are proving this. So where are these technologies headed, what do they mean, and why? I've chosen a few of the big ones to explore. Before we get into specifics, let's spend some time thinking about what is really required to get to “self-operating infrastructure”, and what might be missing from the architecture presented above.

The Trinity
Where the proverbial rubber meets the road are, of course, the basic resource units. We need bytes of RAM to hold the data and code currently being executed, bytes of long-term storage to hold them at rest, compute cycles to process them, and network connectivity to move them in and out. Compute, network, and storage: these abstractions remain the holy trinity and are so fundamental that nothing really changes here. Today, access to these resources is gated either by a legacy operating system (for compute, network, and local storage) or by an API to a service (for long-term storage options). Unfortunately, the legacy OS is a pretty inefficient thing at scale.

A Question of Scale…
Until very recently, traditional operating systems really only scaled vertically (meaning you build a bigger box). Scaling horizontally, through some form of “clustering”, was either limited to low scale (64 servers, let's say) and “high availability” (moving largely clueless apps around if servers died), or achieved high scale by way of application resiliency, which could scale despite clueless base operating systems (the web being the classic example). This kind of OS-agnostic scaling depends on lots of clever design decisions and geometry, and is a fair bit of work for devs. In addition to these models, there were some more robust niche cases. Some application-specific technologies were purpose-built to combine both approaches (an example being Oracle's “all active, shared everything” Real Application Clusters). And the most interesting approaches were found in the niches where folks were already dealing with inherently massive scale problems. This is where we find High Performance Computing and distributed processing schedulers and also, in the data space, massive data analytics frameworks like Hadoop.

Breaking down the problem domain, we find that in tackling scale we need some intelligence that allocates and tracks the usage of resources; accepts, prioritizes, and schedules jobs against those resources; stores, manages, and persists data; and provides operational workflow constructs and interfaces for managing the entire system. There is no one-stop shopping here. Even the fully realized use cases above are a patchwork tapestry of layered solutions.
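The allocate/prioritize/schedule core of that problem domain fits in a few lines. This is a deliberately minimal sketch (job names and capacities are invented); real schedulers like Mesos or HPC batch systems layer persistence, preemption, and workflow on top of essentially this loop.

```python
# A minimal resource scheduler: track capacity, accept prioritized jobs,
# and place what fits. Lower priority number runs first.
import heapq

def schedule(jobs, capacity):
    """jobs: list of (priority, name, cpus). Returns (placed, free cpus)."""
    heap = list(jobs)
    heapq.heapify(heap)          # priority queue of pending jobs
    placed, free = [], capacity
    while heap and free > 0:
        prio, name, cpus = heapq.heappop(heap)
        if cpus <= free:         # allocate only if resources remain
            placed.append(name)
            free -= cpus
    return placed, free

jobs = [(2, "batch-report", 4), (1, "web-frontend", 2), (3, "backup", 8)]
print(schedule(jobs, capacity=8))
# (['web-frontend', 'batch-report'], 2)
```

Note what's missing: data persistence, failure handling, and operator workflow, which is precisely why no single layer provides one-stop shopping.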

…and Resource Efficiency
Not only are the traditional OS platforms lacking in native horizontal scaling capabilities but, getting back to our resource trinity, they aren't particularly efficient at resource management within a single instance. Both Linux and Windows tend to depend on the mythical “well-behaved app” and are limited in their ability to maximize utilization of physical resources (hence why virtualization had such a long run – it puts smarter scheduling intelligence between the OS and the hardware). But how about inside the OS? Or how about eliminating the OS altogether? This brings us nicely to a quick refresher on containers. The point of any container technology (Docker, Heroku, Cloud Foundry, etc.) is to partition up the OS itself. Bringing back a useful illustration from the container entry, contrasting containerized IaaS with Beanstalk, what you get is an architecture that looks like this:


The hypervisor brokers physical resources to the guest OS, but within the guest OS, the container engine allocates guest OS resources to apps. The developer targets the container construct as their platform, and you get something a step forward from, but similar to, a JVM or CLR.

There is still a fundamental platform management question here, though. We now have the potential for some great granular resource control and efficiency – and, if we can eventually eliminate some of these layers, a huge leap forward in both hardware utilization and developer freedom – but we really don't have any overarching control system for all of this. And now we've found the eye of the storm.

A War of Controllers
Standing between the developer and their user, given everything discussed above, remains an ocean of complexity. There is huge promise and previously unheard-of agility to be had, but the deployment challenge is daunting. Adding containers into the mix actually increases the complexity, because it adds another layer to manage and deploy. One way to go is to be brilliant at devops and write lots of smart control code. Netflix does this. Google does this to run their services, as do Microsoft and Facebook. Outside of the PaaS offerings, though, you only realize the benefit as a side effect when you consume infrastructure services. That's changing, however. There is pressure coming from some large open source initiatives, and this is causing an increased level of sharing and, quite likely, ultimately some level of convergence. For now, the initiatives can be roughly divided into top-down and bottom-up.

The View from the Top
Top down we’re seeing the continuing evolution of cloud scale technologies focused specifically on code or data.  Google’s MapReduce, which became Hadoop, is a great early example of this.  Hadoop creates and manages clusters on top of Linux for the express purpose of running analytics code against datasets whose analytics challenge fits into the prescribed map/reduce approach (out of scope here, but great background reading).  Other data centric frameworks are building on this.  Of particular note is Spark, which expands the focused mission of Hadoop into a much broader, and potentially more powerful, general data clustering engine that can scale out clusters to process data for an extensible range of use cases (machine intelligence, streaming, etc).

On the code side, the challenge of placing containers has triggered lots of work.  Google’s Kubernetes is a project which aims to manage the placement of containers not only into instances, but into clusters of instances.  Similarly, Docker itself is expanding its native capabilities beyond a single node with Swarm, which seeks to expand the single-node-centric Docker API into a transparently multi-node API.
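The core job both Kubernetes and Swarm take on is placement: deciding which node gets which container. Here’s a deliberately naive first-fit sketch (the function, names, and capacities are mine, and real schedulers weigh far more than CPU):

```python
# Toy cluster scheduler: place each container onto the first node with
# enough free CPU, biggest containers first.

def schedule(containers, nodes):
    """containers: {name: cpu_needed}; nodes: {node: cpu_free}.
    Mutates nodes and returns {container: node} placements."""
    placement = {}
    for name, cpu in sorted(containers.items(), key=lambda kv: -kv[1]):
        for node, free in nodes.items():
            if free >= cpu:
                nodes[node] = free - cpu
                placement[name] = node
                break
        else:
            raise RuntimeError(f"no node can fit {name}")
    return placement

nodes = {"node-a": 4.0, "node-b": 2.0}
plan = schedule({"api": 3.0, "worker": 2.0, "cron": 0.5}, nodes)
# api -> node-a, worker -> node-b, cron -> node-a
```

Multiply this decision by thousands of containers with affinity, failure, and rebalancing constraints and you can see why placement is a project unto itself.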

…Looking Up
Bottom up we find initiatives to drag the base OS itself kicking and screaming into the cloud era.  Or, in the case of CoreOS, replace it entirely.  Forked from Chrome OS, CoreOS asks the question “is traditional Linux still applicable as the atomic unit at cloud scale?”  I believe the answer is no, even if I’m not committing to betting that the answer is Core.  In order to be a “Datacenter OS”, capabilities need to be there that go beyond Beowulf.  I’m not sure if it’s there yet, but CoreOS does provide Fleet, which is a native capability for pushing components out to cluster nodes.

Taking a less scorched earth approach, Apache Mesos aims more at being the modern expression of Beowulf.  More an extensible framework built on top of a set of base clustering capabilities for Linux, Mesos is extremely powerful at orchestrating infrastructure when the entire “Mesosphere” is considered.  For example, Chronos is “cron for Mesos” and provides cluster wide job scheduling.  Marathon takes this a step farther and provides cluster wide service init.  Incidentally, Twitter achieves their scale through Mesos, lest anyone think this is all smoke and mirrors! And of course the logical question here might be “does Mesos run on CoreOS?” And the answer is YES, just to keep things confusing.

What you Don’t See!
As mentioned above, Google, Microsoft, Amazon and Facebook all have “secret”, or even “not so secret” (Facebook published their orchestration as Open Compute) sauce to accomplish all, or some, of the above.  Make no mistake… This space is the future of computing and there is a talent land grab happening.

And the Winner Is!
Um… sure!  Honestly, this space is still shaking itself out.  There is a lot of overlap and I do feel there will need to be a lot of consolidation.  And ultimately, the promise of cloud is really just bringing all of this back to PaaS, but without the original PaaS limitations.  If I’m a developer, or a data scientist, I want to write effective code and push it out to a platform that keeps it running well, at scale, without me knowing (or caring) about the details.  I’m buying an SLA, not a container distribution system, as interesting as the plumbing may be!

Chat  —  Posted: May 25, 2015 in Computers and Internet

UPDATE: 6/12 – Huzzah!  The latest build of Windows 10 beta (fbl_impressive) fixes the issue!  Relief is on the horizon!

There’s been lots written about the relative merits of the Metro UI being applied to the traditional desktop OS.  This entry isn’t about coming late to that party.  If Windows 10 and Server ’12 tell us anything, it’s that the design aesthetic, at the very least, is here to stay for a bit and will continue to evolve.  Unfortunately, along with the stylistic changes introduced with Metro came a particularly annoying bug which impacts an admittedly niche (but popular) corner case.  If you’re a PC gamer who games on a laptop and runs a high resolution desktop (think QHD), then you know exactly where this is headed.  When the desktop is running native resolution (basically all the time), games will not scale to full screen unless they are also running native resolution (and unless you have a Titan X equipped laptop, this is “never”).  Here is the effect:


To experience this phenomenon, a few conditions have to be met:

  • Windows 8+
  • Touchscreen panel
  • Running native res
  • Trying to run lower than native res under DirectX

I will also throw in these two, but with a clarifier:

  • Running IntelHD (although this is basically all laptops with few exceptions)
  • Running NVIDIA Optimus (the problem does also occur in pure Intel setups.  Nearly all NVIDIA laptops are Optimus, so it’s not clear to me if a pure NVIDIA setup would be immune)

Potential exceptions might be rigs running the full desktop NVIDIA parts as the only GPU, or AMD parts leveraging the APU and/or mobile ATI parts.  What happens in a nutshell is this:

  • The Intel drivers do not allow you to independently set scaling behavior by resolution (so you can’t go into the control panel and say “for 720p, always scale”)
  • There is no “default scaling behavior” that you can set – if the desktop is at native resolution, “maintain aspect ratio” is hard set since it is the only setting that makes sense for native res
  • NVIDIA cedes control of scaling to Intel under Optimus since, I believe, the NVIDIA part has no physical path to the panel (it passes through the Intel and relies on it for panel setup)

This problem has been lingering for years.  A quick web search for “cannot run full-screen non-native res” will turn up posts as old as 2012.  The workaround thus far has been one of two things:

  • Run the desktop at something below native res (this sucks)
  • Set the scaling option to “scale up” under the Intel drivers

I recently switched to Windows 10 CTP and discovered that this problem persists.  The workaround, however, stayed the same.  Until 5/15.  The latest updates to Windows 10, the IntelHD Windows 10 driver, and the NVIDIA Windows 10 driver introduced a new dimension to the problem.  On the absolute latest Windows 10 and driver bits for Intel/NVIDIA, the above workaround no longer works. So are we all doomed to a life of postage stamp gaming?  No!  There is an actual workaround (well, actually I’ll discuss two).

First, the easy button.  Disable touch features in Windows.  For whatever reason, the touch panel HID driver is the actual root cause of this issue.  It can be disabled in Device Manager:


The other workaround is functional, but can be a bit onerous for anyone who has a large catalog of games.  The solution here is to utilize the Intel Profile Manager capability to trigger a res switch when a game is run.  That can be found in the Intel HD control panel under “Profiles”:


To set this up, do the following:

  • First, set your desktop resolution to the resolution you run full screen gaming at.  So for example, if you want the game to run 1080p, change your current default res to 1080p and set scaling to “scale fullscreen”.
  • Go into profiles (note “current settings” is what will be applied to the profile, this is why we set the resolution above) and set “trigger” to “application”
  • The “display” checkbox will be deselected, but you can reselect it
  • Browse to and select the appropriate EXE
  • Save the profile as something meaningful (e.g. Crysis)
  • Rinse/repeat for all games

If you only have a few favorite titles that need to run at lower than native res when fullscreen, this works fine.  If you have a big catalog, though, just disable the touchscreen.  Here’s hoping that this gets fixed before we move to “Windows as a service”!

Gettin’ FREAKy

Posted: March 7, 2015 in Computers and Internet

Don’t let the ridiculous, “stretches the definition to the breaking point”, acronym fool you, that’s just marketing after all; FREAK is serious business.  For those not yet aware, FREAK is an exploit designed to take advantage of a critical vulnerability in SSL/TLS.  For anyone who just said “uh oh”, that’s the spirit!  Better grab a coffee.  The “Factoring Attack on RSA Export Keys” (I know, I know… this doesn’t remotely spell “freak”. I told you the acronym was agonizing) is complex to implement, and requires a fairly sophisticated attack structure, but incredibly organized and sophisticated attackers are hardly in short supply in today’s threat environment.  Before getting into how the exploit works, some background is in order.

First is, of course, the “what” and “how” of SSL/TLS.  Secure Sockets Layer/Transport Layer Security, in a nutshell, are standard mechanisms for creating encrypted network connections between a client and a server.  SSL and TLS take care of the encryption piece: agreeing on a method of how to encrypt the data (cipher), generating and exchanging keys, and then performing the actual encryption/decryption.  Network transport depends on an encryption aware protocol like HTTPS or FTPS.  Here is a nice detailed flow diagram that illustrates the conversation between the client and server (courtesy of IdenTrustSSL):

If you take a close look at the above flow, you’ll notice that there are really two encryption stages.  Steps 1 – 3 are a standard PKI (Public Key Infrastructure) negotiation whereby a server is configured with a certificate (identifying it and providing authenticity assurance) and a public/private key pair.  When a client comes and says “hello!” (plus a random number… more on that later), the server sends on over its certificate and public key (and another random number… more later). The client then decides to trust the certificate (or not, and break the connection), and then sends over a new secret computed using the two random numbers we covered above, encrypted with the server’s public key.

The server decrypts this with its private key, takes the secret generated by the client and, combining it again with the random numbers, generates a new key which will now be used to secure the channel for the duration of the connection.
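A drastically simplified sketch of that key derivation step. Real TLS uses a specific PRF over the premaster secret; the HMAC construction below is just to illustrate that both sides can independently compute the same session key from the shared secret and the two hello randoms:

```python
# Simplified illustration of TLS-style session key derivation.
import hashlib
import hmac
import os

client_random = os.urandom(32)   # sent in the client's "hello!"
server_random = os.urandom(32)   # sent back by the server
premaster = os.urandom(48)       # client-generated secret, sent RSA-encrypted

def derive_session_key(secret, c_rand, s_rand):
    # combine the secret with both randoms (stand-in for the real TLS PRF)
    return hmac.new(secret, b"master" + c_rand + s_rand,
                    hashlib.sha256).digest()

# each side runs the same computation on the same inputs...
client_key = derive_session_key(premaster, client_random, server_random)
server_key = derive_session_key(premaster, client_random, server_random)

# ...and arrives at the same symmetric key for the channel
assert client_key == server_key
```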

Astute readers will notice that this means SSL and TLS are actually multi-layer encryption models utilizing both asymmetric encryption (separate public key and private key) for quick and easy setup (nothing needs to be shared between a client and a server up front), and symmetric encryption (a single key for encryption and decryption that both sides know… a much faster method, but one which requires pre-sharing).  It is the best of both worlds: the channel setup efficiency and low level of required preconfiguration characteristic of asymmetric encryption, plus the speed and added strength of symmetric.

In PKI methodology, the algorithm which generates keys should not allow factoring the private key from the public.   To achieve a reasonable level of security, dual key systems require a very high key strength.  Generally 1024 bits or greater.  Symmetric key schemes can get away with much lower strength – 128 bit or 256 bit being reasonable.  What does all of this mean?  Well let’s take one more step back and just review quickly what encryption really is (diagram lifted from PGP docs):

The above illustrates symmetric encryption, but the principle is always the same.  There is a message that two parties want to share.  They want it to be a secret so anyone who might intercept it or otherwise eavesdrop won’t understand.  For time immemorial messages have been kept secret using codes.  Encryption is a code.  The message is put through some fancy math, using a big complex number as a constant (the key), and a scrambled message is created.  To descramble it, the message, the method (cipher) and the decryption key are needed.  So when we say that symmetric relies on a 128 or 256 bit key, that’s the size of the numerical constant being used as a key (a big number). There is a lot (a LOT) more complexity here, but this is enough for the context of this entry.
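Here is that “fancy math plus a key” idea in its absolute simplest form: a repeating-key XOR toy. To be clear, this is NOT a real cipher and is trivially breakable; it only shows the scramble/descramble mechanics:

```python
# Toy "cipher": XOR each message byte against a repeating key.
# Illustrative only -- never use this to protect anything.

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"attack at dawn"
key = bytes([0x5F, 0x8A, 0x3C, 0x17])   # the "big number" used as the key

scrambled = xor_cipher(message, key)    # looks like nonsense to an eavesdropper
restored = xor_cipher(scrambled, key)   # the same key reverses the scramble

assert scrambled != message
assert restored == message
```

Real symmetric ciphers like AES do something conceptually similar, just with vastly more sophisticated math, which is why key size (128 or 256 bits) is what stands between the eavesdropper and the plaintext.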

Now obviously, there are many, many methods for actually encrypting data (the fancy math algorithm referenced above), and there are varying key strengths that can “work”.  Typically it’s all a trade off between performance (compute overhead), which means cost, and security.

If the message does get seized, however that was accomplished, the data thief has a bunch of scrambled nonsense.  But as with any code, it is possible to “brute force” decrypt the message.  Basically try every possible value as a key.  The catch is, with a large enough key, there just isn’t enough computing power available to try all of the combinations in a reasonable timeframe.  At least there hasn’t been until now.

Enter… The cloud.  I am a true believer when it comes to cloud.  That said, I recognize that any great good can also be twisted to serve evil.  In the case of cloud, nearly infinite compute capacity can be purchased on demand and paid for as an hourly commodity.  It’s absolutely standard today to model any computing task as a cost per hour directly mapped to a cloud service provider. And the resources can be provisioned programmatically.  What this means is that brute force operations that would have taken a desktop PC 100 years can now be carried out across 10,000 PCs in 10 days if you’re willing to spend the money.  Still expensive, still not worth it.  At least at 100 years.  Of course with the proliferation of botnets (or “dark cloud” as I like to call it), there may be no cost at all.  But let’s leave that aside for now.  What happens if the encryption is weak?
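Some back-of-envelope math on why parallel cloud capacity changes the brute force equation. The guessing rate and fleet size below are assumptions I picked for illustration, not benchmarks:

```python
# Assumed numbers: one machine tries a million keys per second, and we
# can rent 10,000 such machines from a cloud provider in parallel.
keys_per_sec = 1_000_000
keyspace = 2 ** 56                       # an export-era 56-bit key
fleet = 10_000

one_machine_years = keyspace / keys_per_sec / (3600 * 24 * 365)
fleet_days = keyspace / (keys_per_sec * fleet) / (3600 * 24)

print(round(one_machine_years), "years alone, vs",
      round(fleet_days), "days with a rented fleet")
```

The exact figures move with hardware and key size, but the shape of the curve is the point: parallelism turns “centuries” into “a billing cycle”.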

Enter… The export rules.  Way back when, the U.S. government made it illegal to export strong encryption.  Full stop.  The US was, of course, also pretty much defining the technology the world was adopting.  So what was considered “too strong”?  Anything over a 56-bit symmetric key system or a 512-bit asymmetric one.  Egads!  Over time this has strengthened (since it was ridiculous), and of course an admin could always simply force the strongest encryption (though that would mean geographically load balancing traffic to keep non-US clients outside of the US).

With this background in mind, what FREAK does is take advantage of a vulnerability in SSL/TLS (both client and server) which allows a bad packet to be injected into the up front client/server “hello!” exchange, selecting the weakest level of encryption.  What this means is that, in the case of any servers which still have support enabled for the earliest “exportable” key strength (which turns out to be a LOT of servers), the key strength drops to 512bit.
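The downgrade itself can be sketched in a few lines. The cipher names and negotiation below are invented stand-ins (real TLS hellos are binary structures), but the logic mirrors the attack: the man in the middle strips the strong suites from the client’s offer, and a server that still supports export grade happily agrees:

```python
# Toy model of the FREAK downgrade: invented cipher suite names.

client_offer = ["RSA_AES256", "RSA_AES128", "RSA_EXPORT_512"]
server_supports = {"RSA_AES256", "RSA_AES128", "RSA_EXPORT_512"}

def mitm_rewrite(offer):
    # the injected "bad packet": strip everything but export grade
    return [c for c in offer if "EXPORT" in c]

def server_pick(offer, supported):
    # server takes the first mutually supported suite from the offer
    for cipher in offer:
        if cipher in supported:
            return cipher
    raise ValueError("no common cipher")

honest = server_pick(client_offer, server_supports)            # strong suite
downgraded = server_pick(mitm_rewrite(client_offer), server_supports)
```

The fix is visible in the model too: remove `RSA_EXPORT_512` from `server_supports` and the tampered hello fails outright instead of downgrading.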

Now combine this with cloud capacity (dark or legit) and you have about an 8 hour, $100, computing challenge to brute force a private key from a server since you now have a nice packet capture at 512 bit strength to take offline.

Wait! Take offline? Why would that work? Well, here is the thing.  The asymmetric key pair hangs around a long time.  Sometimes like… forever! Literally.  Many servers only regenerate keys on reboot, and thanks to the miracles of high availability and the “pets” architecture approach (vs “cattle” in cloud design), that web front end may be online for years before rebooting.
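And this is why the offline work pays off: factor the public modulus and you hold the private key for as long as the server keeps it. A toy-sized demonstration (real attacks factor a 512-bit modulus with specialized algorithms, not trial division, but the payoff is identical):

```python
# Toy RSA key recovery: factoring the public modulus yields the
# private exponent. Numbers here are tiny textbook values.

def trial_factor(n):
    # naive odd-number trial division; fine for toy-sized moduli
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("no odd factor found")

p, q = 61, 53                        # the server's secret primes
n, e = p * q, 17                     # the public key: modulus and exponent

fp, fq = trial_factor(n)             # the attacker factors the modulus...
d = pow(e, -1, (fp - 1) * (fq - 1))  # ...and derives the private exponent

ciphertext = pow(65, e, n)           # traffic encrypted to the public key
assert pow(ciphertext, d, n) == 65   # attacker reads it with the recovered key
```

(The three-argument `pow` with a `-1` exponent for the modular inverse needs Python 3.8+.)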

So as you probably gathered, this requires a man in the middle:

This can be as easy as a bad actor on public wifi, or as complex as a compromised ISP router in the path of a high value server.  All very feasible to accomplish for a well funded black hat org.

Consider a real world potential scenario to imagine the possibilities… High value targets like Amex, Citibank and the FBI are vulnerable and have been for ten years (meaning export grade encryption is enabled and selectable).  On the client side, nearly every platform is vulnerable! The last piece is that what used to be the hard part, compromising the network path, has become easy thanks to public wifi ubiquity.  So let’s combine and imagine…

1) a popular public hotspot with weak security (WEP) and the key published on the register (or even not)

2) you hang out all day and capture wifi traffic

3) you either actually have the key, or you easily break it offline

4) with a nice unbound, shared media, network breached, you hang around and look for connections to the high value targets

5) you compromise the channel when you find one and start capturing weak encrypted traffic

6) traffic flow in hand, you brute force factor the key pair using about $100 of EC2 time

7) you are now free to watch and manipulate all traffic to the site until they change the key.  Which may never happen.

So what are the implications if the exploit is pulled off? Well… The attacker has the private key.  This means two big scary things:

1) until the keys are refactored they can instantly decrypt any traffic they can capture. Suddenly the expense of compromising an ISP path router just got a lot more realistic!

2) they can inject anything they want into an intercepted conversation

So do we turn off the interwebs??? Yes! Well no.  But this is a big one and a ton of high profile sites are impacted.  In my opinion a few things must happen:

1) weak encryption needs to be retired even as a configurable option.  And export rules need to be gone.  Strong encryption everywhere.  Let the NSA build bigger brute force machines

2) servers need to be updated ASAP.  Force strong encryption, disable export grade, and patch

3) keys need to be regenerated multiple times per day.  Yes, this is computationally expensive.  There are better ways to do this than “bigger web server” though.  Rearchitect. Design for fail.  HSM. The truth is out there.

4) clients need to be patched as soon as patches are ready.  Linux, OSX, Windows, IE, Chrome, Firefox, IOS, Android.  Yikes!

Can anything be done in the meantime? Mainly, be careful with public wifi (this is just a rule, really).  Stick with authenticated public wifi using stronger encryption, or use VPN.  VPN is just a “from/to” secure channel to bridge networks, so it isn’t a panacea here (after all, you can’t VPN direct to a public server), but it can help mitigate some risk exposure until the ecosystem is corrected.

Fun times!

Even as enterprises just start to wrap their minds around how cloud in general will transform the way they operate, the goal post is already moving forward.  If anyone out there has been looking for a final proof-point that the sphere of control has officially passed to the developer, this recent shift is all you need.  What am I on about this time, and what the heck does that title mean?  A little history is probably in order.

In the beginning, there were Mainframes, and they were good.  Developers wrote code and dropped it into a giant job engine which then charged them for the time it consumed.  Paying too much?  Well time to optimize your code or rethink your value proposition.  This worked for quite a while, but as technology evolved it inevitably commoditized and miniaturized and as a result became far more available.  Why wait in a queue for expensive processing time, purchased from a monopoly, when you could put it on your desk?  The mini-computer revolution was here, quickly giving way to the microcomputer revolution, and it was also all good.

Computers stranded on desks aren’t particularly useful though, so technology provided an answer in the form of local area networks, which quickly evolved into wide area networks, which ultimately enabled the evolution of what we today call the Internet.  All of these things were also good.  As technology continued to commoditize, it became a commonplace consumer product like a car or a toaster.  Emerging generations were growing up as reflexive users of technology, and their expectations were increasingly complex.

To keep up, companies found they had to move fast.  Faster than IT departments were able to.  Keeping track of thousands of computers, and operating the big expensive datacenter facilities they lived in, was certainly an “easier said than done” proposition.  By the mid 2000s, rapidly evolving agility in software came to the rescue of what was, in essence, a hardware management problem.  Virtualization redefined what “computer” really means, and operating systems became applications that could be deployed, removed and moved around far more easily than a physical box.  This was also good, but in reality only bought IT departments a few years.  The promise of virtualization was never fully exploited by most, since the toughest challenge is almost always refining old process and, at the end of the day, there were still physical computers somewhere underneath all of that complex software.

In the last years of the last decade, a new concept called “cloud” grew out of multiple converging technologies and was the catalyst that literally blew the lid off of the IT pressure cooker.  If you think about how technology is consumed and used in any business, you have folks who look after, translate and then solve business problems (business analysts, developers, and specialists) and then you have the folks who provide them with generic technical services to get their work done (security folks, operations and engineering folks and support professionals).  By the time “cloud” arrived in a meaningful way, the gap between technology folks in the lines of business, and the technology folks in core IT, had grown to dangerous proportions.  In short, the business lines were ready for new alternatives.  From cloud providers they found the ability to buy resources in abstract chunks and focus primarily on building and running their applications.

This trend has transformed IT and we are in the midst of its impact.  The thing is, though, that technology adoption cycles at the infrastructure layer (once reduced to glacial pace by the limits of core IT adoption abilities) will now rapidly accelerate.  Developers are expecting them to since the promise of cloud is to bring them all of the efficiencies of  emerging technology with none of the complexity.

This is why, barely 2 years into the shift to cloud design patterns and cloud service consumption and operation models, we are already seeing a shift to containers (and also “SDDC”, but that’s a topic for another day).  What do these technologies mean to the new wave of IT folks though?  Well, first let’s take a look at what we’re actually dealing with.  I will focus on two rival approaches.  The first is Amazon’s “Elastic BeanStalk”, which is how AWS answers the “Platform as a Service” question, and the next is the traditional “Platform as a Service” approach, and more recently the “Containers as a Service” (for lack of a better term) approach being provided by Google and Microsoft.  To kick things off, a quick diagram:

So what the heck is this about?  A few quick definitions:

  • Code – as implied, this represents a developer with a set of Ruby, Python, Java, or .NET code ready to be deployed onto some mix of compute, networking and storage
  • API – in the world of cloud, where developers rule, the API reigns supreme.  At code complete, developers will look for an API to interact with in order to deploy it
  • Orchestrator – if YOU don’t have to care about pesky things like servers and disks, SOMEONE must right?  Well that someone in this case is a smart piece of code called an orchestrator that will make some decisions about how to get your code running
  • Fabric Controller – the true secret sauce of cloud.  The brilliant code which allows mega providers to out-operate IT departments 1000 fold.  Think of the fabric controller as the fully realized Utopian dream state of virtualization, and “software defined everything”.  The fabric controller is able to manage a fleet of servers and disks and parcel their capacity out to customers in a secure, efficient, and highly available way that still turns a profit.
  • Instance/VM – Amazon calls them instances, everyone else calls them Virtual Machines.  It’s a regular operating system like Windows or Linux running on top of a hypervisor which in turn runs on top of a physical server (host) – OS as code.  The fabric controller monitors, provisions, deprovisions and configures both the physical servers and the virtual servers that run on top of them.
  • Container – the guest of honor here today.  The ultimate evolution of technology that started with the old concept of “resource sharing and partitioning” back in the Sun Solaris days and continued with “application virtualization” like SoftGrid (today Microsoft App-V).  The same way a hypervisor can isolate multiple versions of an operating system running together on one physical machine, a container controller can isolate multiple applications running together in one operating system.  Put the two together and you have the potential for really good resource utilization

With the definitions out of the way, let’s take a look at how AWS does things with BeanStalk.  Very simply, BeanStalk ingests the code that you upload and takes a look at the parameters (provided as metadata) that you have submitted with it.  The magic happens in that metadata, since it is what defines the rules of the road in terms of how you expect your application to operate.  Lots of RAM, lots of CPU, not much RAM, more CPU than RAM… this is the sort of thing we’re talking about.  The BeanStalk orchestrator then goes ahead and starts making provisioning requests to the fabric controller, which provisions EC2 resources (instances) accordingly and configures cool plumbing like autoscaling and elastic load balancers to allow the application to gracefully scale up, and down, and function.  Without caring about much at all, you (as the developer), assuming your code works and you defined your parameters well, are in production and paying for resources consumed (hold onto that thought) immediately.
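A hypothetical sketch of that metadata-driven flow. Every field name below is invented; it just shows an orchestrator turning developer-supplied parameters into instance counts to request from the fabric controller:

```python
# Toy orchestrator: developer metadata in, provisioning decision out.

metadata = {
    "instance_ram_gb": 4,
    "min_instances": 2,
    "max_instances": 10,
    "autoscale_cpu_pct": 70,   # add an instance above this utilization
}

def plan_deployment(meta, current_cpu_pct=0, running=0):
    """Return how many instances the fabric controller should run."""
    desired = max(meta["min_instances"], running)
    if current_cpu_pct > meta["autoscale_cpu_pct"]:
        desired += 1                          # scale up under load
    return min(desired, meta["max_instances"])  # but respect the ceiling

initial = plan_deployment(metadata)                            # first deploy
under_load = plan_deployment(metadata, current_cpu_pct=85, running=4)
```

The developer never names a server; they state intent in metadata and the control plane does the rest, which is exactly the appeal.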

OK, that makes sense.  It’s basically “autoprovision my infrastructure so I don’t have to think about it”.  The developer dream of killing off their core IT counterparts.  Microsoft explored the same concepts ages ago with the Dynamic System Initiative and the Software Definition Model and ultimately (sort of) evolved them into Azure.  So how is PaaS different? And for that matter what the heck is “Containerized Infrastructure”?

Platform as a Service (PaaS) can be thought of as the final endgame.  Ironically, we got there first.  Google got the ball rolling with AppEngine back in 2008. Microsoft actually led with PaaS in cloud a couple of years later, in 2010, but for various reasons (a not so hot initial implementation, a customer segment that wasn’t ready, a hard tie-in at the time to .NET) they had to quickly backpedal a bit in order to get traction.  In the meantime Amazon was piling on marketshare and mindshare year over year with pure Infrastructure as a Service (basically “pay as you go” virtual machines) and Storage as a Service plays.

What PaaS provides, ultimately, is something akin to the original Mainframe model.  You push code into the platform, and it runs.  It reports back on how many resources you’re consuming and charges you for them.  Ideally it is the ultimate layer of abstraction, where cumbersome constructs like virtual machine boundaries, or where things are actually running, are fully obfuscated.  No PaaS really works quite that way though.  What they really do is utilize a combination of containers and virtual machines.  This brings us to today, where “container solutions” like Docker are gaining lots of traction on premises and Google and Microsoft are both educating developers on being container aware in the cloud.  Google has added the Docker compatible, Kubernetes based “Container Engine” to their original “Compute Engine” IaaS offering, and Microsoft has expanded their support for “Windows Server Containers” to include interoperability with Docker as well.

In the container model, a developer still submits code and metadata through an API, but what happens next diverges from what happens in BeanStalk.  The orchestrator has more options.  In addition to asking for virtual machines from a fabric controller, it can also create a container on an existing virtual machine that has available resources.  The same way virtual machines maximize resource utilization of a host, containers maximize resource utilization of each virtual machine.
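That extra option can be sketched as a first-fit decision (an entirely hypothetical structure; real orchestrators track much more than CPU):

```python
# Toy container-aware placement: reuse an existing VM if one has spare
# capacity, otherwise ask the fabric controller for a fresh VM.

def place_container(cpu_needed, vms, new_vm_cpu=4.0):
    """vms: list of {"free": cpu} dicts. Mutates vms; returns the VM index."""
    for i, vm in enumerate(vms):
        if vm["free"] >= cpu_needed:         # pack into an existing VM
            vm["free"] -= cpu_needed
            return i
    # nothing fits: provision a new VM and place the container there
    vms.append({"free": new_vm_cpu - cpu_needed})
    return len(vms) - 1

fleet = [{"free": 1.0}, {"free": 2.5}]       # two VMs with spare CPU
spot_a = place_container(2.0, fleet)         # fits in the second VM
spot_b = place_container(3.0, fleet)         # nothing fits: new VM appears
```

In the BeanStalk model only the fallback branch exists, which is exactly why a containerized back end can squeeze more work out of the same instance hours.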

Now if you’re thinking to yourself “why should I care?” then congratulations!  You get the cloud gold star of the day! I mean, if we think about it, the point of cloud is that I really don’t care about what’s going on with infrastructure, so why should it matter to me how the provider is serving up the resources?  Well, there are two primary reasons:

  • Economics – the whole point of this is doing more, and doing it more quickly, while spending less money.  This is why cloud is unstoppable and CFOs love it.  Despite protests to the contrary, it is proving cheaper than the legacy IT approach (no one is shocked by this except those with a legacy IT bias who have never deeply studied enterprise TCO).  With BeanStalk, you have a fairly resource heavy approach.  The atomic unit for scaling your app is a virtual machine.  As your app grows, it needs to grow in instance based chunks and you will pay for that in instance hour charges.  In theory a containerized back-end is more resource effective and should be more cost efficient.  In reality this will vary widely by use case which brings us to the second point…
  • Application Architecture – cloud design patterns are a fascinating turn around for development.  Enterprise developers spent years, and technology providers built a plethora of technology and process, in order to create invulnerable platforms for code.  Fault tolerance and high availability are industries because of this.  Cloud basically throws all of that away.  The mantra in cloud is “infrastructure is a disposable commodity” (this is the brilliant “Pets vs Cattle” analogy coined by Gavin McCance at CERN back in ’12)  The idea in cloud design is to design for fail.  You build resilience and statelessness into the application architecture and rely on smart orchestration to provide a consistent foundation even as individual components come and go.  Containers are a natural extension of this concept, extending control down into each virtual machine.  Container architectures can allow something like this:


Obviously a hypothetical example, but a hint of what ultimately could be possible.  If you think along the lines of app components, rather than an OS, being the atomic units, you can start to think about which components might benefit from co-location on a single machine.  Intra-OS isolation can potentially allow scenarios that traditionally might have been impractical.  So as you map out an architecture plan, you can begin to group application components by where they fit in the overall solution and allow the container orchestrator to place them accordingly.  In the example above we have front-end processing co-mingling with the web tier while app logic code co-mingles with data.  Again, this isn’t the best example, but it is definitely easy for illustration.  Personally I think we are at the dawn of what can be accomplished with this next move forward.  Now if enterprises can just catch up with the last one.  They’d better hurry because, before you know it, containers direct on hypervisor will be here to really mix things up!