Sync My Clouds!

Posted: July 29, 2014 in Computers and Internet

As cloud services mature, one of the trickiest problems is definitely data sprawl. Rationalization and migration of data become a challenge as information spreads across multiple services. If you consider music as an example, it is definitely possible to end up with a collection that spans Amazon Music, Google Music and iTunes. One of the only real ways to keep those particular services synchronized is to source them from a common distribution point, preferably living on a pure storage service. Of course, depending on the size of your collection, this can require a fairly significant investment in cloud storage. In recent months, though, there has been an incredible land grab for consumer business that has seen rates for storage drop dramatically. Currently, this is how my personal spend/GB looks:

 

  • DropBox: 100GB base storage for the subscription tier, plus 7GB extra (bonus, referral, etc.) – $10/month
  • OneDrive: 1,020GB base, plus 10GB extra – $11/month (Office365 Home sub – lots more than just storage in here – plus 20GB base storage)
  • Google Drive: 100GB base, plus 16GB extra – $2/month (includes Gmail and Google+)

Pretty impressive! Tallying things up, we’re looking at a total spend of $23 per month, which provides:

  • 1253GB storage across 3 providers
  • Office 365 access (mail, SharePoint, Office Web Applications)
  • Office local install for Mac, PC, Android, iOS (multiple machines)
  • Live Mail, GMail, Google Plus
  • Desktop/device integration for all providers

To me this seemed like a fantastic deal: for less than $25 a month, 1.24TB in the cloud is a ton of storage.  As a result, over the past few months, I have been shifting to a cloud-only model for data storage.  The way I decided to run things was to make DropBox my primary storage service.  Despite having by far the worst economics (ironically, DropBox has become ridiculously expensive compared to the competition), it has (IMO) the best client integration experience, a result of the service’s maturity.

So with DropBox in the prime spot, the next challenge was figuring out a plan for the secondary services.  At first I tried a model where I would assign use cases to each service: music on Google only, pictures on OneDrive only, documents across all 3.  This quickly fell apart, as you wind up in a model where you need to selectively sync the secondary services, and you lose redundancy for some key use cases.  In analyzing my total usage pattern, though, I found that as a high watermark I consume 75GB of space in the cloud (including documents, photos and music).  With the current $/GB rates, this data volume can easily fit in all 3 providers.  Realizing this, I quickly moved to a hub/spoke sync model where I utilize OneDrive and Google Drive for backup/redundancy and DropBox becomes the master.  Of course, the logistics of this proved very challenging, requiring a middle-man client to funnel the data around (roughly the approach sketched below).  There had to be a better way. Wasn’t this a great idea for a startup? Well… Enter CloudHQ!
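For the record, the middle-man funnel amounted to roughly this hedged sketch: a machine running all three desktop sync clients, with rsync mirroring the DropBox master into the other providers’ local folders (the folder paths are assumptions):

# Mirror the DropBox master into the OneDrive and Google Drive synced folders and
# let each provider's own client push the changes up from there
rsync -av --delete ~/Dropbox/ ~/OneDrive/Dropbox-mirror/
rsync -av --delete ~/Dropbox/ "$HOME/Google Drive/Dropbox-mirror/"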

CloudHQ aims to provide a solution to the monumental task of cloud data sync.  As a premise it sounds amazing!  Just register with these guys, add your services, create some pairings, and let their workflow (and pipes) do the rest.  I’ve been tracking these guys for a while and it appears they are delivering. Of course, the challenge is that to do meaningful work (more than one pairing) you need to pony up to the commercial level.  I held off a while to see how their service would mature.  Recently, though, they had a price drop that I feel represents a fantastic deal.  I was able to get onboard with the Premium level subscription for $119 by committing to 1 year. $10 a month is just a terrific price for a service like this, so hopefully this price will lock in moving forward.  Of course the service does have to work or it’s not such a great price, right?  Well, let’s see how things went!

First off… The sign-up and setup process was fantastic.  I actually went through the entire setup on an iPhone over lunch using my Google OpenID as the login.  Once signed up, you can jump right in and get started.  Here is a shot of the basic mobile UI:

2014-07-29 17.05.00

 

I love how clean this is. It’s very clear how to get started creating sync pairs using the supported services.  Clicking one of those options will trigger a guided workflow.  In addition, you can set up your own sync pairs manually.  Either option brings you to service registration:

2014-07-29 17.06.22

CloudHQ currently supports a very nice set of services.  Supported services view from the desktop UI:

Screenshot 2014-07-29 21.38.15

 

Once services are registered and sync pairs created, the service runs in a lights-out fashion.  Updates are emailed daily and a final update message goes out once the initial sync is completed.  The stages break down as follows:

  • Initial indexing and metadata population
  • Service sync (bidirectional)
  • Initial seeding complete
  • Incremental sync process runs indefinitely

In my case, there was about 75GB of data or so in play.  The biggest share was on DropBox and there was a stale copy of some of the DropBox data already sitting on both OneDrive and Google Drive.  In addition, there was a batch of data on both OneDrive and Google Drive that did not exist on DropBox.  The breakdown was roughly as follows:

  • DropBox – 56GB or so of pictures, documents and video
  • OneDrive – subset of DropBox content, roughly 5GB of picture data and 3GB of eBooks
  • Google Drive – subset of DropBox content, roughly 12GB of music and 5GB of picture data

The picture data was largely duplicated.  In approximate numbers, about 40GB had to flow into OneDrive and Google Drive and about 15GB had to flow into DropBox.  Keeping an eye on sync status in the UI is terrific:

2014-07-29 17.06.35

 

In the desktop UI, there is great detail:

Screenshot 2014-07-29 22.20.02

 

The email updates are great.  Here is a sample of the initial email:

Screenshot 2014-07-29 22.25.24

These updates are very straightforward and arrive daily.  Each pair, and the transfer activity for that pair, is represented.  In addition, there is a weekly report which provides a rollup summary:

Screenshot 2014-07-29 22.26.07

So how did the service do?  Quite well actually.  Here is my experience in terms of performance:

  • Account Created, services registered, pairs added: 7/26 – 12:30PM
  • Indexing and initial metadata population complete, Evernote backup complete: 7/26 – 9:52PM
  • DropBox to GMail complete, DropBox to OneDrive partial – 63GB copied: 7/29 – 10:30PM

No conflicts occurred and there have been no problems with any of the attached volumes.  I have to say I am extremely impressed with CloudHQ so far and pushing 63GB of bits around in a matter of 3 days is a fantastic “time to sync state”.

As my experience with the service increases I will continue to post updates, so stay tuned!

Upgrades!

Posted: July 12, 2014 in Computers and Internet

Well there is truly no rest for the weary. Or is it the wicked? Let’s compromise and say in this case it’s both! It’s no surprise that even a really sweet piece of kit like the Dell T620 isn’t going to stay stock for long at ComplaintsHQ where “live to mod” is a life motto. Luckily the recent generosity of family members wise enough to provide MicroCenter gift cards as presents provided just the excuse required to get some new parts.

It was hot on the heels of the initial install of the Dell that we added an SSD for vSAN testing and two ATI cards for vDGA View testing. Honestly though, vDGA isn’t cool. You know what’s cool? vSGA! For those saying “uh, what?”, both of these are technologies which allow a hardware GPU installed in the host to be surfaced in the guest OS (View desktops generally). With vDGA, a single GPU is dedicated to a single guest OS via Intel VT-d or AMD-Vi (IOMMU remap/directed IO technologies which allow a guest OS to directly access host hardware). This does work, but it obviously isn’t very scalable, nor is it a particularly elegant virtualization solution. vSGA, on the other hand, allows a GPU installed in the host to be virtualized and shared. The downside is that there is a (very) short list of supported boards, none of which I had on the shelf. The last item on the “to do” list from the initial setup was to get some sort of automated, UPS-driven shutdown of the guests and host in the (likely around here) event of power failure.

The current status to date (prior to the new upgrades) was that I had an old Intel X25 80GB SSD successfully installed and shared to the nested ESXi hosts (and successfully recognized as SSD) and vSAN installed and running. I also had a View config set up with a small amount of SSD allocated for temporary storage. With aspirations of testing both vSAN and running View, 80GB of SSD really is tight, so beyond saying “OK, it works!” not much could actually be done with this setup. Since SSDs are cheap and getting cheaper, I decided to grab this guy on super sale at MicroCenter for $99:

2014-07-12 15.52.02

While there I also picked up a small carrier to mount both SSDs in. I decided to also utilize some rails and mount the SSDs properly in one of the available 5.25″ bays:

2014-07-12 16.00.03

The vSGA situation is certainly trickier than simply adding a budget SSD, but perusing eBay the other day I happened upon a great find, so, since I was upgrading anyhow, I jumped on it: not only one of the few supported cards, but an actual Dell OEM variant, for $225:

quadro4000

 

Another refinement I’ve been wanting to make to the server is adding power supply redundancy (mainly because I can leave no bay unfilled!).  I’ve committed to definitely resolving my UPS-driven auto-shutdown challenge this round, so while not necessary, the redundant supply fits the theme well.  Luckily eBay yielded some more good results.  Dell OEM at $145:

2014-07-12 14.32.23

On the UPS side, you may remember that during the initial install of the server I had added in a BackUPS 1500 to run the ReadyNAS and the T620.  Unfortunately,  APC is a pain in the ass and VMware doesn’t make it any better.  Getting the ReadyNAS on managed UPS backup is as easy as plugging the USB cable in and clicking a checkbox using any APC unit.  In VMware, this is pretty much impossible.  Unless you buy not only the highest end of the SmartUPS line, but also buy the optional UPS network card (hundreds more), there is really no native support to be found.  I had explored some options using USB passthrough from the host to a Windows guest, combined with some great open source tools like apcupsd and Network UPS Tools.  I never quite got things working the way I wanted though.  More on that later…

OK, so that is the parts list!  Total damage for all of the above was $900.  Steep, but almost half of it was actually the UPS.  As always, there is no better way to start healing from the emotional trauma of spending money than to start installing!  Let’s begin with the super easy stuff: the PSU.  I can honestly say that installing a new hot-swap supply in a T620 couldn’t be any easier.  First step is to access the back of the case and pop off the PSU bay cover (it pops right out):

2014-07-12 16.02.19

With the bay open, you literally just slide the new supply in and push gently (you will feel the connector catch and seat):

2014-07-12 16.03.06

Once installed, head into iDRAC to complete the power supply reconfiguration.  The options are very basic.  You can either enable or disable PSU hot sparing once the new one is in (and set which one is primary) and you can enable input power redundancy:

Screenshot 2014-07-12 18.28.55

OK, back to the UPS quandary! The general idea of VM based UPS control is as follows:

  • plug in UPS, plug server into UPS
  • attach UPS USB cable to server
  • enable passthrough for the USB channel (requires AMD-Vi or Intel VT-d, under Advanced Options in the Server Configuration in the VIM client)
  • add the USB device to a Windows (or Linux) guest VM
  • install the open source APC driver
  • install NUT
  • develop a script that fires off scripts on the ESX host prior to executing a VM shutdown (the host scripts will ultimately pull the rug out from under the UPS host VM which is fine)
  • make sure that VMware tools is installed in all VMs so they can be gracefully shutdown by the host
  • utilize either WOL (or an awesome ILO board like the iDRAC) to ensure that the server can be remotely brought back (see the sketch after this list)
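For reference, here is a hedged sketch of that last “bring it back” step; the MAC address and the iDRAC IP/credentials below are placeholders, not my actual values, and the IPMI route assumes IPMI over LAN is enabled on the iDRAC:

# Send a WOL magic packet to the server's NIC from any machine that survives the outage
wakeonlan 00:11:22:33:44:55
# ...or, going the iDRAC route, power the box back on over IPMI
ipmitool -I lanplus -H 192.168.1.120 -U idrac_user -P idrac_password chassis power on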

Since I was in a spending mood, I decided to add a companion to my BackUPS 1500 just for the server.  Here she is:

2014-07-12 19.49.55

That is the SmartUPS 1000 2RU rack mount version.  So problem solved, right?  Yeah, no.  But before we get into that, let’s get this beast set up.  First the batteries have to be installed.  The front bezel pops off (it actually comes off and I popped it in for this photo), revealing a removable panel:

2014-07-12 19.49.36

A single thumb screw holds the panel in place.  Removing it allows the panel to be slid left and pulled forward, revealing the battery compartment.  As always, the battery is pulled out by the plastic tabs, flipped over, and put back in, where it will now snap into place (its own weight is enough to really seat it well if the unit is angled a bit).  The final product will look like this:

2014-07-12 19.49.02

In terms of connectivity, here is what you get (not joking):

2014-07-12 19.50.15

Yes, this is *one* USB cable and that’s *it* for $450!

Now, let’s take a look at what APC requires for VMware host support:

  • a SmartUPS unit – check, we have this one
  • the optional network card – bzzzt… nope
  • serial only connection to the host – bzzzt… nope! (THIS one really pissed me off)

So somehow APC can’t figure out how to get a USB-connected UPS working on ESXi, and the latest SmartUPS somehow has no included serial cable.  Really fantastic!  I considered a few options, including attempting a DB9 to USB conversion using the RJ45 to USB cable from my lesser BackUPS 750, but I shot all of them down.  USB to serial requires driver support and there is zero chance of getting that working on the host.  Another option I considered was publishing serial over the network, but this seemed like a poor approach as well.  At this point, I was stumped and seriously considering returning the seemingly useless SmartUPS to MicroCenter.  Before packing it in, I decided to try one more approach.

Returning to the basic architecture I had planned for the BackUPS, but this time using the native PowerChute Business app included with the SmartUPS (at least it comes with something useful!), I set up UPS support on my vCenter server.  Passing USB through from the host worked, and the PowerChute server, console and agent installed without a hitch and successfully located the UPS.  So far so good!

The critical step was now to figure out a way to get the vCenter guest to shutdown all of the VMs and the server once PowerChute detected a power event.  Luckily, it wasn’t too difficult and I was able to find this awesome script to handle the ESX side.  Here is the logic:

  • add a custom command in PowerChute.  The custom command calls Putty from the command line with the option to run a script on the host upon connection.  The command is inserted into “batchfile_name.cmd” in the APC\agents\commandfiles directory and should be formatted like this:
@SMART "" "C:\Program Files (x86)\putty\putty.exe" -ssh -l login -pw password -m C:\script.sh
  • the contents of “script.sh” are that awesome script, shown in full below.  The gist of it is:
    • use the ESXi command line tools to enumerate all VMs (basic string processing on the output of a list command)
    • feed that list into a loop that suspends any VM that is powered on
    • shut down (suspend) the host

Here are the contents of the script:

#!/bin/sh
# Enumerate all registered VM ids (strip the header line, keep the first column)
VMS=`vim-cmd vmsvc/getallvms | grep -v Vmid | awk '{print $1}'`

# Suspend every VM that is currently powered on
for VM in $VMS ; do
  PWR=`vim-cmd vmsvc/power.getstate $VM | grep -v "Retrieved runtime info"`
  if [ "$PWR" = "Powered on" ] ; then
    name=`vim-cmd vmsvc/get.config $VM | grep -i "name =" | awk '{print $3}' | head -1 | cut -d "\"" -f2`
    echo "Powered on: $name"
    echo "Suspending: $name"
    vim-cmd vmsvc/power.suspend $VM > /dev/null &
  fi
done

# Wait until no VM reports "Powered on" any longer
while true ; do
  RUNNING=0
  for VM in $VMS ; do
    PWR=`vim-cmd vmsvc/power.getstate $VM | grep -v "Retrieved runtime info"`
    if [ "$PWR" = "Powered on" ] ; then
      echo "Waiting..."
      RUNNING=1
    fi
  done
  if [ $RUNNING -eq 0 ] ; then
    echo "Gone..."
    break
  fi
  sleep 1
done

# With all guests suspended, put the host itself into standby
echo "Now we suspend the Host..."
vim-cmd hostsvc/standby_mode_enter

I am happy to say that it worked like a charm, successfully shutting down all VMs cleanly and bringing down the host!  You can set some delays in PowerChute; I set them to 8 minutes for the OS shutdown and 8 minutes as the time required for the custom command to run, but it really won’t matter since the custom command will kill the VM (and PowerChute) anyhow.

A couple of things to be aware of with this approach:

  • the PCBE Agent Service needs “interact with desktop” checked on newer versions of Windows (2k8+).  Make sure to run the SSH client once outside of the script first to deal with any interaction it needs to do (saving fingerprint, etc)
  • the USB passthrough can be a bit flaky in that the USB device doesn’t seem to be available right at first OS boot (so the service may not see the UPS).  Eventually it does refresh and catch up on its own, however

Coming up soon will be the Quadro install and the SSD setup, followed (finally) by some notes on vSAN and accelerated View (both vDGA and vSGA), so stay tuned!


The VMware NGC client is definitely super convenient, being entirely browser-based, but the legacy client undoubtedly had its charms. Chief among those charms is the ability to manage an actual ESXi host rather than just a vCenter instance. Except on a Mac, where it doesn’t work at all. Admittedly this isn’t a huge issue for production, where vCenter will be highly available and the admin console is unlikely to be a Mac, but in a home lab it becomes a huge issue. The solution? Enter WineBottler!

For those not familiar, WINE is a recursive acronym that stands for “Wine Is Not an Emulator”. It dates back to the early days of Linux (1993) and the idea is to provide a self-contained Windows API experience on *NIX systems. In a very real way WINE is one of the earliest runs at application virtualization. It’s an extremely nifty idea but, as with all cross-platform “unofficial” app virtualization technologies, it is not 100% effective. The VIM client falls into the edge cases that require some tweaking to get working. The good news, though, is that it can be done:

Screenshot 2014-07-11 04.37.44

OK, with the proof of life out of the way, let’s walk through exactly what it takes to get this thing working step-by-step.  Note that it will not work straight out of the box.  It will fail and need to be remediated.

Step 1: Download and install WineBottler.  This article is based on the (at time of publication) current stable release 1.6.1.

Step 2: With WineBottler installed, download the MSXML Framework version 3.0 and copy it into the “Winetricks” folder (/Users/username/.cache/winetricks/msxml3).  “Winetricks” are component installs that WINE can inject into the container during packaging (middleware, support packages, etc.).  VIM requires .NET 3.5 SP1, which WineBottler includes as standard, but it also requires MSXML version 3.0, which it does not.  The first pass through packaging will generate an error if this step isn’t completed, but the errors are extremely helpful and will provide both a download link for the missing package and the path to copy it to (so no fear if you miss this step).
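For reference, a hedged Terminal version of that copy; the installer file name (msxml3.msi) is an assumption, so use whatever name the WineBottler error message/download link actually gives you:

mkdir -p ~/.cache/winetricks/msxml3
cp ~/Downloads/msxml3.msi ~/.cache/winetricks/msxml3/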

Step 3: We’re now ready to bottle up some WINE!  Launch the WineBottler app and click the “Advanced” tab:

Screenshot 2014-07-11 10.42.58

Lots to explain here, so let’s take it one component at a time.

Prefix Template:  this option refers to the actual app container (the virtual environment that WineBottler creates during this sequencing step for the application).  This can be either a new container, or one based on a previously created template.  For now we are creating a new template, but later we will be reusing it.

Program to Install: this is the application we are virtualizing.  In our case, at this stage, we want the actual VIM install package (VMware-viclient-all-5.5.0-1281650.exe) which can be downloaded directly from the host at https://esxi-hostname.  This is an installer, so we want to select that option.  Later on we will be repeating this with the actual app, but for now we are going to use the installer to lay the groundwork.

Winetricks: as discussed, these are optional component installs.  Here we want to check “.NET 3.5 SP1”.

Native DLL Overrides:  as the name implies, this powerful option gives us the ability to supplement or override a standard Windows DLL with an out-of-band version included here.  Huge potential with this one, but we do not need it for our purposes.

Bundle:  another powerful option, this gives us the ability to create a standalone WINE container app.  With this option, the OSX app file created could be copied over to another machine and run without having to install WINE.

Runtime Options, Version, Identifier, Codesign Identity:  these are our important packaging options.  Runtime Options, as implied, allows us to tweak settings at packaging time; none are required for our case here.  Version is an admin process option that allows you to version your containers.  Identifier is extremely important because the container path in the OSX filesystem will be named using the Identifier as a prefix, so use a name that makes sense and make a note of it.  I used “com.vmware.vim”.  Codesign Identity is also an admin process field, allowing you to sign the package for validation.

Silent Install:  allows you to run silent for most of the install (WINE will “auto-click” through the installers).  I left this unchecked.

Once you have checked off .NET 3.5 SP1 Winetrick and assigned an Identifier, click “Install”.  You will be asked to provide a name and location for the OSX app that will be created by the sequencing process:

Screenshot 2014-07-11 10.59.23

 

Step 4: walk through the install.  The install will now kick off in a partially unattended fashion, so watch for the dialogue prompts.  If the overall sequencer Install progress bar stalls, there is a good chance a minimized Windows installer is waiting for input:

Screenshot 2014-07-11 10.59.36

The Windows installer bits will look familiar and will be the base versions of .NET that WINE wants, the .NET 3.5 SP1 option that we selected, and the MSXML 3.0 package that is required.  The process will kickoff with .NET 2.0:

Screenshot 2014-07-11 10.59.58 Screenshot 2014-07-11 11.00.16

You’ll have to click “Finish” as each step completes and at times (during .NET 3.0), the installer will go silent or will act strangely (flashing focus on and off as it rapidly cycles through dialogues unattended).  At times you may need to pull focus back to keep things moving.  Once the .NET 2.0 setup is done, you will get a Windows “restart” prompt.  Weird I know, but definitely perform this step:

Screenshot 2014-07-11 11.10.51

During the XPS Essentials pack installation (part of base WINE package) you will also be prompted about component registration.  Go ahead and register:

Screenshot 2014-07-11 11.12.42

The XML Parser component install (part of base WINE package) will require user registration.  Go ahead and complete it:

Screenshot 2014-07-11 11.14.25

 

.NET 2.0 SP2 will require another restart. Go ahead and do that:

Screenshot 2014-07-11 11.20.34

 

 

With all of the pre-requisites out of the way, the core VIM install will finally extract and kick off:

Screenshot 2014-07-11 11.21.47

You will see the VIM Installer warning about XP.  You can ignore this.  I was able to connect to vCenter without issue:

Screenshot 2014-07-11 11.22.40

The install will now look and feel normal for a bit:

Screenshot 2014-07-11 11.24.22

Until… dum dum duuuuuuuum.  This happens:

hcmon error picture

HCMON is the USB driver for the VMRC remote console (a super awesome VMware feature).  Long story short, for whatever reason, it doesn’t work in WINE.  Have no fear though, this entry is all about getting this working (minus the console capability, sorry!).  Do not OK this dialogue box.  Pause here.

Step 5:  once we acknowledge that dialogue, the installer will roll back and delete the installation, which is currently being held in temp storage by WineBottler.  We want to grab that before this happens and put it somewhere safe.  So before clicking OK, go over to /tmp/winebottler_1405091227/nospace/wineprefix/drive_c/Program Files/VMware.  Copy the entire “Infrastructure” folder and paste it somewhere safe, then rename it:

Screenshot 2014-07-11 11.34.11

I dropped it into my Documents folder and renamed it “VMW”.  What we are looking for is to make sure that “Infrastructure/Virtual Infrastructure Client” is fully populated:

Screenshot 2014-07-11 11.36.24

We can now click “OK” on the HCMON error and allow the installer to roll back and WineBottler to complete.  It will then ask us to select a Startfile.  There is no good option here since our installer didn’t actually finish correctly (WineBottler doesn’t know this).  It doesn’t matter what we select, as we just want to get a completed install, so go ahead and select “WineFile”:

Screenshot 2014-07-11 11.39.09

 

This dialogue will complete this step:

Screenshot 2014-07-11 11.40.31

 

Step 6:  At this stage, we do not have a working install.  What we do have is a usable template on which we can build a working install.   First go ahead and launch the app (the shortcut will be where the container was saved in step 4).  Nothing will happen since there is no app, but the environment will be prepared.  This is the important piece.  The next step is to go back into WineBottler, and run a new sequencing, but with the options slightly changed:

Note, we are now selecting the newly created environment as the template (/Applications/VIM Client.app/Contents/Resources in my case).  For our “Program to Install”, we are now selecting /path to saved client files/Infrastructure/Virtual Infrastructure Client/Launcher/VpxClient.exe, and we are letting WineBottler know that this is the actual program and that it should copy the entire folder contents to the container.  We can now go ahead and click Install (it will be quicker this time).  At the end of this install, be sure to select VpxClient.exe as the “startup program” before completing.

Step 7: unfortunately, we’re not done yet!  The last step is to do some manual copying, since the container will still not be quite right.  Once again, copy the “Infrastructure” hierarchy.  Head over to /Users/username/Library/Application Support/ and find your WineBottler container folder (com.vmware.vim_UUID in my case).  Navigate to drive_c/Program Files/VMware and paste Infrastructure over the existing file structure.
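For reference, here is a hedged Terminal version of the Step 5 and Step 7 copies; the winebottler tmp directory name and the container UUID will differ on your machine:

# Step 5: stash the half-installed client before the rollback deletes it
cp -R "/tmp/winebottler_1405091227/nospace/wineprefix/drive_c/Program Files/VMware/Infrastructure" ~/Documents/VMW
# Step 7: paste it back over the container created by the second sequencing pass
cp -R ~/Documents/VMW/ "$HOME/Library/Application Support/com.vmware.vim_UUID/drive_c/Program Files/VMware/Infrastructure/"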

With this step you should be complete!  The original environment can now be deleted and a new shortcut should exist that works.  Here is a final shot of VIM client managing vCenter via WineBottler on OSX:

Screenshot 2014-07-11 20.05.38

 

 


Depending on how things go, the title for this entry might more appropriately be “the self healing Mac”.  Only time will tell!  So what is this all about?  Well recently my trusty companion of 2 years, the “mid 2012 MacBook Pro Retina 15″”, decided to have a near-death experience (as far as I can tell).

It all started with a single kernel panic while doing some boring daily tasks in Chrome.  Within a 24 hour period the problem accelerated to a continuous kernel panic loop.  My first thought was “recent update”, but searching high and low for clues didn’t yield much.  Basic diagnostics (read as the highest of high level) seemed to imply the hardware was OK, but it really felt like a hardware issue.  Or if not hardware, possibly drivers.  But of course neither of those made much sense.  This was OSX running on a nearly new Mac, after all!  It’s like suggesting that your brand new Toyota Corolla would up and completely die 3 miles off the lot (heavy sarcasm here).

Searching around, I discovered that there were possibly some issues with Mavericks and the Retina that I had maybe been dodging.  It had also been 2 years of accumulating crap (dev tools, strange drivers, virtualization utilities, deep utilities, games), any of which could be suspect.  So I decided I would try a Time Machine rollback to before the first kernel panic, and if that failed, take a time machine back to 1995 and do the classic Windows “fix” – wipe and re-install (ugh).

The time machine restore took literally ages thanks to the bizarrely slow read rates of my backup NAS (detailed here), but eventually completed (400GB, 24 hours).  Unfortunately, the system wasn’t back for more than 10 minutes before the first kernel panic!  That meant that either the condition actually pre-existed the first known occurrence and had just been lurking, or the issue was in fact hardware.  I moved forward with the clean install.

First I deployed a new version of Mavericks.  Boot up holding Command-R, follow the linked guide, and you’re off to the races.  The reinstall was pretty smooth (erase disk, groan, quit back to the recovery menu, install the new OS) and first boot just felt better.  Of course you know what they say about placebos!  After an hour of installing my usual suite of apps, upgrading to Mavericks and grabbing the latest updates, the dreaded kernel panic struck!  Things were looking grim.

With little to lose I decided to maybe try rolling back to Mountain Lion on the outside chance that the latest Mavericks update was causing issues.  One more reinstall, followed by an app install only and I was feeling good.  Until terror struck!  Yes, another kernel panic.  Incidentally these kernel panics were all over the place (really suggesting RAM).

At this point I became bitter.  Suddenly the “it looks amazing and is all sealed and covered in fairy magic!” Apple approach didn’t seem so great.  Changing out a DIMM on a PC laptop is a cheap and very easy fix.  Hell, in these pages I’ve covered complete tear-downs of PC laptops (down to motherboard replacements).  Compounding the issue was that I never opted for AppleCare (yes yes, I know that failing to spend more money on top of a premium $2500 laptop means I deserve what I get if said premium hardware somehow completely dies within 3 years).  Apple’s decision to solder the memory to the motherboard meant I’d be looking at an extremely expensive motherboard swap-out and a good-sized chunk of downtime (the latter being a really big issue for me).  Starting to feel truly grumpy, I decided to run a few tests.

First, memtest in OS.  Lots of failures.  Instant failures too.  As a matter of fact I’ve never seen such a horrific memtest result!  It was honestly a bit of a wonder the thing could even boot!  Thinking that maybe the software result was anomalous (memtest for OSX is a bit old at this point and in theory doesn’t support anything newer than 10.5.x) I decided to do the old faithful Apple Hardware Test.  If nothing else that utility is always a cool walk down GUI memory lane!

Well depressingly enough, AHT wouldn’t run.  I didn’t think to snap a pic at that point (I wasn’t planning on this entry), but this gives you an idea (stolen from the Apple support forums):

Image

Disclaimer: Not my pic. Error numbers have been changed to protect the guilty!

The actual error code I was faced with was -6002D.  Yep.  That’s generally memory.  So it looked like a total bust.  My Apple honeymoon appeared to be officially over.  I decided to do one final wipe in preparation for the now seemingly inevitable hospital visit, and this time lay down only the bare minimum footprint needed to keep doing a bit of work in the meantime, since one positive outcome of all of this wiping was that the kernel panics had gone from a continuous loop to fairly rare.

After turning in for the night, and struggling through a restless sleep fraught with nightmares of Genius Bar lines stretching to the horizon, I crept downstairs to discover that I didn’t see this:


Again, not mine… But you get the idea. Seen one, seen em all!

The Mac had made it through the night!  Now this was interesting.  Could it possibly be that something in this lineup had become toxic?  With the cloud and “evergreen” software, it was possible.  After all, since our software library is now real-time and online, it can be hard to avoid the newest version, right?

  • Chrome
  • Lync
  • Skype
  • Office 2011
  • Camtasia
  • Omni Graffle
  • iMovie
  • Garage Band
  • Unarchiver
  • Evernote
  • Dropbox

That is literally the “slim” list that was in place every time the problem would happen post system wipe.  The new list, that seemed stable (against all odds), was solely Office, Lync and Skype.  It was time to do some testing!  Well the results were interesting to say the least! I decided to beat the Mac up a bit.  First, Unigine Heaven 4 in Extreme mode left running overnight (I was always curious how it would do anyhow):

Screen Shot 2014-06-19 at 7.28.35 PM

A great score it’s not, but banging through maxed-out Unigine left running overnight without a hitch kind of implies that the GPU (and drivers) are not an issue.  Well, we did suspect that after all, right?  How about taking a closer look at memory?

Screen Shot 2014-06-19 at 7.29.10 PM

Hmmm… OK so far so good…

Screen Shot 2014-06-19 at 7.29.24 PM

Well, it’s not ECC anyhow, and who knows what this code is actually doing, right?  For all we know this is just a register dump.  Time for the big guns.  How about some Prime95 max memory torture testing?  This is another thing I’ve always wanted to subject the Mac to. No way it survives…

Screen Shot 2014-06-19 at 7.28.19 PM

 

Uh, OK.  It just got serious.  How the hell could this heap, which was unable to even start AHT one clean install ago, somehow now be banging through hours of Prime95 torture?  There was only one thing left to do (well, OK, two).  First up, memtest.  Keeping in mind that it might be incompatible, of course!


WTF?!

What… the…. heck!?  This time the test ran like a charm; exactly as expected.  So not only does it appear that memtest does in fact work fine on Mountain Lion, but the MacBook passed.  With a cautious glimmer of hope starting to form, and more than a bit of fear, it was time for…. AHT!

2014-06-19 20.27.58

You have got to be kidding me!  This time not only did AHT run, but it passed the damn test!  At this point I started checking for hidden cameras, aliens and paranormal activity.  It just didn’t make any sense!

So where does this leave us?  Well, at this point I have added everything back in except CHROME and have successfully repeated all of these tests!  Is it somehow possible that CHROME caused this?  But how?  Chrome certainly can’t survive reboots.  Or can it?  With modern laptops and the way they manage power, it’s hard to know if the machine is ever really off. Is it possible that some software anomaly was leaving the Mac in a state that prevented it from being able to enter AHT and survived reboots?  It really does seem impossible and it doesn’t make sense, yet none of this makes sense.  How could the Mac have gone from being so unstable it couldn’t even enter AHT, to passing it over and over with flying colors and surviving brutal overnight torture tests, with only a software change?  I’ve been doing this a long time (hint… Atari 400, Timex Sinclair 1000, etc.) and have never seen something like this.  Is it a self-healing Mac?  Is it software so insidious it can survive reboots?  I almost don’t want to know.  One thing is for sure, though, and that’s that I will be keeping a close eye on this and providing any updates on these pages.  And if I should suddenly vanish?  Tell them to burn the MacBook!


Last entry I touched on the idea that management and orchestration will be the future battleground for cloud providers.  The future of IT operations is likely to take multiple forms, ranging from evolutionary enhancement of what folks do today (console-based administration, reactive support) all the way through cloud-scale programmatic operation of IT via devops process and tooling (examine any advanced AWS shop to see this in action).  Somewhere in the middle is the vision that Microsoft and VMware are betting on the most: the “hybrid cloud” model.

What does “hybrid cloud” really mean, though?  Well, ideally it requires a “cleaning of the IT house” when it comes to the management of on-premise resources: ultimately evolving into some semblance of an actual “private cloud” in terms of process and tooling, and then extending out to one or more public cloud providers in a seamless fashion.  If your IT shop presents a service catalog to empowered technologists in your business lines who are able to procure services based on budget and SLA requirements, and then have those services instantiate on the platform that best fits their needs (be it on prem or at a provider), then you have what Microsoft and VMware would define as a “hybrid cloud”.

Microsoft, more than any other technology vendor, has all of the component bits in the breadth of their portfolio: from the hypervisor up through the server and desktop OS to the application layer and tooling (both developer and management).  With the addition of Azure, Office 365 and Dynamics they have a comprehensive XaaS platform as well.  On the consumer side there is similar breadth of service, and increasingly there are points of synergy between the two (OneDrive being a good example).

The challenge for Microsoft has been in actually rationalizing all of these assets and telling a compelling holistic story.  In addition, there are weak points in the portfolio where the offerings are not accepted as best of breed (VMware leads in virtualization, AWS leads in IaaS).  Probably most importantly, Microsoft tends to approach problems from a monolithic perspective and the experience is generally not a great one unless you completely buy-in on the vision.

Since I test from a VMware perspective, the release of the Azure Pack seemed like the perfect opportunity to put the Microsoft vision through its paces and see how far they’ve come in addressing these challenges.  So what is the “Azure Pack”?  Azure Pack is, in some ways, Microsoft’s version of the vCloud Suite.  It is a set of software components that overlay the existing Microsoft stack with administrative and consumption web portals and provide multi-tenant service orchestration and management.  You can look at it as a “cloud provider in a box”, designed to bolt on to a set of existing infrastructure bits.  Of course, anytime something is “in a box” I approach it with some skepticism, so armed with my MSDN subscription (generously entitling you to both free Azure and the entire Microsoft catalog for testing and development) I set off to implement the Redmond version of “hybrid”, but with a heterogeneous architecture (the kind real customers tend to run!).

Before approaching an implementation challenge like this one, it’s important to understand what all of the components are.  It is also critical to know how the pieces fit together and what deployment restrictions are in play.  I think this image, courtesy of Microsoft, tells the story really well:

So what are all of these component parts?  Let’s walk through them…

  • Virtual Machine Manager:  VMM has had an interesting history within System Center.  It is the Microsoft (rough) equivalent of vCenter and these days is able to manage both native (Hyper-V) and competitive (ESXi, Xen) hypervisors.  It is a critical component of the hybrid architecture in that it is responsible for surfacing virtual machine resources (organized within VMM into “clouds”) to the Azure console.
  • System Center Operations Manager: no stranger to these pages, SCOM is Microsoft’s comprehensive, and extensible, monitoring platform.   SCOM tends to be the manager of choice for Microsoft workloads and that trend continues here with the Azure hybrid model.  This product maps most closely to vCenter Operations.
  • Service Provider Foundation: this is an interesting set of bits.  It is an OData web service extension to Virtual Machine Manager that provides a multi-tenancy layer for the resources that VMM manages.  In the overall solution, this piece is closest to vCloud Director and is a standalone optional component packaged with System Center Orchestrator 2012.
  • System Center Orchestrator (optional):  this is Microsoft’s orchestration engine, also known as “what’s left of Opalis”.  While a full install of Orchestrator is not an explicit requirement of the Azure Pack (again, Service Provider Foundation is required, but is a standalone component), an orchestration engine is a vital component in any cloud strategy.  Automation and identity management are, in my opinion, the two critical pillars of IT as a Service.  VMware offers a similar set of capabilities in vCenter Orchestrator.
  • System Center Service Manager (optional): Service Manager is Microsoft’s entry into the IT governance space.  The purpose of this class of software is to assist IT in implementing, automating and enforcing IT operational process using technology.  Essentially a policy engine, auditing system and dashboard, Service Manager provides tracking and oversight of problem resolution, change control and asset lifecycle management. VMware’s offering is called, oddly enough, VMware Service Manager.
  • SQL Server: really needs no introduction.  In this case, Service Manager requires either 2008 R2 or 2012.  The rest of the products are fine with 2014 and/or are able to utilize SQL Express.  I have 2008 R2 and 2014 in my lab and utilized 2014 for everything except Service Manager.

Since this is a complex installation, I thought it would be useful to go over what I found to be the minimum footprint for deploying all services.  Keep in mind that this is a lab build. Obviously in production these functions would all be discrete and made highly available where applicable:

BOX 1:

  • System Center Configuration Manager
  • System Center Virtual Machine Manager
  • System Center Orchestrator
  • Service Provider Foundation
  • Service Manager management server

BOX 2:

  • System Center Operations Manager

BOX 3:

  • Active Directory Domain Controller/DNS
  • Azure Pack

BOX 4:

  • SQL Server 2014
  • Service Manager Data
  • Database server for product backend
  • Provider for Azure Pack

BOX 5 (optional):

  • SQL Server 2008 R2
  • Database server for Service Manager (requires 2k8R2 or 2k12)
  • Provider for Azure Pack

There have been hundreds of pages written on all of the setup tasks required, so I decided to instead document some “heads ups” from my experience walking through the process end-to-end:

General Heads Ups

  • As always be hyper aware of firewall rules.  Lots of custom port definitions in this process and lots of services that don’t automatically get firewall rules created (Analysis and Reporting Services on SQL for example).  When facing a ‘can’t connect’, check the firewall first
  • Pick one service account, make it a domain admin and use it everywhere.  Life will be a lot easier this way with this build especially.  Of course if you are specifically testing the implications of a granular access control strategy then this doesn’t apply.

Virtual Machine manager and Service Provider Foundation Integration Notes

  • Make note of the SPF service account during install – this is super important as the permissions get tricky
  • SPF will create a set of local security groups on the SPF server.  They are all prefixed by “SPF-” quite handily.  For a lab install, add the service account to all of them.  In production more granular RBAC would likely be a better idea
  • The VMM application pool, used by the Virtual Machine Manager web administration console and API, will install as NetworkService by default.  It should be switched to a named account which also  needs to be a member of the Service Provider Foundation groups
  • The service account used for SPF and the VMM App Pool should be added to the Administrator role in Virtual Machine Manager under “Clouds”.

Service Manager Notes

  • SCOM Agent must be uninstalled prior to installation but can be re-installed after installation is complete
  • Ignore the collation warning.  Fantastic detail on that warning can be found here.
  • Management server and datawarehouse server must be separate (cannot one box this)
  • Pre-reqs will include warnings for RAM (wants 8GB) and CPU (wants 2.5GHz) if these resources fall short
  • Service Manager Server Install requires 1GB and wants to create a 2GB database.  It also wants to map internal service account privileges to a Windows security group (local or domain)
  • Service Manager Data warehouse Install requires 1GB and wants to create 5 2GB databases.  It also wants to map internal service account privileges to a Windows security group (local or domain)
  • SQL Reporting Services requires some custom configuration for Service Manager.  Luckily the Deployment Guide covers it in detail.
  • Service Manager in general is honestly a pretty big pain in the ass.  Definitely keep the Deployment Guide handy

If everything goes well, the finished product is a working Azure style console for your on-prem private cloud:

Screenshot 2014-06-17 20.42.12

With a few clicks, SQL Server and MySQL capacity can now be rolled into a DBaaS foundation.  Adding the capacity in is as easy as selecting the category on the left hand resource family menu (SQL or MySQL) and selecting “Add” from the bottom actions.   The required options are straightforward: the server name (and optional port number if non-standard), credentials, a group assignment (Azure Pack provides the ability to associate SQL servers into a server group for easier control of consumption) and finally the amount of space that can be consumed via hosting (storage on the server allocated to Azure Pack consumption).

Screenshot 2014-06-17 14.23.20

Up next I’ll do a rundown of the experience using Azure Pack from both the service provider and consumer view.  Where possible I will compare/contrast to the VMware experience.  Stay tuned!


With management and orchestration being the real future battleground for cloud, I’ve decided to increase my focus on the existing toolsets (both ISV and open source).  Microsoft has been expanding and refining the System Center suite for well over a decade now, and in recent years it has become a proper cloud management suite, even going so far as to have a hint of heterogeneity.  As a follow-on to my cloud performance benchmarking entries, it seemed like a good time to explore System Center Operations Manager support for AWS and Azure.

First up, of course, is to install and configure System Center Operations Manager 2012.  For anyone not familiar with the product, it is Microsoft’s monitoring platform (which started its life as SeNTry back in the 90s) that has grown to encompass both extreme depth (particularly for Microsoft products) and breadth (through extensibility via the partner ecosystem) of network and system monitoring and management tasks. The goal of the product has always been to aggregate the massive volume of event data that modern systems generate and apply some expert-system intelligence to it in order to produce efficient health reports and provide predictive notification of impending trouble (event X + event Y while event Z is occurring = possible hardware problem, for example).  For more info on System Center Operations Manager, and detail on implementation (out of scope for this entry), take a look at the Microsoft System Center support site.

Once you have a working installation of SCOM, the next step is to download and install the new management packs for AWS and Azure (management packs are the handy extensibility construct for SCOM).  For AWS, the management pack can be found on the AWS site, and for Azure it can, of course, be found on the Microsoft site.  Once downloaded, extract and install the MSI (Azure), which will create a path under C:\Program Files (x86)\System Center Management Packs, then extract the .mpb file (AWS) to that same path (there is no installer for AWS).  With the .mpb files on hand, the first step is to import them.  For all configuration, we are going to use the SCOM Operations Manager console.

Image

To get started, click on the Administration section and highlight “Product Connectors”.  A summary will appear in the main pane.  From here, we can select “Import Management Packs” to get the process started.  Let’s walk through both AWS and Azure, starting with AWS.

Image

This is a simple process.  First step is to click “Add” to locate the management pack we will be importing:

Image

Before we locate the .mpb file, we have a decision point on automated dependency discovery.  This is a good idea since cloud component updates often happen in realtime, so I allow it and click Yes.

Image

Next we can select the management pack.  Navigate to the .mpb file and click Open.

Image

If the file passes the initial validation check a green check will light up. At that point we can go ahead and click Install.

Screen Shot 2014-06-15 at 9.27.44 PM

The status will change to “Importing” and a live progress bar will track the process.  It should take about 30 seconds.  Above is the AWS pack install dialog, below is the Azure pack install (for reference).

Screen Shot 2014-06-15 at 10.09.20 PM

Once the import process is completed, we can close the wizard and move on to the next step.  For Azure, there is a brief interlude here as the Azure Management Pack includes specific support for subscription management (the AWS management pack just uses standard SCOM “RunAs Credentials”).  To setup our Azure subscription, we can now select the Azure administration subsection which has been added to the Administration tasks following the import and, from the main pane, click “Add subscription”

Screen Shot 2014-06-15 at 10.28.44 PM

The first dialog has a fair bit of complexity and a few prerequisite steps are required.  You’ll want to have your Azure subscription id on hand.  This can be found from the account management portal.  From the Azure management console, click “View My Bill”.  This will invoke the subscription overview console.

Screen Shot 2014-06-15 at 11.42.24 PM

To get the subscription id, we need to dive a level deeper by clicking on the subscription entry.  This will invoke the billing detail page.  The subscription id is located on the bottom right.

Screen Shot 2014-06-15 at 11.42.44 PM

Make a note of the subscription id.  This will be key for the next step in SCOM.  The next thing we’ll need is our Azure management certificate (or to create one if you haven’t already!).  Azure allows you to build a library of X.509 certs that can be used for authentication of managed services.  The administration of this certificate store is managed from the main Azure Management Portal under Settings\Certificates.  If none have been configured, a cert will need to be generated (password protected) using OpenSSL (or any other cert utility) and imported.  The process of creating a self-signed cert is straightforward but cumbersome and pretty difficult to remember, so keep a reference handy (a sample OpenSSL flow follows the screenshot below).  The basic idea is that you generate a private key, create a certificate signing request (a request for a new cert), then essentially submit the request to yourself in order to generate a new (valid) cert signed by the private key.  Azure expects the certificate to be password protected and in .pfx format.  Once generated and imported, keep the .pfx file handy.

Screen Shot 2014-06-15 at 11.51.00 PM
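For reference, a minimal OpenSSL sketch of the flow just described; the file names and subject are hypothetical, and the final step prompts for the password that protects the .pfx:

openssl genrsa -out azure-mgmt.key 2048                                                       # private key
openssl req -new -key azure-mgmt.key -out azure-mgmt.csr -subj "/CN=SCOM Azure Management"    # signing request
openssl x509 -req -days 365 -in azure-mgmt.csr -signkey azure-mgmt.key -out azure-mgmt.cer    # self-sign it
openssl pkcs12 -export -inkey azure-mgmt.key -in azure-mgmt.cer -out azure-mgmt.pfx           # password-protected .pfx
# If the portal complains about the .cer encoding, a DER-encoded copy can be produced with:
# openssl x509 -in azure-mgmt.cer -outform der -out azure-mgmt-der.cer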

With the prerequisite info collected, we can head back to SCOM and add our subscription info.

Screen Shot 2014-06-15 at 10.38.24 PM

If the subscription info is valid, the setup will continue.  The next step is to select a management pool to be authorized to communicate with Azure.  This is an opportunity to limit Azure access to specific management server pools or authorize all management servers.  This is also an opportunity to direct outbound connectivity to Azure through a proxy.

Screen Shot 2014-06-15 at 10.38.39 PM

With the scope set, the subscription can be added.

Screen Shot 2014-06-15 at 10.38.46 PM

If everything is in order the wizard will indicate that the subscription has been successfully added. Once complete we are ready to return to the next step of integrating Azure.

Screen Shot 2014-06-15 at 10.41.45 PM

After installing new management packs, the next step in operationalizing them in SCOM is to add them to the monitoring configuration.  From the main menu, select Authoring tasks.  In the Authoring tasks subsection, AWS and Azure sections are listed now that we imported the management packs.  Right clicking on Azure Monitoring presents the option of starting the “Add Monitoring Wizard”.  The first step is to select the monitoring type.  In this case we will select Azure (next round we will repeat the process for AWS).

Screen Shot 2014-06-15 at 10.17.26 PM

First we provide a name.  In my case I kept it simple selecting “AWS Monitoring Account” for AWS and “Azure Monitoring Account” for Azure.  Next we assign a management pack to this monitoring configuration.  In our case since we are setting up monitoring for add-on management packs, we are going to “create a new destination management pack” for both Azure and AWS.

Screen Shot 2014-06-16 at 1.27.17 AM

Creating the new destination is very straightforward.  All that is required is a name.  I chose to name the management pack entry the same as the monitoring entity for simplicity.  Azure is shown below as an example.  Clicking next will allow the “knowledge” configuration to be edited.  This is an opportunity to build a custom processing module if the SDK is present.  In our case we can click through and complete the management pack creation wizard and return to the main monitoring configuration workflow.

Screen Shot 2014-06-15 at 10.17.59 PM

At this point, the AWS and Azure workflows diverge a bit.  In the Azure workflow, we now select the subscription that we added to SCOM in the previous section.

Screen Shot 2014-06-15 at 10.42.33 PM

For AWS, the next step in the wizard will be to create “RunAs” credentials that SCOM can use (standard SCOM process for connecting to external systems).  Since this is our first time through, we need to create a new RunAs set using the RunAs wizard.  For AWS, the correct account type is Basic Authentication.  The Display Name can be anything that makes sense.

Screen Shot 2014-06-15 at 10.01.27 PM

Next up we provide the credentials themselves.  Account Name should be set to the AWS Access Key of either the main AWS account or an IAM user correctly privileged to monitor AWS resources (IAM policy out of scope for this entry, but there are many great resources for IAM policy creation over at the AWS support site).  The Password should be set to the AWS Secret Key.

Screen Shot 2014-06-15 at 10.03.02 PM
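As an aside, here is a hypothetical sketch of provisioning a dedicated IAM user for this from the AWS CLI; the user name is a placeholder, and attaching a suitably scoped read-only policy is left out, as noted above:

# Create a dedicated monitoring user and generate its access key / secret key pair
aws iam create-user --user-name scom-monitor
aws iam create-access-key --user-name scom-monitor
# The returned AccessKeyId goes into "Account Name" and the SecretAccessKey into "Password"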

The next step is a decision point on the scope of the RunAs credentials.  You can choose to share them with all managed systems, or manually control distribution of the credentials.  For the lab, I select the less secure path for convenience.  Obviously real world production configuration would be more restrictive.

Screen Shot 2014-06-15 at 10.03.22 PM

With the RunAs credentials created, they can now be associated with the new monitoring configuration back at the Monitoring Wizard.  With this configuration step complete we can go ahead and create the new monitoring entry.

Screen Shot 2014-06-15 at 10.03.31 PM

Back on the Azure side of the house, the next step is to start configuring the actual scope of resources to be monitored within Azure.  First up is Cloud Services.  I actually don’t have any Cloud Services to monitor, but this is where they would be added.

Screen Shot 2014-06-15 at 10.42.57 PM

I do have Virtual Machines, so I go ahead and click Add, which invokes the Select Virtual Machines dialog.  For dense installations, filters can be built to parse the resource list.  I leave the filter blank and go ahead and search, since I have a very small set of VMs (1 to be exact!).  I highlight my one VM and add it to the monitored set.

Screen Shot 2014-06-15 at 10.43.18 PM
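If the wizard's list ever comes up empty, it helps to confirm what the subscription actually exposes outside of SCOM.  A rough sketch using the classic (Service Management) Python SDK; the subscription ID and certificate path are placeholders, and it assumes a management certificate has been uploaded to the subscription:

    # Enumerate classic cloud services and storage accounts visible to a
    # subscription, as a cross-check against what the SCOM wizard offers.
    # Subscription ID and certificate path are placeholders.
    from azure.servicemanagement import ServiceManagementService

    SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
    CERT_FILE = "/path/to/management-cert.pem"

    sms = ServiceManagementService(SUBSCRIPTION_ID, CERT_FILE)

    for service in sms.list_hosted_services():
        print("Cloud service:", service.service_name)

    for account in sms.list_storage_accounts():
        print("Storage account:", account.service_name)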

Next up the process repeats for storage.  (Slightly) more to see here.

Screen Shot 2014-06-15 at 10.43.37 PM

With all resources added, we can now finalize the setup and create the new monitoring configuration.

Screen Shot 2014-06-15 at 10.43.44 PM

With management packs imported and configured, and new monitoring configurations in place for Azure and AWS, we can go ahead and take a look at our resources within SCOM.  To do this we select the Monitoring section from the main UI.  There will be subsections for Azure and AWS now that the monitoring wizards have been successfully completed. If we expand the Azure tree, we can drill down to virtual machine resource status.  Here is a view of my VMs.

Screen Shot 2014-06-16 at 1.02.11 AM

With AWS, the same rules apply.  In my case no EC2 instances are currently online, but this is where they would appear (a quick API-side cross-check is sketched below the screenshot).

Screen Shot 2014-06-15 at 10.05.31 PM
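To rule out a RunAs or discovery problem rather than a genuinely empty account, a quick API-side check confirms whether any instances are actually running.  A sketch with boto3; the region is a placeholder and credentials are assumed to come from the default credential chain:

    # Cross-check the (empty) SCOM view by asking EC2 directly how many
    # instances are running. Region is a placeholder; credentials come from
    # the default boto3 chain (environment variables, ~/.aws/credentials, etc).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}])

    running = sum(len(r["Instances"]) for r in response["Reservations"])
    print("Running EC2 instances:", running)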

That’s it for basic configuration!  Next up I may do some deeper dives into SCOM.  In the meantime, happy monitoring!


A home NAS (Network Attached Storage) device is a great investment even for a casual hobbyist.  It provides a central repository for files and backups that offers a much higher level of data durability than most desktops, laptops or mobile devices, assuming you implement RAID protection.  In addition, the NAS itself can be more easily backed up to an offsite mechanism (cloud share or external drive) since the data is centralized.  For those of us who run home labs or small offices, the high end of the consumer NAS space can actually do a decent job of standing in for a proper enterprise device.  At the Complaints HQ I have long relied on the ReadyNAS Ultra 6.  It's a workhorse and, while not necessarily the fastest kid on the block, it has a very robust app ecosystem (thanks almost single-handedly to the superhuman efforts of Super Poussin).  In addition, even the 6-drive model has a compact form factor and doesn't produce an obnoxious amount of heat and noise.  I've documented my setup in these pages before, but just as a review:

        • Brand: Netgear
        • Model: ReadyNAS Ultra 6
        • Firmware: RAIDiator 4.2.26
        • Disk Config: 6 x 2TB Seagate Barracuda Green in X-RAID (~10TB usable)

The Ultra 6 also has two 1Gb/s NIC ports, but does not include native teaming capabilities.  Luckily this is where the robust app ecosystem comes into play!  An oldie but a goodie, "Team" for the old Pioneer NAS (which Netgear bought and repackaged under the ReadyNAS brand) purports to work on firmware revs only up to version 4.2.11, but still works just fine on 4.2.26.


As with most ReadyNAS add-ons, if it is compatible, "it just works".  The installation is straightforward and the only real options are setting the Teaming Mode and deciding whether you want Jumbo Frame support.  Lots of teaming options are supported (a quick way to verify which mode actually ends up active is sketched after the list):

        • Round Robin: requires static link aggregation at the switch; alternates between links for outbound traffic only.
        • Active Backup: no switch configuration required; basic active/standby failover protection.
        • XOR: no switch configuration required; selects the outbound NIC by performing a bitwise XOR on the source and destination MAC addresses (layer 2 load balancing).  A given destination will therefore always use the same NIC.
        • Broadcast: no switch configuration required; channel mirroring that transmits every frame over both links.  Useful only in fairly specific scenarios where the two interfaces are connected to separate switches yet packets are needed on both segments.
        • 802.3ad: requires dynamic link aggregation at the switch; negotiates the teaming config via LACP (Link Aggregation Control Protocol).
        • Transmit Load Balancing: no switch configuration required; balances outbound traffic based on the load on each NIC.  Inbound traffic is handled in an active-backup fashion.
        • Adaptive Load Balancing: no switch configuration required; builds on TLB by adding inbound load balancing via ARP replies rewritten on the fly.
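Under the covers these options line up with the standard Linux bonding driver modes, so if you have shell access to the NAS you can confirm what actually took effect.  A minimal sketch, assuming the add-on sits on top of the stock bonding driver, names the interface bond0, and that Python is available on the NAS (all assumptions on my part); run it on the NAS itself:

    # Report the active bonding mode and per-slave link status, assuming the
    # standard Linux bonding proc interface and a bond0 interface name.
    import os

    BOND_PROC = "/proc/net/bonding/bond0"   # assumed interface name

    if not os.path.exists(BOND_PROC):
        raise SystemExit("No bonding interface found at %s" % BOND_PROC)

    with open(BOND_PROC) as proc:
        for line in proc:
            line = line.strip()
            if line.startswith(("Bonding Mode:", "MII Status:",
                                "Slave Interface:", "Speed:")):
                print(line)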

Once configured, the add-on actually modifies the Network Configuration tab.  Really nice touch:

Screen Shot 2014-06-09 at 8.58.04 PM

With the recent addition of the Dell T620 to the lab, I have more eggs in the basket of the main server (10TB worth of eggs actually – entry on vSphere Data Protection coming soon!), but the sizable chunk of data on the NAS is even more critical (5TB and 10 years worth of photos, documents, archives and backups).  Traditionally I've relied on X-RAID to provide peace of mind and have periodically copied the most critical of critical files to a USB drive for offsite protection.  Recently though, in an unfortunately recurring theme, the threat environment has ratcheted up once again.  The situation in Australia is a harsh reminder that "ransomware", like CryptoLocker, is quickly becoming the most significant threat to our data.  Once a system is compromised, any write-accessible storage volume (including file system mounted cloud volumes) ends up irreversibly encrypted, its files permanently destroyed for all intents and purposes.

It wasn't long ago that some knowledge, common sense and good habits were enough to dodge the vast majority of attack vectors.  Phishing attacks, and even spear phishing, can generally be avoided by being a well-informed and conscientious user.  The proliferation of smarter and smarter web content (generally a good thing) and near ubiquitous (and increasingly sophisticated) ad serving technologies (a questionable thing) is combining to create a new kind of attack vector that, in the worst case, can be nearly impossible to dodge.  A malicious ad server run by even a moderately sophisticated attacker can do a massive amount of damage if it can seize on a 0-day exploit in a browser.  Even with reasonable day-to-day security settings on your account and browser, surfing to the wrong site, with the wrong version, on the wrong day can leave you exposed to a ransom attack.  Of course security and convenience are a tricky balancing act, and there are definitely measures one can take to greatly reduce risk exposure.  Never running day-to-day as a local admin, not leaving network or cloud volumes permanently mounted with write access, and operating browsers in their strictest sandbox modes can go a long way.  Realistically though, these measures taken in total make day-to-day work more difficult, and every one you relax reintroduces a bit more risk.  Ultimately, the last line of defense against these destructive encryption attacks is a really solid backup strategy.

So with that in mind, I decided it was time to throw some money at the risk exposure.  If nothing else it makes for a good blog entry!  I decided that I didn't need as extensible a platform as the ReadyNAS for the backup NAS; just a device that could be an rsync target, provide decent bandwidth, offer Time Machine and some consumer backup story, support iSCSI and NFS, and have bag loads of storage.  A quick trip to Microcenter (surprise surprise) and I decided on the Seagate Business Storage 4-Bay 16TB edition.  Microcenter had this on the shelf for $989, which seemed like a damn good deal for 16TB (11TB usable with RAID 5).  The Seagate is slightly smaller than the ReadyNAS 6, but the two share a similar enough design aesthetic that they match well:

2014-06-09 11.51.40

It’s a 2 tier NAS stack capped by a Back UPS 750! And yes, that IS a Back UPS 1500 you see there. We’ll talk about that addition in a future entry!

Presentation-wise, the Business Storage NAS UI is simultaneously slicker and more bare-bones than the ReadyNAS Ultra's.  For a "business line" product, it has a very consumer feel.  Still, everything we need for this project is where you'd expect to find it, and there is teaming support out of the box this time, though with a much simpler option set offering only Round Robin or Failover:

Screen Shot 2014-06-09 at 9.37.49 PM

 

As mentioned, I stuck with the standard RAID 5 configuration, yielding 11TB usable out of 16TB raw (four 4TB drives minus one drive's worth of parity leaves 12TB decimal, which lands right around the reported 11TB once unit conversion and filesystem overhead take their cut):

Screen Shot 2014-06-09 at 9.41.45 PM

The iSCSI setup is also a bit more bare-bones than the ReadyNAS (no place to specify initiators, oddly enough), but it does provide one neat feature: file-based iSCSI.  This is a nice option that provides the backend flexibility of a file-based protocol (no need to fully dedicate a volume) while still presenting a block target to the initiators:

Screen Shot 2014-06-09 at 9.42.15 PM

 

One thing I did do is delete the included volume and recreate it, just to see how long that operation would take.  The time to build the 11TB RAID 5 from scratch was 21 hours (yikes!), which works out to roughly 50MB/s per spindle for the parity initialization.  The other performance-related testing I did before moving forward with the backup configuration was NAS throughput.  Pretty interesting results.  First the ReadyNAS:

Screen Shot 2014-06-12 at 6.17.28 PM

Pretty bad performance actually.  Not quite sure why, given that there is no shortage of network bandwidth in theory: the team is running on the ReadyNAS as shown above, and the desktop has teamed NICs as shown below:

Screen Shot 2014-06-12 at 5.34.49 PM

The backing switch is the Netgear GSM7224 v2, so the backbone bandwidth is good.  The cabling is all CAT 6 so there is no cabling issue.  Enabling jumbo frames actually kills the write speed and has no measurable impact on reads.  This is definitely a mystery, but since the Seagate is sitting on the same core infrastructure, relative performance profiling should still apply.  That said, the Seagate:

Wow! What the heck is going on here?  Unknown at this point, but clearly the ReadyNAS is the better performer.  That's fine since we are using the Seagate purely as a backup device and iSCSI testing platform for VMware, but at some point I definitely need to track down the root cause of these performance issues!

With initial setup and testing complete, it was time to set up the backup.  This process was interesting and not entirely intuitive.  As it turns out, the rsync capability of the Seagate is buried in the "Protect Server" feature.

Screen Shot 2014-06-09 at 9.52.34 PM

On the one hand, this is a super easy, GUI-driven way to set up rsync.  On the other hand, it's pretty limiting and a bit odd.  It gets the job done though!  Here are the important options (a quick way to exercise the target from an rsync client is sketched after the list):

        • Authentication Name: the account that your rsync source will need to authenticate as.  A bit counter-intuitive, as this account will not show up under users, nor can you ACL it.  It seems there is a separate security model (including a dedicated path that you normally can't see) associated with the "Protect" capability.
        • Authentication Password: the password for the associated credentials.  Note that the password is presented in the UI in clear text (and presumably stored that way).  Yikes!  Keep this guy safely tucked behind a firewall.
        • Alias: what we are setting here are paths.  For each "alias" created, a folder will be created under the "BackupStore-1" folder on the associated volume.  This entire folder structure will not be surfaced as a share unless that is explicitly selected (more on that later).
        • Enable/Disable Backup Service: as indicated.  Also worth noting that disabling the backup service will delete the folder structure (including files), so be careful here.
        • Add Backup Shares to CIFS/SMB: this option exposes the BackupStore-1 folder under the default share for the volume.  Note that when this option is enabled, the backup target will be disabled.  This is interesting since, as far as I can tell, there is no way to get write access to the backup path via the share: accessing it as admin won't work (writes are denied) and you cannot access a share with the backup job credentials.  Not sure why rsync availability vs CIFS presentation is an either/or proposition given this, but it's workable.
        • Port Forwarding: enables UPnP port forwarding for rsync if you want to run through a firewall.
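Once the Protect Server settings are saved, the target can be exercised from any rsync-capable client before the ReadyNAS ever gets involved.  Here is a sketch of a small test push; the host, alias and credentials are placeholders standing in for the values configured above, and a local rsync binary is assumed:

    # Push a small test directory to the Seagate "Protect Server" rsync target
    # using rsync daemon syntax (user@host::module). All values are placeholders.
    import os
    import subprocess

    NAS_HOST = "192.168.1.50"     # Seagate NAS IP
    ALIAS = "readynas-backup"     # the "Alias" created in the Protect UI
    AUTH_NAME = "backupuser"      # "Authentication Name"
    AUTH_PASSWORD = "changeme"    # "Authentication Password"

    # The rsync daemon reads the password from the RSYNC_PASSWORD variable.
    env = dict(os.environ, RSYNC_PASSWORD=AUTH_PASSWORD)

    subprocess.check_call(
        ["rsync", "-av", "--stats", "./testdata/",
         "%s@%s::%s/" % (AUTH_NAME, NAS_HOST, ALIAS)],
        env=env)

If the push lands under BackupStore-1, the same host/alias/credential triple is what gets plugged into the ReadyNAS backup job destination later on.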

That’s about all you are able to set up.  Hopefully it’s enough!  Before heading over to the ReadyNAS to set up some backup jobs, let’s have a look at the client protection options.  Of course Time Machine is great for OS X integration.  Seagate also provides a backup utility for both OS X and Windows on the setup disk.  I tested the Windows app and it runs well.  10 client licenses are included and they are automatically applied by the software during install (you are able to discover and select the NAS).  Client settings are found under Job Manager:

Screen Shot 2014-06-09 at 10.19.06 PM

Incidentally, the NAS-NAS Backup tab here assumes the Seagate is the source, so this isn’t going to work for our scenario.  Instead, it is time to head over to the ReadyNAS!  From the main menu, we can create new Backup Jobs under the Backup menu, oddly enough.  Selecting “Add a New Backup Job” presents us with a form that gives us everything we need.  Under the source, a huge range of options is available:

        • All of the local shares
        • Remote configuration for RSync, NFS, WebDAV, Windows and FTP
        • The local volume (C)
        • Local USB devices
        • If the source is remote (backing up to the ReadyNAS), host information, path and credentials should be entered here

 

Screen Shot 2014-06-09 at 10.23.17 PM

After setting the source, the same UI layout is used to set the destination.  The range of options is the same as above, plus the addition of iSCSI paths.  In the case of the Seagate Protect Server, we set the Seagate NAS IP as the host, the alias we set up as the path, and the backup credentials configured under the Protect job.  We also have the option to compress the backup volume, remove deleted files (sync folders), enable FAT32 compatibility mode for the backup file system and create an exclusion list for files and folders:

Screen Shot 2014-06-09 at 10.23.28 PM

Step 3 is to set a backup schedule.  Nothing fancy here.  One interesting thing is that you can set both a period and a run window.  A combination of both can allow for some decent flexibility in controlling scheduling:

Screen Shot 2014-06-09 at 10.23.59 PM

The options available require some explanation:

        • Schedule full backup allows you to specify how often to run a full backup (vs. differentials).
        • The “on backup completion” option allows you to specify the end-of-backup scope for the email alert (none, errors only, all info, etc.).
        • Remove backup contents before full backup allows you to create a clean slate before the next full backup (a path wipe).
        • After backup is complete change ownership to the share owner allows you to blanket-set the file system ownership if the destination is also a ReadyNAS.
        • Send Wake on LAN to the remote system allows a WoL packet to be sent prior to backup, allowing a “cold host” to be used as the backup target (a sketch of the magic packet itself follows the screenshot below).

Screen Shot 2014-06-09 at 10.24.07 PM
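As referenced in the last bullet, the Wake on LAN option just fires a standard magic packet before the job starts.  For the curious, that packet is nothing more than the sketch below; the MAC address is a placeholder and UDP port 9 is the usual convention:

    # Build and broadcast a standard Wake-on-LAN magic packet: 6 bytes of 0xFF
    # followed by the target MAC address repeated 16 times. MAC is a placeholder.
    import socket

    def wake_on_lan(mac="00:11:22:33:44:55", broadcast="255.255.255.255", port=9):
        mac_bytes = bytes.fromhex(mac.replace(":", ""))
        packet = b"\xff" * 6 + mac_bytes * 16

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
        sock.close()

    wake_on_lan()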

Any number of backup jobs can be created, tracked and managed from the Backup Listing tab:

Screen Shot 2014-06-09 at 10.24.25 PM

 

So how long did it take to rsync 4TB at these horrible speeds?  [drumroll] 96 hours!  Yikes!  Now that’s terrible; it works out to roughly 12MB/s, which is about what a single 100Mb/s link would deliver, not a teamed pair of gigabit ports.  Luckily the full backup only occurs once and the rate of change of this data is tiny, so the differentials should (hopefully!) complete within their daily backup window.  Off to troubleshoot the network performance!
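As a starting point for that troubleshooting, a crude timed-copy test takes the NAS software and backup jobs out of the picture entirely and just measures raw share throughput from a client.  A rough sketch, assuming the share is already mounted locally; the path and test size are placeholders, and note the read number can be inflated by client-side caching if the file fits in RAM:

    # Crude sequential write/read throughput test against a mounted NAS share.
    # SHARE_PATH is a placeholder mount point; test file is 1GB of zeros.
    import os
    import time

    SHARE_PATH = "/Volumes/backup/throughput.tmp"
    SIZE_MB = 1024
    CHUNK = b"\0" * (1024 * 1024)   # 1MB per write

    start = time.time()
    with open(SHARE_PATH, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())
    print("Write: %.1f MB/s" % (SIZE_MB / (time.time() - start)))

    start = time.time()
    with open(SHARE_PATH, "rb") as f:
        while f.read(1024 * 1024):
            pass
    print("Read:  %.1f MB/s" % (SIZE_MB / (time.time() - start)))

    os.remove(SHARE_PATH)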