Tag Archives: Rackspace

Rackspace Cashes Out On Its Hybrid Cloud Strategy, Acquired by Private Equity

Cloud Computing company Rackspace will be acquired for $4.3 billion by private equity firm Apollo Global Management. Deal is expected to close in Q4 2016 and Rackspace stockholders to receive $32.00 per share in cash (a nice a premium of 38% on Rackspace’s stock price).

This acquisition is a clear sign of success for Rackspace’s change of strategy, whereby Rackspace eased off on its own cloud and managed services, and started offering third-party support for the public clouds of Amazon and Microsoft. This change of strategy started showing clear positive impact on its financial results earlier this year (see this post from 2 months ago), which sent the right investor signals and paved the way to this acquisition.

For more details on the acquisition see here.

1311765722_picons03 Follow Horovits on Twitter!



Filed under Cloud, OpenStack

Can Hybrid Cloud Present An Alternative To Amazon, Microsoft, Google?

It’s not easy to be a public cloud vendor these days. The public cloud world has been undergoing serious consolidation in the past few years. Amazon, the pioneer of the cloud, has been keeping a clear lead, while Microsoft and Google have been pulling in, utilizing their accumulated experience, global data centers and software platforms, and positioned themselves as next in line. Together this trio serve the vast majority of the workloads running on public cloud.

This consolidation drove out many vendors, including some big incumbent names such as HP that shut down its cloud late last year and Verizon that did the same a couple of months ago.


So what’s their answer? I’d say it’s threefold:

  1. Multi-cloud model: If you can’t beat them, join them. Support Amazon, Microsoft, Google public clouds. If done via a good generic platform, it can help avoid vendor lock-in.
  2. Hybrid model: mix the public cloud support with support for private cloud and bare-metal to offer public-private-hosted hybrid approach.
  3. Private model: concentrate on strictly private cloud. The popular open-source project OpenStack is a leading candidate for this strategy. This approach is useful for the customers insisting to run things on their own premises.

HP (now HPE), after shutting down its public cloud, moved to a hybrid cloud strategy with a series of acquisitions and by endorsing OpenStack private cloud open source project.  Verizon went for the private cloud approach.

An interesting case is Rackspace, which eased off on its own cloud and managed services, and started offering third-party support for the public clouds of Amazon and Microsoft, leveraging its Fanatical Support brand. Also, in parallel to supporting leading public cloud vendors, Rackspace keeps its longstanding support of private cloud deployments based on OpenStack, the popular open-source platform which it co-founded.


Rackspace’s strategy seems to have hit well. quarterly results published this week show quarterly revenue $518 million, up 7.9% from the year-ago-quarter. Executives noted Rackspace’s success was buoyed particularly by a growing number of Fanatical Support customers for its Microsoft Azure and Amazon Web Services (AWS) offerings as well as customers on its OpenStack private cloud.

Hybrid cloud strategies gain traction with enterprises. While Amazon, Microsoft and Google try to convince enterprises to go all-in on the public cloud, it’s too big a change to swallow for most. Even Microsoft realized that hurdle and tried bringing its Azure cloud to the enterprise’s datacenter. Hybrid cloud seems to have demand, and may also be the focus of those who failed to take the lead in the public cloud.

1311765722_picons03 Follow Horovits on Twitter!


Filed under Cloud, OpenStack

AWS Outage – Thoughts on Disaster Recovery Policies

A couple of days ago it happened again. On June 14 around 9 pm PDT Amazon AWS hit a power outage in its Northern Virginia data center, affecting EC2, RDS, Elastic Beanstalk and other services in the US-EAST region. The AWS status page reported:

Some Cache Clusters in a single AZ in the US-EAST-1 region are currently unavailable. We are also experiencing increased error rates and latencies for the ElastiCache APIs in the US-EAST-1 Region. We are investigating the issue.

This outage affected major sites such as Quora, Foursquare, Pinterest, Heroku and Dropbox. I followed the outage reports, the tweets, the blog posts, and it all sounded all too familiar. A year ago AWS faced a mega-outage that lasted over 3 days, when another datacenter (in Virginia, no less!) went down, and took down with it major sites (Quora, Foursquare… ring a bell?).

Back during last year’s outage I analyzed the reports of the sites that managed to survive the outage, and compiled a list of field-proven guidelines and best practices to apply in your architecture to make it resilient when deployed on AWS and other IaaS providers. I find these guidelines and best practices highly useful in my architectures. I then followed up with another blog post suggesting using designated software platforms to apply some of the guidelines and best practices.

On this blog post I’d like to address one specific guideline in greater depth – architecting for Disaster Recovery.

Disaster Recovery – Characteristics and Challenges

PC Magazine defines Disaster Recovery (DR):

A plan for duplicating computer operations after a catastrophe occurs, such as a fire or earthquake. It includes routine off-site backup as well as a procedure for activating vital information systems in a new location.

DR Planning is a common practice since the days of the mainframes. An interesting question is why this practice is not as widespread in cloud-based architectures. In his recent post “Lessons from the Heroku/Amazon Outage” Nati Shalom, GigaSpaces CTO, analyzes this apparent behavior, and suggests two possible causes:

  • We give up responsability when we move to the cloud – When we move our operation to the cloud we often assume that were outsourcing our data center operation completly, that include our Disaster-Recovery procedures. The truth is that when we move to the cloud were only outsourcing the infrastructure not our operation and the responsability of using this infrastructure remain ours.
  • Complexity – The current DR processes and tools were designed for a pre-cloud world and doesn’t work well in a dynamic environment as the cloud. Many of the tools that are provided by the cloud vendor (Amazon in this sepcific case) are still fairly complex to use.

I addressed the first cause, the perception that cloud is a silver bullet that lets people give up responsibility on resilience aspects, in my previous post. The second cause, the lack of tools, is usually addressed by DevOps tools such as ChefPuppetCFEngine and Cloudify, which capture the setup and are able to bootstrap the application stack on different environments. In my example I used Cloudify to provide consistent installation between EC2 and RackSpace clouds.

Making sure your architecture incorporates a Disaster Recovery Plan is essential to ensure the business continuity, and avoid cases such as the ones seen over Amazon’s outages. Online services require the Hot Backup Site architecture, so the service can stay up even during the outage:

A hot site is a duplicate of the original site of the organization, with full computer systems as well as near-complete backups of user data. Real time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide area network links and specialized software.

DR sites can be in Active/Standby architecture (as was in traditional DRPs), where the DR site starts serving only upon outage event, or they can be in Active/Active architecture (the more modern architectures). In his discussion on assuming responsibility, Nati states that DR architecture should assume responsibility for the following aspects:

  • Workload migration – specifically the ability to clone our application environment in a consistent way across sites in an on demand fashion.
  • Data Synchronization – The ability to maintain real time copy of the data between the two sites.
  • Network connectivity – The ability to enable flow of netwrok traffic between between two sites.

I’d like to experiment with an example DR architecture to address these aspects, as well as addressing Nati’s second challange – Complexity. In this part I will use an example of a simple web app and show how we can easily create two sites on-demand. I would even go as far as setting this environment on two seperate clouds to show how we can ensure even higher degree of redundancy by running our application across two different cloud providers.

A step-by step example: Disaster Recovery from AWS to RackSpace

Let’s put up our sleeves and start experimenting hands-on with DR architecture. As reference application let’s take Spring’s PetClinic Sample Application and run it on an Apache Tomcat web container. The application will persist its data locally to a MySQL relational database. On my experiment I used Amazon EC2 and RackSpace IaaS providers to simulate the two distinct environments of the primary and secondary sites, but any on-demand environments will do. We tried the same example with a combination of HP Cloud Services and a flavor of a Private cloud.

Data synchronization over WAN

How do we replicate data between the MySQL database instances over WAN? On this experiment we’ll use the following pattern:

  1. Monitor data mutating SQL statements on source site. Turn on the MySQL query log, and write a listener (“Feeder”) to intercept data mutating SQL statements, then write them to GigaSpaces In-Memory Data Grid.
  2. Replicate data mutating SQL statements over WAN. I used GigaSpaces WAN Replication to replicate the SQL statements  between the data grids of the primary and secondary sites in a real-time and transactional manner.
  3. Execute data mutating SQL statements on target site. Write a listener (“Processor”) to intercept incoming SQL statements on the data grid and execute them on the local MySQL DB.

To support bi-directional data replication we simply deploy both the Feeder and the Processor on each site.

Workload migration

I would like to address the complexity challenge and show how to automate setting up the site on demand. This is also useful for Active/Standby architectures, where the DR site is activated only upon outage.

In order to set up a site for service, we need to perform the following flow:

  1. spin up compute nodes (VMs)
  2. download and install Tomcat web server
  3. download and install the PetClinic application
  4. configure the load balancer with the new node
  5. when peak load is over – perform the reverse flow to tear down the secondary site

We would like to automate this bootstrap process to support on-demand capabilities in the cloud as we know from traditional DR solutions. I used GigaSpaces Cloudify open-source product as the automation tool for setting up and for taking down the secondary site, utilizing the out-of-the-box connectors for EC2 and RackSpace. Cloudify also provides self-healing  in case of VM or process failure, and can later help in scaling the application (in case of clustered applications).

Network Connectivity

The network connectivity between the primary and secondary sites can be addressed in several ways, ranging from load-balancing between the sites, through setting up VPN between the sites, and up to using designated products such as Cisco’s Connected Cloud Solution.

In this example I went for a simple LB solution using RackSpace’s Load Balancer Service to balance between the web instances, and automated the LB configuration using Cloudify to make the changes as seamless as possible.

Implementation Details

The application is actually a re-use of an  application I wrote recently to experiment with Cloud Bursting architectures, seeing that Cloud Bursting follows the same architecture guidelines as for DR (Active/Standby DR to be exact). The result of the experimentation is available on GitHub. It contains:

  • DB scripts for setting up the logging, schema and demo data for the PetClinic application
  • PetClinic application (.war) file
  • WAN replication gateway module
  • Cloudify recipe for automating the PetClinic deployment

See the documentation on GitHub for detailed instructions on how to configure the above with your specific deployment details.


Cloud-hosted applications should take care of non-functional requirements of the system, including resilience and scalability, just as on-premise applications. Systems that neglect to incorporate these considerations in their architecture, relying solely on the underlying cloud infrastructure, end up severely affected by cloud outage such as the one experienced a few days ago in AWS. On my previous post I listed some guidelines, an important of which is Disaster Recovery which I explored here and suggested possible architectural approaches and example implementation. I hope this discussion raises the awareness in the cloud community and helps maturing up cloud-based architectures, so that on the next outage we will not see as many systems go down.

Follow Dotan on Twitter!


Filed under Cloud, DevOps, Disaster-Recovery, IaaS, Solution Architecture, Uncategorized

Cloud integration and DevOps automation experience shared

The Cloud carries the message of automation to system architecture. The ability to spin up VMs on demand and take them down when no longer needed as per the applications’s real-time requirements and metrics is the key for making the system truely elastic, scalable and self-healing. When using external IaaS providers, this also saves the hassle of managing the IT aspects of the on-demand infrastructure.

But with potential of automation comes the challenge of integrating with the cloud provider (or providers) and automating the management of the VMs, dealing with DevOps aspects such as accessing the VM, transferring contents to it, performing installations, running and stopping processes on it, coordinating between the services, etc. On this post I’d like to share with you some of my experience integrating with IaaS cloud providers, as part of my work with customers using the open source Cloudify PaaS product. Cloudify provides out-of-the-box integration with many popular cloud providers, such as Amazon EC2 and The Rackspace Cloud, as well as integration with the popular jclouds framework and OpenStack open standard. But when encountering an emerging cloud provider or standard, you just need to pull up your sleeves and write your own integration. As a best practice, I use Java for the cloud integration and try to leverage on well-proven and community-backed open source projects wherever possible. Let’s see how I did it.

First we need to integrate with the IaaS API to enable automation of resource allocation and deallocation. The main integration point is called a Cloud Driver, which is basically a Java class that adheres to a simple API for accessing the cloud for resources. Various clouds expose various APIs for accessing them. Programmatic access is native and easy to implement from the Cloud Driver code. REST API is also quite popular, in which cases I found the Apache Jersey client open source library quite convenient for implementing a RESTful client. Jersey is based on JAX-RS Java community standard, and offers easy handling of various flavors of calls, cookie handling, policy governance, etc. Cloudify offers a convenient Groovy-based DSL that enables you to configure the cloud provider’s parameters and properties in a declarative and easy-to-read manner, and takes care of the wiring for you. When writing your custom cloud driver you should make sure to sample and use the values from the Groovy (you can add custom properties as needed), so after the cloud driver is ready for a given cloud provider, you can use it in any deployment by simply setting the configuration. I used the source code of the cloud drivers on CloudifySource public GitHub repository, as a great source of reference for writing my cloud driver.

The next DevOps aspect of the integration is accessing the VMs and managing them. Linux/Unix VMs are accessed via SSH for executing scripts, and uses SFTP for file transfer. For generic file transfer layer there’s the Apache Commons VFS2 (Virtual File System), which offers a uniform view of the files from various different sources (local FS, remote over HTTP, etc.). For remote command execution over SSH there’s JCraft’s JSch library, providing a Java implementation of SSH2. Authentication also needs to be addressed with the above. Luckily, many of these things that we used to do manually as part of DevOps integration are new being taken care of by Cloudify. Indeed, there’s still much integration headache with ports not opened, passwords incorrect etc. which takes up most of the time, and more logs are definitely required in Cloudify to figure things out and troubleshoot. What I did is I simply forked the open source project from GitHub and debugged right through the code, which has the side benefit of  fixing and improving the project on the fly and contributing back to the community. I should mention that although the environments I integrated with where Linux-based, Cloudify also provides support for Windows-based systems (based on WinRM, CIFS and PowerShell).

One of the coolest things added in Cloudify 2.1 that was launched last week was the BYON (Bring Your Own Node) driver, which allows you to take your existing bare-metal servers and use them as managed resources for deployment by Cloudify, as if they were on-demand resources. This provides a neat answer to the growing demand for bare-metal cloud services. I’m still waiting for the opportunity to give this one a wet run with a customer in the field …

All in all, it turned out to be a straight-forward task to integrate with a new cloud provider. Just make sure you have a stable environment and a test code on how to consume the APIs, and use the existing examples as reference, and you’re good to go.


Follow Dotan on Twitter!

Leave a comment

Filed under Cloud, DevOps, IaaS, PaaS