Wednesday, September 4, 2013

AWS - Running Services Locally (a quick survey of products)

My first step was to review the major services we wanted to use on Amazon. They have a massive set of services available, and some are not easily replicated without major infrastructure and the skill set to manage it. To give you some idea of the number of services, here is a summary with a quick reference as of August 2013.

Amazon Web Services (AWS) as of August 2013

Compute & Networking
  • EC2 - Virtual Servers in the Cloud
  • ELB - Elastic Load Balancing or Auto Scaling of services
  • EMR - Hosted Hadoop Framework
  • VPC - Isolated Cloud Resources
  • Route 53 - Scalable Domain Name System (DNS)
  • Direct Connect - Dedicated Network Connection to AWS

Storage & CDN
  • S3 - Scalable Storage in the Cloud
  • Glacier - Low-Cost Archive Storage in the Cloud
  • EBS - EC2 Block Storage Volumes
  • Import/Export - Large Volume Data Transfer
  • Storage Gateway - Integrates on-premises IT environments with Cloud storage
  • CloudFront - Global Content Delivery Network (CDN)

Database
  • RDS - Managed Relational Database Service for MySQL, Oracle and SQL Server
  • DynamoDB - Fast, Predictable, Highly-scalable NoSQL data store
  • ElastiCache - In-Memory Caching Service
  • Redshift - Fast, Powerful, Fully Managed, Petabyte-scale Data Warehouse Service

Application Services
  • CloudSearch - Managed Search Service
  • SWF - Workflow service for coordinating application components
  • SQS - Message Queue Service
  • SES - Email Sending Service
  • SNS - Push Notification Service
  • FPS - Amazon Flexible Payments Service (FPS) is an API based payment service
  • Elastic Transcoder - Easy-to-use scalable media transcoding

Deployment & Management
  • Management Console - Web-Based User Interface
  • IAM - Identity and Access Management (IAM) configurable access controls
  • CloudWatch - Resource and Application Monitoring
  • Elastic Beanstalk - AWS Application Container
  • CloudFormation - Templates for AWS Resource Creation
  • Data Pipeline - Orchestration Service for Periodic, Data-Driven Workflows
  • OpsWorks - DevOps Application Management Services
  • CloudHSM - Hardware-based Key Storage for Regulatory Compliance

The underscored items above are those we deemed important or critical to our project.


PaaS (Platform as a Service)
I limited the search to the major players in the AWS-compatible private cloud market and ignored the options that are not AWS compatible. AWS is the gold standard of cloud services at this time, and other vendors are still catching up to the services it offers. Google, Yahoo, HP, Microsoft, and EMC/VMware are all attempting to gain traction and provide the services that Amazon currently offers. I don't see this changing in the near future, but those options may improve as the market matures. Google in particular seems to want into this space, and EMC/VMware has the Pivotal initiative with partners. There are other players, but Amazon owns this space right now.

With those requirements, OpenStack, CloudStack and Eucalyptus are the three major packages I reviewed. They are listed in order from the fewest to the most AWS services replicated. Based on reviewing the documentation and frequenting the IRC channels used by regular users of each product, the clear winner was Eucalyptus.

OpenStack has very robust hypervisor management and an API for supporting those services. There is a bolted-on AWS API for EC2 that works for the most part, until it comes into conflict with the way the OpenStack API functions. Reviewing the usage of OpenStack and participating in the community surrounding it left me with little patience for this product. It was not a very open community of users, and there seemed to be some hostility towards AWS compatibility. My goal is to implement an AWS-compatible system, so this group does not seem to align with that goal. Likewise, their AWS API support is limited to EC2 and a small subset of other services that support EC2. The OpenStack Compute API is very similar to the AWS EC2 API, so many people port between them as necessary. There are even some projects to assist in this effort, though they don't appear to be very active now. One older project by Canonical, AWSOME (Any Web Service Over Me), was supposed to bridge Amazon and OpenStack cloud environments.

As a side note, the OpenStack versus Eucalyptus debate has the feel of the almost religious Debian versus Ubuntu arguments of several years ago. The free and open software debate is not my primary concern today, but I understand the arguments about having long-term freedom. Unfortunately, I need to implement something today, and the AWS API is the closest thing to a standard we have for this technology right now. I am not ignorant of the risk entailed in relying on Amazon-controlled standards and on Eucalyptus's implementation of them should they close the source, but I have faith that a fork would emerge to continue the open source version if that happened, as with MySQL and many other software packages over the years. Amazon is working hard to continue to innovate and others are pushing to catch up, so this will stay an active area that does not let Amazon slow down. The chest beating between OpenStack and other cloud providers makes them less interesting to me.

CloudStack may gain traction now that it is under the Apache Foundation and separated from Citrix, the makers of Xen. I hope they increase the number of services offered, but as of today they are not sufficient for our needs. They offer a limited EC2/S3 service without any of the other services that make AWS so interesting. If you are looking for just a virtual machine management system with an EC2 interface, then this will definitely serve your needs. One major advantage over Eucalyptus is that it supports more hypervisors than the current Eucalyptus does, including Xen, VMware, and KVM. This could be the difference for some but was not a factor in our decision. The community was quite open and interested in newcomers. If they offered more AWS services, I would have been happy with this product.

Eucalyptus, which I began working with off the GitHub repository for the 3.3.0 release candidate, contains a large number of complex moving parts. In a prior post on Eucalyptus, I gave a list of features that included the base EC2, S3, EBS, AMI, and IAM, plus the more recently added Auto Scaling, Elastic Load Balancer, and CloudWatch. This whole package was in flux while I was learning about it and building it, so I had some additional self-imposed hurdles. The S3 support is lacking some features but is a decent implementation for storing small amounts of information and a relatively small number of files. It serves fine for storing EMI (Eucalyptus Machine Images) or simple configuration data. The coverage of the supported API is decent. As long as you don't do something strange, like using the AWS .NET SDK, you are likely to get it working fine. The EBS support works but requires some extra effort to create the initial images. Those issues are being worked on actively by Eucalyptus and you should see significant changes in the near future.

The EC2 support appears to be solid but is limited to the Linux KVM hypervisor. There used to be support for Xen, which was removed in the last couple of versions. Open source users have gotten it to work with Xen recently, but it isn't in the mainline support right now.

The S3 support is being bolstered internally in their Walrus service and with third-party software like Ceph and Riak CS (S3 compatibility with HA). These are ambitious additions to their existing systems and will likely take a few revisions to work out the issues. You can review their roadmap to see when they plan to deliver these features.

There are a few options for running AWS-compatible services locally outside of Eucalyptus. I'll have a follow-up post sometime in the next few days on a few of these that will include at least: S3, SQS, SNS, DynamoDB, RDS and SWF. These are services that are either not offered or incompletely implemented on Eucalyptus.
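To make the comparison above concrete, the coverage of the three packages can be written down as sets and ranked. This is only a sketch: the per-platform sets paraphrase the descriptions in this post, and the `WANTED` set is an illustrative assumption, not the project's real shortlist.

```python
# AWS-style services each package replicates, paraphrased from the survey above.
# S3 on Eucalyptus counts as covered here even though it is incomplete.
COVERAGE = {
    "OpenStack": {"EC2"},
    "CloudStack": {"EC2", "S3"},
    "Eucalyptus": {"EC2", "S3", "EBS", "AMI", "IAM",
                   "Auto Scaling", "ELB", "CloudWatch"},
}

# Hypothetical shortlist of services a project might need (an assumption).
WANTED = {"EC2", "S3", "EBS", "IAM", "ELB",
          "SQS", "SNS", "DynamoDB", "RDS", "SWF"}


def rank_by_coverage(coverage, wanted):
    """Return (platform, covered, missing) rows, fewest covered services first."""
    rows = []
    for name, offered in coverage.items():
        covered = sorted(offered & wanted)
        missing = sorted(wanted - offered)
        rows.append((name, covered, missing))
    return sorted(rows, key=lambda row: len(row[1]))


if __name__ == "__main__":
    for name, covered, missing in rank_by_coverage(COVERAGE, WANTED):
        print(f"{name}: covers {len(covered)}, missing {', '.join(missing)}")
```

The services left in the `missing` column for the best-ranked package are the ones that have to be found elsewhere.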

Another post that I will flesh out will be about the shared storage used by Eucalyptus to allow for shared volumes between the various components of the system. Not having a NetApp or EMC storage device available made it necessary to learn a bit about free options in this space.

Please comment or ask questions.

Eucalyptus (AWS private Cloud Computing)


I'm not going to give a full rundown of what Eucalyptus is; I'll just point you to the marketing material on their website. The quick summary is that it offers Amazon Web Services-compatible services running on local hardware. These include several of the most interesting services: EC2, S3, EBS, AMI, IAM, and recently they added Auto Scaling, Elastic Load Balancer, and CloudWatch. If that alphabet soup has your interest piqued, then you should continue reading.

Building one of these using their pre-packaged images is dead simple. I'm not one to do anything the simple way, so I decided to build everything from source directly from their GitHub repository. This was not an easy task, but it definitely taught me a lot about their software and the components of the system. I would recommend that a first-time user not take my route and instead use their binary builds from RPM or their ISO image. Fedora has these as well, and the guy who supports them is a great guy. Please take the path of least resistance first to get familiar with the software.

In my configuration, I have a couple of five-to-seven-year-old servers that used to be in production. One system has a twelve (12) core CPU and forty-eight (48) GB of RAM, and the second has a dual-core CPU and 4GB of RAM. They are an old database server and an old web server. The heavyweight system with the better memory and processor is dedicated to serving out virtual machines, and the lower-end system is the web services provider. A third system, just a desktop box, acts as my SAN device running FreeNAS 8.3.

The front end to the whole thing, which isolates it from my network, is a cheap wireless router that serves out DHCP reservations and provides a private network. The only smart thing on the router is a firewall with port forwarding. I added another desktop system with Linux installed, running OpenSSH, that I use as a jump host into the environment.

Added to this is a CentOS 6.4 virtual machine image, hosted on the virtual machine server, that actually builds the Eucalyptus software. This image runs under Linux KVM, which will later be used by the Eucalyptus software to serve out images managed by EC2. I subverted the environment to let me use it as a build server as well.


  • router - D-Link wireless router
  • marduk - web services and custom tools
  • tiamat - virtual machine provider
    • buildserver vm image
    • EC2 instances
  • anshar - SAN server (iSCSI) running FreeNAS 8.3.1
  • gozer - jumphost and utilities server
    • OpenSSH
    • Nagios service monitoring

This relatively cheap set of hardware components allows me to replicate the Amazon services and test my code locally.
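Testing code locally mostly comes down to pointing an AWS client at the local service endpoints instead of Amazon's. The sketch below builds those URLs. Treat the specifics as assumptions: port 8773 and the `/services/...` paths are modeled on what a Eucalyptus 3.x `eucarc` credentials file typically contains, and using `marduk` as the front-end host is a guess based on the layout above. Check your own `eucarc` for the real values.

```python
from urllib.parse import urlunsplit

# Assumed service paths, modeled on a Eucalyptus 3.x eucarc file.
# Verify these against the eucarc generated for your own cloud.
SERVICE_PATHS = {
    "ec2": "/services/Eucalyptus",
    "s3": "/services/Walrus",
    "iam": "/services/Euare",
}
DEFAULT_PORT = 8773  # assumption: the usual Eucalyptus web services port


def local_endpoint(service, host, port=DEFAULT_PORT, secure=False):
    """Build the URL a client would use for `service` on a local cloud."""
    try:
        path = SERVICE_PATHS[service]
    except KeyError:
        raise ValueError(f"no local endpoint known for {service!r}") from None
    scheme = "https" if secure else "http"
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


if __name__ == "__main__":
    # "marduk" is the web services host from the list above (an assumption).
    for service in sorted(SERVICE_PATHS):
        print(service, local_endpoint(service, "marduk"))
```

Most AWS SDKs accept a custom endpoint URL of this kind, so the same application code can be switched between the local cloud and Amazon by changing configuration only.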

I'm not going to sugarcoat it: there were issues along the way. Here is a list of the ones that come to mind:
  1. I abandoned trying to build the software on Ubuntu 12.04.2 LTS and migrated to CentOS 6.4 for the buildserver.
  2. The build process is only mostly documented (but much better in 3.3) with some dependencies missing and no separation between build and runtime environment.
  3. iSCSI is never fun to configure (but no harder than the regular iSCSI fun)
  4. The S3 support is hit and miss (the DeleteObjects and multi-part POST APIs fail). Some issues are fixed in version 3.4 and a major update is coming in 4.0 (Ceph and Riak CS). They are addressing this actively.
  5. iSCSI volumes have strange behavior with KVM virtual machines
    1. cache=writethrough necessary for kvm images
    2. DAS configuration of Storage Controller takes a couple tries and is a one-way trip
  6. I'm still working on Windows imaging (painful but getting better), and 3.4 will have significant improvements in this area. Eustore may be an option soon.
  7. They are still working on bfEBS (bootable EBS) but check the IRC channel for help. It works with caveats.
On the plus side, their support is excellent and I would recommend joining their IRC channel. Also, some of the above issues had to do with my learning curve. I could have used the Fedora-provided RPM or their FastStart image and gotten much further much more quickly, but I'm stubborn when I start working on something.
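For reference, the cache=writethrough setting from item 5 of the list corresponds to the `cache` attribute on a disk's `<driver>` element in libvirt domain XML. The sketch below generates such a disk definition. It is an illustration only: the device paths are hypothetical, and this post does not say where in a Eucalyptus node the setting was actually applied.

```python
import xml.etree.ElementTree as ET


def iscsi_disk_xml(source_dev, target_dev="vdb", cache="writethrough"):
    """Return a libvirt <disk> element for a block device with an explicit cache mode."""
    disk = ET.Element("disk", type="block", device="disk")
    # The cache attribute on <driver> is where writethrough is selected.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache=cache)
    ET.SubElement(disk, "source", dev=source_dev)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # /dev/sdb is a hypothetical path for an attached iSCSI LUN.
    print(iscsi_disk_xml("/dev/sdb"))
```

Writethrough trades some write performance for the guarantee that a write is on the backing store before the guest is told it completed, which is the conservative choice when the backing store is a network volume.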

I hope someone can use this later. I'm trying to write down the entire process of building this system and will post it here when it looks a little better. I've got a lot of documentation that needs to be cleaned up for release so that the learning curve is reduced for others.

On a completely separate note, I've got a quick-and-dirty Grails application for viewing and managing Eucalyptus S3 components, and it's pretty cool to have a local repository to play around with before paying for the AWS service. I may post on that later when I get time to clean it up a bit. The code is a mess, as I was hacking it together to help diagnose issues with S3, not to write clean code.