Wednesday, September 4, 2013

AWS - Running Services Locally (a quick survey of products)

My first step was to review the major services we wanted to use on Amazon. They have a massive set of services available, and some are not easily replicated without major infrastructure and the skill set to manage it. To give you some idea of the number of services, here is a quick-reference summary as of August 2013.

Amazon Web Services (AWS) as of August 2013

Compute & Networking
  • EC2 - Virtual Servers in the Cloud
  • ELB - Elastic Load Balancing / Auto Scaling of services
  • EMR - Hosted Hadoop Framework
  • VPC - Isolated Cloud Resources
  • Route 53 - Scalable Domain Name System (DNS)
  • Direct Connect - Dedicated Network Connection to AWS

Storage & CDN
  • S3 - Scalable Storage in the Cloud
  • Glacier - Low-Cost Archive Storage in the Cloud
  • EBS - EC2 Block Storage Volumes
  • Import/Export - Large Volume Data Transfer
  • Storage Gateway - Integrates on-premises IT environments with Cloud storage
  • CloudFront - Global Content Delivery Network (CDN)

Database
  • RDS - Managed Relational Database Service for MySQL, Oracle and SQL Server
  • DynamoDB - Fast, Predictable, Highly-scalable NoSQL data store
  • ElastiCache - In-Memory Caching Service
  • Redshift - Fast, Powerful, Fully Managed, Petabyte-scale Data Warehouse Service

Application Services
  • CloudSearch - Managed Search Service
  • SWF - Workflow service for coordinating application components
  • SQS - Message Queue Service
  • SES - Email Sending Service
  • SNS - Push Notification Service
  • FPS - Amazon Flexible Payments Service, an API-based payment service
  • Elastic Transcoder - Easy-to-use scalable media transcoding

Deployment & Management
  • Management Console - Web-Based User Interface
  • IAM - Identity and Access Management with configurable access controls
  • CloudWatch - Resource and Application Monitoring
  • Elastic Beanstalk - AWS Application Container
  • CloudFormation - Templates for AWS Resource Creation
  • Data Pipeline - Orchestration Service for Periodic, Data-Driven Workflows
  • OpsWorks - DevOps Application Management Services
  • CloudHSM - Hardware-based Key Storage for Regulatory Compliance

The underscored items above are those we deemed important or critical to our project.


PaaS (Platform as a Service)
I limited the search to the major players in the AWS private cloud market and ignored the non-AWS-compatible options. AWS is the gold standard of cloud services at this time, and other vendors are catching up to the services they offer. Google, Yahoo, HP, Microsoft, and EMC/VMware are all attempting to gain traction and provide the services that Amazon currently offers. I don't see this changing in the near future, but those options may improve as this market matures. Google in particular seems to want in on this space, and EMC/VMware has the Pivotal initiative with partners. There are other players, but Amazon owns this space right now.

With those requirements, OpenStack, CloudStack and Eucalyptus are the three major packages I reviewed. They are listed in order from the fewest AWS services replicated to the greatest. After reviewing the documentation and frequenting the IRC channels where regular users of each product gather, the clear winner was Eucalyptus.

OpenStack has very robust hypervisor management and an API for supporting those services. There is a bolted-on AWS API for EC2 that works for the most part, until it comes into conflict with the way the OpenStack API functions. Reviewing the usage of OpenStack and participating in the community surrounding it left me with little patience for this product. It was not a very open community of users, and there seemed to be some hostility towards AWS compatibility. My goal is to implement an AWS-compatible system, so this group does not seem to align with that goal. Likewise, their API support covers only EC2 and a small subset of other services in support of EC2. The OpenStack Compute API is very similar to the AWS EC2 API, so many people port between them as necessary. There are even some projects intended to assist in this effort, though they don't appear to be very active now. An older project by Canonical called AWSOME (Any Web Service Over Me) was supposed to bridge Amazon and OpenStack cloud environments.

As a side note, the OpenStack versus Eucalyptus debate has the feel of the almost religious debates that pervaded the Debian versus Ubuntu arguments several years ago. The free and open software debate is not my primary concern today, but I understand the arguments about having long-term freedom. Unfortunately, I need to implement something today, and the AWS API is the closest thing to a standard we have for this technology right now. I am not ignorant of the risk entailed in relying on Amazon-controlled standards and the Eucalyptus implementations of them should the source ever be closed, but I have faith that a fork would emerge to continue the open source version if that happened, as with MySQL and many other software packages over the years. Amazon is working hard to continue to innovate and others are pushing to catch up, so this will stay an active area and Amazon won't be allowed to slow down. The chest-beating between OpenStack and other cloud providers makes them less interesting to me.

CloudStack may gain traction now that it is under the Apache Foundation and separated from Citrix, the makers of Xen. I hope they increase the number of services offered, but as of today they are not sufficient for our needs. They offer a limited EC2/S3 service without any of the other services that make AWS so interesting. If you are looking for just a virtual machine management system with an EC2 interface, then this will definitely serve your needs. One major advantage over Eucalyptus is that it supports additional hypervisors beyond KVM, including Xen and VMware, which the current Eucalyptus does not. This could be the difference for some but was not a factor in our decision. The community was quite open and interested in newcomers. If they offered more AWS services, I would have been happy with this product.

Eucalyptus, which I began working with from the GitHub repository for the 3.3.0 release candidate, contains a large number of complex moving parts. In a prior post on Eucalyptus, I gave a list of features that included the base EC2, S3, EBS, AMI, IAM, and the more recently added Autoscaling, Elastic Load Balancer, and Cloudwatch. This whole package was in flux while I was learning about it and building it, so I had some additional self-imposed hurdles. The S3 support is lacking some features but is a decent implementation for storing small amounts of information and a relatively small number of files. It serves fine for storing EMIs (Eucalyptus Machine Images) or simple configuration data. The coverage of the supported API is decent; don't do something strange like use the AWS .NET SDK and you are likely to get it working fine. The EBS support works but requires some extra effort to create the initial images. Those issues are being worked on actively by Eucalyptus and you should see significant changes in the near future.
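To give a feel for what that API coverage means in practice, here is a minimal sketch using the AWS SDK for Java pointed at a local Walrus endpoint instead of Amazon. The endpoint URL, credentials, bucket and key names are placeholders for values from your own cloud (the eucarc credentials file that Eucalyptus generates has the real ones), and path-style bucket addressing is assumed:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.S3ClientOptions;

    public class WalrusExample {
        public static void main(String[] args) {
            // Placeholder credentials; use the access and secret keys from your cloud.
            BasicAWSCredentials creds =
                new BasicAWSCredentials("EC2_ACCESS_KEY", "EC2_SECRET_KEY");

            AmazonS3Client s3 = new AmazonS3Client(creds);
            // Placeholder front-end address; your Walrus/S3 URL goes here.
            s3.setEndpoint("http://192.168.10.2:8773/services/Walrus");

            // Use path-style bucket addressing rather than the
            // bucket.s3.amazonaws.com virtual-host style.
            S3ClientOptions opts = new S3ClientOptions();
            opts.setPathStyleAccess(true);
            s3.setS3ClientOptions(opts);

            // Store a small configuration file by key and read back its metadata.
            s3.createBucket("cluster-config");
            s3.putObject("cluster-config", "nodes/node01.properties",
                         new File("node01.properties"));
            System.out.println(
                s3.getObjectMetadata("cluster-config", "nodes/node01.properties")
                  .getContentLength());
        }
    }

This is the same kind of small-object usage, machine images and simple configuration data, that the current Walrus handles comfortably.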

The EC2 support appears to be solid but is limited to the Linux KVM hypervisor. There used to be support for Xen, which was removed in the last couple of versions. Open source users have gotten it working with Xen recently, but it isn't in the mainline support right now.

The S3 support is being bolstered internally in their Walrus service and with third-party software like Ceph and Riak CS (S3 compatibility with HA). These are ambitious additions to their existing systems and will likely take a few revisions to work out the issues. You can review their road map to see roughly when they plan to deliver these features.

There are a few options available for running AWS-replicated services locally outside of Eucalyptus. I'll have a follow-up post sometime in the next few days on a few of these, covering at least S3, SQS, SNS, DynamoDB, RDS and SWF. These are services that Eucalyptus either does not offer or implements incompletely.

Another post that I will flesh out will be about the shared storage used by Eucalyptus to allow for shared volumes between the various components of the system. Not having a NetApp or EMC storage device available made it necessary to learn a bit about free options in this space.

Please comment or ask questions.

Eucalyptus (AWS private Cloud Computing)

I'm not going to give a full rundown of what Eucalyptus is but will just point you to the marketing material on their website. The quick summary is that it offers the Amazon Web Services loaded on local computers. These include several of the most interesting services: EC2, S3, EBS, AMI, IAM, and the recently added Autoscaling, Elastic Load Balancer, and Cloudwatch. If that alphabet soup has your interest piqued, then you should continue reading.

Building one of these using their pre-packaged images is dead simple. I'm not one to do anything the simple way and decided to build everything from source directly from their GitHub repository. This was not an easy task, but it definitely taught me a lot about their software and the components of the system. I would recommend that a first-time user not take my route and instead use their binary RPM builds or their ISO image. Fedora has these packages as well, and the guy who maintains them is a great guy. Please take the path of least resistance first to get familiar with the software.

In my configuration, I have a couple of five-to-seven-year-old servers that used to be in production: one system with a twelve (12) core CPU and forty-eight (48) GB of RAM, and a second system with a dual-core CPU and 4 GB of RAM. They are an old database server and an old web server. The heavy-weight system with the better memory and processor was dedicated to serving out virtual machines, and the lower-end system is the web services provider. A third system, just a desktop box, is acting as my SAN device with FreeNAS 8.3.

The front end to the whole thing, isolating it from my network, is a cheap wireless router that serves out DHCP reservations and provides a private network. The only smart thing on the router is a firewall with port forwarding. I added another desktop system with Linux installed, running OpenSSH, that I use as a jump host into the environment.

Added to this is a CentOS 6.4 virtual machine image that runs on the virtual machine server and is used to actually build the Eucalyptus software. The image runs under Linux KVM, the same hypervisor the Eucalyptus software will later use to serve out images managed through EC2. In other words, I subverted the environment to double as a build server.


  • router - dLink wireless router
  • marduk - web services and custom tools
  • tiamat - virtual machine provider
    • buildserver vm image
    • EC2 instances
  • anshar - SAN server (iSCSI) running FreeNAS 8.3.1
  • gozer - jumphost and utilities server
    • OpenSSH
    • Nagios service monitoring

This relatively cheap set of hardware components allows me to replicate the Amazon services and test my code locally.

I'm not going to sugarcoat it, there were issues along the way.  Here is a list of the ones that come to mind:
  1. I abandoned trying to build software on Ubuntu 12.04.02 LTS and migrated to CentOS 6.4 for the buildserver.
  2. The build process is only mostly documented (but much better in 3.3) with some dependencies missing and no separation between build and runtime environment.
  3. iSCSI is never fun to configure (but no harder than the regular iSCSI fun)
  4. The S3 support is hit and miss (the DeleteObjects and multi-part POST APIs fail); some of these are fixed in version 3.4 and a major update is coming in 4.0 (Ceph and Riak CS). They are addressing this actively. A sketch of a per-key delete workaround follows this list.
  5. iSCSI volumes have strange behavior with KVM virtual machines
    1. cache=writethrough is necessary for KVM images
    2. DAS configuration of the Storage Controller takes a couple of tries and is a one-way trip
  6. I'm still working on Windows imaging (painful but getting better) and 3.4 will have significant improvements in this area. Eustore may be an option soon.
  7. They are still working on bfEBS (bootable EBS) but check the IRC channel for help. It works with caveats.
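On item 4, since the multi-object DeleteObjects call is one of the pieces that fails, an obvious workaround is to fall back to deleting keys one at a time. Below is a rough sketch of that idea using the AWS SDK for Java; the endpoint, credentials, and bucket name are placeholders, not anything Eucalyptus ships:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.S3ClientOptions;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    public class EmptyBucket {
        public static void main(String[] args) {
            // Placeholder credentials and Walrus endpoint; substitute your own values.
            AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("EC2_ACCESS_KEY", "EC2_SECRET_KEY"));
            s3.setEndpoint("http://192.168.10.2:8773/services/Walrus");
            S3ClientOptions opts = new S3ClientOptions();
            opts.setPathStyleAccess(true);
            s3.setS3ClientOptions(opts);

            // Delete keys one at a time instead of calling the multi-object
            // DeleteObjects API, paging through truncated listings as needed.
            ObjectListing listing = s3.listObjects("cluster-config");
            while (true) {
                for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                    s3.deleteObject("cluster-config", summary.getKey());
                }
                if (!listing.isTruncated()) {
                    break;
                }
                listing = s3.listNextBatchOfObjects(listing);
            }
        }
    }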
On the plus side, their support is excellent and I would recommend joining their IRC channel. Also, some of the above issues had to do with my learning curve. I could have used the Fedora-provided RPMs or their FastStart image and gotten much further, much faster, but I'm stubborn when I start working on something.

I hope someone can use this later.  I'm trying to write down the entire process of building this system and will post it here later when it looks a little better. I've got a lot of documentation that needs to be cleaned up for release so that the learning curve is reduced.

On a completely separate note, I've got a quick-and-dirty Grails application for viewing and managing Eucalyptus S3 components, and it's pretty cool to have a local repository to play around with before paying for the AWS service. I may post on that later when I get time to clean it up a bit. The code is a mess, as I was hacking it together to help diagnose issues with S3 rather than to write clean code.

Friday, July 5, 2013

Amazon Web Services

New job and learning AWS (http://aws.amazon.com) in support of distributed computing. I've done a good bit of research to understand the ecosystem of web services. They can be expensive, but when you count total costs, they come out pretty close to break-even. You can lease computational power for relatively short periods to do quick work, then give the hardware back and stop paying for it. That is a powerful shift in technology.

OpenStack, CloudStack, Eucalyptus and a couple of others were in the running while doing the evaluations. Each had its advantages, which I'll probably write about later.

I settled on implementing a Eucalyptus system (http://www.eucalyptus.com) using spare hardware. So far I've got a working and running system built from the source code. It provides a subset of the AWS services on local hardware for testing.  The S3 (http://aws.amazon.com/s3/) support isn't quite there as of version 3.3.0 but it might improve in the next couple of months. S3 is a simple storage interface that allows storing information by key.  Their EC2 (http://aws.amazon.com/ec2/) support appears to be much better and allows for quickly building virtual machines with pre-configured operating systems.
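To make the EC2 side concrete, here is a minimal sketch using the AWS SDK for Java against a local Eucalyptus endpoint instead of Amazon. The endpoint URL, image id (emi-...), and key pair name are placeholders for values from your own cloud, so treat it as an illustration rather than a recipe:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.Instance;
    import com.amazonaws.services.ec2.model.RunInstancesRequest;
    import com.amazonaws.services.ec2.model.RunInstancesResult;

    public class LaunchInstance {
        public static void main(String[] args) {
            AmazonEC2Client ec2 = new AmazonEC2Client(
                new BasicAWSCredentials("EC2_ACCESS_KEY", "EC2_SECRET_KEY"));
            // Placeholder front-end address for the local Eucalyptus EC2 endpoint.
            ec2.setEndpoint("http://192.168.10.2:8773/services/Eucalyptus");

            // Launch one small instance from a registered machine image.
            RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("emi-00000000")   // placeholder image id
                .withInstanceType("m1.small")
                .withMinCount(1)
                .withMaxCount(1)
                .withKeyName("mykey");         // placeholder key pair

            RunInstancesResult result = ec2.runInstances(request);
            for (Instance instance : result.getReservation().getInstances()) {
                System.out.println(instance.getInstanceId() + " "
                    + instance.getState().getName());
            }
        }
    }

The same code points at Amazon itself if you drop the setEndpoint call and supply real AWS credentials, which is the whole attraction of an API-compatible local cloud.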

In the background, I'm writing Java code to implement AWS tools and services. I forgot how much fun Java can be.

DLNA with Sony

Bought a Sony Blu-ray player with DLNA support. Added MediaTomb as a static binary on the BlackArmor. Tested it and it wasn't very happy. It did run but the performance just wasn't there.

The default Sony Blu-ray player with my new BA NAS 400 series works just fine: a solution to my long-standing problem of playing my digital content on my TV and sound system. DLNA support in the Sony equipment just seems to work.

Tuesday, September 25, 2012

Cygwin Ports by way of binary files

Binary Files

I wanted to do some work on an undocumented binary file format. The free HexEdit tool for Windows from Catch22 Software is pretty good for mapping data structures to the raw binary/hex data in a file.  It uses C-style structs to format the data structures into a human-readable format.  It could use some additional docs on how to use the typedef format but was relatively easy to figure out.

What was frustrating was finding a tool that would allow for doing binary diffs with a decent interface.  There are several methods for doing binary diffs on files.  The UNIX 'od' command, 'bvi' and others presented themselves but were not as interactive as I would like when looking at large numbers of files visually for small iterative changes. Editors and binary editors fall into an area of personal preference; I could be starting a flame war by picking a tool, but I would love to hear some feedback on tools people have used.
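For the non-interactive, scripted end of this, the core of a binary diff is simple enough to sketch. This is purely my own illustration in Java (not how od, bvi, or vBinDiff work internally); it prints the offset and differing byte values for two files:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ByteDiff {
        public static void main(String[] args) throws IOException {
            byte[] left = Files.readAllBytes(Paths.get(args[0]));
            byte[] right = Files.readAllBytes(Paths.get(args[1]));

            // Report every offset where the two files disagree.
            int shared = Math.min(left.length, right.length);
            for (int i = 0; i < shared; i++) {
                if (left[i] != right[i]) {
                    System.out.printf("0x%08X: %02X -> %02X%n",
                                      i, left[i] & 0xFF, right[i] & 0xFF);
                }
            }
            if (left.length != right.length) {
                System.out.printf("length differs: %d vs %d bytes%n",
                                  left.length, right.length);
            }
        }
    }

What something like this can't give you is the side-by-side interactive view, which is why I kept looking.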

What I found that seemed to fit the bill for me was Chris' excellent tool called vBinDiff found at his website. Since I wanted to use UNIX tools for scripted automation but still be on a Windows system, I installed Cygwin.  Getting vbindiff to work on Cygwin was a testament to Chris' excellent code base.  After finishing up and getting it working, I thought it might be nice to package it for the Cygwin project.

Cygports

Thus started my journey into Cygport.  I found myself frustrated with the documentation because it assumes you already know a lot about Cygwin packaging, which is exactly what I was trying to avoid by using the tool. This is not a bashing session for cygport, which does an amazing job of wrapping up much of the trivia that is Cygwin packaging.  What was missing is a basic "HowTo Package ABC" for Cygwin.  The cygport documents, while good, are written for someone who has done the manual packaging process at some point.  The documentation failed to meet my expectations as a competent developer who has done lots of porting work and wanted to be quickly introduced to this new means of packaging for the platform.  There is no quick guide covering the simple cases with a couple of interesting examples.  The other gap was how to easily get these example applications of varied types out of Cygwin.  I figured it out and afterwards felt silly for not having seen it.  All that said, I have come to the realization that I have to put some time aside and try to write a HowTo guide on packaging for Cygwin.

I've got two packages done so far.  vBinDiff is the first, and I am still polishing that install as I learn more along the way; vbindiff is currently working great as a package on my local system. Wy60 is another tool from years back when I worked on UniData/Universe systems and needed access to their Wyse 60 PICK interfaces. Between the two packages I have used several cygport features to make these work.  I'll have to learn how to submit a request for inclusion into the main Cygwin package group.  There is a mailing list and a set of questions to answer.  I'll just need to put some time aside for it. :)

Interests 

So those are my current interests in the world of technology.  I keep playing around with the BlackArmor NAS devices occasionally and would like to find the time to get the USB-to-serial interface set up on my second unit. The ARM gcc tool-chain languishes for lack of time as well. The DLNA server ended up being too much of a pain without a decent tool-chain, so I knuckled under and installed Microsoft Media Center on an old laptop to feed my digital media to my Xbox 360. I really want to revisit that particular issue and get a low-power digital media server working.  Maybe I'll find the time and pick those back up.

Wednesday, February 22, 2012

Debian on BA NAS 110

Hajo on the BlackArmor Forums has an older posting about getting Debian Linux 5.0 (Lenny) installed on the BA NAS 110/220/4x0. This is not a port that includes the kernel, but simply a minimal install that gets the system set up to install binaries from the Lenny EABI ARM platform.  The kernel that comes with the BA NAS is compatible with those binaries; the newer kernel required for Debian 6 or higher is not. This has some limitations but offers a way to get at some newer pre-compiled software.  I don't want to lose the existing functionality on my test system, but the draw of DLNA services is pretty strong right now.

To top it off, Debian has a nicely documented cross-compilation setup for people working on non-Intel platforms.  This offers a way to compile newer software without killing myself anymore building the entire compiler and supporting software by hand.

The goal has always been to make the NAS device useful and I want to play my movies off it to my TV upstairs so this might be the next thing I play with on the development NAS.

Saturday, February 18, 2012

miniDLNA

I was reading a bit more on DLNA servers and found that someone had gotten miniDLNA working under the Debian port for the BlackArmor NAS.  NickolasZev looks like he is enjoying his Debian install on his BA110. He is documenting his initial Debian install, adding miniDLNA, a web server, BitTorrent and a download manager. I'm not interested in putting Debian on mine and want to stay as close to the vanilla firmware as possible, but it is nice to see that someone has it working on the hardware.