If you want to get into AWS and understand the things that are important for you, there are many helpful sources. First you have to know which particular services you need for your application. A summary of all Amazon Web Services can be found here. For AWSAC the most important services are EC2 and S3; SimpleDB may be considered a bonus. EC2 is much more complex than S3, so for me, understanding EC2 had top priority at the beginning. Hence, I worked my way through Amazon’s EC2 Resources. There you can find the official documentation (including the very essential Developer Guide), great articles, helpful tutorials and useful examples. Additionally, the AWS Forums are very convenient for special problems and questions (great support!).
To spare you from reading through everything, in this chapter I explain the technical facts that you should keep in mind while reading this documentation.
AWS uses a so-called Web Services Description Language (WSDL) that strictly defines the operations and how to control their services. As stated before in chapter 2, every service is controlled via HTTP. More exactly, a special language meeting the definitions in the WSDL must be spoken over HTTP; for such a language an XML structure is convenient. This language is spoken in a conversation between the client (the one that has a special request) and the server aws.amazon.com (the one that produces a corresponding response). HTTP is used as the protocol to carry this conversation, since HTTP natively is a request/response standard between a client and a server and is accessible from virtually any internet connection.
Consider a client that wants to perform an action on AWS, e.g. on EC2. An action encapsulates one possible interaction between the client and EC2, consisting of a request and response message pair. Simply expressed, the WSDL strictly defines the structure of this message pair. A request from a client must match the corresponding entry in the WSDL. Then either AWS follows this request and performs the requested action, or the request is declined with a specific error code in the response. The response, in case of success or failure, is strictly defined by the WSDL, too.
The set of possible actions or valid requests defined by the WSDL for e.g. EC2 constitutes the EC2 Application Programming Interface (API). One requested action is called an API call.
The AWS API is subject to constant enhancement, and from time to time a new API version is released. Normally all changes are backward compatible; but to avoid any problems, each API version has its own documentation and, most importantly, the client can specify the API version its request should be evaluated against.
In this part I want to be more precise about the concrete language spoken in a conversation between the client and the server (the language that is transported by HTTP). The request may be formed in two different languages, while the response always uses the same language.
Let me illustrate the two possible request/response systems, which differ in the form of the request. Imagine you would like to create a new Elastic Block Store volume of 800 GiB in the availability zone “us-east-1a”. Then the two possible requests (without authentication and other security overhead) look like
    <CreateVolume xmlns="http://ec2.amazonaws.com/doc/2008-08-08">
        <size>800</size>
        <zone>us-east-1a</zone>
    </CreateVolume>

in XML form (the so-called SOAP API), or
    https://ec2.amazonaws.com/
        ?Action=CreateVolume
        &Size=800
        &Zone=us-east-1a

in HTTP Query-based form (using standard GET or POST methods to submit parameters).
For both requests the response, in case of success, is in XML form and looks like
    <CreateVolumeResponse xmlns="http://ec2.amazonaws.com/doc/2008-08-08">
        <volumeId>vol-4d826724</volumeId>
        <size>800</size>
        <status>creating</status>
        <createTime>2008-05-07T11:51:50.000Z</createTime>
        <zone>us-east-1a</zone>
        <snapshotId></snapshotId>
    </CreateVolumeResponse>
As you can see, AWS informs you in detail about the requested process. The response always contains all the information you might need for further work. In this example, the most essential piece of information is the unique volume ID of the new EBS volume, which you need e.g. to attach the volume to an EC2 instance.
Example requests and corresponding responses of all EC2 API calls are listed in the EC2 Developer Guide - API Reference (for both SOAP and Query API).
The communication with AWS takes place between a managing client and https://aws.amazon.com; since HTTPS is used, the communication itself is encrypted.
To ensure that only you can perform actions with your account, every request contains authentication parameters. In the case of the Query API, the two important elements of these authentication parameters are
- the so-called Access Key ID (public identifier of your account)
- a signature: a hash built locally from the request data itself in combination with the so-called AWS Secret Key.
AWS then reads the Access Key ID from the request, looks up the corresponding Secret Key (in a database) and recalculates the hash with the same algorithm the client used. If the two hashes (the one sent by the client and the one AWS calculated) match, the request is considered valid.
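A minimal sketch of this signing scheme in Python: the real Query API additionally prescribes an exact canonicalization and ordering of the request parameters, which is omitted here, and the credentials as well as the parameter layout are made-up examples.

    import base64
    import hashlib
    import hmac

    def sign_request(request_data, secret_key):
        # HMAC-SHA1 over the request data, base64-encoded; AWS recalculates
        # this value server-side and compares it with the transmitted one.
        digest = hmac.new(secret_key, request_data, hashlib.sha1).digest()
        return base64.b64encode(digest)

    # Hypothetical credentials, for illustration only:
    ACCESS_KEY_ID = 'AKIDEXAMPLE'
    SECRET_KEY = 'uV3F3YluFJax1cknvbcGwgjvx4QpvB'

    request_data = 'Action=CreateVolume&Size=800&Zone=us-east-1a'
    signature = sign_request(request_data, SECRET_KEY)
    print('%s&AWSAccessKeyId=%s&Signature=%s'
          % (request_data, ACCESS_KEY_ID, signature))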
The authentication mechanism differs among the AWS services and may differ between the API types, too. E.g. when using the SOAP API for EC2, security is ensured by an X.509 certificate in combination with an RSA public/private key pair, while when using the SOAP API for S3, the mechanism is almost the same as the one described above (for the EC2 Query API).
To ensure security within the cloud (e.g. between virtual machines of different EC2 users) and between the cloud and the internet, AWS provides an entirely configurable Network Security concept.
Since every modern high-level programming language has web service libraries (e.g. for HTTP and XML), it is no problem to implement the AWS API. As a result, there meanwhile exist various modules/libraries to control AWS from e.g. Ruby, Java, PHP, Python and so on. Some of these libraries are official ones, released by AWS itself. With the help of these libraries, it is possible to develop big and powerful applications on the basis of AWS!
In the case of Python, the corresponding AWS module is a third-party project (initiated by Mitch Garnaat) called boto. Information can be found here: http://code.google.com/p/boto/
I decided to use Python with boto to control AWS and to build the applications that are now called AWSACtools, because in my eyes the Python language is perfect for doing such scripting tasks in the easiest way.
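As a small taste of boto, the CreateVolume example from above could look like this (a sketch; replace the made-up credentials with your own):

    from boto.ec2.connection import EC2Connection

    # Hypothetical credentials, for illustration only.
    conn = EC2Connection('YOUR-ACCESS-KEY-ID', 'YOUR-SECRET-KEY')

    # Corresponds to the CreateVolume request/response pair shown above:
    # boto builds the signed Query API request and parses the XML response.
    volume = conn.create_volume(800, 'us-east-1a')
    print(volume.id)      # e.g. vol-4d826724
    print(volume.status)  # e.g. creating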
S3 is not as complex as e.g. EC2, and you already know many important things from the S3 introduction. But it is worthwhile to learn some more details about S3. At this point I cannot summarize it better than Amazon did: please read the Core Concepts part of Amazon’s S3 Developer Guide. After this you will know everything needed about the data model consisting of buckets, objects and keys. Go on reading about Access Control Lists to learn about the mechanism behind sharing data between AWS users or even with the public. The rest of the S3 Developer Guide offers interesting material, too.
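To illustrate the bucket/object/key data model, here is a small boto sketch (the bucket name and key are made up):

    from boto.s3.connection import S3Connection

    conn = S3Connection('YOUR-ACCESS-KEY-ID', 'YOUR-SECRET-KEY')

    # A bucket is a globally unique namespace; a key addresses one object in it.
    bucket = conn.create_bucket('awsac-example-bucket')
    key = bucket.new_key('results/run-01/output.txt')  # "/" has no special meaning to S3
    key.set_contents_from_string('Hello, S3!')

    # Reading the object back via its key:
    print(bucket.get_key('results/run-01/output.txt').get_contents_as_string())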
The S3Fox Organizer is a Firefox plug-in that implements the S3 API. Thus, S3Fox is able to offer a graphical user interface (GUI) to S3. It is sometimes convenient to visualize the data that one or more AWS account(s) have stored in S3, and S3Fox offers easy ways to check such things.
I think that such a tool is really essential, because it may become necessary to e.g. check whether a script really did what it should have done. So it is great for monitoring. But you have to know that S3Fox maps the bucket and key structure onto a classical directory structure: buckets are the highest directory layer, and keys containing “/” are split up and treated as directories, too. This may sometimes become confusing.
Moreover, S3Fox is helpful for administrating the Access Control Lists.
In this part I will illuminate the relation between the different elements EC2 consists of and explain how to deal with them using different tools.
Amazon Machine Image (AMI):
An AMI is an encrypted image of almost all files of an operating system, including any user-given data. The encrypted image is divided into part files; for each part a checksum is computed and recorded in a manifest file. The part files together with the manifest file constitute an Amazon Machine Image, which has to be uploaded to S3 and registered (validated) before instances can be launched from it. Building an AMI is done using special tools, which are described later. After successful registration, an AMI gets an individual and unique AMI ID. Read what Amazon says about AMIs.
General components:
In the EC2 introduction, I already presented the most important components of EC2. Let me refer to the corresponding part in the EC2 Developer Guide, the Components of EC2. In short, an Instance is a running virtual machine in the Elastic Compute Cloud that was started from an AMI, and Instance Store is virtual hard disk space closely tied to an instance, which means that it is lost when the instance terminates.
Let me introduce other important components of EC2 by means of usage examples; I think this is the easiest and clearest way. Consider the case that you want to start instance(s) with one API call (one so-called reservation). Before sending this API call, you have to know exactly what you want (a boto sketch putting all of the following parameters together can be found after the list):
AMI
Choose the specific system you would like to start up, defined by its AMI ID, and find out the AMI ID of the AMI you want. Only registered AMIs can be launched.
instance type
Choose between various instance types; the type must be adjusted to your needs. Instance types have names like m1.small and c1.xlarge; a full list of these names can be found in the EC2 Developer Guide - Instance Types. It is important to consider the different cost/performance ratios of the instance types. If you e.g. want to start up a 32 bit system for CPU-intensive applications, then you should start up c1.medium for $0.20 per hour instead of m1.small for $0.10: c1.medium has 2 virtual cores with 2.5 EC2 Compute Units (ECU) each («One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor»), i.e. 5 ECU in total, while m1.small only has 1 core with 1 ECU. In other words, m1.small delivers a fifth of the computing power for half of the price of c1.medium.
number of instances
Carefully choose the number of instances to start up at the same time (with the same API call). AWS uses a min/max system: «If [...] EC2 cannot launch the minimum number [of instances] you request, no instances launch. If there is insufficient capacity to launch the maximum number [...], EC2 launches as many as possible [...].» It is sufficient to define only the minimum number; this always satisfied my needs.
security group
As you can read in the Network Security part of the EC2 Developer Guide, you can define so-called security groups. Each group gets its own firewall settings (everything is blocked by default!). When you request a reservation (start up instances), you have to specify which security group the reservation should belong to.
availability zone
The (currently) three availability zones correspond to different and physically widely separated AWS data centres. Sometimes it is important to define strictly in which data centre the instances should start up. The interaction between instances is most efficient when they are in the same place; and for e.g. attaching an Elastic Block Store volume to an instance, it is even necessary that both are in the same availability zone.
keypair
If you start up a public AMI and wish to log in via ssh, you don’t know root’s password. If you offer one of your AMIs to the public, you don’t want to distribute root’s password. This is inconvenient, and the solution is EC2 keypairs. EC2 offers the possibility to create RSA keypairs for you. Each EC2 keypair has a specific name; EC2 keeps the public keys, while you keep the private keys (as files). By delivering one keypair’s name within the API call that starts instances, you instruct EC2 to inject the corresponding public key into the instances at boot time. Then you can log in as root using the corresponding private key.
userdata
The so-called userdata can be submitted in the form of a base64 encoded string to all instances that are started by a RunInstances API call. All EC2 instances within the created reservation are able to retrieve this string from an internal EC2 server with the static internal IP 169.254.169.254 (a code sketch follows after the note below). In the case of the HTTP Query API, the userdata is limited by the URL length to 8 kB, so we can’t use it to submit files (note: when using the SOAP API, EC2 limits the userdata string to 16 kB). But we can use it to make essential and necessary information accessible to the instances within the reservation.
Some of the named parameters are optional: you do not have to define a keypair or userdata; omitting the availability zone leads to a random one, and omitting the security group leads to the default group.
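Putting all of these parameters together, a RunInstances API call via boto could look like this (a sketch; the AMI ID, keypair name and security group are made up):

    from boto.ec2.connection import EC2Connection

    conn = EC2Connection('YOUR-ACCESS-KEY-ID', 'YOUR-SECRET-KEY')

    # All parameters discussed above; boto takes care of base64-encoding
    # the userdata string.
    reservation = conn.run_instances(
        'ami-12345678',               # AMI ID (must be registered)
        min_count=1, max_count=2,     # number of instances (min/max system)
        instance_type='c1.medium',    # instance type
        placement='us-east-1a',       # availability zone (optional)
        key_name='awsac-keypair',     # keypair name (optional)
        security_groups=['awsac'],    # security group (optional)
        user_data='essential info')   # userdata (optional)

    print(reservation.id)             # the reservation ID
    for instance in reservation.instances:
        print(instance.id)            # globally unique instance IDs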
Note
Let me again explain the term reservation. Every RunInstances API call instructs EC2 to start up one or more instances. All instances started up within this one API call belong to the same reservation, which has a unique reservation ID; so all instances just started share the same reservation ID. But every instance within a reservation has its own globally unique instance ID. Additionally, each instance within a reservation has its own so-called launch index, starting from 0. This is important for developing applications, because the launch index and/or the instance ID are the only way to distinguish between the various instances within one reservation.
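From within a running instance, the launch index and the userdata can be fetched from the internal server mentioned above. A sketch, using the metadata paths documented by Amazon:

    import urllib

    METADATA_SERVER = 'http://169.254.169.254/latest'

    # Each instance sees its own launch index here; this is how instances
    # within one reservation can distinguish themselves from each other.
    launch_index = urllib.urlopen(METADATA_SERVER + '/meta-data/ami-launch-index').read()
    user_data = urllib.urlopen(METADATA_SERVER + '/user-data').read()

    print('My launch index: %s' % launch_index)
    print('Userdata of my reservation: %s' % user_data)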
Consider the RunInstances API call as valid and the requested instances as started up. If you want to connect to one of your instances, the most important parameter is the public DNS name. You can e.g. use this address to connect to your instance via ssh, provided that the security group the instance is in allows external access on port 22.
Elastic Block Store:
As stated above, the instance storage is lost when the corresponding instance shuts down. In the list of instance types you can see that this instance storage is really big. But this doesn’t help if you want persistent and fast storage that exists independently of instances. S3 is not a file system and is therefore ruled out (note: there are third-party possibilities to wrap S3 and create a pseudo file system on top of it). The solution is Elastic Block Store (EBS). An EBS volume itself can be considered an independent EC2 component: it can be created at any time (you have to define its size and its availability zone) and deleted at any time. In the meantime it may be attached to instances (only to one instance at a time). Within such an instance, an EBS volume appears as a hard disk drive that can be formatted with any file system and mounted.
Amazon developed an API call that invokes a backup of a specific EBS volume to S3. Such a backup is called a snapshot. It is possible to create a new EBS volume from an existing snapshot within seconds. Each snapshot has a unique ID, the snapshot ID.
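In boto, the volume/snapshot life cycle sketched above could look like this (the instance ID and device name are made up; a real script would wait until the volume’s status is “available” before attaching it):

    from boto.ec2.connection import EC2Connection

    conn = EC2Connection('YOUR-ACCESS-KEY-ID', 'YOUR-SECRET-KEY')

    # Create a volume and attach it to a (hypothetical) running instance;
    # volume and instance must be in the same availability zone.
    volume = conn.create_volume(800, 'us-east-1a')
    conn.attach_volume(volume.id, 'i-12345678', '/dev/sdh')

    # Back the volume up to S3 and derive a fresh volume from the snapshot.
    snapshot = conn.create_snapshot(volume.id)
    new_volume = conn.create_volume(800, 'us-east-1a', snapshot=snapshot.id)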
AWS has implemented its EC2 API itself using Java and JavaScript. The results are the command line EC2 API Tools and the Firefox plug-in Elasticfox. For creating your own AMIs, Amazon delivers the EC2 AMI Tools, which are based on Ruby.
EC2 API Tools:
The EC2 API Tools provide small command line programs for invoking all possible EC2 API calls. Since they are command line tools, they are useful for scripting. Each little tool, with its possible command line parameters, is documented in the EC2 Developer Guide - API Tools.
The EC2 API Tools implement the SOAP API. As stated above, in 4.1.2 Authentication and security, this means that the authentication of requests is ensured using a private key / certificate system. If you are the owner of an AWS account, the corresponding files (certificate and key file) can be downloaded from the AWS account management interface.
You can get the EC2 API Tools here: EC2 Developer Resources - Amazon EC2 API Tools.
Elasticfox:
Elasticfox provides a graphical user interface (GUI) to the EC2 API. During development, Elasticfox became one of the most important tools for me. Since it knows and controls the complete set of API calls, it replaced the API Tools for me in terms of monitoring and manual EC2 management. It is very helpful for visualizing what scripts are doing, since it e.g. displays the state of instances and EBS volumes. Launching and terminating instances becomes very easy, as does registering AMIs, creating keypairs, creating and attaching EBS volumes and so on. Try it!
Since it implements the HTTP Query API, it only needs your Access Key ID and AWS Secret Key to handle your account.
Check it out at Sourceforge: http://sourceforge.net/projects/elasticfox/ or in the EC2 Developer Resources.
EC2 AMI Tools:
If you want to create your own Amazon Machine Image, you should enlist the help of the EC2 AMI Tools. They are command line tools, too. Their usage is explained in the EC2 Developer Guide - AMI Tools.
The tools are able to bundle an operating system into an AMI. They provide two ways to do this:
- bundle the needed files of a running system into an AMI
- bundle a loopback file containing a system into an AMI
After the image itself has been created, the EC2 AMI Tools use your EC2 private key / certificate to perform the encryption. Then the encrypted image is divided into part files; for each part a checksum is computed and recorded in the manifest file. Additionally, the tools provide the possibility to upload everything to S3. Registering is then done via Elasticfox or the EC2 API Tools.
Get the tools from EC2 Developer Resources - Amazon EC2 AMI Tools.