IT


23
Nov 09

Backups in the Hosting World

Something we’ve been finding pretty frustrating lately is the whole issue of backups. On my desktop I run Dropbox, but there still isn’t a Dropbox-quality (or Dropbox-easy) service for hosting companies. If you run a website, service, or business, you need a disaster recovery plan at the very least, and that plan involves backups. There are several ways hosting companies deal with this.

1) They don’t do them

Yeah, you read that right. You don’t actually get backups. These policies are buried deep in their terms of service or usage policies. It’s totally up to you to back up your server content. If you don’t and your server crashes, it’s the end user’s problem. High-profile data losses can destroy any business, especially startups.

2) Highly Available Storage

This strategy is usually combined with #1 above. Instead of backing up your data, they just replicate it across multiple drives. This means the chances of losing your data go down, and depending on the technology used to safeguard against drive failure (iSCSI and ZFS come to mind), you can get really high availability. It’s important to remember that RAID is NOT a backup solution, only a way to mitigate potential failures.

3) OS Level Backups

In this strategy, end users are still required to worry about their own data and choose which sets of data they back up. A hosting company will provide an endpoint for you to send your backups to. If you’ve ever done managed or dedicated hosting, this is often the product that is sold. Tivoli or some other backup client is provided, but it still relies on a consultant, sysadmin, or service provider to configure it correctly.

4) VM Level Backups

There’s another solution that works, and it gets around a lot of the issues with OS level backups, like running a database while doing a backup, etc. Snapshot the entire virtual machine and replicate the VM to an off-site storage system. For better performance, use data de-duplication technology to reduce the amount of time to perform your backup. This system seems to work well, however few providers are offering it.

What do you think? What’s your favourite backup strategy as a hosting company?


20
Oct 09

How to move a Virtual Machine From EC2 to VirtualBox or KVM

There have been quite a few requests on forums and blogs we frequent asking someone to figure out how to move a virtual machine from EC2 to VirtualBox or KVM. We’ve got quite a bit of experience working with KVM, so we figured why not try our hand at importing a virtual machine template from the Amazon AMI repository so that developers or sysadmins could run it in their local environments. We’ve already written a howto on importing an AMI from Amazon, so you may want to read that first, but this howto also applies to creating a KVM or VirtualBox image from a Linux filesystem of any kind. Right now this particular method only works with Linux, but there are more OS-agnostic (and much slower) methods for transposing virtual machines. So without further delay, let’s get started.

You’ll need at least 15 gigs of free space to make this work.

1) Download and unpack an AMI from Amazon

You can learn how to do that here, or if you have sufficient knowledge you can build a full Linux filesystem.

2) Prepare a new raw drive file

We’ll create a file backed drive, set it up so we can partition it and create a new filesystem.

Create the file by using the ‘dd’ command.

dd if=/dev/zero of=newimage.raw bs=1M count=10240
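
If zero-filling the whole image up front is slower than you'd like, a sparse file of the same size is one option. This is only a sketch and not part of the original procedure: make_sparse_image is a made-up helper name, and it assumes GNU dd and a filesystem that supports sparse files. With count=0 nothing is written, and seek= places the end of the file at the requested size.

```shell
# Hypothetical helper: create a sparse raw image.
# $1 = image path, $2 = size in MiB.
# count=0 copies no data; seek= sets the file length to $2 x 1 MiB.
make_sparse_image() {
    dd if=/dev/zero of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}
```

Calling make_sparse_image newimage.raw 10240 should give a file with the same 10737418240-byte length as the dd command above.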

Add it to a loopback device

losetup -fv newimage.raw

Partition the file backed loopback device. For this we’ll just create one partition which is the whole disk. Make sure it’s bootable.

cfdisk /dev/loop0

Write the partition and exit

Now we’re going to create a filesystem on the partition we just created. Please note that there’s a problem with the way mkfs works. When trying to automatically determine filesystem sizes on loopback devices it makes a mistake. So for this we need to do a few things.

Find the partition beginning, ending, number of blocks, number of cylinders, and blocksize

fdisk -l -u /dev/loop0
 
Disk /dev/loop0: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders, total 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000
 
      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1   *          63    20964824    10482381   83  Linux

Create a new loopback device for the partition. We do this by calculating the byte offset of the partition: its start sector multiplied by the sector size (the "Units" value from fdisk).

In this case that’s 512 * 63 (actually, in most cases that’s what it is)

losetup -fv -o $((512 * 63)) newimage.raw
Loop device is /dev/loop1
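
The offset can also be pulled out of the fdisk listing instead of being worked out by hand. The sketch below is a convenience under stated assumptions: partition_start and partition_offset are made-up names, and the awk parsing assumes the fdisk -l -u column layout shown above, with a * in the second column when the partition is bootable.

```shell
# Hypothetical helper: read `fdisk -l -u` output on stdin and print the
# start sector of the partition named in $1. Skips the boot flag if present.
partition_start() {
    awk -v part="$1" '$1 == part { if ($2 == "*") print $3; else print $2 }'
}

# Hypothetical helper: byte offset = start sector ($1) x sector size ($2).
partition_offset() {
    echo $(( $1 * $2 ))
}
```

For example, start=$(fdisk -l -u /dev/loop0 | partition_start /dev/loop0p1) followed by losetup -fv -o "$(partition_offset "$start" 512)" newimage.raw mirrors the command above.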

Remember those numbers we grabbed earlier using fdisk? Plunk them into this formula. For our example:

( END - START ) x Units / Block Size
If you don’t know the block size use 4096. That’s “standard” and usually the size configured on most ext2/3 filesystems.

So for us it’s this:

echo $(((20964824 - 63) * 512 / 4096 ))

This gives us the number of blocks we need for our next command, which creates a filesystem with a block size of 4096 on /dev/loop1 with a block count of 2620595. You have to specify the block size; otherwise mkfs will try to automatically determine a bunch of things for you, which will just break things.

mkfs.ext3 -b 4096 /dev/loop1 2620595
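
The block count formula above can be wrapped in a small function so the numbers from fdisk go straight in. This is a sketch only; fs_block_count is a made-up name, and it relies on the shell's integer arithmetic, which drops any fractional block.

```shell
# Hypothetical helper implementing ( END - START ) x Units / Block Size.
# $1 = start sector, $2 = end sector, $3 = units (bytes per sector),
# $4 = filesystem block size in bytes.
fs_block_count() {
    echo $(( ($2 - $1) * $3 / $4 ))
}

# Values from the fdisk listing in this howto.
fs_block_count 63 20964824 512 4096   # prints 2620595
```

The result can be passed directly to mkfs, for example mkfs.ext3 -b 4096 /dev/loop1 "$(fs_block_count 63 20964824 512 4096)".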

3) Copy & Prepare New Root Filesystem

You can now mount this newly created filesystem somewhere and copy a root filesystem into it. If that filesystem happens to be a Xen image from Amazon, you can use that.

mkdir -p /mnt/loop/1
mount -t ext3 /dev/loop1 /mnt/loop/1
cp -a /some/root/filesystem/* /mnt/loop/1/

Xen virtual machines run with a special kernel that can run under KVM using Xenner, but not on other platforms like VirtualBox, so we’re going to copy a real kernel in there. You can use one from another Linux system if you want and it will work fine, but you should use one that has the modules required by your virtualization platform. We already have a KVM-tuned kernel and initrd available, so we’re going to use those.

Note: If you’re going to just copy in the initrd and kernel then make sure the initrd includes all of the modules required to boot your machine.

cp -r /some/boot/filesystem/* /mnt/loop/1/boot/

You should now see the kernel, initrd and the grub directory in there.

Edit the menu.lst and make sure the root= is set to /dev/sda1

vim /mnt/loop/1/boot/grub/menu.lst
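
If you'd rather script the root= change than make it by hand in vim, something along these lines might work. It is a sketch under assumptions: set_kernel_root is a made-up name, and it assumes the kernel entries in menu.lst start with the word "kernel" and already carry a root= argument. It only touches those lines, so GRUB's own "root (hd0,0)" lines are left alone.

```shell
# Hypothetical helper: rewrite root= on every "kernel" line of a menu.lst.
# $1 = path to menu.lst, $2 = new root device (for example /dev/sda1).
# Writes to a temporary file first, then replaces the original.
set_kernel_root() {
    sed "/^[[:space:]]*kernel/ s|root=[^[:space:]]*|root=$2|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

For this howto the call would be set_kernel_root /mnt/loop/1/boot/grub/menu.lst /dev/sda1. It is still worth opening the file afterwards to confirm the result.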

Edit the /etc/fstab in your mounted vm

vim /mnt/loop/1/etc/fstab

Because Amazon’s best practices involve setting a random root password, which gets overridden at start time, you’ll have to solve that little problem. The commands below move rc.local out of the way and set a root password you know.

chroot /mnt/loop/1
mv /etc/rc.local /etc/rc.local-old
passwd root
exit

4) Set Up Grub on the New Drive

Now unmount /mnt/loop/1 and delete the loopback device for the partition (the one with the offset) so we can set up the bootloader. Grub complains about installing the MBR code while a loopback device is still active on the partition. Leave the loopback device for the entire drive in place; we’ll need it to get some numbers from fdisk.

umount /mnt/loop/1
losetup -d /dev/loop1
fdisk -l -u /dev/loop0
 
Disk /dev/loop0: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders, total 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000
 
      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1   *          63    20964824    10482381   83  Linux

Make a note of the numbers that were presented here. We’ll need the following to set up grub:

  • Cylinders : 1305
  • Heads : 255
  • Sectors / Track : 63

These numbers may be different for you depending on the size of the partition you created, or a whole bunch of other variables. It’s important to remember these values because we’ll need them for our next step, which is to set up grub.

The following lists the set of commands required to set up the bootloader on a file backed disk over a loopback device.

grub --device-map=/dev/null
device (hd0) /images/newimage.raw
geometry (hd0) 1305 255 63
root (hd0,0)
setup (hd0)

Here’s what that looks like in the grub dialogue:

grub --device-map=/dev/null
 
Probing devices to guess BIOS drives. This may take a long time.
Unknown partition table signature
 
[ Minimal BASH-like line editing is supported.   For
the   first   word,  TAB  lists  possible  command
completions.  Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> device (hd0) /images/newimage.raw
device (hd0) /images/newimage.raw
grub> geometry (hd0) 1305 255 63
geometry (hd0) 1305 255 63
grub> root (hd0,0)
grub> setup (hd0)
 
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  17 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,0)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
grub> quit
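
If you repeat this often, the grub commands can be generated from the geometry values rather than typed at the prompt. The sketch below only prints the commands; grub_setup_commands is a made-up name, and feeding its output to grub assumes your GRUB legacy build supports the --batch option for non-interactive use.

```shell
# Hypothetical helper: print the GRUB shell commands for a file backed disk.
# $1 = image path, $2 = cylinders, $3 = heads, $4 = sectors per track.
grub_setup_commands() {
    printf 'device (hd0) %s\n' "$1"
    printf 'geometry (hd0) %s %s %s\n' "$2" "$3" "$4"
    printf 'root (hd0,0)\nsetup (hd0)\nquit\n'
}
```

With the values from this howto, that would be grub_setup_commands /images/newimage.raw 1305 255 63 | grub --batch --device-map=/dev/null. Review the printed commands first, since they write boot code to the image.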

Conclusion

Yay! All you need to do now is delete all those loopback devices attached to your file and boot it up in either KVM or VirtualBox.
Hope you found that useful.


9
Oct 09

Government Brief on Canadian Cloud Computing

Today the Canadian Government released a brief on the opportunities for Canada in Cloud Computing. It’s a great paper that highlights some of the benefits and strategic advantages of building large cloud computing centers in Canada. I’ll jump straight to the conclusion in the article: Canada is one of the BEST places to build out data centers and cloud computing infrastructure. The article mentions a bunch of reasons – I’ll expand on a few.

Geography & Climate

Most of the cost of running thousands of servers comes directly from the price of electricity and the cost of cooling. Canada has cheap, renewable electricity, and it’s colder. That means you can offer competitive services at better margins than someone running a cloud in the hot Nevada desert. Michael Geist wrote more about it at Clean Cloud Computing.

Legal Reasons

Not only are many Canadian companies required to keep their data on Canadian soil, but the Personal Information Protection and Electronic Documents Act (PIPEDA) also means keeping information here is a really good idea.

Reliable, low cost, renewable energy

The BC, PEI, and Quebec governments actually have the cleanest and lowest cost per kWh electricity in all of North America. That’s possible through the use of hydro-electric dams, which also have an extremely low carbon footprint. As stated previously, the cost of running your servers is mostly the cost of electricity.
Cheaper electricity = Competitive Cloud

We’re right next to the American market

One of the fastest computer networks in the world, funded in part by the Canadian government, already runs through most of Canada. We’re also right next to the American market. That means North Americans can’t really tell if their servers are in Nevada or Nunavut. From a consumer’s perspective, there would be no reason not to use a Canadian cloud that’s cheap, secure, and efficient, and we would be able to export a utility that is higher margin than, say, electricity.

All in all I’m really excited by this report, and I’m sure that more people will be thinking about the potential Canada has to become the world leader in cloud computing services. You can get a little more background information, and learn more about the suggested ways forward, by reading the brief here: “Cloud Computing and the Canadian Government”.


14
May 09

If A Tweet Killed a Tuna – Energy Cost Transparency in IT

One of the keys to improving anything is having enough information. This has been widely discussed in environmental circles, and recent innovations such as the Kill-A-Watt and the awesome Tweet-A-Watt hack have led to a more widespread appreciation for just *knowing* the amount of energy your appliances, computers, and home entertainment systems are consuming. The numbers are often surprising, and all too often assumptions are made about where to focus effort to fix a particular problem. Worse, you may not even know a problem exists. But what do you do with this information? At home it’s as easy as putting your devices (such as your home theater) on a power bar and turning it off when you’re not using it. Having the data enables you to make a decision: the decision to save money, because all of a sudden it’s tangible.

These kinds of details can be applied at really big and really small scales too. What if you could measure the amount of power that went into making your car? The amount of energy each Google search takes? The amount of energy for every tweet? Would knowing a tweet kills a tuna make you think twice? Would it enable you to make better decisions about the products you consume? Would it allow your customers to make better decisions about their energy efficiency?

This can apply to the hosting world too. Computers currently use more energy than the entire airline industry, and that’s expected to double within the next 5 years. Data centers consume a whopping 2-3% of the power in the United States alone. Hosting companies charge flat rates for colocation, virtual servers, shared hosting, etc. Bundled into that are the charges for electricity, including the electricity required to power the cooling. Unless you’re really close to the physical infrastructure, there’s no way to measure how efficient the servers are, or how much power your server is consuming. If we could measure the amount of power a server uses, we could incorporate that into the pricing of the server and display the information separately. As a hosting company you would be able to make better decisions about which hardware, software, etc. to use. As a hosting customer, you would be able to choose locations that are more power efficient. A slew of other possibilities exist. Due to power deregulation and trading markets in many locations, what costs a dollar during the day might cost 10 cents in the middle of the night.

[Figure: hourly demand in Ontario]

Data centers are built for peak capacity, but there should be an incentive for customers to adopt more energy efficient solutions. Being able to measure (in)efficiencies also means that making decisions about moving to a container might be easier to justify.