2017 Migration from ESXi 5.5 to Proxmox 5.0

Published by Pascal on

Here at Encodo, we host our services on our own infrastructure which, after 12 years, has grown quite large. But this article is about our migration away from VMware.

So, here’s how we proceeded:

We set up a test environment as close as possible to the new one before buying the new server, so we could test everything. This was our first contact with software RAID and its monitoring capabilities.

Install the Hypervisor

Installation time; here we go:

  • Install the latest Proxmox[1]: This is very straightforward, so I won’t go into that part.
  • After the installation is done, log in via SSH and check the syslog for errors (we had some NTP issues, which I fixed before doing anything else).

Check Disks

We have our three disks for our RAID5. We don’t have a lot of files to store, so 1TB disks should still be OK (see Why RAID 5 stops working in 2009 as to why you shouldn’t do RAID5 with larger disks anymore).

We set up Proxmox on a 256GB SSD. Our production server will have 4x 1TB SSDs, one of which is a spare. Note down the serial numbers of all your disks. I don’t care how you do it: take pictures or whatever, but if you ever need to know which slot contains which disk, or whether the failing disk is actually in that slot, solid documentation really helps a ton.

You should check your disks for errors beforehand! Do a full smartctl self-test and find out which disk is which. This is key: we even took pictures prior to inserting them into the server (and put them in our wiki), so we have the serial number available for each slot.

See which disk is which:

for x in {a..e}; do smartctl -a /dev/sd$x | grep 'Serial' | xargs echo "/dev/sd$x: "; done

Start a long test for each disk:

for x in {a..e}; do smartctl -t long /dev/sd$x; done

See SMART tests with smartctl for more detailed information.

Disk Layout & Building the RAID

We’ll assume the following hard disk layout:

/dev/sda = System Disk (Proxmox installation)
/dev/sdb = RAID 5, 1
/dev/sdc = RAID 5, 2
/dev/sdd = RAID 5, 3
/dev/sde = RAID 5 Spare disk
/dev/sdf = RAID 1, 1
/dev/sdg = RAID 1, 2
/dev/sdh = Temporary disk for migration

When the long test is done (it usually takes a few hours), you can verify the result with

smartctl -a /dev/sdX
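To avoid reading five full reports by hand, a small loop can summarize the last self-test per disk. This is a sketch, not from the article: the `selftest_ok` helper and the grep pattern are my assumptions, and the disk letters follow the layout used here.

```shell
#!/bin/bash
# Succeeds if the `smartctl -l selftest` output on stdin logs a test
# that completed without error.
selftest_ok() {
  grep -q 'Completed without error'
}

# Only touch devices that actually exist; smartctl usually needs root.
if command -v smartctl >/dev/null; then
  for x in {a..e}; do
    [ -b "/dev/sd$x" ] || continue
    if smartctl -l selftest "/dev/sd$x" | selftest_ok; then
      echo "/dev/sd$x: self-test passed"
    else
      echo "/dev/sd$x: check manually with smartctl -a /dev/sd$x"
    fi
  done
fi
```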

Now that we know our disks are OK, we can proceed with creating the software RAID. Make sure you pick the correct disks:

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

The RAID5 starts building immediately, but you can also start using it right away. Since I had other things on my plate, I waited for it to finish.

Add the spare disk (if you have one) and append the array definition to the mdadm config:

mdadm --add /dev/md0 /dev/sde
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
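After the scan, /etc/mdadm/mdadm.conf should contain a line for the array, roughly of this shape (illustrative only; the metadata version, hostname, and UUID are placeholders and will differ on your system):

```
ARRAY /dev/md0 metadata=1.2 spares=1 name=<hostname>:0 UUID=<your-array-uuid>
```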

Configure Monitoring

Edit the email address (the MAILADDR line) in /etc/mdadm/mdadm.conf to a valid address within your network and test it via

mdadm --monitor --scan --test -1

Once you know that your monitoring mails come through, add active monitoring for the RAID device:

mdadm --monitor --daemonise --mail=valid@domain.com --delay=1800 /dev/md0

To finish up monitoring, it’s important to read mismatch_cnt from /sys/block/md0/md/mismatch_cnt periodically to make sure the hardware is OK. We use our very old Nagios installation for this and got a working script for the check from Mdadm checkarray by Thomas Krenn.
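The Thomas Krenn script linked above is what we actually use; as an illustration only, a minimal Nagios-style check boils down to something like this (the function name and path argument are my invention; the exit codes follow the Nagios convention of 0 = OK, 2 = CRITICAL, 3 = UNKNOWN):

```shell
#!/bin/bash
# Sketch of a Nagios-style check for the md mismatch counter. The sysfs
# path can be overridden with an argument so the logic is testable offline.
check_mismatch() {
  local file="${1:-/sys/block/md0/md/mismatch_cnt}"
  local cnt
  cnt=$(cat "$file" 2>/dev/null) || { echo "UNKNOWN: cannot read $file"; return 3; }
  if [ "$cnt" -eq 0 ]; then
    echo "OK: mismatch_cnt = 0"
    return 0
  else
    echo "CRITICAL: mismatch_cnt = $cnt"
    return 2
  fi
}
```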

Creating and Mounting Volumes

Back to building! We now need to make the new storage available to Proxmox. To do this, we create a PV, a VG, and an LV thin pool. We use 90% of the storage for the pool, since we need to migrate other devices as well and 10% is enough for us to migrate two VMs at a time. We format the migration volume with XFS:

pvcreate /dev/md0
vgcreate raid5vg /dev/md0
lvcreate -l 90%FREE -T raid5vg/raid5lv
lvcreate -n migrationlv -l 100%FREE raid5vg
mkfs.xfs /dev/mapper/raid5vg-migrationlv

Mount the formatted migration logical volume (if you want the mount to survive a reboot, add it to fstab, obviously):

mkdir /mnt/migration
mount /dev/mapper/raid5vg-migrationlv /mnt/migration
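For the fstab part, the entry looks roughly like this (a sketch based on the device path used above; verify the mapper path on your system first):

```
/dev/mapper/raid5vg-migrationlv  /mnt/migration  xfs  defaults  0  2
```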

If you don’t have the disk space to migrate the VMs like this, add an additional disk (/dev/sdh in our case). Create a new partition on it with

fdisk /dev/sdh

Accept all the defaults for maximum size, then format the partition with XFS and mount it:

mkfs.xfs /dev/sdh1
mkdir /mnt/largemigration
mount /dev/sdh1 /mnt/largemigration

Now you can go to your Proxmox web UI and add the thin pool (and your largemigration partition, if you have it) under Datacenter -> Storage -> Add. Give it an ID (I called it raid5 because I’m very creative), Volume Group: raid5vg, Thin Pool: raid5lv.

Extra: Upgrade Proxmox

At this point, we’d bought our Proxmox license and did a dist upgrade from 4.4 to 5.0, which had just been released. To do that, follow the upgrade document from the Proxmox wiki, or install 5.0 right away.

Migrating VMs

Now that the storage is in place, we are all set to create our VMs and do the migration. Here’s the process we used (there are probably more elegant and efficient ways, but this one works for both our Ubuntu installations and our Windows VMs):

  1. In ESXi: Shut down the VM to migrate
  2. Download the vmdk file from the ESXi storage, or activate SSH on ESXi and scp the vmdk, including the flat file (important!), directly to /mnt/migration (or largemigration, respectively).
  3. Shrink the vmdk if you actually downloaded it locally (use the non-flat file as input if the flat doesn’t work):[2]
    vdiskmanager-windows.exe -r vmname.vmdk -t 0 vmname-pve.vmdk
  4. Copy the new file (vmname-pve.vmdk) to proxmox via scp into the migration directory /mnt/migration (or largemigration respectively)
  5. SSH into your Proxmox installation and convert the disk to qcow2:
    qemu-img convert -f vmdk /mnt/migration/vmname-pve.vmdk -O qcow2 /mnt/migration/vmname-pve.qcow2
  6. In the meantime you can create a new VM:
    1. In general: give it the same resources as it had in the old hypervisor
    2. Do not attach a cd/dvd
    3. Set the disk to at least the size of the vmdk image
    4. Make sure the image is in the “migration” storage
    5. Note the ID of the vm, you’re gonna need it in the next step
  7. Once the conversion to qcow2 is done, overwrite the existing image with the converted one. Make sure you get the correct ID and that the target .qcow2 file exists. Overwrite with no remorse:
    mv /mnt/migration/vmname-pve.qcow2 /mnt/migration/images/<vm-id>/vm-<vm-id>-disk-1.qcow2
  8. When this is done, boot the image and test if it comes up and runs
  9. If it does, go to Proxmox and move the disk to the RAID5:
    1. Select the VM you just started
    2. Go to Hardware
    3. Click on Hard Disk
    4. Click on Move Disk
    5. Select the Raid5 Storage and check the checkbox Delete Source
    6. This will happen live

That’s it. Now repeat these last steps for all the VMs, in our case around 20, which is just barely manageable without any automation.
If you have more VMs, you could automate more of it, for example by copying the VMs directly from ESXi to Proxmox via scp and doing the initial conversion there.
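As a starting point for that kind of automation, steps 5 and 7 for a single VM can be sketched like this. The paths follow this article; the function names, and the VM ID and name you pass in, are my own illustrative choices:

```shell
#!/bin/bash
# Where Proxmox put the empty placeholder disk for the freshly created VM.
target_path() {
  echo "/mnt/migration/images/$1/vm-$1-disk-1.qcow2"
}

# Convert the shrunk vmdk to qcow2 and overwrite the placeholder disk.
migrate_disk() {
  local vmid="$1" name="$2"
  local src="/mnt/migration/${name}-pve.vmdk"
  local dst
  dst=$(target_path "$vmid")
  qemu-img convert -f vmdk "$src" -O qcow2 "${src%.vmdk}.qcow2"
  # Refuse to proceed if the VM was not created yet (step 6).
  [ -f "$dst" ] || { echo "no target disk for VM $vmid" >&2; return 1; }
  mv "${src%.vmdk}.qcow2" "$dst"
}

# Example usage: migrate_disk 101 vmname
```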

[1] We initially installed Proxmox 4.4, then upgraded to 5.0 during the migration.
[2] You can get the vdiskmanager from Repairing a virtual disk in Fusion 3.1 and Workstation 7.1 (1023856) under “Attachments”