LinuxMCE + RAID on boot drive

From LinuxMCE
Revision as of 05:14, 9 October 2012 by Mcefan (Talk | contribs)


Warnings and Disclaimer

This process has only been performed a handful of times and should not be considered reliable. Additionally, this procedure involves rearranging the data on your working drive. A failure during any of several crucial steps will lead to complete loss of data on ALL drives. I would suggest, at least until this process becomes more refined, that only Linux-competent users attempt it.

I am not sure how this method of RAID will play with the Linux MCE scripts and built-in software RAID module.

IT IS HIGHLY RECOMMENDED THAT YOU BACKUP ALL IMPORTANT DATA.

Preface

The intention of this how-to is to describe how a user may protect their bootable LinuxMCE drive with Linux software RAID. This process works on existing installations too; no reformat is required. It is recommended that you do not use this as an excuse to store your media on your boot drive, as that could make future migrations (version upgrades) complex and problematic. Instructions are provided for making all levels of RAID arrays; however, it is recommended that RAID 1 be used. RAID 5 and 6 incur a performance penalty over their RAID 0 and 1 counterparts, and may be overkill for the level of protection required by your boot drive.

If you do want to run other RAID levels you must make a slight modification to these steps: create an additional RAID array, roughly 50MB in size. This array must be RAID 1 and it will house your boot partition. I suggest spanning this RAID 1 array across all devices.
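As an illustration only, assuming a three-drive setup with /dev/sda still holding the live install and /dev/sdb and /dev/sdc as the new members (device names and partition numbers are hypothetical; adjust to your layout), the degraded arrays for that scheme might look like:

```
# Small RAID 1 array across all members to hold /boot (~50MB partitions)
mdadm -C /dev/md0 -n3 -l1 -e1 missing /dev/sdb1 /dev/sdc1
# Larger array at your preferred RAID level (RAID 5 shown) for the root filesystem
mdadm -C /dev/md1 -n3 -l5 -e1 missing /dev/sdb2 /dev/sdc2
```

The `missing` placeholder is filled in later with the original boot drive's partitions, exactly as in the RAID 1 walkthrough below.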

For RAID 0 installations your best bet is to back up the contents of an installed partition to another location, create the array, and then restore the data from backup. It seems technically possible that you could back up just the sectors that would be overwritten by the MD superblock, then perform an in-place copy of the data one stripe at a time from the boot device to the MD device. I will test this at a later date as a fun project; it's all theory for now.

Having a Linux Live CD or USB distribution on hand would be very useful for troubleshooting, especially if the LinuxMCE machine is currently acting as your internet gateway!

Overview

For Linux gurus who may not want to read the grueling details, these are the basic steps of this how-to.

  1. Install LinuxMCE to a single drive, as normal. (Skip this step if you are converting an existing installation.)
  2. Partition RAID drives (except current boot drive) with the same partitions as the boot drive.
  3. Create all RAID arrays without using the current boot drive (specify missing argument to mdadm).
  4. Format RAID arrays to desired file system.
  5. Mount RAID arrays and copy all files from boot drive to array.
  6. Create initrd image that will load RAID modules (you may skip this step if you are using Linux Autodetect partitions).
  7. Set up fstab and grub on the arrays
  8. Reboot
  9. Verify a successful boot to the RAID array
  10. Add original BOOT drive to RAID array
  11. Watch RAID rebuild all data to BOOT drive

Process

  • Install Linux MCE using the instructions here. Skip this step if you are converting an existing installation.
  • Boot the computer with all RAID devices attached.
  • Login to a console over SSH or locally as root.
$ sudo su root
  • Copy partition information from boot drive to target RAID drives. Repeat this command for each RAID drive.
# sfdisk -d /dev/sda | sfdisk /dev/sdb
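If you have more than one new RAID member, the same copy can be looped over each target. The device names below are examples only; verify them with `fdisk -l` before running anything, since pointing this at the wrong disk will destroy its partition table:

```
for dev in /dev/sdb /dev/sdc; do
    sfdisk -d /dev/sda | sfdisk "$dev"
done
```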
  • Create degraded RAID array.
If you are using other RAID modes this is the time to tweak RAID settings such as chunk size. Note: I use RAID 1 for the swap partition. This is something overlooked by most people running RAID systems: a failure of your swap space could cause erroneous information to be used by the kernel and bring the system down, even if no data drive has failed! Don't fall into the trap of running your swap as standalone partitions, or worse, as RAID 0. As the saying goes, you're only as strong as your weakest link.
Also note that I am using the switch -e1. This tells mdadm to create the array with the newer 1.0 superblock instead of the older 0.90. The newer superblock has many more features and fewer restrictions. This how-to assumes you use this superblock version as well. If you use 0.90 for auto-detect support, please be aware of the restrictions imposed by that older format.
# mdadm -C /dev/md0 -n2 -l1 -e1 missing /dev/sdb1
# mdadm -C /dev/md1 -n2 -l1 -e1 missing /dev/sdb5
# mdadm -C /dev/md2 -n2 -l1 -e1 missing /dev/sdb6
  • Create the file system.
Note: If you are using other RAID modes, you should specify the stride option (the RAID chunk size divided by the filesystem block size). This is apparently auto-detected for software RAID arrays, but I would make certain of it, or your array will experience extremely poor performance.
# mke2fs -j /dev/md0
# mkswap /dev/md1
# mke2fs -j /dev/md2
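As a concrete example of the stride arithmetic mentioned above, assuming a 64KB RAID chunk and the common ext3 4KB block size (substitute your own values; check the chunk size with mdadm -D):

```shell
chunk_kb=64   # RAID chunk size in KB (assumed for this example)
block_kb=4    # ext3 block size in KB (typical default)
stride=$((chunk_kb / block_kb))
echo "$stride"   # prints 16
```

You would then pass the result to mke2fs, e.g. `mke2fs -j -E stride=16 /dev/md0`.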
  • Mount the RAID array
# mkdir /mnt/md0 /mnt/md2
# mount /dev/md0 /mnt/md0
# mount /dev/md2 /mnt/md2
  • Copy the contents of the boot drive to the RAID array. Rebuild system directories.
# cd /mnt/md0
# tar -cf - --exclude /dev --exclude /tmp --exclude /sys --exclude /proc --anchored --one-file-system / | tar -xvf -
# mkdir dev tmp sys proc 
# chmod 1777 tmp
# chmod 755 dev sys
# chmod 555 proc
# cp -axv /dev/md* /mnt/md0/dev/
# cp -axv /dev/hd* /mnt/md0/dev/
# cp -axv /dev/sd* /mnt/md0/dev/
# cp -axv /mnt/recovery/dev/ /mnt/md0/
# cp -axv /mnt/recovery/* /mnt/md2/
  • Chroot to RAID array
# chroot /mnt/md0
# mount -t proc proc /proc
  • Make mdadm.conf.
This is a good time to fill in the MAILADDR field of the mdadm.conf file. Adjust these settings to match your specific installation.
# echo -n "ARRAY /dev/md0 level=raid1 num-devices=2 UUID=" >> /etc/mdadm/mdadm.conf
# mdadm -D /dev/md0 2>/dev/null | grep 'UUID' | awk '{ print $3 }' >> /etc/mdadm/mdadm.conf
# echo -n "ARRAY /dev/md1 level=raid1 num-devices=2 UUID=" >> /etc/mdadm/mdadm.conf
# mdadm -D /dev/md1 2>/dev/null | grep 'UUID' | awk '{ print $3 }' >> /etc/mdadm/mdadm.conf
# echo -n "ARRAY /dev/md2 level=raid1 num-devices=2 UUID=" >> /etc/mdadm/mdadm.conf
# mdadm -D /dev/md2 2>/dev/null | grep 'UUID' | awk '{ print $3 }' >> /etc/mdadm/mdadm.conf

The result should look like these three lines at the end of the file:

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f71980df:ab649521:bd9f1658:0a1d2015
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=8dbcb53b:35014427:bd9f1658:0a1d2015
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=cf76f7e4:10a3b5a1:bd9f1658:0a1d2015
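Alternatively, mdadm can emit these ARRAY lines itself. The exact fields in its output may differ slightly from the hand-built lines above (newer versions print metadata and name fields), but the result is equivalent:

```
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```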
  • Grab the UUIDs of the RAID block devices.
These are not the same as the UUIDs reported by mdadm. Do not mix them up, and do not copy mine; yours will be different.
# blkid
/dev/md0: UUID="7a57eaf8-0025-4a69-874d-37a32ad74625" SEC_TYPE="ext2" TYPE="ext3"
/dev/md1: TYPE="swap" UUID="6433b425-af37-45b1-a009-bfd121b5dcd2"
/dev/md2: UUID="c7687c07-c8d8-4588-b988-86341cccbb6e" SEC_TYPE="ext2" TYPE="ext3"
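If you prefer to script the fstab edit that follows, a filesystem UUID can be pulled out of blkid-style output with sed. The sample line below is hard-coded for illustration; on a real system you would feed in the output of blkid instead:

```shell
# sample blkid output line (hard-coded for illustration; yours will differ)
line='/dev/md0: UUID="7a57eaf8-0025-4a69-874d-37a32ad74625" SEC_TYPE="ext2" TYPE="ext3"'
# keep only the value between UUID="..."
uuid=$(echo "$line" | sed 's/.*UUID="\([^"]*\)".*/\1/')
echo "$uuid"   # prints 7a57eaf8-0025-4a69-874d-37a32ad74625
```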
  • Replace the matching entries in fstab with your new UUIDs.
With RAID devices in particular it is nice to specify the associated UUIDs, since simply adding a new drive to the system may change an array's device number.
# nano /etc/fstab

Your fstab should look similar to this when you are done:

proc            /proc           proc    defaults        0       0
UUID=7a57eaf8-0025-4a69-874d-37a32ad74625       /               ext3    defaults,errors=remount-ro 0       1
UUID=6433b425-af37-45b1-a009-bfd121b5dcd2  none            swap    sw              0       0
/dev/cdrom      /media/cdrom0   udf,iso9660 user,noauto     0       0
UUID=c7687c07-c8d8-4588-b988-86341cccbb6e /mnt/recovery   ext3    ro              0       0
  • Modify Grub (/boot/grub/menu.lst).
Make your menu.lst look like the one below. Of course, substitute your RAID root partition UUID as needed. The last entry (System Recovery) seems to point to an invalid kernel image. This might be a bug or it might be a remnant of old scripts; I couldn't find any information on it in the Wiki. I removed this entry, but you may elect to keep it.
The last entry at the bottom is my original boot entry. It is left there only for the first few boots to ensure the initramfs images can successfully assemble the arrays.
We add a second boot option for the second hard drive in the RAID array. The idea is that we instruct Grub to load this second boot option automatically if the first one fails (eg. a hard drive dies). For now the fallback command is set to use our original boot parameters. Once everything is working we will change this.
Note: Some modern boards play very weird tricks on you when using many different interfaces (IDE/SATA/USB). My BIOS for some reason would order my primary slave IDE hard drive as hd0 and the primary master IDE hard drive as hd4, with my SATA drives spread out all over. Despite Linux ordering these devices properly, and despite specifying the correct geometry to Grub, Grub would still fail to boot from hd1 and insist that I specify hd4. Specifying the drives in the device.map file did not help. I would imagine manually specifying the BIOS address of each drive would solve the problem. In troubleshooting this issue I found the cat command very useful from the Grub command-line, e.g. cat (hd0,0)/boot/grub/menu.lst
default         1
fallback        4

timeout         3

hiddenmenu

title           Ubuntu 7.10, kernel 2.6.22-14-raid - drive 1
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.22-14-generic root=UUID=7a57eaf8-0025-4a69-874d-37a32ad74625 ro quiet splash
initrd          /boot/initrd.img-2.6.22-14-raid
quiet

title           Ubuntu 7.10, kernel 2.6.22-14-raid - drive 2
root            (hd1,0)
kernel          /boot/vmlinuz-2.6.22-14-generic root=UUID=7a57eaf8-0025-4a69-874d-37a32ad74625 ro quiet splash
initrd          /boot/initrd.img-2.6.22-14-raid
quiet

title           Ubuntu 7.10, kernel 2.6.22-14-generic (recovery mode)
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.22-14-generic root=UUID=7a57eaf8-0025-4a69-874d-37a32ad74625 ro single
initrd          /boot/initrd.img-2.6.22-14-raid

title           Ubuntu 7.10, memtest86+
root            (hd0,0)
kernel          /boot/memtest86+.bin
quiet

title           Ubuntu 7.10, kernel 2.6.22-14-generic (original boot)
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.22-14-generic root=UUID=65b7f225-5581-428f-befe-edcc9e1dd9d7 ro quiet splash
initrd          /boot/initrd.img-2.6.22-14-generic
quiet
  • At this time it is a good idea to check your /boot/grub/device.map file and ensure that hd0 and hd1 point to both of your RAID devices. If you are having boot-time issues see my notes above about weird BIOS ordering.
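For reference, a typical two-disk device.map looks like the fragment below (the device nodes are examples; yours must match your actual RAID members):

```
(hd0)   /dev/sda
(hd1)   /dev/sdb
```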
  • Make a new initramfs image to load RAID modules and boot the correct root device. Ubuntu comes with the mkinitramfs tools, so this step is very easy. We must add one small script to the initramfs image.
# echo "mdadm --assemble --scan" > /etc/initramfs-tools/scripts/init-premount/mdadm
# chmod 755 /etc/initramfs-tools/scripts/init-premount/mdadm
# mkinitramfs -o /boot/initrd.img-2.6.22-14-raid
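Before rebooting it is worth confirming that the new image actually contains mdadm; listing the archive's contents should show both the mdadm binary and the premount script added above:

```
# zcat /boot/initrd.img-2.6.22-14-raid | cpio -t 2>/dev/null | grep mdadm
```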
  • Run the grub install utility. Note: For the first boot it is best to use the --once option. That way if we made any mistakes it will revert to previous settings and boot normally.
# grub
# root (hd1,0)
# setup (hd0)
# quit
  • Exit chroot. Unmount RAID drive. Reboot.
# exit
# umount /mnt/md0/proc
# umount /mnt/md0
# umount /mnt/md2
# mdadm -S /dev/md0
# mdadm -S /dev/md1
# mdadm -S /dev/md2
  • Verify that system booted off of RAID device without error.
You're looking to match the UUID of the current root filesystem with the UUID of the md0 device we found earlier using blkid.
If you ran into errors, boot to the last option which should boot Linux MCE as normal.
# mount | grep 'on / '
rootfs on / type rootfs (rw)
/dev/disk/by-uuid/7a57eaf8-0025-4a69-874d-37a32ad74625 on / type ext3 (rw,data=ordered)
  • Add original boot device to RAID
This is where the real magic takes place. We will now add our original boot drive to the RAID arrays, which will begin syncing immediately.
# swapoff /dev/sda5
# mdadm --add /dev/md0 /dev/sda1
# mdadm --add /dev/md1 /dev/sda5
# mdadm --add /dev/md2 /dev/sda6
  • Watch RAID rebuild
The arrays will begin rebuilding, one array at a time.
# watch -n2 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda6[2] sdb6[1]
      9766848 blocks super 1.0 [2/1] [_U]
        resync=DELAYED

md1 : active raid1 sda5[2] sdb5[1]
      1951852 blocks super 1.0 [2/1] [_U]
        resync=DELAYED

md0 : active raid1 sda1[2] sdb1[1]
      68324340 blocks super 1.0 [2/1] [_U]
      [==>..................]  recovery = 12.5% (8557312/68324340) finish=26.8min speed=37032K/sec

unused devices: <none>
  • After the arrays have been rebuilt it's time to run the grub install again.
# grub
# root (hd0,0)
# setup (hd0)
# root (hd1,0)
# setup (hd1)
# quit
  • Change menu.lst options to default 0, fallback 1
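This last change can be made by hand in nano, or with sed. The sketch below demonstrates the edit on a scratch copy so you can see the effect first; when you are satisfied, run the same sed (after taking a backup) against /boot/grub/menu.lst:

```shell
# stand-in file mimicking the relevant menu.lst lines
menu=$(mktemp)
printf 'default         1\nfallback        4\n' > "$menu"
# flip the boot defaults now that the arrays are the primary boot path
sed -i -e 's/^default[[:space:]].*/default         0/' \
       -e 's/^fallback[[:space:]].*/fallback        1/' "$menu"
cat "$menu"
rm -f "$menu"
```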

Caveats

I am unsure of how this will play with the built-in raid scripts and components of LinuxMCE. After more testing I will update this wiki with my findings.

This process is dangerous. It involves moving all of your data around. There is significant potential for data loss due to factors outside of your control, such as power failure or hard drive mechanical failure. I recommend backing up all of your data prior to attempting this.

Troubleshooting

One of the great advantages of Linux software RAID (and open source in general) is how much flexibility you get with the software. Recovering from data failure with Linux is far easier than with most other solutions. If you get stuck in a potential data-loss situation, ask for help on Linux forums or mailing lists. It is EXTREMELY useful to have a Linux Live distribution on hand.

Notes

This document is a work in progress. Please update it with your experiences and comments. I personally have followed these steps with great success, but that does not mean you will too.