How to replace a hard drive in a software RAID1 (Hetzner)
Out Of Date Warning
This article was published on 10/07/2014, this means the content may be out of date or no longer relevant.
You should verify that the technical information in this article is still up to date before relying upon it for your own purposes.
Running your own (unmanaged) metal means that fixing hardware failures is, to some degree, your responsibility. We have been hosting Empfehlungsbund.de at Hetzner for almost 2 years now and have already experienced 2 separate hard drive failures. Neither was a real problem, because both drives were part of a RAID1 and could easily be replaced. This time, I want to document the steps I took, in the hope of saving myself and other customers time in the future.
Disclaimer: In case of problems, I take no responsibility for any damage. If you don't know what to do, take a managed option or ask a real sysadmin.
Receiving DegradedArray Event e-Mails
Normally, you will receive an E-Mail to your admin/root account:
This is an automatically generated mail message from mdadm running on server.example.com. A DegradedArray event had been detected on md device /dev/md0.
The first thing to do is log in as root and check which hard drives and RAID arrays are affected:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3
      723658368 blocks [2/1] [_U]
md1 : active raid1 sdb2 sda2
      524224 blocks [2/2] [UU]
md0 : active raid1 sda1 sdb1
      8388544 blocks [2/2] [UU]
Things we see:
- There are 3 RAIDs (md0, md1, md2), all running as raid1.
- md0 and md1 each run on both drives (sda1/sdb1 and sda2/sdb2) and are operational ([UU]).
- The partition sda3 is no longer visible in md2, so one drive is missing from that array ([_U], denoted by the underscore).
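The reading of /proc/mdstat described above can also be scripted. The following is only a minimal sketch, not part of the original procedure: the function name degraded_arrays is made up for illustration, and it assumes the usual mdstat layout in which each array has a status field such as [UU] or [_U].

```shell
#!/bin/sh
# Sketch: print the name of every md array whose status field (e.g. [_U])
# contains an underscore, i.e. has a missing member.
# degraded_arrays is a hypothetical helper name; the optional argument lets
# you point it at a saved copy of /proc/mdstat.
degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # Remember the array name from the "mdX : active raid1 ..." line.
        /^md[0-9]+ :/ { array = $1 }
        # The line containing "blocks" carries the [UU] / [_U] status field.
        array != "" && /blocks/ && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print array
            array = ""
        }
    ' "$mdstat_file"
}
```

Called without arguments as `degraded_arrays`, it reads /proc/mdstat; for the output shown above it would print md2.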
So, for the rest of the guide, we assume:
sda is the broken drive,
md2 the broken RAID array.
You can get more information about the RAID with:
mdadm --detail /dev/md2
A quick SMART check showed a lot of errors in our case.
Backups! Also prepare for a failover if you have the resources. The server has to be shut down for at least a couple of minutes. In the worst case, the server might not boot right away and you will need a rescue console.
Remove broken hard drive completely from all arrays
Even if only one RAID is degraded, the hard drive can only be removed once its partitions are marked as failed in the other arrays too:
mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md0 --fail /dev/sda1
# not needed, because md2 had already failed for us
# mdadm --manage /dev/md2 --fail /dev/sda3
Now you can remove it:
mdadm /dev/md0 -r /dev/sda1
mdadm /dev/md1 -r /dev/sda2
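To avoid forgetting a partition, the fail and remove commands can be derived from /proc/mdstat. This is a rough sketch under assumptions: plan_disk_removal is a hypothetical helper name, it only prints the commands instead of running them, and it assumes partition names of the form sda1, sda2 (it will not match NVMe-style names such as nvme0n1p1).

```shell
#!/bin/sh
# Sketch: print (do not execute) the mdadm commands that fail and remove
# every partition of one disk from the arrays listed in an mdstat file.
# plan_disk_removal is a hypothetical helper name.
plan_disk_removal() {
    disk="$1"                           # e.g. sda
    mdstat_file="${2:-/proc/mdstat}"
    awk -v disk="$disk" '
        /^md[0-9]+ :/ {
            # Members follow the array name, state and level on this line.
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)  # strip suffixes like "[0]" or "[0](F)"
                if (member ~ ("^" disk "[0-9]+$")) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, member
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, member
                }
            }
        }
    ' "$mdstat_file"
}
```

Running `plan_disk_removal sda` and reviewing the printed lines before pasting them into a root shell keeps a human in the loop. Partitions that the kernel has already dropped from an array (like sda3 in md2 here) do not appear in mdstat and are therefore skipped.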
/dev/sda was broken, so we decided to install the GRUB boot loader onto /dev/sdb:
sudo grub-install /dev/sdb
This seemed to work: after the drive change, the server came back up without problems.
Changing hard drive
Hetzner has a special support form for hard drive replacements. They ask for 2 things:
- A full SMART LOG
- The serial number of the broken drive, or the serial number of the functional one (if the broken drive's serial number can't be retrieved).
1. SMART log
smartctl -x /dev/sda > smart.log
# Or send yourself a mail if you have sendmail/nullmailer/..
smartctl -x /dev/sda | mail -s 'SMART Log' email@example.com
2. Serial Number
/sbin/udevadm info --query=property --name=sda | grep ID_SERIAL
## or
hdparm -i /dev/sda | grep SerialNo
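The grep on ID_SERIAL above usually prints more than one line. As a small sketch for pulling out a single value to paste into the support form: serial_from_udev is a hypothetical helper name, and it assumes that udev exposes ID_SERIAL_SHORT for the drive (it falls back to ID_SERIAL otherwise).

```shell
#!/bin/sh
# Sketch: read "udevadm info --query=property" output on stdin and print
# one serial number. serial_from_udev is a hypothetical helper name.
# Assumption: ID_SERIAL_SHORT holds the bare serial, while ID_SERIAL
# typically also includes the model name.
serial_from_udev() {
    awk -F= '
        $1 == "ID_SERIAL_SHORT" { short = $2 }
        $1 == "ID_SERIAL"       { full = $2 }
        END {
            if (short != "") print short
            else if (full != "") print full
        }
    '
}
```

Usage would be `/sbin/udevadm info --query=property --name=sda | serial_from_udev`. It is worth comparing the result with the hdparm output before submitting it.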
3. Do the replacement
- Fill out form, make an appointment
- Hope the server will come back
After server restart
Copy the partition table from the healthy drive (sdb) to the new drive (sda):
sfdisk -d /dev/sdb | sfdisk /dev/sda
Put the drive back in the RAID arrays:
mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3
grub-mkdevicemap -n
Wait for resync - took 6 hours for us. NERD-Cinema:
watch cat /proc/mdstat
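If you would rather be notified than watch the progress bar, the same file can be polled from a script. This is a minimal sketch under assumptions: raid_in_sync is a hypothetical helper name, and it relies on mdstat showing an underscore in the status field for degraded arrays and a "recovery =" or "resync =" line while a rebuild is running or pending.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only when no array in the mdstat file is
# degraded or rebuilding. raid_in_sync is a hypothetical helper name.
raid_in_sync() {
    mdstat_file="${1:-/proc/mdstat}"
    # Refuse to report "in sync" when the file cannot be read at all.
    [ -r "$mdstat_file" ] || return 2
    # Degraded arrays show "_" in the status field, e.g. [_U]; a rebuild
    # shows a "recovery = 12.6%" or "resync = ..." progress line.
    ! grep -Eq '\[[U_]*_[U_]*\]|(recovery|resync) *=' "$mdstat_file"
}

# Example polling loop (commented out so that sourcing this file has no effect):
# while ! raid_in_sync; do sleep 60; done
# echo "All arrays are back in sync"
```

As an alternative for a single array, `mdadm --wait /dev/md2` blocks until resync or recovery activity on that device has finished.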