A Fail event had been detected on md device /dev/md
It could be related to component device /dev/
Faithfully yours, etc.
I like the 'Faithfully yours' bit at the end!
If you'd prefer that a program be run when an event is detected, specify it on a PROGRAM line in /etc/mdadm.conf:
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR raid-alert
PROGRAM /usr/local/bin/mdadm-event-handler
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=dd2aabd5:fb2ab384:cba9912c:df0b0f4b
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=2b0846b0:d1a540d7:d722dd48:c5d203e4
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=31c6dbdc:414eee2d:50c4c773:2edc66f6
Only one program name can be given. When an event is detected, that program will be run with three arguments: the event, the RAID device, and (optionally) the RAID element. If you wanted a verbal announcement to be made, for example, you could use a script like this:
#!/bin/bash
#
# mdadm-event-handler :: announce RAID events verbally
#
# Set up the phrasing for the optional element name
if [ "$3" ]
then
    E=", element $3"
fi
# Separate words (RebuildStarted -> Rebuild Started)
T=$(echo "$1" | sed 's/\([A-Z]\)/ \1/g')
# Make the voice announcement and then repeat it
echo "Attention! RAID event:$T on $2$E" | festival --tts
sleep 2
echo "Repeat:$T on $2$E" | festival --tts
When a drive fails, this script will announce something like 'Attention! RAID event: Fail on /dev/md0, element /dev/sdc1' and then repeat the message a couple of seconds later.
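You can exercise the handler before a real failure occurs, either by running it by hand or by asking mdadm to generate a test event. The following is a sketch that assumes the script has been saved as /usr/local/bin/mdadm-event-handler and marked executable (use whatever path your PROGRAM line specifies), and that festival is installed:

# chmod +x /usr/local/bin/mdadm-event-handler
# /usr/local/bin/mdadm-event-handler Fail /dev/md0 /dev/sdc1
# mdadm --monitor --scan --oneshot --test

The first invocation feeds the script a made-up Fail event directly; the last command makes mdadm generate a TestMessage event for every array listed in /etc/mdadm.conf, which is delivered both to the MAILADDR recipient and to the PROGRAM handler.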
6.2.1.6. Setting up a hot spare
When a system with RAID 1 or higher experiences a disk failure, the data on the failed drive will be recalculated from the remaining drives. However, data access will be slower than usual, and if any other drives fail, the array will not be able to recover. Therefore, it's important to replace a failed disk drive as soon as possible.
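Because a degraded array keeps running, the failure is easy to overlook if the notification doesn't reach you. A quick manual check is to look for the missing-element marker in /proc/mdstat; this one-liner is only a sketch (the pattern assumes the [UU]/[U_] status notation used in the /proc/mdstat output shown in this section):

$ grep -B 1 '\[U*_' /proc/mdstat

Any output means that at least one array is running with a failed or missing element.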
When a server is heavily used or is in an inaccessible location, such as an Internet colocation facility, it makes sense to equip it with a hot spare: an extra drive that is part of the array but sits idle until an active element fails, at which point the RAID system automatically rebuilds onto it.
To create a hot spare when a RAID array is initially created, use the -x argument to indicate the number of spare devices:
# mdadm --create -l 1 -n 2 -x 1 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdf1
mdadm: array /dev/md0 started.
$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2](S) sdc1[1] sdb1[0]
62464 blocks [2/2] [UU]
unused devices: <none>
Notice that the spare device, /dev/sdf1, is marked with (S) in the output, while [2/2] [UU] shows that both active elements of the mirror are present and up.
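If an array was created without a spare, you don't need to rebuild it from scratch: adding a partition to an array that already has its full complement of active devices turns the new partition into a hot spare. A sketch, with /dev/sdf1 standing in for whatever device you want to dedicate as the spare:

# mdadm --add /dev/md0 /dev/sdf1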
If an active element in the array fails, the hot spare will take over automatically:
$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2] sdc1[3](F) sdb1[0]
62464 blocks [2/1] [U_]
[=>...................] recovery = 6.4% (4224/62464) finish=1.5min speed=603K/sec
unused devices: <none>
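You don't have to wait for real hardware trouble to see this failover: mdadm can mark an element as faulty on demand, which triggers exactly the recovery shown above. A test sketch (here /dev/sdc1 is deliberately failed; make sure the array is healthy and has a spare before trying it):

# mdadm --fail /dev/md0 /dev/sdc1

Once the rebuild onto the spare completes, the status line returns to [2/2] [UU].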
When you remove, replace, and re-add the failed drive, it will become the hot spare:
# mdadm --remove /dev/md0 /dev/sdc1
mdadm: hot removed /dev/sdc1
...(Physically replace the failed drive)...
# mdadm --add /dev/md0 /dev/sdc1
mdadm: re-added /dev/sdc1
# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdc1[2](S) sdf1[1] sdb1[0]
62464 blocks [2/2] [UU]
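For a more detailed view than /proc/mdstat provides, you can query the array itself; mdadm reports the array state, any rebuild in progress, and the role (active, spare, or faulty) of each element. The device name here follows the examples above:

# mdadm --detail /dev/md0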