A software RAID array managed with mdadm on Linux provides redundancy and, depending on the RAID level, improved performance for your data storage. Disks still fail, however, and a failed member must be replaced to restore the array's redundancy. This guide walks step by step through replacing a failed disk in a RAID 1 (mirror) array using mdadm on Ubuntu 22.04.
Table of Contents
- Prerequisites
- 1. Identify the Failed Disk
- 2. Remove the Failed Disk from the RAID Array
- 3. Physically Replace the Disk
- 4. Add the New Disk to the RAID Array
- 5. Monitor the Resynchronization Process
- 6. Verify the RAID Array Status
- 7. Conclusion
Prerequisites
Before proceeding with the disk replacement, ensure you have the following:
- Administrative access to the server.
- A new disk of equal or greater size to replace the failed one.
- Basic understanding of RAID concepts and Linux command-line operations.
- A backup of critical data. This is strongly recommended: until the rebuild finishes, the array has no redundancy, so a failure of the remaining disk would lose data.
1. Identify the Failed Disk
First, you need to determine which disk in your RAID array has failed.
a. Check RAID Array Status
sudo mdadm --detail /dev/md0
Replace /dev/md0 with your RAID device if different. Check whether the array's State line includes degraded, and look for member devices marked as faulty or removed.
b. List All RAID Arrays
cat /proc/mdstat
This command provides a summary of all active RAID arrays and their status.
c. Identify the Failed Disk
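From the output, identify the device that is marked as failed. For example:
md0 : active raid1 sdb1[1] sda1[0](F)
      2093056 blocks super 1.2 [2/1] [_U]
In this example, sda1 carries the (F) flag, meaning it has failed. The [2/1] counter shows that only one of the array's two members is active, and [_U] shows that slot 0 (the slot sda1 occupied) is down while slot 1 (sdb1) is up.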
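On a server with several arrays, scanning /proc/mdstat by eye is error-prone. The following is a minimal sketch, assuming the mdstat layout in the example above, of a hypothetical Bash helper named find_failed_members that prints every array member flagged (F).

```shell
# Hypothetical helper: print "array member" for each member that
# /proc/mdstat tags with (F). Reads mdstat-formatted text on stdin.
find_failed_members() {
  awk '
    # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0](F)"
    $1 ~ /^md/ && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /\(F\)$/) {
          dev = $i
          sub(/\[.*$/, "", dev)   # strip the role and flags, e.g. "[0](F)"
          print $1, dev
        }
      }
    }
  '
}
# Usage: find_failed_members < /proc/mdstat
```

For the example above it would print md0 sda1; an empty result means no member is currently flagged as failed.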
2. Remove the Failed Disk from the RAID Array
Once you’ve identified the failed disk, proceed to remove it from the RAID array.
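sudo mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
Replace /dev/md0 with your RAID device and /dev/sda1 with the failed partition. Marking the device failed first does no harm if the kernel has already done so, and it is required if the disk is failing but still active, because mdadm cannot remove an active member. If the failed disk also holds members of other arrays (for example /dev/sda2 in /dev/md1), fail and remove those as well.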
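Because mdadm works with partitions rather than whole disks, one failed disk can belong to several arrays at once. This sketch uses a hypothetical helper, removal_commands, that reads /proc/mdstat and prints (without running) an mdadm --fail/--remove command for every partition of the given disk, so you can review each one before executing it with sudo. It assumes sdX1-style or NVMe-style (nvme0n1p1) partition names.

```shell
# Hypothetical helper: print, but do not run, the mdadm commands that would
# fail and remove every partition of DISK (e.g. "sda") from the arrays it
# belongs to. Reads mdstat-formatted text on stdin.
removal_commands() {
  local disk="$1"
  awk -v disk="$disk" '
    $1 ~ /^md/ && $2 == ":" {
      for (i = 3; i <= NF; i++) {
        dev = $i
        sub(/\[.*$/, "", dev)            # "sda1[0](F)" -> "sda1"
        # Match sda1, sda2, ... and NVMe-style partitions such as nvme0n1p1.
        if (dev ~ ("^" disk "p?[0-9]+$")) {
          printf "mdadm --manage /dev/%s --fail /dev/%s --remove /dev/%s\n", $1, dev, dev
        }
      }
    }
  '
}
# Usage: removal_commands sda < /proc/mdstat
```

Printing the commands instead of executing them is a deliberate choice: a typo in the disk name should cost you a second look, not a healthy mirror half.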
3. Physically Replace the Disk
After removing the failed disk logically, it’s time to replace it physically.
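- Shut down the server gracefully: sudo shutdown -h now
- Physically remove the failed disk (/dev/sda in our example). Device names do not identify drive bays, so match the drive by its serial number (for example from lsblk -d -o NAME,SERIAL, run before shutting down) before pulling it.
- Insert the new disk into the same slot or bay.
- Power on the server.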
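Ensure the new disk is recognized by the system: sudo fdisk -l
Device names can change across reboots, so use the size and model to confirm which disk is the new, empty one before partitioning it; do not assume it is still /dev/sda.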
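After the reboot, the replacement is usually the only disk with nothing on it. The sketch below is a hypothetical helper, unused_disks, that reads the list output of lsblk -ln -o NAME,TYPE,PKNAME and prints every disk that no partition or other device sits on. Treat the result as a candidate only and confirm it by size and serial number before writing to it; a disk used whole in another role would not show up in lsblk's parent column either.

```shell
# Hypothetical helper: print disks that are not the parent (PKNAME) of any
# other block device. Expects `lsblk -ln -o NAME,TYPE,PKNAME` on stdin.
unused_disks() {
  awk '
    $2 == "disk" { disks[$1] = 1 }        # whole disks (PKNAME is empty)
    NF >= 3      { has_children[$3] = 1 } # anything with a parent device
    END {
      for (d in disks)
        if (!(d in has_children)) print d
    }
  ' | sort
}
# Usage: lsblk -ln -o NAME,TYPE,PKNAME | unused_disks
```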
4. Add the New Disk to the RAID Array
With the new disk installed, add it to the RAID array.
a. Partition the New Disk
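Create partitions on the new disk that match the layout of the surviving disk (/dev/sdb in our example). Each new RAID partition must be at least as large as the corresponding member on the surviving disk; compare against the output of sudo fdisk -l /dev/sdb.
sudo fdisk /dev/sda
Inside fdisk, perform the following steps:
- If the surviving disk uses a GPT partition table, type g to create one; on a blank disk, fdisk otherwise creates an MBR (DOS) table.
- Type n to create a new partition.
- Accept the default partition number, then enter first and last sectors that reproduce the surviving disk's partition. The default last sector uses all remaining space, which is only correct if the RAID member on the surviving disk does too.
- Type t to change the partition type.
- Enter the RAID type: fd (Linux raid autodetect) on an MBR table, or Linux RAID on a GPT table (type L at the prompt to list the available types).
- Type w to write the changes and exit.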
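Instead of recreating partitions by hand, many administrators copy the layout from the surviving disk. This is a swapped-in alternative to the fdisk steps, sketched under the assumption that sfdisk (util-linux) and sgdisk (gdisk package) are installed. The hypothetical copy_layout_commands helper reads an sfdisk -d dump of the surviving disk and prints the commands to run: for GPT it replicates the table with sgdisk and then randomizes the GUIDs on the copy, and for MBR it pipes the dump back into sfdisk. Check the source and destination carefully before running anything, because swapping them would overwrite the surviving disk's partition table.

```shell
# Hypothetical helper: print the label type ("gpt" or "dos") from an
# `sfdisk -d` dump read on stdin.
layout_label() {
  awk -F': *' '$1 == "label" { print $2; exit }'
}

# Hypothetical helper: print, but do not run, the commands that would copy
# SRC's partition layout onto DST, given SRC's `sfdisk -d` dump on stdin.
copy_layout_commands() {
  local src="$1" dst="$2" label
  label="$(layout_label)"
  case "$label" in
    gpt)
      # sgdisk copies the main device's table (SRC) onto the --replicate
      # target (DST); then DST gets fresh GUIDs so the disks stay distinct.
      echo "sgdisk --replicate=$dst $src"
      echo "sgdisk --randomize-guids $dst"
      ;;
    dos)
      echo "sfdisk -d $src | sfdisk $dst"
      ;;
    *)
      echo "unsupported or missing label: '$label'" >&2
      return 1
      ;;
  esac
}
# Usage: sudo sfdisk -d /dev/sdb | copy_layout_commands /dev/sdb /dev/sda
```

Review the printed lines, then run each one with sudo.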
b. Add the New Partition to the RAID Array
sudo mdadm --manage /dev/md0 --add /dev/sda1
Replace /dev/md0 with your RAID device and /dev/sda1 with the new partition.
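Right after the --add, the new member should appear in mdadm --detail as a spare that is being rebuilt. A minimal sketch, using a hypothetical member_state helper that reads mdadm --detail output on stdin and prints the state column for one device; it assumes the usual Number/Major/Minor/RaidDevice/State table, with the device path in the last column.

```shell
# Hypothetical helper: print the State words (e.g. "active sync") for DEV
# from `mdadm --detail` output read on stdin.
member_state() {
  local dev="$1"
  awk -v dev="$dev" '
    # Member rows start with a number (or "-") and end with the device path.
    $NF == dev && $1 ~ /^([0-9]+|-)$/ {
      state = $5
      for (i = 6; i < NF; i++) state = state " " $i
      print state
    }
  '
}
# Usage: sudo mdadm --detail /dev/md0 | member_state /dev/sda1
```

During recovery this typically prints spare rebuilding, and active sync once the rebuild is done.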
5. Monitor the Resynchronization Process
After you add the new partition, the kernel starts rebuilding the mirror onto it; /proc/mdstat reports this as recovery.
watch cat /proc/mdstat
This command will display real-time progress of the resynchronization. Wait until the process completes successfully.
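To follow the rebuild from a script rather than watch, the sketch below assumes the progress-line format /proc/mdstat prints during a rebuild (for example recovery = 12.6% followed by block counts, finish time, and speed). The helpers are hypothetical: rebuild_progress prints the current percentage for one array, and wait_for_rebuild polls until no resync or recovery line remains.

```shell
# Hypothetical helper: print the resync/recovery percentage for ARRAY
# (e.g. "md0") from mdstat-formatted text on stdin; print nothing if idle.
rebuild_progress() {
  local array="$1"
  awk -v array="$array" '
    $1 ~ /^md/ && $2 == ":" { cur = $1; next }   # start of an array block
    NF == 0                 { cur = ""; next }   # blank line ends the block
    cur == array && match($0, /(recovery|resync) = [0-9.]+%/) {
      s = substr($0, RSTART, RLENGTH)
      sub(/^[a-z]+ = /, "", s)                   # keep just "12.6%"
      print s
    }
  '
}

# Hypothetical helper: poll /proc/mdstat once a minute until ARRAY is idle.
wait_for_rebuild() {
  local array="$1" pct
  while pct="$(rebuild_progress "$array" < /proc/mdstat)" && [ -n "$pct" ]; do
    echo "$array: $pct"
    sleep 60
  done
}
# Usage: wait_for_rebuild md0
```

mdadm also has a built-in option for this: sudo mdadm --wait /dev/md0 blocks until any resync or recovery on the array finishes.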
6. Verify the RAID Array Status
Once the resynchronization is complete, verify the status of the RAID array.
sudo mdadm --detail /dev/md0
Ensure that all devices are active and no disks are marked as faulty.
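To run the same check from a script or a cron job, this sketch reads the member-status brackets that /proc/mdstat prints for each array: [UU] when both halves of a mirror are up, with an underscore in place of each missing member. The helper name array_status is hypothetical.

```shell
# Hypothetical helper: print "healthy" or "degraded" for ARRAY (e.g. "md0")
# based on its [U_] status brackets in mdstat-formatted text on stdin.
array_status() {
  local array="$1"
  awk -v array="$array" '
    $1 ~ /^md/ && $2 == ":" { cur = $1; next }
    NF == 0                 { cur = ""; next }
    cur == array && match($0, /\[[U_]+\]/) {
      flags = substr($0, RSTART, RLENGTH)
      print (index(flags, "_") ? "degraded" : "healthy")
      exit
    }
  '
}
# Usage: array_status md0 < /proc/mdstat
```

Alternatively, sudo mdadm --detail --test /dev/md0 reports the array state through its exit status; see mdadm(8) for the codes.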
7. Conclusion
Replacing a failed mirror disk in a software RAID array managed by mdadm is a straightforward process that restores data redundancy and system reliability. By following the steps in this guide, you can replace failed disks efficiently and maintain the integrity of your RAID array.
Best Practices:
- Regularly monitor your RAID arrays to detect and address failures promptly.
- Maintain up-to-date backups to prevent data loss in case of multiple disk failures.
- Use disks of the same size and specifications to ensure optimal RAID performance.
For more detailed information on mdadm and RAID management, refer to the mdadm(8) manual page (man mdadm) and the Linux kernel's md documentation.

