The initial step in this process is to identify which RAID arrays have failed. To do that, check the RAID status with “cat /proc/mdstat”.

A healthy set of RAID-mirrored partitions will show output like the one below.

[servermyserver]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sdb3[1] sda3[0]
      50304 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      29632 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      24576 blocks [2/2] [UU]

This is reflected by the [UU] shown for each mirrored partition; each “U” stands for a good partition.

To identify whether a RAID array is good or failed, look at the string containing [UU]. Each “U” represents a healthy partition in the RAID array. If you see [UU], the RAID array is healthy. If a “U” is missing, for example [_U], the RAID array is degraded or faulty.
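
One quick way to spot a degraded array, for instance, is to grep /proc/mdstat for any status string that contains a missing “U”; if the command prints nothing, all arrays are healthy:

# grep '\[.*_.*\]' /proc/mdstat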

A faulty RAID-mirrored partition set will show output something like this.

[servermyserver]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sdb3[1] sda3[0]
      50304 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      29632 blocks [2/1] [_U]

md0 : active raid1 sdb1[1] sda1[0]
      24576 blocks [2/2] [UU]

From the above output we can see that RAID array “md1” is missing a “U” and is therefore degraded or faulty.

Let’s take a closer look at the failed RAID array “md1”:

md1 : active raid1 sdb2[1] sda2[0]
      29632 blocks [2/1] [_U]

The missing “U” is in the position that corresponds to sda2, which indicates that “sda2” has failed and “sdb2” is good.
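
If you want more detail than /proc/mdstat shows, mdadm itself can report the state of the array and of each member device (the array name here matches the example above; substitute your own):

# mdadm --detail /dev/md1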

Removing the failed partition(s) and disk:

Before we can physically remove the hard drive from the system, we must first “fail” the disk’s partitions in every RAID array they belong to. Even though only partition /dev/sda2 of RAID array md1 has failed, we must manually fail all the other /dev/sda# partitions that belong to RAID arrays before we can remove the hard drive from the system.

Now we will fail the disk partitions using the following command.

# mdadm --manage /dev/md1 --fail /dev/sda2

We have to repeat this command for each partition, changing /dev/md# and /dev/sda# in the above command to match the output of “cat /proc/mdstat”.

like: # mdadm --manage /dev/md0 --fail /dev/sda1, and so on.
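
Assuming the md#/sda# mapping from the example output above (md0 uses sda1, md1 uses sda2, md2 uses sda3; check this against your own “cat /proc/mdstat”), a small shell loop can fail all of the partitions in one go:

# for i in 0 1 2; do mdadm --manage /dev/md$i --fail /dev/sda$((i+1)); done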

Removing the failed partitions

Now that all of the /dev/sda partitions have been failed, they can be removed from the RAID arrays.

Use the following command:

# mdadm --manage /dev/md1 --remove /dev/sda2

We have to repeat this command for each partition, changing /dev/md# and /dev/sda# in the above command to match the output of “cat /proc/mdstat”.

For example:

# mdadm --manage /dev/md0 --remove /dev/sda1
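
The same assumed md#/sda# mapping lets the remove step be scripted as a loop as well:

# for i in 0 1 2; do mdadm --manage /dev/md$i --remove /dev/sda$((i+1)); done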

Power off the system, physically replace the hard drive, and power the system back on (since this is a software RAID, the system is shut down for the swap):

# shutdown -h now

How to add the new disk to the RAID Array

The new hard disk is now physically installed. Next, we have to create on the new /dev/sda the exact same partition layout as on the surviving disk /dev/sdb.

We can do that using the following command:

# sfdisk -d /dev/sdb | sfdisk /dev/sda

We can confirm that both hard drives now have the same partition layout using:

# fdisk -l

Then use the following commands to add the partitions back to the RAID arrays with “mdadm”:

# mdadm --manage /dev/md0 --add /dev/sda1 (repeat for each md# and sda#)

servermyserver:~# mdadm --manage /dev/md1 --add /dev/sda2

mdadm: re-added /dev/sda2

Repeat this for md0 and md2, for example:

servermyserver:~# mdadm --manage /dev/md0 --add /dev/sda1

mdadm: re-added /dev/sda1
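
If you prefer, the adds can also be scripted as one loop, again assuming the md#/sda# mapping from the example (use either the loop or the individual commands, not both, because adding a partition that is already active in an array simply fails with an error):

# for i in 0 1 2; do mdadm --manage /dev/md$i --add /dev/sda$((i+1)); done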

Check that the partitions are being synchronised using “cat /proc/mdstat”. It will show the current status of the synchronisation process.

[servermyserver]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sdb3[1] sda3[0]
      50304 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      29632 blocks [2/1] [_U]
      [>....................]  recovery =  1.8% (179/29632) finish=193.6min speed=81086K/sec

md0 : active raid1 sdb1[1] sda1[0]
      24576 blocks [2/2] [UU]
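
To follow the rebuild in real time, you can, for example, refresh the status every few seconds with:

# watch -n 5 cat /proc/mdstat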

After a successful synchronisation the RAID will be back to normal. You can verify this by checking “cat /proc/mdstat” again.

Check the status of md1 to confirm that everything is good:

[servermyserver]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sdb3[1] sda3[0]
      50304 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      29632 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      24576 blocks [2/2] [UU]

Once everything is fine and recovery has completed, install GRUB on the drives so the system can boot from either disk.
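
For example, on a BIOS system booting with GRUB (package names and exact commands vary by distribution, so adjust this to your setup), the boot loader can be reinstalled on both disks like this:

# grub-install /dev/sda
# grub-install /dev/sdb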