File: vimwiki/Replacing A Failed Disk in a mdadm RAID.md

Author: Vito Graffagnino <vito@graffagnino.xyz>, 2020-09-08 18:10:49 +0100 (commit 3b0142cedcde39e4c2097ecd916a870a3ced5ec6)

If disk errors are reported, there may be a hardware problem with the disk. Check `dmesg` for errors of the following type:

```text
[737961.360080] raid5_end_read_request: 64 callbacks suppressed
[737961.360087] md/raid:md125: read error corrected (8 sectors at 2722701256 on sdc1)
[737961.360093] md/raid:md125: read error corrected (8 sectors at 2722701264 on sdc1)
[737961.360095] md/raid:md125: read error corrected (8 sectors at 2722701272 on sdc1)
[737961.360098] md/raid:md125: read error corrected (8 sectors at 2722701280 on sdc1)
[737961.360100] md/raid:md125: read error corrected (8 sectors at 2722701288 on sdc1)
[737961.360102] md/raid:md125: read error corrected (8 sectors at 2722701296 on sdc1)
[737961.360105] md/raid:md125: read error corrected (8 sectors at 2722701304 on sdc1)
[737961.360107] md/raid:md125: read error corrected (8 sectors at 2722701312 on sdc1)
[737961.360109] md/raid:md125: read error corrected (8 sectors at 2722701320 on sdc1)
[737961.360112] md/raid:md125: read error corrected (8 sectors at 2722701328 on sdc1)
[742462.760119] md: md125: data-check done.
```
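
A per-array, per-device summary of the log shows at a glance which member the errors come from. The sketch below uses a hypothetical helper, `count_md_read_errors`, which is not part of any package. It assumes the raid4/5/6 message format shown above and reads the log on stdin:

```sh
# Hypothetical helper: count md "read error corrected" messages per array and
# member device. Input is kernel-log text on stdin, e.g. the output of dmesg.
# Output lines have the form "ARRAY MEMBER COUNT".
count_md_read_errors() {
    # Keep only lines such as
    #   md/raid:md125: read error corrected (8 sectors at 2722701256 on sdc1)
    # and reduce each one to "md125 sdc1", then count identical pairs.
    sed -n 's/.*md\/raid:\(md[0-9]*\): read error corrected (.* on \([^)]*\)).*/\1 \2/p' |
        sort | uniq -c |
        awk '{ print $2, $3, $1 }'
}
```

Usage: `dmesg | count_md_read_errors`. On the excerpt above it would print `md125 sdc1 10`. Reading the kernel log may require root if `kernel.dmesg_restrict` is set.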
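
These errors turned up during a data-check of md125. In each case md reconstructed the unreadable sectors from parity and rewrote them, so no data was lost. However, a run of errors on adjacent sectors of `sdc1` suggests that the disk itself is failing. Use SMART to investigate the drive. `smartctl -i` prints its identity information (model, serial number, firmware) and whether SMART support is available and enabled:

```console
$ smartctl -i /dev/sdc
```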
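
`-i` only identifies the drive. The drive's own overall verdict is printed by `smartctl -H`. For ATA drives it appears on a line starting with `SMART overall-health self-assessment test result:`. Below is a sketch of a hypothetical helper that extracts the verdict. It assumes the ATA wording; SCSI/SAS drives report health differently. A drive can still report `PASSED` while producing read errors like the ones above, so the self-test described next is still worth running.

```sh
# Hypothetical helper: print the overall health verdict from `smartctl -H`
# output on stdin. Returns 0 only if the verdict is PASSED, 1 for any other
# verdict, and 2 if no ATA overall-health line was found.
smart_health() {
    verdict=$(sed -n 's/^SMART overall-health self-assessment test result: *//p')
    if [ -z "$verdict" ]; then
        echo "no ATA overall-health line found" >&2
        return 2
    fi
    echo "$verdict"
    [ "$verdict" = "PASSED" ]
}
```

Usage: `smartctl -H /dev/sdc | smart_health`.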

Start the drive's extended (long) self-test with:

```console
$ smartctl -t long /dev/sdc
```
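
The long test reads the whole surface and can take several hours. `smartctl -t short /dev/sdc` runs a much quicker test. Either test runs inside the drive in the background, and `smartctl` prints an estimate of when it will finish. Once it has completed, view the results with: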

```console
$ smartctl -l selftest /dev/sdc

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.36.3.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 40% 21930 2722703304
```
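
For scripting, the LBA of the first error can be pulled out of the self-test log. The helper below is hypothetical. It assumes the layout shown above: a failed entry ends with the LBA, and the most recent test is listed first as `# 1`. The LBA can also be compared with the dmesg output. The two numbers differ by exactly 2048 sectors. That is what you would expect if md reports sectors relative to the start of `sdc1` and that partition begins at sector 2048. This is an assumption; the `start=` value in the `sfdisk -d` dump taken later in this note shows the real offset.

```sh
# Hypothetical helper: print LBA_of_first_error for the most recent self-test
# entry that ended in a read failure. Input is `smartctl -l selftest` output
# on stdin; nothing is printed if no such entry exists.
first_failed_lba() {
    # Log entries start with "# N"; for a failed entry the LBA is the last field.
    awk '/^# *[0-9]+/ && /read failure/ { print $NF; exit }'
}

# Offset between the absolute LBA reported by SMART and the first sector md
# logged on sdc1 in dmesg; prints 2048.
echo $(( 2722703304 - 2722701256 ))
```

Usage: `smartctl -l selftest /dev/sdc | first_failed_lba` prints `2722703304` for the log above.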
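
The extended test stopped with a read failure and 40% of the test remaining, so the drive needs to be replaced. To identify the physical drive in the machine, use hdparm to read its serial number: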

```console
$ hdparm -i /dev/sdc | grep SerialNo
Model=ST2000DM001-1ER164, FwRev=CC27, SerialNo=Z4Z5QAY5
```
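
The serial number is normally printed on the drive's label, so it is a reliable way to pick the right disk out of the case. Below is a sketch of a hypothetical helper that reduces the `hdparm -i` output to just the serial. It assumes the `SerialNo=` format shown above.

```sh
# Hypothetical helper: extract the value of SerialNo= from `hdparm -i`
# output on stdin.
hdparm_serial() {
    sed -n 's/.*SerialNo=\([^ ,]*\).*/\1/p'
}
```

Usage: `hdparm -i /dev/sdc | hdparm_serial` prints `Z4Z5QAY5` for the drive above. Alternatively, `lsblk -d -o NAME,MODEL,SERIAL` lists the serial numbers of all disks at once.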

Before shutting down and replacing the drive, use mdadm to mark its partition as failed and then remove it from the array:

```console
$ mdadm --manage /dev/md125 --fail /dev/sdc1
$ mdadm --manage /dev/md125 --remove /dev/sdc1
```
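
If the disk carries partitions that belong to more than one array, each of those partitions has to be failed and removed before the disk is pulled. Below is a sketch of a hypothetical helper that lists them from `/proc/mdstat`. It assumes the usual `mdNNN : active LEVEL member[n] ...` line format.

```sh
# Hypothetical helper: list the md arrays that have a member on disk $1
# (e.g. sdc), reading /proc/mdstat-formatted text on stdin.
# Prints one "ARRAY MEMBER" pair per line.
arrays_using() {
    awk -v d="$1" '
        $2 == ":" {                                  # e.g. "md125 : active raid5 ..."
            for (i = 3; i <= NF; i++)
                if ($i ~ ("^" d "[0-9p]*\\[")) {     # member such as sdc1[2]
                    m = $i
                    sub(/\[.*/, "", m)               # drop "[2]" or "[2](F)"
                    print $1, m
                }
        }'
}
```

Usage: `arrays_using sdc < /proc/mdstat`. mdadm also accepts several operations in a single manage-mode call, for example `mdadm /dev/md125 --fail /dev/sdc1 --remove /dev/sdc1`.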

Before the old drive is removed, dump its partition table to a file:

```console
$ sfdisk -d /dev/sdc > sdc.out
```
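
The replacement must be at least as large as the old drive, otherwise the saved layout will not fit. Below is a sketch of a hypothetical pre-check. It assumes 512-byte logical sectors and the `/dev/sdX1 : start=..., size=...` partition lines that `sfdisk -d` writes. The check succeeds if every partition in the dump ends within a disk of the given size in sectors. On a GPT disk the backup partition table at the end of the disk also needs room, so there a pass is necessary but not sufficient.

```sh
# Hypothetical helper: succeed if every partition in an `sfdisk -d` dump
# (read on stdin) ends within a disk of $1 sectors, fail otherwise.
dump_fits() {
    awk -v disk="$1" '
        /^\/dev\/[^ ]+ : / {                         # partition lines only
            start = $0; sub(/.*start= */, "", start); sub(/,.*/, "", start)
            size  = $0; sub(/.*size= */, "", size);   sub(/,.*/, "", size)
            if (start + size > max) max = start + size
        }
        END { exit !(max <= disk + 0) }              # exit 0 when it fits
    '
}
```

Usage: `dump_fits "$(blockdev --getsz /dev/sdc)" < sdc.out && echo fits`, run against the new drive. `blockdev --getsz` reports the size in 512-byte sectors.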
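
Once the new drive has been swapped in, write the saved partition table to it. Device names are not guaranteed to stay the same across a reboot, so first check with `lsblk` that the new, empty drive really is `/dev/sdc`. Note that `-d` only dumps the table; to write it, feed the dump to `sfdisk` without `-d`:

```console
$ sfdisk /dev/sdc < sdc.out
```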

The new partition can now be added to the array, which starts the rebuild onto it:

```console
$ mdadm --manage /dev/md125 --add /dev/sdc1
```

Finally, monitor the progress of the rebuild with:

```console
$ cat /proc/mdstat
```
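
During a rebuild, `/proc/mdstat` shows a progress line under the array, such as `recovery = 12.6% (...) finish=... speed=...`. Below is a sketch of a hypothetical helper that reduces this to one line per array. It assumes that layout.

```sh
# Hypothetical helper: print "ARRAY KIND PERCENT" for every array that has a
# recovery/resync/reshape/check in progress, reading /proc/mdstat text on stdin.
rebuild_progress() {
    awk '
        $2 == ":" { array = $1 }                     # e.g. "md125 : active raid5 ..."
        match($0, /(recovery|resync|reshape|check) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)          # e.g. "recovery = 12.6%"
            kind = s; sub(/ *=.*/, "", kind)
            pct = s;  sub(/.*= */, "", pct); sub(/%$/, "", pct)
            print array, kind, pct
        }'
}
```

Usage: `rebuild_progress < /proc/mdstat`. To keep an eye on it, `watch -n 60 cat /proc/mdstat` refreshes the view every minute. `mdadm --wait /dev/md125` blocks until the recovery has finished.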