|
If disk errors are reported, the disk may have a hardware problem. Check dmesg for errors of the following type:
`[737961.360080] raid5_end_read_request: 64 callbacks suppressed`
`[737961.360087] md/raid:md125: read error corrected (8 sectors at 2722701256 on sdc1)`
`[737961.360093] md/raid:md125: read error corrected (8 sectors at 2722701264 on sdc1)`
`[737961.360095] md/raid:md125: read error corrected (8 sectors at 2722701272 on sdc1)`
`[737961.360098] md/raid:md125: read error corrected (8 sectors at 2722701280 on sdc1)`
`[737961.360100] md/raid:md125: read error corrected (8 sectors at 2722701288 on sdc1)`
`[737961.360102] md/raid:md125: read error corrected (8 sectors at 2722701296 on sdc1)`
`[737961.360105] md/raid:md125: read error corrected (8 sectors at 2722701304 on sdc1)`
`[737961.360107] md/raid:md125: read error corrected (8 sectors at 2722701312 on sdc1)`
`[737961.360109] md/raid:md125: read error corrected (8 sectors at 2722701320 on sdc1)`
`[737961.360112] md/raid:md125: read error corrected (8 sectors at 2722701328 on sdc1)`
`[742462.760119] md: md125: data-check done.`
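
When the log is long, it can help to tally the corrected read errors per member device. The following is a minimal sketch, not part of the original procedure: `count_md_read_errors` is a hypothetical helper name, and it assumes the kernel messages have the same wording as the sample above.

```shell
#!/bin/sh
# Hypothetical helper: count "read error corrected" messages per device.
# Reads kernel log text on stdin and prints "<count> <device>" lines.
# Assumes the message format shown in the dmesg sample above.
count_md_read_errors() {
    sed -n 's|.*md/raid:[^ ]*: read error corrected (.* on \([^)]*\)).*|\1|p' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Usage (reading the kernel log may require root):
#   dmesg | count_md_read_errors
```

A device that keeps accumulating corrected errors across data checks is a good candidate for the SMART tests described next.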
Use SMART to investigate the hard drive.
`$ smartctl -i /dev/sdc`
The drive can be tested with the following command:
`$ smartctl -t long /dev/sdc`
The long test takes a while; a short test (`smartctl -t short`) is also available.
The results can be viewed using:
`$ smartctl -l selftest /dev/sdc`
` `
`smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.36.3.el7.x86_64] (local build)`
`Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org`
``
`=== START OF READ SMART DATA SECTION ===`
`SMART Self-test log structure revision number 1`
`Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error`
`# 1 Extended offline Completed: read failure 40% 21930 2722703304`
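
To check the self-test log from a script, the failing entries can be filtered out of the `smartctl -l selftest` output. This is a rough sketch under the assumption that failed entries contain the word "failure" and that the LBA of the first error is the last column, as in the output above; `selftest_failed_lbas` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the LBA_of_first_error for every failed
# self-test entry. Reads `smartctl -l selftest` output on stdin.
# Assumes log entries start with "# <n>" and that a failed entry
# contains the word "failure", as in the sample output above.
selftest_failed_lbas() {
    awk '/^# *[0-9]+/ && /failure/ { print $NF }'
}

# Usage:
#   smartctl -l selftest /dev/sdc | selftest_failed_lbas
```

No output means no failed self-test was found in the log.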
The self-test reports a read failure, so the drive needs to be replaced. To identify the physical drive, use hdparm to get its serial number:
`$ hdparm -i /dev/sdc | grep SerialNo`
`Model=ST2000DM001-1ER164, FwRev=CC27, SerialNo=Z4Z5QAY5`
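
If only the bare serial number is wanted, for example to compare it with the label on the drive, it can be cut out of the hdparm line. A small sketch, assuming the `SerialNo=` field looks like the output above; `extract_serial` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print only the serial number.
# Reads `hdparm -i` output on stdin; assumes a "SerialNo=<value>" field
# terminated by a comma, a space, or the end of the line.
extract_serial() {
    sed -n 's/.*SerialNo=\([^, ]*\).*/\1/p'
}

# Usage:
#   hdparm -i /dev/sdc | extract_serial
```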
Before shutting down and replacing the drive, use mdadm to mark it as failed and remove it
from the RAID array:
`$ mdadm --manage /dev/md125 --fail /dev/sdc1`
`$ mdadm --manage /dev/md125 --remove /dev/sdc1`
Before the old drive is physically removed, dump its partition table using:
`$ sfdisk -d /dev/sdc > sdc.out`
Once the new drive has been swapped in, the old partition table can then be used on the new drive:
`$ sfdisk /dev/sdc < sdc.out`
The new disk is now ready to be included in the raid:
`$ mdadm --manage /dev/md125 --add /dev/sdc1`
Finally, monitor the progress of the rebuild using:
`$ cat /proc/mdstat`
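
For a scripted check, the progress percentage can be pulled out of the mdstat text. This is only a sketch: `rebuild_progress` is a hypothetical name, and it assumes the kernel reports progress on a line containing `recovery` or `resync` followed by a percentage, which is the usual layout of `/proc/mdstat` during a rebuild.

```shell
#!/bin/sh
# Hypothetical helper: print the rebuild percentage for each array
# that is currently recovering or resyncing.
# Reads /proc/mdstat text on stdin; assumes progress lines of the form
# "... recovery = 12.3% (...) finish=... speed=...".
rebuild_progress() {
    awk '/recovery|resync/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) print $i
    }'
}

# Usage:
#   rebuild_progress < /proc/mdstat
```

Running `watch cat /proc/mdstat` is a simpler way to follow the rebuild interactively.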
|