
SCT Error Recovery Control in RAID drives

SCT ERC (SMART Command Transport Error Recovery Control) controls how much time a drive spends trying to recover from read/write errors on defective sectors. Once that time has expired, the drive gives up on fixing the problem itself and reports a read/write failure to the RAID controller, which can then reconstruct the data from the other drives. Without such a limit, a drive may keep retrying for so long that the controller times out and drops the whole drive. ERC therefore prevents the RAID array from being degraded just because one drive has a single defective sector. This matters because rebuilding a degraded array can take a long time and stresses all remaining drives.

When a drive reports a read error after its ERC timeout, Linux's mdraid handles it as follows:
- Read the missing data from the other RAID devices
- Overwrite the bad block with that data
- Reread the bad block
Only if the overwrite or the reread fails does md mark the drive as faulty, leaving the array degraded.
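These recovery steps run inside the kernel whenever md hits a read error. A common way to make md read every sector, so that latent bad sectors go through this path before a real rebuild depends on them, is a "check" pass started through sysfs. The following is a minimal sketch, assuming root privileges and an array named md0; the function name `md_start_check` and the `SYSFS_BLOCK` override are hypothetical and exist only so the function can be tried outside a real system.

```shell
#!/bin/sh
# Start an md "check" pass on an array (e.g. md0). Writing "check" to
# sync_action makes md read every block; as we understand md, read errors
# found along the way go through the rewrite-from-redundancy steps above.
# SYSFS_BLOCK is a hypothetical override for testing; it defaults to /sys/block.
md_start_check() {
    md=$1
    action="${SYSFS_BLOCK:-/sys/block}/$md/md/sync_action"
    if [ ! -w "$action" ]; then
        echo "cannot write $action (not root, or no such array?)" >&2
        return 1
    fi
    echo check > "$action"
}

# Usage (as root):
#   md_start_check md0
#   cat /proc/mdstat                      # shows progress of the check
#   cat /sys/block/md0/md/mismatch_cnt    # mismatches counted by the last check
```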

Hard drive manufacturers use different names for this error recovery feature:
- Western Digital: TLER (Time-Limited Error Recovery). On WD Re drives this feature cannot be disabled and the timeout is fixed at 7 seconds, see http://support.wdc.com/KnowledgeBase/answer.aspx?ID=1478. On WD Red drives it can be configured.
- Seagate: ERC (e.g. for the Barracuda ES and ES.2 families of SATA enterprise drives, see http://knowledge.seagate.com/articles/en_US/FAQ/203991en?language=en_US)
- Samsung, Hitachi: CCTL (Command Completion Time Limit)

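The drive's ERC timeout should be lower than the RAID controller's command timeout, so that the drive reports the error before the controller gives up on the whole drive. Check the current ERC timeout of your drive: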

$ smartctl -l scterc /dev/sda
SCT Error Recovery Control command not supported
(This output means the drive does not support ERC; it is probably a cheap desktop model.)
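When the drive does support ERC, smartctl reports the read and write timers as raw values in units of 100 milliseconds. The helper below is a hedged sketch that converts them to seconds; it assumes the output format shown in its comments (which may differ between smartctl versions), and the function name `erc_seconds` is hypothetical.

```shell
#!/bin/sh
# Convert the SCT ERC timers printed by "smartctl -l scterc" to seconds.
# Assumption: a supported drive prints lines like
#            Read:     70 (7.0 seconds)
#           Write:     70 (7.0 seconds)
# where the number is in units of 100 ms. Lines such as "Read: Disabled"
# or the "not supported" message produce no output.
erc_seconds() {
    awk '
        ($1 == "Read:" || $1 == "Write:") && $2 ~ /^[0-9]+$/ {
            name = substr($1, 1, length($1) - 1)   # strip the trailing colon
            printf "%s %.1f\n", name, $2 / 10
        }
    '
}

# Usage (usually requires root):
#   smartctl -l scterc /dev/sda | erc_seconds
```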

Set the drive's read and write ERC timeouts to 20 seconds (the values are given in units of 100 milliseconds, so 200 means 20 seconds):

$ smartctl -l scterc,200,200 /dev/sda
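The same timeout usually goes on every member drive, and on many drives the setting does not survive a power cycle, so it is commonly reapplied at boot. A small loop helps here. This is a minimal sketch; `set_erc` and the `SMARTCTL` override are hypothetical names, not part of smartmontools.

```shell
#!/bin/sh
# Set the same SCT ERC read/write timeout on several drives.
# $1: timeout in whole seconds; remaining arguments: device paths.
# smartctl expects units of 100 ms, hence the multiplication by 10.
# SMARTCTL is a hypothetical override (e.g. for a dry run); defaults to smartctl.
set_erc() {
    secs=$1
    shift
    units=$((secs * 10))
    status=0
    for dev in "$@"; do
        "${SMARTCTL:-smartctl}" -l "scterc,$units,$units" "$dev" || status=1
    done
    return "$status"
}

# Usage (as root), e.g. from a boot script:
#   set_erc 20 /dev/sda /dev/sdb    # runs smartctl -l scterc,200,200 on each drive
```

The function returns non-zero if smartctl failed for any drive, so a boot script can notice drives that reject the command.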

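For Linux software RAID, the relevant timeout is not set by mdraid itself but by the kernel's SCSI layer, separately for each drive. Check it (the value is in seconds; the default is 30):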

$ cat /sys/block/sda/device/timeout
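The rule that the drive's ERC timeout must stay below the kernel timeout can be checked with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, it compares only the read timer, and the usage loop assumes the smartctl output format with lines like `Read:     70 (7.0 seconds)`.

```shell
#!/bin/sh
# Check "drive ERC timeout < kernel command timeout" for one drive.
# $1: ERC read timeout in units of 100 ms, as printed by "smartctl -l scterc"
# $2: kernel command timeout in seconds, from /sys/block/<dev>/device/timeout
# Returns 0 only if ERC is enabled and strictly shorter than the kernel timeout.
erc_below_kernel_timeout() {
    erc_units=$1
    kernel_secs=$2
    case $erc_units in
        '' | *[!0-9]*) return 1 ;;   # "Disabled", "not supported", or empty
    esac
    case $kernel_secs in
        '' | *[!0-9]*) return 1 ;;
    esac
    # Compare in units of 100 ms to stay in integer arithmetic.
    [ "$erc_units" -gt 0 ] && [ "$erc_units" -lt $((kernel_secs * 10)) ]
}

# Usage (as root), checking every sd* drive:
#   for path in /sys/block/sd*; do
#       dev=${path##*/}
#       erc=$(smartctl -l scterc "/dev/$dev" | awk '$1 == "Read:" { print $2 }')
#       kt=$(cat "$path/device/timeout")
#       if erc_below_kernel_timeout "$erc" "$kt"; then
#           echo "$dev: ok (ERC $erc x 100 ms, kernel $kt s)"
#       else
#           echo "$dev: check settings (ERC '$erc', kernel $kt s)"
#       fi
#   done
```

With the 20-second setting from above (200) and the default kernel timeout of 30 seconds, the check passes.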