July 2007
badblocks
We have error messages on server iwi202 that look like the excerpt below.
The problem occurs twice in quick succession (8 seconds apart) roughly every twelve minutes, but not at fixed times past the hour, so we don't believe a cron job causes it.
The machine does react more slowly than usual.
I will move important processes off the machine, but some minor ones may stay on it, so I want to see whether I can get rid of the problem by making ReiserFS stop using the single block that is causing errors.
Jul 17 09:11:48 src@iwinnn kernel: scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 0a b5 f0 fa 00 00 10 00
Jul 17 09:11:48 src@iwinnn kernel: Info fld=0xab5f0fd, Current sd08:09: sns = f0 3
Jul 17 09:11:48 src@iwinnn kernel: ASC=11 ASCQ= 0
Jul 17 09:11:48 src@iwinnn kernel: Raw sense data:0xf0 0x00 0x03 0x0a 0xb5 0xf0 0xfd 0x0a 0x00 0x00 0x00 0x00 0x11 0x00 0xe4 0x80 0x00 0x86
Jul 17 09:11:48 src@iwinnn kernel: I/O error: dev 08:09, sector 104206368
Jul 17 09:11:56 src@iwinnn kernel: scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 0a b5 f0 fa 00 00 08 00
Jul 17 09:11:56 src@iwinnn kernel: Info fld=0xab5f0fd, Current sd08:09: sns = f0 3
Jul 17 09:11:56 src@iwinnn kernel: ASC=11 ASCQ= 0
Jul 17 09:11:56 src@iwinnn kernel: Raw sense data:0xf0 0x00 0x03 0x0a 0xb5 0xf0 0xfd 0x0a 0x00 0x00 0x00 0x00 0x11 0x00 0xe4 0x80 0x00 0x86
Jul 17 09:11:56 src@iwinnn kernel: I/O error: dev 08:09, sector 104206368
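To check the timing pattern described above, the gaps between consecutive errors can be computed from the syslog lines. This is a minimal sketch, assuming the classic syslog timestamp format shown in the excerpt (month, day and HH:MM:SS in the first three fields) and that no two consecutive errors straddle midnight; the function name error_gaps is hypothetical.

```shell
# error_gaps: read syslog lines on stdin and print the number of seconds
# between each "I/O error" line and the previous one.
# Assumes the time of day (HH:MM:SS) is field 3 and that the log does not
# cross midnight between two consecutive errors.
error_gaps() {
    awk '/I\/O error/ {
        split($3, t, ":")
        s = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) print s - prev
        prev = s
        seen = 1
    }'
}
```

For example: grep 'dev 08:09' /var/log/messages | error_gaps (the log file location is an assumption; it varies between distributions). A repeating sequence of 8 followed by a gap of roughly 700 seconds would match the behaviour described above.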
The sector that causes errors, 104206368, is located in /dev/sda9, which is mounted as /var.
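The kernel names the failing device only as "dev 08:09". As far as I know, 2.4-era kernels print the major and minor numbers in hexadecimal, and SCSI disks (major 8) get 16 minor numbers each, so minor 9 is partition 9 of the first disk, /dev/sda9. A small sketch of that decoding, assuming the traditional static major-8 numbering (only the first 16 disks, sda to sdp); the helper name sd_devname is hypothetical.

```shell
# sd_devname MAJOR:MINOR
# Map a SCSI disk device number, as printed in kernel messages (assumed to
# be hexadecimal), to its /dev name.
# Assumes major 8, where minor = 16 * disk_index + partition_number.
sd_devname() {
    major=$((0x${1%%:*}))
    minor=$((0x${1#*:}))
    if [ "$major" -ne 8 ]; then
        echo "not a major-8 SCSI disk: $1" >&2
        return 1
    fi
    disk=$((minor / 16))
    part=$((minor % 16))
    # Turn the disk index into a letter: 0 -> a, 1 -> b, ...
    letter=$(printf "\\$(printf '%03o' $((97 + disk)))")
    if [ "$part" -eq 0 ]; then
        echo "/dev/sd$letter"        # partition 0 means the whole disk
    else
        echo "/dev/sd$letter$part"
    fi
}
```

Running sd_devname 08:09 prints /dev/sda9.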
I could run badblocks on the entire disk if I put the machine in single-user mode and unmounted /var, but I'd rather be as unobtrusive as possible, since any further bad sectors will turn up in the logs anyway. According to the badblocks manual, the command to check only a range of blocks is:
badblocks -c <blocks-at-a-time> -i <former-badblocks-report> <device> <last-block> <first-block>
By default, badblocks counts in blocks of 1024 bytes, whereas the kernel reports the location of the bad sector in 512-byte sectors.
So we compute the location of the sector in blocks:
echo -e "104206368\n2\n/\np" | dc
This yields 52103184. Then we check a small range of blocks around it on the partition, saving the result to a file:
badblocks -c64 /dev/sda9 52103222 52103152 | tee ~/bad_blocks.dev.sda9
A few blocks after our culprit appear to be bad as well:
52103184
52103185
52103186
52103187
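The sector-to-block arithmetic and the choice of range can also be done with plain shell arithmetic instead of dc. This sketch converts a 512-byte sector number to a badblocks block number and prints (but does not run) a badblocks command covering a window around it. The helper names sector_to_block and badblocks_window and the symmetric margin are illustrative assumptions; the range used above (52103152 to 52103222) is not symmetric. The block size must match the -b value given to badblocks, which defaults to 1024.

```shell
# sector_to_block SECTOR [BLOCK_SIZE]
# Convert a 512-byte sector number (treated, as in the text above, as
# relative to the partition) to a badblocks block number.
# BLOCK_SIZE defaults to 1024, the badblocks default.
sector_to_block() {
    block_size=${2:-1024}
    echo $(( $1 * 512 / block_size ))
}

# badblocks_window DEVICE SECTOR MARGIN
# Print a read-only badblocks command that checks MARGIN blocks on either
# side of the block containing SECTOR. badblocks expects the last block
# before the first block.
badblocks_window() {
    block=$(sector_to_block "$2")
    first=$((block - $3))
    last=$((block + $3))
    [ "$first" -lt 0 ] && first=0
    echo "badblocks -c64 $1 $last $first"
}
```

For example, badblocks_window /dev/sda9 104206368 32 prints a command that starts at block 52103152, like the one above, and ends 32 blocks after the culprit. Printing instead of executing leaves room to review the range before running it by hand.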