On Fri, 2016-07-22 at 09:51 -0400, Jeff Moyer wrote:
> Dmitry Monakhov <dmonakhov@xxxxxxxxxx> writes:
>
> > But once I rewrite this block, problem goes away.
> > #xfs_io -c "pwrite -S 0x0 $((80069000/2))k 4k" -d /dev/sda
> >
> > Now I can read it w/o any errors and smartctl is happy
> > #smartctl -t short /dev/sda
> > #smartctl -l selftest /dev/sda
> > Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Short offline     Completed without error  00%        4683             -
> >
> > So my disk is not dead right?
>
> Correct.
>
> > Why the hell HDD fail read from very beginning
> > Is this because HDD firmware detect internal crcXX sum corruption?
>
> Yes.
>
> > How this can happen? Is this because of power failure?
>
> Could be. If power was cut in the middle of a write, this can
> happen. There are other causes, though (bit rot, for example).
>
> > AFAIK standard guarantees that sector will be updated atomically.
>
> No, the SCSI and ATA standards most certainly do not guarantee that!
> NVMe is the only standard I know of that requires Atomic Write Unit
> Power Fail to be at least one sector.

The mechanics of the drive mostly ensure atomic updates on the physical
block level. You definitely get either the old data, the new data or an
unreadable sector. The latter is a pretty rare event because surviving
power usually ensures the writes complete, but it's not guaranteed.

> > But it happens! Please guide me how to fix such problems in
> > general.
>
> You fixed it. Overwriting the sector will clear the error.

Actually only "may clear the error" depending on what happened. If the
Hamming codes on the sector itself just failed (because of a torn write
due to power fail) then a rewrite simply re-fixes the sector in situ.
Sometimes the magnetic substrate of the track is worn (so the sector is
permanently damaged) and the re-write forces a reallocation.
If that's happening to your disk then eventually it will fail
irrecoverably when the reallocation table is full. You can monitor this
with the SMART Reallocated_Event_Count attribute.

James
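
For anyone wanting to watch that counter, here is a minimal sketch. It assumes smartmontools is installed and that the drive reports the attribute under the name Reallocated_Event_Count in the usual 10-column `smartctl -A` table (attribute names and availability vary by vendor). The helper name realloc_events is made up for illustration.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value of one SMART attribute (default: Reallocated_Event_Count).
# Assumes the usual attribute table layout, where field 2 is the
# attribute name and field 10 is the first token of the raw value.
realloc_events() {
    awk -v name="${1:-Reallocated_Event_Count}" '$2 == name { print $10 }'
}

# Usage (needs root and a real device; /dev/sda is the disk from this thread):
#   smartctl -A /dev/sda | realloc_events
```

A value that keeps climbing between checks would suggest rewrites are forcing reallocations rather than repairing sectors in place. Empty output just means the drive does not expose an attribute by that name.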