Re: disk crash leads to I/O errors when accessing partition [FIXED]

Arno Wagner <arno@xxxxxxxxxxx> · Thu, 17 Feb 2011 14:26:16 +0100

It all depends on what the "crash" actually is. For a
LUKS container, the header and keyslot area need to be
undamaged. Without them, the data is permanently gone
unless you have a LUKS header backup; see the FAQ.

With plain dm-crypt, damage is localized to the damaged
sectors, i.e. one damaged encrypted sector equals one 
damaged decrypted sector.
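
A toy illustration of that locality, as a minimal sketch: each sector
is encrypted independently, so destroying one encrypted sector cannot
affect any other sector after decryption. The cipher here (an XOR
keystream built from SHAKE-256), the 512-byte sector size and all
function names are assumptions made for the example. It is not the
cipher dm-crypt actually uses; only the per-sector independence
carries over.

```python
import hashlib

SECTOR = 512  # bytes per sector (assumed for this illustration)


def crypt_sector(key: bytes, sector_no: int, data: bytes) -> bytes:
    """Toy cipher: XOR with a keystream derived from key and sector number.

    The same call encrypts and decrypts. This is NOT what dm-crypt uses.
    """
    stream = hashlib.shake_256(key + sector_no.to_bytes(8, "little")).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def crypt_device(key: bytes, image: bytes) -> bytes:
    """Apply crypt_sector to every sector of a device image independently."""
    return b"".join(
        crypt_sector(key, n, image[off:off + SECTOR])
        for n, off in enumerate(range(0, len(image), SECTOR))
    )


def damaged_sectors(expected: bytes, actual: bytes) -> list:
    """Return the numbers of the sectors whose contents differ."""
    return [
        off // SECTOR
        for off in range(0, len(expected), SECTOR)
        if expected[off:off + SECTOR] != actual[off:off + SECTOR]
    ]


if __name__ == "__main__":
    key = b"k" * 32
    plain = bytes(range(256)) * 16                  # 8 sectors of known data
    cipher = bytearray(crypt_device(key, plain))
    cipher[3 * SECTOR:4 * SECTOR] = bytes(SECTOR)   # sector 3 is lost on disk
    recovered = crypt_device(key, bytes(cipher))
    print(damaged_sectors(plain, recovered))        # prints [3]
```

Only sector 3 comes back wrong; the seven sectors around it decrypt
exactly as before.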

So, for plain dm-crypt, the encryption does not change the
damage to the disk. For LUKS, damage in the LUKS header
area is amplified and kills the whole container.
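
The amplification can be sketched with a toy model, assuming a header
that holds nothing but a salt and the master key wrapped under a
passphrase-derived key. The layout, the sizes, the iteration count and
all function names are hypothetical and much simpler than the real
LUKS on-disk format. The point is only that every data sector depends
on the one master key, so a single flipped bit in the header loses all
of them.

```python
import hashlib
import os

SECTOR = 512    # bytes per sector (assumed)
SALT_LEN = 16   # toy header layout: salt followed by the wrapped master key


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_header(passphrase: bytes, master_key: bytes) -> bytes:
    """Wrap the master key under a passphrase-derived key (toy keyslot)."""
    salt = os.urandom(SALT_LEN)
    kek = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 1000, dklen=len(master_key))
    return salt + xor(master_key, kek)


def unlock(header: bytes, passphrase: bytes) -> bytes:
    """Recover the master key from the toy header."""
    salt, wrapped = header[:SALT_LEN], header[SALT_LEN:]
    kek = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 1000, dklen=len(wrapped))
    return xor(wrapped, kek)


def crypt_sector(key: bytes, sector_no: int, data: bytes) -> bytes:
    """Toy per-sector cipher; the same call encrypts and decrypts."""
    stream = hashlib.shake_256(key + sector_no.to_bytes(8, "little")).digest(len(data))
    return xor(data, stream)


def count_lost_sectors(header: bytes, passphrase: bytes, cipher: list, plain: list) -> int:
    """Unlock with the given header and count sectors that decrypt wrongly."""
    key = unlock(header, passphrase)
    return sum(crypt_sector(key, n, c) != p for n, (c, p) in enumerate(zip(cipher, plain)))


if __name__ == "__main__":
    master_key = os.urandom(32)
    header = make_header(b"passphrase", master_key)
    plain = [bytes([n]) * SECTOR for n in range(8)]
    cipher = [crypt_sector(master_key, n, p) for n, p in enumerate(plain)]

    # Intact header: nothing is lost.
    print(count_lost_sectors(header, b"passphrase", cipher, plain))          # prints 0

    broken = bytearray(header)
    broken[20] ^= 0x01   # a single flipped bit in the wrapped key
    # Damaged header: every one of the 8 sectors is lost.
    print(count_lost_sectors(bytes(broken), b"passphrase", cipher, plain))   # prints 8
```

Real LUKS keeps a digest of the master key in the header, so with a
damaged header it refuses to unlock instead of handing back a wrong
key. The outcome for the data is the same.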

That said, if it is a pure software "crash" (i.e.
the disk did not crash at all, but your OS did),
it is reasonable to expect no more damage to the
data on disk than in the unencrypted case. The
LUKS header is not written except by LUKS maintenance
commands. Most lost LUKS partitions are due to
user error, like creating a filesystem on top
of the container, which overwrites the header.
Without a header backup, that usually wipes out
all data. Also keep in mind that modern HDDs have
about a 5% physical failure rate per year, even
when treated well (!). Backup is non-optional for
encrypted data, just as it is non-optional for
normal data.
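
To put the 5% figure in perspective, here is a rough calculation. It
assumes the rate is constant and that each year is independent of the
others, which is a simplification added for this example and not a
claim about real drives. The function name is made up for the sketch.

```python
def failure_probability(annual_rate: float, years: int) -> float:
    """Chance of at least one failure within `years`, assuming a constant,
    independent annual failure rate (a simplifying assumption)."""
    return 1.0 - (1.0 - annual_rate) ** years


if __name__ == "__main__":
    for years in (1, 3, 5, 10):
        print(f"{years:2d} years: {failure_probability(0.05, years):.1%}")
```

Under those assumptions a single disk has roughly a 23% chance of
failing within five years and about 40% within ten.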

Arno

On Thu, Feb 17, 2011 at 12:05:44AM -0800, Ross Boylan wrote:
> I'm happy to report that rebooting seemed to clear things up.  There
> were some incomplete transactions in both partitions, but fsck is clean
> on both now.
> 
> I'm still curious about one issue: if a disk crashes, is it reasonable
> to expect to be able to recover an encrypted device that had that disk
> (or part of the disk) underneath it?
> 
> I'm assuming one isn't, e.g., writing to LUKS headers at the time of the
> crash.
> 
> Ross
> On Wed, 2011-02-16 at 22:35 -0800, Ross Boylan wrote:
> > SUMMARY
> > After a power outage an encrypted partition is inaccessible, as is a
> > regular one, and perhaps the disk as a whole.  Is there a way to
> > recover, or at least diagnose?  The disk is in an external USB dock.
> > 
> > I'm hoping a reboot might help, but I'd like advice before I do anything
> > that might cause further damage.
> > 
> > As I investigated and discovered the problem extends beyond the
> > encrypted volume this query may be a bit off topic.  On topic, is it
> > reasonable to expect the encrypted partition to be recoverable in these
> > circumstances?  I'd appreciate off-topic advice as well :)
> > 
> > Please cc me directly to help me see this even though my mail system is
> > broken because of this problem.
> > 
> > DETAILS
> > The physical disk is a Western Digital WD-20EARS 2TB SATA 3GBPS (5400
> > RPM) mounted on Unitek SATA HDD Docking Station with Hub Y-1063.  It is
> > connected via USB to a Pentium 4 system running linux kernel
> > 2.6.26-2-686, Debian GNU/Linux, mostly lenny.
> > 
> > The disk has 2 partitions with a GPT.  The first partition is a spare;
> > the 2nd, larger, one is part of an LVM volume group that includes other
> > disks.  One logical volume (LV) serves as the raw partition for a luks
> > encrypted device which backs the mail spool.  Another LV serves directly
> > as a spare backup area.
> > 
> > The docking station has surge suppression only; the main computer went
> > through the brief power failure without shutting down.  Since then I get
> > I/O errors when I attempt to access the encrypted partition:
> > Wed Feb 16 04:57:47 PST 2011  Power failure.
> > Wed Feb 16 04:57:53 PST 2011  Running on UPS batteries.
> > Wed Feb 16 04:58:14 PST 2011  Mains returned. No longer on UPS batteries.
> > Wed Feb 16 04:58:14 PST 2011  Power is back. UPS running on mains.
> > 
> > led to
> > Feb 16 04:57:46 corn kernel: [59153.907186] sd 2:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> > Feb 16 04:57:46 corn kernel: [59153.907186] end_request: I/O error, dev sdc, sector 345805680
> > Feb 16 04:57:46 corn kernel: [59153.907186] ReiserFS: dm-15: warning: zam-7001: io error in reiserfs_find_entry
> > Feb 16 04:57:46 corn kernel: [59153.907186] usb 5-3.1: USB disconnect, address 7
> > Feb 16 04:57:46 corn kernel: [59153.907186] ReiserFS: dm-15: warning: zam-7001: io error in reiserfs_find_entry
> > [last message repeats a lot.]
> > Feb 16 04:57:46 corn kernel: [59153.919189] Buffer I/O error on device dm-15, logical block 1591728
> > Feb 16 04:57:46 corn kernel: [59154.009858] hub 5-3:1.0: hub_port_status failed (err = -71)
> > Feb 16 04:57:46 corn kernel: [59154.009865] hub 5-3:1.0: connect-debounce failed, port 1 disabled
> > Feb 16 04:57:46 corn kernel: [59154.010093] hub 5-3:1.0: cannot disable port 1 (err = -71)
> > Feb 16 04:57:46 corn kernel: [59154.010108] usb 5-3: USB disconnect, address 4
> > Feb 16 04:57:46 corn kernel: [59154.010111] usb 5-3.2: USB disconnect, address 8
> > Feb 16 04:57:46 corn kernel: [59154.010308] usblp0: removed
> > Feb 16 04:57:46 corn cyrus/master[1966]: process 12320 exited, signaled to death by 7
> > Feb 16 04:57:47 corn kernel: [59154.227754] usb 5-4: USB disconnect, address 5
> > Feb 16 04:57:47 corn apcupsd[9495]: Power failure.
> > Feb 16 04:57:47 corn chipcardd[9196]: devicemanager.c: 3373: Changes in hardware list
> > Feb 16 04:57:49 corn kernel: [59157.066031] Buffer I/O error on device dm-15, logical block 7955
> > Feb 16 04:57:49 corn kernel: [59157.066031] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.066031] Buffer I/O error on device dm-15, logical block 7956
> > Feb 16 04:57:49 corn kernel: [59157.066031] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067067] Buffer I/O error on device dm-15, logical block 7957
> > Feb 16 04:57:49 corn kernel: [59157.067072] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067086] Buffer I/O error on device dm-15, logical block 7958
> > Feb 16 04:57:49 corn kernel: [59157.067090] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067098] Buffer I/O error on device dm-15, logical block 7959
> > Feb 16 04:57:49 corn kernel: [59157.067103] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067116] Buffer I/O error on device dm-15, logical block 7960
> > Feb 16 04:57:49 corn kernel: [59157.067120] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067128] Buffer I/O error on device dm-15, logical block 7961
> > Feb 16 04:57:49 corn kernel: [59157.067132] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.067140] Buffer I/O error on device dm-15, logical block 7962
> > Feb 16 04:57:49 corn kernel: [59157.067144] lost page write due to I/O error on dm-15
> > Feb 16 04:57:49 corn kernel: [59157.073972] REISERFS: abort (device dm-15): Journal write error in flush_commit_list
> > Feb 16 04:57:49 corn kernel: [59157.073972] REISERFS: Aborting journal for filesystem on dm-15
> > Feb 16 04:57:53 corn apcupsd[9495]: Running on UPS batteries.
> > 
> > Similar errors repeat frequently throughout the day; they did not appear
> > before the power failure.
> > 
> > Diagnostic attempts:
> > corn:/# date; /etc/init.d/cyrus2.2 stop   # uses the encrypted partition
> > Wed Feb 16 20:55:53 PST 2011
> > Stopping Cyrus IMAPd: cyrmaster.
> > corn:/# umount /var/spool/cyrus/
> > corn:/# # note it is mounted on top of crypto
> > corn:/# fsck.reiserfs --check /dev/mapper/cyrspool3 
> > reiserfsck 3.6.19 (2003 www.namesys.com)
> > 
> > Will read-only check consistency of the filesystem
> > on /dev/mapper/cyrspool3
> > Will put log info to 'stdout'
> > 
> > Do you want to run this program?[N/Yes] (note need to type Yes if you
> > do):Yes
> > 
> > The problem has occurred looks like a hardware problem. If you have
> > bad blocks, we advise you to get a new hard drive, because once you
> > get one bad block  that the disk  drive internals  cannot hide from
> > your sight,the chances of getting more are generally said to become
> > much higher  (precise statistics are unknown to us), and  this disk
> > drive is probably not expensive enough  for you to you to risk your
> > time and  data on it.  If you don't want to follow that follow that
> > advice then  if you have just a few bad blocks,  try writing to the
> > bad blocks  and see if the drive remaps  the bad blocks (that means
> > it takes a block  it has  in reserve  and allocates  it for use for
> > of that block number).  If it cannot remap the block,  use badblock
> > option (-B) with  reiserfs utils to handle this block correctly.
> > 
> > bread: Cannot read the block (2): (Input/output error).
> > 
> > Aborted
> > 
> > # next device is an LVM logical volume, unencrypted
> > corn:/# umount /dev/daisy/bacula-backup
> > corn:/# date; e2fsck /dev/daisy/bacula-backup 
> > Wed Feb 16 21:55:05 PST 2011
> > e2fsck 1.41.3 (12-Oct-2008)
> > e2fsck: Attempt to read block from filesystem resulted in short read
> > while trying to open /dev/daisy/bacula-backup
> > Could this be a zero-length partition?
> > 
> > # finally, try the whole disk
> > # The volume group that includes it is still active
> > # although I've dismounted the 2 LVs based on the PV.
> > corn:/# fdisk /dev/sdc
> > 
> > Unable to open /dev/sdc
> > 
> > pvscan does not list an sdc, but does show an sdd which can only be the
> > external drive.
> > 
> > BACKGROUND
> > The cyrus spool was on sdb originally; it developed hardware problems.
> > I'm out of space in the case, and plugs on the UPS, and so I'm migrating
> > to sdc which is mounted externally w/o UPS.  A spare copy of my backups
> > is also on sdc, though my main backups are not.  I believe I could
> > recover the mail spool as of c 5 hours before the power failure if
> > necessary.
> > 
> 
> _______________________________________________
> dm-crypt mailing list
> dm-crypt@xxxxxxxx
> http://www.saout.de/mailman/listinfo/dm-crypt
> 

-- 
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno@xxxxxxxxxxx 
GnuPG:  ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it.  The very definition of 
"news" is "something that hardly ever happens." -- Bruce Schneier 