hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete

Hi Vyacheslav,

I mentioned briefly some days ago that I managed to corrupt an HFS+ partition while experimenting with the journalling code, badly enough that fsck_hfs/fsck.hfsplus (Apple's diskdev_cmds tool) refuses to fix it. With the unmodified module and mounted read-only, that partition can make the kernel BUG() "reliably" just by running "du" on it (and I was wondering whether BUG()'ing on a corrupted disk is a bug worth filing...).

After a lot of reading up, some C, some Python and, in the end, some dd if=/of= and a hex editor with a calculator, I managed to get fsck to pass again. So now I have some idea of how to stress HFS+ to the point where fsck refuses to fix it. The recipe is something like this:

- a disk that is quite full and holds a lot of small files (I have a 105 GB partition, 75% full, with 600,000 leaf records in the catalog B-tree, or 400,000 inodes, depending on how you count... untar'ing a few kernel trees onto it should do).

- delete the small files one by one, very quickly. After comparing the Netgear code with the stock kernel's and generating a list of "uninteresting" files, I did essentially:

    cat list | perl -ne 'chomp; if (-f $_) {unlink $_;}'

- probably an SMP system and a reasonable amount of memory for the disk cache (dual core + 2 GB RAM).

Under that combination of conditions, it may be possible to stress HFS+ such that:

1. the Catalog B-Tree needs to be substantially rewritten/relocated rather than updated in place, i.e. a large number of leaf records change in a short time.

2. the rewritten/relocated part of the Catalog B-Tree needs to reuse the extents recently "vacated" by the deletions, i.e. you need a fairly full disk to see this.

It seems that when files are deleted, their leaf records are made only *partially* invalid, a partially up-to-date new Catalog B-Tree is written, and then further in-place updates bring the whole thing to a consistent state (the allocation bitmap, volume headers, etc.) ... but in my case, for whatever reason, this was interrupted in the middle.

So I had a new Catalog B-Tree sitting on extents that overlapped those of partially deleted file records. fsck thinks the files need to be "undeleted", but it cannot read the B-Tree without hitting errors on those partially invalid leaf records, and it cannot fix either of them.
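Finding the affected records comes down to an interval-overlap test between each file record's extents and the catalog file's extents. A minimal C sketch of that check follows, assuming the extents have already been decoded from disk into (startBlock, blockCount) pairs as described in Apple's TN1150; `extent_t` and `record_overlaps_catalog` are hypothetical names for illustration, not part of any existing tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One HFS+ extent: a run of allocation blocks, mirroring TN1150's
 * HFSPlusExtentDescriptor (hypothetical decoded, host-endian form). */
typedef struct {
    uint32_t start_block;
    uint32_t block_count;
} extent_t;

/* True if block ranges [a.start, a.start+a.count) and
 * [b.start, b.start+b.count) share at least one block. */
static bool extents_overlap(extent_t a, extent_t b)
{
    if (a.block_count == 0 || b.block_count == 0)
        return false;                 /* unused descriptor slots are all-zero */
    uint64_t a_end = (uint64_t)a.start_block + a.block_count;
    uint64_t b_end = (uint64_t)b.start_block + b.block_count;
    return a.start_block < b_end && b.start_block < a_end;
}

/* Hypothetical helper: does any extent of a (partially deleted) file
 * record collide with any extent currently owned by the catalog file? */
bool record_overlaps_catalog(const extent_t *rec, size_t nrec,
                             const extent_t *cat, size_t ncat)
{
    for (size_t i = 0; i < nrec; i++)
        for (size_t j = 0; j < ncat; j++)
            if (extents_overlap(rec[i], cat[j]))
                return true;
    return false;
}
```

Using 64-bit end points avoids overflow when a run ends near the top of the 32-bit block range.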

I pieced together the Catalog B-Tree from 3 fragments (actually it was 4 to begin with; fsck in rebuild-Catalog mode gave me a new one which was "differently" broken, i.e. it overlapped another set of ~140 partially deleted records), found all the overlapping leaf records (~140 of them, in 17 leaf nodes), zeroed their extents and file sizes by hand with a hex editor, and voilà, fsck was a lot happier afterwards.
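The hand edit can be expressed as code operating on an in-memory copy of a leaf record. A minimal C sketch follows, assuming the record layout in Apple's TN1150 (big-endian fields; a catalog key whose keyLength excludes itself and is always even, followed directly by a 248-byte HFSPlusCatalogFile with the data fork at offset 88 and the resource fork at offset 168). `zero_file_record_forks` is a hypothetical helper; it zeroes logicalSize, totalBlocks and the eight extent descriptors of both forks, leaves clumpSize alone, and does not touch the disk or the node's offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout per Apple TN1150 (all on-disk fields big-endian). */
#define HFSPLUS_FILE_RECORD 0x0002 /* kHFSPlusFileRecord */
#define FILE_REC_SIZE       248    /* sizeof(HFSPlusCatalogFile) */
#define FILE_REC_DATA_FORK  88     /* offset of dataFork */
#define FILE_REC_RSRC_FORK  168    /* offset of resourceFork */
/* Offsets inside an 80-byte HFSPlusForkData: */
#define FORK_LOGICAL_SIZE   0      /* uint64 logicalSize */
#define FORK_CLUMP_SIZE     8      /* uint32 clumpSize (left untouched) */
#define FORK_TOTAL_BLOCKS   12     /* uint32 totalBlocks */
#define FORK_EXTENTS        16     /* 8 x {startBlock, blockCount} */
#define FORK_EXTENTS_LEN    64

static uint16_t be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

/* Zero the size, block count and extent descriptors of one fork. */
static void zero_fork(uint8_t *fork)
{
    memset(fork + FORK_LOGICAL_SIZE, 0, 8);
    memset(fork + FORK_TOTAL_BLOCKS, 0, 4);
    memset(fork + FORK_EXTENTS, 0, FORK_EXTENTS_LEN);
}

/* Hypothetical helper: `rec` holds one catalog leaf record (key, then
 * data), `len` bytes long. Returns 0 on success, -1 if the buffer is too
 * short or the record is not a file record. */
int zero_file_record_forks(uint8_t *rec, size_t len)
{
    if (len < 2)
        return -1;
    size_t data_off = 2 + (size_t)be16(rec); /* keyLength excludes itself */
    if (data_off + FILE_REC_SIZE > len)
        return -1;
    uint8_t *data = rec + data_off;
    if (be16(data) != HFSPLUS_FILE_RECORD)
        return -1;
    zero_fork(data + FILE_REC_DATA_FORK);
    zero_fork(data + FILE_REC_RSRC_FORK);
    return 0;
}
```

The record type check guards against accidentally clobbering folder or thread records that happen to sit in the same leaf node.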

- the Linux hfsplus driver probably *should* zero the corresponding extent descriptors in the leaf record when a file is deleted?
I seem to remember that, years ago, one notable/advertised difference between ext2 and ext3 was that ext3 zeroes inodes on delete (making low-level data recovery difficult), and there was a reason for it... I should read up on that. Formatting disks and deleting files under Mac OS X seems to take much longer than under Linux; do they zero records *fully* on format/delete?

- is this possibility of corruption related to the experimental journalling code? It does work correctly under light use, i.e. fsck is fully happy after unmount.

- HFS+ is probably one of a rare minority of file systems where critical parts of it, the Catalog B-Tree (and the other 3-4(?) B-Trees), are regular files subject to the same fragmentation and competition as normal file usage?! (instead of living in dedicated, pre-allocated areas and having multiple copies).

- oh, one last thing: there was a later version of the journalling code from Netgear which copied a lot of files from ext3 (the jbd part). Maybe they know about HFS+ needing a kernel daemon to sync to disk more regularly than other file systems...
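Because the Catalog B-Tree is stored as a fork like any other file, its first eight extents are recorded in the `catalogFile` field of the volume header (any further fragments are recorded in the extents overflow B-Tree). A minimal C sketch of turning those extents into dd offsets follows, assuming the 512-byte volume header (at byte 1024 of the partition) has already been read into memory, with field offsets taken from Apple's TN1150; `list_catalog_extents` is a hypothetical helper, not part of any existing tool.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets within the 512-byte HFS+ volume header (TN1150, big-endian). */
#define VH_SIGNATURE    0    /* 'H+' (0x482B) or 'HX' (0x4858) */
#define VH_BLOCK_SIZE   40   /* uint32 blockSize */
#define VH_CATALOG_FILE 272  /* HFSPlusForkData catalogFile */
#define FORK_EXTENTS    16   /* extents start 16 bytes into a fork */

static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Print each non-empty catalog extent as dd operands (bs = blockSize,
 * skip = startBlock, count = blockCount) plus its byte offset from the
 * start of the partition. Returns the number of non-empty extents, or
 * -1 if the signature is neither H+ nor HX. Extents beyond the first
 * eight live in the extents overflow B-Tree and are not shown here. */
int list_catalog_extents(const uint8_t vh[512])
{
    uint16_t sig = (uint16_t)(vh[VH_SIGNATURE] << 8 | vh[VH_SIGNATURE + 1]);
    if (sig != 0x482B && sig != 0x4858)
        return -1;
    uint32_t bs = be32(vh + VH_BLOCK_SIZE);
    const uint8_t *ext = vh + VH_CATALOG_FILE + FORK_EXTENTS;
    int n = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t start = be32(ext + 8 * i);
        uint32_t count = be32(ext + 8 * i + 4);
        if (count == 0)
            continue;
        printf("extent %d: dd bs=%" PRIu32 " skip=%" PRIu32
               " count=%" PRIu32 "  (byte offset %llu)\n",
               i, bs, start, count, (unsigned long long)start * bs);
        n++;
    }
    return n;
}
```

The byte offset is simply startBlock times blockSize, which is the arithmetic otherwise done with a calculator before reaching for dd.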

More experiments to come... being able to fix things which fsck cannot makes experimenting easier...

Hin-Tak

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
