Re: [PATCH 00/14] hfsplus: introduce journal replay functionality

On Thu, Jan 9, 2014 06:12 GMT Vyacheslav Dubeyko wrote:

>Hi Hin-Tak,
>
>On Thu, 2013-12-26 at 14:26 -0800, Andrew Morton wrote:
>> On Thu, 26 Dec 2013 15:57:45 +0000 (GMT) Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> 
>> > It is quite a coincidence - I have also spent some hours in the last few days rebasing the Netgear-derived patch set bit-rotting on my hard disk. I did a quick diff and it seems that you have drawn some ideas from it, but yours is largely an independent effort. Anyway, here are a few things from where I am.
>> > 
>> > - It appears that you don't treat the special files (catalog, etc.) specially? Netgear's does; but I think there is a mistake in that they don't consider them ever getting fragmented, so they were not journaling those files properly.
>> > 
>> > - I am still nowhere near figuring out what the issue is with just running du on one funny volume I have. The driver gets confused after a few du's, but the disk always fscks clean.
>> > 
>> > - I see I still have one outstanding patch not yet submitted - the one about folder counts on case-sensitive volumes.
>> > 
>> > I'll try to spend some time reading your patches and maybe even try them out. Will write again.
>> 
>> Thanks, useful!
>> 
>> Vyacheslav, I'll duck the patchset pending Hin-Tak's review.  Please
>> cc me on the later resend.
>
>How do you feel about the patchset? What opinion do you have? Could you
>share your opinion about patchset?

Tested-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>

I have given the patch set a bit of light usage, and it fsck'ed clean afterwards.
So that's the minimum it should do. I also see that you have spent a substantial
amount of effort verifying and checking the validity and sanity of the journal
itself (which Netgear certainly didn't do). And thanks for the good work! That part
at least is very desirable and overdue, and should land in the kernel soon.

About documentation and the new "norecovery" option: should "norecovery"
imply read-only (and also state so clearly)? The meaning of the "force" option
also needs some updating - if we play back the journal, then unclean journalled
volumes would be mounted read-write after verifying/validating and replaying the
journal, correct? I see there is a need for "norecovery" in addition to
"read-only" (by the typical convention elsewhere, the latter will occasionally
write to a journalled fs, namely during journal playback - so "norecovery" means
"really no writes, not a single byte, I mean it, don't even play back the
journal"), but I think "norecovery" should imply read-only, unless combined
with "force"?
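To make the suggestion concrete, here is a minimal sketch of the semantics I have in mind - the struct and function names below are purely illustrative, not the actual hfsplus option-parsing code:

```c
#include <stdbool.h>

/* Illustrative only - not the real hfsplus mount-option structures. */
struct mount_opts {
	bool norecovery;	/* skip journal replay entirely */
	bool force;		/* user insists on a writable mount anyway */
	bool rdonly;		/* effective read-only flag */
};

/*
 * Decide the effective mount mode from the requested options.
 * Skipping replay on a possibly-dirty journal is only safe read-only,
 * so "norecovery" alone degrades the mount to ro; an explicit "force"
 * is required to override that, at the user's own risk.
 */
static void resolve_opts(struct mount_opts *o, bool want_rdonly)
{
	o->rdonly = want_rdonly;
	if (o->norecovery && !o->force)
		o->rdonly = true;
}
```

That is, "mount -o norecovery" would degrade to read-only (ideally with a printed warning), and "norecovery,force" would be the only way to get a writable mount without journal playback.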

A somewhat minor thing - I see a few instances of
'pr_err("journal replay is failed\n");' - the "is" isn't needed; it should just read
"journal replay failed". Just English usage.

Now it still worries me that whichever way we implement journalling in the Linux
kernel, it may be correct according to the spec, but not inter-operate with
Apple's. This is especially relevant since a substantial portion of users
who want hfs+ journalling in their Linux kernel do so because they have an
Intel Mac and are dual-booting. Now we have 3 implementations - Apple's,
this one, and Netgear's. Even if we discount Netgear's, while each of them
will create a journal during its normal mode of operation and will play back
a journal of its own creation, we really need to test playback of journals
made by the Mac OS X Darwin kernel... because I can think of this happening:
somebody has a dual-boot machine, you pull the plug while it is in Mac OS X,
and the boot-loader is configured to go into Linux by default after a short
time-out (or vice versa).

So again, I must thank you for spending the effort on verifying and validating the
journal entries.
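For readers following along, the sort of sanity checking involved looks roughly like the sketch below, going by Apple's published Technote 1150 (which documents the journal header layout and its byte-wise checksum). This is a standalone illustration written against the Technote, not code lifted from the patch set:

```c
#include <stdint.h>

#define JOURNAL_HEADER_MAGIC	0x4a4e4c78u	/* 'JNLx' */
#define ENDIAN_MAGIC		0x12345678u

/* On-disk journal header layout, per TN1150. */
struct journal_header {
	uint32_t magic;
	uint32_t endian;
	uint64_t start;		/* offset of the oldest transaction */
	uint64_t end;		/* offset just past the newest transaction */
	uint64_t size;		/* journal size in bytes */
	uint32_t blhdr_size;	/* block-list header size */
	uint32_t checksum;
	uint32_t jhdr_size;	/* journal header size */
};

/* The checksum algorithm given in TN1150: byte-wise, result inverted. */
static int32_t jnl_checksum(const unsigned char *ptr, int len)
{
	int32_t cksum = 0;
	for (int i = 0; i < len; i++, ptr++)
		cksum = (cksum << 8) ^ (cksum + *ptr);
	return ~cksum;
}

/* Returns 0 if the header passes basic sanity checks, -1 otherwise. */
static int jnl_header_valid(const struct journal_header *jh)
{
	struct journal_header tmp = *jh;

	if (jh->magic != JOURNAL_HEADER_MAGIC || jh->endian != ENDIAN_MAGIC)
		return -1;
	/* start/end must lie inside the journal proper */
	if (jh->start > jh->size || jh->end > jh->size)
		return -1;
	/* the checksum is computed with the checksum field itself zeroed */
	tmp.checksum = 0;
	if (jnl_checksum((const unsigned char *)&tmp, sizeof(tmp)) !=
	    (int32_t)jh->checksum)
		return -1;
	return 0;
}
```

Beyond the header, one of course also has to validate each block-list header and its checksums before replaying, but the above is the gatekeeping step that decides whether the journal is trustworthy at all.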

On separate/somewhat unrelated matters, I came across this bug report in
the recent activities:

https://bugs.launchpad.net/ubuntu/+source/hfsprogs/+bug/680606

I think we addressed both the 2nd and 3rd entries:
   errors ("Invalid volume file count", "Invalid volume free block count")
   error ("Unused node is not erased")

The first entry looks suspiciously like the problem which I have/had:
   (restore a large directory from time-machine)
   Invoke "rm -r" targeting the directory
   - segfault occurs
   - Various applications freeze, and it's impossible to cleanly shut down
   - OS X Disk Utility reports that the disk can't be repaired

So the disk I manually edited and fixed can still get the kernel confused (i.e.
the kernel just gets confused; if one unmounts and reboots, the disk is still fsck-clean),
just by repeatedly running "du". I found that I need to put the system under
some memory stress: I can run "du" a dozen times on an idle, freshly rebooted
system, but it gets confused as soon as I have google-chrome running with a few
browser windows open (or mozilla/firefox). The kernel driver can get confused if one
traverses the file system quickly (with a du, or in the above case, with an
"rm -rf large-directory"), especially when the system is relatively loaded.

Oh, there is still the folder count issue with case-sensitive HFS+ (which, AFAIK, is just
a convenience for some file system traversal usage, not actually of any critical function),
but that's a relatively harmless issue. We can deal with it at some point.

Regards,
Hin-Tak

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



