Re: [PATCH v2] hfsplus: add journal replay

Vyacheslav Dubeyko <slava@xxxxxxxxxxx> · Thu, 13 Mar 2014 14:20:39 +0400

On Sun, 2014-03-09 at 23:50 +0100, Sergei Antonov wrote:

I postpone discussions about small topics and to continue discussion
about two principal topics: (1) structure declarations; (2) transaction
based nature of journal replay. It doesn't make sense for me to discuss
minor issues without consent about principal topics.

> My name is not "different author". My name is Vyacheslav Dubeyko. If you
> > want to insult me then you had achieved the goal.
> 
> No flame, please :).

You started to blame me. So, simply enjoy by blaming style from my side.
I will answer to you as I want.

> I checked your code and made a bugreport.

Your report is related to the issue that I don't re-read in my code a
volume header after journal replay. This bug can be fixed easily. I've
written to you about it yet.

> Mine is 5 files changed, 1009 insertions(+), 6 deletions(-).
> Yours is 9 files changed, 1559 insertions(+), 39 deletions(-).

I simply repeat below what I've written in previous e-mail.

My patch contains 1598 lines of code at whole. If I exclude: (1) several
big comments - 99 lines, (2) journal initialization functionality - ~192
lines, (3) "norecovery" mount option functionality - ~134 lines, then, I
will have about 1173 lines of code in my patchset.

So, 1015 lines of code vs. 1173 lines. It is not so big difference. And
if you adopt my comments ~100 lines (I believe that my comments are
really reasonable) and "norecovery" mount option ~100 lines of code then
the difference will vanish completely.

> You contradict yourself sometimes :). I remember when you yourself
> suggested changing Apple's name of a structure field. It was when we
> discussed another patch:
> ========
> What about shorter name as "subfolders" for both fields in struct
> hfsplus_inode_info and in struct hfsplus_cat_folder? I suppose it will
> be more informative and correct.
> ========
> Apple's name was "folderCount", my name was after it: "folder_count",
> you suggested a different, but indeed better, "subfolders". In my
> patch I also found several better (IMO) names.

Specification of HFS+ file system (Technical Note TN1150) doesn't define
this field. It simply declares it as reserved field:

struct HFSPlusCatalogFolder {
    SInt16              recordType;
    UInt16              flags;
    UInt32              valence;
    HFSCatalogNodeID    folderID;
    UInt32              createDate;
    UInt32              contentModDate;
    UInt32              attributeModDate;
    UInt32              accessDate;
    UInt32              backupDate;
    HFSPlusBSDInfo      permissions;
    FolderInfo          userInfo;
    ExtendedFolderInfo  finderInfo;
    UInt32              textEncoding;
    UInt32              reserved;
};

So, in such situation we have some freedom in field naming. Because
specification doesn't contain any description.

The "folderCount" name contradicts to kernel coding style. I suppose
that "subfolders" is much sensible and shorter name as "folder_count".
Moreover, you've accepted my suggestion.

> > (3) to distort a structure with comparing with Technical Note TN1150 (I
> > mean struct block_list_header, for example).
> 
> I'll explain.
> block_list_header is not good because it describes sequential block
> runs, not blocks. To avoid a double meaning of "block" I came up with:
>   struct block_list_header (not really a block is meant) -> struct hfs_jrnl_list
>   struct block_info (not really a block is meant) -> struct hfs_jrnl_run
> Also changed fields:
>   block_info::bnum (now comes the real block!) ->
> hfs_jrnl_run::first_block (the 1st block of the run)
>   block_info::bsize (size in blocks?) -> hfs_jrnl_run::byte_len (no,
> it is in bytes! make it clear)

I don't think that you've suggested better names. It is completely
obscure and not informative for me, personally.

And I always say NACK of your patch while we haven't naming of
structures and fields as it described in specification of HFS+
(Technical Note TN1150). Because specification is common point for
everybody who try to understand HFS+ internals and functionality. And if
your code doesn't comply with specification then everybody will have
troubles with understanding of the code. Simply write your own file
system if you want to name in your own way always. We have specification
for HFS+.

I prefer to leave mainline untouched then to have incorrect or obscure
code for HFS+.

Of course, it is possible to discuss every name but the common remark
from my side is: (1) your names are longer; (2) your names are obscure.

> > First of all, journal contains sequence of transactions. Technical Note
> > TN1150: "A group of related changes is called a transaction. When all of
> > the changes of a transaction have been written to their normal locations
> > on disk, that transaction has been committed, and is removed from the
> > journal.
> 
> This is about journaling changes, not about replay.
> 

This is related as journaling as journal replay.

> > Secondly, for example, you had sudden power-off. It means that last
> > transaction will be not finished.
> 
> You have a poor understanding of HFS+ journal. After sudden power-off
> the last transaction is good. The unfinished (not completely written)
> transaction may be the one that is beyond header->data_end. The header
> is updated only after it is completely on the disk. This is why header
> is 1 sector long - header update is atomic, and the journal is
> consistent any time.
> 
> If you've found a corrupted transactions it is not a sudden power-off,
> rather it is your bug :) or a bug in FS driver that wrote them. Wise
> handling is: cry about it using printk, leave the journal unchanged,
> mount for read-only. This is what my code does.

I insist on transaction-based journal replay. I don't think that I have
poor understanding of journaling at whole.

There are many possible reasons that transaction in journal can be
broken:
(1) You can't guarantee that initiated flush will succeed. As a result,
file system driver can think that all is OK with written data but really
storage can have troubled data.
(2) Even if data was written successfully then it doesn't mean that
these data will be read successfully because of different storage'
troubles. We live in not ideal world.

So, any transaction can be broken, potentially. Moreover, you can't
guarantee anything with last transaction after sudden power-off. As a
result, it makes sense to replay as much as possible valid transactions
before discovering of invalid one.

> > And, finally, if you firstly check all transactions then it means that
> > you need to read all of its in memory before real replaying. And, as far
> > as I can judge, you simply keep all sectors in memory before replay. For
> > example, your system has 512 MB of RAM and HFS+ volume has fully filled
> > journal with 512 MB in size. How lucky will you with journal replay on
> > such system without a swap?
> 
> Pretty lucky :) because:
> 1st I store data to write to sectors, not all journal.
> 2nd there is a feature of my code which significantly minimizes memory
> usage. Different data in journal is destined to same sectors, and I
> avoid allocating separate memory buffers for it. An awesome
> <linux/interval_tree_generic.h> data structure helps me do it, see
> hfsplus_replay_data_add().
> 
> To defend my approach more, I can make measurements of memory
> consumption for a 512 MB journal. Would you like me to?

There are many possible situation that you doesn't take into account:
(1) Journal can be 1 GB in size, for example. And some system can have
512 MB of RAM. And size of journal potentially can be much more,
potentially.
(2) You can't predict in what environment will work system and under
what memory pressure a HFS+ volume will be mounted. So, trying to read
all transactions in memory can be resulted with memory allocation
failure with significant probability. And such approach is very
dangerous way, from my point of view.

[TO RESUME]

I have such principal points:
(1) On-disk layout declarations should live in peace with Technical Note
TN1150;
(2) Journal replay should be transaction based. I insist namely on
transaction-based replay.

I always say NACK of your patch while we haven't naming of structures
and fields as it described in specification of HFS+ (Technical Note
TN1150) and if journal replay will be not transaction-based.

I prefer to leave mainline untouched then to have incorrect or obscure
code for HFS+.

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html