答复: [PATCH v2] jbd2: fix descriptor block checksum failed after format with lazy_journal_init=1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I apologize for the code format problem, please ignore this email list, I will send another one tomorrow.

-----邮件原件-----
发件人: 常凤楠
发送时间: 2020年9月9日 16:52
收件人: 'Andreas Dilger' <adilger@xxxxxxxxx>
抄送: tytso@xxxxxxx; linux-ext4@xxxxxxxxxxxxxxx; jack@xxxxxxxx; Darrick J . Wong <darrick.wong@xxxxxxxxxx>
主题: 答复: [PATCH v2] jbd2: fix descriptor block checksum failed after format with lazy_journal_init=1


Thank you for your comments, I have modified the code.
I don't fully understand what you mean about "ri_commit_block".
I modified a version first anyway, please review it again.

[PATCH v2] jbd2: avoid transaction reuse after reformatting When format ext4 with lazy_journal_init=1, it is possible that the previous transaction will be used again during jbd2 recovery.
If you format with lazy_journal_init=1 first time, after mount a short time, you reboot machine, the layout of jbd2 may be like this:

 Before format         After fomrat with lazy_journal_init=1
==================   ========================================================
previous journal          case 1 journal        case 2 journal        case 3 journal
==================   ==================   ===============    ===============
[journal superblock]     [journal superblock]    [journal superblock]   [journal superblock]
[ transactions          [ new transactions    [new transactions     [ new transactions
    ....              ....          ...       ..
      ....        ]           ....       ]           ....              ..
[ descriptor block       [ descriptor block           ....   ]         ..
| data blocks       | data blocks         | data blocks          ..
    ....                ....             ....       ..
      ....                     ....      |              ....   |             ..  ]
|  commit block  ]     |   commit block ]    | commit block   ]    |   commit block ]
[ more transactions      [ more transactions    [ more transactions    [ more transactions
     ....                    ....                    ....                ....
       ....       ]            ....       ]            ....     ]            ....     ]

In case 1 and 3,will be recovery failed.


fs/jbd2/recovery.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index a4967b27ffb6..d6fa68c830d6 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -33,6 +33,8 @@ struct recovery_info
 intnr_replays;
 intnr_revokes;
 intnr_revoke_hits;
+unsigned long ri_commit_block;
+__be64  last_trans_commit_time;
 };

 enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; @@ -412,7 +414,27 @@ static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
 else
 return tag->t_checksum == cpu_to_be16(csum32);  }
-
+/*
+ * We check the commit time and compare it with the commit time of
+ * the previous transaction, if it's smaller than previous,
+ * We think it's not belong to same journal.
+ */
+static bool is_same_journal(journal_t *journal,struct buffer_head *bh,
+unsigned long blocknr, __u64 last_commit_sec) {
+unsigned long commit_block = blocknr + count_tags(journal, bh) + 1;
+struct buffer_head *nbh;
+struct commit_header *cbh;
+__u64commit_sec;
+
+int err = jread(&nbh, journal, commit_block);
+if (err)
+return true;
+
+cbh = (struct commit_header *)nbh->b_data;
+commit_sec = be64_to_cpu(cbh->h_commit_sec);
+
+return commit_sec >= last_commit_sec;
+}
 static int do_one_pass(journal_t *journal,
 struct recovery_info *info, enum passtype pass)  { @@ -514,18 +536,29 @@ static int do_one_pass(journal_t *journal,
 switch(blocktype) {
 case JBD2_DESCRIPTOR_BLOCK:
 /* Verify checksum first */
+if(pass == PASS_SCAN)
+info->ri_commit_block = 0;
+
 if (jbd2_journal_has_csum_v2or3(journal))
 descr_csum_size =
 sizeof(struct jbd2_journal_block_tail);
 if (descr_csum_size > 0 &&
     !jbd2_descriptor_block_csum_verify(journal,
        bh->b_data)) {
-printk(KERN_ERR "JBD2: Invalid checksum "
-       "recovering block %lu in log\n",
+if(is_same_journal(journal,bh,next_log_block-1,info->last_trans_commit_time)) {
+printk(KERN_ERR "JBD2: Invalid checksum recovering block %lu in
+log\n",
        next_log_block);
-err = -EFSBADCRC;
-brelse(bh);
-goto failed;
+err = -EFSBADCRC;
+brelse(bh);
+goto failed;
+} else {
+/*if it's not belong to same journal, just end this recovery with success*/
+jbd_debug(1,"JBD2: Invalid checksum found in block %lu in log, but not same journal %d\n",
+       next_log_block,next_commit_ID);
+err = 0;
+brelse(bh);
+goto done;
+}
 }

 /* If it is a valid descriptor block, replay it @@ -688,6 +721,17 @@ static int do_one_pass(journal_t *journal,
  * are present verify them in PASS_SCAN; else not
  * much to do other than move on to the next sequence
  * number. */
+if(pass == PASS_SCAN) {
+struct commit_header *cbh =
+(struct commit_header *)bh->b_data;
+if(info->ri_commit_block) {
+jbd_debug(1, "invalid commit block found in %lu, stop here.\n",next_log_block);
+brelse(bh);
+goto done;
+}
+info->ri_commit_block = next_log_block;
+info->last_trans_commit_time = be64_to_cpu(cbh->h_commit_sec);
+}
 if (pass == PASS_SCAN &&
     jbd2_has_feature_checksum(journal)) {
 int chksum_err, chksum_seen;
@@ -761,7 +805,11 @@ static int do_one_pass(journal_t *journal,
 brelse(bh);
 continue;
 }
-
+if (pass != PASS_SCAN && info->ri_commit_block) {
+jbd_debug(1, "invalid revoke block found in %lu, stop here.\n",next_log_block);
+brelse(bh);
+goto done;
+}
 err = scan_revoke_records(journal, bh,
   next_commit_ID, info);
 brelse(bh);

-----邮件原件-----
发件人: Andreas Dilger <adilger@xxxxxxxxx>
发送时间: 2020年9月9日 6:48
收件人: 常凤楠 <changfengnan@xxxxxxxxxxxxx>
抄送: tytso@xxxxxxx; linux-ext4@xxxxxxxxxxxxxxx; jack@xxxxxxxx; Darrick J . Wong <darrick.wong@xxxxxxxxxx>
主题: Re: [PATCH v2] jbd2: fix descriptor block checksum failed after format with lazy_journal_init=1

On Sep 8, 2020, at 2:51 AM, 常凤楠 <changfengnan@xxxxxxxxxxxxx> wrote:
>
> After the last commit, I found that some situations were not considered.
> In last post, I only considered the case of descriptor block checksum failed,but there is others situations.

Thank you for the updated patch.  There are several things that need to be fixed before the patch can be accepted:
- the whitespace of the patch is broken, since it looks like all tabs are lost
- the patch summary should be shorter than 64 characters, maybe like:
  "jbd2: avoid transaction reuse after reformatting" or similar, and then
  put a more complete description in the commit message
- the patch commit message should be shorter than 70 characters.  It would be
  much easier to read if you aligned the transaction blocks vertically, down
  the page, instead having such very long lines, like:

     case 1 journal        case 2 journal        case 3 journal
  ====================  ====================  ===================
  [journal superblock]  [journal superblock]
  [ transactions
      ....
        ....         ]
  [ descriptor block |
  |   data blocks
      ....
        ....         |
  |   commit block   ]
  [ more transactions
       ....
         ....        ]

- the commit message should describe only the patch itself, not the previous
  version of the patch or general comments.  Other comments can be written
  in a separate email that is a reply to the patch itself.
- please run your patch through "checkpatch.pl" to find any style issues
- you need to add a "Signed-off-by: 常凤楠 <changfengnan@xxxxxxxxxxxxx>" line
  or maybe "Signed-off-by: 常凤楠 (Chang Fengnan) <changfengnan@xxxxxxxxxxxxx>"

Some general comments on the patch inline.

> for example:
>
> if you format with lazy_journal_init=1 first time, after mount a short time, you reboot machine, the layout of jbd2 may be like this:
>
> journal Superblock|     [ transactions..... ]      |descriptor_block | data_blocks|  commmit_block | descriptor_block |  data_blocks| commmit_block|[more transactions...
> -------------------------|     [some transactions]   |-------------------       transaction x          -----------------|-----------------          transaction x+1         ---------------|
>
> and after reboot, you format with lazy_journal_init=1 second time, after mount a short time, you reboot machine again, the layout of jbd2 may be like this:
>
> 1.
> journal Superblock|     [ transactions..... ]       |descriptor_block | data_blocks|  commmit_block | descriptor_block |  data_blocks| commmit_block|[more transactions...
> -------------------------|     [some transactions]   |---------------          transaction x       ---------------------|
> 2.
> journal Superblock|     [ transactions..... ]       |descriptor_block |  data_blocks | data_blocks  |    data_blocks   | commmit_block| commmit_block|[more transactions...
> -------------------------|     [some transactions]   |-----------------------------------               transaction x                  --------------------------------|
> 3.
> journal Superblock|     [ transactions..... ]      |descriptor_block |  data_blocks | data_blocks  |    commmit_block | data_blocks | commmit_block|[more transactions...
> -------------------------|    [some transactions]   |--------------------------------     transaction x           ----------------------------|
>
> In case one and two,will be recovery failed. So there is another patch:
>
> fs/jbd2/recovery.c | 60
> +++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 55 insertions(+), 5 deletions(-)
>
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index
> a4967b27ffb6..56198c2c1a04 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -33,6 +33,8 @@ struct recovery_info intnr_replays; intnr_revokes;
> intnr_revoke_hits;
> +int transaction_flag;

(minor) This should be an unsigned int, and there should be a #define'd constant to describe what "0x1" means instead of using "0x1" in the code.
However, it may be better to make this "ri_commit_block", see below.

> +__be64  last_trans_commit_time;
> };
>
> enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; @@ -412,7 +414,27
> @@ static int jbd2_block_tag_csum_verify(journal_t *j,
> journal_block_tag_t *tag, else return tag->t_checksum ==
> cpu_to_be16(csum32); }
> +/*
> + * We check the commit time and compare it with the commit time of
> +the previous transaction,

Please wrap lines at 80 columns.

> + * if it's smaller than previous, We think it's not belong to same journal.
> + */
> +static int is_same_journal(journal_t *journal,struct buffer_head *bh,
> +unsigned long blocknr, __u64 last_commit_sec)

(minor) return type should be "bool"
(style) Wrap at 80 columns.

> +{
> +int commit_block_nr = blocknr + count_tags(journal, bh) + 1;

(defect) "commit_block_nr" should not be a signed int, or it may overflow
(style) it would be good to use a consistent naming for the block number and the buffer head, like:

unsigned long commit_block;
struct buffer_head *commit_bh;

> +struct buffer_head *nbh;
>
> +int err = jread(&nbh, journal, commit_block_nr);

(style) blank line after variable declarations

> +if (err)
> +return 1;

(defect?) Should the checksum of commit_bh be verified before use?
Otherwise, this may be a random block in the journal that has some value > last_commit_sec, and not a commit block at all.  However, it may be enough to avoid this extra commit block checksum by storing the commit block number as ri_commit_block, and then checking it is correct when the commit block is later processed.

> +
> +struct commit_header *cbh = (struct commit_header *)nbh->b_data;
> +__u64commit_sec = be64_to_cpu(cbh->h_commit_sec);

(style) no variable declarations in the middle of functions

> +
> +if(commit_sec < last_commit_sec)
> +return 0;
> +else
> +return 1;

(style) no need for "else" after "return".  That said, this function could just return the boolean comparison value directly:

return commit_sec >= last_commit_sec;

> +}
> static int do_one_pass(journal_t *journal, struct recovery_info *info,
> enum passtype pass) { @@ -514,18 +536,31 @@ static int
> do_one_pass(journal_t *journal,
> switch(blocktype) {
> case JBD2_DESCRIPTOR_BLOCK:
> /* Verify checksum first */
> +if(pass == PASS_SCAN) {
> +info->transaction_flag = 0x1;
> +}

(style) no need for {} around single-line block
(style) could this instead store the block number of the logical commit block number that is referenced and verified by this descriptor block?
That would be better than just storing the "0x1" flag, to verify that the proper checksum/timestamp was used by the descriptor block?

I had to re-indent the below code to understand the logic, but I think I did it correctly.

> if (jbd2_journal_has_csum_v2or3(journal))
> descr_csum_size =
> sizeof(struct jbd2_journal_block_tail);
> if (descr_csum_size > 0 &&
>     !jbd2_descriptor_block_csum_verify(journal,
>        bh->b_data)) {
> -printk(KERN_ERR "JBD2: Invalid checksum "
> +
> +if(is_same_journal(journal,bh,next_log_block-1,info->last_trans_commi
> +t_time)) {

(style) wrap lines at 80 columns.

> +printk(KERN_ERR "JBD2: Invalid checksum "
>         "recovering block %lu in log\n",
>         next_log_block);

(style) console error strings should *not* be wrapped at 80 columns, so it can be found more easily in the code (e.g. grep for "checksum recovering"
should be able to find this line of code).  This code was written before the "do not wrap error strings" rule was added, but should be updated to follow the new code style if modified.

> -err = -EFSBADCRC;
> -brelse(bh);
> -goto failed;
> +err = -EFSBADCRC;
> +brelse(bh);
> +goto failed;

> +} else {
> +/*if it's not belong to same journal, just end this recovery,
> +return with success*/

(style) wrap at 80 columns

> +printk(KERN_ERR "JBD2: Invalid checksum "
> +       "found in block %lu in log, but not same journal %d\n",
> +       next_log_block,next_commit_ID);

(style) do not wrap error strings
(minor) this shouldn't really be printed as an "error" to the user, since it isn't an "error" at all, only a bad coincidence or a short transaction?
This should probably only be a jbd_debug() message or similar?

> +err = 0;
> +brelse(bh);
> +goto done;
> +}
> }
>
> /* If it is a valid descriptor block, replay it @@ -688,6 +723,17 @@
> static int do_one_pass(journal_t *journal,
>  * are present verify them in PASS_SCAN; else not
>  * much to do other than move on to the next sequence
>  * number. */
> +if(pass == PASS_SCAN) {
> +struct commit_header *cbh =
> +(struct commit_header *)bh->b_data;
> +if(info->transaction_flag != 0x1) {

(style) This should check "info->ri_commit_block" to verify that this block is the correct commit block, which is safer than just checking "flag != 0x1".

> +jbd_debug(1, "invalid commit block found in %lu, stop
> +here.\n",next_log_block); brelse(bh); goto done; }
> +info->transaction_flag = 0x0;
> +info->last_trans_commit_time = be64_to_cpu(cbh->h_commit_sec);
> +}
> if (pass == PASS_SCAN &&
>     jbd2_has_feature_checksum(journal)) { int chksum_err, chksum_seen;
> @@ -761,7 +807,11 @@ static int do_one_pass(journal_t *journal,
> brelse(bh); continue; }
> -
> +if (pass != PASS_SCAN && info->transaction_flag != 0x1) {
> +jbd_debug(1, "invalid revoke block found in %lu, stop
> +here.\n",next_log_block); brelse(bh); goto done; }
> err = scan_revoke_records(journal, bh,
>   next_commit_ID, info);
> brelse(bh);


Cheers, Andreas






________________________________

CONFIDENTIALITY NOTICE: This electronic message is intended to be viewed only by the individual or entity to whom it is addressed. It may contain information that is privileged, confidential and exempt from disclosure under applicable law. Any dissemination, distribution or copying of this communication is strictly prohibited without our prior permission. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, or if you have received this communication in error, please notify us immediately by return e-mail and delete the original message and any copies of it from your computer system. For further information about Hikvision company. please see our website at www.hikvision.com<http://www.hikvision.com>




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux