Re: [PATCH v2 2/3] ext4: avoid journaling sb update on error if journal is destroying

Baokun Li <libaokun1@xxxxxxxxxx> · Thu, 13 Mar 2025 10:08:48 +0800

On 2025/3/13 9:20, Zhang Yi wrote:
On 2025/3/13 1:15, Jan Kara wrote:
On Wed 12-03-25 19:56:36, Ojaswin Mujoo wrote:
On Wed, Mar 12, 2025 at 11:51:03AM +0100, Jan Kara wrote:
On Mon 10-03-25 10:13:36, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
On Sun, Mar 09, 2025 at 12:11:22AM +0530, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
On Sat, Mar 08, 2025 at 06:56:23PM +0530, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
On Sat, Mar 08, 2025 at 03:25:04PM +0530, Ritesh Harjani (IBM) wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
Presently we always BUG_ON if trying to start a transaction on a journal marked
with JBD2_UNMOUNT, since this should never happen. However, while ltp running
stress tests, it was observed that in case of some error handling paths, it is
possible for update_super_work to start a transaction after the journal is
destroyed eg:

(umount)
ext4_kill_sb
   kill_block_super
     generic_shutdown_super
       sync_filesystem /* commits all txns */
       evict_inodes
         /* might start a new txn */
       ext4_put_super
	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
         jbd2_journal_destroy
           journal_kill_thread
             journal->j_flags |= JBD2_UNMOUNT;
           jbd2_journal_commit_transaction
             jbd2_journal_get_descriptor_buffer
               jbd2_journal_bmap
                 ext4_journal_bmap
                   ext4_map_blocks
                     ...
                     ext4_inode_error
                       ext4_handle_error
                         schedule_work(&sbi->s_sb_upd_work)

                                                /* work queue kicks in */
                                                update_super_work
                                                  jbd2_journal_start
                                                    start_this_handle
                                                      BUG_ON(journal->j_flags &
                                                             JBD2_UNMOUNT)

Hence, introduce a new sbi flag s_journal_destroying to indicate journal is
destroying only do a journaled (and deferred) update of sb if this flag is not
set. Otherwise, just fallback to an un-journaled commit.

We set sbi->s_journal_destroying = true only after all the FS updates are done
during ext4_put_super() (except a running transaction that will get commited
during jbd2_journal_destroy()). After this point, it is safe to commit the sb
outside the journal as it won't race with a journaled update (refer
2d01ddc86606).

Also, we don't need a similar check in ext4_grp_locked_error since it is only
called from mballoc and AFAICT it would be always valid to schedule work here.

Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
Reported-by: Mahesh Kumar <maheshkumar657g@xxxxxxxxx>
Suggested-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
---
  fs/ext4/ext4.h      | 2 ++
  fs/ext4/ext4_jbd2.h | 8 ++++++++
  fs/ext4/super.c     | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2b7d781bfcad..d48e93bd5690 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1728,6 +1728,8 @@ struct ext4_sb_info {
  	 */
  	struct work_struct s_sb_upd_work;
  
+	bool s_journal_destorying;
+
  	/* Atomic write unit values in bytes */
  	unsigned int s_awu_min;
  	unsigned int s_awu_max;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 9b3c9df02a39..6bd3ca84410d 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -437,6 +437,14 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
  {
  	int err = 0;
  
+	/*
+	 * At this point all pending FS updates should be done except a possible
+	 * running transaction (which will commit in jbd2_journal_destroy). It
+	 * is now safe for any new errors to directly commit superblock rather
+	 * than going via journal.
+	 */
+	sbi->s_journal_destorying = true;
This is not correct right. I think what we decided to set this flag
before we flush the workqueue. So that we don't schedule any new
work after this flag has been set. At least that is what I understood.

[1]: https://lore.kernel.org/all/87eczc6rlt.fsf@xxxxxxxxx/

-ritesh
Hey Ritesh,

Yes that is not correct, I missed that in my patch however we realised
that adding it before flush_work() also has issues [1]. More
specifically:
Ohk. right.

                      **kjournald2**
                      jbd2_journal_commit_transaction()
                      ...
                      ext4_handle_error()
                         /* s_journal_destorying is not set */
                         if (journal && !s_journal_destorying)
Then maybe we should not schedule another work to update the superblock
via journalling, it the error itself occurred while were trying to
commit the journal txn?


-ritesh
Hmm, ideally yes that should not happen, but how can we achieve that?
For example with the trace we saw:

    **kjournald2**
    jbd2_journal_commit_transaction()
      jbd2_journal_get_descriptor_buffer
        jbd2_journal_bmap
          ext4_journal_bmap
            ext4_map_blocks
              ...
              ext4_inode_error
                ext4_handle_error
                  schedule_work(&sbi->s_sb_upd_work)

How do we tell ext4_handle_error that it is in the context of a
committing txn.
So I was thinking about this. It is not a problem to determine we are
running in kjournald context - it is enough to check

	current == EXT4_SB(sb)->s_journal->j_task
Oh, right :)

But I'm not sure checking this in ext4_handle_error() and doing direct sb
update instead of scheduling a journalled one is always correct. For
example kjournald does also writeback of ordered data and if that hits an
error, we do not necessarily abort the journal (well, currently we do as
far as I'm checking but it seems a bit fragile to rely on this).
Okay so IIUC your concern is there might be some codepaths, now or in
the future, where kjournald might call the FS layer, hit an error and
still decide to not abort. In which case we would still want to update
the sb via journal.
Yeah. The reason why I'm a bit concerned about it is mostly the case of
kjournald also handling ordered data and situations like
!(journal->j_flags & JBD2_ABORT_ON_SYNCDATA_ERR) where people want to
continue although ordered data had issues. Or situations where something in
j_commit_callback or another jbd2 hook ends up calling ext4_error()...

Ha, right! This is a case where kjournald triggers an ext4 error but does
not abort the journal for now, I forgot this one, and there may be more.
Thanks for pointing it out. I would also prefer to use this solution of
adding ext4_journal_destory().

If we consider the possibility that there might be calls to ext4_error()
without aborting the journal, although I cannot imagine how this might
happen, this situation may indeed appear in hidden corners now or in the
future. Therefore, an extra flag is indeed needed, with which we don't
have to think so much. 😀


Cheers,
Baokun