[PATCH 0/9 v3] xfs: shutdown is a racy mess

With the recent log problems we've uncovered, it's clear that the
way we shut down filesystems and the log is a chaotic mess. We can
have multiple filesystem shutdown executions in progress at
once, all competing to run shutdown processing and emit log messages
saying the filesystem has been shut down and why. Further, shutdown
changes the log state and runs log IO completion callbacks without
any co-ordination with ongoing log operations.

This results in shutdowns running unpredictably, running multiple
times, racing with the iclog state machine transitions and exposing
us to use-after-free situations and unexpected state changes within
the log itself.

This patch series tries to address the chaotic nature of shutdowns
by making shutdown execution consistent and predictable. This is
achieved by:

- making the mount shutdown state transition atomic and not
  dependent on log state.
- making operational log state transitions atomic
- making the log shutdown check be based entirely on the operational
  XLOG_IO_ERROR log state rather than a combination of log flags and
  iclog XLOG_STATE_IOERROR checks.
- getting rid of XLOG_STATE_IOERROR so that shutdown doesn't perturb
  iclog state in the middle of operations that are expecting iclogs
  to be in specific state(s).
- not processing iclogs that are actively referenced during
  shutdown. This avoids use-after-free situations where shutdown
  runs callbacks and frees objects that own the reference to the
  iclog and are still in use by the iclog reference owner.
- running shutdown processing when the last active reference to an
  iclog goes away. This guarantees that shutdown processing occurs
  on all iclogs, but only when it is safe to do so.
- acknowledging that log state is not consistent once shutdown has
  been entered, and so not trying to apply consistency checking
  during a shutdown.
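
As a rough illustration of the "atomic shutdown state transition"
idea in the list above, here is a minimal userspace sketch using C11
atomics. It is not the code from the patches: struct demo_mount,
DEMO_MOUNT_SHUTDOWN and the demo_*() functions are all hypothetical
names invented for this example, and the kernel's own bitops are
replaced here by atomic_fetch_or(). The point is only that the first
caller to set the flag runs shutdown processing and emits the
message, and every later caller sees the flag already set and backs
off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag and structure; not the identifiers used in the patches. */
#define DEMO_MOUNT_SHUTDOWN	(1u << 0)

struct demo_mount {
	atomic_uint	opstate;	/* operational state flags */
	unsigned int	shutdown_runs;	/* times shutdown processing ran */
};

static void demo_mount_init(struct demo_mount *mp)
{
	atomic_init(&mp->opstate, 0);
	mp->shutdown_runs = 0;
}

static bool demo_is_shutdown(struct demo_mount *mp)
{
	return (atomic_load(&mp->opstate) & DEMO_MOUNT_SHUTDOWN) != 0;
}

/*
 * Atomically set the shutdown flag. Only the caller that observes the
 * flag going from clear to set runs the shutdown processing; everyone
 * else returns false without touching anything.
 */
static bool demo_force_shutdown(struct demo_mount *mp, const char *why)
{
	unsigned int old;

	assert(mp != NULL);
	old = atomic_fetch_or(&mp->opstate, DEMO_MOUNT_SHUTDOWN);
	if (old & DEMO_MOUNT_SHUTDOWN)
		return false;	/* lost the race, already shut down */

	/* Only the winner writes this counter and emits the message. */
	mp->shutdown_runs++;
	fprintf(stderr, "demo: filesystem shut down: %s\n", why);
	return true;
}
```

With this shape the "who shuts down" decision is a single atomic
read-modify-write, so it does not depend on any other state (such as
the log state) being sampled consistently at the same time.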

At the end of this patch series, shutdown runs once and once only at
the first trigger, iclog state is not modified by shutdown, and
iclog callbacks and wakeups are not processed until all active
references to the iclog(s) are dropped. Hence we now have
deterministic shutdown behaviour for both the mount and the log and
a consistent iclog lifecycle framework that we can build more
complex functionality on top of safely.
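
The "callbacks are not processed until all active references are
dropped" behaviour can be sketched in the same hedged way. Again,
this is a single-threaded userspace toy and not the patch code:
struct demo_log, struct demo_iclog and the demo_*() helpers are
invented for illustration, the io_error field merely stands in for
the XLOG_IO_ERROR log state, and all locking is omitted (a real
implementation would need the refcount and state checks serialised
by a lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the log and an in-core log buffer. */
struct demo_log {
	bool	io_error;	/* stands in for the XLOG_IO_ERROR state */
};

struct demo_iclog {
	struct demo_log	*log;
	int		refcnt;		/* active references */
	int		callbacks_run;	/* times shutdown callbacks ran */
};

/* Placeholder for running callbacks and wakeups on one iclog. */
static void demo_iclog_shutdown_callbacks(struct demo_iclog *ic)
{
	ic->callbacks_run++;
}

/*
 * Shut the log down once. Idle iclogs are processed immediately;
 * referenced iclogs are left alone so that their owners never see
 * objects freed underneath them.
 */
static void demo_log_force_shutdown(struct demo_log *log,
				    struct demo_iclog *iclogs, size_t n)
{
	size_t i;

	if (log->io_error)
		return;		/* already shut down, nothing to do */
	log->io_error = true;

	for (i = 0; i < n; i++) {
		if (iclogs[i].refcnt == 0)
			demo_iclog_shutdown_callbacks(&iclogs[i]);
	}
}

/*
 * Drop a reference. If this was the last one and the log has been
 * shut down, the deferred shutdown processing runs here.
 */
static void demo_iclog_release(struct demo_iclog *ic)
{
	assert(ic->refcnt > 0);
	if (--ic->refcnt == 0 && ic->log->io_error)
		demo_iclog_shutdown_callbacks(ic);
}
```

In this toy model every iclog gets its shutdown processing exactly
once: either at shutdown time if it is idle, or when its last
reference goes away if it is not.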

Version 3:
- rebase on 5.14-rc4 + for-next @ 130916145229
- Fixed typos in commit messages

Version 2:
- https://lore.kernel.org/linux-xfs/20210714031958.2614411-1-david@xxxxxxxxxxxxx/
- rebase on 5.14-rc1
- added comment about XFS_FORCED_SHUTDOWN -> xlog_is_shutdown in commit message.
- fix spurious semi-colon at end of for loop.
- fixed typos in commit messages
- undid the do {} while -> for {} conversion in xlog_state_do_callbacks()
- removed spurious blank lines in xfs_do_force_shutdown()
- added comment to commit description explaining the unconditional stack dump on
  shutdown if the error level is high enough.
- added comment about iclog IO completion avoiding shutdown races with
  referenced iclogs that haven't yet been submitted to commit description.
- cleaned up xlog_state_release_iclog() structure for better readability.
- cleaned up xlog_space_left() structure for better readability.

Version 1:
- https://lore.kernel.org/linux-xfs/20210630063813.1751007-1-david@xxxxxxxxxxxxx/



