On 2025/1/7 15:08, Darrick J. Wong wrote:
On Tue, Jan 07, 2025 at 10:01:32AM +0800, Baokun Li wrote:
On 2025/1/7 7:49, Darrick J. Wong wrote:
On Sat, Jan 04, 2025 at 10:41:28AM +0800, Baokun Li wrote:
Hi Ted,
On 2025/1/3 23:54, Theodore Ts'o wrote:
On Fri, Jan 03, 2025 at 10:35:17AM -0500, Theodore Ts'o wrote:
I don't see how setting the shutdown flag causes reads to fail. That
was true in an early version of the ext4 patch which implemented
shutdown support, but one of the XFS developers (I don't remember if
it was Dave or Christoph) objected because XFS did not cause the
read_pages function to fail. Are you seeing this with an upstream
kernel, or with a patched kernel? The upstream kernel does *not* have
the check in ext4_readpages() or ext4_read_folio() (post folio
conversion).
OK, that's weird. Testing on 6.13-rc4, I don't see the problem simulating an ext4 error:
root@kvm-xfstests:~# mke2fs -t ext4 -Fq /dev/vdc
/dev/vdc contains a ext4 file system
last mounted on /vdc on Fri Jan 3 10:38:21 2025
root@kvm-xfstests:~# mount -t ext4 -o errors=continue /dev/vdc /vdc
We are discussing "errors=remount-ro", as the title states, not the
continue mode. The key code leading to the behavior change is as follows;
the continue mode is therefore not affected.
Hmm. On the one hand, XFS has generally returned EIO (or ESHUTDOWN in a
couple of specialty cases) when the fs has been shut down.
Indeed, this is the intended behavior during shutdown.
OTOH XFS also doesn't have errors=remount-ro; it just dies, which I
think has been its behavior for a long time.
Yes. As an aside, is there any way for xfs to determine if -EIO is
originating from a hardware error or if the filesystem has been shut down?
XFS knows the difference, but nothing above it does.
Okay.
Or would you consider it useful to have the mount command display
"shutdown" when the file system is being shut down?
Trouble is, will mount get confused and try to pass ",shutdown" as part
of a remount operation?
The ",shutdown" string is only displayed by show_options when specific
flags are set; it's not actually parsed by remount. Unless the sysadmin
sees it in the mount command output and then mounts with this option.
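To illustrate how a userspace tool could consume such a hint, here is a
minimal sketch in C. It assumes the ",shutdown" token proposed above really
does appear in the mount options string; that is an assumption taken from
this discussion, not established kernel behavior. The helper name
has_mount_option() is hypothetical. It matches whole comma-separated tokens
only, so an unrelated option that merely contains the word would not match.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the comma-separated option string
 * `opts` (e.g. the options field of a /proc/self/mounts entry) contains
 * the exact token `name`.
 *
 * Assumption: the filesystem's show_options emits a bare "shutdown"
 * token once the fs has been shut down, as proposed in this thread.
 */
bool has_mount_option(const char *opts, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        /* Length of the current token, up to the next ',' or the end. */
        size_t tok_len = strcspn(p, ",");

        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return true;

        p += tok_len;
        if (*p == ',')
            p++;    /* skip the separator */
    }
    return false;
}
```

A monitoring tool could pass the mnt_opts string obtained from getmntent()
to this helper and treat a match as "the fs was shut down" rather than as a
hardware error. It would only be reading the hint, not feeding it back into
a remount.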
I suppose the fs is dead so what does it matter...
Since XFS is typically already shut down when it returns EIO, this prompt
may not be important for xfs. However, it's not as straightforward to
distinguish between EIO and shutdown for file systems that support a
continue mode or allow some operations even after shutdown.
To me, it doesn't sound unreasonable for ext* to allow reads after a
shutdown when errors=remount-ro since it's always had that behavior.
Yes, a previous bug fix inadvertently changed the behavior of
errors=remount-ro, and the patch to correct this is coming.
Additionally, ext4 now allows directory reads even after shutdown. Is this
expected behavior?
There's no formal specification for what shutdown means, so ... it's not
unexpected. XFS doesn't allow that.
Okay.
Bonus Q: do you want an errors=fail variant to shut things down fast?
--D
In my opinion, I have not yet seen a scenario where the file system needs
to be shut down after an error occurs. Therefore, using errors=remount-ro
to prevent modifications after an error is sufficient. Of course, if
customers have such needs, implementing this mode is also very simple.
IO errors, sure. Metadata errors? No, we want to stop the world
immediately, either so the sysadmin can go run xfs_repair, or the cloud
manager can just kill the node and deploy another.
--D
The remount-ro mode generally only becomes read-only when metadata errors
occur, too. I think if users have no need to read after an error, there is
basically no difference between read-only and shutdown. In fact,
errors=remount-ro does more; it allows users to back up some potentially
lost data after an error and then exit gracefully. However, reading
corrupted metadata does have some risks, for which we have done a lot of
work.
Regards,
Baokun