Re: recent intermittent fsx-related failures

> On Sep 19, 2021, at 7:19 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Sun, 2021-09-19 at 23:03 +0000, Chuck Lever III wrote:
>> 
>>> On Jul 23, 2021, at 4:24 PM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Fri, 2021-07-23 at 20:12 +0000, Chuck Lever III wrote:
>>>> Hi-
>>>> 
>>>> I noticed recently that generic/075, generic/112, and generic/127
>>>> were
>>>> failing intermittently on NFSv3 mounts. All three of these tests
>>>> are
>>>> based on fsx.
>>>> 
>>>> "git bisect" landed on this commit:
>>>> 
>>>> 7b24dacf0840 ("NFS: Another inode revalidation improvement")
>>>> 
>>>> After reverting 7b24dacf0840 on v5.14-rc1, I can no longer
>>>> reproduce
>>>> the test failures.
>>>> 
>>>> 
>>> 
>>> So you are seeing file metadata updates that end up not changing
>>> the
>>> ctime?
>> 
>> As far as I can tell, a WRITE and two SETATTRs are happening in
>> sequence to the same file during the same jiffy. The WRITE does
>> not report pre/post attributes, but the SETATTRs do. The reported
>> pre- and post- mtime and ctime are all the same value for both
>> SETATTRs, I believe due to timestamp_truncate().
>> 
>> My theory is that persistent-storage-backed filesystems seem to
>> go slow enough that it doesn't become a significant problem. But
>> with tmpfs, this can happen often enough that the client gets
>> confused. And I can make the problem unreproducible if I enable
>> enough debugging paraphernalia on the server to slow it down.
>> 
>> I'm not exactly sure how the client becomes confused by this
>> behavior, but fsx reports a stale size value, or it can hit a
>> bus error. I'm seeing at least four of the fsx-based xfs tests
>> fail intermittently.
> 
> It really isn't a client problem then. If the server is failing to
> update the timestamps, then you gets what you gets.

I don't think it's as simple as that.

The Linux VFS has clamped the resolution of file timestamps since
before the git era began. See current_time() and its ancestors.
The fsx-based tests started failing only after

7b24dacf0840 ("NFS: Another inode revalidation improvement")

was applied to the client. So until 7b24dacf0840, things worked
despite poor server-side timestamp resolution.
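
To illustrate the resolution point, here is a toy model in Python. It
is not kernel code: the tick length, the helper name
coarse_timestamp(), and the operation times are all assumptions chosen
for the example. It only shows that operations landing inside one
clock tick end up with the same coarse timestamp.

```python
# Toy model, not kernel code: a clock that only advances once per tick.
TICK_NS = 4_000_000  # assumed tick length (4 ms); illustrative only


def coarse_timestamp(now_ns: int, tick_ns: int = TICK_NS) -> int:
    """Round a nanosecond time down to the start of its tick."""
    return now_ns - (now_ns % tick_ns)


# Three operations on the same file, 1 microsecond apart, in one tick.
ops = {"WRITE": 8_000_100, "SETATTR #1": 8_001_100, "SETATTR #2": 8_002_100}
stamps = {name: coarse_timestamp(t) for name, t in ops.items()}

for name, stamp in stamps.items():
    print(f"{name:11s} ctime={stamp}")

# Every stamp is identical, so ctime alone cannot tell the ops apart.
print("distinct ctime values:", len(set(stamps.values())))
```

With a fast backing store such as tmpfs, more operations fit into one
tick, which would be consistent with the failures showing up there.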

In addition, it's not terribly sensible that the client should
ignore changes that it made itself simply because the ctime on
the server didn't change. m/ctime has been more or less a hint
since day one, used to detect possible changes by _other_
clients. Here, the client is doing a SETATTR then throwing away
the server-returned attributes and presenting a stale file size
from its own cache to an application.
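
A sketch of the kind of behavior I mean, again as a toy model in
Python. This is a hypothesis only, not an analysis of what
7b24dacf0840 actually does: the ToyClientCache class, its
trust_own_ops flag, and the attribute values are invented for the
example. It contrasts a cache that accepts post-op attributes only
when ctime moved with one that always accepts the attributes returned
by its own SETATTR.

```python
from dataclasses import dataclass


@dataclass
class Attrs:
    size: int
    ctime: int


class ToyClientCache:
    """Hypothetical attribute cache; not the Linux NFS client's logic."""

    def __init__(self, attrs: Attrs, trust_own_ops: bool):
        self.attrs = attrs
        self.trust_own_ops = trust_own_ops

    def apply_setattr_reply(self, post: Attrs) -> None:
        # Gated policy: accept post-op attributes only if ctime moved.
        # Own-op policy: always accept attributes from our own SETATTR.
        if self.trust_own_ops or post.ctime != self.attrs.ctime:
            self.attrs = post


# Both caches start with size 4096; the SETATTR truncates to 1024
# within the same tick, so the server reports an unchanged ctime.
strict = ToyClientCache(Attrs(size=4096, ctime=8_000_000), trust_own_ops=False)
lenient = ToyClientCache(Attrs(size=4096, ctime=8_000_000), trust_own_ops=True)
reply = Attrs(size=1024, ctime=8_000_000)

for cache in (strict, lenient):
    cache.apply_setattr_reply(reply)

print("ctime-gated cache size:", strict.attrs.size)  # stale: 4096
print("own-op cache size:", lenient.attrs.size)      # current: 1024
```

In this model the ctime-gated cache keeps reporting the old size to
the application, which matches the stale size that fsx complains
about.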

That smells awfully like a client regression to me.

In any event, as I said above, I'm not exactly sure how the
client is becoming confused, so this is not yet a rigorous
root-cause analysis. I was simply responding to your question
about file metadata updates without a ctime change. Yes, that
is happening, but apparently that is actually a pretty normal
situation.


--
Chuck Lever






