Re: reservation errors during fstests on pNFS block

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 14 Jun 2024 18:33:02 +0000

> On Jun 14, 2024, at 2:26 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> 
> On Fri, Jun 14, 2024 at 05:46:21PM +0000, Chuck Lever III wrote:
>>> Reservation means another node has an active reservation on that LU.
>> 
>> There are only two accessors of the LUN: the NFS server and
>> the NFS client running the test. That's why these errors are
>> a little surprising to me.
> 
> You can create registrations from userspace, and some cluster managers
> do that.  But none of that should happen for a default setup.
> 
>>> When pNFS layout access fails we fall back to normal access through the
>>> MDS, so this is expected.
>> 
>> Expected, OK. From a usability standpoint, error messages like
>> this would probably be alarming to administrators. I plan to
>> convert the printk's and dprintk's in the NFSD layout code into
>> trace points, but that doesn't help the messages emitted by the
>> block and SCSI drivers. Ideally this should be less noisy.
> 
> Well, they really should be alarming because the admin configured
> a block layout setup and it did not work as expected.  So it should
> ring alarm bells.

Yes, I expect that "pNFS: failed to open device
/dev/disk/by-id/dm-uuid-mpath-0x6001 ..." is very likely
operator error.

>>> Is generic/069 the first test that failed when doing a full xfstests
>>> run?
>> 
>> Yes, it's a full run. generic/069 is the first test where there
>> are remarkable system journal messages (i.e., PR errors), though
>> there are a few subsequent tests that are also whinging.
> 
> Interesting.  Normally only the server actually reserves the LU,
> the clients just register.  And something went wrong here and only
> for these tests.

I just checked the NFS server's system journal, and there's
nothing interesting there.

FWIW, the other two tests that emit unexpected journal
messages that I noted down are generic/108 and generic/616.

>>> Do you see LAYOUT* ops in /proc/self/mountstats for the previous
>>> tests?
>> 
>> generic/013 is known to generate layout recalls, for example,
>> so there is layout activity during the test run.
> 
> Ok.  The other thing would be to run blktrace on the client and
> see that it shows I/O.  But all this sounds like the tests in
> general work, but something is up with generic/069.
> 
> generic/069 just does O_APPEND writes, so I can't see what
> would be so special about it.
> 
>> 
>> I can go back and try reproducing with just generic/069 and
>> tcpdump as a first step. Is there a way I can tell that the
>> PR errors are not reporting a possible data corruption?
> 
> xfstests in general does data verification to check for data integrity,
> so we should not rely on kernel messages.
> 
> I'm a bit busy right now, but I'll try to reproduce this locally next
> week.

Thanks, I'll also try to investigate further.
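
On the question of whether LAYOUT* ops show up in /proc/self/mountstats,
a rough sketch of one way to check is below. It assumes the usual
per-op line format ("OPNAME: count ...") where the first number is the
operation count; count_layout_ops is a hypothetical helper name, not
anything from nfs-utils, and counts are summed across all mounts.

```python
import re

# Hypothetical helper: tally per-op counters whose name starts with LAYOUT.
# Assumes each per-op line looks like "   LAYOUTGET: 12 12 0 ..." and that
# the first number after the colon is the operation count.
def count_layout_ops(mountstats_text):
    counts = {}
    for line in mountstats_text.splitlines():
        m = re.match(r"\s*(LAYOUT\w*):\s+(\d+)", line)
        if m:
            name, ops = m.group(1), int(m.group(2))
            # Sum across mounts, since the file lists every mount in turn.
            counts[name] = counts.get(name, 0) + ops
    return counts
```

Running it on the client before and after a test, for example with
count_layout_ops(open("/proc/self/mountstats").read()), and comparing
the two results should show whether that test generated layout traffic.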

--
Chuck Lever