Re: reservation errors during fstests on pNFS block

> On Jun 14, 2024, at 12:38 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> 
> On Fri, Jun 14, 2024 at 02:46:49PM +0000, Chuck Lever III wrote:
>> I've finally gotten kdevops and pNFS block to the point where
>> it can run fstests smoothly with an iSCSI target. I'm seeing
>> error messages on occasion in the system journal. This set is
>> from generic/069:
> 
> Reservation means another node has an active reservation on that LU.

There are only two accessors of the LUN: the NFS server and
the NFS client running the test. That's why these errors are
a little surprising to me.


> Either you did another previous attempt that failed and let the
> reservation linger, or something else in the system claimed it.

This is the first fstests run after the systems were provisioned.
kdevops lets me provision from scratch before every run [1].


>> But note that generic/069 is recorded as passing without error.
> 
> When pNFS layout access fails we fall back to normal access through the
> MDS, so this is expected.

Expected, OK. From a usability standpoint, error messages like
this would probably be alarming to administrators. I plan to
convert the printk's and dprintk's in the NFSD layout code into
trace points, but that doesn't help the messages emitted by the
block and SCSI drivers. Ideally this should be less noisy.


> Is generic/069 the first test that failed when doing a full xfstests
> run?

Yes, it's a full run. generic/069 is the first test where there
are remarkable system journal messages (i.e., PR errors), though
there are a few subsequent tests that are also whinging.


> Do you see LAYOUT* ops in /proc/self/mountstats for the previous
> tests?

generic/013 is known to generate layout recalls, for example,
so there is layout activity during the test run.
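
For anyone following along, the mountstats check asked about above
could be scripted roughly as follows. This is only a sketch: it assumes
the usual per-op statistics layout, where each line is an operation
name followed by a colon and a row of counters whose first field is the
number of operations issued. The helper name layout_op_counts is made
up for illustration, and counts are summed across all NFS mounts in the
text it is given.

```python
import re

# Matches per-op statistics lines whose operation name starts with LAYOUT,
# for example "\tLAYOUTGET: 5 5 0 1200 900 1 14 16 0".
# Assumption: the first counter after the colon is the op count.
_LAYOUT_LINE = re.compile(r"^\s*(LAYOUT[A-Z_]*):\s+(\d+)")


def layout_op_counts(mountstats_text):
    """Return {op_name: count} for every LAYOUT* op found in the text.

    Hypothetical helper: counts are summed over all mounts present in
    mountstats_text, so filter the text first to inspect a single mount.
    """
    counts = {}
    for line in mountstats_text.splitlines():
        match = _LAYOUT_LINE.match(line)
        if match:
            name, ops = match.group(1), int(match.group(2))
            counts[name] = counts.get(name, 0) + ops
    return counts
```

Feeding it the contents of /proc/self/mountstats before and after a
test shows whether that test issued any layout operations: all-zero
counts would suggest the client never attempted pNFS I/O at all.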

I can go back and try reproducing with just generic/069 and
tcpdump as a first step. Is there a way I can tell that the
PR errors are not reporting a possible data corruption? I
guess the PASS report from generic/069 is one way. The pass/fail
log from xfstests for pNFS block looks just like the non-pNFS runs,
so maybe this is much ado about nothing.


--
Chuck Lever

[1] - https://github.com/chucklever/kdevops/tree/pnfs-block-testing