On Tue, 2019-10-29 at 10:02 +0800, Ian Kent wrote:
> On Tue, 2019-10-29 at 09:11 +0800, Ian Kent wrote:
> > On Mon, 2019-10-28 at 17:52 -0700, Darrick J. Wong wrote:
> > > On Tue, Oct 29, 2019 at 08:29:38AM +0800, Ian Kent wrote:
> > > > On Mon, 2019-10-28 at 16:34 -0700, Darrick J. Wong wrote:
> > > > > On Mon, Oct 28, 2019 at 05:17:05PM +0800, Ian Kent wrote:
> > > > > > Hi Darrick,
> > > > > >
> > > > > > Unfortunately I'm having a bit of trouble with my USB keyboard and
> > > > > > random key repeats, I lost several important messages this morning
> > > > > > due to it.
> > > > > >
> > > > > > Your report of the xfstests generic/361 problem was one of them (as
> > > > > > was Christoph's mail about the mount code location, I'll post on
> > > > > > that a bit later). So I'm going to have to refer to the posts and
> > > > > > hope that I can supply enough context to avoid confusion.
> > > > > >
> > > > > > Sorry about this.
> > > > > >
> > > > > > Anyway, you posted:
> > > > > >
> > > > > > "Dunno what's up with this particular patch, but I see regressions
> > > > > > on generic/361 (and similar asserts on a few others). The patches
> > > > > > leading up to this patch do not generate this error."
> > > > > >
> > > > > > I've reverted back to a point more or less before moving the mount
> > > > > > and super block handling code around and tried to reproduce the
> > > > > > problem on my test VM and I didn't see the problem.
> > > > > >
> > > > > > Is there anything I need to do when running the test, other than
> > > > > > having SCRATCH_MNT and SCRATCH_DEV defined in the local config, and
> > > > > > the mount point and the device existing?
> > > > >
> > > > > Um... here's the kernel branch that I used:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=mount-api-crash
> > > >
> > > > Ok, I'll see what I can do with that.
> > > >
> > > > > Along with:
> > > > >
> > > > > MKFS_OPTIONS -- -m crc=0
> > > >
> > > > Right.
> > > >
> > > > > MOUNT_OPTIONS -- -o usrquota,grpquota
> > > >
> > > > It looked like generic/361 used only the SCRATCH_DEV so I thought that
> > > > meant making a file system and mounting it within the test.
> > >
> > > Yes. MOUNT_OPTIONS are used to mount the scratch device (and in my case
> > > the test device too).
> > >
> > > > > and both TEST_DEV and SCRATCH_DEV pointed at boring scsi disks.
> > > >
> > > > My VM disks are VirtIO (file based) virtual disks, so that sounds a
> > > > bit different.
> > > >
> > > > Unfortunately I can't use raw disks on the NAS I use for VMs and I've
> > > > migrated away from having a desktop machine with a couple of disks to
> > > > help with testing.
> > > >
> > > > I have other options if I really need to, but it's a little bit harder
> > > > to set up and use company lab machines remotely compared to local
> > > > hardware (requesting additional disks is hard to do), and I'm not sure
> > > > (probably not) if they can/will use raw disks (or partitions) either.
> > >
> > > Sorry, I meant 'boring SCSI disks' in a VM.
> > >
> > > Er let's see what the libvirt config is...
> > >
> > > <disk type='file' device='disk'>
> > >   <driver name='qemu' type='raw' cache='unsafe' discard='unmap'/>
> > >   <source file='/run/mtrdisk/a.img'/>
> > >   <target dev='sda' bus='scsi'/>
> > >   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> > > </disk>
> > >
> > > Which currently translates to virtio-scsi disks.
> >
> > I could use the scsi driver for the disk I guess, but IO is already a
> > bottleneck for me.
> >
> > For my VM disks I have:
> >
> > <disk type='file' device='disk'>
> >   <driver name='qemu' type='qcow2' cache='writeback'/>
> >   <source file='/share/VS-VM/images/F30 test/F30 test_2.1565610215' startupPolicy='optional'/>
> >   <target dev='vdc' bus='virtio'/>
> >   <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
> > </disk>
> >
> > I'm pretty much restricted to cow type VM disks if I don't do some
> > questionable manual customization to the xml, ;)
> >
> > In any case the back trace you saw looks like it's in the mount/VFS code,
> > so it probably isn't disk driver related.
> >
> > I'll try and reproduce it with a checkout of your branch above.
>
> I guess this is where things get difficult.
>
> I can't reproduce it. I tried creating an additional VM disk that uses a
> SCSI controller as well, but no joy.
>
> I used this config:
>
> [xfs]
> FSTYPE=xfs
> MKFS_OPTIONS="-m crc=0"
> MOUNT_OPTIONS="-o usrquota,grpquota"
> TEST_DIR=/mnt/test
> TEST_DEV=/dev/vdb
> TEST_LOGDEV=/dev/vdd
> SCRATCH_MNT=/mnt/scratch
> SCRATCH_DEV=/dev/sda
> SCRATCH_LOGDEV=/dev/vde
>
> and used:
>
> ./check -s xfs generic/361
>
> Perhaps some of the earlier tests played a part in the problem, I'll try
> running all the tests next ...
>
> Perhaps I'll need to try a different platform ... mmm.

Well, that was rather more painful than I had hoped.

I have been able to reproduce the problem using a libvirt VM on my NUC
desktop.

That raises the question of whether the (older version of) qemu on my NAS
or the newer libvirt is at fault. I don't think it's the raw vs. qcow
virtual disk difference, but I may need to check that in the libvirt setup.

I think a bare metal install should be definitive ... what do you think,
Darrick?

Ian
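
P.S. In case it helps with comparing setups, here's a rough sketch of how
I'm driving the test from the top of an xfstests checkout. The checkout
path below is just an example, the devices and mount points are the ones
from the config quoted above, and note that the variable xfstests actually
reads for the filesystem type is FSTYP:

cd ~/src/xfstests-dev        # example path to the xfstests checkout

cat > local.config <<'EOF'
[xfs]
FSTYP=xfs
MKFS_OPTIONS="-m crc=0"
MOUNT_OPTIONS="-o usrquota,grpquota"
TEST_DIR=/mnt/test
TEST_DEV=/dev/vdb
TEST_LOGDEV=/dev/vdd
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/sda
SCRATCH_LOGDEV=/dev/vde
EOF

# the mount points have to exist before check will run
mkdir -p /mnt/test /mnt/scratch

# -s selects the [xfs] section from local.config
./check -s xfs generic/361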