Re: A NFS, xfs, reflink and rmapbt story

Murphy Zhou <jencce.kernel@xxxxxxxxx> · Wed, 5 Feb 2020 14:52:24 +0800

On Tue, Jan 28, 2020 at 10:56:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > > Hi,
> > > 
> > > Deleting the files left by generic/175 costs too much time when testing
> > > on NFSv4.2 exporting xfs with rmapbt=1.
> > > 
> > > "./check -nfs generic/175 generic/176" should reproduce it.
> > > 
> > > My test bed is a 16c8G vm.
> > 
> > What kind of storage?

Loop device in guest.

# Host:

[root@ibm-x3850x5-03]$ lsblk
NAME                            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                               8:0    0  2.7T  0 disk
├─sda1                            8:1    0    1M  0 part
├─sda2                            8:2    0    1G  0 part /boot
└─sda3                            8:3    0  2.7T  0 part
  ├─rhel_ibm--x3850x5--03-root  253:0    0  550G  0 lvm  /
  ├─rhel_ibm--x3850x5--03-swap  253:1    0 27.6G  0 lvm  [SWAP]
  ├─rhel_ibm--x3850x5--03-home  253:2    0  1.7T  0 lvm  /home
  ├─rhel_ibm--x3850x5--03-test1 253:3    0   10G  0 lvm
  └─rhel_ibm--x3850x5--03-test2 253:4    0   10G  0 lvm
loop0                             7:0    0    1G  0 loop
loop1                             7:1    0    1G  0 loop
[root@ibm-x3850x5-03]$ smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1115.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM
Product:              ServeRAID M5015
Revision:             2.13
Compliance:           SPC-3
User Capacity:        2,996,997,980,160 bytes [2.99 TB]
Logical block size:   512 bytes
Logical Unit id:      0x600605b001665aa019cb17be1e9ce991
Serial number:        0091e99c1ebe17cb19a05a6601b00506
Device type:          disk
Local Time is:        Wed Feb  5 14:35:57 2020 CST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging
[root@ibm-x3850x5-03]$ virsh domblklist 8u
Target     Source
------------------------------------------------
hda        /home/8u.qcow2
hdb        /home/8ut.qcow2
hdc        /home/8ut1.qcow2

[root@ibm-x3850x5-03]$

# Guest:

[root@8u]$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0  800G  0 disk
├─sda1          8:1    0    2G  0 part
│ └─rhel-swap 253:0    0    2G  0 lvm  [SWAP]
└─sda2          8:2    0  798G  0 part /
sdb             8:16   0  200G  0 disk /home
sdc             8:32   0  100G  0 disk
├─sdc1          8:33   0   50G  0 part
└─sdc2          8:34   0   50G  0 part
pmem0         259:0    0    5G  0 disk
[root@8u]$ smartctl -a /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.5.0-v5.5-9386-g33b4013] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     QEMU HARDDISK
Serial Number:    QM00003
Firmware Version: 1.5.3
User Capacity:    214,748,364,800 bytes [214 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is:    Wed Feb  5 14:39:18 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(  288) seconds.
Offline data collection
capabilities: 			 (0x19) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  54) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       16
  4 Start_Stop_Count        0x0002   100   100   020    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0003   100   100   000    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0003   069   069   050    Pre-fail  Always       -       31 (Min/Max 31/31)

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

[root@8u]$

> 
> Is the NFS server the same machine as what the local XFS tests were
> run on?

Yes. It's also reproducible when testing on remote NFS mounts.

> 
> > > NFSv4.2  rmapbt=1   24h+
> > 
> > <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> > transactions on the inactivation?  (speculates wildly at the end of the
> > workday)
> 
> Doubt it - NFS server uses ->commit_metadata after the async
> operation to ensure that it is completed and on stable storage, so
> the truncate on inactivation should run at pretty much the same
> speed as on a local filesystem as it's still all async commits. i.e.
> the only difference on the NFS server is the log force that follows
> the inode inactivation...
> 
> > I'll have a look in the morning.  It might take me a while to remember
> > how to set up NFS42 :)
> > 
> > --D
> > 
> > > NFSv4.2  rmapbt=0   1h-2h
> > > xfs      rmapbt=1   10m+
> > > 
> > > At first I thought it hung, turns out it was just slow when deleting
> > > 2 massive reflinked files.
> 
> Both tests run on the scratch device, so I don't see where there is
> a large file unlink in either of these tests.
> 
> In which case, I'd expect that all the time is consumed in
> generic/176 running punch_alternating to create a million extents
> as that will effectively run a synchronous server-side hole punch
> half a million times.
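
To make the access pattern described above concrete, here is a minimal shell sketch of punching out every other block of a file. It is not the fstests punch_alternating helper; the function names alternating_offsets and punch_alternating_sketch are made up for illustration, and it assumes util-linux fallocate(1) and GNU stat(1) are available. On a file of a million blocks it would issue roughly half a million separate hole-punch requests.

```shell
#!/bin/sh
# Sketch only: NOT the fstests punch_alternating helper.
# Function names here are hypothetical, chosen for illustration.

# Print the byte offsets of the blocks to punch: blocks 1, 3, 5, ...
# Arguments: file size in bytes, block size in bytes.
alternating_offsets() {
    size=$1
    blksz=$2
    off=$blksz
    while [ $((off + blksz)) -le "$size" ]; do
        echo "$off"
        off=$((off + 2 * blksz))
    done
}

# Punch each of those blocks with util-linux fallocate(1).
# Every iteration is a separate hole-punch request to the filesystem.
punch_alternating_sketch() {
    file=$1
    blksz=$2
    size=$(stat -c %s "$file")
    alternating_offsets "$size" "$blksz" | while read -r off; do
        fallocate --punch-hole --offset "$off" --length "$blksz" "$file" || return 1
    done
}
```

For example, "alternating_offsets 16384 4096" prints 4096 and 12288, i.e. two punches for a four-block file.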

I've tracked this down. The time was consumed by the "rm -rf" in
_scratch_mkfs of generic/176. See the thread at
https://www.spinics.net/lists/fstests/msg13316.html
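
A minimal sketch for timing that kind of removal in isolation, outside the test harness. The helper name time_rm is hypothetical and not part of fstests; it assumes a date(1) that supports +%s and only reports whole seconds.

```shell
# Hypothetical helper, not part of fstests: remove everything under a
# directory and print the elapsed wall-clock time in whole seconds.
time_rm() {
    dir=$1
    start=$(date +%s)
    # ${dir:?} aborts if the argument is empty, so this never expands to "/*".
    rm -rf "${dir:?}"/* || return 1
    end=$(date +%s)
    echo $((end - start))
}
```

It would be called with the mounted scratch directory as its only argument, once over NFS and once on the local XFS mount, to compare the two.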

Thanks,
Murphy

> 
> However, I'm guessing that the server side filesystem has a very
> small log and is on spinning rust, hence the ->commit_metadata log
> forces are preventing in-memory aggregation of modifications. This
> results in the working set of metadata not fitting in the log and so
> each new hole punch transaction ends up waiting on log tail pushing
> (i.e. metadata writeback IO).  i.e. it's thrashing the disk, and
> that's why it is slow.....
> 
> Storage details, please!
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx