AW: Locking problems with Linux 4.9 and 4.11 with NFSD and `fs/iomap.c`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Paul et al,

> Dear Dave,
>
>
> On 08/02/17 00:51, Dave Chinner wrote:
> > On Tue, Aug 01, 2017 at 07:49:50PM +0200, Paul Menzel wrote:
>
> >> On 06/27/17 13:59, Paul Menzel wrote:
> >>
> >>> Just a small update that we were hit by the problem on a different 
> >>> machine (identical model) with Linux 4.9.32 and the exact same 
> >>> symptoms.
> >>>
> >>> ```
> >>> $ sudo cat /proc/2085/stack
> >>> [<ffffffff811f920c>] iomap_write_begin+0x8c/0x120 
> >>> [<ffffffff811f982b>] iomap_zero_range_actor+0xeb/0x210 
> >>> [<ffffffff811f9a82>] iomap_apply+0xa2/0x110 [<ffffffff811f9c58>] 
> >>> iomap_zero_range+0x58/0x80 [<ffffffff8133c7de>] 
> >>> xfs_zero_eof+0x4e/0xb0 [<ffffffff8133c9dd>] 
> >>> xfs_file_aio_write_checks+0x19d/0x1c0
> >>> [<ffffffff8133ce89>] xfs_file_buffered_aio_write+0x79/0x2d0
> >>> [<ffffffff8133d17e>] xfs_file_write_iter+0x9e/0x150 
> >>> [<ffffffff81198dc0>] do_iter_readv_writev+0xa0/0xf0 
> >>> [<ffffffff81199fba>] do_readv_writev+0x18a/0x230 
> >>> [<ffffffff8119a2ac>] vfs_writev+0x3c/0x50 [<ffffffffffffffff>] 
> >>> 0xffffffffffffffff ```
> >>>
> >>> We haven’t had time to set up a test system yet to analyze that further.
> >>
> >> Today, two systems with Linux 4.9.23 exhibited the problem of `top` 
> >> showing that `nfsd` is at 100 %. Restarting one machine into Linux
> >> *4.9.38* showed the same problem. One of them with a 1 GBit/s network 
> >> device got traffic from a 10 GBit/s system, so the connection was 
> >> saturated.
> > 
> > So the question is this: is there IO being issued here, is the page 
> > cache growing, or is it in a tight loop doing nothing? Details of your 
> > hardware, XFS config and NFS server config is kinda important here, 
> > too.

Looking at 64 bit input for variable count of

iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
		void *data, struct iomap *iomap)


and 32 bit internal variables

		unsigned offset, bytes;

culprit could be

		bytes = min_t(unsigned, PAGE_SIZE - offset, count);

Lets say you are trying to zero multiple of 4GB chunks. With bytes
evaluated towards 0 this will hit an endless loop within that iomap
function. That might explain your observation. If that is right a bugfix
would qualify for stable 4.8+

Best regards.

Markus

> Could you please guide me, where I can get the information you request?
>
> The hardware ranges from slow 12 thread systems with 96 GB RAM to 80 thread 1 TB RAM machines. Often big files (up to 100 GB) are written.
>
> > For example, if the NFS server IO patterns trigger a large speculative 
> > delayed allocation, then the client does a write at the end of the 
> > speculative delalloc range, we will zero the entire speculative 
> > delalloc range. That could be several GB of zeros that need to be 
> > written here. It's sub-optimal, yes, and but large zeroing is rare 
> > enough that we haven't needed to optimise it by allocating unwritten 
> > extents instead.  It would be really handy to know what application 
> > the NFS client is running as that might give insight into the trigger 
> > behaviour and whether you are hitting this case.
>
> It ranges from simple `cp` to scripts writing FASTQ files with biological sequences in it.
>
> > > Also, if the NFS client is only writing to one file, then all the 
> > other writes that are on the wire will end up being serviced by nfsd 
> > threads that then block waiting for the inode lock. If the client 
> > issues more writes on the wire than the NFS server has worker threads, 
> > the client side write will starve the NFS server of worker threads 
> > until the zeroing completes. This is the behaviour you are seeing - 
> > it's a common server side config error that's been known for at least 
> > 15 years...
> > 
> > FWIW, it used to be that a linux NFS client could have 16 concurrent 
> > outstanding NFS RPCs to a server at a time - I don't know if that 
> > limit still exists or whether it's been increased. However, the 
> > typical knfsd default is (still) only 8 worker threads, meaning a 
> > single client and server using default configs can cause the above 
> > server DOS issue. e.g on a bleeding edge debian distro install:
> > 
> > $ head -2 /etc/default/nfs-kernel-server # Number of servers to start 
> > up
> > RPCNFSDCOUNT=8
> > $
> > 
> > So, yeah, distros still only configure the nfs server with 8 worker 
> > thread by default. If it's a dedicated NFS server, then I'd be using 
> > somewhere around 64 NFSD threads *per CPU* as a starting point for the 
> > server config...
> > 
> > At minimum, you need to ensure that the NFS server has at least double 
> > the number of server threads as the largest client side concurrent RPC 
> > count so that a single client can't DOS the NFS server with a single 
> > blocked write stream.
>
> That’s not the issue here. It’s started with 64 threads here. Also this doesn’t explain, why it works with the 4.4 series. 
>
> The directory cannot be accessed at all. `ls /mounted/path` just hangs on remote systems.
>
>
> Kind regards,
>
> Paul

****************************************************************************
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.

�ber das Internet versandte E-Mails können unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche Willenserklärung.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

Vorstand:
Kadir Akin
Dr. Michael Höhnerbach

Vorsitzender des Aufsichtsrates:
Hans Kristian Langva

Registergericht: Amtsgericht Köln
Registernummer: HRB 52 497

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************

[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux