Re: hanging nfsd requests on an RBD to NFS gateway

> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <jpr@xxxxxxx> wrote:
> 
> Hi,
> 
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, eg. too few replicas or an osd
> with slow request warnings?

We have experienced exactly that kind of problem, except that for us it sometimes happens even when "ceph health" reports HEALTH_OK. This has been incredibly vexing.


If the cluster is unhealthy for some reason, then I'd expect your (and our) symptoms, since writes simply can't complete.

I'm guessing that you have file systems with barriers turned on. Whichever file system has a barrier write stuck on the problem PG will cause any other process trying to write anywhere in that FS to block as well. That likely means a cascade of nfsd processes blocking as they each try to service client writes to that FS. Even though, in theory, the rest of the "disk" (RBD) and the other file systems might still be writable, the nfsd processes remain in uninterruptible sleep because of that one stuck write request (or such is my understanding).
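If you want to confirm that picture, here is a rough sketch of how we'd look for the blocked threads on the gateway (standard procps/sysrq usage, nothing Ceph-specific):

```shell
# List processes in uninterruptible sleep (state "D") -- typically the
# blocked nfsd threads plus anything else waiting on the stuck write.
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

# Dump kernel stacks of all blocked tasks to the kernel log for a
# closer look (equivalent to sysrq-w; needs root, so skip otherwise).
[ -w /proc/sysrq-trigger ] && echo w > /proc/sysrq-trigger || true
```

The wchan column usually shows the nfsd threads parked in a writeback or journal wait, which is what you'd expect if a barrier write is stuck.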

Disabling barriers on the gateway machine might postpone the problem (I've never tried it and don't want to) until you hit your vm.dirty_bytes or vm.dirty_ratio thresholds, but it is dangerous: you could lose data much more easily. You'd be better off addressing the underlying issues when they happen (too few replicas available, or overloaded OSDs).
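For reference, you can see where those thresholds sit on your gateway, and whether barriers were disabled on a mount, straight from /proc (a sketch; values vary per distro):

```shell
# Current writeback thresholds: dirtying is cheap until the background
# value is crossed, and writers block outright at the foreground value.
# (The *_bytes variants override the *_ratio ones when non-zero.)
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_bytes /proc/sys/vm/dirty_background_bytes

# Whether barriers are in effect on the XFS mounts: "nobarrier" in the
# options means they were explicitly disabled (barriers are the default).
grep xfs /proc/mounts || true
```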


For us, this problem sometimes occurs even when the cluster reports itself as healthy. All nfsd processes block. sync blocks. echo 3 > /proc/sys/vm/drop_caches blocks. A persistent 4-8 MB of "Dirty" sits in /proc/meminfo. None of the OSDs log slow requests. Everything seems fine on the OSDs and mons, and neither CPU nor I/O load is extraordinary on the Ceph nodes, yet at least one file system on the gateway machine stops accepting writes.
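When it happens, the checks we run amount to something like this (a sketch; the timeout guard is just so the diagnostic shell doesn't get wedged too):

```shell
# Snapshot of dirty and under-writeback pages; a small but persistent
# Dirty figure while sync hangs points at writeback that cannot finish.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Run sync with a timeout so a stalled writeback doesn't wedge the
# shell you are debugging from.
timeout 30 sync || echo "sync did not complete within 30s"
```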

If we just wait, the situation resolves itself in 10 to 30 minutes. A forced reboot of the NFS gateway "solves" the problem faster, but it is annoying and dangerous (we unmount all of the file systems that can still be unmounted, but the stuck ones force us to a sysrq-b).

This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels, running Ceph Firefly (0.80.10) with XFS file systems exported over NFS and Samba.

Ryan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


