Re: hanging nfsd requests on an RBD to NFS gateway

A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway.  (There's
nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back
end, but with many distinct and presumably isolated backend RBD images,
I've always been surprised that *all* the nfsd tasks hang (a quick
check for this is sketched below).  This leads me to think it's an nfsd
issue rather than an rbd issue.  (I realize this is an rbd list; I'm
just looking for shared experience. ;) )
 
* I haven't seen any difference between reads and writes.  Any access to
any backing RBD store from the NFS client hangs.
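
A quick way to see the "all the nfsd tasks hang" behaviour, for anyone
who wants to compare notes (just a sketch, nothing site-specific; ps,
awk and /proc are all that's assumed):

    # list nfsd threads stuck in uninterruptible sleep (D state) and
    # the kernel function they're blocked in
    ps -eo pid,stat,wchan:40,comm | awk '$2 ~ /D/ && $4 == "nfsd"'

    # kernel stack of one stuck thread (pick a PID from the output above)
    cat /proc/<pid>/stack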

~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <jpr@xxxxxxx> wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, eg. too few replicas or an osd
>> with slow request warnings?
> We have experienced exactly that kind of problem except that it sometimes happens even when ceph health reports "HEALTH_OK". This has been incredibly vexing for us. 
>
>
> If the cluster is unhealthy for some reason, then I'd expect your/our symptoms as writes can't be completed. 
>
> I'm guessing that you have file systems with barriers turned on. Whichever file system has a barrier write stuck on the problem pg will cause any other process trying to write anywhere in that FS to block as well. This likely means a cascade of nfsd processes will block as they each try to service various client writes to that FS. Even though, theoretically, the rest of the "disk" (rbd) and other file systems might still be writable, the NFS processes will still be in uninterruptible sleep just because of that stuck write request (or such is my understanding). 
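
For what it's worth, whether barriers are actually in play is easy to
check on the gateway.  A minimal sketch, assuming XFS exports (the
mount point below is made up):

    # XFS enables barriers by default; "nobarrier" only appears in
    # /proc/mounts if they've been explicitly turned off
    grep xfs /proc/mounts

    # disabling them (risky, as noted above) would look something like:
    #   mount -o remount,nobarrier /exports/img042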
>
> Disabling barriers on the gateway machine might postpone the problem (never tried it and don't want to) until you hit your vm.dirty_bytes or vm.dirty_ratio thresholds, but it is dangerous as you could much more easily lose data. You'd be better off solving the underlying issues when they happen (too few replicas available or overloaded osds). 
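
The writeback thresholds mentioned above can be read (and tuned) with
sysctl; the value in the comment is purely illustrative, not a
recommendation:

    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes

    # e.g. to cap dirty data at 256 MB rather than a percentage of RAM:
    #   sysctl -w vm.dirty_bytes=268435456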
>
>
> For us, even when the cluster reports itself as healthy, we sometimes have this problem. All nfsd processes block. sync blocks. echo 3 > /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in /proc/meminfo. None of the osds log slow requests. Everything seems fine on the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph nodes, but at least one file system on the gateway machine will stop accepting writes. 
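
For anyone who wants to reproduce roughly those checks (assuming sysrq
is enabled on the gateway):

    # dirty/writeback pages that aren't draining
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    # dump all blocked (D state) tasks into the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100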
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A forced reboot of the NFS gateway "solves" the performance problem, but is annoying and dangerous (we unmount all of the file systems that can still be unmounted, but the stuck ones force us into a sysrq-b). 
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running Ceph Firefly (0.80.10) and XFS file systems exported over NFS and Samba. 
>
> Ryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


