A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway.  (There's
  nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back
  end, but with many distinct and presumably isolated backend RBD
  images, I've always been surprised that *all* the nfsd tasks hang.
  This leads me to think it's an nfsd issue rather than an rbd issue.
  (I realize this is an rbd list, looking for shared experience. ;) )

* I haven't seen any difference between reads and writes.  Any access
  to any backing RBD store from the NFS client hangs.

~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <jpr@xxxxxxx> wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways
>> blocking nfsd server requests when their ceph cluster has a placement
>> group that is not servicing I/O for some reason, e.g. too few
>> replicas or an osd with slow request warnings?
>
> We have experienced exactly that kind of problem, except that it
> sometimes happens even when ceph health reports "HEALTH_OK". This has
> been incredibly vexing for us.
>
> If the cluster is unhealthy for some reason, then I'd expect your/our
> symptoms, as writes can't be completed.
>
> I'm guessing that you have file systems with barriers turned on. Any
> file system that has a barrier write stuck on the problem pg will
> cause every other process trying to write anywhere in that FS to
> block as well. This likely means a cascade of nfsd processes will
> block as they each try to service various client writes to that FS.
> Even though, theoretically, the rest of the "disk" (rbd) and other
> file systems might still be writable, the NFS processes will still be
> in uninterruptible sleep just because of that stuck write request (or
> such is my understanding).
>
> Disabling barriers on the gateway machine might postpone the problem
> (never tried it and don't want to) until you hit your vm.dirty_bytes
> or vm.dirty_ratio thresholds, but it is dangerous as you could much
> more easily lose data. You'd be better off solving the underlying
> issues when they happen (too few replicas available or overloaded
> osds).
>
> For us, even when the cluster reports itself as healthy, we sometimes
> have this problem. All nfsd processes block. sync blocks. echo 3 >
> /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty"
> in /proc/meminfo. None of the osds log slow requests. Everything
> seems fine on the osds and mons. Neither CPU nor I/O load is
> extraordinary on the ceph nodes, but at least one file system on the
> gateway machine will stop accepting writes.
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A
> forced reboot of the NFS gateway "solves" the performance problem,
> but is annoying and dangerous (we unmount all of the file systems
> that can still be unmounted, but the stuck ones lead us to a
> sysrq-b).
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels
> running Ceph Firefly (0.80.10) and XFS file systems exported over NFS
> and Samba.
>
> Ryan
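
P.S. For anyone trying to narrow this down while it's happening, below
is the kind of thing worth poking at on the gateway. It's just a sketch
of standard Linux and Ceph diagnostics, not a tested runbook; <pid> and
the mount points are placeholders for your own system.

    # Which tasks are in uninterruptible sleep (D state), and what are
    # they waiting on?
    ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

    # Kernel stack of a specific stuck nfsd thread (replace <pid>)
    cat /proc/<pid>/stack

    # Dump stacks of all blocked tasks to the kernel log (needs sysrq
    # enabled)
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100

    # How much dirty data is waiting for writeback?
    grep -E 'Dirty|Writeback' /proc/meminfo

    # Current writeback thresholds; tuning these only moves the stall
    # around, it doesn't fix a stuck flush
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes

    # XFS barriers: "nobarrier" shows up in /proc/mounts only if they
    # were explicitly disabled; no output means barriers are on
    grep nobarrier /proc/mounts

    # And on the ceph side, whether anything is reported slow or stuck
    ceph -s
    ceph health detail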