On Thu, Apr 2, 2015 at 8:41 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
> On Thu, Apr 02, 2015 at 02:40:19AM +0300, Ilya Dryomov wrote:
>> On Thu, Apr 2, 2015 at 2:03 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
>> > On Wed, Apr 01, 2015 at 08:19:20PM +0300, Ilya Dryomov wrote:
>> >> Following nbd and iscsi, commit 89baaa570ab0 ("libceph: use memalloc
>> >> flags for net IO") set the SOCK_MEMALLOC and PF_MEMALLOC flags for rbd
>> >> and cephfs.  However, it turned out not to play nice with the loopback
>> >> scenario, leading to lockups with a full socket send-q and an empty
>> >> recv-q.
>> >>
>> >> While we have always advised against colocating the kernel client and
>> >> ceph servers on the same box, a few people are doing it and it's also
>> >> useful for light development testing, so rather than reverting, make
>> >> sure not to set those flags in the loopback case.
>> >>
>> >
>> > This does not clarify why the non-loopback case needs access to
>> > pfmemalloc reserves.  Granted, I've spent zero time on this, but it's
>> > really unclear what problem was originally being solved and why dirty
>> > page limiting was insufficient.  Swap over NFS was always a very
>> > special case, minimally because it's immune to dirty page throttling.
>>
>> I don't think there was any particular problem being solved,
>
> Then please go back and look at why dirty page limiting is insufficient
> for ceph.
>
>> certainly not one we hit and fixed with 89baaa570ab0.  Mike is out this
>> week, but I'm pretty sure he said he copied this for iscsi from nbd
>> because you nudged him to (and you yourself did this for nbd as part of
>> the swap-over-NFS series).
>
> In http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/23708 I
> stated that if ceph insisted on using nbd as justification for ceph
> using __GFP_MEMALLOC, then it was preferable that nbd be broken instead.
> In commit 7f338fe4540b1d0600b02314c7d885fd358e9eca, the use case in mind
> was the swap-over-nbd case, and I regret I didn't have userspace
> explicitly tell the kernel that NBD was being used as a swap device.

OK, it all starts to make sense now.  So ideally nbd would only use
__GFP_MEMALLOC if nbd-client was invoked with -swap - you just didn't
implement that.  I guess I should have gone deeper into the history of
your nbd patch when Mike cited it as a reason he did this for ceph.

I think ceph is fine with dirty page limiting in general, so it's only
if we wanted to support swap-over-rbd (cephfs is a bit of a weak link
currently, so I'm not going there) that we would need to enable
SOCK_MEMALLOC/PF_MEMALLOC, and only for that ceph_client instance.
Sounds like that will require a "swap" libceph option, which will also
implicitly enable "noshare" to make sure a __GFP_MEMALLOC ceph_client is
not shared with anything else - luckily we don't have a userspace
process a la nbd-client to worry about.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
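
To make the proposed gating concrete, here is a minimal sketch of what a
"swap"-gated memalloc setup in the libceph messenger might look like.
CEPH_OPT_SWAP and the helper names (ceph_maybe_set_memalloc,
ceph_memalloc_begin/end) are assumptions for illustration only - no such
option exists in the thread above; only the sk_set_memalloc()/PF_MEMALLOC
pattern reflects what commit 89baaa570ab0 applied unconditionally.

#include <linux/sched.h>        /* current, PF_MEMALLOC */
#include <net/sock.h>           /* sk_set_memalloc() */
#include <linux/ceph/libceph.h> /* struct ceph_client, ceph_test_opt() */

/*
 * CEPH_OPT_SWAP is hypothetical; ceph_test_opt(client, SWAP) would check
 * client->options->flags the same way the existing NOSHARE option does.
 */

/* Mark the connection's socket SOCK_MEMALLOC only when the client was
 * created with the (hypothetical) "swap" option. */
static void ceph_maybe_set_memalloc(struct ceph_client *client,
				    struct socket *sock)
{
	if (ceph_test_opt(client, SWAP))
		sk_set_memalloc(sock->sk);
}

/* Enter PF_MEMALLOC around send/receive work, but only for a "swap"
 * client; returns the caller's original flags for later restore. */
static unsigned long ceph_memalloc_begin(struct ceph_client *client)
{
	unsigned long pflags = current->flags;

	if (ceph_test_opt(client, SWAP))
		current->flags |= PF_MEMALLOC;
	return pflags;
}

/* Restore the caller's original PF_MEMALLOC state. */
static void ceph_memalloc_end(unsigned long pflags)
{
	current->flags &= ~PF_MEMALLOC;
	current->flags |= pflags & PF_MEMALLOC;
}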