On Thu 30-03-17 17:06:51, Ilya Dryomov wrote:
[...]
> > But if the allocation is stuck then the holder of the lock cannot make
> > forward progress and is effectively deadlocked, because other IO
> > depends on the lock it holds. Maybe I just ask bad questions, but what
>
> Only I/O to the same OSD.  A typical ceph cluster has dozens of OSDs,
> so there is plenty of room for other in-flight I/Os to finish and move
> the allocator forward.  The lock in question is per-ceph_connection
> (read: per-OSD).
>
> > makes GFP_NOIO different from GFP_KERNEL here? We know that the latter
> > might need to wait for an IO to finish in the shrinker, but it itself
> > doesn't take the lock in question directly. The former depends on the
> > allocator's forward progress as well, and that in turn waits for somebody
> > else to proceed with the IO. So to me any blocking allocation while
> > holding a lock which blocks further IO from completing is simply broken.
>
> Right, with GFP_NOIO we simply wait -- there is nothing wrong with
> a blocking allocation, at least in the general case.  With GFP_KERNEL
> we deadlock, either in rbd/libceph (less likely) or in the filesystem
> above (more likely, shown in the xfs_reclaim_inodes_ag() traces you
> omitted in your quote).

I am not convinced. It seems you are relying on something that is not
fundamentally guaranteed. AFAIU all the IO paths should _guarantee_
forward progress and use mempools for that purpose if they need to
allocate. But, hey, I will not argue, as my understanding of ceph is
close to zero. You are the maintainer, so it is your call. I would just
really appreciate it if you could document this as much as possible
(ideally at the place where you call memalloc_noio_save, and describe
the lock dependency there).

Thanks!
--
Michal Hocko
SUSE Labs
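
[Editorial note: a minimal sketch of the kind of comment plus scoped-NOIO
pattern being asked for above. The structure, function, and lock names are
illustrative placeholders, not the actual libceph code.]

	/* Sketch only -- names and layout are made up for illustration. */
	#include <linux/mutex.h>
	#include <linux/sched.h>	/* memalloc_noio_save/restore; <linux/sched/mm.h> on newer kernels */

	struct example_connection {
		struct mutex mutex;	/* serializes all I/O to one OSD */
		/* ... */
	};

	static void example_con_workfn(struct example_connection *con)
	{
		unsigned int noio_flag;

		/*
		 * Lock dependency being documented: con->mutex is held across
		 * message processing and therefore blocks further I/O to this
		 * OSD.  Any allocation made while it is held must not enter
		 * direct reclaim that can wait on I/O to this OSD (e.g. via
		 * the filesystem above us), or we deadlock.  Scope everything
		 * done under the lock to GFP_NOIO.
		 */
		noio_flag = memalloc_noio_save();
		mutex_lock(&con->mutex);

		/* ... socket allocation, message send/receive, etc. ... */

		mutex_unlock(&con->mutex);
		memalloc_noio_restore(noio_flag);
	}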
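
[Editorial note: likewise, a minimal sketch of the mempool pattern referred
to above -- reserving a few elements up front so an I/O path can always make
forward progress regardless of allocator state. The pool size, element size,
and names are made-up values, not taken from any existing driver.]

	#include <linux/init.h>
	#include <linux/mempool.h>
	#include <linux/slab.h>

	#define EXAMPLE_POOL_MIN	8	/* elements reserved for the I/O path (illustrative) */

	static mempool_t *example_msg_pool;

	static int __init example_pool_init(void)
	{
		/* Pre-allocate EXAMPLE_POOL_MIN 256-byte buffers at init time. */
		example_msg_pool = mempool_create_kmalloc_pool(EXAMPLE_POOL_MIN, 256);
		return example_msg_pool ? 0 : -ENOMEM;
	}

	static void *example_get_msg(void)
	{
		/*
		 * With a sleeping gfp mask, mempool_alloc() does not fail: if
		 * the page allocator cannot satisfy the request, it falls back
		 * to the reserved elements and otherwise waits for an element
		 * to be freed back to the pool, i.e. for our own I/O to
		 * complete, not for unrelated reclaim.
		 */
		return mempool_alloc(example_msg_pool, GFP_NOIO);
	}

	static void example_put_msg(void *msg)
	{
		mempool_free(msg, example_msg_pool);
	}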