On 01/07/2013 01:05 PM, Sage Weil wrote: > On Sun, 6 Jan 2013, Yan, Zheng wrote: >> On 01/06/2013 02:00 PM, Sage Weil wrote: >>> On Fri, 4 Jan 2013, Yan, Zheng wrote: >>>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx> >>>> >>>> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin(). >>>> So ceph_get_caps() can be called while i_mutex is locked. If there is >>>> pending vmtruncate, ceph_get_caps() will wait for it to complete, but >>>> ceph_vmtruncate_work() is blocked by the locked i_mutex. >>> >>> Hmm... :/ >>> >>>> There are several places that call __ceph_do_pending_vmtruncate() >>>> without holding the i_mutex, I think it's OK to not acquire the i_mutex >>>> in ceph_vmtruncate_work() >>> >>> The intention was that that function woudl only be called under i_mutex. >>> I did a quick look through the callers and that appears to be the case >>> (for things llseek and setattr, the vfs should be taking i_mutex). >> >> both ceph_aio_read() and ceph_aio_write() call __ceph_do_pending_vmtruncate() >> without holding the i_mutex > > Hrm.. that's now how I remember it working. I'm pretty sure i_mutex was > providing the serialization around truncation. > >>> IIRC, this is to serialize the page cache truncation with truncate >>> operations; this work can only be sanely done under i_mutex, so we defer >>> it to the work queue or next person who takes i_mutex and cares about the >>> mapping and i_size being consistent. >>> >>> What was the deadlock you observed? >> >> generic_file_aio_write() locks the i_mutex, then indirectly calls ceph_get_caps() >> through ceph_write_begin(). ceph_get_caps() wait for pending vmtruncate to complete, >> but the work queue is blocked by the i_mutex. > > ...but it seems clear that that's not a workable solution. I suspect the > right solution here is to introduce an inner i_trunc_mutex in the ceph > inode and use that to protect truncation events in the page cache. That > will touch a lot of different code paths, unfortunately, but I think > something like that is necessary if we need to block with i_mutex held by > the VFS. > Maybe ceph_vmtruncate_work() can wait until no one is using write cap. I think it's safe to truncate page cache in that case. Regards Yan, Zheng -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html