Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:

> > Firstly, note that there may be a bug in ceph writeback cleanup as it
> > stands.  It calls folio_detach_private() without holding the folio lock
> > (it holds the writeback lock, but that's not sufficient by MM rules).
> > This means you have a race between { setting ->private, setting
> > PG_private and inc refcount } on one hand and { clearing ->private,
> > clearing PG_private and dec refcount } on the other.
> 
> I assume you imply ceph_invalidate_folio() method. Am I correct here?

Actually, no, writepages_finish() is the culprit.  ceph_invalidate_folio()
is called with the folio locked and can freely wrangle folio->private.

> > Secondly, there's a counter, ci->i_wrbuffer_ref, that might actually be
> > redundant if we do it right as I_PINNING_NETFS_WB offers an alternative
> > way we might do things.  If we set this bit, ->write_inode() will be
> > called with wbc->unpinned_netfs_wb set when all currently dirty pages
> > have been cleaned up (see netfs_unpin_writeback()).  netfslib currently
> > uses this to pin the fscache objects but it could perhaps also be used
> > to pin the writeback cap for ceph.
> 
> Yeah, ci->i_wrbuffer_ref looks like not very reliable programming
> pattern and if we can do it in other way, then it could be more safe
> solution. However, this counter is used in multiple places of ceph
> code. It needs to find a solution to get rid of this counter in safe
> and easy way.
> 
> > Thirdly, I was under the impression that, for any given page/folio,
> > only the head snapshot could be altered - and that any older snapshot
> > must be flushed before we could allow that.
> > 
> > Fourthly, the ceph_snap_context struct holds a list of snaps.  Does it
> > really need to, or is just the most recent snap for which the folio
> > holds changes sufficient?
> 
> Let me dive into the implementation details. Maybe, Alex can share more
> details here.

Thanks.
David
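---
As a rough, untested sketch of the first point (the helper name here is made
up, and whether it's okay to sleep on the folio lock in the writeback
completion path would need checking), the detach in writepages_finish() wants
doing under the folio lock so that it can't race with an attach:

static void ceph_detach_folio_snapc(struct folio *folio)
{
	struct ceph_snap_context *snapc;

	/* The folio lock serialises us against whoever sets ->private,
	 * sets PG_private and takes the extra ref on the other side.
	 */
	folio_lock(folio);
	snapc = folio_detach_private(folio);
	folio_unlock(folio);

	if (snapc)
		ceph_put_snap_context(snapc);
}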
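And a similarly rough sketch of the I_PINNING_NETFS_WB idea - illustration
only, as it would have to be folded into the existing ceph_write_inode() and
the choice of cap reference (CEPH_CAP_FILE_BUFFER here) checked against
wherever the pin would actually be taken when the first folio is dirtied:

static int ceph_write_inode(struct inode *inode, struct writeback_control *wbc)
{
	struct ceph_inode_info *ci = ceph_inode(inode);

	/* The VM has now cleaned the last folio dirtied whilst
	 * I_PINNING_NETFS_WB was set, so drop the cap ref taken at pin time.
	 */
	if (wbc->unpinned_netfs_wb)
		ceph_put_cap_refs(ci, CEPH_CAP_FILE_BUFFER);

	return 0;
}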