On Wed, 2024-12-18 at 19:48 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:
> 
> > <skipped>
> > 
> > > Thirdly, I was under the impression that, for any given page/folio,
> > > only the head snapshot could be altered - and that any older
> > > snapshot must be flushed before we could allow that.

As far as I can see, ceph_dirty_folio() attaches [1] a pointer to struct
ceph_snap_context to folio->private. So it sounds like a folio cannot have
an associated snapshot context until it is marked dirty. Conversely,
ceph_invalidate_folio() detaches [2] the ceph_snap_context from
folio->private, writepage_nounlock() detaches [3] it from the page, and
writepages_finish() detaches [4] it from the page. So, technically
speaking, a folio/page should have an associated snapshot context only
while it is dirty.

The struct ceph_snap_context represents a set of existing snapshots:

struct ceph_snap_context {
	refcount_t nref;
	u64 seq;
	u32 num_snaps;
	u64 snaps[];
};

The snapshot context is prepared by build_snap_context(), and the set of
existing snapshots includes: (1) the parent inode's snapshots [5],
(2) the inode's own snapshots [6], (3) prior parent snapshots [7].

 * When a snapshot is taken (that is, when the client receives
 * notification that a snapshot was taken), each inode with caps and
 * with dirty pages (dirty pages implies there is a cap) gets a new
 * ceph_cap_snap in the i_cap_snaps list (which is sorted in ascending
 * order, new snaps go to the tail).

So, ceph_dirty_folio() takes the latest ceph_cap_snap:

	if (__ceph_have_pending_cap_snap(ci)) {
		struct ceph_cap_snap *capsnap =
				list_last_entry(&ci->i_cap_snaps,
						struct ceph_cap_snap,
						ci_item);
		snapc = ceph_get_snap_context(capsnap->context);
		capsnap->dirty_pages++;
	} else {
		BUG_ON(!ci->i_head_snapc);
		snapc = ceph_get_snap_context(ci->i_head_snapc);
		++ci->i_wrbuffer_ref_head;
	}

 * On writeback, we must submit writes to the osd IN SNAP ORDER. So,
 * we look for the first capsnap in i_cap_snaps and write out pages in
 * that snap context _only_. Then we move on to the next capsnap,
 * eventually reaching the "live" or "head" context (i.e., pages that
 * are not yet snapped) and are writing the most recently dirtied
 * pages.

For example, writepage_nounlock() executes this logic [8]:

	oldest = get_oldest_context(inode, &ceph_wbc, snapc);
	if (snapc->seq > oldest->seq) {
		doutc(cl, "%llx.%llx page %p snapc %p not writeable - noop\n",
		      ceph_vinop(inode), page, snapc);
		/* we should only noop if called by kswapd */
		WARN_ON(!(current->flags & PF_MEMALLOC));
		ceph_put_snap_context(oldest);
		redirty_page_for_writepage(wbc, page);
		return 0;
	}
	ceph_put_snap_context(oldest);

So, we should flush all dirty pages/folios in snapshot order. But I am not
sure that we modify a snapshot by making pages/folios dirty. I think we
simply add a capsnap to the list and build a new snapshot context when a
new snapshot is created.
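Just to illustrate the ordering rule, below is my simplified sketch (not
the actual get_oldest_context() implementation; pick_oldest_snapc() is a
made-up name, and the sketch ignores locking and the extra conditions the
real helper checks). Since i_cap_snaps is sorted in ascending order, the
first capsnap that still has dirty pages defines the snap context we are
allowed to write out now; only when no capsnaps are pending do we get to
flush the live ("head") context:

	/*
	 * Simplified illustration only, not the kernel implementation:
	 * pick the snap context whose dirty pages must be written out first.
	 */
	static struct ceph_snap_context *
	pick_oldest_snapc(struct ceph_inode_info *ci)
	{
		struct ceph_cap_snap *capsnap;

		/* i_cap_snaps is sorted oldest -> newest */
		list_for_each_entry(capsnap, &ci->i_cap_snaps, ci_item) {
			if (capsnap->dirty_pages > 0)
				return ceph_get_snap_context(capsnap->context);
		}

		/* no pending capsnaps left: the live ("head") context is next */
		return ceph_get_snap_context(ci->i_head_snapc);
	}

A writeback path like writepage_nounlock() then compares the folio's own
snapc->seq with the seq of this oldest context, as in the excerpt above:
if the folio belongs to a newer context, it is redirtied and has to wait
for its turn.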
> > > Fourthly, the ceph_snap_context struct holds a list of snaps. Does
> > > it really need to, or is just the most recent snap for which the
> > > folio holds changes sufficient?

As far as I can see, the main goal of ceph_snap_context is the accounting
of all snapshots that a particular inode and all of its parents have. And
all of these could have dirty pages. So, the responsibility of
ceph_snap_context is to make it possible to flush dirty folios/pages in
snapshot order for all inodes in the hierarchy.

I could be missing some details. :) But I hope the answer helps.

Thanks,
Slava.

[1] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L127
[2] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L157
[3] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L800
[4] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L911
[5] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L391
[6] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L399
[7] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L402
[8] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L695