RE: Ceph and Netfslib

On Wed, 2024-12-18 at 19:48 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:

<skipped>

> > > Thirdly, I was under the impression that, for any given page/folio,
> > > only the head snapshot could be altered - and that any older
> > > snapshot must be flushed before we could allow that.
As far as I can see, ceph_dirty_folio() attaches [1] a pointer to
struct ceph_snap_context to folio->private. So it sounds like a folio
cannot have an associated snapshot context until it is marked dirty.

Conversely, ceph_invalidate_folio() detaches [2] the ceph_snap_context
from folio->private, writepage_nounlock() detaches [3] it from the
page, and writepages_finish() detaches [4] it from the page as well.
So, technically speaking, a folio/page should have an associated
snapshot context only while it is dirty.
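
In other words, the lifecycle looks roughly like below. This is only a
minimal sketch of the idea (the helper names are mine, not the
kernel's), but folio_attach_private()/folio_detach_private() and
ceph_get_snap_context()/ceph_put_snap_context() are the real
primitives involved:

static void snapc_attach_on_dirty(struct folio *folio,
				  struct ceph_snap_context *snapc)
{
	/* dirtying path: pin the snap context the dirty data belongs
	 * to and remember it in folio->private */
	folio_attach_private(folio, ceph_get_snap_context(snapc));
}

static void snapc_detach_on_clean(struct folio *folio)
{
	/* invalidate/writeback-completion path: the dirty data is gone
	 * or written, so detach the context and drop the reference */
	struct ceph_snap_context *snapc = folio_detach_private(folio);

	ceph_put_snap_context(snapc);
}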

The struct ceph_snap_context represents a set of existing snapshots:

struct ceph_snap_context {
	refcount_t nref;	/* reference count */
	u64 seq;		/* snap context sequence number */
	u32 num_snaps;		/* number of entries in snaps[] */
	u64 snaps[];		/* snapids, in descending order */
};
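
Just to make the layout concrete, here is a purely hypothetical example
(the snapids are made up, but ceph_create_snap_context() and
ceph_put_snap_context() are the real refcounting helpers from
net/ceph/snapshot.c) of a context saying "snapshots 12 and 10 exist":

	struct ceph_snap_context *snapc;

	snapc = ceph_create_snap_context(2, GFP_NOFS); /* nref = 1 */
	if (!snapc)
		return -ENOMEM;
	snapc->seq = 12;	/* newest snap sequence we know about */
	snapc->snaps[0] = 12;	/* snapids, newest first */
	snapc->snaps[1] = 10;
	/* ... use it ... */
	ceph_put_snap_context(snapc);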

The snapshot context is prepared by build_snap_context(), and the set
of existing snapshots includes: (1) the parent inode's snapshots [5],
(2) the inode's own snapshots [6], and (3) prior parent snapshots [7].
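
Roughly, the merge in build_snap_context() looks like this (a heavily
trimmed paraphrase of fs/ceph/snap.c, so take the details with a grain
of salt - the real code also re-uses a still-valid cached context,
handles allocation failures, etc.):

	num = 0;
	snapc->seq = realm->seq;
	if (parent) {
		/* (1) parent's snaps taken after we became its child */
		for (i = 0; i < parent->cached_context->num_snaps; i++)
			if (parent->cached_context->snaps[i] >=
			    realm->parent_since)
				snapc->snaps[num++] =
					parent->cached_context->snaps[i];
		if (parent->cached_context->seq > snapc->seq)
			snapc->seq = parent->cached_context->seq;
	}
	/* (2) this realm's own snaps */
	memcpy(snapc->snaps + num, realm->snaps,
	       sizeof(u64) * realm->num_snaps);
	num += realm->num_snaps;
	/* (3) snaps inherited from a prior parent */
	memcpy(snapc->snaps + num, realm->prior_parent_snaps,
	       sizeof(u64) * realm->num_prior_parent_snaps);
	num += realm->num_prior_parent_snaps;
	/* keep the snapids sorted, newest first */
	sort(snapc->snaps, num, sizeof(u64), cmpu64_rev, NULL);
	snapc->num_snaps = num;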


 * When a snapshot is taken (that is, when the client receives
 * notification that a snapshot was taken), each inode with caps and
 * with dirty pages (dirty pages implies there is a cap) gets a new
 * ceph_cap_snap in the i_cap_snaps list (which is sorted in ascending
 * order, new snaps go to the tail).

So, ceph_dirty_folio() takes the latest ceph_cap_snap:

	if (__ceph_have_pending_cap_snap(ci)) {
		struct ceph_cap_snap *capsnap =
				list_last_entry(&ci->i_cap_snaps,
						struct ceph_cap_snap,
						ci_item);
		snapc = ceph_get_snap_context(capsnap->context);
		capsnap->dirty_pages++;
	} else {
		BUG_ON(!ci->i_head_snapc);
		snapc = ceph_get_snap_context(ci->i_head_snapc);
		++ci->i_wrbuffer_ref_head;
	}
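
For reference, the condition in the first branch boils down to "the
most recent capsnap has not been finalized yet". If I am reading
fs/ceph/super.h right, __ceph_have_pending_cap_snap() is essentially:

static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
{
	/* is the newest capsnap still marked as writing? */
	return !list_empty(&ci->i_cap_snaps) &&
	       list_last_entry(&ci->i_cap_snaps, struct ceph_cap_snap,
			       ci_item)->writing;
}

So a dirty folio gets accounted either to that pending capsnap's
context or to the live i_head_snapc.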


 * On writeback, we must submit writes to the osd IN SNAP ORDER.  So,
 * we look for the first capsnap in i_cap_snaps and write out pages in
 * that snap context _only_.  Then we move on to the next capsnap,
 * eventually reaching the "live" or "head" context (i.e., pages that
 * are not yet snapped) and are writing the most recently dirtied
 * pages

For example, writepage_nounlock() implements the following logic [8]:

	oldest = get_oldest_context(inode, &ceph_wbc, snapc);
	if (snapc->seq > oldest->seq) {
		doutc(cl, "%llx.%llx page %p snapc %p not writeable - noop\n",
		      ceph_vinop(inode), page, snapc);
		/* we should only noop if called by kswapd */
		WARN_ON(!(current->flags & PF_MEMALLOC));
		ceph_put_snap_context(oldest);
		redirty_page_for_writepage(wbc, page);
		return 0;
	}
	ceph_put_snap_context(oldest);

So, we should flush all dirty pages/folios in snapshot order. But I am
not sure that we modify a snapshot by making pages/folios dirty. I
think we simply add a capsnap to the list and build a new snapshot
context when a new snapshot is created.
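
The get_oldest_context() call in the excerpt above is what enforces
that order: it returns the snap context that has to be flushed first.
Below is a simplified sketch of the idea only - the helper name is
mine, and the real function in fs/ceph/addr.c also fills in the
ceph_writeback_ctl state and has a fast path for the page's own snapc:

static struct ceph_snap_context *
oldest_dirty_snapc(struct ceph_inode_info *ci)
{
	struct ceph_cap_snap *capsnap;
	struct ceph_snap_context *snapc = NULL;

	spin_lock(&ci->i_ceph_lock);
	/* i_cap_snaps is kept in ascending snap order, so the first
	 * capsnap that still has dirty pages must be flushed first */
	list_for_each_entry(capsnap, &ci->i_cap_snaps, ci_item) {
		if (capsnap->dirty_pages) {
			snapc = ceph_get_snap_context(capsnap->context);
			break;
		}
	}
	/* no snapped dirty data is pending: write against the head */
	if (!snapc && ci->i_head_snapc)
		snapc = ceph_get_snap_context(ci->i_head_snapc);
	spin_unlock(&ci->i_ceph_lock);

	return snapc;
}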


> > > Fourthly, the ceph_snap_context struct holds a list of snaps. Does
> > > it really need to, or is just the most recent snap for which the
> > > folio holds changes sufficient?

As far as I can see, the main goal of ceph_snap_context is to account
for all snapshots that a particular inode and all of its parents have.
And all of these inodes could have dirty pages. So, the responsibility
of ceph_snap_context is to make it possible to flush dirty
folios/pages in snapshot order for all inodes in the hierarchy.


I could be missing some details. :) But I hope the answer helps.

Thanks,
Slava.

[1]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L127
[2]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L157
[3]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L800
[4]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L911
[5]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L391
[6]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L399
[7]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L402
[8]
https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L695





