Re: [PATCH 33/39] xfs: Add order IDs to log items in CIL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 02, 2021 at 05:49:14PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 03, 2021 at 10:16:22AM +1000, Dave Chinner wrote:
> > On Thu, May 27, 2021 at 12:00:23PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 19, 2021 at 10:13:11PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > 
> > > > Before we split the ordered CIL up into per cpu lists, we need a
> > > > mechanism to track the order of the items in the CIL. We need to do
> > > > this because there are rules around the order in which related items
> > > > must physically appear in the log even inside a single checkpoint
> > > > transaction.
> > > > 
> > > > An example of this is intents - an intent must appear in the log
> > > > before it's intent done record so taht log recovery can cancel the
> > > 
> > > s/taht/that/
> > > 
> > > > intent correctly. If we have these two records misordered in the
> > > > CIL, then they will not be recovered correctly by journal replay.
> > > > 
> > > > We also will not be able to move items to the tail of
> > > > the CIL list when they are relogged, hence the log items will need
> > > > some mechanism to allow the correct log item order to be recreated
> > > > before we write log items to the hournal.
> > > > 
> > > > Hence we need to have a mechanism for recording global order of
> > > > transactions in the log items  so that we can recover that order
> > > > from un-ordered per-cpu lists.
> > > > 
> > > > Do this with a simple monotonic increasing commit counter in the CIL
> > > > context. Each log item in the transaction gets stamped with the
> > > > current commit order ID before it is added to the CIL. If the item
> > > > is already in the CIL, leave it where it is instead of moving it to
> > > > the tail of the list and instead sort the list before we start the
> > > > push work.
> > > > 
> > > > XXX: list_sort() under the cil_ctx_lock held exclusive starts
> > > > hurting that >16 threads. Front end commits are waiting on the push
> > > > to switch contexts much longer. The item order id should likely be
> > > > moved into the logvecs when they are detacted from the items, then
> > > > the sort can be done on the logvec after the cil_ctx_lock has been
> > > > released. logvecs will need to use a list_head for this rather than
> > > > a single linked list like they do now....
> > > 
> > > ...which I guess happens in patch 35 now?
> > 
> > Right. I'll just remove this from the commit message.
> > 
> > > > @@ -780,6 +780,26 @@ xlog_cil_build_trans_hdr(
> > > >  	tic->t_curr_res -= lvhdr->lv_bytes;
> > > >  }
> > > >  
> > > > +/*
> > > > + * CIL item reordering compare function. We want to order in ascending ID order,
> > > > + * but we want to leave items with the same ID in the order they were added to
> > > 
> > > When do we have items with the same id?
> > 
> > All the items in a single transaction have the same id. The order id
> > increments before we tag all the items in the transaction and insert
> > them into the CIL.
> > 
> > > I guess that happens if we have multiple transactions adding items to
> > > the cil at the same time?  I guess that's not a big deal since each of
> > > those threads will hold a disjoint set of locks, so even if the order
> > > ids are the same for a bunch of items, they're never going to be
> > > touching the same AG/inode/metadata object, right?
> > >
> > > If that's correct, then:
> > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > 
> > While true, it's not the way this works so I won't immediately
> > accept your RVB. The reason for not changing the ordering within a
> > single transaction is actually intent logging.  i.e. this:
> > 
> > > > + * the list. This is important for operations like reflink where we log 4 order
> > > > + * dependent intents in a single transaction when we overwrite an existing
> > > > + * shared extent with a new shared extent. i.e. BUI(unmap), CUI(drop),
> > > > + * CUI (inc), BUI(remap)...
> > 
> > There's a specific order of operations that recovery must run these
> > intents in, and so if we re-order them here in the CIL they'll be
> > out of order in the log and recovery will replay the intents in the
> > wrong order. Replaying the intents in the wrong order results in
> > corruption warnings and assert failures during log recovery, hence
> > the constraint of not re-ordering items within the same transaction.
> 
> <ding> lightbulb comes on.  I think I understood this better the last
> time I read all these patches. :/
> 
> Basically, for each item that can be attached to a transaction, you're
> assigning it an "order id" that is a monotonically increasing counter
> that (roughly) records the last time the item was committed.  Certain
> items (like inodes) can be relogged and committed multiple times in
> rapid fire succession, in which case the order_id will get bumped
> forward.

Effectively, yes.

> In the /next/ patch you'll change the cil item list to be per-cpu and
> only splice the mess together at cil push time.  For that to work
> properly, you have to re-sort that resulting list in commit order (aka
> the order_id) to keep the items in order of commit.
> 
> For items *within* a transaction, you take advantage of the property
> of list_sort that it won't reorder items with cmp(a, b) == 0, which
> means that all the intents logged to a transaction will maintain the
> same order that the author of higher level code wrote into the software.

Correct.

> Question: xlog_cil_push_work zeroes the order_id of pushed log items.
> Is there any potential problem here when ctx->order_id wraps around to
> zero?  I think the answer is that we'll move on to a new cil context
> long before we hit 2^32-1 transactions?

Yes. At the moment, the max transaction rate is about 800k/s, which
means it'd take a couple of hours to run 4 billion transactions. So
we're in no danger of overruning the number of transactions in a CIL
commit any time soon. And if we ever get near that, we can just bump
the counter to a 64 bit value...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux