Re: [PATCH] xfs: serialize iclog write of xlog_cil_push

Mark Tinguely <tinguely@xxxxxxx> · Mon, 07 Jan 2013 12:03:21 -0600

On 01/05/13 18:08, Dave Chinner wrote:
On Sat, Jan 05, 2013 at 02:34:15PM -0600, Mark Tinguely wrote:
The back-end of xlog_cil_push() allows multiple push sequences
to write to the xlog at the same time.

It does this by design, and has since day zero.

This will cause problems
for recovery and also could cause the xlog_cil_committed() callback
to be called out of sequence.

Log recovery is supposed to be able to handle it just fine in that
recovery only replays up to the last checkpoint with a valid commit
record. Checkpoints that don't have valid commit records - no matter
the order they are written - will terminate recovery at the LSN of
the lowest entire commit.

This was discovered with an EFI/EFD misorder. There are several (5) active
sequence pushes and 3 completed pushes. The callback for the sequence (2)
holding the EFD is being processed but the callback for the sequence (1)
holding the EFI has not yet been processed.

What are the symptoms shown when this problem is hit?  AFAICT, there
is no problem here - this comment above __xfs_efi_release() explains
that EFI/EFD misordering is expected:

/*
  * Freeing the efi requires that we remove it from the AIL if it has already
  * been placed there. However, the EFI may not yet have been placed in the AIL
  * when called by xfs_efi_release() from EFD processing due to the ordering of
  * committed vs unpin operations in bulk insert operations. Hence the
  * test_and_clear_bit(XFS_EFI_COMMITTED) to ensure only the last caller frees
  * the EFI.
  */

This was introduced in this commit:

$ gl -n 1 b199c8a4
commit b199c8a4ba11879df87daad496ceee41fdc6aa82
Author: Dave Chinner<dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 11:59:49 2010 +1100

     xfs: Pull EFI/EFD handling out from under the AIL lock

     EFI/EFD interactions are protected from races by the AIL lock. They
     are the only type of log items that require the the AIL lock to
     serialise internal state, so they need to be separated from the AIL
     lock before we can do bulk insert operations on the AIL.

     To acheive this, convert the counter of the number of extents in the
     EFI to an atomic so it can be safely manipulated by EFD processing
     without locks. Also, convert the EFI state flag manipulations to use
     atomic bit operations so no locks are needed to record state
     changes. Finally, use the state bits to determine when it is safe to
     free the EFI and clean up the code to do this neatly.

     Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
     Reviewed-by: Christoph Hellwig<hch@xxxxxx>

Hence if there is some problem being seen as a result of allowing this
behaviour, I'd like to know what that problem actually is....

The filesystem is in forced shutdown with the following error:
xfs_trans_ail_delete_bulk: attempting to delete a log item that is not 
in the AIL.

The EFI is in the CIL sequence #1, the EFD is in sequence #2.

Sequence #2 is running the cil callback but sequence #1 has not yet run 
its callback.

Normally, xfs_efi_item_committed() sets the XFS_EFI_COMMITTED bit. 
Either the xfs_efi_item_unpin() or the xfs_efi_release() could happen 
first and the comparison in __xfs_efi_release() compensates for that.

But if the xfs_efi_item_committed() is not called before the 
xfs_efi_release() is called from the xfs_efd_item_committed(), then we 
will get the "not in the AIL" shutdown.

The xlog_cil_committed() callback misorder happens because the buffer that
contains the sequence ticket is filled by another sequence push and the
callback for the buffer write happens before the callback is placed onto
that buffer.

I'm not sure I follow you here.  xfs_log_done() takes a reference to
the iclog that the commit record is added to, and I/O cannot be
issued on that iclog until the reference count drops to zero. Hence
the sequence of writing the commit record, obtaining the commit_lsn,
adding the callbacks to the iclog and releasing the iclog are atomic
from an I/O perspective, and IO is only issued when the reference
count falls to zero.

And given that xlog_write() uses the same reference counting to
provide the same guarantees, I cannot see how concurrent in-memory
writes to the same iclog could cause IO completion callbacks to be
issued out of order.

I will look again at the ic_refcnt but it the callback from cil push 
sequence #2 has jumped in front of cil push #1.

This patch serializes the xlog_write() so that only one sequence (the lowest)
is written at a time. This will also stop the race between xlog_commit_record()
and the adding of the callback onto the buffer containing the sequence commit
record.

The point of the separation of the commit record from the
commit "data" is to allow interleaving of multiple transaction
commits without serialising the transaction commit process. It's
actually a significant performance win when dealing with synchronous
transaction heavy workloads as it allows multiple concurrent
synchronous transactions to aggregate into a single log buffer. From
that perspective alone, that's a NACK from me to to this patch...

Without knowing what the problem you are actually seeing is, I can't
make any further suggestions or comments about whether there is a
real issue here.

Cheers,

Dave.

--Mark.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs