Re: xlog_write: reservation ran out

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 04, 2017 at 10:34:35AM +1000, Dave Chinner wrote:
> On Wed, May 03, 2017 at 10:10:32AM -0400, Brian Foster wrote:
> > On Wed, May 03, 2017 at 09:51:52AM +1000, Dave Chinner wrote:
> > > On Mon, May 01, 2017 at 10:41:06AM -0700, Ming Lin wrote:
> > > > On 5/1/2017 6:12 AM, Brian Foster wrote:
> > > > > On Sun, Apr 30, 2017 at 11:10:15PM -0700, Ming Lin wrote:
> > > > >>
> > > > >> On 4/28/2017 1:56 PM, Ming Lin wrote:
> > > > >>> I'm new to xfs code.
> > > > >>>
> > > > >>> Search  XFS_TRANS_INACTIVE and the usage is like below,
> > > > >>>
> > > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> > > > >>>
> > > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, XFS_IFREE_SPACE_RES(mp), 0);
> > > > >>>
> > > > >>> seems tr_remove is not related.
> > > > >>> I'll just try to enlarge the reservation for tr_itruncate and tr_ifree.
> > > > >>
> > > > >> Now things are a little bit more clear. I tried below debug patch.
> > > > >> The t_decrease[] array was used to track where the space was used.
> > > > >>
> > > > >>  fs/xfs/libxfs/xfs_trans_resv.c |  4 ++--
> > > > >>  fs/xfs/xfs_log.c               | 23 ++++++++++++++++++++---
> > > > >>  fs/xfs/xfs_log_cil.c           |  8 ++++++++
> > > > >>  fs/xfs/xfs_log_priv.h          |  3 +++
> > > > >>  fs/xfs/xfs_super.c             |  1 +
> > > > >>  5 files changed, 34 insertions(+), 5 deletions(-)
> > > > >>
> > > > > ...
> > > > >>  277 static void
> > > > >>  278 xlog_cil_insert_items(
> > > > >>  279         struct xlog             *log,
> > > > >>  280         struct xfs_trans        *tp)
> > > > >>  281 {
> > > > >>
> > > > >> ....
> > > > >>
> > > > >>  340         /* do we need space for more log record headers? */
> > > > >>  341         iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> > > > >>  342         if (len > 0 && (ctx->space_used / iclog_space !=
> > > > >>  343                                 (ctx->space_used + len) / iclog_space)) {
> > > > >>  344                 int hdrs;
> > > > >>  345
> > > > >>  346                 hdrs = (len + iclog_space - 1) / iclog_space;
> > > > >>  347                 /* need to take into account split region headers, too */
> > > > >>  348                 hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> > > > >>  349                 ctx->ticket->t_unit_res += hdrs;
> > > > >>  350                 ctx->ticket->t_curr_res += hdrs;
> > > > >>  351                 tp->t_ticket->t_curr_res -= hdrs;
> > > > >>  352                 tp->t_ticket->t_decrease[6] = hdrs;
> > > > >>  353                 ASSERT(tp->t_ticket->t_curr_res >= len);
> > > > >>  354         }
> > > > >>  355         tp->t_ticket->t_curr_res -= len;
> > > > >>  356         tp->t_ticket->t_decrease[7] = len;
> > > > >>  357         ctx->space_used += len;
> > > > >>  358
> > > > >>  359         spin_unlock(&cil->xc_cil_lock);
> > > > >>  360 }
> > > > >>
> > > > >> Any idea why it used so many reservation space here?
> > > > >>
> > > > > 
> > > > > Nothing really rings a bell for me atm. Perhaps others might have ideas.
> > > > > That does appear to be a sizable overrun, as opposed to a few bytes that
> > > > > could more likely be attributed to rounding, header accounting issues or
> > > > > something of that nature.
> > > > 
> > > > FYI, here are some numbers.
> > > > 
> > > > The original "unit res" is 83024. I made it x2 larger, so now it's 166048
> > > > "unit res" - "current res" = the reservation space already used
> > > > 
> > > > XFS (nvme10n1p1): xlog_write: reservation summary:
> > > >   trans type  = INACTIVE (3)
> > > >   unit res    = 166048 bytes
> > > >   current res = 77088 bytes
> > > >   total reg   = 0 bytes (o/flow = 0 bytes)
> > > >   ophdrs      = 0 (ophdr space = 0 bytes)
> > > >   ophdr + reg = 0 bytes
> > > >   num regions = 0
> > > 
> > > So the problem here is that the ticket summary is being printed
> > > after everything useful has been removed from the ticket and moved
> > > to the CIL. i.e. for more useful debug, this overrun condition
> > > needs to be checked between the xlog_cil_insert_format_items() call
> > > and moving the items off the transaction in xfs_log_commit_cil().
> > > 
> > > That way xlog_print_tic_res() will actually tell us how many regions
> > > are being logged and give us a much better idea of the scope of
> > > changes logged in the transaction that lead to the overrun.
> > > 
> > 
> > Hmm.. interesting idea, but I'm not sure that just moving the ticket
> > summary prior to the CIL insertion tells us much. For one, this is a
> > transaction ticket and afaict the region count bits are only really used
> > for a CIL ticket, unless I'm missing something..?
> 
> Seems like it - I was talking about dumping the logged regions in the
> transaction log items, not the formatted log vectors passed to
> xlog_write() from the CIL.
> 

Ok. I wasn't suggesting the latter, fwiw. Just pointing out that afaict,
moving xlog_print_tic_res() without making further enhancements looked
like it would just change the "current res" value that ends up printed.

> We've got the log items and their vectors still attached to the
> transaction pointer after the call to xlog_cil_insert_format_items()
> (i.e. on the tp->t_items list), and we know what the formatted
> length is going to be and so we can directly check for overrun
> before doing anything else. i.e. something like:
> 

Indeed...

> ....
> 	xlog_cil_insert_format_items(log, tp, &len, &diff_iovecs);
> 
> 	/* account for space used by new iovec headers  */
> 	len += diff_iovecs * sizeof(xlog_op_header_t);
> 
> 	/* trivial overrun check */
> 	if (tp->t_ticket->t_curr_res - len < 0) {
> 		/* factor printing into helper */
> 
> 		xfs_warn("transaction reservation overrun <transaction info>")
> 		/* dump transaction/ticket info */
> 
> 		list_for_each_entry(lidp, &tp->t_items, lid_trans) {
> 			struct xfs_log_item     *lip = lidp->lid_item;
> 			struct xfs_log_iovec	*lv = lip->li_lv;
> 
> 			/* dump log item info */
> 			xfs_warn("Log item type, flags, ...");
> 
> 			/* dump lv array info */
> 
> 			/* dump formatted log vector regions */
> 			for (i = 0; i < lv->lv_niovecs; i++) {
> 				/* dump individual lv info */
> 				/* dump first 32 bytes */
> 			}
> 		}
> 	}
> 

Seems reasonable. I suppose if we do the check at this level rather than
in the caller, we could move the CIL insert code to the last step before
we drop ->xc_cil_lock and have an accurate account of the reservation
pulled off the transaction. E.g.:

{
	...
	tp->t_ticket->t_curr_res -= len;
	ctx->space_used += len;

	/* dump tp on overrun prior to item removal */
	if (tp->t_ticket->t_curr_res < 0)
		xlog_print_tic(tp)

	/* reposition to CIL ... */
	list_for_each_entry(lidp, &tp->t_items, lid_trans) {
		...
		list_move_tail(...);
	}
	spin_unlock(&cil->xc_cil_lock);
}

... where xlog_print_tic() can call xlog_print_tic_res() and then dump
the t_items list as above.

> > IOWs, it seems to me
> > that if we had such pre-CIL format overrun detection, we'd probably want
> > to add new code to dump information about what is attached to the
> > transaction so we can analyze what was logged against the actual
> > reservation.
> 
> Yes, that's exactly what I'm suggesting needs to be done.
> 

Ok, I'll play around with it. Thanks for the clarification.

Brian

> Keep in mind this is what xlog_print_tic_res() did prior to the
> introduction of delayed logging. i.e. I broke the detailed region
> logging of transaction overruns 7 years ago when changing it to
> provide detailed logging of CIL ticket reservation overruns, and
> this is the first time since then that we've needed transaction
> overrun debugging...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux