On 5/1/2017 6:12 AM, Brian Foster wrote: > On Sun, Apr 30, 2017 at 11:10:15PM -0700, Ming Lin wrote: >> >> On 4/28/2017 1:56 PM, Ming Lin wrote: >>> I'm new to xfs code. >>> >>> Search XFS_TRANS_INACTIVE and the usage is like below, >>> >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE); >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0); >>> >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE); >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, XFS_IFREE_SPACE_RES(mp), 0); >>> >>> seems tr_remove is not related. >>> I'll just try to enlarge the reservation for tr_itruncate and tr_ifree. >> >> Now things are a little bit more clear. I tried below debug patch. >> The t_decrease[] array was used to track where the space was used. >> >> fs/xfs/libxfs/xfs_trans_resv.c | 4 ++-- >> fs/xfs/xfs_log.c | 23 ++++++++++++++++++++--- >> fs/xfs/xfs_log_cil.c | 8 ++++++++ >> fs/xfs/xfs_log_priv.h | 3 +++ >> fs/xfs/xfs_super.c | 1 + >> 5 files changed, 34 insertions(+), 5 deletions(-) >> > ... >> 277 static void >> 278 xlog_cil_insert_items( >> 279 struct xlog *log, >> 280 struct xfs_trans *tp) >> 281 { >> >> .... >> >> 340 /* do we need space for more log record headers? */ >> 341 iclog_space = log->l_iclog_size - log->l_iclog_hsize; >> 342 if (len > 0 && (ctx->space_used / iclog_space != >> 343 (ctx->space_used + len) / iclog_space)) { >> 344 int hdrs; >> 345 >> 346 hdrs = (len + iclog_space - 1) / iclog_space; >> 347 /* need to take into account split region headers, too */ >> 348 hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header); >> 349 ctx->ticket->t_unit_res += hdrs; >> 350 ctx->ticket->t_curr_res += hdrs; >> 351 tp->t_ticket->t_curr_res -= hdrs; >> 352 tp->t_ticket->t_decrease[6] = hdrs; >> 353 ASSERT(tp->t_ticket->t_curr_res >= len); >> 354 } >> 355 tp->t_ticket->t_curr_res -= len; >> 356 tp->t_ticket->t_decrease[7] = len; >> 357 ctx->space_used += len; >> 358 >> 359 spin_unlock(&cil->xc_cil_lock); >> 360 } >> >> Any idea why it used so many reservation space here? >> > > Nothing really rings a bell for me atm. Perhaps others might have ideas. > That does appear to be a sizable overrun, as opposed to a few bytes that > could more likely be attributed to rounding, header accounting issues or > something of that nature. FYI, here are some numbers. The original "unit res" is 83024. I made it x2 larger, so now it's 166048 "unit res" - "current res" = the reservation space already used XFS (nvme10n1p1): xlog_write: reservation summary: trans type = INACTIVE (3) unit res = 166048 bytes current res = 77088 bytes total reg = 0 bytes (o/flow = 0 bytes) ophdrs = 0 (ophdr space = 0 bytes) ophdr + reg = 0 bytes num regions = 0 "already used" = 166048 - 77088 = 88960 overrun = 88960 - 83024 = 5936 XFS (nvme7n1p1): xlog_write: reservation summary: trans type = INACTIVE (3) unit res = 166048 bytes current res = 53444 bytes total reg = 0 bytes (o/flow = 0 bytes) ophdrs = 0 (ophdr space = 0 bytes) ophdr + reg = 0 bytes num regions = 0 "already used" = 166048 - 53444 = 112604 overrun = 112604 - 83024 = 29580 The overrun bytes seems a lot to me. > > The debug code doesn't really tell us much beyond that the transaction > required logging more data than it had reserved. In the snippet above, > len essentially refers to a byte total of what is logged across all of > the various items (inode, buffers, etc.) in the transaction. > > I'm assuming you can reproduce this often enough if you can capture It takes about 10 hours to reproduce the problem. > debug information. Have you tried to reproduce the actual transaction > overrun without using Ceph (i.e., create the fs using ceph as normal, > but run the object removal directly)? If you can do that, you could Not exactly same. But I did try just write the xfs fs with fio(64 threads) to 80% full, then remove the files, but can't reproduce it. > create an xfs_metadump of the populated fs, run a more simple reproducer > on that and that might make it easier to 1.) try newer distro and/or > upstream kernels to try and isolate where the problem exists and/or 2.) > share it so we can try to reproduce and narrow down where the overrun > seems to occur (particularly if this hasn't already been fixed > somewhere). I'll try to find a more simple reproducer. Thanks, Ming > > Brian > >> Thanks, >> Ming -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html