On Tue, Jun 15, 2021 at 11:37:49AM -0400, Brian Foster wrote: > On Tue, Jun 15, 2021 at 04:46:57PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > For the DEBUGS! > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_log.c | 18 ++++++++++++++++ > > fs/xfs/xfs_trace.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 70 insertions(+) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index e921b554b683..54fd6a695bb5 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -524,6 +524,7 @@ __xlog_state_release_iclog( > > iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn); > > xlog_verify_tail_lsn(log, iclog, tail_lsn); > > /* cycle incremented when incrementing curr_block */ > > + trace_xlog_iclog_syncing(iclog, _RET_IP_); > > return true; > > } > > > > @@ -543,6 +544,7 @@ xlog_state_release_iclog( > > { > > lockdep_assert_held(&log->l_icloglock); > > > > + trace_xlog_iclog_release(iclog, _RET_IP_); > > if (iclog->ic_state == XLOG_STATE_IOERROR) > > return -EIO; > > > > @@ -804,6 +806,7 @@ xlog_wait_on_iclog( > > { > > struct xlog *log = iclog->ic_log; > > > > + trace_xlog_iclog_wait_on(iclog, _RET_IP_); > > Seems like this might be more informative if we actually wait. This tells us the state at the time the call is made, so we know from that if we waited or not. Being able to see all calls to this function was instrumental in understanding the stiuation where we were waiting and when we weren't waiting. This is the iclog we got stuck waiting on: xlog_iclog_wait_on: dev 259:0 state 0x3 refcnt 0 offset 32256 lsn 0x2300004be0 caller xlog_cil_push_work The state is XLOG_STATE_DONE_SYNC, indicating IO had been completed, and the lsn tells me it is a future iclog we are waiting on, which tells us that it's waiting for a previous iclog to complete IO before it can move forward... > > if (!XLOG_FORCED_SHUTDOWN(log) && > > iclog->ic_state != XLOG_STATE_ACTIVE && > > iclog->ic_state != XLOG_STATE_DIRTY) { > > @@ -1804,6 +1807,7 @@ xlog_write_iclog( > > unsigned int count) > > { > > ASSERT(bno < log->l_logBBsize); > > + trace_xlog_iclog_write(iclog, _RET_IP_); > > > > /* > > * We lock the iclogbufs here so that we can serialise against I/O > > @@ -1950,6 +1954,7 @@ xlog_sync( > > unsigned int size; > > > > ASSERT(atomic_read(&iclog->ic_refcnt) == 0); > > + trace_xlog_iclog_sync(iclog, _RET_IP_); > > > > count = xlog_calc_iclog_size(log, iclog, &roundoff); > > > > @@ -2488,6 +2493,7 @@ xlog_state_activate_iclog( > > int *iclogs_changed) > > { > > ASSERT(list_empty_careful(&iclog->ic_callbacks)); > > + trace_xlog_iclog_activate(iclog, _RET_IP_); > > > > I wonder a bit whether these state change oriented tracepoints could be > served by a single trace_ixlog_iclog_state() or some such, since the > tracepoint already prints the state. That wouldn't show the before -> > state in each tracepoint, but that should reflect in the full log. I disagree. We lose a -lot- of context by getting rid of the per-call site trace names, especially those that get called from multiple places. Not to mention tracing an ACTIVE, idle iclog can result in zero state information except for the tracepoint name... e.g: 384.204393: xlog_iclog_write: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004b60 caller xlog_sync 384.204406: xlog_iclog_sync_done: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004b60 caller xlog_ioend_work 384.204407: xlog_iclog_get_space: dev 259:0 state 0x0 refcnt 1 offset 0 lsn 0x0 caller xlog_write_get_more_iclog_space 384.204408: xlog_iclog_switch: dev 259:0 state 0x0 refcnt 1 offset 0 lsn 0x2300004ba0 caller xlog_state_get_iclog_space 384.204418: xlog_iclog_release: dev 259:0 state 0x1 refcnt 1 offset 32256 lsn 0x2300004ba0 caller xlog_write_get_more_iclog_space 384.204419: xlog_iclog_syncing: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004ba0 caller xlog_write_get_more_iclog_space 384.204419: xlog_iclog_sync: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004ba0 caller xlog_state_release_iclog 384.204422: xlog_iclog_write: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004ba0 caller xlog_sync 384.204435: xlog_iclog_sync_done: dev 259:0 state 0x2 refcnt 0 offset 32256 lsn 0x2300004ba0 caller xlog_ioend_work See how the xlog_iclog_get_space trace has zero state/lsn? That indicates that it has actually found an idle, active iclog in the correct state for it to hand out. We can also see this progression is through xlog_write() filling mulitple iclogs. However, we can get different progressions from different contexts such as: 384.193766: xlog_iclog_get_space: dev 259:0 state 0x0 refcnt 1 offset 0 lsn 0x0 caller xlog_write_get_more_iclog_space 384.193778: xlog_iclog_release: dev 259:0 state 0x0 refcnt 1 offset 27352 lsn 0x23000042b0 caller xlog_write 384.193779: xlog_iclog_get_space: dev 259:0 state 0x0 refcnt 1 offset 27352 lsn 0x23000042b0 caller xlog_write 384.193780: xlog_iclog_wait_on: dev 259:0 state 0x0 refcnt 0 offset 0 lsn 0x0 caller xlog_cil_push_work 384.193781: xlog_iclog_release: dev 259:0 state 0x0 refcnt 1 offset 27364 lsn 0x23000042b0 caller xlog_cil_push_work 384.201268: xlog_iclog_get_space: dev 259:0 state 0x0 refcnt 1 offset 27364 lsn 0x23000042b0 caller xlog_write .... the competion of a checkpoint where xlog_write() releases the iclog and it's not a full iclog. This is followed immediately by the commit record write which gets iclog space from a different caller location, passes it back to xlog_cil_push_work which then releases it. ANd then we try to wait on the previous iclog which is an idle ACTIVE buffer, so we skip the wait. And because this isn't a "stable CIL push", we don't see a xlog_iclog_switch() trace, and the so iclog is not flushed to disk when it is release here. IOWs, if we don't have the individual trace names, we largely lose all this context and then have to infer it from caller and state changes that occur. That just makes things unnnecessarily difficult when looking for problems in hundreds of thousands trace events.... > > @@ -3138,6 +3153,8 @@ xfs_log_force( > > if (iclog->ic_state == XLOG_STATE_IOERROR) > > goto out_error; > > > > + trace_xlog_iclog_force(iclog, _RET_IP_); > > + > > We have a log force tracepoint a few lines up. Do we need both? Yes. The log force tracepoint tells us nothing about the iclog state we are about to operate on. > > + TP_ARGS(iclog, caller_ip), > > + TP_STRUCT__entry( > > + __field(dev_t, dev) > > + __field(uint32_t, state) > > + __field(int32_t, refcount) > > + __field(uint32_t, offset) > > + __field(unsigned long long, lsn) > > + __field(unsigned long, caller_ip) > > + ), > > + TP_fast_assign( > > + __entry->dev = iclog->ic_log->l_mp->m_super->s_dev; > > + __entry->state = iclog->ic_state; > > + __entry->refcount = atomic_read(&iclog->ic_refcnt); > > + __entry->offset = iclog->ic_offset; > > + __entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn); > > + __entry->caller_ip = caller_ip; > > + ), > > + TP_printk("dev %d:%d state 0x%x refcnt %d offset %u lsn 0x%llx caller %pS", > > + MAJOR(__entry->dev), MINOR(__entry->dev), > > + __entry->state, > > It might be nice to see a string representation of the state like we do > for other things like log item type, etc. Yup, I thought about that after I sent when looking at other code. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx