Re: Still seeing hangs in xlog_grant_log_space

Mark Tinguely <tinguely@xxxxxxx> · Fri, 18 May 2012 09:42:37 -0500

On 05/18/12 05:10, Dave Chinner wrote:

On Wed, May 16, 2012 at 04:29:01PM -0500, Mark Tinguely wrote:

What I know so far:
I have log cleaner kicker added to xlog_grant_head_wake(). This

The reserve space waiter list has perl scripts and the flusher.


What's a "log cleaner"? I've never heard that term before use for
XFS, so I can only assume you are talking about waking the xfsaild.
If that's the case, can you just say "pushing the AIL" rather
than making up new terminology?

You are right. I borrowed the term from the comments in the code to 
"kick the dirty buffers out".


kicker at best would prevent waiting for the next sync before
starting the log cleaner.

Can you post the patch?

Sure. This test has a small log, the transactions are large proportional 
to the log, and the processes are related. This results in the perl 
scripts quickly filling the log but the processes in the test become 
blocked. With no transactions coming in to push the cleaning of the AIL, 
the IO stalls until the next sync. After the sync, the backlog of 
waiters will quickly fill the log and everything stalls again until the 
next sync. We knew this could happen theory, but logs are generally not 
this small to see it in reality. The patch just keeps cleaning while 
there are waiters for log space. If another transaction has already 
started the cleaning process, this new request to clean is ignored.


I have one machine that has been running for 2 days without hanging.
Actually, now I would prefer it to hurry up and hang.


The machine running the test for 3+ day finally hung. It has the same 
pattern as the other hangs: 200-250K of the reserved space left, an 
empty AIL and small amount of reserved space on the CIL.

Here is what see on the machine that is hung:

A few processes (4-5) are hung waiting to get space on the log.
There isn't enough free space on the log for the first transaction
and it waits. All other processes will have to wait behind the first
process. 251,811 bytes of the original 589,842 bytes should still be
free (if my hand free space calculations are correct).

The AIL is empty.

So the CIL has consumed ~50% of the log, and the background flusher
has not triggered? Why didn't the background CIL flush fire?

There is nothing to clean. Any new transaction at
this point will kick the cleaner, and it still can't start the first
waiter, so it joins the wait list.

Assuming you mean the cleaner is the xfsaild, then if the AIL is
empty, waking the xfsaild will do nothing because there is nothing
for it to act on.

Yes, the AIL is empty.


The only XFS traffic at this point is inode reclaim worker. This is
to be expected.

The CIL has entries, nothing is waiting on the CIL.
xc_current_sequence = 117511 xc_push_seq = 117510. So there is
nothing for the CIL worker to do.

It means that we're stalled on the CIL, not on the AIL. The question
is, as I asked above, why didn't the background flusher fire when
the last transaction completed and saw the CIL over the hard flush
threshold?

If you dump the CIL context structure, you can find out how much
space has been consumed by the items queued on the CIL.

The CIL seems to have only 30-70K reserved (depending on which hang you
look at). it does not meet the XLOG_CIL_SPACE_LIMIT for the background
worker to push the CIL.

Since this does not match up with the approx 300-350K that is reported 
to be reserved, I traversed the log_item list to see if that CIL amount 
seemed reasonable, and it does.


117511 is the largest sequence number that I have found so far in
the xfs_log_item list. There are a few entries with smaller sequence
numbers and the following strange entry:

77th entry in the xfs_log_item list:

crash>  struct xfs_log_item ffff88083222b5b8
struct xfs_log_item {
   li_ail = {
     next = 0xffff88083222b5b0,

that points 8 bytes backwards.

     prev = 0x0

And that indicates that this was never part of a log item as the
list head is always initialised.

   },
   li_lsn = 0,

never committed

   li_desc = 0x9f5d9f5d,

that's supposed to be a pointer

   li_mountp = 0xffff88083116e300,
   li_ailp = 0x0,

Never initialised as a log item.

   li_type = 0,

Impossible.

   li_flags = 0,
   li_bio_list = 0x0,
   li_cb = 0,
   li_ops = 0xffff88083105de00,
   li_cil = {
     next = 0xffff880832ad9f08,
     prev = 0xffff880831751448
   },
   li_lv = 0xc788c788,

supposed to be a pointer

   li_seq = -131906182637504

That's a pointer: 0xFFFF880832D71840

So this looks like it some form of memory corruption - either a use
after free, the memory has been been overwritten or the list has
been pointed off into lala land.

Everything in this entry is bad except the li_cil.next and
li_cil.prev. It looks like li_ail.next is really part of a list that
starts at 0xffff88083222b5b0. The best explanation is a junk
addresses was inserted into the li_cil chain.

It seems unlikely, but if you turn on kmemleak it might find a
memory leak or overwrite that is causing this.


This is a single data point which could be anything including bad
hardware. I will continue to traverse this list until I can get the
other box to hang. If someone want to traverse their xfs_log_item
list ...

Given how little it looks like a log item, I'm not sure you can
follow those pointers - do they even link up with other log items?

Yes, this was the second to last entry. The next/previous links were 
correct, but everything else in this structure is bad. That is why I 
suspect the wrong address was inserted into the log_item list.

I came to the same conclusion that this does not explain the problem, 
the other hangs do not have a corrupted entry. As strange as this entry 
is, I decided to ignore this entry and go back and try to account for 
all the reserved space.


Still, this doesn't explain the hang at all - the CIL forms a new
list every time a checkpoint occurs, and this corruption would cause
a crash trying to walk the li_lv list when pushed. So it comes back
to why hasn't the CIL been pushed? what does the CIL context
structure look like?

The CIL context on the machine that was running 3+ days before hanging.

struct xfs_cil_ctx {
  cil = 0xffff88034a8c5240,
  sequence = 1241833,
  start_lsn = 0,
  commit_lsn = 0,
  ticket = 0xffff88034e0ebc08,
  nvecs = 237,
  space_used = 39964,
  busy_extents = {
    next = 0xffff88034b287958,
    prev = 0xffff88034d10c698
  },
  lv_chain = 0x0,
  log_cb = {
    cb_next = 0x0,
    cb_func = 0,
    cb_arg = 0x0
  },
  committing = {
    next = 0xffff88034c84d120,
    prev = 0xffff88034c84d120
  }
}


--Mark Tinguely.


Start the cleaning of the log when still full after last clean.
---
 fs/xfs/xfs_log.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: b/fs/xfs/xfs_log.c
===================================================================

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -191,8 +191,10 @@ xlog_grant_head_wake(
 
 	list_for_each_entry(tic, &head->waiters, t_queue) {
 		need_bytes = xlog_ticket_reservation(log, head, tic);
-		if (*free_bytes < need_bytes)
+		if (*free_bytes < need_bytes) {
+			xlog_grant_push_ail(log, need_bytes);
 			return false;
+		}
 
 		*free_bytes -= need_bytes;
 		trace_xfs_log_grant_wake_up(log, tic);
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs