On 23/07/10 13:19 +1000, Neil Brown wrote:
> On Thu, 22 Jul 2010 14:49:33 -0400
> Justin Bronder <jsbronder@xxxxxxxxxx> wrote:
>
> > On 16/07/10 14:46 -0400, Justin Bronder wrote:
> >
> > I've done some more research that may potentially help. All of
> > the following was done with 2.6.34.1.
> >
> > Still produces the hang:
> >  - Using cp (may take a bit longer).
> >  - Using jfs as the filesystem.
> >  - Dropping RESYNC_DEPTH to 32
> >  - Using the offset layout.
> >
> > Does not produce the hang:
> >  - Using the near layout.
> >  - Using dd on the partition directly instead of on a
> >    filesystem via something like:
> >      dd if=/dev/${MD_DEV}p1 of=/dev/${MD_DEV}p1 seek=4001 bs=1M
> >
> >
> > As the barrier code is very similar, I repeated a number of
> > these tests using raid1 instead of raid10. In every case, I was
> > unable to cause the system to hang. I focused on the barriers
> > due to the tracebacks in the previous email. For the heck of it,
> > I added some tracing (patch below) where the reason for the hang
> > is fairly obvious. Of course, how it happened isn't.
> >
> > The last bit of the trace before the hang.
>
> Thanks for doing this!
>
> See below...

<previous trace cut>

>
>
> So the 'dd' process successfully waited for the barrier to be gone at
> 189.021179, and thus set pending to '1'. It then submitted the IO request.
> We should then see swapper (or possibly some other thread) calling
> allow_barrier when the request completes. But we don't.
> A request could possibly take many milliseconds to complete, but it shouldn't
> take seconds and certainly not minutes.
>
> It might be helpful if you could run this again, and in make_request(), after
> the call to "wait_barrier()" print out:
>    bio->bi_sector, bio->bi_size, bio->bi_rw
>
> I'm guessing that the last request that doesn't seem to complete will be
> different from the others in some important way.

Nothing stood out to me, but here's the tail end of a couple of
different traces.

   <...>-5047 [002] 207.023784: wait_barrier: in: dd - w:0 p:11 b:0
   <...>-5047 [002] 207.023784: wait_barrier: out: dd - w:0 p:12 b:0
   <...>-5047 [002] 207.023785: make_request: dd - sector:7472001 sz:40960 rw:0
   <...>-4958 [002] 207.023872: raise_barrier: mid: md99_resync - w:0 p:12 b:1
   <...>-5047 [002] 207.024689: allow_barrier: dd - w:0 p:11 b:1
   <...>-5047 [002] 207.024695: allow_barrier: dd - w:0 p:10 b:1
   <...>-5047 [002] 207.024697: allow_barrier: dd - w:0 p:9 b:1
   <...>-5047 [002] 207.024710: allow_barrier: dd - w:0 p:8 b:1
   <...>-5047 [002] 207.024713: allow_barrier: dd - w:0 p:7 b:1
   <...>-5047 [002] 207.026679: wait_barrier: in: dd - w:0 p:7 b:1
   <idle>-0 [003] 207.043049: allow_barrier: swapper - w:1 p:6 b:1
   <idle>-0 [003] 207.043058: allow_barrier: swapper - w:1 p:5 b:1
   <idle>-0 [003] 207.043063: allow_barrier: swapper - w:1 p:4 b:1
   <idle>-0 [003] 207.043070: allow_barrier: swapper - w:1 p:3 b:1
   <idle>-0 [003] 207.043074: allow_barrier: swapper - w:1 p:2 b:1
   <idle>-0 [003] 207.043079: allow_barrier: swapper - w:1 p:1 b:1
   <idle>-0 [003] 207.043084: allow_barrier: swapper - w:1 p:0 b:1
   <...>-4958 [003] 207.043108: raise_barrier: out: md99_resync - w:1 p:0 b:1
   <...>-4958 [003] 207.043150: raise_barrier: in: md99_resync - w:1 p:0 b:1
   <...>-4957 [003] 207.051206: lower_barrier: md99_raid10 - w:1 p:0 b:0
   <...>-5047 [002] 207.051215: wait_barrier: out: dd - w:0 p:1 b:0
   <...>-5047 [002] 207.051216: make_request: dd - sector:7472081 sz:20480 rw:0
   <...>-4958 [003] 207.051218: raise_barrier: mid: md99_resync - w:0 p:1 b:1
   <...>-5047 [002] 207.051227: wait_barrier: in: dd - w:0 p:1 b:1
   <idle>-0 [002] 207.058929: allow_barrier: swapper - w:1 p:0 b:1
   <...>-4958 [003] 207.058938: raise_barrier: out: md99_resync - w:1 p:0 b:1
   <...>-4958 [003] 207.059044: raise_barrier: in: md99_resync - w:1 p:0 b:1
   <...>-4957 [003] 207.067171: lower_barrier: md99_raid10 - w:1 p:0 b:0
   <...>-5047 [002] 207.067179: wait_barrier: out: dd - w:0 p:1 b:0
   <...>-5047 [002] 207.067180: make_request: dd - sector:7472121 sz:3584 rw:0
   <...>-4958 [003] 207.067182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
   <...>-5047 [002] 207.067184: wait_barrier: in: dd - w:0 p:1 b:1

   <idle>-0 [000] 463.231730: allow_barrier: swapper - w:2 p:4 b:1
   <idle>-0 [000] 463.231739: allow_barrier: swapper - w:2 p:3 b:1
   <idle>-0 [000] 463.231746: allow_barrier: swapper - w:2 p:2 b:1
   <idle>-0 [000] 463.231765: allow_barrier: swapper - w:2 p:1 b:1
   <idle>-0 [000] 463.231774: allow_barrier: swapper - w:2 p:0 b:1
   <...>-5004 [000] 463.231792: raise_barrier: out: md99_resync - w:2 p:0 b:1
   <...>-5004 [000] 463.232005: raise_barrier: in: md99_resync - w:2 p:0 b:1
   <...>-5003 [001] 463.232453: lower_barrier: md99_raid10 - w:2 p:0 b:0
   <...>-5009 [000] 463.232463: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
   <...>-5009 [000] 463.232464: make_request: flush-9:99 - sector:13931137 sz:61440 rw:1
   <...>-5105 [001] 463.232466: wait_barrier: out: dd - w:0 p:2 b:0
   <...>-5105 [001] 463.232467: make_request: dd - sector:7204393 sz:40960 rw:0
   <...>-5009 [000] 463.232476: wait_barrier: in: flush-9:99 - w:0 p:2 b:0
   <...>-5009 [000] 463.232477: wait_barrier: out: flush-9:99 - w:0 p:3 b:0
   <...>-5009 [000] 463.232477: make_request: flush-9:99 - sector:13931257 sz:3584 rw:1
   <...>-5009 [000] 463.232481: wait_barrier: in: flush-9:99 - w:0 p:3 b:0
   <...>-5009 [000] 463.232482: wait_barrier: out: flush-9:99 - w:0 p:4 b:0
   <...>-5009 [000] 463.232483: make_request: flush-9:99 - sector:13931264 sz:512 rw:1
   <...>-5105 [001] 463.232492: wait_barrier: in: dd - w:0 p:4 b:0
   <...>-5105 [001] 463.232493: wait_barrier: out: dd - w:0 p:5 b:0
   <...>-5105 [001] 463.232494: make_request: dd - sector:7204473 sz:3584 rw:0
   <...>-5004 [000] 463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1
   <...>-5105 [001] 463.232496: wait_barrier: in: dd - w:0 p:5 b:1
   <...>-5009 [000] 463.232522: wait_barrier: in: flush-9:99 - w:1 p:5 b:1
   <idle>-0 [000] 463.232726: allow_barrier: swapper - w:2 p:4 b:1
   <idle>-0 [001] 463.240520: allow_barrier: swapper - w:2 p:3 b:1
   <idle>-0 [000] 463.240946: allow_barrier: swapper - w:2 p:2 b:1
   <idle>-0 [000] 463.240955: allow_barrier: swapper - w:2 p:1 b:1

Thanks,

-- 
Justin Bronder
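
For anyone wanting to sift through longer captures, here is a rough sketch (not part of the tracing patch) of a script that lists the tasks whose last wait_barrier event was an "in" with no matching "out". It assumes the line format shown in the traces above; the names parse_line and stuck_waiters are hypothetical, and the sample lines are copied from the end of the second trace.

```python
import re

# Assumed line shape, taken from the traces in this mail:
#   <...>-5105 [001] 463.232496: wait_barrier: in: dd - w:0 p:5 b:1
LINE_RE = re.compile(
    r"^\s*\S+-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<func>\w+):\s+(?:(?P<phase>in|mid|out):\s+)?"
    r"(?P<task>\S+)\s+-\s+(?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict for one trace line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ev = m.groupdict()
    ev["pid"] = int(ev["pid"])
    ev["ts"] = float(ev["ts"])
    # Trailing "key:value" pairs, e.g. w/p/b or sector/sz/rw.
    ev["fields"] = {
        key: int(val)
        for key, val in (kv.split(":", 1) for kv in ev["rest"].split())
    }
    return ev


def stuck_waiters(lines):
    """Map (task, pid) -> timestamp of a wait_barrier 'in' with no later 'out'."""
    waiting = {}
    for line in lines:
        ev = parse_line(line)
        if ev is None or ev["func"] != "wait_barrier":
            continue
        key = (ev["task"], ev["pid"])
        if ev["phase"] == "in":
            waiting[key] = ev["ts"]
        elif ev["phase"] == "out":
            waiting.pop(key, None)
    return waiting


SAMPLE = """\
<...>-5105 [001] 463.232493: wait_barrier: out: dd - w:0 p:5 b:0
<...>-5105 [001] 463.232494: make_request: dd - sector:7204473 sz:3584 rw:0
<...>-5004 [000] 463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1
<...>-5105 [001] 463.232496: wait_barrier: in: dd - w:0 p:5 b:1
<...>-5009 [000] 463.232522: wait_barrier: in: flush-9:99 - w:1 p:5 b:1
<idle>-0 [000] 463.232726: allow_barrier: swapper - w:2 p:4 b:1
"""

if __name__ == "__main__":
    for (task, pid), ts in sorted(stuck_waiters(SAMPLE.splitlines()).items()):
        print(f"{task}-{pid} still in wait_barrier since {ts:.6f}")
```

This only tracks wait_barrier in/out pairs per task. It does not try to match each make_request with its allow_barrier, since the tail of a trace does not contain the requests that were already in flight.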