On 2011.06.29 at 14:31 +1000, Dave Chinner wrote:
> On Wed, Jun 22, 2011 at 05:30:47PM +1000, Dave Chinner wrote:
> > On Wed, Jun 22, 2011 at 09:06:47AM +0200, Markus Trippelsdorf wrote:
> > > On 2011.06.22 at 10:04 +1000, Dave Chinner wrote:
> > > > On Tue, Jun 21, 2011 at 08:57:01PM +0200, Markus Trippelsdorf wrote:
> > > >
> > > > That will at least tell us if this is the cause of your problem. If
> > > > it is, I think I know how to avoid most of the list walk overhead
> > > > fairly easily and that should avoid the need to change workqueue
> > > > configurations at all.
> > >
> > > The kernel log is attached.
> >
> > Ok, so that is the cause of the problem. The 3 seconds of output
> > where it is nothing but:
> >
> > Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
> > Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, walked 15503 items
> > .....
> > Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
> > Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, walked 16945 items
> >
> > Interesting is the LSN of the tail - it's only one sector further on
> > than the items being inserted. That's what I'd expect from a commit
> > record write race between two checkpoints. I'll have a deeper look
> > into whether this can be avoided later tonight and also whether I
> > can easily implement a "last insert cursor" easily so subsequent
> > inserts at the same LSN avoid the walk....
>
> Ok, so here's a patch that does just this. I should probably also do
> a little bit of cleanup on the cursor code as well, but this avoids
> the repeated walks of the AIL to find the insert position.
>
> Can you try it without the WQ changes you made, Marcus, and see if
> the interactivity problems go away?
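
For anyone following the thread, the "last insert cursor" idea quoted
above can be illustrated with a small standalone sketch. This is not the
patch Dave posted and it is not XFS code: SortedLog, insert, walked and
demo are hypothetical names, and the list is just a doubly linked list
kept in ascending LSN order that is searched backwards from its end. The
assumption is that remembering where the previous item went lets a later
insert at the same LSN skip the backwards walk.

```python
class Node:
    """One list entry; lsn is None only for the sentinel."""
    __slots__ = ("lsn", "prev", "next")

    def __init__(self, lsn):
        self.lsn = lsn
        self.prev = self.next = None


class SortedLog:
    """Hypothetical stand-in for a list sorted by ascending LSN."""

    def __init__(self):
        self.head = Node(None)          # circular sentinel
        self.head.prev = self.head.next = self.head
        self.cursor = None              # node added by the last insert
        self.walked = 0                 # total nodes stepped over

    def _link_after(self, pos, node):
        node.prev = pos
        node.next = pos.next
        pos.next.prev = node
        pos.next = node

    def insert(self, lsn, use_cursor=True):
        node = Node(lsn)
        if use_cursor and self.cursor is not None and self.cursor.lsn == lsn:
            # Same LSN as the previous insert: reuse its position, no walk.
            pos = self.cursor
        else:
            # Walk backwards from the end until an entry with LSN <= lsn.
            pos = self.head.prev
            while pos is not self.head and pos.lsn > lsn:
                pos = pos.prev
                self.walked += 1
        self._link_after(pos, node)
        # Assumption: nothing is removed here, so the cursor stays valid.
        # A real list would have to invalidate it when that node goes away.
        self.cursor = node

    def lsns(self):
        out = []
        n = self.head.next
        while n is not self.head:
            out.append(n.lsn)
            n = n.next
        return out


def demo(use_cursor):
    """1000 items at LSN 10, then 10 out-of-order inserts at LSN 9."""
    log = SortedLog()
    log.insert(5, use_cursor)
    for _ in range(1000):
        log.insert(10, use_cursor)
    for _ in range(10):
        log.insert(9, use_cursor)
    return log
```

In this toy setup, demo(False) steps over 1000 entries for each of the
ten out-of-order inserts, while demo(True) pays for the walk only once.
It says nothing about why the real patch behaved as reported below.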

Sorry to be the bringer of bad news, but this made things much worse:

  -------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage------ --dsk/sdc-- ---system-- ---load-avg--- --dsk/sdc--
  usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| int   csw | 1m   5m  15m |reads writs
    1   1  98   0   0   0:  0   1  99   0   0   0:  0   1  99   0   0   0:  0   1  99   0   0   0|   0     0 | 603   380 |0.66 0.55 0.28|   0     0
    1   0  99   0   0   0:  1   0  99   0   0   0:  1  19  80   0   0   0:  0   0 100   0   0   0|   0     0 | 719   383 |0.66 0.55 0.28|   0     0
    3   1  96   0   0   0:  3   1  96   0   0   0:  1  52  47   0   0   0:  0   0 100   0   0   0|   0  6464k|1847   919 |0.66 0.55 0.28|   0   202
    2  13  85   0   0   0:  2   2  96   0   0   0:  1  56  43   0   0   0:  1  31  69   0   0   0|4096B  256k|1910  1280 |0.68 0.56 0.28|   1     8
>   0   1  99   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0:  0 100   0   0   0   0|   0     0 |1256   170 |0.68 0.56 0.28|   0     0
>   0   1  99   0   0   0:  1   1  98   0   0   0:  1   0  99   0   0   0:  0  99   0   0   0   1|   0     0 |1395   229 |0.68 0.56 0.28|   0     0
>   0   0 100   0   0   0:  0   0 100   0   0   0:  0   3  97   0   0   0:  0 100   0   0   0   0|   0   512B|1304   167 |0.68 0.56 0.28|   0     1
>   1   1  98   0   0   0:  1   1  98   0   0   0:  0   0 100   0   0   0:  0  99   0   0   0   1|   0     0 |1211   146 |0.68 0.56 0.28|   0     0
>   0   0 100   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0:  0  97   0   0   0   3|   0     0 |1270   149 |0.87 0.60 0.30|   0     0
    5   2  65  29   0   0:  2   3  95   0   0   0:  1   0  99   0   0   0:  2  24  72   0   0   1|   0  8866k|2654  2398 |0.87 0.60 0.30|   0   496
    6   2  25  67   0   0:  3   1  59  37   0   0:  0   0 100   0   0   0:  4   4  92   0   0   0|   0  4554k|2224  2494 |0.87 0.60 0.30|   0   399
    1   1  98   0   0   0:  0   0  83  17   0   0:  1   3  96   0   0   0:  0   1  99   0   0   0|   0  2270k|1079  1030 |0.87 0.60 0.30|   0   200
    1   1  98   0   0   0:  1   1  98   0   0   0:  0   1  99   0   0   0:  1   0  99   0   0   0|   0  9216B| 713   567 |0.87 0.60 0.30|   0     2
    0   0 100   0   0   0:  1   1  98   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0|   0     0 | 492   386 |0.80 0.59 0.30|   0     0

As you can see in the table above (resolution 1sec) the hang is now
5-6 seconds long, instead of the 1-3 seconds seen before.

--
Markus