Hi Fengguang and Christoph,

On 06/09/2011 08:11 PM, Wu Fengguang wrote:
> On Thu, Jun 09, 2011 at 07:02:14PM +0800, Christoph Hellwig wrote:
>> On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote:
>>> I have a sync livelock test script and it sometimes livelocked on XFS
>>> even with the livelock fix patches. Ext4 is always OK.
>>
>> This sounds similar to the cfq issue just posted to lkml as
>> "CFQ: async queue blocks the whole system".

Just to add some more detail about the situation here. The flusher is
far too easily blocked by sync requests, and whenever it is blocked it
takes quite a long time to get going again (because of several CFQ
design decisions). So do you think we could use WRITE_SYNC for the bdev
inodes in the flusher?

AFAICS, in most cases when a volume is mounted, writeback of a bdev
inode means metadata writeback. That metadata is very important to a
file system and should be written out as soon as possible.

I ran my test cases with this change, and the livelock no longer shows
up.

Regards,
Tao

>
> I once ran two dd's doing sequential reads and writes in parallel, and
> found the write dd to be completely blocked (note: I can no longer
> reproduce it today, on 3.0-rc2). At the time I thought: "Wow, this is
> good for a typical desktop". But yes, it is livelocking async IOs,
> which is bad.
>
>> Does this happen with non-CFQ I/O schedulers, too?
>
> Just tried the deadline scheduler; sync times are still long:
>
>         echo deadline > /sys/block/sda/queue/scheduler
>
>         sync time: 21
>         sync time: 22
>         sync time: 29
>
> Also tried disabling the cfq low latency feature,
>
>         echo cfq > /sys/block/sda/queue/scheduler
>         echo 0 > /sys/block/sda/queue/iosched/low_latency
>
> however the low_latency value seems to have no noticeable effect on
> the sync time (and also doesn't considerably improve async dd write
> bandwidth in the presence of another parallel dd read).
>
>         sync time: 19
>         sync time: 24
>         sync time: 22
>
>>> [ 3581.185120] [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f
>>
>> This waits for the I/O completion to actually arrive - something that
>> XFS does correctly in both sync and fsync, but ext4 only does for fsync.
>
> Would it help to flush the disk _once_ at the end of sync?
> (Perhaps that's not as easy in complex storage setups or whatever.)
>
>> It might have some issues in the way it's implemented, I'll look if
>> we can do something. But I suspect cfq delaying async writes too much
>> is definitely going to cause issues for us here.
>
> It's definitely a problem that cfq delays async writes too much.
> However, in Carlos's report,
>
>         https://bugzilla.kernel.org/attachment.cgi?id=61222
>
> there is no sync(1) or fsync running at all, so it may be indicating
> a different problem.
>
> Thanks,
> Fengguang
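
P.S. To make the idea concrete, below is a rough, untested sketch of the
kind of change I mean, in the style of the write_op selection done in the
->writepage paths. The helper name and the S_ISBLK() test for spotting a
bdev inode are only illustrative assumptions, not the exact patch I ran:

	#include <linux/fs.h>
	#include <linux/writeback.h>

	/*
	 * Sketch only: choose the block-layer write flag for a page being
	 * written back.  The idea is to treat writeback of bdev inodes
	 * (i.e. the metadata of a mounted filesystem) as WRITE_SYNC so
	 * that CFQ does not park it behind a long stream of async writes.
	 */
	static int choose_write_op(struct inode *inode,
				   struct writeback_control *wbc)
	{
		/* Existing behaviour: data-integrity writeback is sync. */
		if (wbc->sync_mode == WB_SYNC_ALL)
			return WRITE_SYNC;

		/*
		 * Proposed change: bdev inodes carry a mounted
		 * filesystem's metadata pages, so promote them too.
		 */
		if (S_ISBLK(inode->i_mode))
			return WRITE_SYNC;

		return WRITE;
	}

A ->writepage implementation would then do something like
"int write_op = choose_write_op(inode, wbc);" and pass write_op to
submit_bh() instead of deriving it from wbc->sync_mode alone.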