Re: linux-next: Tree for Aug 7 [ call-trace on suspend: ext4 | pm related ? ]

Sedat Dilek <sedat.dilek@xxxxxxxxx> · Thu, 8 Aug 2013 01:15:44 +0200

On Thu, Aug 8, 2013 at 12:58 AM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
> Can you try adding a call to show_state_filter(TASK_UNINTERRUPTIBLE) in
> the error path of try_to_freeze_tasks(), where it prints the "refusing
> to freeze" message?  It will print the stack trace of every thread
> since they are all in the freezer, so the output will be very long.
>

If you provide a patch, I will give it a try.
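
In the meantime, here is a rough sketch for pulling the refusing task and
its reliable call chain out of dmesg output after a failed freeze. It is
only an illustration: the regular expressions are assumptions based on the
log format quoted below, not a kernel interface, and the function name
parse_freeze_failure() is made up for this example. It also assumes each
kernel message is on a single line, so mail-wrapped lines have to be
rejoined first.

```python
import re

# Shortened sample taken from the trace quoted in this thread.
SAMPLE = """\
[ 5487.970574] Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 5487.970591] DOM Worker      D ffffffff81811820     0  2437      1 0x00000004
[ 5487.970604] Call Trace:
[ 5487.970612]  [<ffffffff81144360>] ? __lock_page+0x70/0x70
[ 5487.970615]  [<ffffffff816e8179>] schedule+0x29/0x70
[ 5487.970618]  [<ffffffff816e824f>] io_schedule+0x8f/0xd0
[ 5487.970644]  [<ffffffff81246c3d>] ext4_sync_file+0x15d/0x340
[ 5487.970648]  [<ffffffff811db8dd>] do_fsync+0x5d/0x90
[ 5487.970651]  [<ffffffff811dbcc0>] SyS_fsync+0x10/0x20
[ 5487.970659] Restarting tasks ... done.
"""

# Summary line printed when the freezer gives up (format assumed from the log).
FREEZE_RE = re.compile(
    r"Freezing of tasks failed after ([\d.]+) seconds \((\d+) tasks refusing"
)
# Task header: comm (may contain spaces), state letter, 16-digit hex, free, pid, ppid, flags.
TASK_RE = re.compile(
    r"^\[\s*[\d.]+\]\s(.+?)\s+([A-Z])\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+$"
)
# Stack frame: "[<addr>] symbol+0xoff/0xlen"; a leading "?" marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_freeze_failure(log):
    """Return the freeze timeout, the refusing-task count and per-task frames."""
    result = {"seconds": None, "refusing": None, "tasks": []}
    current = None
    for line in log.splitlines():
        m = FREEZE_RE.search(line)
        if m:
            result["seconds"] = float(m.group(1))
            result["refusing"] = int(m.group(2))
            continue
        m = TASK_RE.match(line)
        if m:
            current = {
                "comm": m.group(1),
                "state": m.group(2),
                "pid": int(m.group(3)),
                "frames": [],
            }
            result["tasks"].append(current)
            continue
        m = FRAME_RE.search(line)
        # Keep only reliable frames (those without the "?" marker).
        if m and current is not None and not m.group(1):
            current["frames"].append(m.group(2))
    return result


if __name__ == "__main__":
    info = parse_freeze_failure(SAMPLE)
    for task in info["tasks"]:
        # Innermost frame first, same order as the kernel prints it.
        print(task["comm"], task["pid"], task["state"], " <- ".join(task["frames"]))
```

Reading the frames from the innermost outwards shows the task sleeping in
io_schedule() underneath ext4_sync_file(), i.e. an fsync waiting for I/O.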

- Sedat -

> On Wed, Aug 7, 2013 at 4:02 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> On Wednesday, August 07, 2013 04:25:14 PM Sedat Dilek wrote:
>>> On Wed, Aug 7, 2013 at 7:54 AM, Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx> wrote:
>>> > Hi all,
>>> >
>>> > Changes since 20130806:
>>> >
>>> > The ext4 tree lost its build failure.
>>> >
>>> > The mvebu tree gained a build failure so I used the version from
>>> > next-20130806.
>>> >
>>> > The akpm tree gained conflicts against the ext4 tree.
>>> >
>>> > ----------------------------------------------------------------------------
>>> >
>>>
>>> [ CC ext4 and pm folks ]
>>>
>>> I saw this on my first suspend attempt, which failed (on the second and
>>> third tries I could suspend and resume):
>>>
>>> [ 5467.724074] PM: Syncing filesystems ... done.
>>> [ 5467.973575] PM: Preparing system for mem sleep
>>> [ 5467.974121] Freezing user space processes ...
>>> [ 5487.970574] Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
>>> [ 5487.970591] DOM Worker      D ffffffff81811820     0  2437      1 0x00000004
>>> [ 5487.970595]  ffff880056ca3ca8 0000000000000002 00000000002d627f 000009af00000002
>>> [ 5487.970598]  ffff880066ede640 ffff880056ca3fd8 ffff880056ca3fd8 ffff880056ca3fd8
>>> [ 5487.970601]  ffff880119f98340 ffff880066ede640 ffff880056ca3ca8 ffff88011fad5118
>>> [ 5487.970604] Call Trace:
>>> [ 5487.970612]  [<ffffffff81144360>] ? __lock_page+0x70/0x70
>>> [ 5487.970615]  [<ffffffff816e8179>] schedule+0x29/0x70
>>> [ 5487.970618]  [<ffffffff816e824f>] io_schedule+0x8f/0xd0
>>> [ 5487.970621]  [<ffffffff8114436e>] sleep_on_page+0xe/0x20
>>> [ 5487.970624]  [<ffffffff816e4be2>] __wait_on_bit+0x62/0x90
>>> [ 5487.970627]  [<ffffffff81144f9b>] ? find_get_pages_tag+0xcb/0x170
>>> [ 5487.970630]  [<ffffffff811444d0>] wait_on_page_bit+0x80/0x90
>>> [ 5487.970633]  [<ffffffff8108a0e0>] ? wake_atomic_t_function+0x40/0x40
>>> [ 5487.970636]  [<ffffffff811445ec>] filemap_fdatawait_range+0x10c/0x190
>>> [ 5487.970640]  [<ffffffff81145ce0>] filemap_write_and_wait_range+0x50/0x80
>>> [ 5487.970644]  [<ffffffff81246c3d>] ext4_sync_file+0x15d/0x340
>>> [ 5487.970648]  [<ffffffff811db8dd>] do_fsync+0x5d/0x90
>>> [ 5487.970651]  [<ffffffff811dbcc0>] SyS_fsync+0x10/0x20
>>> [ 5487.970655]  [<ffffffff816f25ef>] tracesys+0xe1/0xe6
>>> [ 5487.970658]
>>> [ 5487.970659] Restarting tasks ... done.
>>>
>>> With yesterday's -next I did not have issues like this.
>>
>> It looks like ext4 was doing fsync, so it scheduled a write and waited for it
>> to complete, but that never happened (most likely whoever was supposed to do
>> the write had already been frozen by then).
>>
>> Thanks,
>> Rafael
>>