Re: linux-next: Tree for Aug 7 [ call-trace on suspend: ext4 | pm related ? ]

Sedat Dilek <sedat.dilek@xxxxxxxxx> · Thu, 8 Aug 2013 11:17:53 +0200

On Thu, Aug 8, 2013 at 10:46 AM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> On Thu, Aug 8, 2013 at 3:12 AM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
>> On Wed, Aug 7, 2013 at 6:01 PM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
>>> On Thu, Aug 8, 2013 at 1:34 AM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
>>>> On Wed, Aug 7, 2013 at 4:15 PM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
>>>>> On Thu, Aug 8, 2013 at 12:58 AM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
>>>>>> Can you try add a call to show_state_filter(TASK_UNINTERRUPTIBLE) in
>>>>>> the error path of try_to_freeze_tasks(), where it prints the "refusing
>>>>>> to freeze" message?  It will print the stack trace of every thread
>>>>>> since they are all in the freezer, so the output will be very long.
>>>>>>
>>>>>
>>>>> If you provide a patch, I will give it a try.
>>>>
>>>> Try the attached patch.
>>>
>>> This time I do not see ext4 related messages.
>>>
>>> - Sedat -
>>
>> Can you describe your filesystem setup?  It looks like you have an
>> ntfs fuse filesystem and a loopback ext4 mount?  Is the file backing
>> the loopback ext4 filesystem located on the ntfs filesystem?
>>
>
> [ CC fuse folks ]
>
> This is a Ubuntu/precise AMD64 installed in a WUBI environment (rootfs
> is an image on my Win7-partition and thus uses fuse/ntfs/loop).
>
> $ cat /etc/fstab
> # /etc/fstab: static file system information.
>
> # Use 'blkid' to print the universally unique identifier for a
> # device; this may be used with UUID= as a more robust way to name devices
> # that works even if disks are added and removed. See fstab(5).
>
> # <file system>                 <mount point>   <type>  <options>
>          <dump>  <pass>
> proc                            /proc           proc
> nodev,noexec,nosuid     0       0
> /host/ubuntu/disks/root.disk    /               ext4
> loop,errors=remount-ro  0       1
> /host/ubuntu/disks/swap.disk    none            swap    loop,sw
>          0       0
>
> $ df -T
> Dateisystem    Typ      1K-Blöcke   Benutzt Verfügbar Verw% Eingehängt auf
> rootfs         rootfs    17753424  11598008   5230540   69% / <---
> 18GiB file-image
> udev           devtmpfs   1969240         4   1969236    1% /dev
> tmpfs          tmpfs       789036       884    788152    1% /run
> /dev/sda2      fuseblk  465546236 107358584 358187652   24% /host <--- Win7-OS
> /dev/loop0     ext4      17753424  11598008   5230540   69% /
> none           tmpfs         5120         0      5120    0% /run/lock
> none           tmpfs      1972588       200   1972388    1% /run/shm
>
> $ sudo LC_ALL=C fdisk -l /dev/sda
>
> Disk /dev/sda: 500.1 GB, 500107862016 bytes
> 255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0xcb9885ab
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
> /dev/sda2          206848   931299327   465546240    7  HPFS/NTFS/exFAT
> /dev/sda3       931299328   976773119    22736896   27  Hidden NTFS WinRE
>
> [ /etc/grub.d/40_custom ]
> ...
> menuentry "Ubuntu/precise AMD64, WUBI-system with Ubuntu-kernel"
> --class ubuntu --class gnu-linux --class gnu --class os {
>         insmod part_msdos
>         insmod ntfs
>         set root='(hd0,msdos2)'
>         search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
>         loopback loop0 /ubuntu/disks/root.disk
>         set root=(loop0)
>         linux /vmlinuz root=UUID=001AADA61AAD9964
> loop=/ubuntu/disks/root.disk ro
>         initrd /initrd.img
> }
> ...
>
> So, no "native" Linux installation here.
>
> - Sedat -
>
> [1] https://wiki.ubuntu.com/WubiGuide
>
>> This looks like the interesting part, the process backing the fuse
>> filesystem is frozen (through the normal freeze path, not any of my
>> new freeze points), and the loop0 processes are blocked on it.  It
>> doesn't seem related to my patches.
>>
>> [  125.336205] mount.ntfs      D ffffffff81811820     0   303      1 0x00000004
>> [  125.336209]  ffff8800d3e95d18 0000000000000002 0000000037a6b000
>> ffff8800d57a66b8
>> [  125.336212]  ffff8801185286c0 ffff8800d3e95fd8 ffff8800d3e95fd8
>> ffff8800d3e95fd8
>> [  125.336214]  ffffffff81c144a0 ffff8801185286c0 ffff8800d3e95d08
>> ffff8801185286c0
>> [  125.336217] Call Trace:
>> [  125.336225]  [<ffffffff816e81b9>] schedule+0x29/0x70
>> [  125.336230]  [<ffffffff810ba663>] __refrigerator+0x43/0xe0
>> [  125.336234]  [<ffffffff81078e19>] get_signal_to_deliver+0x5b9/0x600
>> [  125.336238]  [<ffffffff812c4b98>] ? fuse_dev_read+0x68/0x80
>> [  125.336242]  [<ffffffff810133b8>] do_signal+0x58/0x8f0
>> [  125.336246]  [<ffffffff8110906c>] ? acct_account_cputime+0x1c/0x20
>> [  125.336249]  [<ffffffff8137a90d>] ? do_raw_spin_unlock+0x5d/0xb0
>> [  125.336252]  [<ffffffff816e968e>] ? _raw_spin_unlock+0xe/0x10
>> [  125.336255]  [<ffffffff8109d00d>] ? vtime_account_user+0x6d/0x80
>> [  125.336258]  [<ffffffff81013cd8>] do_notify_resume+0x88/0xc0
>> [  125.336261]  [<ffffffff816f26da>] int_signal+0x12/0x17
>> [  125.336263] loop0           D ffffffff81811820     0   310      2 0x00000000
>> [  125.336265]  ffff8800d573d968 0000000000000002 0000000000000000
>> ffff8800d57aeb80
>> [  125.336268]  ffff880118bda740 ffff8800d573dfd8 ffff8800d573dfd8
>> ffff8800d573dfd8
>> [  125.336270]  ffff880119f98340 ffff880118bda740 ffff8800d573d968
>> ffff88011fad5118
>> [  125.336273] Call Trace:
>> [  125.336278]  [<ffffffff811443a0>] ? __lock_page+0x70/0x70
>> [  125.336280]  [<ffffffff816e81b9>] schedule+0x29/0x70
>> [  125.336283]  [<ffffffff816e828f>] io_schedule+0x8f/0xd0
>> [  125.336286]  [<ffffffff811443ae>] sleep_on_page+0xe/0x20
>> [  125.336288]  [<ffffffff816e49ed>] __wait_on_bit_lock+0x5d/0xc0
>> [  125.336291]  [<ffffffff81144397>] __lock_page+0x67/0x70
>> [  125.336294]  [<ffffffff8108a0e0>] ? wake_atomic_t_function+0x40/0x40
>> [  125.336298]  [<ffffffff811da79f>] __generic_file_splice_read+0x59f/0x5d0
>> [  125.336302]  [<ffffffff813656d8>] ? cpumask_next_and+0x38/0x50
>> [  125.336305]  [<ffffffff810a3b33>] ? update_sd_lb_stats+0x123/0x610
>> [  125.336309]  [<ffffffff81048143>] ? x2apic_send_IPI_mask+0x13/0x20
>> [  125.336312]  [<ffffffff8104029b>] ? native_smp_send_reschedule+0x4b/0x60
>> [  125.336315]  [<ffffffff810964b6>] ? resched_task+0x76/0x80
>> [  125.336318]  [<ffffffff811d8c80>] ? page_cache_pipe_buf_release+0x30/0x30
>> [  125.336321]  [<ffffffff811da80e>] generic_file_splice_read+0x3e/0x80
>> [  125.336324]  [<ffffffff811d8f6b>] do_splice_to+0x7b/0xa0
>> [  125.336326]  [<ffffffff811d91f7>] splice_direct_to_actor+0xa7/0x1c0
>> [  125.336330]  [<ffffffff81495780>] ? loop_thread+0x2a0/0x2a0
>> [  125.336333]  [<ffffffff81495292>] do_bio_filebacked+0xf2/0x340
>> [  125.336336]  [<ffffffff8137a79c>] ? do_raw_spin_lock+0x4c/0x120
>> [  125.336339]  [<ffffffff814955c5>] loop_thread+0xe5/0x2a0
>> [  125.336341]  [<ffffffff8108a060>] ? __init_waitqueue_head+0x40/0x40
>> [  125.336344]  [<ffffffff814954e0>] ? do_bio_filebacked+0x340/0x340
>> [  125.336346]  [<ffffffff81089848>] kthread+0xd8/0xe0
>> [  125.336348]  [<ffffffff81089770>] ? flush_kthread_worker+0xe0/0xe0
>> [  125.336351]  [<ffffffff816f236c>] ret_from_fork+0x7c/0xb0
>> [  125.336353]  [<ffffffff81089770>] ? flush_kthread_worker+0xe0/0xe0
>> [  125.336354] jbd2/loop0-8    D 0000000000000000     0   312      2 0x00000000
>> [  125.336357]  ffff8800d56c7b08 0000000000000002 ffff8800d56c7ab8
>> ffffffff8137a90d
>> [  125.336359]  ffff880037ba4240 ffff8800d56c7fd8 ffff8800d56c7fd8
>> ffff8800d56c7fd8
>> [  125.336362]  ffff88010d33e5c0 ffff880037ba4240 ffff8800d56c7b08
>> ffff88011fad5118
>> [  125.336364] Call Trace:
>> [  125.336367]  [<ffffffff8137a90d>] ? do_raw_spin_unlock+0x5d/0xb0
>> [  125.336369]  [<ffffffff811ddcb0>] ? __wait_on_buffer+0x30/0x30
>> [  125.336371]  [<ffffffff816e81b9>] schedule+0x29/0x70
>> [  125.336374]  [<ffffffff816e828f>] io_schedule+0x8f/0xd0
>> [  125.336376]  [<ffffffff811ddcbe>] sleep_on_buffer+0xe/0x20
>> [  125.336378]  [<ffffffff816e4c22>] __wait_on_bit+0x62/0x90
>> [  125.336380]  [<ffffffff811ddcb0>] ? __wait_on_buffer+0x30/0x30
>> [  125.336382]  [<ffffffff816e4ccc>] out_of_line_wait_on_bit+0x7c/0x90
>> [  125.336384]  [<ffffffff8108a0e0>] ? wake_atomic_t_function+0x40/0x40
>> [  125.336386]  [<ffffffff811ddcae>] __wait_on_buffer+0x2e/0x30
>> [  125.336390]  [<ffffffff812a02c1>]
>> jbd2_journal_commit_transaction+0x1051/0x1c60
>> [  125.336393]  [<ffffffff810a460b>] ? load_balance+0x14b/0x870
>> [  125.336397]  [<ffffffff816e97e4>] ? _raw_spin_lock_irqsave+0x24/0x30
>> [  125.336399]  [<ffffffff8107328f>] ? try_to_del_timer_sync+0x4f/0x70
>> [  125.336402]  [<ffffffff812a5a7b>] kjournald2+0x11b/0x350
>> [  125.336405]  [<ffffffff816e72e5>] ? __schedule+0x3e5/0x850
>> [  125.336407]  [<ffffffff8108a060>] ? __init_waitqueue_head+0x40/0x40
>> [  125.336410]  [<ffffffff812a5960>] ? jbd2_journal_clear_features+0x90/0x90
>> [  125.336412]  [<ffffffff81089848>] kthread+0xd8/0xe0
>> [  125.336414]  [<ffffffff81089770>] ? flush_kthread_worker+0xe0/0xe0
>> [  125.336416]  [<ffffffff816f236c>] ret_from_fork+0x7c/0xb0
>> [  125.336418]  [<ffffffff81089770>] ? flush_kthread_worker+0xe0/0xe0

My 1st suspend try with next-20130808 shows same isssue (w/o your "hack" patch):

[   98.026982] PM: Syncing filesystems ... done.
[   98.321181] PM: Preparing system for mem sleep
[   98.321736] Freezing user space processes ...
[  118.320817] Freezing of tasks failed after 20.003 seconds (1 tasks
refusing to freeze, wq_busy=0):
[  118.320838] apt-check       D 0000000000000000     0  2636   2456 0x00000004
[  118.320841]  ffff880109631c98 0000000000000002 0000000000000000
ffff88005c044280
[  118.320845]  ffff88005c1d4640 ffff880109631fd8 ffff880109631fd8
ffff880109631fd8
[  118.320847]  ffff880119f440c0 ffff88005c1d4640 ffff880109631c98
ffff88011fa15118
[  118.320850] Call Trace:
[  118.320859]  [<ffffffff81144500>] ? sleep_on_page+0x20/0x20
[  118.320863]  [<ffffffff816e8889>] schedule+0x29/0x70
[  118.320865]  [<ffffffff816e895f>] io_schedule+0x8f/0xd0
[  118.320868]  [<ffffffff8114450e>] sleep_on_page_killable+0xe/0x40
[  118.320871]  [<ffffffff816e50bd>] __wait_on_bit_lock+0x5d/0xc0
[  118.320874]  [<ffffffff81144467>] __lock_page_killable+0x67/0x70
[  118.320877]  [<ffffffff8108a110>] ? wake_atomic_t_function+0x40/0x40
[  118.320880]  [<ffffffff8114738e>] generic_file_aio_read+0x47e/0x720
[  118.320883]  [<ffffffff811ac3aa>] do_sync_read+0x5a/0x90
[  118.320885]  [<ffffffff811ad444>] vfs_read+0xb4/0x180
[  118.320887]  [<ffffffff811ad6c2>] SyS_read+0x52/0xa0
[  118.320891]  [<ffffffff816f2cef>] tracesys+0xe1/0xe6
[  118.320892]
[  118.320893] Restarting tasks ... done.

I walked trough my scripts-dir... is run_pm-test.sh an appropriate
test-case/use-case?

- Sedat -
Attachment:
run_pm-test.sh

Description: Bourne shell script