Re: WARNING: CPU: 2 PID: 11303 at drivers/scsi/scsi_lib.c:2600 scsi_device_resume+0x4f/0x58

Przemek Socha <soprwa@xxxxxxxxx> · Fri, 22 Feb 2019 17:37:31 +0100

On Friday, 22 February 2019 04:55:20 CET you wrote:
> On 2/20/19 1:17 PM, Przemek Socha wrote:
> > Greetings,
> > 
> > recently, after resume from hibernation I'm getting a strange warning in my
> > dmesg output.
> > 
> > The whole process (hibernate/resume) triggers WARN_ON_ONCE at line 2600 in
> > scsi_lib.c file.
> > 
> > It is a Lenovo g50-45 netbook with sata ssd (micron 1100) and amd fch ahci
> > piix4 (1022:7801) sata controller. Scheduler is bfq.
> > 
> > Here is the WARNING part from machine's log:
> > 
> > 
> > 
> > [   95.680910] WARNING: CPU: 2 PID: 11303 at drivers/scsi/scsi_lib.c:2600 scsi_device_resume+0x4f/0x58
> > [   95.680912] Modules linked in: rfcomm nf_tables ebtable_nat ip_set
> > nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay
> > squashfs loop bnep ipv6 ath3k btusb btintel bluetooth ecdh_generic
> > rtsx_usb_sdmmc rtsx_usb_ms memstick rtsx_usb uvcvideo videobuf2_vmalloc
> > videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media kvm_amd
> > ath9k ath9k_common ath9k_hw kvm sdhci_pci cqhci irqbypass crc32_pclmul
> > sdhci mac80211 ghash_clmulni_intel serio_raw mmc_core ath sp5100_tco
> > amdgpu xhci_pci ehci_pci ehci_hcd mfd_core cfg80211 chash xhci_hcd
> > gpu_sched ttm
> > [   95.680959] CPU: 2 PID: 11303 Comm: kworker/u8:70 Not tainted 5.0.0-rc1+ #50
> > [   95.680961] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
> > [   95.680966] Workqueue: events_unbound async_run_entry_fn
> > [   95.680971] RIP: 0010:scsi_device_resume+0x4f/0x58
> > [   95.680974] Code: 00 00 00 00 48 8b 7b 08 e8 96 33 e6 ff 83 bb d0 07 00
> > 00 05 75 0a c7 83 d0 07 00 00 02 00 00 00 48 89 ef 5b 5d e9 d9 71 25 00
> > <0f> 0b eb cb 0f 1f 44 00 00 eb a6 66 0f 1f 44 00 00 48 c7 c2 80 51
> > [   95.680976] RSP: 0018:ffffba5482167e50 EFLAGS: 00010246
> > [   95.680979] RAX: 0000000000000000 RBX: ffff9dff5610c800 RCX: 0000000000000000
> > [   95.680980] RDX: ffff9dfef2dfc240 RSI: ffffffffaae8abc0 RDI: ffff9dff5610cfb0
> > [   95.680982] RBP: ffff9dff5610cfb0 R08: 0000000000000008 R09: 0000646e756f626e
> > [   95.680984] R10: 8080808080808080 R11: 0000000000000010 R12: ffff9dff56816000
> > [   95.680986] R13: 0000000000000000 R14: ffff9dff276736c0 R15: 0ffff9dff5681600
> > [   95.680988] FS:  0000000000000000(0000) GS:ffff9dff57b00000(0000) knlGS:0000000000000000
> > [   95.680990] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   95.680992] CR2: 0000000000000000 CR3: 00000001dfc0e000 CR4: 00000000000406e0
> > [   95.680993] Call Trace:
> > [   95.681003]  scsi_dev_type_resume+0x2e/0x60
> > [   95.681007]  async_run_entry_fn+0x32/0xd8
> > [   95.681012]  process_one_work+0x1f4/0x420
> > [   95.681016]  worker_thread+0x28/0x3c0
> > [   95.681020]  ? rescuer_thread+0x330/0x330
> > [   95.681023]  kthread+0x118/0x130
> > [   95.681027]  ? kthread_create_on_node+0x60/0x60
> > [   95.681033]  ret_from_fork+0x22/0x40
> > [   95.681038] ---[ end trace c27b17349dc21611 ]---
> 
> Hi Przemek,
> 
>      Is this something that occurs systematically or only sporadically?
> If it occurs systematically, did this behavior start with kernel
> v5.0-rc1 or did it also occur with older kernel versions? Does this
> behavior only occur with the BFQ I/O scheduler or does it also occur
> with other I/O schedulers?
> 
> Thanks,
> 
> Bart.

Hi Bart,

and thank you for the response on this.

As for your questions: I can reproduce this on every power cycle, but only
once per cycle, the first time the machine is suspended/hibernated and then
resumed. When the netbook is suspended or hibernated again later (without
rebooting), the warning does not occur again within the same power cycle.
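
A minimal sketch for checking this pattern, assuming the kernel log is saved
to a file after each resume (for example with `dmesg > after-resume-1.txt`;
the file name and the function name are hypothetical). It only counts lines
that look like the warning quoted above:

```shell
#!/bin/sh
# Count occurrences of the scsi_lib.c warning in a saved kernel log.
# "$1" is a log file, e.g. produced with: dmesg > after-resume-1.txt
count_resume_warnings() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the function's status clean.
    grep -c 'WARNING: .* at drivers/scsi/scsi_lib\.c' "$1" || true
}
```

Running `count_resume_warnings` on logs saved after the first and second
resume of one power cycle should show whether the count stays at 1.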

It does not matter which scheduler is used: mq-deadline, bfq, or noop.
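
For anyone repeating the scheduler test, here is a small sketch of reading
the active scheduler from sysfs. The kernel marks the active scheduler with
square brackets in the `queue/scheduler` file; the device name `sda` in the
usage note and the function name are assumptions, not taken from this report:

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The file lists the available schedulers and brackets the active one,
# e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    # Keep only the text between the square brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

Usage (assuming the disk is `sda`): `active_scheduler
/sys/block/sda/queue/scheduler`. Writing a scheduler name to the same file
as root, e.g. `echo mq-deadline > /sys/block/sda/queue/scheduler`, switches
it before the next suspend/resume test.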

I only spotted this recently, because until now the machine wasn't
suspending/hibernating at all due to the amdgpu kernel driver (DC) horror on
my CIK ASIC.

I thought this could be an easy one, "but no pain, no gain".

The kernel I'm using now is the amd-staging-drm-next branch from Alex
Deucher's git repo, so the bisect log may look a little strange, but IMHO it
is relevant:

git bisect start
# good: [94710cac0ef4ee177a63b5227664b38c95bbf703] Linux 4.18
git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
# bad: [bfeffd155283772bbe78c6a05dec7c0128ee500c] Linux 5.0-rc1
git bisect bad bfeffd155283772bbe78c6a05dec7c0128ee500c
# good: [94710cac0ef4ee177a63b5227664b38c95bbf703] Linux 4.18
git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
# bad: [bfeffd155283772bbe78c6a05dec7c0128ee500c] Linux 5.0-rc1
git bisect bad bfeffd155283772bbe78c6a05dec7c0128ee500c
# bad: [9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1] nvme-pci: fix conflicting p2p resource adds
git bisect bad 9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1
# bad: [9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1] nvme-pci: fix conflicting p2p resource adds
git bisect bad 9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1
# good: [13bf2cf9e2d1e0e56088ec6342c2726704100647] Merge tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dma
git bisect good 13bf2cf9e2d1e0e56088ec6342c2726704100647
# bad: [a36cf6865120d7534fcb132d311f03e5159f2da7] Merge tag 'mtd/for-4.20' of git://git.infradead.org/linux-mtd
git bisect bad a36cf6865120d7534fcb132d311f03e5159f2da7
# good: [d207ea8e74ff45be0838afa12bdd2492fa9dc8bc] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d207ea8e74ff45be0838afa12bdd2492fa9dc8bc
# good: [bfb0e9b490bc15f243009359745a9d8a94089dc4] Merge tag 'usb-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good bfb0e9b490bc15f243009359745a9d8a94089dc4
# good: [f8ccb14fd6c9f58ef766062b7e3929c423580f09] ubifs: Fix WARN_ON logic in exit path
git bisect good f8ccb14fd6c9f58ef766062b7e3929c423580f09
# good: [528985117126f11beea339cf39120ee99da04cd2] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect good 528985117126f11beea339cf39120ee99da04cd2
# bad: [b7c7be6f6bd28ffea7f608ac2d806b8a4bdc82fe] nvme-fabrics: move controller options matching to fabrics
git bisect bad b7c7be6f6bd28ffea7f608ac2d806b8a4bdc82fe
# bad: [656e33ca3d405196f94133babc4e38454a49cb73] lightnvm: move device L2P detection to core
git bisect bad 656e33ca3d405196f94133babc4e38454a49cb73
# bad: [fa2a1f609e6491383ab63ff6329e0aaa2db2b9f7] kyber: don't make domain token sbitmap larger than necessary
git bisect bad fa2a1f609e6491383ab63ff6329e0aaa2db2b9f7
# good: [43b729bfe9cf30ad11499a66e3b7bd300c716d44] block: move integrity_req_gap_{back,front}_merge to blk.h
git bisect good 43b729bfe9cf30ad11499a66e3b7bd300c716d44
# good: [c39ae60dfbda66922f644193b91850abcd4d588c] block: remove ARCH_BIOVEC_PHYS_MERGEABLE
git bisect good c39ae60dfbda66922f644193b91850abcd4d588c
# good: [18c9a6bbe0645a05172a900740b9d2d379d54320] percpu-refcount: Introduce percpu_ref_resurrect()
git bisect good 18c9a6bbe0645a05172a900740b9d2d379d54320
# bad: [986d413b7c156e69198dfc80fb74aa18d0ddef44] blk-mq: Enable support for runtime power management
git bisect bad 986d413b7c156e69198dfc80fb74aa18d0ddef44
# good: [7cedffec8e759480f7f7a9be9cd0d7ebf0aafff2] block: Make blk_get_request() block for non-PM requests while suspended
git bisect good 7cedffec8e759480f7f7a9be9cd0d7ebf0aafff2
# first bad commit: [986d413b7c156e69198dfc80fb74aa18d0ddef44] blk-mq: Enable support for runtime power management

This one seems to be the cause or maybe a symptom in my case:

Commit 986d413b7c156e69198dfc80fb74aa18d0ddef44 (refs/bisect/bad)
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date:   Wed Sep 26 14:01:10 2018 -0700

    blk-mq: Enable support for runtime power management

    Now that the blk-mq core processes power management requests
    (marked with RQF_PREEMPT) in other states than RPM_ACTIVE, enable
    runtime power management for blk-mq.

    Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
    Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Cc: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
    Cc: Hannes Reinecke <hare@xxxxxxxx>
    Cc: Johannes Thumshirn <jthumshirn@xxxxxxx>
    Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
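
To re-check the result, the bisect log above could be saved to a file and
processed as sketched below. The file name `bisect.log` and the helper name
are hypothetical; the helper only pulls the hash out of the
"# first bad commit" line that `git bisect log` writes:

```shell
#!/bin/sh
# Extract the 40-character hash of the first bad commit from a saved
# `git bisect log` output. "$1" is the log file.
first_bad_commit() {
    # Match "# first bad commit: [<hash>] <subject>" and print the hash.
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)].*/\1/p' "$1"
}
```

Inside a kernel tree, `git bisect replay bisect.log` re-runs the recorded
steps, and `git show --stat "$(first_bad_commit bisect.log)"` displays the
commit identified above.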

If you need anything else, please do not hesitate to ask.

Thanks,
Przemek.