On Wed, Jul 22, 2020 at 06:28:58AM +0000, Damien Le Moal wrote: > On Tue, 2020-07-21 at 12:39 +0530, Vaibhav Gupta wrote: > > On Tue, Jul 21, 2020 at 02:57:28AM +0000, Damien Le Moal wrote: > > > On 2020/07/20 22:32, Vaibhav Gupta wrote: > > > > Drivers using legacy PM have to manage PCI states and device's PM states > > > > themselves. They also need to take care of configuration registers. > > > > > > > > With improved and powerful support of generic PM, PCI Core takes care of > > > > above mentioned, device-independent, jobs. > > > > > > > > This driver makes use of PCI helper functions like > > > > pci_save/restore_state(), pci_enable/disable_device(), > > > > pci_request/release_regions(), pci_set_power_state() and > > > > pci_set_master() to do required operations. In generic mode, they are no > > > > longer needed. > > > > > > > > Change function parameter in both .suspend() and .resume() to > > > > "struct device*" type. Use to_pci_dev() to get "struct pci_dev*" variable. > > > > > > This commit message is rather vague, and the last sentence actually does not > > > describe correctly the change. What about something very simple, yet clear, like > > > this: > > > > > > skd: use generic power management > > > > > > Switch from the legacy .suspend()/.resume() power management interface to the > > > generic power management interface using the single .driver.pm() method. This > > > avoids the need for the driver to directly call most of the PCI helper functions > > > and device power state control functions as the generic power management > > > interface takes care of the necessary operations. > > > > > Okay. I will improve on it. Just inform me after testing that if any other > > changes are required. I guess [PATCH 1/3] and [PATCH 2/3] are okay, so I will > > only send v3 of [PATCH 3/3] after suggested changes. > > > > Compile-tested only. > > > > > > > > Signed-off-by: Vaibhav Gupta <vaibhavgupta40@xxxxxxxxx> > > > > --- > > > > drivers/block/skd_main.c | 30 ++++++++---------------------- > > > > 1 file changed, 8 insertions(+), 22 deletions(-) > > > > > > > > diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c > > > > index 51569c199a6c..7f2d42900b38 100644 > > > > --- a/drivers/block/skd_main.c > > > > +++ b/drivers/block/skd_main.c > > > > @@ -3315,10 +3315,11 @@ static void skd_pci_remove(struct pci_dev *pdev) > > > > return; > > > > } > > > > > > > > -static int skd_pci_suspend(struct pci_dev *pdev, pm_message_t state) > > > > +static int __maybe_unused skd_pci_suspend(struct device *dev) > > > > { > > > > int i; > > > > struct skd_device *skdev; > > > > + struct pci_dev *pdev = to_pci_dev(dev); > > > > > > > > skdev = pci_get_drvdata(pdev); > > > > if (!skdev) { > > > > @@ -3337,18 +3338,15 @@ static int skd_pci_suspend(struct pci_dev *pdev, pm_message_t state) > > > > if (skdev->pcie_error_reporting_is_enabled) > > > > pci_disable_pcie_error_reporting(pdev); > > > > > > > > - pci_release_regions(pdev); > > > > - pci_save_state(pdev); > > > > - pci_disable_device(pdev); > > > > - pci_set_power_state(pdev, pci_choose_state(pdev, state)); > > > > return 0; > > > > } > > > > > > > > -static int skd_pci_resume(struct pci_dev *pdev) > > > > +static int __maybe_unused skd_pci_resume(struct device *dev) > > > > { > > > > int i; > > > > int rc = 0; > > > > struct skd_device *skdev; > > > > + struct pci_dev *pdev = to_pci_dev(dev); > > > > > > > > skdev = pci_get_drvdata(pdev); > > > > if (!skdev) { > > > > @@ -3356,16 +3354,8 @@ static int skd_pci_resume(struct pci_dev *pdev) > > > > return -1; > > > > } > > > > > > > > - pci_set_power_state(pdev, PCI_D0); > > > > - pci_enable_wake(pdev, PCI_D0, 0); > > > > - pci_restore_state(pdev); > > > > + device_wakeup_disable(dev); > > > > > > > > - rc = pci_enable_device(pdev); > > > > - if (rc) > > > > - return rc; > > > > - rc = pci_request_regions(pdev, DRV_NAME); > > > > - if (rc) > > > > - goto err_out; > > > > rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); > > > > if (rc) > > > > rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)); > > > > @@ -3374,7 +3364,6 @@ static int skd_pci_resume(struct pci_dev *pdev) > > > > goto err_out_regions; > > > > } > > > > > > > > - pci_set_master(pdev); > > > > rc = pci_enable_pcie_error_reporting(pdev); > > > > if (rc) { > > > > dev_err(&pdev->dev, > > > > @@ -3427,10 +3416,6 @@ static int skd_pci_resume(struct pci_dev *pdev) > > > > pci_disable_pcie_error_reporting(pdev); > > > > > > > > err_out_regions: > > > > - pci_release_regions(pdev); > > > > - > > > > -err_out: > > > > - pci_disable_device(pdev); > > > > return rc; > > > > } > > > > > > > > @@ -3450,13 +3435,14 @@ static void skd_pci_shutdown(struct pci_dev *pdev) > > > > skd_stop_device(skdev); > > > > } > > > > > > > > +static SIMPLE_DEV_PM_OPS(skd_pci_pm_ops, skd_pci_suspend, skd_pci_resume); > > > > + > > > > static struct pci_driver skd_driver = { > > > > .name = DRV_NAME, > > > > .id_table = skd_pci_tbl, > > > > .probe = skd_pci_probe, > > > > .remove = skd_pci_remove, > > > > - .suspend = skd_pci_suspend, > > > > - .resume = skd_pci_resume, > > > > + .driver.pm = &skd_pci_pm_ops, > > > > .shutdown = skd_pci_shutdown, > > > > }; > > > > > > > > > > > > > > Apart from the commit message, this looks OK to me. > > > I will give this a spin today on the hardware to check. > > > > > Thanks! > > Tested this and it seems OK. > > Of note, after having removed the device (echo 1 > > /sys/block/skd0/device/remove) and rescaning its PCI slot, I got a > lockdep splat: > > [ 4449.098757] ====================================================== > [ 4449.104945] WARNING: possible circular locking dependency detected > [ 4449.111136] 5.8.0-rc6+ #846 Not tainted > [ 4449.114980] ------------------------------------------------------ > [ 4449.121168] tcsh/1550 is trying to acquire lock: > [ 4449.125798] ffffffffa85c3748 (pci_rescan_remove_lock){+.+.}-{3:3}, > at: dev_rescan_store+0x38/0x60 > [ 4449.134686] > [ 4449.134686] but task is already holding lock: > [ 4449.140522] ffff8f6cd352ed28 (kn->active#294){++++}-{0:0}, at: > kernfs_fop_write+0xad/0x1c0 > [ 4449.148795] > [ 4449.148795] which lock already depends on the new lock. > [ 4449.148795] > [ 4449.156979] > [ 4449.156979] the existing dependency chain (in reverse order) is: > [ 4449.164464] > [ 4449.164464] -> #1 (kn->active#294){++++}-{0:0}: > [ 4449.170486] __kernfs_remove+0x276/0x2f0 > [ 4449.174939] kernfs_remove_by_name_ns+0x41/0x80 > [ 4449.180001] remove_files+0x2b/0x60 > [ 4449.184018] sysfs_remove_group+0x38/0x80 > [ 4449.188560] sysfs_remove_groups+0x29/0x40 > [ 4449.193190] device_remove_attrs+0x4b/0x70 > [ 4449.197818] device_del+0x167/0x3f0 > [ 4449.201843] pci_remove_bus_device+0x70/0x110 > [ 4449.206728] pci_stop_and_remove_bus_device_locked+0x1e/0x30 > [ 4449.212915] remove_store+0x55/0x60 > [ 4449.216939] kernfs_fop_write+0xd9/0x1c0 > [ 4449.221398] vfs_write+0xc7/0x1f0 > [ 4449.225239] ksys_write+0x58/0xd0 > [ 4449.229089] do_syscall_64+0x4f/0x2c0 > [ 4449.233284] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 4449.238863] > [ 4449.238863] -> #0 (pci_rescan_remove_lock){+.+.}-{3:3}: > [ 4449.245575] __lock_acquire+0x1194/0x1fd0 > [ 4449.250113] lock_acquire+0x97/0x390 > [ 4449.254221] __mutex_lock+0x58/0x820 > [ 4449.258330] dev_rescan_store+0x38/0x60 > [ 4449.262697] kernfs_fop_write+0xd9/0x1c0 > [ 4449.267154] vfs_write+0xc7/0x1f0 > [ 4449.270999] ksys_write+0x58/0xd0 > [ 4449.274846] do_syscall_64+0x4f/0x2c0 > [ 4449.279041] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 4449.284620] > [ 4449.284620] other info that might help us debug this: > [ 4449.284620] > [ 4449.292628] Possible unsafe locking scenario: > [ 4449.292628] > [ 4449.298555] CPU0 CPU1 > [ 4449.303090] ---- ---- > [ 4449.307630] lock(kn->active#294); > [ > 4449.311131] lock(pci_rescan_remove_lock > ); > [ 4449.317838] lock(kn->active#294); > [ 4449.323855] lock(pci_rescan_remove_lock); > [ 4449.328048] > [ 4449.328048] *** DEADLOCK *** > [ 4449.328048] > [ 4449.333977] 3 locks held by tcsh/1550: > [ 4449.337731] #0: ffff8f6f18b7d448 (sb_writers#4){.+.+}-{0:0}, at: > vfs_write+0x174/0x1f0 > [ 4449.345747] #1: ffff8f6e74d09a88 (&of->mutex){+.+.}-{3:3}, at: > kernfs_fop_write+0xa5/0x1c0 > [ 4449.354111] #2: ffff8f6cd352ed28 (kn->active#294){++++}-{0:0}, at: > kernfs_fop_write+0xad/0x1c0 > [ 4449.362821] > [ 4449.362821] stack backtrace: > [ 4449.367192] CPU: 15 PID: 1550 Comm: tcsh Not tainted 5.8.0-rc6+ #846 > [ 4449.373548] Hardware name: Supermicro Super Server/X11DPL-i, BIOS > 3.1 05/21/2019 > [ 4449.380953] Call Trace: > [ 4449.383419] dump_stack+0x78/0xa0 > [ 4449.386744] check_noncircular+0x12d/0x150 > [ 4449.390853] __lock_acquire+0x1194/0x1fd0 > [ 4449.394875] lock_acquire+0x97/0x390 > [ 4449.398462] ? dev_rescan_store+0x38/0x60 > [ 4449.402486] __mutex_lock+0x58/0x820 > [ 4449.406067] ? dev_rescan_store+0x38/0x60 > [ 4449.410090] ? dev_rescan_store+0x38/0x60 > [ 4449.414113] ? lock_acquire+0x97/0x390 > [ 4449.417876] ? kernfs_fop_write+0xad/0x1c0 > [ 4449.421983] dev_rescan_store+0x38/0x60 > [ 4449.425831] kernfs_fop_write+0xd9/0x1c0 > [ 4449.429768] vfs_write+0xc7/0x1f0 > [ 4449.433092] ksys_write+0x58/0xd0 > [ 4449.436421] do_syscall_64+0x4f/0x2c0 > [ 4449.440094] ? prepare_exit_to_usermode+0xa/0x40 > [ 4449.444725] ? rcu_read_lock_sched_held+0x3f/0x50 > [ 4449.449436] ? asm_exc_page_fault+0x8/0x30 > [ 4449.453546] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 4449.458608] RIP: 0033:0x7fcce58f45e7 > [ 4449.462191] Code: Bad RIP value. > [ 4449.465426] RSP: 002b:00007fff2592c898 EFLAGS: 00000246 ORIG_RAX: > 0000000000000001 > [ 4449.473000] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: > 00007fcce58f45e7 > [ 4449.480141] RDX: 0000000000000002 RSI: 0000556d30b96100 RDI: > 0000000000000001 > [ 4449.487283] RBP: 0000556d30b96100 R08: 0000000000000000 R09: > 00007fff2592c7f0 > [ 4449.494425] R10: 0000000000000000 R11: 0000000000000246 R12: > 0000000000000001 > [ 4449.501567] R13: 0000556d31fc4fb8 R14: 0000556d31fc37c0 R15: > 0000556d31fc4fb4 > [ 4449.520721] pci 0000:d8:00.0: [1b39:0001] type 00 class 0x018000 > [ 4449.526802] pci 0000:d8:00.0: reg 0x10: [mem 0xfbe00000-0xfbe0ffff] > [ 4449.533121] pci 0000:d8:00.0: reg 0x14: [mem 0xfbe10000-0xfbe10fff] > [ 4449.539474] pci 0000:d8:00.0: reg 0x30: [mem 0xfbd00000-0xfbdfffff > pref] > [ 4449.546327] pci 0000:d8:00.0: supports D1 D2 > [ 4449.550936] pci 0000:d8:00.0: BAR 6: assigned [mem 0xfbd00000- > 0xfbdfffff pref] > [ 4449.559804] pci 0000:d8:00.0: BAR 0: assigned [mem 0xfbe00000- > 0xfbe0ffff] > [ 4449.568849] pci 0000:d8:00.0: BAR 1: assigned [mem 0xfbe10000- > 0xfbe10fff] > [ 4449.577178] skd 0000:d8:00.0: PCIe (5.0GT/s 4X) 64bit > [ 4449.582272] skd 0000:d8:00.0: bad enable of PCIe error reporting > rc=-5 > [ 4449.589764] skd 0000:d8:00.0: s1120 state INIT(1)=>SOFT_RESET(8) > [ 4449.595788] skd 0000:d8:00.0: Driver state STARTING(3)=>STARTING(3) > [ 4449.829900] skd 0000:d8:00.0: s1120 state SOFT_RESET(8)=>INIT(1) > [ 4449.835929] skd 0000:d8:00.0: Driver state STARTING(3)=>STARTING(3) > [ 4450.076148] skd 0000:d8:00.0: Time sync driver=0x5f17d872 > device=0x162bdfe9 > [ 4450.083139] skd 0000:d8:00.0: s1120 state INIT(1)=>ONLINE(3) > [ 4450.088822] skd 0000:d8:00.0: Queue depth limit=64 dev=255 lowat=43 > [ 4450.095101] skd 0000:d8:00.0: Driver state STARTING(3)=>STARTING(3) > [ 4450.101539] skd 0000:d8:00.0: Driver state STARTING(3)=>ONLINE(4) > [ 4450.107653] skd 0000:d8:00.0: STEC s1120 ONLINE > [ 4450.120422] skd1: p1 > > But that looks completely unrelated to your patch. Will have a look at > this. > Thanks. I will send the patch-series as v3 with updated commit message as suggested. --Vaibhav Gupta > -- > Damien Le Moal > Western Digital Research