Am 31.12.18 um 23:56 schrieb Andreas Färber: > Am 31.12.18 um 18:50 schrieb Mark Brown: >> On Sun, Dec 30, 2018 at 11:55:46AM +0100, Andreas Färber wrote: >>> Given that observed symptoms were CPU stalls, workqueue hangs and RCU >>> problems, requiring a power-cycle to recover, I wonder whether we are >>> running into some atomic/locking issue with clk_enable()? Is it valid at >>> all to use SPI/regmap for clk_enable()? If it is, is there a known issue >>> specific to spi-sun6i (A64) in 4.20.0? >>> I already tried setting .disable_locking = true in both regmap_configs. >>> Any suggestions how to further debug? >> >> You can't use SPI for clk_enable(), clk_enable() needs to be doable in >> atomic context since we need to wait for the bus operations to complete >> (you can start SPI transfers in atomic context but you still need to >> wait for them to complete). Any clocks that are only accessible via a >> slow bus like I2C or SPI need to do the enable/disable in the >> prepare/unprepare operations which aren't done in atomic context. >> >> regmap can be used in atomic contexts, though you need to configure it >> to use spinlocks instead of mutexes and ensure that no register cache >> allocations happen during I/O (eg, by providing defaults for all >> registers or by not using a cache). > > We have .cache_type = REGCACHE_NONE on both bus and spi regmap_configs. > > I moved the regmap_field_write() from .enable to .prepare and set > .fast_io = true on both regmap_configs to force using spinlocks, but > same hang as in .enable before... > > And same if I set .disable_locking = true on both. > > Given that it works with one SPI driver and not with the other, > independent of the locking options applied, I assume my symptoms are not > a regmap-layer issue. > > Is it allowed during a .prepare operation to call the mentioned > clk_get_rate(), which ends up calling clk_prepare_lock()? > > According to my debug output in spi-sun6i.c our hanging > regmap_field_write() ends up calling sun6i_transfer_one() three times, > the first two look okay, but the third one doesn't make it past the > clk_get_rate() [...]. SysRq still works in that state! Attached is SysRq-w output. (still with .disable_locking = true in both regmap_configs) In the very bottom you see the "ip" task, at wait_for_completion() from __spi_sync(). I trigger this issue with `ip link set lora2 up`, so that looks okay. Then there's a "spi1" task at clk_prepare_lock()' mutex_lock() coming from spi_pump_messages(). The reason for that will be that clk_prepare_lock()'s mutex_trylock() failed (because we're holding the prepare_lock from clk_prepare_enable() in the "ip" task) and that the prepare_owner == current check fails for this separate task_struct, too. So, the third invocation of sun6i_transfer_one() calling clk_get_rate() hangs at the prepare_lock instead of reference-counting, because it runs from a separate kthread, unlike the two previous calls? Besides, there's also an mmc_rescan workqueue task at clk_prepare_lock() coming from sunxi_mmc_enable() due to pm_generic_runtime_resume(). My rootfs is on microSD card. I did not find any *regmap_init_spi() based example in drivers/clk/, and all other "spi" mentions in drivers/clk/ appeared to be clock names. The closest was devm_regmap_init_i2c() based clk-cdce706.c, which uses the prepare/unprepare ops, as suggested by Mark, and does regmap_update_bits() from there. A quick grep in drivers/i2c/ does not find any mention of "kthread", so probably that's the breaking difference? Regards, Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)
[39292.911444] sysrq: SysRq : Show Blocked State [39292.915819] task PC stack pid father [39292.921784] kworker/0:2 D 0 170 2 0x00000028 [39292.927282] Workqueue: ipv6_addrconf addrconf_verify_work [39292.932676] Call trace: [39292.935127] __switch_to+0x9c/0xd8 [39292.938530] __schedule+0x2a0/0x888 [39292.942016] schedule+0x30/0x88 [39292.945157] schedule_preempt_disabled+0x14/0x20 [39292.949770] __mutex_lock.isra.1+0x2c0/0x4b8 [39292.954036] __mutex_lock_slowpath+0x24/0x30 [39292.958302] mutex_lock+0x48/0x50 [39292.961615] rtnl_lock+0x1c/0x28 [39292.964841] addrconf_verify_work+0x14/0x28 [39292.969022] process_one_work+0x1f4/0x428 [39292.973028] worker_thread+0x44/0x4a8 [39292.976686] kthread+0x130/0x138 [39292.979912] ret_from_fork+0x10/0x18 [39292.983518] btrfs-transacti D 0 383 2 0x00000028 [39292.988998] Call trace: [39292.991444] __switch_to+0x9c/0xd8 [39292.994845] __schedule+0x2a0/0x888 [39292.998330] schedule+0x30/0x88 [39293.001472] io_schedule+0x20/0x40 [39293.004874] wbt_wait+0x1bc/0x2f8 [39293.008188] rq_qos_throttle+0x4c/0x68 [39293.011936] blk_mq_make_request+0xc4/0x4f0 [39293.016118] generic_make_request+0xf4/0x308 [39293.020384] submit_bio+0x40/0x198 [39293.023977] btrfs_map_bio+0x370/0x3f8 [btrfs] [39293.028550] btrfs_submit_bio_hook+0xa4/0x1c0 [btrfs] [39293.033727] submit_one_bio+0x78/0xb8 [btrfs] [39293.038211] flush_write_bio.isra.12+0x2c/0x48 [btrfs] [39293.043477] extent_writepages+0x60/0x70 [btrfs] [39293.048217] btrfs_writepages+0x28/0x38 [btrfs] [39293.052747] do_writepages+0x3c/0xe0 [39293.056323] __filemap_fdatawrite_range+0xc0/0x118 [39293.061109] filemap_fdatawrite_range+0x38/0x48 [39293.065764] btrfs_fdatawrite_range+0x34/0x78 [btrfs] [39293.070942] __btrfs_write_out_cache+0x400/0x450 [btrfs] [39293.076382] btrfs_write_out_cache+0xc0/0x160 [btrfs] [39293.081551] btrfs_start_dirty_block_groups+0x2dc/0x4d0 [btrfs] [39293.087591] btrfs_commit_transaction+0x554/0x940 [btrfs] [39293.093111] transaction_kthread+0x18c/0x1a8 [btrfs] [39293.098072] kthread+0x130/0x138 [39293.101298] ret_from_fork+0x10/0x18 [39293.104877] systemd-journal D 0 448 1 0x00000801 [39293.110358] Call trace: [39293.112803] __switch_to+0x9c/0xd8 [39293.116204] __schedule+0x2a0/0x888 [39293.119690] schedule+0x30/0x88 [39293.122830] io_schedule+0x20/0x40 [39293.126230] wait_on_page_bit+0x134/0x240 [39293.130366] btrfs_page_mkwrite+0x1e0/0x450 [btrfs] [39293.135240] do_page_mkwrite+0x40/0xb0 [39293.138985] do_wp_page+0x3d0/0x578 [39293.142471] __handle_mm_fault+0x484/0x528 [39293.146563] handle_mm_fault+0x10c/0x1e8 [39293.150484] do_page_fault+0x174/0x480 [39293.154229] do_mem_abort+0x50/0xa8 [39293.157713] el0_da+0x20/0x24 [39293.160676] journal-offline D 0 23928 1 0x00000801 [39293.166156] Call trace: [39293.168601] __switch_to+0x9c/0xd8 [39293.172002] __schedule+0x2a0/0x888 [39293.175487] schedule+0x30/0x88 [39293.178626] io_schedule+0x20/0x40 [39293.182025] wbt_wait+0x1bc/0x2f8 [39293.185338] rq_qos_throttle+0x4c/0x68 [39293.189085] blk_mq_make_request+0xc4/0x4f0 [39293.193266] generic_make_request+0xf4/0x308 [39293.197533] submit_bio+0x40/0x198 [39293.201064] btrfs_map_bio+0x370/0x3f8 [btrfs] [39293.205632] btrfs_submit_bio_hook+0xa4/0x1c0 [btrfs] [39293.210808] submit_one_bio+0x78/0xb8 [btrfs] [39293.215292] submit_extent_page+0xcc/0x218 [btrfs] [39293.220209] __extent_writepage_io+0x230/0x3e8 [btrfs] [39293.225472] __extent_writepage+0x114/0x2e0 [btrfs] [39293.230476] extent_write_cache_pages+0xec/0x328 [btrfs] [39293.235912] extent_writepages+0x54/0x70 [btrfs] [39293.240651] btrfs_writepages+0x28/0x38 [btrfs] [39293.245179] do_writepages+0x3c/0xe0 [39293.248751] __filemap_fdatawrite_range+0xc0/0x118 [39293.253538] filemap_fdatawrite_range+0x38/0x48 [39293.258194] btrfs_fdatawrite_range+0x34/0x78 [btrfs] [39293.263368] start_ordered_ops+0x4c/0x80 [btrfs] [39293.268110] btrfs_sync_file+0x88/0x3e0 [btrfs] [39293.272640] vfs_fsync_range+0x4c/0x88 [39293.276385] do_fsync+0x48/0x88 [39293.279525] __arm64_sys_fsync+0x24/0x38 [39293.283446] el0_svc_common+0x98/0x100 [39293.287192] el0_svc_handler+0x38/0x78 [39293.290936] el0_svc+0x8/0xc [39293.293829] wickedd D 0 713 1 0x00000000 [39293.299309] Call trace: [39293.301755] __switch_to+0x9c/0xd8 [39293.305155] __schedule+0x2a0/0x888 [39293.308641] schedule+0x30/0x88 [39293.311781] schedule_preempt_disabled+0x14/0x20 [39293.316394] __mutex_lock.isra.1+0x2c0/0x4b8 [39293.320661] __mutex_lock_slowpath+0x24/0x30 [39293.324927] mutex_lock+0x48/0x50 [39293.328243] __netlink_dump_start+0x7c/0x1d8 [39293.332511] rtnetlink_rcv_msg+0x280/0x2c0 [39293.336604] netlink_rcv_skb+0x60/0x120 [39293.340437] rtnetlink_rcv+0x28/0x38 [39293.344010] netlink_unicast+0x1a8/0x280 [39293.347928] netlink_sendmsg+0x188/0x338 [39293.351848] sock_sendmsg+0x4c/0x68 [39293.355333] ___sys_sendmsg+0x26c/0x2a0 [39293.359165] __sys_sendmsg+0x64/0xa0 [39293.362736] __arm64_sys_sendmsg+0x2c/0x38 [39293.366829] el0_svc_common+0x98/0x100 [39293.370575] el0_svc_handler+0x38/0x78 [39293.374320] el0_svc+0x8/0xc [39293.377201] ntpd D 0 1354 1 0x00000001 [39293.382681] Call trace: [39293.385126] __switch_to+0x9c/0xd8 [39293.388526] __schedule+0x2a0/0x888 [39293.392012] schedule+0x30/0x88 [39293.395151] schedule_preempt_disabled+0x14/0x20 [39293.399764] __mutex_lock.isra.1+0x2c0/0x4b8 [39293.404029] __mutex_lock_slowpath+0x24/0x30 [39293.408296] mutex_lock+0x48/0x50 [39293.411610] __netlink_dump_start+0x7c/0x1d8 [39293.415876] rtnetlink_rcv_msg+0x280/0x2c0 [39293.419969] netlink_rcv_skb+0x60/0x120 [39293.423801] rtnetlink_rcv+0x28/0x38 [39293.427374] netlink_unicast+0x1a8/0x280 [39293.431294] netlink_sendmsg+0x188/0x338 [39293.435211] sock_sendmsg+0x4c/0x68 [39293.438694] __sys_sendto+0xe8/0x150 [39293.442266] __arm64_sys_sendto+0x30/0x40 [39293.446272] el0_svc_common+0x98/0x100 [39293.450018] el0_svc_handler+0x38/0x78 [39293.453762] el0_svc+0x8/0xc [39293.456670] kworker/2:1 D 0 23762 2 0x00000028 [39293.462197] Workqueue: events_freezable mmc_rescan [mmc_core] [39293.467936] Call trace: [39293.470381] __switch_to+0x9c/0xd8 [39293.473781] __schedule+0x2a0/0x888 [39293.477267] schedule+0x30/0x88 [39293.480406] schedule_preempt_disabled+0x14/0x20 [39293.485019] __mutex_lock.isra.1+0x2c0/0x4b8 [39293.489285] __mutex_lock_slowpath+0x24/0x30 [39293.493551] mutex_lock+0x48/0x50 [39293.496865] clk_prepare_lock+0x48/0xa0 [39293.500699] clk_prepare+0x24/0x58 [39293.504105] sunxi_mmc_enable+0x40/0x220 [sunxi_mmc] [39293.509067] sunxi_mmc_runtime_resume+0x2c/0x70 [sunxi_mmc] [39293.514637] pm_generic_runtime_resume+0x3c/0x58 [39293.519251] __rpm_callback+0x118/0x208 [39293.523082] rpm_callback+0x70/0x98 [39293.526567] rpm_resume+0x65c/0x840 [39293.530052] __pm_runtime_resume+0x68/0xc8 [39293.534182] __mmc_claim_host+0x200/0x260 [mmc_core] [39293.539178] mmc_get_card+0x38/0x48 [mmc_core] [39293.543656] mmc_sd_detect+0x24/0x90 [mmc_core] [39293.548219] mmc_rescan+0x3d0/0x540 [mmc_core] [39293.552659] process_one_work+0x1f4/0x428 [39293.556665] worker_thread+0x44/0x4a8 [39293.560323] kthread+0x130/0x138 [39293.563549] ret_from_fork+0x10/0x18 [39293.567122] kworker/u8:5 D 0 23778 2 0x00000028 [39293.572608] Workqueue: writeback wb_workfn (flush-btrfs-1) [39293.578087] Call trace: [39293.580533] __switch_to+0x9c/0xd8 [39293.583933] __schedule+0x2a0/0x888 [39293.587419] schedule+0x30/0x88 [39293.590557] io_schedule+0x20/0x40 [39293.593956] wbt_wait+0x1bc/0x2f8 [39293.597269] rq_qos_throttle+0x4c/0x68 [39293.601016] blk_mq_make_request+0xc4/0x4f0 [39293.605196] generic_make_request+0xf4/0x308 [39293.609461] submit_bio+0x40/0x198 [39293.612995] btrfs_map_bio+0x370/0x3f8 [btrfs] [39293.617563] btrfs_submit_bio_hook+0xa4/0x1c0 [btrfs] [39293.622743] submit_one_bio+0x78/0xb8 [btrfs] [39293.627227] submit_extent_page+0xcc/0x218 [btrfs] [39293.632142] __extent_writepage_io+0x230/0x3e8 [btrfs] [39293.637405] __extent_writepage+0x114/0x2e0 [btrfs] [39293.642411] extent_write_cache_pages+0xec/0x328 [btrfs] [39293.647849] extent_writepages+0x54/0x70 [btrfs] [39293.652590] btrfs_writepages+0x28/0x38 [btrfs] [39293.657118] do_writepages+0x3c/0xe0 [39293.660691] __writeback_single_inode+0x48/0x438 [39293.665304] writeback_sb_inodes+0x1c8/0x478 [39293.669570] __writeback_inodes_wb+0x78/0xc8 [39293.673835] wb_writeback+0x264/0x388 [39293.677494] wb_workfn+0x304/0x4a8 [39293.680893] process_one_work+0x1f4/0x428 [39293.684898] worker_thread+0x44/0x4a8 [39293.688555] kthread+0x130/0x138 [39293.691781] ret_from_fork+0x10/0x18 [39293.695354] spi1 D 0 23857 2 0x00000028 [39293.700834] Call trace: [39293.703280] __switch_to+0x9c/0xd8 [39293.706680] __schedule+0x2a0/0x888 [39293.710165] schedule+0x30/0x88 [39293.713305] schedule_preempt_disabled+0x14/0x20 [39293.717918] __mutex_lock.isra.1+0x2c0/0x4b8 [39293.722183] __mutex_lock_slowpath+0x24/0x30 [39293.726450] mutex_lock+0x48/0x50 [39293.729763] clk_prepare_lock+0x48/0xa0 [39293.733597] clk_core_get_rate+0x1c/0x78 [39293.737515] clk_get_rate+0x28/0x38 [39293.741006] sun6i_spi_transfer_one+0x150/0x4a0 [spi_sun6i] [39293.746575] spi_transfer_one_message+0x150/0x648 [39293.751274] __spi_pump_messages+0x424/0x6f0 [39293.755541] spi_pump_messages+0x24/0x30 [39293.759460] kthread_worker_fn+0xf0/0x1f8 [39293.763465] kthread+0x130/0x138 [39293.766691] ret_from_fork+0x10/0x18 [39293.770266] kworker/u8:8 D 0 23923 2 0x00000028 [39293.775882] Workqueue: btrfs-submit btrfs_submit_helper [btrfs] [39293.781793] Call trace: [39293.784240] __switch_to+0x9c/0xd8 [39293.787640] __schedule+0x2a0/0x888 [39293.791126] schedule+0x30/0x88 [39293.794301] __mmc_claim_host+0xa8/0x260 [mmc_core] [39293.799210] mmc_get_card+0x38/0x48 [mmc_core] [39293.803658] mmc_mq_queue_rq+0x220/0x308 [mmc_block] [39293.808619] __blk_mq_try_issue_directly+0x120/0x190 [39293.813580] blk_mq_request_issue_directly+0x94/0xc0 [39293.818539] blk_mq_try_issue_list_directly+0x40/0x98 [39293.823585] blk_mq_sched_insert_requests+0x98/0xc0 [39293.828458] blk_mq_flush_plug_list+0x184/0x298 [39293.832984] blk_flush_plug_list+0xd8/0x250 [39293.837164] blk_finish_plug+0x3c/0x4c [39293.841042] run_scheduled_bios+0x2e4/0x408 [btrfs] [39293.846045] pending_bios_fn+0x20/0x30 [btrfs] [39293.850615] normal_work_helper+0x200/0x470 [btrfs] [39293.855619] btrfs_submit_helper+0x20/0x30 [btrfs] [39293.860406] process_one_work+0x1f4/0x428 [39293.864411] worker_thread+0x44/0x4a8 [39293.868068] kthread+0x130/0x138 [39293.871294] ret_from_fork+0x10/0x18 [39293.874866] ip D 0 23925 23924 0x00000000 [39293.880346] Call trace: [39293.882792] __switch_to+0x9c/0xd8 [39293.886193] __schedule+0x2a0/0x888 [39293.889679] schedule+0x30/0x88 [39293.892817] schedule_timeout+0x2c4/0x408 [39293.896823] wait_for_common+0xdc/0x1a8 [39293.900656] wait_for_completion+0x28/0x38 [39293.904750] __spi_sync+0x148/0x388 [39293.908235] spi_sync+0x34/0x58 [39293.911375] regmap_spi_write+0x84/0x98 [39293.915210] _regmap_raw_write_impl+0x794/0x8d8 [39293.919737] _regmap_bus_raw_write+0x70/0x90 [39293.924003] _regmap_write+0x70/0x128 [39293.927662] regmap_write+0x50/0x78 [39293.931155] sx130x_radio_read+0x94/0x270 [lora_sx130x] [39293.936375] _regmap_raw_read+0xd0/0x248 [39293.940295] _regmap_bus_read+0x50/0x88 [39293.944127] _regmap_read+0x6c/0x170 [39293.947700] _regmap_update_bits+0xa4/0xf0 [39293.951793] regmap_update_bits_base+0x68/0x98 [39293.956233] regmap_field_update_bits_base+0x5c/0x70 [39293.961195] sx125x_clkout_prepare+0x44/0x50 [lora_sx125x] [39293.966674] clk_core_prepare+0x64/0x1d8 [39293.970593] clk_prepare+0x2c/0x58 [39293.973995] sx130x_loradev_open+0x58/0x738 [lora_sx130x] [39293.979388] __dev_open+0xd8/0x178 [39293.982788] __dev_change_flags+0x134/0x198 [39293.986968] dev_change_flags+0x34/0x70 [39293.990800] do_setlink+0x2a8/0xbb8 [39293.994287] rtnl_newlink+0x408/0x768 [39293.997945] rtnetlink_rcv_msg+0x20c/0x2c0 [39294.002038] netlink_rcv_skb+0x60/0x120 [39294.005870] rtnetlink_rcv+0x28/0x38 [39294.009443] netlink_unicast+0x1a8/0x280 [39294.013363] netlink_sendmsg+0x188/0x338 [39294.017281] sock_sendmsg+0x4c/0x68 [39294.020766] ___sys_sendmsg+0x26c/0x2a0 [39294.024598] __sys_sendmsg+0x64/0xa0 [39294.028170] __arm64_sys_sendmsg+0x2c/0x38 [39294.032263] el0_svc_common+0x98/0x100 [39294.036009] el0_svc_handler+0x38/0x78 [39294.039753] el0_svc+0x8/0xc