Re: [bug report] scsi: libsas: Fix hung when disable phys

Hi,


On 2023/2/27 21:17, yangxingui wrote:

Hi, All

If a remote PHY is disabled just after all local PHYs have been disabled
in an expander environment, as follows:
echo 0 > /sys/class/sas_phy/phy-4\:0/enable
echo 0 > /sys/class/sas_phy/phy-4\:1/enable
echo 0 > /sys/class/sas_phy/phy-4\:2/enable
echo 0 > /sys/class/sas_phy/phy-4\:3/enable
echo 0 > /sys/class/sas_phy/phy-4\:4/enable
echo 0 > /sys/class/sas_phy/phy-4\:5/enable
echo 0 > /sys/class/sas_phy/phy-4\:6/enable
echo 0 > /sys/class/sas_phy/phy-4\:7/enable
echo 0 > /sys/class/sas_phy/phy-4:0:7/enable

the following hang occurs:

[  245.564088] INFO: task kworker/u256:1:883 blocked for more than 120 seconds.
[  245.571115]       Tainted: G           O      5.16.0-rc4+ #1
[  245.576759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.584557] task:kworker/u256:1 state:D stack: 0 pid: 883 ppid: 2 flags:0x00000008
[  245.592878] Workqueue: 0000:74:02.0_event_q sas_phy_event_worker [libsas]
[  245.599652] Call trace:
[  245.602092]  __switch_to+0xd8/0x114
[  245.605574]  __schedule+0x2f0/0x85c
[  245.609054]  schedule+0x60/0x100
[  245.612273]  __kernfs_remove.part.0+0x288/0x2e0
[  245.616791]  kernfs_remove_by_name_ns+0x70/0xc0
[  245.621307]  sysfs_remove_file_ns+0x24/0x30
[  245.625477]  device_remove_file+0x24/0x34
[  245.629475]  attribute_container_remove_attrs+0x50/0x8c
[  245.634684]  attribute_container_class_device_del+0x24/0x3c
[  245.640237]  transport_remove_classdev+0x64/0x80
[  245.644839]  attribute_container_device_trigger+0x11c/0x124
[  245.650393]  transport_remove_device+0x24/0x30
[  245.654823]  sas_phy_delete+0x34/0x60
[  245.658475]  do_sas_phy_delete+0x60/0x70
[  245.662385]  device_for_each_child+0x68/0xb0
[  245.666640]  sas_remove_children+0x44/0x54
[  245.670723]  sas_destruct_devices+0x5c/0xa0 [libsas]
[  245.675676]  sas_deform_port+0x178/0x1bc [libsas]
[  245.680371]  sas_phye_loss_of_signal+0x28/0x34 [libsas]
[  245.685583]  sas_phy_event_worker+0x3c/0x60 [libsas]
[  245.690536]  process_one_work+0x1e0/0x46c
[  245.694534]  worker_thread+0x15c/0x464
[  245.698272]  kthread+0x188/0x194
[  245.701491]  ret_from_fork+0x10/0x20
[  245.705120] INFO: task bash:25579 blocked for more than 120 seconds.
[  245.711450]       Tainted: G           O      5.16.0-rc4+ #1
[  245.717087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.724883] task:bash state:D stack: 0 pid:25579 ppid: 25113 flags:0x00000200
[  245.733202] Call trace:
[  245.735639]  __switch_to+0xd8/0x114
[  245.739117]  __schedule+0x2f0/0x85c
[  245.742595]  schedule+0x60/0x100
[  245.745814]  schedule_timeout+0x180/0x1bc
[  245.749811]  wait_for_completion+0x8c/0x100
[  245.753984]  flush_workqueue+0x108/0x3d4
[  245.757896]  drain_workqueue+0xc8/0x16c
[  245.761722]  __sas_drain_work+0x54/0x90 [libsas]
[  245.766328]  sas_drain_work+0x68/0x70 [libsas]
[  245.770760]  queue_phy_enable+0x9c/0xec [libsas]
[  245.775368]  store_sas_phy_enable+0xf0/0x10c
[  245.779624]  dev_attr_store+0x24/0x40
[  245.783275]  sysfs_kf_write+0x50/0x60
[  245.786930]  kernfs_fop_write_iter+0x124/0x1b4
[  245.791361]  new_sync_write+0xf0/0x190
[  245.795098]  vfs_write+0x23c/0x2a0
[  245.798490]  ksys_write+0x78/0x104
[  245.801882]  __arm64_sys_write+0x28/0x3c
[  245.805794]  invoke_syscall.constprop.0+0x58/0xf0
[  245.810483]  do_el0_svc+0x19c/0x1b0
[  245.813962]  el0_svc+0x28/0xec
[  245.817009]  el0t_64_sync_handler+0x1a8/0x1ac
[  245.821351]  el0t_64_sync+0x1a0/0x1a4

We found that when all local PHYs are disabled, all the devices are
removed in the PHY_LOSS_OF_SIGNAL work, which waits (in kernfs_drain())
for kn->active of the device's sysfs node to be deactivated. However,
kn->active may still be held, because we are using the sysfs interface
to disable the remote PHY at the same time. That sysfs write in turn
drains the libsas work, including the PHY_LOSS_OF_SIGNAL work. Each side
waits for the other, so the hang occurs.
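
To make the circular wait explicit, here is a minimal user-space sketch
of the dependency between the two blocked tasks in the traces above. It
is only a model: the enum, the array and has_wait_cycle() are
hypothetical names invented for illustration, not libsas or kernfs code,
and no real locking is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two blocked tasks; not kernel code. */
enum model_task { TASK_BASH, TASK_KWORKER, TASK_COUNT };

#define NOT_WAITING (-1)

/*
 * waits_for[t] is the task that t is blocked on, or NOT_WAITING.
 * Returns true if following the chain from 'start' leads back to
 * 'start', i.e. the task can never be woken up.
 */
bool has_wait_cycle(const int *waits_for, int ntasks, int start)
{
    assert(start >= 0 && start < ntasks);

    int cur = waits_for[start];
    for (int steps = 0; cur != NOT_WAITING && steps < ntasks; steps++) {
        if (cur == start)
            return true;
        cur = waits_for[cur];
    }
    return false;
}

/* Fill in the edges as read from the two call traces. */
void fill_hang_scenario(int *waits_for)
{
    /* bash: store_sas_phy_enable -> sas_drain_work -> flush_workqueue */
    waits_for[TASK_BASH] = TASK_KWORKER;
    /* kworker: sas_deform_port -> ... -> __kernfs_remove (kn->active) */
    waits_for[TASK_KWORKER] = TASK_BASH;
}
```

Calling fill_hang_scenario() on an array of TASK_COUNT entries and then
has_wait_cycle() for either task reports a cycle, which matches both
tasks sitting in state D.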

How can we fix the problem in this scenario?

It seems to be a common issue in the libsas layer.
What about directly calling the callback functions of phy_enable_work and phy_reset_work in queue_phy_enable()/queue_phy_reset(), instead of queuing those works and then calling sas_drain_work()?
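
A rough user-space sketch of the difference between the two patterns,
under the assumption that the callback itself does not need anything
from the event workqueue. All names here (model_queue, model_work,
enable_via_queue(), enable_direct()) are hypothetical and exist only for
this illustration; whether a direct call is safe in the real code with
respect to locking and event ordering is not something this sketch
shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

struct model_op { bool done; };

struct model_work {
    void (*fn)(void *arg);
    void *arg;
    bool waits_on_caller; /* e.g. removal waiting for kn->active */
};

struct model_queue {
    struct model_work pending[MAX_PENDING];
    size_t count;
};

void mark_done(void *arg)
{
    ((struct model_op *)arg)->done = true;
}

bool model_queue_work(struct model_queue *q, struct model_work w)
{
    if (q->count == MAX_PENDING)
        return false;
    q->pending[q->count++] = w;
    return true;
}

/*
 * Run pending works in FIFO order. Returns false where a real drain
 * would sleep forever: a pending work cannot finish until the caller
 * itself returns.
 */
bool model_drain(struct model_queue *q)
{
    size_t i = 0;
    bool ok = true;

    while (i < q->count) {
        if (q->pending[i].waits_on_caller) {
            ok = false;
            break;
        }
        q->pending[i].fn(q->pending[i].arg);
        i++;
    }
    /* Keep the works that did not run at the front of the queue. */
    size_t left = q->count - i;
    for (size_t j = 0; j < left; j++)
        q->pending[j] = q->pending[i + j];
    q->count = left;
    return ok;
}

/* Current pattern: queue the PHY work, then drain everything. */
bool enable_via_queue(struct model_queue *q, struct model_op *op)
{
    struct model_work w = { mark_done, op, false };

    if (!model_queue_work(q, w))
        return false;
    return model_drain(q);
}

/* Proposed pattern: run the callback in the caller's context. */
bool enable_direct(struct model_op *op)
{
    mark_done(op);
    return true;
}
```

With a pending work marked waits_on_caller already in the queue,
enable_via_queue() cannot complete, while enable_direct() does not
depend on the queue contents at all.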



regards,

Xingui
