Re: tcmu-runner lock failure

On 19/09/2022 23:32, j.rasakunasingam@xxxxxxxxxxxx wrote:

Hi,

we have a Ceph cluster running with 3x controller and 6x storage nodes. We
use iSCSI with tcmu-runner (16.2.9) to connect VMware to Ceph.

The issue we face is that we lose the connection to the iSCSI gateways, and
the ESXi hosts connected to them stop working properly. After restarting the
servers it works again, but later the tcmu-runner Docker container restarts
by itself. The only thing I found is this from tcmu-runner:

2022-09-12 12:33:24.186 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.192 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.198 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.201 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.205 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.208 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.211 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.279 81 [ERROR] tcmu_cfgfs_set_str:294: Kernel does not
support configfs
file /sys/kernel/config/target/core/user_3/pool_ag.image_ag/action/block_dev.

2022-09-12 12:33:24.279 81 [ERROR] tcmu_cfgfs_set_str:294: Kernel does not
support configfs
file /sys/kernel/config/target/core/user_3/pool_ag.image_ag/action/block_dev.

I think your kernel is a little old. Without the 'block_dev' attribute we cannot make sure that in-flight IOs finish before the device is closed, and this causes all the other errors above and below.

Please update to a kernel that includes:

commit 892782caf19a97ccc95df51b3bb659ecacff986a
Author: Mike Christie <mchristi@xxxxxxxxxx>
Date:   Tue Dec 19 04:03:58 2017 -0600

    tcmu: allow userspace to reset ring

    This patch adds 2 tcmu attrs to block/unblock a device and
    reset the ring buffer. They are used when the userspace
    daemon has crashed or forced to shutdown while IO is executing.
    On restart, the daemon can block the device so new IO is not
    sent to userspace while it puts the ring in a clean state.

    Notes: The reset ring operation is specific to tcmu, but the
    block one could be generic. I kept it tcmu specific, because
    it requires some extra locking/state checks in the main IO
    path and since other backend modules did not need this
    functionality I thought only tcmu should take the perf hit.

    Signed-off-by: Mike Christie <mchristi@xxxxxxxxxx>
    Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
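
To check whether a gateway's running kernel exposes the attribute, something like the following rough sketch may help. It assumes configfs is mounted at the usual /sys/kernel/config and that the tcmu devices live under user_* directories, as in the path from the log above. The function name missing_block_dev is only illustrative and is not part of tcmu-runner.

```python
from pathlib import Path

# Assumed default configfs location of the LIO target core (see the
# path in the tcmu-runner error message); adjust if mounted elsewhere.
CONFIGFS_CORE = Path("/sys/kernel/config/target/core")


def missing_block_dev(core_dir=CONFIGFS_CORE):
    """Return tcmu device dirs (user_*/<dev>) lacking action/block_dev."""
    missing = []
    for dev in sorted(core_dir.glob("user_*/*")):
        # Skip plain files that sit next to the device directories.
        if dev.is_dir() and not (dev / "action" / "block_dev").exists():
            missing.append(dev)
    return missing


if __name__ == "__main__":
    for dev in missing_block_dev():
        print(f"{dev}: no action/block_dev attribute")
```

If this prints any device, the kernel on that gateway most likely predates the commit above. It has to run on the gateway host, or in a container that can see the host's configfs.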


2022-09-12 12:33:24.754 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-12 12:33:24.965 81 [ERROR] tcmur_cmdproc_thread:864: ppoll received
unexpected revent: 0x19
2022-09-16 10:07:42.829 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_scutum.image_scutum: Could not update service status. (Err -107)
2022-09-16 10:07:42.829 81 [ERROR] __tcmu_report_event:173
rbd/pool_scutum.image_scutum: Could not report events. Error -107.
2022-09-16 10:08:51.126 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_scutum.image_scutum: Could not update service status. (Err -107)
2022-09-16 10:08:51.126 81 [ERROR] __tcmu_report_event:173
rbd/pool_scutum.image_scutum: Could not report events. Error -107.
2022-09-16 10:13:05.313 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_scutum.image_scutum: Could not update service status. (Err -107)
2022-09-16 10:13:05.313 81 [ERROR] __tcmu_report_event:173
rbd/pool_scutum.image_scutum: Could not report events. Error -107.
2022-09-16 10:20:23.229 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_scutum.image_scutum: Could not update service status. (Err -107)
2022-09-16 10:20:23.229 81 [ERROR] __tcmu_report_event:173
rbd/pool_scutum.image_scutum: Could not report events. Error -107.
2022-09-16 10:27:21.421 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_scutum.image_scutum: Could not update service status. (Err -107)
2022-09-16 10:27:21.422 81 [ERROR] __tcmu_report_event:173
rbd/pool_scutum.image_scutum: Could not report events. Error -107.
2022-09-16 10:28:02.669 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 10:29:47.049 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_ag.image_ag: Could not update service status. (Err -107)
2022-09-16 10:29:47.049 81 [ERROR] __tcmu_report_event:173
rbd/pool_ag.image_ag: Could not report events. Error -107.
2022-09-16 10:31:26.025 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_desktop.image_desktop: Could not update service status. (Err -107)
2022-09-16 10:31:26.025 81 [ERROR] __tcmu_report_event:173
rbd/pool_desktop.image_desktop: Could not report events. Error -107.
2022-09-16 10:48:23.553 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 10:49:58.223 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 10:54:06.798 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 10:56:48.497 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 10:59:48.393 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 11:01:28.993 81 [ERROR] tcmu_acquire_dev_lock:432
rbd/pool_desktop.image_desktop: Could not reopen device while taking lock.
Err -16.
2022-09-16 11:09:42.599 81 [ERROR] tcmu_rbd_service_status_update:140
rbd/pool_desktop.image_desktop: Could not update service status. (Err -107)
2022-09-16 11:09:42.600 81 [ERROR] __tcmu_report_event:173
rbd/pool_desktop.image_desktop: Could not report events. Error -107.
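
For reading the numeric codes in the log above, a small sketch like this can translate them. It assumes Linux numbering for both the poll(2) event bits and the errno values, so run it on a Linux host. The helper names decode_revents and decode_errno are made up for this example.

```python
import errno
import os

# Linux values of the poll(2) event bits (assumption: Linux host).
POLL_FLAGS = {
    0x001: "POLLIN",
    0x002: "POLLPRI",
    0x004: "POLLOUT",
    0x008: "POLLERR",
    0x010: "POLLHUP",
    0x020: "POLLNVAL",
}


def decode_revents(revents):
    """Return the names of the poll flags set in revents."""
    return [name for bit, name in POLL_FLAGS.items() if revents & bit]


def decode_errno(err):
    """Map a negative error code to (symbolic name, message)."""
    code = abs(err)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    print(hex(0x19), decode_revents(0x19))
    for err in (-107, -16):
        print(err, *decode_errno(err))
```

On Linux, 0x19 decodes to POLLIN, POLLERR and POLLHUP, -107 is ENOTCONN and -16 is EBUSY.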

Thanks in advance.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
