Inline… Usually it shouldn't be so many entries, and we have added some patches to fix this. When the exclusive lock is broke by a new gateway, the previous one will be added to the blocklist by ceph. And in tcmu-runner when the previous gateway detect that it has been added to the blocklist will it try to reopen that image, during which the stale blocklist entry will be removed. Locally I tried and couldn't reproduce this issue. Could you send me some of the blacklist entries you saw ? What are they look like ? Sure - here is a short snippet of an example from last week that I logged: [root@cxcto-c240-j27-01 ~]# ceph osd blacklist ls 10.122.242.199:0/4226293933 2021-08-27T20:39:44.061508+0000 10.122.242.198:0/2282309529 2021-08-27T20:39:44.192061+0000 10.122.242.197:0/3952967406 2021-08-27T20:39:43.240488+0000 10.122.242.197:0/2355855561 2021-08-27T20:39:43.272794+0000 10.122.242.197:0/1182316932 2021-08-27T20:39:43.029873+0000 10.122.242.199:0/2839589212 2021-08-27T20:39:43.086062+0000 10.122.242.197:0/1768780841 2021-08-27T20:39:43.068138+0000 10.122.242.197:0/3136295259 2021-08-27T20:39:42.175505+0000 10.122.242.197:0/2370363728 2021-08-27T20:39:42.051609+0000 10.122.242.200:0/3544501318 2021-08-27T20:39:42.270797+0000 10.122.242.199:0/2049723123 2021-08-27T20:39:39.951792+0000 10.122.242.197:0/244698804 2021-08-27T20:39:40.236649+0000 10.122.242.197:0/4246474017 2021-08-27T20:39:39.736140+0000 10.122.242.199:0/180742279 2021-08-27T20:39:39.071984+0000 10.122.242.197:0/301384989 2021-08-27T20:39:37.949623+0000 10.122.242.198:0/1054843518 2021-08-27T20:39:36.859075+0000 10.122.242.197:0/3257267535 2021-08-27T20:39:37.219640+0000 10.122.242.197:0/856045413 2021-08-27T20:39:35.812634+0000 10.122.242.197:0/3399533573 2021-08-27T20:39:35.815300+0000 10.122.242.199:0/2236736112 2021-08-27T20:39:34.643961+0000 10.122.242.200:0/1419077698 2021-08-27T20:39:34.674641+0000 10.122.242.197:0/3192807538 2021-08-27T20:39:34.234574+0000 10.122.242.197:0/3535150177 2021-08-27T20:39:34.222314+0000 10.122.242.197:0/214335566 2021-08-27T20:39:34.253096+0000 10.122.242.197:0/3969335486 2021-08-27T20:39:33.255463+0000 10.122.242.199:0/2070390422 2021-08-27T20:39:33.873459+0000 10.122.242.197:0/2291391951 2021-08-27T20:39:32.087530+0000 10.122.242.197:0/3306349369 2021-08-27T20:39:32.746714+0000 10.122.242.199:0/3740796140 2021-08-27T20:39:31.658943+0000 10.122.242.197:0/4099129478 2021-08-27T20:39:31.249495+0000 10.122.242.197:0/2514846855 2021-08-27T20:39:29.735692+0000 10.122.242.200:0/1580971106 2021-08-27T20:39:30.224904+0000 … snip …. 10.122.242.197:0/2755867789 2021-08-27T20:26:10.204159+0000 10.122.242.198:0/3288532389 2021-08-27T20:26:26.269538+0000 10.122.242.200:0/4167199222 2021-08-27T20:26:26.970351+0000 10.122.242.197:0/2795494436 2021-08-27T20:26:27.348841+0000 10.122.242.197:0/643616937 2021-08-27T20:26:27.452835+0000 10.122.242.197:0/549853104 2021-08-27T20:26:27.361418+0000 listed 14528 entries The initiators accessing the iSCSI volumes are all VMWare ESXi hosts. Do you think it’s expected to see so much path switching in this kind of environment or perhaps I need to look at some parameters on the ESXi side to make it not switch so often. What's the Path Selection Policy you are using ? It can be either “Fixed”, “Most Recently Used” or “Round Robin” and it seems you are using the last one ? We are using “Most Recently Used” - however there are 50 ESXi hosts all trying to access the same data stores, so it’s very possible that one host is choosing iSCSI gateway 1 and another host is choosing iSCSI gateway 2. Is only one gateway allowed to access an image at any given time? If so perhaps I need to hardcode the paths on the ESXi hosts so they all prefer one gateway and then only use the other to fail over. Now we don’t have redundancy, but at least things are stable while we wait for a fix. Any chance this fix will make it into the 16.2.6 release? Not sure, I am still wait someone to help me review them. Ilya, would you be able to help? - Xiubo -Paul On Aug 29, 2021, at 8:48 PM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: On 8/27/21 11:10 PM, Paul Giralt (pgiralt) wrote: Ok - thanks Xiubo. Not sure I feel comfortable doing that without breaking something else, so will wait for a new release that incorporates the fix. In the meantime I’m trying to figure out what might be triggering the issue, since this has been running fine for months and just recently started happening. Now it happens fairly regularly. I noticed that in the tcmu logs, I see the following: 2021-08-27 15:06:40.158 8:ework-thread [ERROR] tcmu_rbd_service_status_update:140 rbd/iscsi-pool-0001.iscsi-p0001-img-01: Could not update service status. (Err -107) 2021-08-27 15:06:40.158 8:ework-thread [ERROR] __tcmu_report_event:173 rbd/iscsi-pool-0001.iscsi-p0001-img-01: Could not report events. Error -107. 2021-08-27 15:06:41.131 8:io_context_pool [WARN] tcmu_notify_lock_lost:271 rbd/iscsi-pool-0002.iscsi-p0002-img-02: Async lock drop. Old state 5 2021-08-27 15:06:41.147 8:cmdproc-uio9 [INFO] alua_implicit_transition:592 rbd/iscsi-pool-0002.iscsi-p0002-img-02: Starting write lock acquisition operation. 2021-08-27 15:06:42.132 8:ework-thread [ERROR] tcmu_rbd_service_status_update:140 rbd/iscsi-pool-0002.iscsi-p0002-img-02: Could not update service status. (Err -107) 2021-08-27 15:06:42.132 8:ework-thread [ERROR] __tcmu_report_event:173 rbd/iscsi-pool-0002.iscsi-p0002-img-02: Could not report events. Error -107. 2021-08-27 15:06:42.216 8:ework-thread [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:340 rbd/iscsi-pool-0001.iscsi-p0001-img-01: removing addrs: {10.122.242.197:0/2251669337} 2021-08-27 15:06:42.217 8:ework-thread [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:322 rbd/iscsi-pool-0001.iscsi-p0001-img-01: Could not rm blacklist entry '�(~'. (Err -13) 2021-08-27 15:06:42.217 8:ework-thread [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:340 rbd/iscsi-pool-0001.iscsi-p0001-img-01: removing addrs: {10.122.242.197:0/3276725458} 2021-08-27 15:06:42.218 8:ework-thread [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:322 rbd/iscsi-pool-0001.iscsi-p0001-img-01: Could not rm blacklist entry ''. (Err -13) 2021-08-27 15:06:42.443 8:io_context_pool [WARN] tcmu_notify_lock_lost:271 rbd/iscsi-pool-0005.iscsi-p0005-img-01: Async lock drop. Old state 5 2021-08-27 15:06:42.459 8:cmdproc-uio0 [INFO] alua_implicit_transition:592 rbd/iscsi-pool-0005.iscsi-p0005-img-01: Starting write lock acquisition operation. 2021-08-27 15:06:42.488 8:ework-thread [INFO] tcmu_rbd_rm_stale_entries_from_blacklist:340 rbd/iscsi-pool-0005.iscsi-p0005-img-01: removing addrs: {10.122.242.197:0/2189482708} 2021-08-27 15:06:42.489 8:ework-thread [ERROR] tcmu_rbd_rm_stale_entry_from_blacklist:322 rbd/iscsi-pool-0005.iscsi-p0005-img-01: Could not rm blacklist entry '`"�'. (Err -13) The tcmu_rbd_service_status_update is showing up in there which is the code that is affected by this bug. Any idea what the error -107 means? Maybe if I fix what is causing some of these errors, it might work around the problem. Also if you have thoughts on the other blacklist entry errors and what might be causing them, that would be greatly appreciated as well. There has one way to improve this, which is to make the HA=1, but won't void it 100%. I found your case is triggered when it's doing active paths switching between different gateways, which will do the exclusive lock broke and acquiring frequently, the Error -107 means the image has been closed by tcmu-runner but another thread is trying to use the freed connection to report the status. The blocklist error should be okay, it won't affect anything, it's just a waning. - Xiubo -Paul On Aug 26, 2021, at 8:37 PM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: On 8/27/21 12:06 AM, Paul Giralt (pgiralt) wrote: This is great. Is there a way to test the fix in my environment? It seem you could restart the tcmu-runner service from the container. Since this change not only in the handler_rbd.so but also the libtcmu.so and tcmu-runner binary, the whole tcmu-runner need to be built. That means I am afraid you have to build and install it from source on the host and then restart the tcmu container. -Paul On Aug 26, 2021, at 11:05 AM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: Hi Paul, Ilya, I have fixed it in [1], please help review. Thanks [1] https://github.com/open-iscsi/tcmu-runner/pull/667 On 8/26/21 7:34 PM, Paul Giralt (pgiralt) wrote: Thank you for the analysis. Can you think of a workaround for the issue? -Paul Sent from my iPhone On Aug 26, 2021, at 5:17 AM, Xiubo Li <xiubli@xxxxxxxxxx><mailto:xiubli@xxxxxxxxxx> wrote: Hi Paul, There has one racy case when updating the state to ceph cluster and while reopening the image, which will close and open the image, the crash should happen just after the image was closed and the resources were released and then if work queue was trying to update the state to ceph cluster it will trigger use-after-free bug. I will try to fix it. Thanks On 8/26/21 10:40 AM, Paul Giralt (pgiralt) wrote: I will send a unicast email with the link and details. -Paul On Aug 25, 2021, at 10:37 PM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: Hi Paul, Please send me the detail versions of the tcmu-runner and ceph-iscsi packages you are using. Thanks On 8/26/21 10:21 AM, Paul Giralt (pgiralt) wrote: Thank you. I did find some coredump files. Is there a way I can send these to you to analyze? [root@cxcto-c240-j27-02 coredump]# ls -asl total 71292 0 drwxr-xr-x. 2 root root 176 Aug 25 18:31 . 0 drwxr-xr-x. 5 root root 70 Aug 10 11:31 .. 34496 -rw-r-----. 1 root root 35316215 Aug 25 18:31 core.tcmu-runner.0.3083bbc32b6a43acb768b85818414867.4523.1629930681000000.lz4 36796 -rw-r-----. 1 root root 37671322 Aug 24 09:17 core.tcmu-runner.0.baf25867590c40da87305e67d5b97751.4521.1629811022000000.lz4 [root@cxcto-c240-j27-03 coredump]# ls -asl total 161188 4 drwxr-xr-x. 2 root root 4096 Aug 25 19:29 . 0 drwxr-xr-x. 5 root root 70 Aug 10 11:31 .. 45084 -rw-r-----. 1 root root 46159860 Aug 25 19:29 core.tcmu-runner.0.a276a2f5ee5a4d279917fd8c335c9b93.5281.1629934190000000.lz4 33468 -rw-r-----. 1 root root 34263834 Aug 24 16:08 core.tcmu-runner.0.a9df4a27b1ea43d09c6c254bb1e3447a.4209.1629835730000000.lz4 34212 -rw-r-----. 1 root root 35027795 Aug 25 03:43 core.tcmu-runner.0.cce93af5693444108993f0d48371197d.5564.1629877416000000.lz4 48420 -rw-r-----. 1 root root 49574566 Aug 24 10:03 core.tcmu-runner.0.e4f4ed6e35154c95b43f87b069380fbe.4091.1629813832000000.lz4 [root@cxcto-c240-j27-04 coredump]# ls -asl total 359240 4 drwxr-xr-x. 2 root root 4096 Aug 25 19:20 . 0 drwxr-xr-x. 5 root root 70 Aug 10 11:31 .. 31960 -rw-r-----. 1 root root 32720639 Aug 25 00:36 core.tcmu-runner.0.115ba6ee7acb42b8acfe2a1a958b5367.34161.1629866182000000.lz4 38516 -rw-r-----. 1 root root 39435484 Aug 25 19:20 core.tcmu-runner.0.4d43dd5cde9c4d44a96b2c744a9b43f4.4295.1629933615000000.lz4 81012 -rw-r-----. 1 root root 82951773 Aug 25 14:38 core.tcmu-runner.0.6998ff9717cf4e96932349eacd1d81bc.4274.1629916720000000.lz4 95872 -rw-r-----. 1 root root 98165539 Aug 23 17:02 core.tcmu-runner.0.9a28e301d6604d1a8eafbe12ae896c2f.4269.1629752547000000.lz4 111876 -rw-r-----. 1 root root 114554583 Aug 24 11:41 core.tcmu-runner.0.f9ea1331105b44f2b2f28dc0c1a7e653.5059.1629819705000000.lz4 [root@cxcto-c240-j27-05 coredump]# ls -asl total 115720 0 drwxr-xr-x. 2 root root 261 Aug 25 16:47 . 0 drwxr-xr-x. 5 root root 70 Aug 10 11:31 .. 44720 -rw-r-----. 1 root root 45786023 Aug 24 09:46 core.tcmu-runner.0.530b308c30534b9aa4e7619ff1ab869c.4145.1629812787000000.lz4 35032 -rw-r-----. 1 root root 35867165 Aug 24 17:52 core.tcmu-runner.0.5afb87334bd741699c6fd44ceb031128.5672.1629841939000000.lz4 35968 -rw-r-----. 1 root root 36826770 Aug 25 16:47 core.tcmu-runner.0.da66f3f24a624426a75cbe20758be879.5339.1629924435000000.lz4 -Paul On Aug 25, 2021, at 10:14 PM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: On 8/26/21 10:08 AM, Paul Giralt (pgiralt) wrote: Thanks Xiubo. I will try this. How do I set the log level to 4? It's in the /etc/tcmu/tcmu.cfg in the tcmu container. No need to restart the tcmu-runner service, the changes will be loaded by tcmu-runner daemon after the tcmu.cfg closed. -Paul On Aug 25, 2021, at 9:30 PM, Xiubo Li <xiubli@xxxxxxxxxx<mailto:xiubli@xxxxxxxxxx>> wrote: It's buggy, we need one way to export the tcmu-runner log to the host. Could you see any crash coredump from the host ? Without that could you keep running some commands like '$ tail -f XYZ/tcmu-runner.log' in a console from the tcmu containers, let's see could we get any useful logs ? At the same time please set the log_level to 4. If it's an experimental setup then you can just set the log_level to 5. I am not confident we can get any coredump from tcmu-runner.log, but at least we can get something else which may give us a clue. - Xiubo _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx