This is great. Is there a way to test the fix in my environment?

-Paul

On Aug 26, 2021, at 11:05 AM, Xiubo Li <xiubli@xxxxxxxxxx> wrote:

Hi Paul, Ilya,

I have fixed it in [1]; please help review.

Thanks

[1] https://github.com/open-iscsi/tcmu-runner/pull/667

On 8/26/21 7:34 PM, Paul Giralt (pgiralt) wrote:

Thank you for the analysis. Can you think of a workaround for the issue?

-Paul

On Aug 26, 2021, at 5:17 AM, Xiubo Li <xiubli@xxxxxxxxxx> wrote:

Hi Paul,

There is a race between updating the state to the Ceph cluster and reopening the image. Reopening closes and then opens the image. The crash should happen just after the image has been closed and its resources released: if the work queue then tries to update the state to the Ceph cluster, it triggers a use-after-free bug.

I will try to fix it.

Thanks

On 8/26/21 10:40 AM, Paul Giralt (pgiralt) wrote:

I will send a unicast email with the link and details.

-Paul

On Aug 25, 2021, at 10:37 PM, Xiubo Li <xiubli@xxxxxxxxxx> wrote:

Hi Paul,

Please send me the exact versions of the tcmu-runner and ceph-iscsi packages you are using.

Thanks

On 8/26/21 10:21 AM, Paul Giralt (pgiralt) wrote:

Thank you. I did find some coredump files. Is there a way I can send these to you to analyze?

[root@cxcto-c240-j27-02 coredump]# ls -asl
total 71292
    0 drwxr-xr-x. 2 root root      176 Aug 25 18:31 .
    0 drwxr-xr-x. 5 root root       70 Aug 10 11:31 ..
34496 -rw-r-----. 1 root root 35316215 Aug 25 18:31 core.tcmu-runner.0.3083bbc32b6a43acb768b85818414867.4523.1629930681000000.lz4
36796 -rw-r-----. 1 root root 37671322 Aug 24 09:17 core.tcmu-runner.0.baf25867590c40da87305e67d5b97751.4521.1629811022000000.lz4

[root@cxcto-c240-j27-03 coredump]# ls -asl
total 161188
    4 drwxr-xr-x. 2 root root     4096 Aug 25 19:29 .
    0 drwxr-xr-x. 5 root root       70 Aug 10 11:31 ..
45084 -rw-r-----. 1 root root 46159860 Aug 25 19:29 core.tcmu-runner.0.a276a2f5ee5a4d279917fd8c335c9b93.5281.1629934190000000.lz4
33468 -rw-r-----. 1 root root 34263834 Aug 24 16:08 core.tcmu-runner.0.a9df4a27b1ea43d09c6c254bb1e3447a.4209.1629835730000000.lz4
34212 -rw-r-----. 1 root root 35027795 Aug 25 03:43 core.tcmu-runner.0.cce93af5693444108993f0d48371197d.5564.1629877416000000.lz4
48420 -rw-r-----. 1 root root 49574566 Aug 24 10:03 core.tcmu-runner.0.e4f4ed6e35154c95b43f87b069380fbe.4091.1629813832000000.lz4

[root@cxcto-c240-j27-04 coredump]# ls -asl
total 359240
     4 drwxr-xr-x. 2 root root      4096 Aug 25 19:20 .
     0 drwxr-xr-x. 5 root root        70 Aug 10 11:31 ..
 31960 -rw-r-----. 1 root root  32720639 Aug 25 00:36 core.tcmu-runner.0.115ba6ee7acb42b8acfe2a1a958b5367.34161.1629866182000000.lz4
 38516 -rw-r-----. 1 root root  39435484 Aug 25 19:20 core.tcmu-runner.0.4d43dd5cde9c4d44a96b2c744a9b43f4.4295.1629933615000000.lz4
 81012 -rw-r-----. 1 root root  82951773 Aug 25 14:38 core.tcmu-runner.0.6998ff9717cf4e96932349eacd1d81bc.4274.1629916720000000.lz4
 95872 -rw-r-----. 1 root root  98165539 Aug 23 17:02 core.tcmu-runner.0.9a28e301d6604d1a8eafbe12ae896c2f.4269.1629752547000000.lz4
111876 -rw-r-----. 1 root root 114554583 Aug 24 11:41 core.tcmu-runner.0.f9ea1331105b44f2b2f28dc0c1a7e653.5059.1629819705000000.lz4

[root@cxcto-c240-j27-05 coredump]# ls -asl
total 115720
    0 drwxr-xr-x. 2 root root      261 Aug 25 16:47 .
    0 drwxr-xr-x. 5 root root       70 Aug 10 11:31 ..
44720 -rw-r-----. 1 root root 45786023 Aug 24 09:46 core.tcmu-runner.0.530b308c30534b9aa4e7619ff1ab869c.4145.1629812787000000.lz4
35032 -rw-r-----. 1 root root 35867165 Aug 24 17:52 core.tcmu-runner.0.5afb87334bd741699c6fd44ceb031128.5672.1629841939000000.lz4
35968 -rw-r-----. 1 root root 36826770 Aug 25 16:47 core.tcmu-runner.0.da66f3f24a624426a75cbe20758be879.5339.1629924435000000.lz4

-Paul

On Aug 25, 2021, at 10:14 PM, Xiubo Li <xiubli@xxxxxxxxxx> wrote:

On 8/26/21 10:08 AM, Paul Giralt (pgiralt) wrote:

Thanks Xiubo. I will try this. How do I set the log level to 4?

It's in /etc/tcmu/tcmu.cfg in the tcmu container. There is no need to restart the tcmu-runner service; the tcmu-runner daemon loads the changes after tcmu.cfg is closed.

-Paul

On Aug 25, 2021, at 9:30 PM, Xiubo Li <xiubli@xxxxxxxxxx> wrote:

It's buggy; we need a way to export the tcmu-runner log to the host.

Can you see any crash coredumps on the host? If not, could you keep a command like '$ tail -f XYZ/tcmu-runner.log' running in a console inside the tcmu containers, so we can see whether we get any useful logs? At the same time, please set log_level to 4. If it's an experimental setup, you can just set log_level to 5.

I am not confident we can get any coredump information from tcmu-runner.log, but at least we may get something else that gives us a clue.

- Xiubo

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
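
The race described in this thread (a work queue updating state while a reopen has just closed the image and released its resources) can be modelled with a small sketch. This is only an illustration of one common guard, a lock plus a "still open" check. It is not taken from tcmu-runner and is not necessarily what the pull request in [1] does; the class and method names (Image, update_state, reopen) are hypothetical.

```python
import threading


class Image:
    """Hypothetical stand-in for an opened image and its resources."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handle = object()  # stands in for resources freed on close
        self.updates = 0         # state updates that actually ran
        self.skipped = 0         # updates skipped because the image was closed

    def close(self):
        with self._lock:
            self._handle = None  # resources released

    def open(self):
        with self._lock:
            self._handle = object()

    def reopen(self):
        # Close, then open again; updates may arrive in between.
        self.close()
        self.open()

    def update_state(self):
        """Called from the work queue; must never touch a closed image."""
        with self._lock:
            if self._handle is None:
                self.skipped += 1  # unguarded code would use freed resources here
                return False
            self.updates += 1
            return True


def run_demo(workers=4, calls=1000, reopens=100):
    """Run updates concurrently with reopens; return (updates, skipped)."""
    img = Image()
    threads = [
        threading.Thread(target=lambda: [img.update_state() for _ in range(calls)])
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for _ in range(reopens):
        img.reopen()
    for t in threads:
        t.join()
    return img.updates, img.skipped
```

Every call to update_state either runs against an open image or is skipped, so the two counters always add up to the number of calls, however the threads interleave.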