Re: krbd reboot hung

The "rbdmap" unit needs rbdmap and fstab to be configured for each volume, what if the map and mount are done by applications instead of the system unit? See, we don't write each volume info into /etc/ceph/rbdmap /etc/fstab, and if the "rbdmap" systemd unit is stopped unexpected, not by rebooting, then all rbd volumes will be umounted and unmapped, it's dangerous to applications.

On 1/25/19, 9:35 PM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:

    The "rbdmap" systemd unit file should take care of it [1].
    
    [1] https://github.com/ceph/ceph/blob/master/systemd/rbdmap.service.in#L4
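    
    From memory, the relevant part is the ordering in its [Unit] section, roughly the following (exact lines may differ between releases):
    
        [Unit]
        After=network-online.target
        Wants=network-online.target
    
    That makes systemd stop the unit, and therefore unmap the volumes, before it takes the network down at shutdown. You can check what the unit installed on your host actually contains with "systemctl cat rbdmap.service".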
    
    On Fri, Jan 25, 2019 at 3:00 AM Gao, Wenjun <wenjgao@xxxxxxxx> wrote:
    >
    > Thanks, what’s the configuration you mentioned?
    >
    >
    >
    > --
    >
    > Thanks,
    >
    > Wenjun
    >
    >
    >
    > From: Gregory Farnum <gfarnum@xxxxxxxxxx>
    > Date: Friday, January 25, 2019 at 3:35 PM
    > To: "Gao, Wenjun" <wenjgao@xxxxxxxx>
    > Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
    > Subject: Re:  krbd reboot hung
    >
    >
    >
    > Looks like your network was deactivated before the RBD volume was unmounted. This is a known issue without a good programmatic workaround, and you’ll need to adjust your configuration.
    >
    > On Tue, Jan 22, 2019 at 9:17 AM Gao, Wenjun <wenjgao@xxxxxxxx> wrote:
    >
    > I’m using krbd to map an RBD device to a VM. It appears that when the device is mounted, rebooting the OS hangs for more than 7 minutes; on bare metal it can be more than 15 minutes. Even with the latest 5.0.0 kernel the problem still occurs.
    >
    > Here are the console logs with the 4.15.18 kernel and a Mimic RBD client; the reboot seems to be stuck in the umount of the RBD device:
    >
    > [  OK  ] Stopped Update UTMP about System Boot/Shutdown.
    >
    > [  OK  ] Stopped Create Volatile Files and Directories.
    >
    > [  OK  ] Stopped target Local File Systems.
    >
    >          Unmounting /run/user/110281572...
    >
    >          Unmounting /var/tmp...
    >
    >          Unmounting /root/test...
    >
    >          Unmounting /run/user/78402...
    >
    >          Unmounting Configuration File System...
    >
    > [  OK  ] Stopped Configure read-only root support.
    >
    > [  OK  ] Unmounted /var/tmp.
    >
    > [  OK  ] Unmounted /run/user/78402.
    >
    > [  OK  ] Unmounted /run/user/110281572.
    >
    > [  OK  ] Stopped target Swap.
    >
    > [  OK  ] Unmounted Configuration File System.
    >
    > [  189.919062] libceph: mon4 XX.XX.XX.XX:6789 session lost, hunting for new mon
    >
    > [  189.950085] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  189.950764] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  190.687090] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  190.694197] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  191.711080] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  191.745254] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  193.695065] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  193.727694] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  197.087076] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  197.121077] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  197.663082] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  197.680671] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  198.687122] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  198.719253] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  200.671136] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  200.702717] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  204.703115] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  204.736586] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  209.887141] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  209.918721] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  210.719078] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  210.750378] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  211.679118] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  211.712246] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  213.663116] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  213.696943] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  217.695062] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  217.728511] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  225.759109] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  225.775869] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  233.951062] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  233.951997] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  234.719114] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  234.720083] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  235.679112] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  235.680060] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  237.663088] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  237.664121] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  241.695082] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  241.696500] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  249.823095] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  249.824101] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  264.671119] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  264.672102] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  265.695109] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  265.696106] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  266.719145] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  266.720204] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  268.703121] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  268.704110] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  272.671115] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  272.672159] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  281.055087] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  281.056577] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  294.879098] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  294.880230] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  295.711107] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  295.712102] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  296.671090] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  296.672082] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  298.719086] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  298.720027] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  302.687077] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  302.688103] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  310.751132] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  310.763103] libceph: mon3 XX.XX.XX.XX:6789 connect error
    >
    > [  325.087096] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  325.088045] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  325.663115] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  325.664046] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  326.687094] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  326.688120] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  328.671075] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  328.672157] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  332.703098] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  332.704027] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  340.959100] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  340.960055] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  355.807079] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  355.808025] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  356.703101] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  356.704094] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  357.663072] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  357.664025] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  359.711102] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  359.712063] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  363.679106] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  363.680063] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  369.631112] INFO: task umount:6489 blocked for more than 120 seconds.
    >
    > [  369.639665]       Not tainted 4.15.18 #1
    >
    > [  369.640417] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    >
    > [  369.641909] umount          D    0  6489      1 0x00000004
    >
    > [  369.642842] Call Trace:
    >
    > [  369.643479]  ? __schedule+0x293/0x8a0
    >
    > [  369.644183]  ? out_of_line_wait_on_atomic_t+0x110/0x110
    >
    > [  369.645063]  schedule+0x32/0x80
    >
    > [  369.645760]  io_schedule+0x12/0x40
    >
    > [  369.646477]  bit_wait_io+0xd/0x50
    >
    > [  369.647167]  __wait_on_bit+0x5c/0x90
    >
    > [  369.647846]  out_of_line_wait_on_bit+0x8e/0xb0
    >
    > [  369.648668]  ? bit_waitqueue+0x30/0x30
    >
    > [  369.649409]  jbd2_write_superblock+0x11a/0x230 [jbd2]
    >
    > [  369.650275]  ? submit_bio+0x6e/0x140
    >
    > [  369.650961]  jbd2_journal_update_sb_log_tail+0x32/0x70 [jbd2]
    >
    > [  369.651880]  __jbd2_update_log_tail+0x35/0xf0 [jbd2]
    >
    > [  369.652704]  jbd2_cleanup_journal_tail+0x50/0xa0 [jbd2]
    >
    > [  369.653562]  jbd2_log_do_checkpoint+0x110/0x4b0 [jbd2]
    >
    > [  369.654415]  ? __schedule+0x29b/0x8a0
    >
    > [  369.655106]  ? prepare_to_wait_event+0x80/0x140
    >
    > [  369.655879]  jbd2_journal_destroy+0x116/0x280 [jbd2]
    >
    > [  369.656703]  ? remove_wait_queue+0x60/0x60
    >
    > [  369.657564]  ext4_put_super+0x76/0x3f0 [ext4]
    >
    > [  369.658357]  generic_shutdown_super+0x6c/0x120
    >
    > [  369.659136]  kill_block_super+0x21/0x50
    >
    > [  369.659836]  deactivate_locked_super+0x3f/0x70
    >
    > [  369.660607]  cleanup_mnt+0x3b/0x70
    >
    > [  369.661286]  task_work_run+0x88/0xa0
    >
    > [  369.661983]  exit_to_usermode_loop+0x6c/0xa3
    >
    > [  369.662741]  do_syscall_64+0x177/0x1a0
    >
    > [  369.663444]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    >
    > [  369.664285] RIP: 0033:0x7f39f4a41f47
    >
    > [  369.664955] RSP: 002b:00007fff3fb03998 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    >
    > [  369.666288] RAX: 0000000000000000 RBX: 0000563f3abfa040 RCX: 00007f39f4a41f47
    >
    > [  369.667335] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000563f3abffb80
    >
    > [  369.668381] RBP: 0000563f3abffb80 R08: 0000563f3abffaf0 R09: 0000000000000000
    >
    > [  369.669425] R10: 00007fff3fb03420 R11: 0000000000000246 R12: 00007f39f55bfd58
    >
    > [  369.670521] R13: 0000000000000000 R14: 0000563f3abfa140 R15: 0000563f3abfa040
    >
    > [  371.679142] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  371.680098] libceph: mon4 XX.XX.XX.XX:6789 connect error
    >
    > [  386.015126] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  386.016135] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  386.719109] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  386.720042] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  387.679128] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  387.680069] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  389.663096] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  389.664083] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  393.695118] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  393.696073] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  401.887064] libceph: connect XX.XX.XX.XX:6789 error -101
    >
    > [  401.887995] libceph: mon0 XX.XX.XX.XX:6789 connect error
    >
    > [  416.735084] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  416.736165] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    > [  417.695092] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  417.703178] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    > [  418.719156] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  418.720055] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    > [  420.703086] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  420.703940] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    > [  424.671082] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  424.672037] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    > [  OK  ] Unmounted /root/test.
    >
    > [  OK  ] Reached target Unmount All Filesystems.
    >
    > [  OK  ] Stopped target Local File Systems (Pre).
    >
    > [  OK  ] Stopped Create Static Device Nodes in /dev.
    >
    > [  OK  ] Stopped Remount Root and Kernel File Systems.
    >
    > [  OK  ] Reached target Shutdown.
    >
    > [  431.016439] systemd-shutdown[1]: Syncing filesystems and block devices.
    >
    > [  433.119104] libceph: connect XX.XX.XX.XX.10:6789 error -101
    >
    > [  433.120595] libceph: mon1 XX.XX.XX.XX.10:6789 connect error
    >
    
    
    
    -- 
    Jason
    

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



