Unfortunately, I have already rebooted the server, as I needed the services back online. I did try mapping and unmapping again after the reboot and did not see the problem anymore. However, I will search through my logs and send you everything from Feb 3 to Feb 5. If I see the issue again, I will follow all the debug steps described in this thread and post the results here.

In the meantime, I have upgraded to the next minor revision from your dragonfly-debian archives, so that may also be why I no longer see the problem.

Many thanks for your help!

Regards,
Markus

On Tue, Nov 24, 2015 at 12:51 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Tue, Nov 24, 2015 at 12:49 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Tue, Nov 24, 2015 at 12:12 AM, Markus Kienast <elias1884@xxxxxxxxx> wrote:
>>> Kernel Version
>>> elias@paris3:~$ uname -a
>>> Linux paris3.sfe.tv 3.16.0-28-generic #38-Ubuntu SMP Sat Dec 13
>>> 16:13:28 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Output of dmesg and /var/log/dmesg attached.
>>> It does not show much, though, except for one mon being down.
>>> The mon is down for hardware reasons.
>>>
>>> On Mon, Nov 23, 2015 at 11:26 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>
>>>> On Mon, Nov 23, 2015 at 11:03 PM, Markus Kienast <mark@xxxxxxxxxxxxx> wrote:
>>>> > I am having the same issue here.
>>>>
>>>> Which kernel are you running? Could you attach your dmesg?
>>>>
>>>> >
>>>> > root@paris3:/etc/neutron# rbd unmap /dev/rbd0
>>>> > rbd: failed to remove rbd device: (16) Device or resource busy
>>>> > rbd: remove failed: (16) Device or resource busy
>>>> >
>>>> > root@paris3:/etc/neutron# rbd info -p volumes
>>>> > volume-f3ab6892-f35e-4b98-8832-efbaaa2f4ca2
>>>> > 2015-11-23 22:42:06.842697 7f2d57e49700 0 -- :/2760503703 >>
>>>> > 10.90.90.4:6789/0 pipe(0x1773250 sd=3 :0 s=1 pgs=0 cs=0 l=1
>>>> > c=0x17734e0).fault
>>>> > rbd image 'volume-f3ab6892-f35e-4b98-8832-efbaaa2f4ca2':
>>>> > size 500 GB in 128000 objects
>>>> > order 22 (4096 kB objects)
>>>> > block_name_prefix: rbd_data.1b6d9e2aaa998b
>>>> > format: 2
>>>> > features: layering
>>>> > root@paris3:/etc/neutron# rados -p volumes listwatchers
>>>> > rbd_header.1b6d9e2aaa998b
>>>> > 2015-11-23 22:42:58.546723 7fec94fec700 0 -- :/2519796249 >>
>>>> > 10.90.90.4:6789/0 pipe(0x9cf260 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x9cf4f0).fault
>>>>
>>>> Did you root cause these faults?
>>>
>>> Hardware failure caused these faults.
>>>
>>>>
>>>> > watcher=10.90.90.3:0/3293327848 client.8471177 cookie=1
>>>> >
>>>> > root@paris3:/etc/neutron# ps ax | grep rbd
>>>> > 7814 ? S 0:00 [jbd2/rbd0-8]
>>>>
>>>> Was there an ext filesystem involved? How was it umounted - do you
>>>> have a "umount <mountpoint>" process stuck in D state?
>>>
>>> Yes, all these RBDs are formatted with ext4. I use them regularly
>>> with OpenStack and have never had any problems.
>>> I did "umount <mountpoint>" and the umount process actually
>>> finished just fine.
>>> How can I check whether it is stuck in "D" state?
>>>
>>>>
>>>> > 11003 ? S 0:00 [jbd2/rbd1-8]
>>>> > 14042 ? S 0:00 [jbd2/rbd2p1-8]
>>>> > 24228 ? S 0:00 [jbd2/rbd3-8]
>>>> >
>>>> > root@paris3:/etc/neutron# ceph --version
>>>> > ceph version 0.80.11 (8424145d49264624a3b0a204aedb127835161070)
>>>> >
>>>> > root@paris3:/etc/neutron# ls /sys/block/rbd0/holders/
>>>> > returns nothing
>>>> >
>>>> > root@paris3:/etc/neutron# fuser -amv /dev/rbd0
>>>> > USER PID ACCESS COMMAND
>>>> > /dev/rbd0:
>>>>
>>>> What's the output of "cat /sys/bus/rbd/devices/0/client_id"?
>>>
>>> root@paris3:~# cat /sys/bus/rbd/devices/0/client_id
>>> client8471177
>>>
>>>>
>>>> What's the output of "sudo cat /sys/kernel/debug/ceph/*/osdc"?
>>>
>>> root@paris3:~# ls -l /sys/kernel/debug/ceph/
>>> total 0
>>> drwxr-xr-x 2 root root 0 Feb 4 2015
>>> 32ba3117-e320-49fc-aabd-f100d5a7e94b.client7663711
>>> drwxr-xr-x 2 root root 0 Nov 23 11:41
>>> 32ba3117-e320-49fc-aabd-f100d5a7e94b.client8471177
>>>
>>> root@paris3:~# cat
>>> /sys/kernel/debug/ceph/32ba3117-e320-49fc-aabd-f100d5a7e94b.client8471177/osdc
>>> has no output
>>
>> This means there are no outstanding/hung rbd I/Os. According to you,
>> umount completed successfully, and yet there is a jbd2/rbd0-8 kthread
>> hanging around, keeping /dev/rbd0 open and holding a ref to it.
>> A quick search produced two similar reports:
>>
>> [1] https://ask.fedoraproject.org/en/question/7572/how-to-stop-kernel-ext4-journaling-thread/
>> [2] http://lists.openwall.net/linux-ext4/2015/10/24/11
>>
>> The only difference, as far as I can tell, is that those people noticed
>> the jbd2 thread because they wanted to run fsck, while you ran into it
>> because you tried to do "rbd unmap". Neither mentions rbd.
>>
>> Looking at [2], did you at any point see any similar errors in dmesg?
>>
>>>
>>> root@paris3:~# cat
>>> /sys/kernel/debug/ceph/32ba3117-e320-49fc-aabd-f100d5a7e94b.client7663711/osdc
>>> hangs with no output
>>
>> It shouldn't hang, so it could be unrelated. Given the "Feb 4 2015"
>
> It should read "so it could be related", of course.
>
> Thanks,
>
> Ilya
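Since the plan is to rerun the diagnostics if the problem comes back, the checks suggested in this thread could be bundled into one script. This is a minimal sketch, not an official Ceph tool. The script name `rbd-unmap-debug.sh` and the helpers `dstate_filter` and `jbd2_filter` are hypothetical. It assumes root privileges, a procps `ps`, `fuser` from psmisc, and debugfs mounted at /sys/kernel/debug. It also answers the "D state" question: in `ps` output, a `STAT` value starting with `D` marks a process in uninterruptible sleep, and the `wchan` column shows the kernel function it is waiting in.

```shell
#!/bin/sh
# rbd-unmap-debug.sh (hypothetical name): gathers the checks suggested in
# this thread when "rbd unmap" fails with EBUSY. Run as root.

# Keep the ps header line plus every process whose STAT field starts with
# "D" (uninterruptible sleep). Expects "ps -eo pid,stat,..." output on stdin.
dstate_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Keep only jbd2 journal threads for device rbd<N> or its partitions
# (rbd<N>p<M>), e.g. [jbd2/rbd0-8]. Expects "ps ax" output on stdin.
jbd2_filter() {
    grep -E -- "\[jbd2/rbd$1(p[0-9]+)?-"
}

main() {
    dev=$1

    echo "== holders of /dev/rbd$dev =="
    ls "/sys/block/rbd$dev/holders/" 2>&1

    echo "== userspace openers of /dev/rbd$dev =="
    if command -v fuser >/dev/null 2>&1; then
        fuser -amv "/dev/rbd$dev" 2>&1   # fuser -v writes its report to stderr
    else
        echo "(fuser not installed)"
    fi

    echo "== jbd2 threads for rbd$dev =="
    ps ax | jbd2_filter "$dev"

    echo "== processes in D state (e.g. a stuck umount) =="
    ps -eo pid,stat,wchan:32,args | dstate_filter

    client=$(cat "/sys/bus/rbd/devices/$dev/client_id" 2>/dev/null)
    echo "== client_id: ${client:-unknown} =="
    [ -n "$client" ] || return 0

    # debugfs directories are named <fsid>.<client_id>; osdc lists the
    # client's in-flight OSD requests (empty output means none are pending).
    for osdc in /sys/kernel/debug/ceph/*."$client"/osdc; do
        [ -e "$osdc" ] || continue
        echo "-- $osdc"
        # timeout(1) bounds a slow read, but a reader blocked
        # uninterruptibly in the kernel may still fail to terminate.
        timeout 5 cat "$osdc" || echo "(read failed or timed out)"
    done
}

if [ "$#" -gt 0 ]; then
    main "$1"
else
    echo "usage: $0 <rbd-device-number>" >&2
fi
```

For example, `sudo sh rbd-unmap-debug.sh 0` would check `/dev/rbd0`. An empty `osdc` together with a surviving `jbd2/rbd0-8` thread would match the situation described above: no in-flight I/O, yet the journal thread still holds the device open.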