On Thu, Jul 6, 2017 at 2:43 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Jul 6, 2017 at 2:23 PM, Stanislav Kopp <staskopp@xxxxxxxxx> wrote:
>> 2017-07-06 14:16 GMT+02:00 Ilya Dryomov <idryomov@xxxxxxxxx>:
>>> On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp <staskopp@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> 2017-07-05 20:31 GMT+02:00 Ilya Dryomov <idryomov@xxxxxxxxx>:
>>>>> On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp <staskopp@xxxxxxxxx> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have a problem: sometimes I can't unmap an rbd device, I get
>>>>>> "sysfs write failed rbd: unmap failed: (16) Device or resource
>>>>>> busy", even though there are no open files and the "holders"
>>>>>> directory is empty.  I saw on the mailing list that you can
>>>>>> "force" unmapping of the device, but I can't find how it works.
>>>>>> "man rbd" only mentions "force" as a "KERNEL RBD (KRBD) OPTION",
>>>>>> but "modinfo rbd" doesn't show this option.  Did I miss something?
>>>>>
>>>>> Forcing unmap on an open device is not a good idea.  I'd suggest
>>>>> looking into what's holding the device and fixing that instead.
>>>>
>>>> We use pacemaker's resource agent for rbd mount/unmount
>>>> (https://github.com/ceph/ceph/blob/master/src/ocf/rbd.in).
>>>> I've reproduced the failure again, and this time the ps output
>>>> shows that there is still an umount process in D state:
>>>>
>>>> root     29320  0.0  0.0  21980  1272 ?  D  09:18  0:00 umount /export/rbd1
>>>>
>>>> This explains the rbd unmap problem, but strangely enough I don't
>>>> see this mount in /proc/mounts, so it looks like it was
>>>> successfully unmounted.  If I try to strace the umount process,
>>>> the strace itself hangs with no output.  Looks like a kernel
>>>> problem?  Do you have any tips for further debugging?
>>>
>>> Check /sys/kernel/debug/ceph/<cluster-fsid>.<client-id>/osdc.  It
>>> lists in-flight requests; that's what umount is blocked on.
>>
>> I see this in my output, but honestly I don't know what it means:
>>
>> root@nfs-test01:~# cat /sys/kernel/debug/ceph/4f23f683-21e6-49f3-ae2c-c95b150b9dc6.client138566/osdc
>> REQUESTS 2 homeless 0
>> 658  osd9   0.75514984  [9,1,6]/9    [9,1,6]/9    rbd_data.6e28c6b8b4567.0000000000000000  0x400024  10'0  set-alloc-hint,write
>> 659  osd15  0.40f1ea02  [15,7,9]/15  [15,7,9]/15  rbd_data.6e28c6b8b4567.0000000000000001  0x400024  10'0  set-alloc-hint,write
>
> It means you have two pending writes (OSD requests), to osd9 and
> osd15.  What is the output of
>
>     $ ceph -s
>     $ ceph pg dump pgs_brief

Stanislav and I tracked this down to a pacemaker misconfiguration:

"... the problem was a wrong netmask: we use /22 for this network, but
the primary interface of the machine was configured with /23 and the
VIP even with /24.  Because of that, the VIP was often the first
interface, which caused the unmount problem, because it was stopped as
the first resource in a fail-over situation.  After fixing the netmask
for both the VIP and the machine's IP, it has been working without
issues (at least it has never failed since)."

Thanks,

                Ilya
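
P.S. To close the loop on the original question: "force" is a
per-device unmap option, not a module parameter, which is why it does
not show up in "modinfo rbd".  It is passed on the unmap command line,
something like this (the device path is just an example):

    $ rbd unmap -o force /dev/rbd0

As noted above, this is a last resort: it tears down the mapping even
while the device is open, so outstanding I/O to it will fail.  Finding
and clearing whatever is holding the device is always preferable.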
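
For anyone landing on this thread with the same symptom, the debugging
done above boils down to a short sequence (the PID is the one from this
thread; the grep filter is just one way to spot unhealthy PGs):

    $ ps aux | awk '$8 ~ /^D/'            # anything in uninterruptible sleep?
    $ cat /proc/29320/stack               # kernel stack of the hung umount (as root)
    $ cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD requests it is blocked on
    $ ceph -s
    $ ceph pg dump pgs_brief | grep -v active+clean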
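
And since the root cause was a netmask mismatch, a one-liner like this
on each node would have exposed it (the addresses shown are
hypothetical, mirroring the /23 and /24 vs /22 mismatch described
above):

    $ ip -o -4 addr show | awk '{print $2, $4}'
    eth0 192.168.4.10/23     <- machine's primary IP, should have been /22
    eth0 192.168.4.50/24     <- pacemaker VIP, should have been /22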