Samuel,

Back again. I converted my cluster to use 24 filestore OSDs, and ran the
following test three times:

rbd -p ssd_replica create --size 100G image1

rbd --pool ssd_replica bench-write --io-size 2M --io-threads 16 --io-total 100G --io-pattern seq image1

rbd -p ssd_replica rm image1

and in each case the rbd image was removed successfully; rados showed no
leftover objects other than 'rbd_directory', which I could remove with
rados. (The pool is empty other than for this one image.)

I then converted the cluster back to all-bluestore OSDs. For my first run
I only created a 10G-sized image1, and that was removed successfully. I
then repeated the above with 100G, and the problem reappeared; I again
have objects I cannot remove. (bluestore warnings removed for brevity.)

[root@alpha1-p200 ~]# rbd -p ssd_replica ls
image1
[root@alpha1-p200 ~]# rbd -p ssd_replica rm image1
Removing image: 100% complete...done.
[root@alpha1-p200 ~]# rbd -p ssd_replica ls
image1
[root@alpha1-p200 ~]# rados -p ssd_replica ls | wc -l
8582

This is slightly different from last time in that the rbd image 'image1'
still appears in the pool; I no longer get the "No such file or
directory" errors for the rbd image itself. But I do get those errors for
the leftover objects that make up the image when I try to remove them:

[root@alpha1-p200 ~]# rados -p ssd_replica rm rbd_data.112d238e1f29.0000000000002d3d
error removing ssd_replica>rbd_data.112d238e1f29.0000000000002d3d: (2) No such file or directory

I'm not sure where to go from here. Suggestions?

Kevan

On 5/24/16, 4:11 PM, "Kevan Rehm" <krehm@xxxxxxxx> wrote:

>Okay, will do. If the problem goes away with filestore, I'll switch back
>to bluestore again and re-duplicate the problem. In that case, are there
>particular things you would like me to collect? Or clues I should look
>for in logs?
>
>Thanks, Kevan
>
>On 5/24/16, 4:06 PM, "Samuel Just" <sjust@xxxxxxxxxx> wrote:
>
>>My money is on bluestore. If you can try to reproduce on filestore,
>>that would rapidly narrow it down.
>>-Sam
>>
>>On Tue, May 24, 2016 at 1:53 PM, Kevan Rehm <krehm@xxxxxxxx> wrote:
>>> Nope, not using tiering.
>>>
>>> Also, this is my second attempt; this is repeatable for me. I'm trying
>>> to duplicate a previous occurrence of this same problem to collect
>>> useful debug data. In the previous case, I was eventually able to get
>>> rid of the objects (but have forgotten how), but that was followed by
>>> 22 of the 24 OSDs crashing hard. Took me quite a while to re-deploy
>>> and get it working again. I want to get a small, repeatable example
>>> for the Ceph guys to look at, assuming it's a bug.
>>>
>>> Don't know if it's related to the bluestore OSDs or not, still getting
>>> my feet wet with Ceph.
>>>
>>> Kevan
>>>
>>> On 5/24/16, 3:47 PM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:
>>>
>>>>Any chance you are using cache tiering? It's odd that you can see the
>>>>objects through "rados ls" but cannot delete them with "rados rm".
>>>>
>>>>On Tue, May 24, 2016 at 4:34 PM, Kevan Rehm <krehm@xxxxxxxx> wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a small Ceph 10.2.1 test cluster using a 3-replica pool
>>>>> based on 24 SSDs configured with bluestore. I created and wrote an
>>>>> rbd image called "image1", then deleted the image again:
>>>>>
>>>>> rbd -p ssd_replica create --size 100G image1
>>>>>
>>>>> rbd --pool ssd_replica bench-write --io-size 2M --io-threads 16
>>>>> --io-total 100G --io-pattern seq image1
>>>>>
>>>>> rbd -p ssd_replica rm image1
>>>>>
>>>>> The rbd rm command completed successfully, but not all the
>>>>> image-related objects disappeared from the pool. The pool still
>>>>> contains:
>>>>>
>>>>> rbd_directory
>>>>>
>>>>> rbd_id.image1
>>>>>
>>>>> rbd_object_map.15ed238e1f29
>>>>>
>>>>> plus 2938 other objects of the form
>>>>> "rbd_data.15ed238e1f29.000000000000xxxx".
>>>>>
>>>>> Attempting to re-delete image1 does not work. Attempting to
>>>>> directly delete object rbd_id.image1 and one of the data objects
>>>>> doesn't work either. The commands all fail with "No such file or
>>>>> directory", yet the objects are obviously there. (Warning messages
>>>>> stripped for brevity.)
>>>>>
>>>>> [root@alpha1-p200 fio]# rbd -p ssd_replica rm image1
>>>>>
>>>>> Removing image: 0% complete...failed.
>>>>>
>>>>> rbd: delete error: (2) No such file or directory
>>>>>
>>>>> [root@alpha1-p200 fio]# rados -p ssd_replica rm rbd_id.image1
>>>>>
>>>>> error removing ssd_replica>rbd_id.image1: (2) No such file or
>>>>> directory
>>>>>
>>>>> [root@alpha1-p200 fio]# rados -p ssd_replica rm rbd_data.15ed238e1f29.0000000000005b56
>>>>>
>>>>> error removing ssd_replica>rbd_data.15ed238e1f29.0000000000005b56:
>>>>> (2) No such file or directory
>>>>>
>>>>> Is this a bug, and if so, where should I look for the cause? Anyone
>>>>> know how to delete these objects?
>>>>>
>>>>> Thanks, Kevan
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> --
>>>> Jason