Yes, Firefly.
[root@ceph05 ~]# ceph --version
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
Yes, I have seen this behavior.
[root@ceph08 ceph]# rbd info vm-160-disk-1
rbd image 'vm-160-disk-1':
size 32768 MB in 8192 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.179faf52eb141f2
format: 2
features: layering
parent: rbd/base-145-disk-1@__base__
overlap: 32768 MB
[root@ceph08 ceph]# rbd rm vm-160-disk-1
Removing image: 100% complete...done.
[root@ceph08 ceph]# rbd info vm-160-disk-1
2015-01-28 10:39:01.595785 7f1fbea9e760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
rbd: error opening image vm-160-disk-1: (2) No such file or directory
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
5944 5944 249633
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
5857 5857 245979
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
4377 4377 183819
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
5017 5017 210699
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
5015 5015 210615
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
1986 1986 83412
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
981 981 41202
[root@ceph08 ceph]# rados -p rbd ls | grep 179faf52eb141f2 | wc
802 802 33684
[root@ceph08 ceph]# rados -p rbdcache ls | grep 179faf52eb141f2 | wc
1611 1611 67662
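The falling counts above suggest that after `rbd rm` returns, the backing RADOS objects are still being deleted asynchronously in both pools. A minimal sketch to watch them drain, reusing the pool names and the `block_name_prefix` suffix from the transcript (the 60-second interval is an arbitrary choice):

```shell
#!/bin/sh
# Minimal sketch: poll both pools until no objects carrying the image's
# block_name_prefix remain. The pool names and PREFIX are taken from the
# transcript above; the 60-second sleep is an arbitrary choice.
PREFIX=179faf52eb141f2

count_objects() {
    # Count objects in pool $1 whose name contains $PREFIX.
    rados -p "$1" ls | grep -c "$PREFIX"
}

while [ "$(count_objects rbd)" -gt 0 ] || [ "$(count_objects rbdcache)" -gt 0 ]; do
    echo "rbd: $(count_objects rbd)  rbdcache: $(count_objects rbdcache)"
    sleep 60
done
echo "all objects with prefix $PREFIX are gone"
```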
Thanks, Sage!
On Tue, Jan 27, 2015 at 7:01:43 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Tue, 27 Jan 2015, Irek Fasikhov wrote:
> Hi, all.
> Indeed, there is a problem: after removing 1 TB of data, the space on the
> cluster is not reclaimed. Is this expected behavior or a bug? And how long
> will the cleanup take?
Your subject says cache tier but I don't see it in the 'ceph df' output
below. The cache tiers store 'whiteout' objects that record object
non-existence, which could be delaying some deletions. You can wrangle the
cluster into flushing those with
ceph osd pool set <cachepool> cache_target_dirty_ratio .05
(though you'll probably want to change it back to the default .4 later).
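Sage's suggestion can be sketched as the pair of commands below; `hotpool` is a placeholder for the actual cache pool name, and the comment about waiting is an assumption about usage rather than part of his instructions:

```shell
#!/bin/sh
# Sketch of the flush suggested above; "hotpool" is a placeholder for
# your actual cache pool name. Lowering cache_target_dirty_ratio makes
# the tiering agent flush dirty (and whiteout) objects sooner.
CACHEPOOL=hotpool
ceph osd pool set "$CACHEPOOL" cache_target_dirty_ratio .05
# ...let the agent flush for a while, then restore the default:
ceph osd pool set "$CACHEPOOL" cache_target_dirty_ratio .4
```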
If there's no cache tier involved, there may be another problem. What
version is this? Firefly?
sage
>
> Sat Sep 20 2014 at 8:19:24 AM, Mikaël Cluseau <mcluseau@xxxxxx>:
> Hi all,
>
> I'm seeing weird behaviour on my firefly "test + convenience
> storage" cluster. It consists of 2 nodes with a slight imbalance
> in available space:
>
> # id weight type name up/down reweight
> -1 14.58 root default
> -2 8.19 host store-1
> 1 2.73 osd.1 up 1
> 0 2.73 osd.0 up 1
> 5 2.73 osd.5 up 1
> -3 6.39 host store-2
> 2 2.73 osd.2 up 1
> 3 2.73 osd.3 up 1
> 4 0.93 osd.4 up 1
>
> I used to store ~8TB of rbd volumes, coming to a near-full
> state. There were some annoying "stuck misplaced" PGs, so I began
> to remove 4.5TB of data; the weird thing is that the space hasn't
> been reclaimed on the OSDs, which stayed stuck around 84% usage.
> I tried to move PGs around, and it turns out that the space is
> correctly "reclaimed" if I take an OSD out, let it empty its XFS
> volume, and then take it in again.
>
> I'm currently applying this to each OSD in turn, but I thought it
> could be worth reporting. The current ceph df output
> is:
>
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 12103G 5311G 6792G 56.12
> POOLS:
> NAME ID USED %USED OBJECTS
> data 0 0 0 0
> metadata 1 0 0 0
> rbd 2 444G 3.67 117333
> [...]
> archives-ec 14 3628G 29.98 928902
> archives 15 37518M 0.30 273167
>
> Before "just moving data", AVAIL was around 3TB.
>
> I finished the process with the OSDs on store-1, which show the
> following space usage now:
>
> /dev/sdb1 2.8T 1.4T 1.4T 50%
> /var/lib/ceph/osd/ceph-0
> /dev/sdc1 2.8T 1.3T 1.5T 46%
> /var/lib/ceph/osd/ceph-1
> /dev/sdd1 2.8T 1.3T 1.5T 48%
> /var/lib/ceph/osd/ceph-5
>
> I'm currently fixing OSD 2; OSD 3 will be the last one to be fixed.
> The df on store-2 shows the following:
>
> /dev/sdb1 2.8T 1.9T 855G 70%
> /var/lib/ceph/osd/ceph-2
> /dev/sdc1 2.8T 2.4T 417G 86%
> /var/lib/ceph/osd/ceph-3
> /dev/sdd1 932G 481G 451G 52%
> /var/lib/ceph/osd/ceph-4
>
> OSD 2 was at 84% 3h ago, and OSD 3 was ~75%.
>
> During rbd rm (which took a bit more than 3 days), the ceph log
> was showing entries like these:
>
> 2014-09-03 16:17:38.831640 mon.0 192.168.1.71:6789/0 417194 :
> [INF] pgmap v14953987: 3196 pgs: 2882 active+clean, 314
> active+remapped; 7647 GB data, 11067 GB used, 3828 GB / 14896 GB
> avail; 0 B/s rd, 6778 kB/s wr, 18 op/s; -5/5757286 objects
> degraded (-0.000%)
> [...]
> 2014-09-05 03:09:59.895507 mon.0 192.168.1.71:6789/0 513976 :
> [INF] pgmap v15050766: 3196 pgs: 2882 active+clean, 314
> active+remapped; 6010 GB data, 11156 GB used, 3740 GB / 14896 GB
> avail; 0 B/s rd, 0 B/s wr, 8 op/s; -388631/5247320 objects
> degraded (-7.406%)
> [...]
> 2014-09-06 03:56:50.008109 mon.0 192.168.1.71:6789/0 580816 :
> [INF] pgmap v15117604: 3196 pgs: 2882 active+clean, 314
> active+remapped; 4865 GB data, 11207 GB used, 3689 GB / 14896 GB
> avail; 0 B/s rd, 6117 kB/s wr, 22 op/s; -706519/3699415 objects
> degraded (-19.098%)
> 2014-09-06 03:56:44.476903 osd.0 192.168.1.71:6805/11793 729 :
> [WRN] 1 slow requests, 1 included below; oldest blocked for >
> 30.058434 secs
> 2014-09-06 03:56:44.476909 osd.0 192.168.1.71:6805/11793 730 :
> [WRN] slow request 30.058434 seconds old, received at 2014-09-06
> 03:56:14.418429: osd_op(client.19843278.0:46081
> rb.0.c7fd7f.238e1f29.00000000b3fa [delete] 15.b8fb7551
> ack+ondisk+write e38950) v4 currently waiting for blocked object
> 2014-09-06 03:56:49.477785 osd.0 192.168.1.71:6805/11793 731 :
> [WRN] 2 slow requests, 1 included below; oldest blocked for >
> 35.059315 secs
> [... stabilizes here:]
> 2014-09-06 22:13:48.771531 mon.0 192.168.1.71:6789/0 632527 :
> [INF] pgmap v15169313: 3196 pgs: 2882 active+clean, 314
> active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB
> avail; 64 B/s rd, 64 B/s wr, 0 op/s; -883219/3420796 objects
> degraded (-25.819%)
> [...]
> 2014-09-07 03:09:48.491325 mon.0 192.168.1.71:6789/0 633880 :
> [INF] pgmap v15170666: 3196 pgs: 2882 active+clean, 314
> active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB
> avail; 18727 B/s wr, 2 op/s; -883219/3420796 objects degraded
> (-25.819%)
>
> And now, during the data movement I described above:
>
> 2014-09-20 15:16:13.394694 mon.0 [INF] pgmap v15344707: 3196
> pgs: 2132 active+clean, 432 active+remapped+wait_backfill, 621
> active+remapped, 11 active+remapped+backfilling; 4139 GB data,
> 6831 GB used, 5271 GB / 12103 GB avail; 379097/3792969 objects
> degraded (9.995%)
>
> If a ceph developer wants me to do something or to provide
> some data, please say so quickly; I will probably process OSD 3
> in ~16-20h.
> (Of course, I'd prefer not to lose the data, btw. :-))
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
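The out/empty/in workaround Mikaël describes can be sketched roughly as below, for a single OSD. The OSD id is a placeholder, and polling `ceph health` for HEALTH_OK is an assumed way to wait for the data migration to finish, not his exact procedure:

```shell
#!/bin/sh
# Rough sketch of the out/empty/in cycle described above, for one OSD.
# OSD_ID is a placeholder; polling "ceph health" for HEALTH_OK is an
# assumed way to wait for backfill to finish, not the poster's method.
OSD_ID=2
ceph osd out "$OSD_ID"
# Wait until the data has migrated off and the cluster is healthy again.
until ceph health | grep -q HEALTH_OK; do
    sleep 60
done
ceph osd in "$OSD_ID"
```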