Re: RBD over cache tier over EC pool: rbd rm doesn't remove objects

Sage Weil <sage@xxxxxxxxxxxx> · Tue, 27 Jan 2015 08:01:42 -0800 (PST)

On Tue, 27 Jan 2015, Irek Fasikhov wrote:
> Hi all,
> Indeed, there is a problem: after removing 1 TB of data, the space on the
> cluster is not freed. Is this expected behavior or a bug? And how long
> will it take to be cleaned up?

Your subject says cache tier, but I don't see one in the 'ceph df' output
below.  Cache tiers store 'whiteout' objects that record an object's
non-existence, and those could be delaying some deletions.  You can nudge
the cluster into flushing them with

 ceph osd pool set <cachepool> cache_target_dirty_ratio .05

(though you'll probably want to change it back to the default .4 later).
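
As a rough sketch of that lower-then-restore sequence: the helper name
set_dirty_ratio and the CEPH override variable below are made up for
illustration, and only the 'ceph osd pool set' command itself comes from
the text above.

```shell
# CEPH can be overridden (e.g. CEPH=echo) for a dry run that only prints
# the command instead of talking to the cluster.
CEPH="${CEPH:-ceph}"

# Set cache_target_dirty_ratio on a cache pool.
#   $1 = cache pool name, $2 = ratio
set_dirty_ratio() {
    $CEPH osd pool set "$1" cache_target_dirty_ratio "$2"
}

# Usage (replace <cachepool> with the real cache pool name):
#   set_dirty_ratio <cachepool> .05   # push the tier to flush sooner
#   ...wait for the flush to make progress...
#   set_dirty_ratio <cachepool> .4    # back to the default
```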

If there's no cache tier involved, there may be another problem.  What 
version is this?  Firefly?
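
One way to check whether a cache tier is actually configured is to look at
the pool lines of the OSD map. This is only a sketch: it assumes that
'ceph osd dump' prints one line per pool starting with "pool" and that a
cache pool's line carries a tier_of field. The helper name
list_cache_tiers is made up for illustration.

```shell
# Print the pool lines of an OSD map dump that are configured as a tier of
# another pool. Reads the dump on stdin and prints nothing if no pool is a
# tier.
list_cache_tiers() {
    grep '^pool ' | grep 'tier_of'
}

# Usage:
#   ceph osd dump | list_cache_tiers
#   ceph --version    # reports the installed release
```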

sage

> 
> Sat Sep 20 2014 at 8:19:24 AM, Mikaël Cluseau <mcluseau@xxxxxx>:
>       Hi all,
> 
>       I have weird behaviour on my firefly "test + convenience
>       storage" cluster. It consists of 2 nodes with a light imbalance
>       in available space:
> 
>       # id    weight    type name    up/down    reweight
>       -1    14.58    root default
>       -2    8.19        host store-1
>       1    2.73            osd.1    up    1   
>       0    2.73            osd.0    up    1   
>       5    2.73            osd.5    up    1   
>       -3    6.39        host store-2
>       2    2.73            osd.2    up    1   
>       3    2.73            osd.3    up    1   
>       4    0.93            osd.4    up    1   
> 
>       I used to store ~8TB of rbd volumes, which brought the cluster to
>       a near-full state. There were some annoying "stuck misplaced" PGs,
>       so I began to remove 4.5TB of data; the weird thing is that the
>       space hasn't been reclaimed on the OSDs, which stayed stuck at
>       around 84% usage. I tried moving PGs around, and it turns out that
>       the space is correctly "reclaimed" if I take an OSD out, let it
>       empty its XFS volume, and then bring it back in.
> 
>       I'm currently applying this to each OSD in turn, but I thought it
>       would be worth reporting. The current ceph df output
>       is:
> 
>       GLOBAL:
>           SIZE       AVAIL     RAW USED     %RAW USED
>           12103G     5311G     6792G        56.12    
>       POOLS:
>           NAME                 ID     USED       %USED     OBJECTS
>           data                 0      0          0         0      
>           metadata             1      0          0         0      
>           rbd                  2      444G       3.67      117333 
>       [...]
>           archives-ec          14     3628G      29.98     928902 
>           archives             15     37518M     0.30      273167
> 
>       Before "just moving data", AVAIL was around 3TB.
> 
>       I finished the process with the OSDs on store-1, which now show
>       the following space usage:
> 
>       /dev/sdb1             2.8T  1.4T  1.4T  50% /var/lib/ceph/osd/ceph-0
>       /dev/sdc1             2.8T  1.3T  1.5T  46% /var/lib/ceph/osd/ceph-1
>       /dev/sdd1             2.8T  1.3T  1.5T  48% /var/lib/ceph/osd/ceph-5
> 
>       I'm currently fixing OSD 2; OSD 3 will be the last one to be fixed.
>       The df on store-2 shows the following:
> 
>       /dev/sdb1               2.8T  1.9T  855G  70% /var/lib/ceph/osd/ceph-2
>       /dev/sdc1               2.8T  2.4T  417G  86% /var/lib/ceph/osd/ceph-3
>       /dev/sdd1               932G  481G  451G  52% /var/lib/ceph/osd/ceph-4
> 
>       OSD 2 was at 84% 3h ago, and OSD 3 was ~75%.
> 
>       During the rbd rm (which took a bit more than 3 days), the ceph
>       log was showing things like this:
> 
>       2014-09-03 16:17:38.831640 mon.0 192.168.1.71:6789/0 417194 :
>       [INF] pgmap v14953987: 3196 pgs: 2882 active+clean, 314
>       active+remapped; 7647 GB data, 11067 GB used, 3828 GB / 14896 GB
>       avail; 0 B/s rd, 6778 kB/s wr, 18 op/s; -5/5757286 objects
>       degraded (-0.000%)
>       [...]
>       2014-09-05 03:09:59.895507 mon.0 192.168.1.71:6789/0 513976 :
>       [INF] pgmap v15050766: 3196 pgs: 2882 active+clean, 314
>       active+remapped; 6010 GB data, 11156 GB used, 3740 GB / 14896 GB
>       avail; 0 B/s rd, 0 B/s wr, 8 op/s; -388631/5247320 objects
>       degraded (-7.406%)
>       [...]
>       2014-09-06 03:56:50.008109 mon.0 192.168.1.71:6789/0 580816 :
>       [INF] pgmap v15117604: 3196 pgs: 2882 active+clean, 314
>       active+remapped; 4865 GB data, 11207 GB used, 3689 GB / 14896 GB
>       avail; 0 B/s rd, 6117 kB/s wr, 22 op/s; -706519/3699415 objects
>       degraded (-19.098%)
>       2014-09-06 03:56:44.476903 osd.0 192.168.1.71:6805/11793 729 :
>       [WRN] 1 slow requests, 1 included below; oldest blocked for >
>       30.058434 secs
>       2014-09-06 03:56:44.476909 osd.0 192.168.1.71:6805/11793 730 :
>       [WRN] slow request 30.058434 seconds old, received at 2014-09-06
>       03:56:14.418429: osd_op(client.19843278.0:46081
>       rb.0.c7fd7f.238e1f29.00000000b3fa [delete] 15.b8fb7551
>       ack+ondisk+write e38950) v4 currently waiting for blocked object
>       2014-09-06 03:56:49.477785 osd.0 192.168.1.71:6805/11793 731 :
>       [WRN] 2 slow requests, 1 included below; oldest blocked for >
>       35.059315 secs
>       [... stabilizes here:]
>       2014-09-06 22:13:48.771531 mon.0 192.168.1.71:6789/0 632527 :
>       [INF] pgmap v15169313: 3196 pgs: 2882 active+clean, 314
>       active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB
>       avail; 64 B/s rd, 64 B/s wr, 0 op/s; -883219/3420796 objects
>       degraded (-25.819%)
>       [...]
>       2014-09-07 03:09:48.491325 mon.0 192.168.1.71:6789/0 633880 :
>       [INF] pgmap v15170666: 3196 pgs: 2882 active+clean, 314
>       active+remapped; 4139 GB data, 11215 GB used, 3681 GB / 14896 GB
>       avail; 18727 B/s wr, 2 op/s; -883219/3420796 objects degraded
>       (-25.819%)
> 
>       And now, during the data movement I described above:
> 
>       2014-09-20 15:16:13.394694 mon.0 [INF] pgmap v15344707: 3196
>       pgs: 2132 active+clean, 432 active+remapped+wait_backfill, 621
>       active+remapped, 11 active+remapped+backfilling; 4139 GB data,
>       6831 GB used, 5271 GB / 12103 GB avail; 379097/3792969 objects
>       degraded (9.995%)
> 
>       If a ceph developer wants me to do something or to provide
>       some data, please say so quickly; I will probably process OSD 3
>       in ~16-20h.
>       (Of course, I'd prefer not to lose the data, btw :-))
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com