Re: Ceph space problem, garbage collector ?

Samuel Just <sam.just@xxxxxxxxxxx> · Tue, 10 Sep 2013 11:19:43 -0700



Can you post the rest of your crush map?
-Sam
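For reference, the complete map can be exported from the cluster with "ceph osd getcrushmap" and decompiled to text with "crushtool -d". Below is a minimal sketch wrapped in a hypothetical helper. The name dump_crush_map and the default output file crushmap.txt are illustrative and do not come from the thread.

```shell
# Hypothetical helper (not from the thread): export the compiled CRUSH map
# and decompile it to text, so the whole map (devices, buckets and rules)
# can be posted to the list.
dump_crush_map() {
    local out=${1:-crushmap.txt}   # text output file (illustrative default)
    local bin rc
    bin=$(mktemp) || return 1      # temporary file for the binary map
    ceph osd getcrushmap -o "$bin" && crushtool -d "$bin" -o "$out"
    rc=$?
    rm -f "$bin"                   # always remove the binary copy
    return "$rc"
}

# Usage, on a node with an admin keyring:
#   dump_crush_map crushmap.txt
```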

On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> I also checked that every object file in that PG directory still maps to that PG:
>
> for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' | sort --unique` ; do
>     echo -n "$IMG "
>     ceph osd map ssd3copies $IMG | grep -v 6\\.31f
>     echo
> done
>
> All those objects are also still listed by rados (compared against the
> list written by "rados --pool ssd3copies ls rados.ssd3copies.dump").
>
>
>
> On Tuesday 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote:
>> Some additional information, this time looking at a single PG, 6.31f.
>> Summed over all PGs of pool 6, "ceph pg dump" reports about 616GB:
>>
>> # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
>> 631717
>>
>> But on disk, on each of the 3 replicas, I have:
>> # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
>> 1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
>>
>> Since I suspected a snapshot problem, I tried counting only the "head"
>> files:
>> # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' -print0 | xargs -r -0 du -hc | tail -n1
>> 448M  total
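One caveat with "xargs ... du -hc | tail -n1": if the file list is long enough, xargs runs du several times and "tail -n1" then shows only the total of the last batch. GNU du can read the NUL-separated list itself with --files0-from=-, which always yields a single grand total. A sketch using the same "*head*" pattern as above; head_files_total is a hypothetical name.

```shell
# Hypothetical helper: total on-disk size of the "head" object files in one
# PG directory. du reads the NUL-separated file names from stdin, so -c
# produces one grand total however many files there are.
head_files_total() {
    local pg_dir=$1
    find "$pg_dir" -type f -name '*head*' -print0 \
        | du -ch --files0-from=- \
        | tail -n1
}

# Usage:
#   head_files_total /var/lib/ceph/osd/ceph-50/current/6.31f_head/
```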
>>
>> and the content of the directory : http://pastebin.com/u73mTvjs
>>
>>
>> On Tuesday 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
>> > Hi,
>> >
>> > I have a space problem on a production cluster, as if unused data were
>> > never freed: "ceph df" and "rados df" report 613GB of data, but disk
>> > usage is 2640GB (with 3 replicas). It should be close to 1839GB.
>> >
>> >
>> > I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use CRUSH
>> > rules to place each pool on either SAS or SSD.
>> >
>> > My pools:
>> > # ceph osd dump | grep ^pool
>> > pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
>> > pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
>> > pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68321 owner 0
>> > pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
>> > pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
>> > pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
>> >
>> > Only hdd3copies, sas3copies and ssd3copies are actually in use:
>> > # ceph df
>> > GLOBAL:
>> >     SIZE       AVAIL      RAW USED     %RAW USED
>> >     76498G     51849G     24648G       32.22
>> >
>> > POOLS:
>> >     NAME           ID     USED      %USED     OBJECTS
>> >     data           0      46753     0         72
>> >     metadata       1      0         0         0
>> >     rbd            2      8         0         1
>> >     hdd3copies     3      2724G     3.56      5190954
>> >     ssd3copies     6      613G      0.80      347668
>> >     sas3copies     9      3692G     4.83      764394
>> >
>> >
>> > My CRUSH rules were:
>> >
>> > rule SASperHost {
>> >     ruleset 4
>> >     type replicated
>> >     min_size 1
>> >     max_size 10
>> >     step take SASroot
>> >     step chooseleaf firstn 0 type host
>> >     step emit
>> > }
>> >
>> > and:
>> >
>> > rule SSDperOSD {
>> >     ruleset 3
>> >     type replicated
>> >     min_size 1
>> >     max_size 10
>> >     step take SSDroot
>> >     step choose firstn 0 type osd
>> >     step emit
>> > }
>> >
>> >
>> > But since the cluster filled up because of this space problem, I switched to a different rule:
>> >
>> > rule SSDperOSDfirst {
>> >     ruleset 7
>> >     type replicated
>> >     min_size 1
>> >     max_size 10
>> >     step take SSDroot
>> >     step choose firstn 1 type osd
>> >     step emit
>> >     step take SASroot
>> >     step chooseleaf firstn -1 type net
>> >     step emit
>> > }
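In CRUSH, a negative count such as "firstn -1" means "the pool's replica count minus 1", so for a size-3 pool this rule should give one SSD OSD plus two SAS leaves, each under a different "net" bucket. One way to confirm the placements offline is crushtool's test mode. A sketch, assuming a crushtool build that supports --test with --rule, --num-rep and --show-mappings (older releases may lack some of these flags, and in older maps the rule number is not necessarily the ruleset id). test_crush_rule is a hypothetical name.

```shell
# Hypothetical helper: print where CRUSH would place a few sample inputs
# for one rule, using an offline copy of the cluster's CRUSH map.
# Assumes crushtool supports --test/--rule/--num-rep/--show-mappings.
test_crush_rule() {
    local rule=$1 num_rep=${2:-3}   # rule number and replica count (default 3)
    local map rc
    map=$(mktemp) || return 1
    ceph osd getcrushmap -o "$map" &&
        crushtool -i "$map" --test --rule "$rule" --num-rep "$num_rep" \
            --min-x 0 --max-x 9 --show-mappings
    rc=$?
    rm -f "$map"                    # remove the temporary binary map
    return "$rc"
}

# Usage (each mapping should list 1 SSD OSD followed by 2 SAS OSDs):
#   test_crush_rule 7 3
```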
>> >
>> >
>> > With this last rule I should have only one replica on my SSD OSDs, so about 613GB of space used. But when I check the OSDs, I see 1212GB actually used.
>> >
>> > I also use snapshots; maybe snapshots are ignored by "ceph df" and "rados df"?
>> >
>> > Thanks for any help.
>> >
>> > Olivier
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >