Hi,

do you need more information about that?

Thanks,
Olivier

On Tue, Sep 10, 2013 at 11:19 -0700, Samuel Just wrote:
> Can you post the rest of your crush map?
> -Sam
>
> On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > I also checked that all files in that PG are still on that PG:
> >
> > for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
> > sort --unique` ; do echo -n "$IMG "; ceph osd map ssd3copies $IMG | grep
> > -v 6\\.31f ; echo ; done
> >
> > And all objects are referenced in rados (compared with "rados --pool
> > ssd3copies ls rados.ssd3copies.dump").
> >
> > On Tue, Sep 10, 2013 at 13:46 +0200, Olivier Bonvalet wrote:
> >> Some additional information: if I look at one PG only, for example
> >> 6.31f, "ceph pg dump" reports a size of 616GB:
> >>
> >> # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
> >> 631717
> >>
> >> But on disk, on the 3 replicas, I have:
> >> # du -sh /var/lib/ceph/osd/ceph-50/current/6.31f_head/
> >> 1,3G /var/lib/ceph/osd/ceph-50/current/6.31f_head/
> >>
> >> Since I suspected a snapshot problem, I tried to count only "head"
> >> files:
> >> # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' -print0 | xargs -r -0 du -hc | tail -n1
> >> 448M total
> >>
> >> and the content of the directory: http://pastebin.com/u73mTvjs
> >>
> >> On Tue, Sep 10, 2013 at 10:31 +0200, Olivier Bonvalet wrote:
> >> > Hi,
> >> >
> >> > I have a space problem on a production cluster, as if there is unused
> >> > data that is not freed: "ceph df" and "rados df" report 613GB of data,
> >> > but disk usage is 2640GB (with 3 replicas). It should be near 1839GB.
> >> >
> >> > I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
> >> > rules to put pools on SAS or on SSD.
> >> >
> >> > My pools:
> >> > # ceph osd dump | grep ^pool
> >> > pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
> >> > pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
> >> > pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 576 pgp_num 576 last_change 68321 owner 0
> >> > pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
> >> > pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
> >> > pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
> >> >
> >> > Only hdd3copies, sas3copies and ssd3copies are really used:
> >> > # ceph df
> >> > GLOBAL:
> >> >     SIZE       AVAIL      RAW USED     %RAW USED
> >> >     76498G     51849G     24648G       32.22
> >> >
> >> > POOLS:
> >> >     NAME           ID     USED      %USED     OBJECTS
> >> >     data           0      46753     0         72
> >> >     metadata       1      0         0         0
> >> >     rbd            2      8         0         1
> >> >     hdd3copies     3      2724G     3.56      5190954
> >> >     ssd3copies     6      613G      0.80      347668
> >> >     sas3copies     9      3692G     4.83      764394
> >> >
> >> > My CRUSH rules were:
> >> >
> >> > rule SASperHost {
> >> >         ruleset 4
> >> >         type replicated
> >> >         min_size 1
> >> >         max_size 10
> >> >         step take SASroot
> >> >         step chooseleaf firstn 0 type host
> >> >         step emit
> >> > }
> >> >
> >> > and:
> >> >
> >> > rule SSDperOSD {
> >> >         ruleset 3
> >> >         type replicated
> >> >         min_size 1
> >> >         max_size 10
> >> >         step take SSDroot
> >> >         step choose firstn 0 type osd
> >> >         step emit
> >> > }
> >> >
> >> > but since the cluster was full because of that space problem, I switched to a different rule:
> >> >
> >> > rule SSDperOSDfirst {
> >> >         ruleset 7
> >> >         type replicated
> >> >         min_size 1
> >> >         max_size 10
> >> >         step take SSDroot
> >> >         step choose firstn 1 type osd
> >> >         step emit
> >> >         step take SASroot
> >> >         step chooseleaf firstn -1 type net
> >> >         step emit
> >> > }
> >> >
> >> > So with that last rule, I should have only one replica on my SSD OSDs, hence 613GB of space used there. But if I check on the OSDs, I see 1212GB really used.
> >> >
> >> > I also use snapshots; maybe snapshots are ignored by "ceph df" and "rados df"?
> >> >
> >> > Thanks for any help.
> >> >
> >> > Olivier

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
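
For completeness, here is a minimal bash sketch of the per-PG check discussed above, extended from one PG to the whole pool; it is not part of the original thread. It assumes the on-disk layout shown in the messages (/var/lib/ceph/osd/ceph-$OSD/current/<pgid>_head/), that column 6 of "ceph pg dump" is the PG size in bytes (as in the awk one-liner quoted above), and it reuses OSD 50 from the example purely as a placeholder.

#!/bin/bash
# Sketch only: for every PG of pool 6 ("ssd3copies"), compare the size
# reported by "ceph pg dump" with the space its directory uses on one OSD,
# split into head objects and everything else (snapshot clones, etc.).
# OSD=50 is just the OSD id used as an example in the thread.

OSD=50
OSD_DIR="/var/lib/ceph/osd/ceph-${OSD}/current"

ceph pg dump 2>/dev/null | awk '$1 ~ /^6\./ { print $1, $6 }' |
while read -r PGID REPORTED; do
    PGDIR="${OSD_DIR}/${PGID}_head"
    [ -d "$PGDIR" ] || continue   # no replica of this PG on this OSD

    # "du --files0-from=-" prints a single grand total even for very long
    # file lists, unlike an xargs pipeline, which may run du more than once.
    HEAD_BYTES=$(find "$PGDIR" -type f -name '*head*' -print0 \
                 | du -cb --files0-from=- 2>/dev/null | tail -n1 | cut -f1)
    OTHER_BYTES=$(find "$PGDIR" -type f ! -name '*head*' -print0 \
                  | du -cb --files0-from=- 2>/dev/null | tail -n1 | cut -f1)

    printf '%s reported=%s head=%s clones+other=%s\n' \
           "$PGID" "$REPORTED" "${HEAD_BYTES:-0}" "${OTHER_BYTES:-0}"
done

If the "clones+other" column dominates the head-object sizes across the pool's PGs, that would support the hypothesis raised above that snapshot clones account for the gap between the 613GB reported by "ceph df" and the space actually used on disk.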