Re: Create file bigger than osd

Hi,

On 19.01.15 at 13:08, Luis Periquito wrote:
> What is the current issue? Cluster near-full? Cluster too-full? Can you
> send the output of ceph -s?

    cluster 0d75b6f9-83fb-4287-aa01-59962bbff4ad
     health HEALTH_ERR 1 full osd(s); 1 near full osd(s)
     monmap e1: 3 mons at {ceph0=10.0.29.0:6789/0,ceph1=10.0.29.1:6789/0,ceph2=10.0.29.2:6789/0}, election epoch 92, quorum 0,1,2 ceph0,ceph1,ceph2
     mdsmap e16: 1/1/1 up {0=2=up:active}, 1 up:standby
     osdmap e415: 24 osds: 24 up, 24 in
            flags full
      pgmap v396664: 704 pgs, 4 pools, 3372 GB data, 866 kobjects
            6750 GB used, 3270 GB / 10020 GB avail
                 704 active+clean

2015-01-19 08:19:23.429198 mon.0 [INF] pgmap v396664: 704 pgs: 704 active+clean; 3372 GB data, 6750 GB used, 3270 GB / 10020 GB avail; 39 B/s rd, 0 op/s
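
(For anyone who finds this in the archives: the cluster-wide "full" flag
blocks all writes as soon as a single OSD crosses the full ratio, 0.95 by
default. A rough sketch of the commands I'd use to inspect this; the
set_full_ratio subcommand name may vary between releases:)

--
# Which OSD is full, and at what percentage:
ceph health detail

# Per-pool usage, to see which pool is eating the space:
ceph df detail
ceph osd dump | grep pool

# If writes must be unblocked before cleanup, the full ratio can be
# raised a little, temporarily (revert it afterwards):
ceph pg set_full_ratio 0.97
--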

> If this is the case you can look at the output of ceph df detail to figure
> out which pool is using the disk space. How many PGs do these pools have?
> Can you send the output of ceph df detail and ceph osd dump | grep pool?
> Is there anything else on these nodes taking up disk space? Like the
> journals...
No, I placed the journals on an SSD, so they shouldn't use any space in
the data dir.
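
(A quick way to double-check that, in case someone wants to verify on
their own nodes; the path is the standard FileStore layout, so treat it
as an assumption for non-default setups:)

--
# Each OSD's 'journal' is a symlink; here it points at an SSD
# partition, so the journal takes no space in the data dir.
ls -l /var/lib/ceph/osd/ceph-*/journal
--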

I already got the cluster "back to normal" (a rough sketch of the
commands follows the list). I just

* shut down one OSD (id=23)
* removed one PG directory
* started the OSD (id=23) again
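
(Roughly, the commands were as below; the service invocation and paths
are the Debian sysvinit / FileStore defaults, so adjust for other
setups. Removing a PG directory by hand is only safe if the other
replicas of that PG are clean:)

--
service ceph stop osd.23

# <pgid> is a placeholder for the PG I picked; its directory lives in
# the OSD's FileStore under current/.
rm -rf /var/lib/ceph/osd/ceph-23/current/<pgid>_head

service ceph start osd.23
--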

After I shut down the OSD, the following log entries appeared in ceph -w:
--

2015-01-19 10:13:00.391222 mon.0 [INF] osd.23 out (down for 301.648942)
2015-01-19 10:13:00.406649 mon.0 [INF] osdmap e418: 24 osds: 23 up, 23 in full
2015-01-19 10:13:00.414374 mon.0 [INF] pgmap v396684: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:01.422289 mon.0 [INF] osdmap e419: 24 osds: 23 up, 23 in full
2015-01-19 10:13:01.428216 mon.0 [INF] pgmap v396685: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:02.413598 mon.0 [INF] osdmap e420: 24 osds: 23 up, 23 in full
2015-01-19 10:13:02.443216 mon.0 [INF] pgmap v396686: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:03.455175 mon.0 [INF] pgmap v396687: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:04.483793 mon.0 [INF] pgmap v396688: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:05.431367 mon.0 [INF] osdmap e421: 24 osds: 23 up, 23 in
2015-01-19 10:13:05.451241 mon.0 [INF] pgmap v396689: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 104813/1774446 objects degraded (5.907%)
2015-01-19 10:13:06.505841 mon.0 [INF] pgmap v396690: 704 pgs: 74 active+undersized+degraded, 630 active+clean; 3372 GB data, 6353 GB used, 3249 GB / 9603 GB avail; 0 B/s rd, 101 kB/s wr, 63 op/s; 104813/1774446 objects degraded (5.907%)
--


Quite strange, but after I started the OSD again, the data usage dropped
to under 600 GB after a while:

--

2015-01-19 10:27:15.076969 mon.0 [INF] pgmap v397492: 704 pgs: 1 active+undersized+degraded+remapped+backfilling, 703 active+clean; 921 GB data, 1861 GB used, 8158 GB / 10020 GB avail; 12136 B/s wr, 871 op/s; 881/519944 objects degraded (0.169%); 218/519944 objects misplaced (0.042%)
2015-01-19 10:27:16.134537 mon.0 [INF] pgmap v397493: 704 pgs: 1 active+undersized+degraded+remapped+backfilling, 703 active+clean; 921 GB data, 1860 GB used, 8159 GB / 10020 GB avail; 1968 B/s wr, 569 op/s; 881/519552 objects degraded (0.170%); 218/519552 objects misplaced (0.042%)

...

2015-01-19 10:29:09.119524 mon.0 [INF] pgmap v397600: 704 pgs: 704 active+clean; 573 GB data, 1150 GB used, 8870 GB / 10020 GB avail; 0 B/s rd, 4553 B/s wr, 2 op/s
2015-01-19 10:29:10.131819 mon.0 [INF] pgmap v397601: 704 pgs: 704 active+clean; 573 GB data, 1150 GB used, 8870 GB / 10020 GB avail; 8103 B/s wr, 2 op/s
2015-01-19 10:29:14.063943 mon.0 [INF] pgmap v397602: 704 pgs: 704 active+clean; 573 GB data, 1150 GB used, 8870 GB / 10020 GB avail; 4965 B/s wr, 1 op/s
--

And I'm currently unable to reproduce the problem.

Next time I will run your commands to get more information.
I also took a look at the logs, but found nothing useful.
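
(These are the default log locations I checked; the paths are the stock
ones, so an assumption for non-default installs:)

--
less /var/log/ceph/ceph-osd.23.log   # the OSD I touched
less /var/log/ceph/ceph.log          # cluster log, on the mon hosts
--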

Any further ideas/hints?

Fabian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


