On Thu, Dec 10, 2015 at 5:06 AM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Wed, 9 Dec 2015 15:57:36 +0000 MATHIAS, Bryn (Bryn) wrote:
>
>> To update this, the error looks like it comes from updatedb scanning the
>> ceph disks.
>>
>> When we make sure it doesn't, by putting the ceph mount points in the
>> exclusion file, the problem goes away.
>>
> Ah, I didn't even think about this, as I have been disabling updatedb or
> excluding data trees for years now.
> It's probably something that would be a good addition to the documentation.

See http://tracker.ceph.com/issues/7451

-- dan

>
> Also with atop you would have immediately seen who the culprit was.
>
> Regards,
>
> Christian
>
>> Thanks for the help and time.
>>
>> On 30 Nov 2015, at 09:53, MATHIAS, Bryn (Bryn)
>> <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 30 Nov 2015, at 14:37, MATHIAS, Bryn (Bryn)
>> <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> On 30 Nov 2015, at 13:44, Christian Balzer <chibi@xxxxxxx> wrote:
>>
>> Hello,
>>
>> On Mon, 30 Nov 2015 07:55:24 +0000 MATHIAS, Bryn (Bryn) wrote:
>>
>> Hi Christian,
>>
>> I'll give you a much better dump of detail :)
>>
>> Running RHEL 7.1,
>> ceph version 0.94.5
>>
>> All ceph disks are XFS, with journals on a partition on the same disk.
>> Disks: 6 TB spinners.
>>
>> OK, I was guessing the journal was on disk, but good to know.
>> Which exact model?
>> Some of them are rather unsuited for Ceph usage (SMR).
>>
>> I don't know the exact model of the disks, but they are not SMR disks.
>>
>> Erasure-coded pool with 4+1 EC, using ISA-L.
>>
>> OK, this is where I plead ignorance, no EC experience at all.
>> But it would be strange for this to be hitting a single disk at a time.
>>
>> It is hitting a single disk in each node; however, I'd have thought that
>> I'd see repetition over the disks if it were doing this on a per
>> placement group basis.
>>
>> No scrubbing is reported in the ceph log, and the cluster isn't old enough
>> yet to be doing any deep scrubbing. Also, the CPU usage of the OSD daemon
>> that controls the disk isn't spiking, which I have seen previously when
>> scrubbing or deep scrubbing is taking place.
>>
>> Alright, can you confirm (with atop or the like) that the busy disk is
>> actually being written/read by the OSD process in question, and whether
>> there is corresponding network traffic for that amount of I/O?
>>
>> I checked for network traffic; there didn't look to be any.
>> Looks like the problem is transient and has disappeared for the moment.
>> I will post more when I see the problem again.
>>
>> Bryn
>>
>> Christian
>>
>> All disks are at 2% utilisation as given by df.
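For anyone hitting the same thing: the exclusion described at the top of this
thread normally lives in /etc/updatedb.conf. A minimal, untested sketch,
assuming the OSDs are mounted under the default /var/lib/ceph (adjust
PRUNEPATHS to wherever your OSD filesystems actually live):

    # /etc/updatedb.conf -- keep updatedb/mlocate away from the OSD trees
    PRUNEFS = "nfs sysfs tmpfs ceph"        # "ceph" also skips CephFS mounts, if any
    PRUNEPATHS = "/tmp /var/spool /media /var/lib/ceph"
    PRUNE_BIND_MOUNTS = "yes"

On RHEL 7 updatedb is normally run from /etc/cron.daily/mlocate, so the change
takes effect from the next daily run.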
>>
>> For explicitness:
>>
>> [root@au-sydney ~]# ceph -s
>>     cluster ff900f17-7eec-4fe1-8f31-657d44b86a22
>>      health HEALTH_OK
>>      monmap e5: 5 mons at
>> {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
>>             election epoch 274, quorum 0,1,2,3,4
>>             au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
>>      osdmap e8549: 120 osds: 120 up, 120 in
>>       pgmap v408422: 8192 pgs, 2 pools, 7794 GB data, 5647 kobjects
>>             9891 GB used, 644 TB / 654 TB avail
>>                 8192 active+clean
>>   client io 68363 kB/s wr, 1249 op/s
>>
>> Cheers,
>> Bryn
>>
>> On 30 Nov 2015, at 12:57, Christian Balzer <chibi@xxxxxxx> wrote:
>>
>> Hello,
>>
>> On Mon, 30 Nov 2015 07:15:35 +0000 MATHIAS, Bryn (Bryn) wrote:
>>
>> Hi All,
>>
>> I am seeing an issue with Ceph performance.
>> Starting from an empty cluster of 5 nodes, ~600 TB of storage.
>>
>> It would be helpful to have more details (all details, in fact) than this.
>> Complete HW, OS, FS used, Ceph versions and configuration details
>> (journals on HDD, replication levels etc).
>>
>> While this might not seem significant to your current question, it might
>> prove valuable as to why you're seeing performance problems and how to
>> address them.
>>
>> Monitoring disk usage in nmon, I see rolling 100% usage of a disk.
>> ceph -w doesn't report any spikes in throughput, and the application
>> putting data is not spiking in the load generated.
>>
>> The ceph.log should give a more detailed account, but assuming your
>> client side is indeed steady state, this could very well be explained by
>> scrubbing, especially deep-scrubbing.
>> That should also be visible in the ceph.log.
>>
>> Christian
>>
>> sdg2   0%     0.0    537.5 |
>> sdh    2%     4.0   4439.8 |RW
>> sdh1   2%     4.0   3972.3 |RW
>> sdh2   0%     0.0    467.6 |
>> sdj    3%     2.0   3524.7 |RW
>> sdj1   3%     2.0   3488.7 |RW
>> sdj2   0%     0.0     36.0 |
>> sdk   99%  1144.9   3564.6 |RRRRRRRRRRRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
>> sdk1  99%  1144.9   3254.9 |RRRRRRRRRRRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
>> sdk2   0%     0.0    309.7 |W
>> sdl    1%     4.0    955.1 |R
>> sdl1   1%     4.0    791.3 |R
>> sdl2   0%     0.0    163.8 |
>>
>> Is this anything to do with the way objects are stored on the file
>> system? I remember reading that as the number of objects grows, the files
>> on disk are re-organised?
>>
>> This issue, for obvious reasons, causes a large degradation in
>> performance; is there a way of mitigating it? Will this go away as my
>> cluster reaches a higher level of disk utilisation?
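As an aside on narrowing this sort of thing down: two quick checks that would
have pointed at the culprit here. A sketch, assuming the sysstat package is
installed, that the cluster log is at the default /var/log/ceph/ceph.log on a
monitor host, and that sdk is the busy device from the nmon output above:

    # was the cluster scrubbing around the time of the 100% spikes?
    grep -i scrub /var/log/ceph/ceph.log | tail -n 20

    # which process is actually driving the busy device?
    pidstat -d 5          # per-process disk read/write rates
    iostat -x sdk 5       # per-device utilisation, to cross-check nmon

In this thread the writer turned out to be updatedb rather than an OSD, which
per-process tools such as pidstat or atop show immediately.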
>>
>> Kind Regards,
>> Bryn Mathias
>>
>> --
>> Christian Balzer           Network/Systems Engineer
>> chibi@xxxxxxx              Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com