Re: High disk utilisation

To update this: the problem looks like it comes from updatedb scanning the ceph disks.

When we make sure it doesn’t, by adding the ceph mount points to updatedb’s exclusion file, the problem goes away.
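
(For anyone hitting the same thing, the change amounts to something like the following; /var/lib/ceph is where OSDs are mounted by default, so adjust to match your layout, and the rest of the existing exclusion list is elided here:)

# /etc/updatedb.conf (mlocate on RHEL 7) -- append the ceph OSD mount points
PRUNEPATHS = "... /var/lib/ceph"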

Thanks for the help and time.
On 30 Nov 2015, at 09:53, MATHIAS, Bryn (Bryn) <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:


On 30 Nov 2015, at 14:37, MATHIAS, Bryn (Bryn) <bryn.mathias@xxxxxxxxxxxxxxxxxx> wrote:

Hi,
On 30 Nov 2015, at 13:44, Christian Balzer <chibi@xxxxxxx> wrote:


Hello,

On Mon, 30 Nov 2015 07:55:24 +0000 MATHIAS, Bryn (Bryn) wrote:

Hi Christian,

I’ll give you a much better dump of detail :)

Running RHEL 7.1,
ceph version 0.94.5

all ceph disks are xfs, with journals on a partition on the disk
Disks: 6Tb spinners.

OK, I was guessing the journals were on disk, but good to know.
Which exact model?
Some of them are rather unsuited for Ceph usage (SMR).
I don’t know the exact model of the disks but they are not SMR disks.

Erasure coded pool with 4+1 EC ISA-L also.
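
(For context, a 4+1 ISA-L pool like this would typically be defined along these lines; the profile and pool names are made up for illustration, and pg_num is a placeholder:)

ceph osd erasure-code-profile set ec-4-1-isa k=4 m=1 plugin=isa
ceph osd pool create ecpool <pg_num> <pg_num> erasure ec-4-1-isa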

OK, this is where I plead ignorance, no EC experience at all.
But it would be strange for this to be hitting a single disk at a time.
It is hitting a single disk in each node; however, I’d have thought I’d see it repeat across the disks if it were happening on a per-placement-group basis.

No scrubbing is reported in the ceph log, and the cluster isn’t old enough yet
to be doing any deep scrubbing. Also, the CPU usage of the OSD daemon
that controls the disk isn’t spiking, which I have seen previously when
scrubbing or deep scrubbing is taking place.
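
(For reference, a quick way to double-check for scrub activity; the log path assumes the defaults:)

# any scrub/deep-scrub entries in the cluster log
grep -i scrub /var/log/ceph/ceph.log | tail
# PGs currently scrubbing show it in their state column
ceph pg dump pgs_brief | grep -i scrub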

Alright, can you confirm (with atop or the like) that the busy disk is
actually being written to/read from by the OSD process in question, and whether
there is corresponding network traffic for that amount of I/O?
I checked for network traffic; there didn’t appear to be any.
Looks like the problem is transient and has disappeared for the moment.
I will post more when I see the problem again.
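
(Alongside atop, a couple of ways to tie the busy disk back to a process next time it shows up; these assume iotop and sysstat are installed:)

# accumulated per-process I/O, only showing processes actually doing I/O
iotop -oPa
# per-process disk statistics every 5 seconds
pidstat -d 5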

Bryn

Christian


All disks are at 2% utilisation as given by df.

For explicitness:
[root@au-sydney ~]# ceph -s
  cluster ff900f17-7eec-4fe1-8f31-657d44b86a22
   health HEALTH_OK
   monmap e5: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
          election epoch 274, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
   osdmap e8549: 120 osds: 120 up, 120 in
    pgmap v408422: 8192 pgs, 2 pools, 7794 GB data, 5647 kobjects
          9891 GB used, 644 TB / 654 TB avail
              8192 active+clean
  client io 68363 kB/s wr, 1249 op/s


Cheers,
Bryn


On 30 Nov 2015, at 12:57, Christian Balzer <chibi@xxxxxxx> wrote:


Hello,

On Mon, 30 Nov 2015 07:15:35 +0000 MATHIAS, Bryn (Bryn) wrote:

Hi All,

I am seeing an issue with ceph performance.
Starting from an empty cluster of 5 nodes with ~600 TB of storage.

It would be helpful to have more details (all details in fact) than this.
Complete HW, OS, FS used, Ceph versions and configuration details
(journals on HDD, replication levels etc).

While this might not seem significant to your current question, it might
prove valuable as to why you're seeing performance problems and how to
address them.

Monitoring disk usage in nmon, I see rolling 100% utilisation of a disk.
ceph -w doesn’t report any spikes in throughput, and the application
writing the data is not spiking in the load it generates.


The ceph.log should give a more detailed account, but assuming your
client side is indeed in a steady state, this could very well be explained by
scrubbing, especially deep-scrubbing.
That should also be visible in the ceph.log.

Christian

│sdg2    0%    0.0   537.5|                                                  |
│sdh     2%    4.0  4439.8|RW                                                |
│sdh1    2%    4.0  3972.3|RW                                                |
│sdh2    0%    0.0   467.6|                                                  |
│sdj     3%    2.0  3524.7|RW                                                |
│sdj1    3%    2.0  3488.7|RW                                                |
│sdj2    0%    0.0    36.0|                                                  |
│sdk    99% 1144.9  3564.6|RRRRRRRRRRRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
│sdk1   99% 1144.9  3254.9|RRRRRRRRRRRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
│sdk2    0%    0.0   309.7|W                                                 |
│sdl     1%    4.0   955.1|R                                                 |
│sdl1    1%    4.0   791.3|R                                                 |
│sdl2    0%    0.0   163.8|                                                  |

Is this anything to do with the way objects are stored on the file
system? I remember reading that as the number of objects grows, the files
on disk are reorganised?

This issue, for obvious reasons, causes a large degradation in
performance. Is there a way of mitigating it? Will it go away as my
cluster reaches a higher level of disk utilisation?
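
(For reference, the re-organisation being asked about is filestore directory splitting: once a PG directory holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, the OSD splits it into subdirectories, which costs extra I/O at that moment. A sketch of the relevant settings; the values shown are the commonly cited defaults, not a recommendation:)

[osd]
filestore merge threshold = 10
filestore split multiple = 2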


Kind Regards,
Bryn Mathias



--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx    Global OnLine Japan/Fusion
Communications http://www.gol.com/



-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx    Global OnLine Japan/Fusion Communications
http://www.gol.com/


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
