Re: How To Scale Ceph for Large Numbers of Clients?


 



On Thu, Mar 7, 2019 at 2:38 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
Is this with one active MDS and one standby-replay? The graph is odd
to me because the session count shows sessions on fs-b and fs-d but
not fs-c. Or maybe max_mds=2 and fs-d has no activity and fs-c is
standby-replay?

The graphs were taken when we were running with 2 active MDS and 2 standby-replay. Currently we're running with 1 active and 1 standby-replay.
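
In case it's useful, a quick way to double-check that layout is something like the sketch below, which just shells out to "ceph fs dump". The JSON field names are from memory and may differ a bit between releases, so treat it as a rough outline rather than anything polished ("ceph fs status" shows much the same picture interactively).

#!/usr/bin/env python3
"""Sketch: summarize MDS ranks and standbys from 'ceph fs dump'.

Assumes the ceph CLI is on the path; the JSON field names below are
from memory and may differ slightly between releases.
"""
import json
import subprocess


def ceph(*args):
    out = subprocess.check_output(["ceph"] + list(args) + ["--format", "json"])
    return json.loads(out)


fsmap = ceph("fs", "dump")
for fs in fsmap.get("filesystems", []):
    mdsmap = fs["mdsmap"]
    print("%s: max_mds=%s" % (mdsmap.get("fs_name"), mdsmap.get("max_mds")))
    for info in mdsmap.get("info", {}).values():
        # state is e.g. "up:active" or "up:standby-replay"
        print("  %s rank=%s %s" % (info.get("name"), info.get("rank"), info.get("state")))
# Plain standbys are tracked outside the per-filesystem map.
for sb in fsmap.get("standbys", []):
    print("standby: %s" % sb.get("name"))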
 
Three OSDs are going to really struggle with the client load you're
putting on them. It doesn't surprise me that you are getting slow request
warnings on the MDS for this reason. When you were running Luminous
12.2.9+ or Mimic 13.2.2+, were you seeing slow metadata I/O warnings?
Even if you did not, it's possible that the MDS is delayed issuing caps
to clients because it's waiting for another client to flush writes and
release conflicting caps.

We didn't see any slow metadata I/O warnings, but this is the sort of thing I've been suspecting is the underlying issue. One thing to note, though: my current test only walks a directory and reads all the files in it, and it runs on a dev cluster that only I'm using, so I'm not sure which client would be generating writes that the MDS would be waiting on.
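
To confirm or rule out the caps theory, something like the sketch below should show whether requests are actually stuck waiting on caps while the walk runs, by polling the MDS admin socket (dump_ops_in_flight and session ls). The daemon name is a placeholder and the JSON fields are from memory, so it's only a rough outline.

#!/usr/bin/env python3
"""Sketch: poll the MDS admin socket for stuck ops and per-client caps.

Assumes this runs on the MDS host so that "ceph daemon mds.<name> ..."
works; the daemon name below is a placeholder and the JSON field names
are from memory.
"""
import json
import subprocess
import sys

MDS_NAME = sys.argv[1] if len(sys.argv) > 1 else "mds-a"  # placeholder name


def daemon(*args):
    out = subprocess.check_output(["ceph", "daemon", "mds." + MDS_NAME] + list(args))
    return json.loads(out)


ops = daemon("dump_ops_in_flight")
print("%d ops in flight" % len(ops.get("ops", [])))
for op in ops.get("ops", []):
    # flag_point usually says what the op is blocked on (locks, caps, journal, ...)
    flag = op.get("type_data", {}).get("flag_point", "?")
    print("  age=%s %s: %s" % (op.get("age"), flag, op.get("description")))

# Per-client session info, to spot a client sitting on a pile of caps.
for s in daemon("session", "ls"):
    print("client.%s num_caps=%s" % (s.get("id"), s.get("num_caps")))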
 
Generally we recommend that the metadata pool be located on OSDs with
fast devices separate from the data pool. This avoids priority
inversion of MDS metadata I/O with data I/O. See [1] to configure the
metadata pool on a separate set of OSDs.

Also, you're not going to saturate a 1.9TB NVMe SSD with one OSD. You
must partition it and set up multiple OSDs. This works out in your favor,
since it lets you put the metadata pool on its own set of OSDs.

[1] https://ceph.com/community/new-luminous-crush-device-classes/
 
This is excellent info, especially the second point, as it lets me split the metadata pool out onto separate OSDs without spinning up any additional resources or using EBS volumes. I will look into partitioning the NVMe drives, splitting out the metadata pool, and checking how this impacts performance.
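
The rough sequence I'm thinking of is sketched below, mostly so I keep the order straight. The pool, rule, class, and device names are placeholders for whatever our cluster actually uses, and the CRUSH steps follow the device-class approach from the linked post; it prints the commands rather than running them unless DRY_RUN is flipped.

#!/usr/bin/env python3
"""Sketch of the plan: carve each NVMe drive into several OSDs, give the
OSDs reserved for metadata their own device class, and point the CephFS
metadata pool at a CRUSH rule restricted to that class.

Pool name, rule/class names, device path, and OSD ids are placeholders;
leave DRY_RUN = True to just print the commands.
"""
import subprocess

DRY_RUN = True

NVME_DEVICE = "/dev/nvme0n1"        # placeholder: the 1.9TB NVMe drive
OSDS_PER_DEVICE = 4                 # placeholder split per drive
METADATA_OSDS = ["osd.3"]           # placeholder: OSDs to reserve for metadata
META_CLASS = "nvme-meta"            # placeholder device class for those OSDs
RULE_NAME = "metadata_rule"         # placeholder CRUSH rule name
METADATA_POOL = "cephfs_metadata"   # placeholder metadata pool name

commands = [
    # Split the drive into several OSDs instead of one big one.
    ["ceph-volume", "lvm", "batch", "--osds-per-device", str(OSDS_PER_DEVICE), NVME_DEVICE],
]
for osd in METADATA_OSDS:
    # ceph-volume auto-assigns a class, so clear it before setting the custom one.
    commands += [
        ["ceph", "osd", "crush", "rm-device-class", osd],
        ["ceph", "osd", "crush", "set-device-class", META_CLASS, osd],
    ]
commands += [
    # Replicated rule restricted to the metadata class, then move the metadata
    # pool onto it. The data pool keeps its own rule, which would likewise need
    # to exclude these OSDs to keep them dedicated to metadata.
    ["ceph", "osd", "crush", "rule", "create-replicated", RULE_NAME, "default", "host", META_CLASS],
    ["ceph", "osd", "pool", "set", METADATA_POOL, "crush_rule", RULE_NAME],
]

for cmd in commands:
    print(" ".join(cmd))
    if not DRY_RUN:
        subprocess.check_call(cmd)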

Thanks,
Zack
