I've completely upgraded my cluster and made sure my clients are on Luminous too. Our application creates lots of directories very quickly, and because of the nested layering it takes >1 second to create those directories. I would really like to be able to diagnose exactly where the slowness is; I'm thinking the MDS, but I'm not 100% sure (some of the checks I've been looking at are sketched after the example structure below). We have benchmarked all pools and they are really fast. We have also tried removing the directory structure in our app, and with that flat layout writes of 2KB to 80KB files complete in 4-10ms.
example structure:
Mount location: /mnt/fsstore
Directory Structure: /mnt/fsstore/PDFDOCUMENT/2f/00d/f28/49a/74d/2e8/
File: /mnt/fsstore/PDFDOCUMENT/2f/00d/f28/49a/74d/2e8/2f00df28-49a7-4d2e-85c5-20217bafbf6c
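If it does turn out to be the MDS, the admin socket is probably the quickest way to see where requests stall. A rough sketch of the checks (mds.beta-ceph-node1 and osd.0 are just placeholders for whichever active MDS and metadata-pool OSDs you have; admin-socket commands have to be run on the node hosting that daemon):

  # live per-second MDS counters: request latency, cache hit/miss, journal activity
  ceph daemonperf mds.beta-ceph-node1

  # requests currently stuck in the MDS and what they are waiting on
  ceph daemon mds.beta-ceph-node1 dump_ops_in_flight

  # full counter dump; compare the mds_server and objecter sections to see whether
  # time is spent inside the MDS or waiting on the metadata-pool OSDs
  ceph daemon mds.beta-ceph-node1 perf dump

  # recent slow ops on an OSD backing the metadata pool, for comparison
  ceph daemon osd.0 dump_historic_ops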
Daniel Pryor | Sr. DevOps Engineer
dpryor@xxxxxxxxxxxxx
direct 480.719.1646 ext. 1318 | mobile 208.757.2680
6263 North Scottsdale Road, Suite 330, Scottsdale, AZ 85250
Parchment | Turn Credentials into Opportunities
www.parchment.com
On Thu, Oct 19, 2017 at 8:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Thu, 19 Oct 2017, Daniel Pryor wrote:
> Hello Everyone,
>
> We are currently running into two issues.
>
> 1) We are noticing huge pauses during directory creation, but our file write
> times are super fast. The metadata and data pools are on the same
> infrastructure.
> * https://gist.github.com/pryorda/a0d5c37f119c4a320fa4ca9d48c8752b
> * https://gist.github.com/pryorda/ba6e5c2f94f67ca72a744b90cc58024e
Separating metadata onto different (ideally faster) devices is usually a
good idea if you want to protect metadata performance. The stalls you're
seeing could either be MDS requests getting slowed down by the OSDs, or it
might be the MDS missing something in its cache and having to fetch it
from or flush something to RADOS. You might see if increasing the MDS
cache size helps.
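For reference, a rough sketch of both suggestions, assuming the metadata pool is named cephfs_metadata and the active MDS is mds.beta-ceph-node1 (substitute your own names). On luminous the cache is sized in bytes via mds_cache_memory_limit; jewel used an inode-count "mds cache size" instead:

  # check the current limit (luminous default is 1 GB)
  ceph daemon mds.beta-ceph-node1 config get mds_cache_memory_limit

  # raise it at runtime, e.g. to 4 GB; persist it under [mds] in ceph.conf
  ceph tell mds.beta-ceph-node1 injectargs '--mds_cache_memory_limit 4294967296'

  # move the metadata pool to SSD-class OSDs using luminous device classes
  # (expect some backfill while the PGs move)
  ceph osd crush rule create-replicated metadata-ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule metadata-ssd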
> 2) Since we were having the issue above, we wanted to possibly move to a
> larger top-level directory, stuff everything in there, and later move
> everything out via a batch job. To do this we need to increase the
> directory limit from 100,000 to 300,000. How do we increase this limit?
I would recommend upgrading to luminous and enabling directory
fragmentation instead of increasing the per-fragment limit on Jewel. Big
fragments have a negative impact on MDS performance (leading to spikes
like you see above) and can also make life harder for the OSDs.
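A rough sketch of both options, assuming the filesystem is named cephfs (check with "ceph fs ls") and using the same placeholder MDS name as above:

  # luminous: let the MDS split large directories into multiple fragments
  ceph fs set cephfs allow_dirfrags true

  # jewel-style fallback: raise the per-fragment entry limit (default 100000);
  # not recommended, for the reasons above
  ceph tell mds.beta-ceph-node1 injectargs '--mds_bal_fragment_size_max 300000'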
sage
>
>
> dpryor@beta-ceph-node1:~$ dpkg -l | grep ceph
> ii  ceph-base      10.2.10-1xenial  amd64  common ceph daemon libraries and management tools
> ii  ceph-common    10.2.10-1xenial  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy    1.5.38           all    Ceph-deploy is an easy to use configuration tool
> ii  ceph-mds       10.2.10-1xenial  amd64  metadata server for the ceph distributed file system
> ii  ceph-mon       10.2.10-1xenial  amd64  monitor server for the ceph storage system
> ii  ceph-osd       10.2.10-1xenial  amd64  OSD server for the ceph storage system
> ii  libcephfs1     10.2.10-1xenial  amd64  Ceph distributed file system client library
> ii  python-cephfs  10.2.10-1xenial  amd64  Python libraries for the Ceph libcephfs library
> dpryor@beta-ceph-node1:~$
>
> Any direction would be appreciated!
>
> Thanks,
> Daniel
>
>
>