I've completely upgraded my cluster and made sure my clients are on Luminous too. Our application creates lots of directories very quickly, and because of the nested layering it takes >1 second to create those directories. I would really like to be able to diagnose exactly where the slowness is; I'm thinking the MDS, but I'm not 100% sure (some of the checks I've been looking at are sketched after the example structure below). We have benchmarked all pools and they are really fast. We have also tried removing the directory structure in our app, and with that flat layout writes of 2KB to 80KB files complete in 4-10ms.
example structure:
Mount location: /mnt/fsstore
Directory Structure: /mnt/fsstore/PDFDOCUMENT/2f/00d/f28/49a/74d/2e8/
File: /mnt/fsstore/PDFDOCUMENT/2f/00d/f28/49a/74d/2e8/2f00df28-49a7-4d2e-85c5-20217bafbf6c
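If it does turn out to be the MDS, the admin socket is probably the quickest way to see where requests stall. A rough sketch of the checks (mds.beta-ceph-node1 and osd.0 are just placeholders for whichever active MDS and metadata-pool OSDs you have; admin-socket commands have to be run on the node hosting that daemon):

  # live per-second MDS counters: request latency, cache hit/miss, journal activity
  ceph daemonperf mds.beta-ceph-node1

  # requests currently stuck in the MDS and what they are waiting on
  ceph daemon mds.beta-ceph-node1 dump_ops_in_flight

  # full counter dump; compare the mds_server and objecter sections to see whether
  # time is spent inside the MDS or waiting on the metadata-pool OSDs
  ceph daemon mds.beta-ceph-node1 perf dump

  # recent slow ops on an OSD backing the metadata pool, for comparison
  ceph daemon osd.0 dump_historic_ops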
Daniel Pryor | Sr. DevOps Engineer
dpryor@xxxxxxxxxxxxx
direct 480.719.1646 ext. 1318 | mobile 208.757.2680
6263 North Scottsdale Road, Suite 330, Scottsdale, AZ 85250
Parchment | Turn Credentials into Opportunities
www.parchment.com
On Thu, Oct 19, 2017 at 8:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Thu, 19 Oct 2017, Daniel Pryor wrote:
> Hello Everyone,
>
> We are currently running into two issues.
>
> 1) We are noticing huge pauses during directory creation, but our file write
> times are super fast. The metadata and data pools are on the same
> infrastructure.
> * https://gist.github.com/pryorda/a0d5c37f119c4a320fa4ca9d48c8752b
> * https://gist.github.com/pryorda/ba6e5c2f94f67ca72a744b90cc58024e
Separating metadata onto different (ideally faster) devices is usually a
good idea if you want to protect metadata performance. The stalls you're
seeing could either be MDS requests getting slowed down by the OSDs, or it
might be the MDS missing something in its cache and having to fetch it
from or flush something to RADOS. You might see if increasing the MDS
cache size helps.
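For reference, a rough sketch of both suggestions, assuming the metadata pool is named cephfs_metadata and the active MDS is mds.beta-ceph-node1 (substitute your own names). On luminous the cache is sized in bytes via mds_cache_memory_limit; jewel used an inode-count "mds cache size" instead:

  # check the current limit (luminous default is 1 GB)
  ceph daemon mds.beta-ceph-node1 config get mds_cache_memory_limit

  # raise it at runtime, e.g. to 4 GB; persist it under [mds] in ceph.conf
  ceph tell mds.beta-ceph-node1 injectargs '--mds_cache_memory_limit 4294967296'

  # move the metadata pool to SSD-class OSDs using luminous device classes
  # (expect some backfill while the PGs move)
  ceph osd crush rule create-replicated metadata-ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule metadata-ssd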
> 2) Since we were having the issue above, we wanted to possibly move to a
> larger top-level directory, stuff everything in there, and later move
> everything out via a batch job. To do this we need to increase the
> directory limit from 100,000 to 300,000. How do we increase this limit?
I would recommend upgrading to luminous and enabling directory
fragmentation instead of increasing the per-fragment limit on Jewel. Big
fragments have a negative impact on MDS performance (leading to spikes
like you see above) and can also make life harder for the OSDs.
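A rough sketch of both options, assuming the filesystem is named cephfs (check with "ceph fs ls") and using the same placeholder MDS name as above:

  # luminous: let the MDS split large directories into multiple fragments
  ceph fs set cephfs allow_dirfrags true

  # jewel-style fallback: raise the per-fragment entry limit (default 100000);
  # not recommended, for the reasons above
  ceph tell mds.beta-ceph-node1 injectargs '--mds_bal_fragment_size_max 300000'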
sage
>
>
> dpryor@beta-ceph-node1:~$ dpkg -l | grep ceph
> ii  ceph-base      10.2.10-1xenial  amd64  common ceph daemon libraries and management tools
> ii  ceph-common    10.2.10-1xenial  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy    1.5.38           all    Ceph-deploy is an easy to use configuration tool
> ii  ceph-mds       10.2.10-1xenial  amd64  metadata server for the ceph distributed file system
> ii  ceph-mon       10.2.10-1xenial  amd64  monitor server for the ceph storage system
> ii  ceph-osd       10.2.10-1xenial  amd64  OSD server for the ceph storage system
> ii  libcephfs1     10.2.10-1xenial  amd64  Ceph distributed file system client library
> ii  python-cephfs  10.2.10-1xenial  amd64  Python libraries for the Ceph libcephfs library
> dpryor@beta-ceph-node1:~$
>
> Any direction would be appreciated!
>
> Thanks,
> Daniel
>
>
>