Hi Patrick,

Thanks for getting back to me. It looks like I found the issue. It's due to the fact that I thought I had increased max_file_size on Ceph to 20 TB; it turns out I missed a zero and set it to 1.89 TB.

I had originally tried to fallocate the space for the 8 TB volume, which kept erroring. I then tried dd, and dd wrote the entire space needed without errors. What I don't understand is what happens to CephFS when you do this. The files I'm writing into the pre-allocated volume in Ceph are still there, "luckily", but I thought that Ceph would stop you from writing to CephFS once it hit the upper limit of max_file_size.

Kind regards,

Kyle

________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 11 May 2021 03:14
To: Kyle Dean <k.s-dean@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Write Ops on CephFS Increasing exponentially

Hi Kyle,

On Thu, May 6, 2021 at 7:56 AM Kyle Dean <k.s-dean@xxxxxxxxxxx> wrote:
>
> Hi, hoping someone could help me get to the bottom of this particular issue I'm having.
>
> I have Ceph Octopus installed using ceph-ansible.
>
> Currently I have 3 MDS servers running, and one client connected to the active MDS. I'm storing a very large encrypted container on the CephFS file system, 8 TB worth, and I'm writing data into it from the client host.
>
> Recently I have noticed a severe impact on performance, and the time taken to process files within the container has increased from 1 minute to 11 minutes.
>
> In the Ceph dashboard, when I look at the performance tab on the file system page, the write ops are increasing exponentially over time.
>
> At the end of April, around the 22nd, I had 49 write ops on the performance page for the MDS daemons. This is now at 266467 write ops and increasing.
>
> Also, the client requests have gone from 14 to 67 to 117 and are now at 283.
>
> Would someone be able to help me make sense of why the performance has decreased and what is going on with the client requests and write operations?

I suggest you look at the "perf dump" statistics from the MDS (via ceph tell or admin socket) over a period of time to get an idea what operations it's performing. It's probable your workload changed somehow and that is the cause.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
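
For anyone hitting the same limit: max_file_size can be inspected and raised with the standard ceph fs commands. A minimal sketch, assuming the filesystem is named "cephfs" (substitute your own fs name); the value is given in bytes:

    # show the current limit, in bytes
    ceph fs get cephfs | grep max_file_size

    # raise the limit to 20 TiB (20 * 2^40 bytes)
    ceph fs set cephfs max_file_size 21990232555520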
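
As for what the dd run actually produced, one way to check is to compare the container's apparent size with the space actually allocated for it on the CephFS mount. A sketch, using a hypothetical mount point and file name (/mnt/cephfs/container.img stands in for the real path):

    # apparent size (what ls reports) vs. blocks actually allocated
    ls -lh /mnt/cephfs/container.img
    du -h --apparent-size /mnt/cephfs/container.img
    du -h /mnt/cephfs/container.img

    # stat shows the size and allocated blocks together
    stat /mnt/cephfs/container.img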
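
Patrick's "perf dump" suggestion maps to either of the following, assuming the active MDS daemon is named mds.a (take the real name from ceph fs status); sampling it a few times over an interval and diffing the counters shows which operations dominate:

    # remotely, from any node with an admin keyring
    ceph tell mds.a perf dump

    # or locally on the MDS host, via the admin socket
    ceph daemon mds.a perf dump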