Re: mimic: MDS standby-replay causing blocked ops (MDS bug?)

Dear Yan,

it is difficult to push the MDS into this particular failure. Would it be advisable to increase the likelihood and frequency of dirfrag operations by tweaking some of the parameters mentioned here: http://docs.ceph.com/docs/mimic/cephfs/dirfrags/? If so, what would reasonable values be, keeping in mind that we are already in a pilot production phase and need to maintain the integrity of user data?

Is there a counter that shows whether such operations have happened at all?

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Yan, Zheng <ukernel@xxxxxxxxx>
Sent: 16 May 2019 09:35
To: Frank Schilder
Subject: Re:  mimic: MDS standby-replay causing blocked ops (MDS bug?)

On Thu, May 16, 2019 at 2:52 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the information requested. Since it is not easy to get into this situation and I usually need to resolve it fast (it's not a test system), is there anything else worth capturing?
>

just

ceph daemon mds.x dump_ops_in_flight
ceph daemon mds.x dump cache /tmp/cachedump.x
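
When the MDS has to be failed over quickly, it can help to check the saved ops dump for the stuck internal op right away. The following is only a minimal sketch, not an official tool. It assumes the output of dump_ops_in_flight is JSON with an "ops" list whose entries carry "description" and "age" fields, and that the stuck op has "fragmentdir" in its description. The function name find_stuck_ops and the 30-second threshold are hypothetical.

```python
import json


def find_stuck_ops(dump_text, keyword="fragmentdir", min_age=30.0):
    """Return (description, age) pairs for ops matching keyword and older than min_age.

    Assumption: dump_text is the JSON printed by
    `ceph daemon mds.x dump_ops_in_flight`, with a top-level "ops" list
    whose entries have "description" and "age" (seconds) fields.
    """
    dump = json.loads(dump_text)
    stuck = []
    for op in dump.get("ops", []):
        description = op.get("description", "")
        age = float(op.get("age", 0.0))
        # Keep only ops that mention the keyword and have been pending long enough.
        if keyword in description and age >= min_age:
            stuck.append((description, age))
    return stuck
```

One way to use it: redirect the output of the first command above into a file, read the file into a string, and pass that string to find_stuck_ops. An empty result means no op matching the keyword was older than the threshold at the time of the dump.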

> I will get back as soon as it happens again.
>
> In the meantime, I would be grateful if you could shed some light on the following questions:
>
> - Is there a way to cancel an individual operation in the queue? It is a bit harsh to have to fail an MDS for that.

no

> - What is the fragmentdir operation doing in a single MDS setup? I thought this was only relevant if multiple MDS daemons are active on a file system.
>

It splits a large directory into smaller fragments. This happens regardless of the number of active MDS daemons.
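
As a rough illustration of the idea only, and not of the actual MDS code: entries are assigned to fragments by a hash of their name, and a split by n bits produces up to 2**n fragments. The sketch below uses SHA-1 as a stand-in hash, and the function name split_directory and the split_bits value of 3 are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def split_directory(names, split_bits=3):
    """Partition entry names into at most 2**split_bits fragments.

    Toy model: the fragment id is the top split_bits bits of a 32-bit
    hash of the name. SHA-1 is a stand-in, not the hash CephFS uses.
    """
    fragments = defaultdict(list)
    for name in names:
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        hash32 = int.from_bytes(digest[:4], "big")
        # Top bits select the fragment; split_bits=0 keeps everything in fragment 0.
        frag_id = hash32 >> (32 - split_bits)
        fragments[frag_id].append(name)
    return dict(fragments)
```

With split_bits=3, a directory of many entries ends up spread over at most eight fragments, each holding roughly an eighth of the names.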


> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Yan, Zheng <ukernel@xxxxxxxxx>
> Sent: 16 May 2019 05:50
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  mimic: MDS standby-replay causing blocked ops (MDS bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain more info than this list). It contains 12 ops and I will include it here in full length (hope this is acceptable):
> >
>
> Your issues were caused by a stuck internal fragmentdir op. Can you
> dump the MDS cache and send the output to us?