Re: [ext] Copying large file stuck, two cephfs-2 mounts on two cluster

Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> · Wed, 4 Jan 2023 15:36:39 +0530

Created a tracker to investigate this further.

https://tracker.ceph.com/issues/58376

On Wed, Jan 4, 2023 at 3:18 PM Kotresh Hiremath Ravishankar <
khiremat@xxxxxxxxxx> wrote:

> Hi Mathias,
>
> I am glad that you could find it's a client related issue and figured a
> way around it.
> I too could reproduce the issue locally i.e. when a client which was
> initially copying the snapshot still
> has access to it even when it's got deleted from the other client. I think
> this needs further investigation.
> I will raise a tracker for the same and share it here.
>
> Thanks and Regards,
> Kotresh H R
>
> On Tue, Jan 3, 2023 at 3:23 PM Kuhring, Mathias <
> mathias.kuhring@xxxxxxxxxxxxxx> wrote:
>
>> Trying to exclude clusters and/or clients might have gotten me on the
>> right track. It might have been a client issue or actually a snapshot
>> retention issue. As it turned out when I tried other routes for the data
>> using a different client, the data was not available anymore since the
>> snapshot had been trimmed.
>>
>> We got behind syncing our snapshots a while ago (due to other issues).
>> And now we are somewhere in between our weekly (16 weeks) and daily (30
>> days) snapshots. So, I assume before we catch up with daily (<30), there is
>> a general risk that snapshots disappear while we are syncing them.
>>
>> The funny/weird thing is though (and why I didn't catch up on this), the
>> particular file (and potentially others) of this trimmed snapshot was
>> apparently still available for the client I initially used for the
>> transfer. I'm wondering if the client somehow cached the data until the
>> snapshot got trimmed. And then just re-tried copying the incompletely
>> cached data.
>>
>
>> Continuing with the next available snapshot, mirroring/syncing is now
>> catching up again. I expect it might happen again once we catch up to the
>> 30-days threshold. If the time point of snapshot trimming falls into the
>> syncinc time frame. But then I know to just cancel/skip the current
>> snapshot and continue with the next one. Syncing time is short enough to
>> get me over the hill then before the next trimming.
>>
>> Note to myself: Next time something similar things happens, check if
>> different clients AND different snapshots or original data behave the same.
>>
>> On 12/22/2022 4:27 PM, Kuhring, Mathias wrote:
>>
>> Dear ceph community,
>>
>>
>>
>> We have two ceph cluster of equal size, one main and one mirror, both
>> using cephadm and on version
>>
>> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
>> (stable)
>>
>>
>>
>> We are stuck with copying a large file (~ 64G) between the CephFS file
>> systems of the two clusters.
>>
>>
>> The source path is a snapshot (i.e. something like
>> /my/path/.snap/schedule_some-date/…).
>> But I don't think that should make any difference.
>>
>>
>>
>> First, I was thinking that I need to adapt some rsync parameters to work
>> better with bigger files on CephFS.
>>
>> But when confirming by just copying the file with cp, the transfer get's
>> also stuck.
>>
>> Without any error message, the process just keeps running (rsync or cp).
>>
>> But the file size on the target doesn't increase anymore at some point
>> (almost 85%).
>>
>>
>>
>> Main:
>>
>> -rw------- 1 cockpit-ws printadmin 68360698297 16. Nov 13:40
>> LB22_2764_dragen.bam
>>
>>
>>
>> Mirror:
>>
>> -rw------- 1 root root 58099499008 22. Dez 15:54 LB22_2764_dragen.bam
>>
>>
>>
>> Our CephFS file size limit is with 10 TB more than generous.
>> And as far as I know from clients there are indeed files in TB ranges on
>> the cluster without issues.
>>
>>
>>
>> I don't know if this is the file's fault or if this is some issue with
>> either of the CephFS' or cluster.
>>
>> And I don't know how to look and troubleshoot this.
>>
>> Can anybody give me a tip where I can start looking and debug these kind
>> of issues?
>>
>>
>>
>> Thank you very much.
>>
>>
>>
>> Best Wishes,
>>
>> Mathias
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx<mailto:
>> ceph-users-leave@xxxxxxx>
>>
>>
>> --
>> Mathias Kuhring
>>
>> Dr. rer. nat.
>> Bioinformatician
>> HPC & Core Unit Bioinformatics
>> Berlin Institute of Health at Charité (BIH)
>>
>> E-Mail:  mathias.kuhring@xxxxxxxxxxxxxx<mailto:
>> mathias.kuhring@xxxxxxxxxxxxxx>
>> Mobile: +49 172 3475576
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx