Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

Ok, thank you. I'm planning to update to 15.2.17 but am waiting for some user feedback first.

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Boris <bb@xxxxxxxxx> 
Sent: Monday, August 22, 2022 1:49 PM
To: ceph-users@xxxxxxx
Subject:  Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

Good morning Istvan,

sadly no, it’s not fixed. I just have an idea what might trigger the problem and how I can try to mitigate it.

I still don't know what these errors are or why they happen.
I refuse to think that RGW "loses" data when OSDs become unstable.

Have a good start into the week
 Boris

> On 22.08.2022 at 05:12, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:
>
> Hi,
>
> So, has your problem been fixed?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Boris Behrens <bb@xxxxxxxxx>
> Sent: Monday, August 22, 2022 12:48 AM
> To: ceph-users@xxxxxxx
> Subject: Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index
>
>
> I just checked something else and it looks like this problem happens when our SSD OSDs get marked as laggy because of the GC bug:
> https://tracker.ceph.com/issues/53585
>
> 2022-08-18T22:00:12.257+0000 7fb9dbe62700  0 log_channel(cluster) log [INF] : osd.263 marked itself dead as of e658014
> 2022-08-18T22:01:48.727+0000 7fb9dbe62700  0 log_channel(cluster) log [INF] : osd.242 marked itself dead as of e658018
> 2022-08-18T22:03:07.898+0000 7fb9dbe62700  0 log_channel(cluster) log [INF] : osd.263 marked itself dead as of e658023
> 2022-08-18T22:10:54.963+0000 7fb9dbe62700  0 log_channel(cluster) log [INF] : osd.242 marked itself dead as of e658028
>
> Our S3 cluster is also used for our backup center, which receives RBD exports from our RBD clusters (usually multiple GB/TB in size).
> We added some SSD OSDs and put all of our non-data pools on these SSD OSDs.
>
> This helped take some pressure off the cluster when the GC goes nuts. Maybe these issues are related.
>
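> In case someone wants to watch for the same thing: a rough way to see whether the GC backlog is piling up (just a sketch, assuming the default GC setup; the exact output differs per cluster):
>
>   # count pending GC entries across all GC shards (can be slow on a big backlog)
>   radosgw-admin gc list --include-all | grep -c '"oid"'
>   # drain the backlog manually if it keeps growing
>   radosgw-admin gc process --include-all
>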
>> On Sun, 21 Aug 2022 at 19:34, Boris Behrens <bb@xxxxxxxxx> wrote:
>>
>> Cheers everybody,
>>
>> I had this issue some time ago, and we thought it was fixed, but it
>> seems to be happening again.
>> We have files, uploaded by one of our customers, that are only
>> available in the bucket index but not in rados.
>>
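>> (For anyone who wants to check the same thing on their own cluster, a
>> rough sketch of comparing the bucket index with what actually exists in
>> rados; it assumes the default data pool name, and the bucket marker
>> prefix differs per cluster:)
>>
>>   # what the bucket index claims to contain
>>   radosgw-admin bucket list --bucket=sql-backup-de > index.json
>>   # the rados objects that should back those index entries
>>   radosgw-admin bucket radoslist --bucket=sql-backup-de > radoslist.txt
>>   # spot-check one affected object
>>   radosgw-admin object stat --bucket=sql-backup-de --object=IM_DIFFERENTIAL_22.bak
>>   rados -p default.rgw.buckets.data stat '<bucket-marker>_IM_DIFFERENTIAL_22.bak'
>>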
>> At first we thought this might be a bug
>> (https://tracker.ceph.com/issues/54528) that got fixed with the last
>> point release, but it seems not. And only one customer has this problem.
>> At the moment we think it is some very weird usage of the S3 API
>> (they developed their own library, using the AWS SDK for .NET as a
>> basis) together with multipart uploads.
>>
>> I am also not sure HOW they do the upload, because it is a backup
>> that gets uploaded every day and they seem to have multiple of them. I
>> didn't go through all of our logs, but I managed to pull one
>> lifecycle of a file from the logs, and it showed very strange errors
>> at the end, and I couldn't find anything about this error.
>>
>> Hope someone can tell me what this is and how I can fix it.
>>
>> Cheers
>> Boris
>>
>> Strange errors:
>> 2022-08-18T22:04:29.538+0000 7f7ba9fcb700  0 req 9033182355071581504 183.407425780s s3:complete_multipart WARNING: failed to remove object sql-backup-de:_multipart_IM_DIFFERENTIAL_22.bak.2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs.meta
>> 2022-08-18T22:04:29.542+0000 7f7ba9fcb700  0 req 9033182355071581504 183.411425768s s3:complete_multipart WARNING: failed to unlock CLUSTERUUID.BUCKET.INDENTIFIER__multipart_IM_DIFFERENTIAL_22.bak.2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs.meta
>>
>> Full log (trimmed when only partNumber changed):
>> 2022-08-18T22:01:08.894838+0000 "GET /sql-backup-sde/IM_DIFFERENTIAL_22.bak HTTP/1.1" 200 315392 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:01:08.930838+0000 "POST /sql-backup-sde/IM_DIFFERENTIAL_22.bak?uploads HTTP/1.1" 200 271 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:01:09.108374+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploads HTTP/1.1" 200 270 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:01:09.472368+0000 "PUT /sql-backup-sde/IM_DIFFERENTIAL_22.bak?uploadId=2~KX75VPCYFOZRPRLo5L0ytQuyp-nzbrT&partNumber=4 HTTP/1.1" 200 2523136 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> ..
>> 2022-08-18T22:01:09.619099+0000 "PUT /sql-backup-sde/IM_DIFFERENTIAL_22.bak?uploadId=2~KX75VPCYFOZRPRLo5L0ytQuyp-nzbrT&partNumber=2 HTTP/1.1" 200 8388608 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:01:09.706836+0000 "POST /sql-backup-sde/IM_DIFFERENTIAL_22.bak?uploadId=2~KX75VPCYFOZRPRLo5L0ytQuyp-nzbrT HTTP/1.1" 200 334 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:01:09.852362+0000 "PUT /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs&partNumber=1 HTTP/1.1" 200 8388608 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> ..
>> 2022-08-18T22:01:26.098900+0000 "PUT /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs&partNumber=161 HTTP/1.1" 200 8388608 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:14.103386+0000 "GET /sql-backup-de/IM_DIFFERENTIAL_22.bak HTTP/1.1" 200 4194304 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:26.275201+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 500 304 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:27.787178+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 500 304 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:29.386586+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 500 304 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:30.911130+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 500 304 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:30.999129+0000 "DELETE /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 204 0 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:02:42.782544+0000 "GET /sql-backup-de/IM_DIFFERENTIAL_22.bak HTTP/1.1" 200 0 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:04:29.538+0000 7f7ba9fcb700  0 req 9033182355071581504 183.407425780s s3:complete_multipart WARNING: failed to remove object sql-backup-de:_multipart_IM_DIFFERENTIAL_22.bak.2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs.meta
>> 2022-08-18T22:04:29.542210+0000 "POST /sql-backup-de/IM_DIFFERENTIAL_22.bak?uploadId=2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs HTTP/1.1" 200 334 - "Boto3/1.24.23 Python/3.10.5 Linux/5.10.102-flatcar Botocore/1.27.23" -
>> 2022-08-18T22:04:29.542+0000 7f7ba9fcb700  0 req 9033182355071581504 183.411425768s s3:complete_multipart WARNING: failed to unlock CLUSTERUUID.BUCKET.INDENTIFIER__multipart_IM_DIFFERENTIAL_22.bak.2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs.meta
>>
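>> (Side note for anyone chasing the same warnings: the object named in
>> the "failed to remove" / "failed to unlock" lines is the multipart
>> upload's .meta object. A rough way to check whether it is still lying
>> around - just a sketch, assuming default pool names; depending on the
>> placement it may live in the data pool or in the non-ec "extra" pool:)
>>
>>   rados -p default.rgw.buckets.non-ec stat 'CLUSTERUUID.BUCKET.INDENTIFIER__multipart_IM_DIFFERENTIAL_22.bak.2~ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs.meta'
>>   # look for other leftover pieces of that upload (slow on big pools)
>>   rados -p default.rgw.buckets.data ls | grep ehGVVRPV3LnWW31bRmBEcOHSKB_zJAs
>>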
>> --
>> The self-help group "UTF-8 problems" will meet in the large hall this
>> time, as an exception.
>>
>
>
> --
> The self-help group "UTF-8 problems" will meet in the large hall this time, as an exception.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



