Re: Data loss on appends, prod outage


 



Thanks for sharing this. Following this thread, I realized that we are affected by this bug as well. We have had multiple reports of corrupted TensorBoard event files, which I think are caused by it.

We are using Ubuntu 20.04. The affected kernel versions should be HWE kernels > 5.11 and < 5.11.0-34. The fix for the Ubuntu kernel is here: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/commit/fs/ceph/addr.c?h=hwe-5.11&id=353cafd20b8c28423aeec0c474dab80dbcec3c44

We are now working on upgrading every client to 5.11.0-34-generic.
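
In case it helps anyone auditing their clients, here is a minimal sketch of a check for the range mentioned above. It assumes the affected kernels are exactly the Ubuntu HWE 5.11.0-N builds with N below 34, and that the release string looks like "5.11.0-27-generic". The names FIXED_ABI, parse_release and is_affected are only illustrative. Please verify against the tracker issue before relying on it.

```python
import platform
import re

# Assumption: the first fixed Ubuntu HWE build is 5.11.0-34, so any
# 5.11.0-N kernel with N < 34 is treated as affected.
FIXED_ABI = 34


def parse_release(release):
    """Parse an Ubuntu-style release string such as '5.11.0-27-generic'.

    Returns (major, minor, patch, abi) as integers, or raises ValueError
    if the string does not follow that pattern.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def is_affected(release):
    """Return True if the release falls in the assumed affected range."""
    major, minor, _patch, abi = parse_release(release)
    return (major, minor) == (5, 11) and abi < FIXED_ABI


if __name__ == "__main__":
    running = platform.release()
    try:
        verdict = "affected" if is_affected(running) else "outside the assumed range"
    except ValueError:
        verdict = "unknown release format, check manually"
    print(running, "->", verdict)
```

Running it on each CephFS client prints the kernel release and whether it falls in the assumed range. Kernels that are not in the 5.11 series are reported as outside the range, which matches the report in this thread that 5.4 does not show the problem.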

Weiwen Hu

From: Nathan Fish<mailto:lordcirth@xxxxxxxxx>
Sent: September 9, 2021 2:41
To: ceph-users<mailto:ceph-users@xxxxxxx>
Subject: Re: Data loss on appends, prod outage

The bug appears to have already been reported:
https://tracker.ceph.com/issues/51948

Also, it should be noted that the write append bug does sometimes
occur when writing from a single client, so controlling write patterns
is not sufficient to stop data loss.

On Wed, Sep 8, 2021 at 1:39 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Can you make the devs aware of the regression?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Nathan Fish <lordcirth@xxxxxxxxx>
> Sent: 08 September 2021 19:33
> To: ceph-users
> Subject:  Re: Data loss on appends, prod outage
>
> Rolling back to kernel 5.4 has resolved the issue.
>
> On Tue, Sep 7, 2021 at 3:51 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Nathan,
> >
> > > Is this the bug you are referring to? https://tracker.ceph.com/issues/37713
> >
> > Yes, it's one of them. I believe there were more such reports.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




