Re: Files listed in radosgw BI but not available in ceph

Thanks for the further clarification, Dan.

Boris, if you have a test/QA environment running the same code as
production, you can confirm whether the problem is the one described above.
*Do NOT do this in production*: if the problem exists, it might result in
losing production data.

1. Upload a large S3 object that takes 10+ seconds to download (several GB)
2. Download the object to ensure it is working
3. Set "rgw_gc_obj_min_wait" to a very low value (2-3 seconds)
4. Download the object again

Step (4) may succeed, but run this:
`radosgw-admin gc list`

Then check the output for shadow objects associated with the S3 object.
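For a multi-GB object the gc list output can be long, so a small filter helps. Below is a minimal Python sketch, not an official tool. It assumes that `radosgw-admin gc list` prints a JSON array whose entries carry an "objs" list of {"pool", "oid", ...} records, and that tail objects have "__shadow_" in their oid. The `marker` argument (for example the bucket's marker) and both function names are hypothetical, used here only for illustration.

```python
import json
import subprocess
from typing import List


def fetch_gc_list() -> str:
    """Run `radosgw-admin gc list` and return its raw JSON output.

    Only meant for a test/QA node with admin credentials; not called
    automatically anywhere in this sketch.
    """
    result = subprocess.run(
        ["radosgw-admin", "gc", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def shadow_oids_in_gc(gc_list_json: str, marker: str) -> List[str]:
    """Return GC-queued oids that look like tail ("__shadow_") objects
    and contain `marker` (assumed: bucket marker or manifest prefix)."""
    hits = []
    for entry in json.loads(gc_list_json):
        # Assumed entry shape: {"tag": ..., "time": ..., "objs": [{"pool": ..., "oid": ...}]}
        for obj in entry.get("objs", []):
            oid = obj.get("oid", "")
            if "__shadow_" in oid and marker in oid:
                hits.append(oid)
    return hits
```

If `shadow_oids_in_gc(fetch_gc_list(), marker)` returns a non-empty list after step (4), that would suggest the tail of an object that was only read, never deleted, has been queued for GC.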

Once garbage collection completes, you will get a 404 NoSuchKey when you
try to download the S3 object, although it will still be listed as an
object in the bucket.

I also recommend setting "rgw_gc_obj_min_wait" back to a high value after
you finish testing.

On Thu, 22 Jul 2021 at 19:45, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> Boris,
>
> To check if your issue is related to Rafael's, could you check your
> access logs for requests on the missing objects which lasted longer
> than one hour?
>
> I ask because Nautilus also has rgw_gc_obj_min_wait (2hr by default),
> which is the main config option related to
> https://tracker.ceph.com/issues/47866
>
>
> -- Dan
>
> On Thu, Jul 22, 2021 at 11:12 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Rafael,
> >
> > AFAIU, that gc issue was not relevant for N -- the bug is in the new
> > rgw_gc code which landed in Octopus and was not backported to N.
> >
> > Well, RHCEPH had the new rgw_gc cls backported to it, and RHCEPH has
> > the bugfix you refer to:
> > * Wed Dec 02 2020 Ceph Jenkins <ceph-jenkins@xxxxxxxxxx> 2:14.2.11-86
> > - rgw: during GC defer, prevent new GC enqueue (rhbz#1892644)
> > https://bugzilla.redhat.com/show_bug.cgi?id=1892644
> >
> > But still, I think it shouldn't apply to the upstream community
> > Nautilus that we run.
> >
> > That said, this indeed looks really similar so perhaps Nautilus has
> > similar faulty gc logic.
> >
> > Cheers, Dan
> >
> > On Thu, Jul 22, 2021 at 6:47 AM Rafael Lopez <rafael.lopez@xxxxxxxxxx> wrote:
> > >
> > > Hi Boris,
> > >
> > > We hit an issue late last year that sounds similar to what you are
> > > experiencing. I am not sure whether the fix was backported to Nautilus;
> > > I can't see any reference to a Nautilus backport, so it's possible it
> > > was only backported to Octopus (15.x), the exception being Red Hat Ceph
> > > Nautilus.
> > >
> > > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
> > > https://www.mail-archive.com/ceph-users@xxxxxxx/msg05312.html
> > >
> > > Basically, a read request on an S3/Swift object that took a very long
> > > time to complete would cause the associated RADOS data objects to be
> > > put in the GC queue, but the head object would still be present. So
> > > the S3 object would still show as present, and `radosgw-admin bi list`
> > > would show it (since the head object was present), but the data
> > > objects would be gone, resulting in a 404 NoSuchKey when retrieving
> > > the object.
> > >
> > > raf
> > >
> > > On Wed, 21 Jul 2021 at 18:12, Boris Behrens <bb@xxxxxxxxx> wrote:
> > >>
> > >> Good morning everybody,
> > >>
> > >> we've dug further into it but still don't know how this could happen.
> > >> What we have ruled out so far:
> > >> * Orphan objects cleanup process.
> > >> ** There is only one bucket with missing data (I checked all other
> > >> buckets yesterday)
> > >> ** The "keep these files" list is generated by `radosgw-admin bucket
> > >> radoslist`. I doubt that any files accessible via radosgw were left
> > >> off that list.
> > >> ** The deleted files are somewhat random, but always go together with
> > >> their corresponding counterparts (per folder there are 2-3 files that
> > >> belong together)
> > >>
> > >> * The customer removed their data, but radosgw didn't clean up the
> > >> bucket index
> > >> ** There are no delete requests in the bucket's usage log.
> > >> ** The customer told us that they do not have a delete job for this
> > >> bucket
> > >>
> > >> So I have run out of ideas for what to check, and hope that you
> > >> might be able to help with further ideas.
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> The "UTF-8 problems" self-help group is meeting in the large hall
> > >> this time, as an exception.
> > >> _______________________________________________
> > >> ceph-users mailing list -- ceph-users@xxxxxxx
> > >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > >
> > >
> > > --
> > > Rafael Lopez
> > > Devops Systems Engineer
> > > Monash University eResearch Centre
> > >
> > > E: rafael.lopez@xxxxxxxxxx
> > >
>
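Dan's suggestion quoted above (looking in the access logs for requests on the missing objects that lasted longer than one hour) could be scripted roughly as follows. This is only a sketch: RGW and load-balancer access-log formats vary, so it assumes the log has already been parsed into hypothetical `(object_key, duration_seconds)` pairs. The function name is made up for illustration, and the one-hour threshold comes from Dan's message.

```python
from typing import Dict, Iterable, Set, Tuple

# Threshold from Dan's message: requests lasting longer than one hour.
ONE_HOUR_SECONDS = 3600.0


def long_reads_on_missing(
    requests: Iterable[Tuple[str, float]],
    missing_keys: Set[str],
    threshold_seconds: float = ONE_HOUR_SECONDS,
) -> Dict[str, float]:
    """Map each missing object key to its longest request duration,
    keeping only keys with at least one request over the threshold.

    `requests` is a hypothetical pre-parsed form of the access log:
    (object_key, duration_seconds) pairs.
    """
    longest: Dict[str, float] = {}
    for key, duration in requests:
        if key in missing_keys and duration > threshold_seconds:
            longest[key] = max(duration, longest.get(key, 0.0))
    return longest
```

A non-empty result would be consistent with the long-read GC issue Rafael described. An empty one would argue against it, at least for the requests the logs still cover.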


-- 
*Rafael Lopez*
Devops Systems Engineer
Monash University eResearch Centre

T: +61 3 9905 9118
E: rafael.lopez@xxxxxxxxxx



