Sorry, I misunderstood the comment. Are there any known workarounds? This seems like a serious index corruption, and it prevents us from using Ceph with Singlestore, so any suggestions would be appreciated!
Best,
-Joseph Victor
On Wed, Jun 9, 2021 at 3:38 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
I'm not sure configuring the grace period would help, as there's a bug there.

On Wed, Jun 9, 2021 at 2:57 PM Joseph Victor <joseph@xxxxxxxxxxxxxxx> wrote:
Thanks for the response! This issue seems like precisely the issue we saw...
Can the grace period be configured? Our logging suggests the PUT and list happen within the same millisecond.
Best,
-Joseph Victor

On Wed, Jun 9, 2021 at 2:50 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
On Tue, Jun 8, 2021 at 7:13 PM Joseph Victor <joseph@xxxxxxxxxxxxxxx> wrote:
Hey all, we were doing some testing of ceph against our product and we found some behavior we want to run by you.

We are using the S3 ceph interface.

Attached is a python file using boto3 which, when run against two different deployments of ceph (octopus ceph nano and our production nautilus 14.2.11 deployment), appears to repro a strange issue.

After running for a while, a recently uploaded file forever disappears from list_objects requests. This file still appears to be visible to get_object if you know the specific name, but does not show up in list_objects.

There are more details about the experiment in the attached python file.

We produced a run of this experiment with debug logging, in which we see a trace message

RGWRados::cls_bucket_list_ordered: skipping <filename>

In the same millisecond that the file was PUT.

Reading the code, this comes from when a call to check_disk_state returns ENOENT, where we see

if (!list_state.is_delete_marker() && !astate->exists) {
  /* object doesn't exist right now -- hopefully because it's
   * marked as !exists and got deleted */
  if (list_state.exists) {
    /* FIXME: what should happen now? Work out if there are any
     * non-bad ways this could happen (there probably are, but annoying
     * to handle!) */
  }
  // encode a suggested removal of that key
  list_state.ver.epoch = io_ctx.get_last_version();
  list_state.ver.pool = io_ctx.get_id();
  cls_rgw_encode_suggestion(CEPH_RGW_REMOVE, list_state, suggested_updates);
  return -ENOENT;
}

It seems like this might be some kind of race between PUT and list_object in which some kind of object metadata is apparently deleted... the FIXME is at least a little suspicious :).

I would love to know what's going on here, and if there is a fix or workaround we can do to prevent this behavior. Let me know if there is any other information we can provide.

This FIXME probably exists there since the dawn of time. The code here identifies that a listed object doesn't exist and sends a suggestion to the index objclass to remove it. However, there should be a long grace period so that recently created object shouldn't be removed by the index (should be handled at src/cls/rgw/cls_rgw.cc iirc). It does sound like a bug that we had seen before, see here:
... which I now see is still open. I'm not sure that the fix there doesn't cause other issues.

Yehuda

Thank you so much!
Best,
-Joseph Victor
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
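
For anyone trying to detect the symptom described in this thread (an object that get_object can still fetch but that no longer appears in list_objects), here is a minimal Python sketch. It is not the script attached to the original message, which is not reproduced here. The helper names list_all_keys and find_unlisted_keys are hypothetical, and the client argument is assumed to be a boto3 S3 client, or any object exposing a compatible list_objects method, pointed at the RGW endpoint. The caller is assumed to keep its own record of the keys it has uploaded.

```python
def list_all_keys(client, bucket):
    """Collect every key returned by list_objects, following pagination."""
    keys = set()
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_objects(**kwargs)
        contents = page.get("Contents", [])
        keys.update(obj["Key"] for obj in contents)
        if not page.get("IsTruncated") or not contents:
            return keys
        # Without a delimiter, the last key of the page serves as the next marker.
        kwargs["Marker"] = contents[-1]["Key"]


def find_unlisted_keys(client, bucket, uploaded_keys):
    """Return the uploaded keys that the bucket listing does not show."""
    return sorted(set(uploaded_keys) - list_all_keys(client, bucket))
```

Any key this reports could then be probed with get_object or head_object to confirm that the object itself still exists and only its index entry is missing. This only detects the condition; it does not work around it.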