On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>
>> Abhishek L writes:
>>
>>> Yehuda Sadeh-Weinraub writes:
>>>
>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>>>>
>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>> <yehuda@xxxxxxxxxx> wrote:
>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>> <abhishek.lekshmanan@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>
>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
>>>>>>>>>> Trying to delete objects & buckets from a secondary zone in an RGW
>>>>>>>>>> multisite configuration leads to some weirdness:
>>>>>>>>>>
>>>>>>>>>> 1. Deleting an object and then its bucket immediately afterwards will
>>>>>>>>>> mostly get both deleted in the secondary zone, but since we forward the
>>>>>>>>>> bucket deletion to the master only after we delete it in the secondary,
>>>>>>>>>> the master fails with 409 (BucketNotEmpty), which gets re-raised as a
>>>>>>>>>> 500 to the client. This _seems_ simple enough to fix if we forward the
>>>>>>>>>> bucket deletion request to the master zone before attempting the
>>>>>>>>>> deletion locally.
>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>
>>>>>>>>>> 2. Deletion of the objects themselves seems to be a bit racy: deleting
>>>>>>>>>> an object on a secondary zone succeeds, and listing the bucket shows an
>>>>>>>>>> empty list, but it sometimes gets populated with the object again (this
>>>>>>>>>> time with a newer timestamp). This is not always reproducible, but I've
>>>>>>>>>> seen it often with multipart uploads, e.g.:
>>>>>>>>>>
>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>> Key                                                Last Modified        Size
>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>> test.img                                           2016-04-19T13:00:17Z   40M
>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>> Key                                                Last Modified        Size
>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>> test.img                                           2016-04-19T13:00:45Z   40M
>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>> test.img                                           2016-04-19T13:01:52Z   40M
>>>>>>>>>>
>>>>>>>>>> Mostly seeing log entries of the following form in both cases, i.e.
>>>>>>>>>> both where the object delete succeeds in the master and the secondary
>>>>>>>>>> zone, and where it succeeds in the master but fails in the secondary:
>>>>>>>>>>
>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>
>>>>>>>>>> Any ideas on this?
>>>>>>>>>>
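[The reordering described in point 1 is easy to sketch. Below is a minimal
C++ sketch of the before/after control flow; forward_delete_to_master and
delete_bucket_locally are hypothetical stand-ins, not the actual RGW code
paths, and the real fix lives in PR 8655.]

// Sketch only, assuming hypothetical helper names.
#include <iostream>
#include <string>

bool forward_delete_to_master(const std::string& bucket) {
  std::cout << "forwarding delete of " << bucket << " to master\n";
  return false;  // emulate the master rejecting with 409 BucketNotEmpty
}

void delete_bucket_locally(const std::string& bucket) {
  std::cout << "deleting " << bucket << " in the secondary zone\n";
}

int delete_bucket(const std::string& bucket) {
  // Old order deleted locally first, so by the time the master rejected
  // the request with 409, the secondary's copy was already gone and the
  // error surfaced to the client as a 500. New order: ask the master
  // first, and only delete locally once it agrees.
  if (!forward_delete_to_master(bucket))
    return 409;  // BucketNotEmpty, reported as-is to the client
  delete_bucket_locally(bucket);
  return 0;
}

int main() {
  std::cout << "delete_bucket returned " << delete_bucket("test-mp") << '\n';
}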
>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>> came right after the object creation?
>>>>>>>>
>>>>>>>> Only 2 zones, i.e. one master and one secondary, with the requests made
>>>>>>>> on the secondary. The delete came right after the create, though.
>>>>>>>
>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>> but end up with a different mtime than the object's source. The second
>>>>>>> issue is that we shouldn't have synced that object at all.
>>>>>>>
>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>
>>>>>>
>>>>>> For the first issue, see:
>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>
>>>>>> However, a create followed by a delete will still be a problem: when we
>>>>>> sync an object we check whether the source mtime is newer than the
>>>>>> destination mtime, which is problematic with deletes, since there is no
>>>>>> mtime left once the object is removed. I think the solution would be to
>>>>>> use temporary tombstone objects (we already have the olh framework,
>>>>>> which can provide what we need) that we'd garbage collect later.
>>>>>
>>>>> Further information from the logs, if it helps:
>>>>>
>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>
>>>>> This is what the master zone logs show; however, the request timestamp
>>>>> logged here is the `If-Modified-Since` value from the secondary zone at
>>>>> the time the object write completed (not the time the deletion
>>>>> completed). Do we set the deletion time anywhere else in the BI log?
>>>>>
>>>>>
>>>>
>>>> Did you apply PR 8685?
>>>>
>>>> Also, take a look at this:
>>>>
>>>> https://github.com/ceph/ceph/pull/8709
>>>>
>>>> With the new code we do store the object's creation time in the delete
>>>> bucket index entry. That way we make sure we only sync the object
>>>> removal if the object is the same as, or older than, the one that was
>>>> actually removed.
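[Expressed in code, that guard might look like the following minimal
sketch. DeleteEntry and should_apply_removal are hypothetical names with
simplified types; the actual bucket-index entries and sync code in PR 8709
differ.]

// Sketch only, under the assumptions named above.
#include <chrono>
#include <iostream>
#include <optional>

using Clock = std::chrono::system_clock;

struct DeleteEntry {
  // mtime of the object version that the delete removed at the source,
  // carried alongside the delete marker in the bucket index log.
  Clock::time_point removed_obj_mtime;
};

// Apply the removal only if the local copy is the same age or older; a
// newer local mtime means a later write that the delete must not clobber.
// This mirrors the "skipping object removal ... (obj mtime=..., request
// timestamp=...)" decision visible in the log excerpt above.
bool should_apply_removal(const DeleteEntry& entry,
                          std::optional<Clock::time_point> local_mtime) {
  if (!local_mtime)
    return false;  // nothing to remove locally (delete_obj would get ENOENT)
  return *local_mtime <= entry.removed_obj_mtime;
}

int main() {
  auto t0 = Clock::now();
  DeleteEntry entry{t0};
  std::cout << std::boolalpha
            << should_apply_removal(entry, t0) << '\n'   // true: same version
            << should_apply_removal(entry, t0 + std::chrono::seconds(9))
            << '\n'                                      // false: local copy newer
            << should_apply_removal(entry, std::nullopt) << '\n';  // false: no local copy
}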
>>
>> Applied both PRs on top of master + 8655. Now the object doesn't resync
>> back to the secondary zone after deletion, which we observed before. A
>> create followed by an immediate delete succeeds in deleting from both
>> zones almost every time.
>>
>> However, allowing the object to sync to the primary by introducing a
>> delay, e.g. with a script on the secondary like:
>>
>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>
>> gives an empty list on the secondary zone; on the primary zone, however,
>> it looks like only a few objects actually got deleted, and the others
>> still exist:
>>
>> $ s3 -us list foobar
>> Content-Type: application/xml
>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>> Content-Length: 4713
>> Key                                                Last Modified        Size
>> -------------------------------------------------- -------------------- -----
>> foo1                                               2016-04-26T14:24:25Z   87M
>> foo10                                              2016-04-26T14:27:28Z   87M
>> foo11                                              2016-04-26T14:27:50Z   87M
>> foo12                                              2016-04-26T14:28:12Z   87M
>> foo14                                              2016-04-26T14:28:48Z   87M
>> foo15                                              2016-04-26T14:29:09Z   87M
>> foo16                                              2016-04-26T14:29:30Z   87M
>> foo17                                              2016-04-26T14:29:51Z   87M
>> foo18                                              2016-04-26T14:30:12Z   87M
>> foo19                                              2016-04-26T14:30:33Z   87M
>> foo2                                               2016-04-26T14:24:47Z   87M
>> foo20                                              2016-04-26T14:30:54Z   87M
>> foo3                                               2016-04-26T14:25:07Z   87M
>> foo6                                               2016-04-26T14:26:05Z   87M
>> foo7                                               2016-04-26T14:26:24Z   87M
>> foo8                                               2016-04-26T14:26:47Z   87M
>> foo9                                               2016-04-26T14:27:07Z   87M
>>
>> Logs show this in the case of failed deletes:
>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>
>> And this in the case of an object whose deletion succeeded:
>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>
>> A more interesting log:
>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>> ...
>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>> [this was object foo5, which was one of the objects that did get deleted]
>>
>> Let me know if any other log info would be helpful (only debug rgw was
>> set; debug ms was 0).
>
> Yes, this would be great. If you could provide a log with 'debug rgw =
> 20' and 'debug ms = 1', it'd be helpful. I'm trying to reproduce the
> issue, and was able to get some bad behavior, but different from what
> you describe.

Please take a look at PR 8772, and let me know whether things work
differently for you.

Thanks,
Yehuda

>
>>
>> Possibly unrelated: a pkill radosgw would sometimes throw a segfault;
>> I hadn't seen this before, though:
>> *** Caught signal (Segmentation fault) **
>>  in thread 7fdfbddfcb40 thread_name:radosgw
>>
>
> Do you have a backtrace? I found an issue that might explain this one,
> but need a backtrace to validate.
>
> Thanks,
> Yehuda
>
>> master sha1 before the patches was 25c2f5a793ff726facde236aa0c2dd78cf933c60
>>
>> Regards
>> Abhishek
>>>
>>> Hadn't applied the PR yet; I'll apply both and see if I can reproduce
>>> the issue again.
>>>>
>>>> Yehuda
>>>
>>> Thanks
>>> --
>>> Abhishek
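[As a postscript for anyone following the foo5 excerpt above: here is a toy
model of one plausible sequence behind the ret=-2 (ENOENT) lines. It assumes
the delete entry is processed before the matching 87M create has finished
syncing; this is a guess at the race, not a confirmed diagnosis, and the
second foo5 line suggests a later attempt can still win, which would explain
why some objects did get deleted.]

// Toy model, not RGW code: 'objects' stands in for the peer zone's
// bucket, and apply() for the data-sync handling of one log entry.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

std::map<std::string, std::string> objects;

void apply(const std::string& op, const std::string& name) {
  if (op == "create") {
    objects[name] = "data";
  } else if (objects.erase(name) == 0) {
    // The conditional delete found no object -- compare the
    // "Last-Modified: 0.000000 ... returned ret=-2" lines above. In this
    // model the removal is dropped rather than retried once the create lands.
    std::cout << "delete " << name << " -> ret=-2 (dropped)\n";
  }
}

int main() {
  // Source-zone order was create-then-delete, but if the large object
  // takes longer to sync than the delete entry, the peer effectively sees:
  std::vector<std::pair<std::string, std::string>> entries = {
      {"delete", "foo5"}, {"create", "foo5"}};
  for (const auto& [op, name] : entries)
    apply(op, name);
  std::cout << "foo5 still present on peer: " << std::boolalpha
            << (objects.count("foo5") > 0) << '\n';
}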