Re: Fixing a rgw bucket index

Yehuda Sadeh <yehuda@xxxxxxxxxxx> · Mon, 8 Apr 2013 11:26:00 -0700

This one fails because copying an object onto itself only works if you
replace its attrs (X_AMZ_METADATA_DIRECTIVE=REPLACE). Your request sent
x-amz-metadata-directive:COPY.
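
For illustration only, a minimal Python sketch of the S3 v2 string-to-sign
for the same self-copy with the directive switched to REPLACE. The helper
names (string_to_sign, sign) and the secret key are made up for the example;
the header values are taken from the verbose s3cmd output quoted below.

```python
import base64
import hashlib
import hmac


def string_to_sign(method, resource, amz_headers,
                   content_md5="", content_type="", date=""):
    """Build an S3 v2 string-to-sign: verb, MD5, type, date, x-amz-* headers, resource."""
    canon = "".join(
        f"{name}:{value.strip()}\n"
        for name, value in sorted((k.lower(), v) for k, v in amz_headers.items())
    )
    return f"{method}\n{content_md5}\n{content_type}\n{date}\n{canon}{resource}"


def sign(secret_key, text):
    """HMAC-SHA1 over the string-to-sign, base64 encoded."""
    digest = hmac.new(secret_key.encode(), text.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


obj = "/imgiz/data/avatars/492/492923.jpg"
headers = {
    "x-amz-copy-source": obj,                         # copy the object onto itself
    "x-amz-date": "Mon, 08 Apr 2013 16:59:30 +0000",
    "x-amz-metadata-directive": "REPLACE",            # COPY is what got rejected
}
sts = string_to_sign("PUT", obj, headers)
print(repr(sts))
print(sign("not-a-real-secret", sts))                 # placeholder key, assumption
```

Any S3 client that lets you set the x-amz-metadata-directive header on a
copy request should produce an equivalent request.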

On Mon, Apr 8, 2013 at 10:35 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> This is the log grepped with the relevant threadid. It shows 400 in the last
> lines but nothing seems odd besides that.
> http://pastebin.com/xWCYmnXV
>
> Thanks for your interest.
>
>
> On Mon, Apr 8, 2013 at 8:21 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
>>
>> Each bucket has a unique prefix which you can get by doing radosgw-admin
>> bucket stats on that bucket. You can grep that prefix in 'rados ls -p
>> .rgw.buckets'.
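
As a rough illustration of that grep step, a few lines of Python can turn
saved 'rados ls -p .rgw.buckets' output into a list of object keys. This
assumes plain objects are stored as <prefix>_<key>, as in the listings
elsewhere in this thread. keys_for_bucket and the sample lines are made up
for the example, and multipart or other internally named objects may not
follow this pattern.

```python
def keys_for_bucket(rados_ls_lines, prefix):
    """Return the S3 keys of rados objects named '<prefix>_<key>'."""
    marker = prefix + "_"
    keys = []
    for line in rados_ls_lines:
        name = line.rstrip("\n")
        if name.startswith(marker):
            keys.append(name[len(marker):])
    return keys


# Sample input shaped like the 'rados ls' output in this thread.
sample = [
    ".dir.4470.1\n",                               # index object, not a data object
    "4470.1_data/avatars/106/106976-orig\n",
    "4470.1_data/avatars/492/492923.jpg\n",
    "6912.3_some/other/bucket/object\n",           # hypothetical object of another bucket
]
for key in keys_for_bucket(sample, "4470.1"):
    print(key)
```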
>>
>> Do you have any rgw log showing why you get the Invalid Request response?
>> Can you also add 'debug ms = 1' for the log?
>>
>> Thanks
>>
>>
>> On Mon, Apr 8, 2013 at 10:12 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx>
>> wrote:
>>>
>>> Just tried that file:
>>>
>>> $ s3cmd mv s3://imgiz/data/avatars/492/492923.jpg
>>> s3://imgiz/data/avatars/492/492923.jpg
>>> ERROR: S3 error: 400 (InvalidRequest)
>>>
>>> More verbose output shows that the sign-headers string was
>>>
>>> 'PUT\n\n\n\nx-amz-copy-source:/imgiz/data/avatars/492/492923.jpg\nx-amz-date:Mon,
>>> 08 Apr 2013 16:59:30
>>> +0000\nx-amz-metadata-directive:COPY\n/imgiz/data/avatars/492/492923.jpg'
>>>
>>> But i guess it doesn't work even if the index is correct. I get the same
>>> response on a clear bucket too.
>>>
>>> We might try that but we don't have a file list. I guess it's possible
>>> with 'rados ls | grep | sed' ?
>>>
>>>
>>> On Mon, Apr 8, 2013 at 7:53 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
>>>>
>>>> Can you try copying one of these objects to itself? Would that work
>>>> and/or change the index entry? Another option would be to try copying all
>>>> the objects to a different bucket.
>>>>
>>>>
>>>> On Mon, Apr 8, 2013 at 9:48 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> The omap header and all other omap attributes were destroyed. I copied
>>>>> another index over the destroyed one to get a somewhat valid header and it
>>>>> seems intact. After a 'check --fix':
>>>>>
>>>>>  # rados -p .rgw.buckets getomapheader .dir.4470.1
>>>>> header (49 bytes) :
>>>>> 0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.............
>>>>> 0010 : 00 7d 7a 3f 6e 01 00 00 00 00 d0 00 7e 01 00 00 : .}z?n.......~...
>>>>> 0020 : 00 bb f5 01 00 00 00 00 00 00 00 00 00 00 00 00 : ................
>>>>> 0030 : 00                                              : .
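
For reference, the 49 bytes can be picked apart by hand. The sketch below is
a guess at the layout based only on the two dumps in this thread, not on the
rgw source: a version/compat/length preamble, a one-entry map keyed by a
category byte, and three little-endian 64-bit counters. The field names and
decode_dir_header itself are labels invented for the example.

```python
import struct


def decode_dir_header(raw):
    """Decode a bucket index omap header, assuming the layout seen in this thread."""
    version, compat, length = struct.unpack_from("<BBI", raw, 0)
    offset = 6
    (count,) = struct.unpack_from("<I", raw, offset)
    offset += 4
    stats = {}
    for _ in range(count):
        # One category byte, then a nested version/compat/length preamble.
        category, _v, _c, stats_len = struct.unpack_from("<BBBI", raw, offset)
        offset += 7
        size, size_rounded, entries = struct.unpack_from("<QQQ", raw, offset)
        offset += stats_len
        stats[category] = {
            "total_size": size,
            "total_size_rounded": size_rounded,
            "num_entries": entries,
        }
    return {"version": version, "compat": compat, "length": length,
            "stats": stats, "trailing": raw[offset:]}


# Bytes copied from the getomapheader dump of .dir.4470.1 above.
raw = bytes.fromhex(
    "03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00"
    "00 7d 7a 3f 6e 01 00 00 00 00 d0 00 7e 01 00 00"
    "00 bb f5 01 00 00 00 00 00 00 00 00 00 00 00 00"
    "00"
)
header = decode_dir_header(raw)
main = header["stats"][1]
print(main["num_entries"])                  # object count stored in the header
print((main["total_size"] + 1023) // 1024)  # size rounded up to KB
```

Under these assumptions the counters line up with the size_kb,
size_kb_actual and num_objects values that 'bucket check -b imgiz' reports
for this bucket.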
>>>>>
>>>>>
>>>>> Rados shows objects are there:
>>>>>
>>>>> # rados ls -p .rgw.buckets |grep 4470.1_data/avatars
>>>>> 4470.1_data/avatars/11047/11047823_20101211154308.jpg
>>>>> 4470.1_data/avatars/106/106976-orig
>>>>> 4470.1_data/avatars/492/492923.jpg
>>>>> 4470.1_data/avatars/275/275479.jpg
>>>>> ...
>>>>>
>>>>>
>>>>> And i am able to GET them
>>>>>
>>>>> $ s3cmd get s3://imgiz/data/avatars/492/492923.jpg
>>>>> s3://imgiz/data/avatars/492/492923.jpg -> ./492923.jpg  [1 of 1]
>>>>>  3587 of 3587   100% in    0s    93.40 kB/s  done
>>>>>
>>>>>
>>>>> But unable to list them
>>>>>
>>>>> $ s3cmd ls s3://imgiz/data/avatars
>>>>> <NOTHING>
>>>>>
>>>>>
>>>>> My initial expectation was that 'bucket check --fix --check-objects'
>>>>> would actually walk the stored objects, as 'rados ls' does, and recreate
>>>>> the missing omap keys, but it doesn't seem to do that. Now a simple check says
>>>>>
>>>>> # radosgw-admin bucket check -b imgiz
>>>>> { "existing_header": { "usage": { "rgw.main": { "size_kb": 6000607,
>>>>>               "size_kb_actual": 6258740,
>>>>>               "num_objects": 128443}}},
>>>>>   "calculated_header": { "usage": { "rgw.main": { "size_kb": 6000607,
>>>>>               "size_kb_actual": 6258740,
>>>>>               "num_objects": 128443}}}}
>>>>>
>>>>> But i know we have more than 128k objects.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 8, 2013 at 7:17 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> We'll need to have more info about the current state. Was just the
>>>>>> omap header destroyed, or does it still exist? What does the header
>>>>>> contain now? Are you able to actually access objects in that bucket,
>>>>>> but just fail to list them?
>>>>>>
>>>>>> On Mon, Apr 8, 2013 at 8:34 AM, Erdem Agaoglu
>>>>>> <erdem.agaoglu@xxxxxxxxx> wrote:
>>>>>> > Hi again,
>>>>>> >
>>>>>> > I managed to replace the index object with another bucket's index.
>>>>>> > --check-objects --fix ran, but my hopes were dashed: it didn't actually
>>>>>> > read through the files or fix anything. Any suggestions?
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Apr 4, 2013 at 5:56 PM, Erdem Agaoglu
>>>>>> > <erdem.agaoglu@xxxxxxxxx>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hi all,
>>>>>> >>
>>>>>> >> After a major failure, and getting our cluster health back OK (with some
>>>>>> >> help from inktank folks, thanks), we found out that we have managed to
>>>>>> >> corrupt one of our bucket indices. As far as i can track it, we are missing
>>>>>> >> the omapheader on that specific index, so we're unable to use radosgw-admin
>>>>>> >> tools to fix it.
>>>>>> >>
>>>>>> >> While a healthy (smaller) bucket answers
>>>>>> >> # radosgw-admin bucket check -b imgdoviz
>>>>>> >> { "existing_header": { "usage": { "rgw.main": { "size_kb": 4140,
>>>>>> >>               "size_kb_actual": 4484,
>>>>>> >>               "num_objects": 157}}},
>>>>>> >>   "calculated_header": { "usage": { "rgw.main": { "size_kb": 4140,
>>>>>> >>               "size_kb_actual": 4484,
>>>>>> >>               "num_objects": 157}}}}
>>>>>> >>
>>>>>> >> The faulty one fails with
>>>>>> >> # radosgw-admin bucket check -b imgiz
>>>>>> >> failed to list objects in bucket=imgiz(@.rgw.buckets[4470.1]) err=(22) Invalid argument
>>>>>> >> failed to check index err=(22) Invalid argument
>>>>>> >>
>>>>>> >> When i push further
>>>>>> >> # radosgw-admin bucket check -b imgiz --check-objects --fix
>>>>>> >> failed to list objects in bucket=imgiz(@.rgw.buckets[4470.1]) err=(22) Invalid argument
>>>>>> >> Checking objects, decreasing bucket 2-phase commit timeout.
>>>>>> >> ** Note that timeout will reset only when operation completes successfully **
>>>>>> >> ERROR: failed operation r=-22
>>>>>> >> ERROR: failed operation r=-22
>>>>>> >> ..
>>>>>> >> last line keeps repeating without any progress.
>>>>>> >>
>>>>>> >> I checked the omap headers of the index objects, and while the healthy bucket has:
>>>>>> >> # rados -p .rgw.buckets getomapheader .dir.6912.3
>>>>>> >> header (49 bytes) :
>>>>>> >> 0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.............
>>>>>> >> 0010 : 00 a8 af 40 00 00 00 00 00 00 10 46 00 00 00 00 : ...@.......F....
>>>>>> >> 0020 : 00 9d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
>>>>>> >> 0030 : 00                                              : .
>>>>>> >>
>>>>>> >> the faulty one is missing it
>>>>>> >> # rados -p .rgw.buckets getomapheader .dir.4470.1
>>>>>> >> header (0 bytes) :
>>>>>> >>
>>>>>> >>
>>>>>> >> I'm currently in the process of understanding how to create a readable
>>>>>> >> header. My hope is that even if it's wrong, radosgw-admin will be able to
>>>>>> >> read through the objects and fix the necessary parts. But i'm not sure how
>>>>>> >> to set the new header. It seems there is a setomapheader counterpart for
>>>>>> >> getomapheader, but it only accepts values from the command line, so i
>>>>>> >> don't know how to push an rgw-readable binary with it.
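
A minimal sketch of building such a header in Python, assuming the 49-byte
layout inferred from the healthy dump above (preamble 03 02 + length, a
one-entry category map, three little-endian 64-bit counters, then eight
trailing bytes treated here as an opaque 64-bit field). encode_dir_header
and its parameter names are invented for the example, the layout is a guess
from the dumps rather than from the rgw source, and this has not been tested
against rgw. Getting the bytes into the index object is outside the sketch.

```python
import struct


def encode_dir_header(total_size, total_size_rounded, num_entries,
                      category=1, trailing=0):
    """Pack a bucket index omap header in the layout seen in this thread's dumps."""
    counters = struct.pack("<QQQ", total_size, total_size_rounded, num_entries)
    # Nested block: version 2, compat 2, length of the three counters.
    stats = struct.pack("<BBI", 2, 2, len(counters)) + counters
    body = (struct.pack("<I", 1)             # number of categories in the map
            + struct.pack("<B", category)    # category key (1 in both dumps)
            + stats
            + struct.pack("<Q", trailing))   # trailing field, zero in both dumps
    # Outer block: version 3, compat 2, length of everything that follows.
    return struct.pack("<BBI", 3, 2, len(body)) + body


# Counters of the healthy bucket (.dir.6912.3): sizes in bytes, 157 entries.
blob = encode_dir_header(4239272, 4591616, 157)
print(len(blob))
print(blob.hex(" "))
```

With the healthy bucket's counters this reproduces the getomapheader dump of
.dir.6912.3 byte for byte, which is the only evidence for the layout.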
>>>>>> >>
>>>>>> >> Is this somewhat possible?
>>>>>> >>
>>>>>> >> Thanks in advance.
>>>>>> >>
>>>>>> >> --
>>>>>> >> erdem agaoglu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > erdem agaoglu
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > ceph-users mailing list
>>>>>> > ceph-users@xxxxxxxxxxxxxx
>>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> erdem agaoglu
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> erdem agaoglu
>>
>>
>
>
>
> --
> erdem agaoglu