During conversion (or fsck), I stopped all other OSDs: I do not have
enough main memory to run that kind of process as well as the OSDs.

osd.0 is a 6TB rusty device; fsck eats 35GB of memory.
I have other rusty devices on that host: 3TB and 10TB.

Best regards,

On 4/3/20 12:50 PM, Igor Fedotov wrote:
> Thanks, Jack.
>
> One more question please - what's the actual maximum memory consumption
> for this specific OSD during fsck?
>
> And is it backed by a 3, 6 or 10 TB drive?
>
>
> Regards,
>
> Igor
>
> On 4/2/2020 7:15 PM, Jack wrote:
>> I do compress:
>> root@backup2:~# ceph daemon osd.0 config show | grep bluestore_compression
>>     "bluestore_compression_algorithm": "snappy",
>>     "bluestore_compression_max_blob_size": "0",
>>     "bluestore_compression_max_blob_size_hdd": "524288",
>>     "bluestore_compression_max_blob_size_ssd": "65536",
>>     "bluestore_compression_min_blob_size": "0",
>>     "bluestore_compression_min_blob_size_hdd": "8192",
>>     "bluestore_compression_min_blob_size_ssd": "8192",
>>     "bluestore_compression_mode": "force",
>>     "bluestore_compression_required_ratio": "0.955000",
>>
>> I will deal with the memory consumption.
>> After all, it just requires more time (starting the OSDs one by one),
>> and it still fits in my main memory.
>>
>> Thank you for checking out the issue.
>>
>>
>> On 4/2/20 5:28 PM, Igor Fedotov wrote:
>>> So this OSD has 32M of shared blobs and fsck loads them all into
>>> memory while processing. Hence the RAM consumption.
>>>
>>>
>>> I'm afraid there is no simple way to fix that, will create a ticket
>>> though.
>>>
>>>
>>> And a side question:
>>>
>>> 1) Do you use erasure coding and/or compression for the rbd pool?
>>>
>>> These stats look suspicious:
>>>
>>> POOL  ID  STORED   (DATA)   (OMAP)   OBJECTS  USED     (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY   USED COMPR  UNDER COMPR
>>> rbd   1   245 TiB  245 TiB  9.0 MiB  50.26M   151 TiB  151 TiB  9.0 MiB  90.03  12 TiB     N/A            N/A          50.26M  35 TiB      144 TiB
>>>
>>> Stored - 245 TiB, Used - 151 TiB
>>>
>>> Can't imagine any explanation other than applied compression.
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>>
>>>
>>> On 4/2/2020 5:59 PM, Jack wrote:
>>>> Here it is
>>>>
>>>> On 4/2/20 3:48 PM, Igor Fedotov wrote:
>>>>> And may I have the output for:
>>>>>
>>>>> ceph daemon osd.N calc_objectstore_db_histogram
>>>>>
>>>>> This will collect some stats on record types in the OSD's DB.
>>>>>
>>>>>
>>>>> On 4/2/2020 4:13 PM, Jack wrote:
>>>>>> (fsck / quick-fix, same story)
>>>>>>
>>>>>> On 4/2/20 3:12 PM, Jack wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> A simple fsck eats the same amount of memory.
>>>>>>>
>>>>>>> Cluster usage: rbd with a bit of rgw.
>>>>>>>
>>>>>>> Here is the ceph df detail.
>>>>>>> All OSDs are single rusty devices.
>>>>>>>
>>>>>>> On 4/2/20 2:19 PM, Igor Fedotov wrote:
>>>>>>>> Hi Jack,
>>>>>>>>
>>>>>>>> could you please try the following - stop one of the already
>>>>>>>> converted OSDs and do a quick-fix/fsck/repair against it using
>>>>>>>> ceph-bluestore-tool:
>>>>>>>>
>>>>>>>> ceph-bluestore-tool --path <path to osd> --command quick-fix|fsck|repair
>>>>>>>>
>>>>>>>> Does it cause similar memory usage?
>>>>>>>>
>>>>>>>> You can stop experimenting if quick-fix reproduces the issue.
>>>>>>>>
>>>>>>>>
>>>>>>>> Also could you please describe your cluster and its usage a bit:
>>>>>>>> rgw/rbd/cephfs? If possible - please share 'ceph df detail'
>>>>>>>> output. Do you have a standalone DB volume on SSD/NVMe?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/1/2020 6:28 PM, Jack wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> As the upgrade documentation says:
>>>>>>>>>> Note that the first time each OSD starts, it will do a format
>>>>>>>>>> conversion to improve the accounting for “omap” data. This may
>>>>>>>>>> take a few minutes to as much as a few hours (for an HDD with
>>>>>>>>>> lots of omap data). You can disable this automatic conversion
>>>>>>>>>> with:
>>>>>>>>> What the documentation does not say is that this process takes
>>>>>>>>> a lot of memory.
>>>>>>>>>
>>>>>>>>> I am upgrading a rusty cluster from Nautilus; you can check out
>>>>>>>>> the RAM consumption in the attachment.
>>>>>>>>>
>>>>>>>>> First, we have a 3TB OSD conversion: it took ~15min and 19GB of
>>>>>>>>> memory.
>>>>>>>>>
>>>>>>>>> Then, we have a larger 6TB OSD conversion: it took more than 2
>>>>>>>>> hours and 35GB of memory.
>>>>>>>>>
>>>>>>>>> Finally, we have the largest 10TB OSD: only 1h15, but 52GB of
>>>>>>>>> memory.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
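
For anyone hitting the same memory wall, here is a minimal sketch of the
one-OSD-at-a-time approach described in this thread, so that only a single
fsck/quick-fix holds its working set in RAM at any moment. It assumes a
non-containerized deployment with the default systemd unit names
(ceph-osd@<id>) and data paths (/var/lib/ceph/osd/ceph-<id>); the OSD id
list is a placeholder, and the bluestore_fsck_quick_fix_on_mount line is an
assumption about the option the truncated upgrade-note quote above refers
to - check the release notes for your version.

    # Assumption: this is the option the upgrade note refers to; it keeps
    # OSDs from running the omap conversion automatically at startup
    ceph config set osd bluestore_fsck_quick_fix_on_mount false

    # Avoid rebalancing while individual OSDs are briefly down
    ceph osd set noout

    for id in 0 1 2; do                        # placeholder OSD ids
        ceph osd ok-to-stop $id || break       # bail out if the cluster cannot spare this OSD
        systemctl stop ceph-osd@$id
        # Offline conversion: memory use is the same as at startup, but
        # only one fsck/quick-fix is resident at a time
        ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-$id --command quick-fix
        systemctl start ceph-osd@$id
    done

    ceph osd unset noout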