It is difficult for me to say exactly why some PGs have not been migrated.
Crushmap settings? OSD weights? One thing is certain - you will not find any
information about the split process in the logs ...

pn

-----Original Message-----
From: Anton Dmitriev [mailto:tech@xxxxxxxxxx]
Sent: Wednesday, May 10, 2017 10:14 AM
To: Piotr Nowosielski <piotr.nowosielski@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: All OSD fails after few requests to RGW

When I created the cluster I made a mistake in the configuration and set the
split parameter to 32 and merge to 40, so 32 * 40 * 16 = 20480 files per
folder. After that I changed split to 8 and increased pg_num and pgp_num from
2048 to 4096 for the pool where the problem occurs.

While it was backfilling I observed that placement groups were moving from one
set of 3 OSDs to another set of 3 OSDs (replicated size = 3), so I concluded
that PGs are completely recreated when pg_num and pgp_num are increased, and
that after this process the number of files per directory should be OK. But
when backfilling finished I still found many directories in this pool with
~20 000 files. Why did increasing pg_num not help? Or will some files be
deleted with some delay after this process?

I couldn't find any information about the directory split process in the logs,
even with osd and filestore debug at 20. What pattern, and in which log, do I
need to grep to find it?

On 10.05.2017 10:36, Piotr Nowosielski wrote:
> You can:
> - change these parameters and use ceph-objectstore-tool
> - add an OSD host - rebalancing the cluster will reduce the number of files
>   in the directories
> - wait until the "split" operations are over ;-)
>
> In our case we could afford to wait until the "split" operation was over
> (we have 2 clusters in slightly different configurations storing the same
> data).
>
> Hint:
> When creating a new pool, use the parameter "expected_num_objects":
> https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_pools_operate.html
>
> Piotr Nowosielski
> Senior Systems Engineer
> Zespół Infrastruktury 5
> Grupa Allegro sp. z o.o.
> Tel: +48 512 08 55 92
>
>
> -----Original Message-----
> From: Anton Dmitriev [mailto:tech@xxxxxxxxxx]
> Sent: Wednesday, May 10, 2017 9:19 AM
> To: Piotr Nowosielski <piotr.nowosielski@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: All OSD fails after few requests to RGW
>
> How did you solve it? Did you set new split/merge thresholds and then apply
> them manually with
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num} \
>     --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal \
>     --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log \
>     --op apply-layout-settings --pool default.rgw.buckets.data
>
> on each OSD?
>
> How can I see in the logs that a split occurs?
>
> On 10.05.2017 10:13, Piotr Nowosielski wrote:
>> Hey,
>> We had similar problems. Look for information on "filestore merge and
>> split".
>>
>> Some explanation:
>> After reaching a certain number of files in a directory (it depends on the
>> 'filestore merge threshold' and 'filestore split multiple' parameters),
>> the OSD rebuilds the structure of that directory.
>> If files keep arriving, the OSD creates new subdirectories and moves some
>> of the files there.
>> If files are removed, the OSD reduces the number of subdirectories.
>>
>>
>> --
>> Piotr Nowosielski
>> Senior Systems Engineer
>> Zespół Infrastruktury 5
>> Grupa Allegro sp. z o.o.
>> Tel: +48 512 08 55 92
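
For reference, the arithmetic quoted in this thread follows the FileStore
layout rule: a leaf directory is split once it holds more than
filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects.
A minimal sketch for checking what a running OSD actually uses - osd.0 is
only an example id, and the commands assume you are on the host where that
OSD runs:

    # read the live values over the admin socket
    ceph daemon osd.0 config get filestore_merge_threshold
    ceph daemon osd.0 config get filestore_split_multiple

    # split point with the original settings from this thread:
    #   32 (split multiple) * 40 (merge threshold) * 16 = 20480 files per leaf dir
    # after lowering split multiple to 8:
    #   8 * 40 * 16 = 5120 files per leaf dir

Comparing these values with what is set in ceph.conf is a quick way to confirm
that the OSDs actually picked up a changed setting.
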
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Anton Dmitriev
>> Sent: Wednesday, May 10, 2017 8:14 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: All OSD fails after few requests to RGW
>>
>> Hi!
>>
>> I increased pg_num and pgp_num for the pool default.rgw.buckets.data from
>> 2048 to 4096, and the situation seems to have become a bit better: the
>> cluster now dies after 20-30 PUTs instead of after 1. Could someone please
>> give me some recommendations on how to rescue the cluster?
>>
>> On 27.04.2017 09:59, Anton Dmitriev wrote:
>>> The cluster ran well for a long time, but last week OSDs started to fail.
>>> We use the cluster as image storage for OpenNebula with a small load and
>>> as object storage with a high load.
>>> Sometimes the disks of some OSDs are utilized at 100%: iostat shows
>>> avgqu-sz over 1000 while reading or writing only a few kilobytes per
>>> second, the OSDs on those disks become unresponsive, and the cluster
>>> marks them down. We lowered the load on the object storage and the
>>> situation got better.
>>>
>>> Yesterday the situation became worse:
>>> with the RGWs disabled and no requests to the object storage the cluster
>>> performs well, but as soon as we enable the RGWs and make a few PUTs or
>>> GETs, all non-SSD OSDs on all storage hosts end up in the same situation
>>> described above.
>>> iotop shows that xfsaild/<disk> is burning the disks.
>>>
>>> Running trace-cmd record -e xfs\* for 10 seconds shows 10 million
>>> entries; as I understand it, that means ~360 000 objects to push per OSD
>>> in 10 seconds.
>>> $ wc -l t.t
>>> 10256873 t.t
>>>
>>> Fragmentation on one of these disks is about 3%.
>>>
>>> More information about the cluster:
>>>
>>> https://yadi.sk/d/Y63mXQhl3HPvwt
>>>
>>> Also debug logs for osd.33 while the problem occurs:
>>>
>>> https://yadi.sk/d/kiqsMF9L3HPvte
>>>
>>> debug_osd = 20/20
>>> debug_filestore = 20/20
>>> debug_tp = 20/20
>>>
>>>
>>> Ubuntu 14.04
>>> $ uname -a
>>> Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29
>>> 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Ceph 10.2.7
>>>
>>> 7 storage hosts: Supermicro, 28 OSDs on 4 TB 7200 rpm disks (JBOD),
>>> journals on a RAID10 of 4 Intel 3510 800 GB SSDs, plus 2 OSDs on Intel
>>> 3710 400 GB SSDs for RGW meta and index.
>>> One of these hosts differs only in the number of OSDs: it has 26 OSDs on
>>> 4 TB instead of 28.
>>>
>>> Storage hosts connect to each other over bonded 2x10 Gbit; clients
>>> connect to the storage hosts over bonded 2x1 Gbit.
>>>
>>> 5 storage hosts have 2 x CPU E5-2650v2 and 256 GB RAM; 2 storage hosts
>>> have 2 x CPU E5-2690v3 and 512 GB RAM.
>>>
>>> 7 mons
>>> 3 rgw
>>>
>>> Help me please to rescue the cluster.
>>>
>>>
>> --
>> Dmitriev Anton
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Dmitriev Anton

--
Dmitriev Anton

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
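
For completeness, a sketch of the offline procedure discussed in this thread,
combining the new thresholds with the ceph-objectstore-tool invocation quoted
above. The OSD id 12 and the upstart service syntax (Ubuntu 14.04) are examples
only, and the noout/stop/start steps are the usual surrounding steps rather
than something the thread itself spells out:

    # 1. put the new layout thresholds in ceph.conf on the OSD host, e.g.
    #      [osd]
    #      filestore merge threshold = 40
    #      filestore split multiple = 8

    # 2. take one OSD down without triggering rebalancing
    ceph osd set noout
    stop ceph-osd id=12        # upstart; systemctl stop ceph-osd@12 on systemd hosts

    # 3. reapply the directory layout for the affected pool
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --log-file=/var/log/ceph/objectstore_tool.12.log \
        --op apply-layout-settings \
        --pool default.rgw.buckets.data

    # 4. bring the OSD back and repeat for the remaining OSDs
    start ceph-osd id=12
    ceph osd unset noout

Test this on a single OSD and wait for the cluster to report HEALTH_OK before
touching the rest.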