It is difficult for me to say exactly why some PGs have not been migrated.
Crushmap settings? OSD weights? One thing is certain - you will not find any
information about the split process in the logs ...

pn

-----Original Message-----
From: Anton Dmitriev [mailto:tech@xxxxxxxxxx]
Sent: Wednesday, May 10, 2017 10:14 AM
To: Piotr Nowosielski <piotr.nowosielski@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: All OSD fails after few requests to RGW

When I created the cluster I made a mistake in the configuration and set the
split parameter to 32 and merge to 40, so 32 * 40 * 16 = 20480 files per
folder. After that I changed split to 8 and increased pg_num and pgp_num from
2048 to 4096 for the pool where the problem occurs.

While it was backfilling I observed that placement groups were moving from one
set of 3 OSDs to another set of 3 OSDs (replicated size = 3), so I concluded
that PGs are completely recreated when pg_num and pgp_num are increased, and
that after this process the number of files per directory should be OK. But
when backfilling finished I still found many directories in this pool with
~20 000 files. Why did increasing pg_num not help? Or will some files be
deleted with some delay after this process?

I couldn't find any information about the directory split process in the logs,
even with osd and filestore debug at 20. What pattern, and in which log, do I
need to grep to find it?

On 10.05.2017 10:36, Piotr Nowosielski wrote:
> You can:
> - change these parameters and use ceph-objectstore-tool
> - add an OSD host - rebalancing the cluster will reduce the number of files
>   in the directories
> - wait until the "split" operations are over ;-)
>
> In our case we could afford to wait until the "split" operation was over
> (we have 2 clusters in slightly different configurations storing the same
> data).
>
> Hint:
> When creating a new pool, use the parameter "expected_num_objects":
> https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_pools_operate.html
>
> Piotr Nowosielski
> Senior Systems Engineer
> Zespół Infrastruktury 5
> Grupa Allegro sp. z o.o.
> Tel: +48 512 08 55 92
>
>
> -----Original Message-----
> From: Anton Dmitriev [mailto:tech@xxxxxxxxxx]
> Sent: Wednesday, May 10, 2017 9:19 AM
> To: Piotr Nowosielski <piotr.nowosielski@xxxxxxxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: All OSD fails after few requests to RGW
>
> How did you solve it? Did you set new split/merge thresholds and then apply
> them manually with
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num} \
>     --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal \
>     --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log \
>     --op apply-layout-settings --pool default.rgw.buckets.data
>
> on each OSD?
>
> How can I see in the logs that a split occurs?
>
> On 10.05.2017 10:13, Piotr Nowosielski wrote:
>> Hey,
>> We had similar problems. Look for information on "filestore merge and
>> split".
>>
>> Some explanation:
>> After reaching a certain number of files in a directory (it depends on the
>> 'filestore merge threshold' and 'filestore split multiple' parameters),
>> the OSD rebuilds the structure of that directory.
>> If files keep arriving, the OSD creates new subdirectories and moves some
>> of the files there.
>> If files are removed, the OSD reduces the number of subdirectories.
>>
>>
>> --
>> Piotr Nowosielski
>> Senior Systems Engineer
>> Zespół Infrastruktury 5
>> Grupa Allegro sp. z o.o.
>> Tel: +48 512 08 55 92
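
For reference, the arithmetic quoted in this thread follows the FileStore
layout rule: a leaf directory is split once it holds more than
filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects.
A minimal sketch for checking what a running OSD actually uses - osd.0 is
only an example id, and the commands assume you are on the host where that
OSD runs:

    # read the live values over the admin socket
    ceph daemon osd.0 config get filestore_merge_threshold
    ceph daemon osd.0 config get filestore_split_multiple

    # split point with the original settings from this thread:
    #   32 (split multiple) * 40 (merge threshold) * 16 = 20480 files per leaf dir
    # after lowering split multiple to 8:
    #   8 * 40 * 16 = 5120 files per leaf dir

Comparing these values with what is set in ceph.conf is a quick way to confirm
that the OSDs actually picked up a changed setting.
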
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Anton Dmitriev
>> Sent: Wednesday, May 10, 2017 8:14 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: All OSD fails after few requests to RGW
>>
>> Hi!
>>
>> I increased pg_num and pgp_num for the pool default.rgw.buckets.data from
>> 2048 to 4096, and the situation seems to have become a bit better: the
>> cluster now dies after 20-30 PUTs instead of after 1. Could someone please
>> give me some recommendations on how to rescue the cluster?
>>
>> On 27.04.2017 09:59, Anton Dmitriev wrote:
>>> The cluster ran well for a long time, but last week OSDs started to fail.
>>> We use the cluster as image storage for OpenNebula with a small load and
>>> as object storage with a high load.
>>> Sometimes the disks of some OSDs are utilized at 100%: iostat shows
>>> avgqu-sz over 1000 while reading or writing only a few kilobytes per
>>> second, the OSDs on those disks become unresponsive, and the cluster
>>> marks them down. We lowered the load on the object storage and the
>>> situation got better.
>>>
>>> Yesterday the situation became worse:
>>> with the RGWs disabled and no requests to the object storage the cluster
>>> performs well, but as soon as we enable the RGWs and make a few PUTs or
>>> GETs, all non-SSD OSDs on all storage hosts end up in the same situation
>>> described above.
>>> iotop shows that xfsaild/<disk> is burning the disks.
>>>
>>> Running trace-cmd record -e xfs\* for 10 seconds shows 10 million
>>> entries; as I understand it, that means ~360 000 objects to push per OSD
>>> in 10 seconds.
>>> $ wc -l t.t
>>> 10256873 t.t
>>>
>>> Fragmentation on one of these disks is about 3%.
>>>
>>> More information about the cluster:
>>>
>>> https://yadi.sk/d/Y63mXQhl3HPvwt
>>>
>>> Also debug logs for osd.33 while the problem occurs:
>>>
>>> https://yadi.sk/d/kiqsMF9L3HPvte
>>>
>>> debug_osd = 20/20
>>> debug_filestore = 20/20
>>> debug_tp = 20/20
>>>
>>>
>>> Ubuntu 14.04
>>> $ uname -a
>>> Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29
>>> 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Ceph 10.2.7
>>>
>>> 7 storage hosts: Supermicro, 28 OSDs on 4 TB 7200 rpm disks (JBOD),
>>> journals on a RAID10 of 4 Intel 3510 800 GB SSDs, plus 2 OSDs on Intel
>>> 3710 400 GB SSDs for RGW meta and index.
>>> One of these hosts differs only in the number of OSDs: it has 26 OSDs on
>>> 4 TB instead of 28.
>>>
>>> Storage hosts connect to each other over bonded 2x10 Gbit; clients
>>> connect to the storage hosts over bonded 2x1 Gbit.
>>>
>>> 5 storage hosts have 2 x CPU E5-2650v2 and 256 GB RAM; 2 storage hosts
>>> have 2 x CPU E5-2690v3 and 512 GB RAM.
>>>
>>> 7 mons
>>> 3 rgw
>>>
>>> Help me please to rescue the cluster.
>>>
>>>
>> --
>> Dmitriev Anton
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Dmitriev Anton

--
Dmitriev Anton

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
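
For completeness, a sketch of the offline procedure discussed in this thread,
combining the new thresholds with the ceph-objectstore-tool invocation quoted
above. The OSD id 12 and the upstart service syntax (Ubuntu 14.04) are examples
only, and the noout/stop/start steps are the usual surrounding steps rather
than something the thread itself spells out:

    # 1. put the new layout thresholds in ceph.conf on the OSD host, e.g.
    #      [osd]
    #      filestore merge threshold = 40
    #      filestore split multiple = 8

    # 2. take one OSD down without triggering rebalancing
    ceph osd set noout
    stop ceph-osd id=12        # upstart; systemctl stop ceph-osd@12 on systemd hosts

    # 3. reapply the directory layout for the affected pool
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --log-file=/var/log/ceph/objectstore_tool.12.log \
        --op apply-layout-settings \
        --pool default.rgw.buckets.data

    # 4. bring the OSD back and repeat for the remaining OSDs
    start ceph-osd id=12
    ceph osd unset noout

Test this on a single OSD and wait for the cluster to report HEALTH_OK before
touching the rest.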