Hi,

On 14.04.2016 03:32, Christian Balzer wrote:
> On Wed, 13 Apr 2016 14:51:58 +0200 Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> On 13.04.2016 04:29, Christian Balzer wrote:
>>> On Tue, 12 Apr 2016 09:00:19 +0200 Michael Metz-Martini |
>>> SpeedPartner GmbH wrote:
>>>> On 11.04.2016 23:39, Sage Weil wrote:
>>>>> ext4 has never been recommended, but we did test it. After Jewel is
>>>>> out, we would like explicitly recommend *against* ext4 and stop
>>>>> testing it.
>>>> Hmmm. We're currently migrating away from xfs as we had some strange
>>>> performance-issues which were resolved / got better by switching to
>>>> ext4. We think this is related to our high number of objects (4358
>>>> Mobjects according to ceph -s).
>>> It would be interesting to see on how this maps out to the OSDs/PGs.
>>> I'd guess loads and loads of subdirectories per PG, which is probably
>>> where Ext4 performs better than XFS.
>> A simple ls -l takes "ages" on XFS while ext4 lists a directory
>> immediately. According to our findings regarding XFS this seems to be
>> "normal" behavior.
> Just for the record, this is also influenced (for Ext4 at least) on how
> much memory you have and the "vm/vfs_cache_pressure" settings.
> Once Ext4 runs out of space in SLAB for dentry and ext4_inode_cache
> (amongst others), it will become slower as well, since it has to go to
> the disk.
> Another thing to remember is that "ls" by itself is also a LOT faster
> than "ls -l" since it accesses less data.
128 GB RAM for 21 OSDs (each 4 TB in size). The kernel is so far
"untuned" regarding cache-pressure / inode-cache.

>> pool name         category    KB              objects
>> data              -           3240            2265521646
>> document_root     -           577364          10150
>> images            -           96197462245     2256616709
>> metadata          -           1150105         35903724
>> queue             -           542967346       173865
>> raw               -           36875247450     13095410
>>
>> total of 4736 pgs, 6 pools, 124 TB data, 4359 Mobjects
>>
>> What would you like to see?
>> Tree? du per directory?
> Just an example tree and typical size of the first "data layer".
> [...]
First levels seem to be empty, so:

./DIR_3
./DIR_3/DIR_9
./DIR_3/DIR_9/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_E
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_A
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_C
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_1
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_4
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_2
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_B
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_5
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_3
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_9
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_6
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_F
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_7
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_8
./DIR_3/DIR_9/DIR_0/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_E
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_A
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_C
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_1
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_4
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_2
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_B
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_5
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_3
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_9
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_6
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_F
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_7
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_8
...

/var/lib/ceph/osd/ceph-58/current/6.93_head/DIR_3/DIR_9/DIR_C/DIR_0$ du -ms *
99      DIR_0
102     DIR_1
105     DIR_2
102     DIR_3
101     DIR_4
105     DIR_5
106     DIR_6
102     DIR_7
105     DIR_8
98      DIR_9
99      DIR_A
105     DIR_B
103     DIR_C
100     DIR_D
103     DIR_E
104     DIR_F
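For completeness, putting numbers on the per-PG fan-out and on the
current cache situation should just be a matter of something like the
following (untested sketch; the PG path is the one from the du above,
and the cache-pressure value is only an example):

  # hashed subdirectories and actual object files below one PG _head dir
  find /var/lib/ceph/osd/ceph-58/current/6.93_head -type d | wc -l
  find /var/lib/ceph/osd/ceph-58/current/6.93_head -type f | wc -l

  # current dentry / inode cache usage in SLAB and the pressure setting
  slabtop -o | grep -E 'dentry|inode_cache'
  sysctl vm.vfs_cache_pressure
  # keep dentries/inodes cached more aggressively (example value)
  sysctl -w vm.vfs_cache_pressure=50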
>> As you can see, we have one data object in pool "data" per file saved
>> somewhere else. I'm not sure what this is related to, but maybe this
>> is required by cephfs.
> That's rather confusing (even more so since I don't use CephFS), but it
> feels wrong.
> From what little I know about CephFS is that you can have only one FS per
> cluster and the pools can be arbitrarily named (default data and metadata).
> [...]
> My guess is that you somehow managed to create things in a way that
> puts references (not the actual data) of everything in "images" to
> "data".
You can tune the pool by e.g.

  cephfs /mnt/storage/docroot set_layout -p 4

We thought this was a good idea so that we could set the replication
size differently for doc_root and the raw data if we like. Seems this
was a bad idea for all objects.

--
Kind regards
 Michael Metz-Martini
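PS: If I read the current documentation right, newer releases expose the
same per-directory layout via extended attributes instead of the old
cephfs tool. Roughly (untested here; the pool name is only an example):

  # assign a data pool to a directory
  setfattr -n ceph.dir.layout.pool -v images /mnt/storage/docroot
  # show the resulting layout
  getfattr -n ceph.dir.layout /mnt/storage/docroot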