Hi, experts,
We are using CephFS (15.2.*) with the kernel mount in our production environment. These days, when we do massive reads from the cluster (multiple processes), ceph health always reports slow ops for some OSDs (built on 8TB HDDs, with SSD as the DB cache).
Our cluster gets more read than write requests.
The health log looks like this:
100 slow ops, oldest one blocked for 114 sec, [osd.* ...] has slow ops (SLOW_OPS)
My question: are there any best practices for processing hundreds of millions of small files (100kb-300kb per file, 10000+ files in each directory, and more than 5000 directories)?
Small files are slow on any HDD system, each HDD can only do around 100
ops per sec. Some things to try, note that some may involve copying data:
-If your workload logic involves more processing on recent files, you
could have 2 pools, an ssd pool for recent files and a larger hdd pool
for less frequently accessed archived files.
-If you can group files to be processed together, maybe store them in
larger lumps such as tar files, or even re-structure their data in
SQLite. You would then modify the processing application to tar/untar
the group, or to access the data via SQLite.
-It may help to reduce the read_ahead_kb on your HDD devices to reduce
un-needed load.
-Using dm-cache on the HDD may help, though our experience with it is
not great (we use dm-writecache instead, but it is geared for speeding
up writes). It should cache more recently read objects on ssd, but its
promotion algorithms may not match your workload pattern, so maybe try
it first in a lab with a similar workload pattern.
-Using Ceph cache tier may help, though our experience with it is not
great, its support is also deprecated
-The file sizes, average 150kb, are not large but also not extremely
small. You could lower the application concurrency/processes so as not
to stress the disk % busy over say 80%. With 150kb size you should get
around 10 MB/s read speeds from each HDD. Having too many processes
could actually slow things down.
-You may want to lower your scrub rates or increase the scrub window, if
you have a lot of small files this will already be stressing your HDDs.
-Any Ceph recovery/healing with small files on HDD will also slow things
down. It is something to bear in mind, but there is not much we can do
about it.
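
To make the SQLite grouping idea above a bit more concrete, here is a minimal Python sketch. It only uses the standard sqlite3 module. The function names, the "files" table layout and the one-SQLite-file-per-directory choice are my own assumptions for illustration, not something taken from your application, so treat it as a starting point to test in a lab:

```python
import sqlite3
from pathlib import Path


def pack_directory(src_dir, db_path):
    """Pack every regular file directly under src_dir into one SQLite file.

    Hypothetical helper: table name and schema are illustrative only.
    Keep db_path outside src_dir, otherwise the database packs itself.
    Returns the number of rows in the table after packing.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit everything as a single transaction
            conn.execute(
                "CREATE TABLE IF NOT EXISTS files ("
                "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )
            rows = (
                (p.name, p.read_bytes())
                for p in sorted(Path(src_dir).iterdir())
                if p.is_file()
            )
            conn.executemany(
                "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)", rows
            )
            count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    finally:
        conn.close()
    return count


def read_file(db_path, name):
    """Return the stored bytes for name, or None if it was never packed."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM files WHERE name = ?", (name,)
        ).fetchone()
    finally:
        conn.close()
    return None if row is None else row[0]
```

With something like this, a directory of 10000+ small files becomes one large file on CephFS, and the application calls read_file() instead of opening each small file. Whether that actually reduces the slow ops depends on your access pattern, so measure before converting everything.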
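
The rough arithmetic behind the "around 10 MB/s" figure above can be sketched as below. The function name and the simplifying assumption of one seek-bound read per file are mine; the 100 ops/sec, 150kb and 80% busy numbers are the ones from the points above:

```python
def hdd_read_budget(iops_per_disk=100, avg_file_kb=150, busy_target=0.8):
    """Estimate per-HDD read budget as (files_per_sec, mb_per_sec).

    Assumes each small file costs roughly one disk op, which is a
    simplification: metadata lookups and scrubbing also consume ops.
    """
    files_per_sec = iops_per_disk * busy_target
    mb_per_sec = files_per_sec * avg_file_kb / 1000
    return files_per_sec, mb_per_sec


files_per_sec, mb_per_sec = hdd_read_budget()
print(f"{files_per_sec:.0f} files/s, {mb_per_sec:.0f} MB/s per HDD")
```

Multiplying the per-disk number by the number of HDD OSDs gives a very rough ceiling for the whole cluster. If your reader processes together ask for much more than that, adding more processes will only grow the queues.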
/maged