Hi All,

We're using a Ceph cluster (Nautilus 14.2.10) as an S3 object storage layer for Spark 3 with YARN, running in a distributed environment. The issue we see, however, is slow performance when running even a simple Spark query on data spread across a large number of objects, for example 50,000. We're aware that object listing in S3 is slow, but should that really kill performance when using Spark to read/analyze/write data on S3? Running the same query on the same dataset content, but stored in 100 files, is multiple times faster.

The bucket and bucket index we use for Spark are stored on OSDs backed by SSDs (we have 150 of them), and we're using 12 RGW instances, each limited to 32 concurrent connections by a custom app to keep the RGW queue from blowing up. When running Spark queries, the RGW queue rises, depending on the number of Spark executors, to around 30 per instance, provided the number of executors per RGW instance is similar. We don't see any bottleneck on the infrastructure side other than the RGW queues. We applied various Ceph tuning options for RGW/OSD/BlueStore performance (e.g. objecter_inflight_op_bytes, objecter_inflight_ops, rgw_bucket_index_max_aio, rgw_cache_lru_size), but Spark still runs really slowly in the scenario described above.

The other problem we observed is that when using 4 RGW instances instead of 12, performance degrades by only about 40-50%. We would expect RGW scaling to behave more efficiently than that.

Is anyone else using Ceph in a similar way, as a storage layer for Spark? Do you observe similar behavior, and do you have any workarounds/solutions for the slowness when working with a high number of objects?
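
For reference, below is a minimal sketch (Scala) of the Spark-side and S3A-side listing/concurrency knobs we have been experimenting with. The fs.s3a.* keys are the standard Hadoop S3A options; the endpoint, pool sizes, and thread counts are placeholders, not our actual values:

    import org.apache.spark.sql.SparkSession

    // Sketch only: the endpoint and all numbers below are placeholders.
    val spark = SparkSession.builder()
      .appName("rgw-s3a-tuning-sketch")
      // Point S3A at the RGW endpoint; path-style access is the usual choice for RGW.
      .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.local:7480")
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      // Larger HTTP connection/thread pools for many executors hitting RGW at once.
      .config("spark.hadoop.fs.s3a.connection.maximum", "64")
      .config("spark.hadoop.fs.s3a.threads.max", "64")
      // Parallelize partition discovery and file listing instead of listing serially.
      .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "32")
      .config("spark.hadoop.mapreduce.input.fileinputformat.list-status.num-threads", "16")
      .getOrCreate()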
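Since the same data stored in 100 files is multiple times faster for us, a one-off compaction pass may be the practical workaround. A sketch of that, where the bucket paths and Parquet format are assumptions for illustration only:

    // Read the many-object dataset and rewrite it as ~100 larger objects.
    // Paths and the Parquet format are illustrative; adjust to the real layout.
    val df = spark.read.parquet("s3a://my-bucket/dataset/")   // e.g. 50,000 small objects

    df.repartition(100)                                       // target ~100 output files
      .write
      .mode("overwrite")
      .parquet("s3a://my-bucket/dataset-compacted/")

This sidesteps the RGW listing/index cost rather than fixing it, but it keeps the per-query object count low.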