'DDOS on BlueStore'?


 



Hi Sage et al,

After monitoring the Ceph mailing lists (both devel and users) for a while, and from some customer reports, I've come to believe that BlueStore may cause operation slowdowns under specific circumstances, which often result in 'slow ops' alerts or suicide timeouts (e.g. http://tracker.ceph.com/issues/34526).

One related observation is the presence of a more or less intensive read load when such slowdowns happened, e.g. scrubbing enabled or the cluster backfilling. I also recall several cases where disabling scrubbing was recommended to fight the issue, and doing so helped.

Unfortunately I haven't collected all the cases in a systematic manner, so I'm sharing an impression rather than hard facts.

But today I had an idea that might explain the behavior. I'd like community feedback on whether it makes sense.

Here it is:

BlueStore has a per-collection (aka PG) read/write lock that protects both read and write operations. It allows concurrent reads from objects belonging to the same collection and enforces exclusive write access.

An R/W lock acquisition attempt by a new writer is put on wait if reader(s) are in progress, while a new reader is allowed to acquire a lock already held by other readers (but not by a writer).

Hence one can imagine a situation where BlueStore receives a massive read flow that is processed again and again while some writers stay pending for an indefinite period of time (actually until a gap appears in the read flow).

The prerequisite is the presence of multiple read operations on the same collection overlapping in time.

Here is the sample picture:

read1  <---processing--->

write1 <waiting------------------------------------------------------

read2             <-----processing------>

read3                            <-----processing------>

read4                                         <-----processing------>

and so on.
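The timeline above can be sketched with a minimal reader-preference R/W lock model (illustrative only, not Ceph's actual implementation): because new readers are admitted whenever no writer *holds* the lock, regardless of whether a writer is *waiting*, an unbroken chain of overlapping readers starves the writer indefinitely.

```python
# Illustrative sketch of a reader-preference R/W lock (not Ceph code):
# a pending writer gets no priority, so overlapping readers can starve it.
class RWLock:
    def __init__(self):
        self.readers = 0      # number of readers currently holding the lock
        self.writer = False   # whether a writer currently holds the lock

    def try_read(self):
        # A new reader is admitted whenever no writer HOLDS the lock,
        # even if a writer is already WAITING (reader preference).
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        self.readers -= 1

    def try_write(self):
        # A writer needs exclusive access: no readers, no other writer.
        if self.readers > 0 or self.writer:
            return False
        self.writer = True
        return True

# Replay the timeline: read1 starts, write1 arrives and must wait,
# read2/read3 keep overlapping, so the writer never sees a gap.
lock = RWLock()
assert lock.try_read()        # read1 in progress
assert not lock.try_write()   # write1 blocked by read1
assert lock.try_read()        # read2 admitted despite the waiting writer
lock.read_unlock()            # read1 finishes, but read2 still holds
assert not lock.try_write()   # write1 still blocked
assert lock.try_read()        # read3 overlaps read2: starvation continues
```

A writer-preference variant would instead refuse new readers once a writer is queued, bounding the writer's wait at the cost of read concurrency.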

And a side note (just to mention, though it may make the situation a bit worse): a read holds the lock even while performing potentially long operations such as checksum verification and decompression. This increases the probability that another reader overlaps with the existing one.


Does this make sense? Or have I missed something?


Thanks,

Igor






