On Wed, 8 Feb 2017, sheng qiu wrote:
> Hi cepher,
>
> I noticed that when PG worker is processing the op, it may acquire
> obc->ondisk_read_lock or obc->ondisk_write_lock. it's used to protect
> concurrent read and write to same object.
>
> But to my understanding, the PG lock is held while worker is
> processing op which already prevent concurrent access for the same
> object. Does that mean those locks are not actually needed?

This is an artifact of the somewhat weird FileStore write behavior. The
OSD takes the lock and queues a write; a callback invoked when the write
completes then releases the lock. Because the write is applied
asynchronously, the lock outlives the op that queued it, so a later read
of the same object blocks until the write is applied.

FileStore is a bit strange in that the lock isn't released until the
write is readable, which is actually *after* the write is committed.
That's because it's doing write-ahead journaling, which commits the
transaction to the journal before actually writing it to the file system
(where it can be read back).

BlueStore doesn't behave this way, but unfortunately it would take a lot
of work to remove the need for the FileStore callback that releases
ondisk_write_lock so that the lock itself could be removed. :(

sage
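
The ordering described above (commit to the journal first, become readable later, release the lock only at that later point) can be illustrated with a toy model. This is a minimal sketch in Python, not Ceph code: the classes ObjectContext and JournalingStore, the counter _unstable_writes, and the methods queue_write and apply_pending are hypothetical names chosen for the illustration. Only the lock names ondisk_read_lock and ondisk_write_lock come from the thread.

```python
import threading


class ObjectContext:
    """Hypothetical stand-in for a per-object context; not Ceph's class."""

    def __init__(self):
        self._cond = threading.Condition()
        self._unstable_writes = 0  # writes queued but not yet readable

    def ondisk_write_lock(self):
        with self._cond:
            self._unstable_writes += 1

    def ondisk_write_unlock(self):
        with self._cond:
            self._unstable_writes -= 1
            self._cond.notify_all()

    def ondisk_read_lock(self):
        # Block until every queued write has been applied.
        with self._cond:
            while self._unstable_writes > 0:
                self._cond.wait()


class JournalingStore:
    """Toy write-ahead store: journal first, apply to 'files' later."""

    def __init__(self):
        self.journal = []   # committed (durable) transactions
        self.files = {}     # applied state, the only thing reads can see
        self._pending = []

    def queue_write(self, name, data, on_applied):
        self.journal.append((name, data))  # committed, not yet readable
        self._pending.append((name, data, on_applied))

    def apply_pending(self):
        pending, self._pending = self._pending, []
        for name, data, on_applied in pending:
            self.files[name] = data        # now readable
            on_applied()                   # callback releases the lock

    def read(self, name):
        return self.files.get(name)


def demo():
    store = JournalingStore()
    obc = ObjectContext()
    store.files["obj"] = "old"
    events = []

    # Write path: take the lock, queue the write, and let the
    # "applied" callback release the lock later.
    obc.ondisk_write_lock()
    store.queue_write("obj", "new", on_applied=obc.ondisk_write_unlock)
    events.append("committed")

    def reader():
        obc.ondisk_read_lock()
        events.append("read:" + store.read("obj"))

    t = threading.Thread(target=reader)
    t.start()
    t.join(timeout=0.2)            # reader cannot finish yet
    still_blocked = t.is_alive()
    store.apply_pending()          # write becomes readable, lock released
    t.join()
    return still_blocked, events


if __name__ == "__main__":
    print(demo())
```

In this model the reader stays blocked after the journal commit and only proceeds once apply_pending() runs the callback, so it never observes the stale "old" value. A store whose writes are readable as soon as they commit would not need the separate applied callback, which is the contrast drawn with BlueStore above.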