On Mon, Nov 21, 2022 at 10:00:49AM +0800, zhanchengbin wrote:
> The process deadlocks when an I/O error occurs while logs
> are replayed, because the I/O error handling function sends
> I/O again and tries to take the CACHE_MTX mutex.

This is a legitimate bug, but the proposed fix is not safe. There is a reason why we take the cache mutex: we need to prevent another thread from modifying the cache, possibly by ejecting the cache entry that we are in the middle of cleaning when raw_write_blk() is called from reuse_cache().

Fortunately, we're safe on the read side, because we currently very carefully do not call raw_read_blk() while holding CACHE_MTX. Instead, we read the data into the user-supplied buffer, *then* take the cache mutex, and then save the data from the user-supplied buffer into the cache.

So the problem is only on the write side. What I think we need to do is lift the call to channel->write_error() up to the ultimate callers of raw_write_blk(). That is, raw_write_blk() should just return the error code to its callers in reuse_cache(), flush_cache_blocks(), and unix_write_blk64(), and those upper-level functions should call the write error handler, after they've had the chance to release any mutexes.

- Ted
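To make the proposed restructuring concrete, here is a minimal, self-contained sketch of the pattern: the lowest-level write only reports the error, the cache-manipulating helper propagates it while the mutex is held, and the top-level caller invokes the error handler only after unlocking. Everything here is hypothetical and heavily simplified. The `sim_*` names, the `struct sim_channel` type, and the three-argument handler signature are illustrative stand-ins, not the actual e2fsprogs `unix_io.c` code or its `write_error` prototype. The handler uses `pthread_mutex_trylock()` so that the sketch detects the would-be deadlock instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an io_channel (not the real
 * e2fsprogs structure). */
struct sim_channel {
    pthread_mutex_t cache_mtx;   /* plays the role of CACHE_MTX */
    int fail_writes;             /* nonzero: every raw write fails with EIO */
    int handler_calls;           /* how many times write_error ran */
    int handler_saw_lock_free;   /* 1 if the handler could take cache_mtx */
    int (*write_error)(struct sim_channel *ch, unsigned long block, int err);
};

/* Error handler that, like a real one, may issue more I/O and so needs
 * the cache mutex. trylock reports EBUSY instead of deadlocking if the
 * mutex is still held (by any thread, including this one). */
static int sim_write_error(struct sim_channel *ch, unsigned long block, int err)
{
    ch->handler_calls++;
    if (pthread_mutex_trylock(&ch->cache_mtx) == 0) {
        ch->handler_saw_lock_free = 1;
        /* A retry or further I/O would happen here, safely. */
        pthread_mutex_unlock(&ch->cache_mtx);
    } else {
        ch->handler_saw_lock_free = 0;  /* a blocking lock would deadlock */
    }
    fprintf(stderr, "write error on block %lu: %d\n", block, err);
    return err;
}

/* Lowest layer (analogue of raw_write_blk()): only reports the error,
 * never calls the handler itself. */
static int sim_raw_write_blk(struct sim_channel *ch, unsigned long block)
{
    (void)block;
    return ch->fail_writes ? EIO : 0;
}

/* Analogue of reuse_cache(): runs with cache_mtx held, so it just
 * propagates the error code to its caller. */
static int sim_reuse_cache(struct sim_channel *ch, unsigned long dirty_block)
{
    return sim_raw_write_blk(ch, dirty_block);
}

/* Top-level caller (analogue of unix_write_blk64()): releases the
 * mutex first, then calls the write error handler. */
static int sim_write_blk(struct sim_channel *ch, unsigned long block)
{
    int err;

    pthread_mutex_lock(&ch->cache_mtx);
    err = sim_reuse_cache(ch, block);
    pthread_mutex_unlock(&ch->cache_mtx);

    if (err && ch->write_error)
        err = ch->write_error(ch, block, err);
    return err;
}

static void sim_channel_init(struct sim_channel *ch, int fail_writes)
{
    int rc = pthread_mutex_init(&ch->cache_mtx, NULL);
    assert(rc == 0);
    (void)rc;
    ch->fail_writes = fail_writes;
    ch->handler_calls = 0;
    ch->handler_saw_lock_free = 0;
    ch->write_error = sim_write_error;
}
```

Because the handler runs only after `pthread_mutex_unlock()`, it can take the cache mutex again without deadlocking, while the cache itself stays protected for the whole time the dirty entry is being written out.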