Hi,

I'm running the FIO benchmark against a simple cluster (3 OSDs, 128 PGs, Nautilus v14.2.10). Once the load from clients performing random reads passes a certain level, the OSDs show very different op latencies. In extreme cases, one OSD performs much worse than the others despite receiving a similar number of operations.

Looking more closely at how the operations are distributed, I can see that they are spread evenly across the OSDs and the PGs. However, in the poorly performing OSD there is one internal queue (an OSD shard) that dispatches requests very slowly. In my test, for example, one OSD shard had an average operation wait time of 120 ms, while another shard that served only a few more requests had an average wait time of 1.5 s. The behavior of this one queue ends up affecting the performance of Ceph as a whole.

The osd op queue implementation in use is wpq, and during the run I see one attribute of this queue (probably total_priority) remain unchanged for a long time. The same strange behavior also occurs with the other implementations (prio, mclock). I have also tried the Mimic release and a different PG distribution, and the behavior is always the same, although it can show up in a different OSD or in a different shard.

By default, the OSD has 5 shards. Increasing the number of shards improves the performance of this OSD considerably, but I would like to understand what is happening with this specific queue in the default configuration.

Does anyone have any idea what might be happening?

Thanks,
Mafra.
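For anyone who wants to reproduce the per-shard comparison described above, here is a minimal sketch of how the averages can be computed. It assumes the wait times have already been collected as (shard id, wait in ms) pairs; the function names, the outlier factor, and the sample values are hypothetical and only chosen to mirror the 120 ms and 1.5 s averages mentioned in the message. It does not read anything from Ceph itself.

```python
from collections import defaultdict
from statistics import mean, median


def mean_wait_per_shard(samples):
    """Return {shard_id: mean wait in ms} from (shard_id, wait_ms) pairs."""
    waits = defaultdict(list)
    for shard_id, wait_ms in samples:
        waits[shard_id].append(wait_ms)
    return {shard: mean(values) for shard, values in waits.items()}


def slow_shards(means, factor=5.0):
    """Return the shards whose mean wait exceeds `factor` times the median shard mean."""
    typical = median(means.values())
    return sorted(shard for shard, m in means.items() if m > factor * typical)


# Hypothetical samples for one OSD with 5 shards (not measured data).
samples = [
    (0, 110.0), (0, 130.0),
    (1, 100.0), (1, 140.0),
    (2, 1400.0), (2, 1600.0),
    (3, 125.0),
    (4, 115.0),
]

means = mean_wait_per_shard(samples)
for shard in sorted(means):
    print(f"shard {shard}: mean wait {means[shard]:.1f} ms")
print("slow shards:", slow_shards(means))
```

Comparing each shard against the median rather than the overall mean keeps a single very slow shard from hiding itself by dragging the reference value up.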