On Tue, May 19, 2020 at 3:11 PM thoralf schulze <t.schulze@xxxxxxxxxxxx> wrote:
>
> On 5/19/20 2:13 PM, Paul Emmerich wrote:
> > 3) if necessary add more OSDs; common problem is having very
> > few dedicated OSDs for the index pool; running the index on
> > all OSDs (and having a fast DB device for every disk) is
> > better. But sounds like you already have that
>
> nope, unfortunately not. default.rgw.buckets.index is a replicated pool
> on hdds with only 4 pgs, i'll see if i can change that.
>

these PGs should be distributed across all OSDs; in general it's a good
idea to have at least as many PGs as you have OSDs of the target type
for that pool (technically a third as many would be enough to put one PG
on every OSD, because of x3 replication)

Paul

> back to igor's questions:
>
> > Some questions about your cases:
> > - What kind of payload do you have - RGW or something else?
> mostly cephfs. the most active pools in terms of i/o are the openstack
> rgw ones, though.
>
> > - Have you done massive removals recently?
> yes, see above
>
> > - How large are main and DB disks for suffering OSDs? How much is their
> > current utilization?
> for osd.293, for which i've sent the log:
> main: 2tb hdd (5% used), db: 14gb partition on a 180gb nvme (~400mb used)
> … i'll attach a perf dump for this osd.
>
> > - Do you see multiple "slow operation observed" patterns in OSD logs?
> yes, although they do not necessarily correlate with osd down events.
>
> > Are they all about _collection_list function?
> no, there are also submit_transact and _txc_committed_kv, with about the
> same frequency as collection_list.
>
> thank you very much for your analysis & with kind regards,
> thoralf.
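
a minimal sketch of the pg_num change discussed above, assuming the pool
name default.rgw.buckets.index from this thread and a purely hypothetical
target of 128 PGs (pick a value that fits your OSD count); on pre-Nautilus
releases pgp_num has to be raised separately, it does not follow pg_num
automatically:

    # check the current PG count of the index pool
    ceph osd pool get default.rgw.buckets.index pg_num

    # raise it towards roughly one PG per OSD of the target device class
    ceph osd pool set default.rgw.buckets.index pg_num 128
    # pre-Nautilus only: pgp_num must be bumped as well
    ceph osd pool set default.rgw.buckets.index pgp_num 128

the perf dump mentioned for osd.293 can be taken via the OSD's admin
socket on the host running that daemon:

    ceph daemon osd.293 perf dump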