Simon Leinen writes:
>> I can suggest the following workarounds to start the OSD for now:
>> 1) switch allocator to stupid by setting 'bluestore allocator'
>> parameter to 'stupid'. Presume you have default setting of 'bitmap'
>> now. This will allow more continuous allocations for bluefs space
>> claim, and hence shorter log write. But given high main disk
>> fragmentation this might be not enough. 'stupid' allocator has some
>> issues (e.g. high RAM utilization over time in some cases) as well but
>> they're rather irrelevant for OSD startup.

> Thanks, we'll try that & report.

Using the "stupid" allocator, we never had any crashes with this
assert. But the OSDs run more slowly this way.

So what we ended up doing was: when an OSD crashed with this assert,
we did an offline compaction of the DB, and then started it again with
the bitmap allocator. So far the resulting OSDs seem to run fine.

>> 2) Increase 'bluefs_max_log_runway' parameter to 8-12 MB (with the
>> default value at 4MB).

That looks helpful too, thanks!

>> Suggest to start with 1) and then additionally proceed with 2) if the
>> first one doesn't help.

>> Once OSD is up and cluster is healthy please consider adding more DB
>> space and/or OSDs to your cluster to fight dangerous factors I started
>> with.

Today we added some disks to existing servers (replacing old disks
that had failed over the years; we don't replace them right away),
and will create additional OSDs to take some load off the existing
ones. We'll also try to get rid of some of the EC buckets with very
high numbers of objects, again to reduce the load on the OSD DBs.

Thanks again for your support,
--
Simon.
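
For reference, the two settings quoted above might look roughly like this in ceph.conf. This is only a sketch under assumptions that do not come from the thread: the placement in the [osd] section, the choice to apply both settings at once, and the 8 MiB value (the low end of the suggested 8-12 MB range, written in bytes).

```ini
[osd]
# Workaround 1: switch from the 'bitmap' allocator to 'stupid'.
# The thread reports no crashes with this setting, but slower OSDs.
bluestore_allocator = stupid

# Workaround 2: raise the BlueFS log runway from the 4 MB default.
# 8388608 bytes = 8 MiB; an assumed value within the suggested range.
bluefs_max_log_runway = 8388608
```

The offline DB compaction mentioned above is typically done with the OSD stopped, using something like `ceph-kvstore-tool bluestore-kv <osd data path> compact`. The data path is a placeholder here, since the thread does not say which OSDs or paths were involved.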