Re: Random heartbeat_map timed out

I found some strange behavior in BlueFS. It seems that on each heartbeat
timeout on my OSD nodes, ceph_bluefs_read_prefetch_bytes rises to around
400 MB/s, which is far too high: normally it sits around 20 MB/s, sometimes
as low as 1 KB/s, and client throughput on that disk is only about 20 MB/s.
Is there any way to find out what these reads are for? There is no
backfilling going on, only the regular scrubs and deep scrubs on my servers.
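
If it helps with tracking this down, the counter can be read live from the
OSD admin socket; a rough sketch (osd.<id> is a placeholder for one of the
affected OSDs):

    ceph daemon osd.<id> perf dump bluefs | grep -E 'read_prefetch|read_random'

Sampling it before and during a heartbeat timeout should show whether the
prefetch read spike lines up with the warnings.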

Thanks.

On Thu, Dec 24, 2020 at 12:47 AM Seena Fallah <seenafallah@xxxxxxxxx> wrote:

> I have enabled bluefs_buffered_io on some of my OSD nodes and disabled it
> on others, depending on each server's situation, and I'm seeing this issue
> on both groups.
>
> How can manual RocksDB compaction help?
>
> Could you please share the names of the relevant threads on the mailing
> list?
>
> On Wed, Dec 23, 2020 at 7:04 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
>> Hi Seena,
>>
>> one of the frequent causes of such a timeout is slow RocksDB
>> operation, which in turn might be caused by bluefs_buffered_io being
>> set to false and/or DB "fragmentation" after massive data removal.
>>
>> Hence the potential workarounds are adjusting bluefs_buffered_io and
>> manual RocksDB compaction.
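>>
>> A rough sketch of both workarounds from the CLI (osd.<id> is a
>> placeholder; exact behaviour may differ between releases, so please
>> check the docs for your version):
>>
>>     # enable buffered BlueFS reads for all OSDs via central config
>>     ceph config set osd bluefs_buffered_io true
>>
>>     # trigger a manual RocksDB compaction on one running OSD
>>     ceph daemon osd.<id> compact
>>
>> An offline compaction with ceph-kvstore-tool is also possible while the
>> OSD is stopped, if the online one is too disruptive.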
>>
>> This topic has been discussed in this mailing list and relevant tickets
>> multiple times.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 12/23/2020 3:24 PM, Seena Fallah wrote:
>> > Hi,
>> >
>> > All my OSD nodes in the SSD tier are randomly hitting heartbeat_map
>> > timeouts and I can't figure out why.
>> >
>> > 7ff2ed3f2700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread
>> > 0x7ff2c8943700' had timed out after 15
>> >
>> > It occurs many times a day and takes the cluster down.
>> >
>> > Is there any way to find out why the OSDs time out? I don't think the
>> > heartbeat itself is the problem; rather, something in the OSDs is making
>> > the heartbeats time out. The OSDs don't suicide, but they become so slow
>> > that the queues fill up and cause downtime on RBD and the S3 gateway.
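>> >
>> > One way to see what the timed-out osd_op_tp thread is busy with is to
>> > dump the in-flight and recent slow requests from the admin socket while
>> > the warning is firing; a rough sketch (osd.<id> is a placeholder):
>> >
>> >     ceph daemon osd.<id> dump_ops_in_flight
>> >     ceph daemon osd.<id> dump_historic_slow_ops
>> >
>> > The per-op "events" timestamps in the output show which stage the slow
>> > requests spend their time in.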
>> >
>> > Thanks.
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


