Re: Strange hardware behavior

Hi!

No, these servers do not have RAID controllers; all disks are connected directly to the motherboard.

[root@S-26-6-2-3 cph]# dd if=/dev/zero of=/dev/sde bs=1M count=100000 oflag=sync status=progress
104686682112 bytes (105 GB) copied, 443.051024 s, 236 MB/s
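
That MB/s figure comes from the kernel's own clock, so it can be cross-checked
against the battery-backed RTC, which is read independently of the kernel
clocksource. A rough sketch (the hwclock comparison is my suggestion, not
something I have run on these machines yet):

hwclock --show
dd if=/dev/zero of=/dev/sde bs=1M count=100000 oflag=sync
hwclock --show
# If dd reports ~440 s but the RTC shows far more elapsed wall time,
# the kernel clock (and therefore the MB/s figure) is wrong.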

I think there are three possible explanations:

1. The HDD does not actually write anything and simply discards the data. But that would have been caught by the first deep scrub (and the read-back check sketched below would catch it immediately).
2. Toshiba (all HDDs in the cluster are TOSHIBA MG06ACA1) has learned to write to disk at high speed but hides it from everyone, and a bug in the drive firmware let the secret out. That would be the better option. :)
3. The most likely option: due to a bug in Linux or on the motherboard, one of the timers Linux uses for timekeeping starts misbehaving (see the clocksource check below). I am discussing this with a Supermicro representative.
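
A quick way to rule option 1 in or out is a read-back comparison with the page
cache bypassed. A minimal sketch, assuming /dev/sde may be overwritten and /tmp
has about 2 GB free:

# write a known 1 GiB pattern, then read it back through direct I/O
dd if=/dev/urandom of=/tmp/pattern bs=1M count=1024
dd if=/tmp/pattern of=/dev/sde bs=1M oflag=direct
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/sde of=/tmp/readback bs=1M count=1024 iflag=direct
cmp /tmp/pattern /tmp/readback && echo "data really reached the disk"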
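
As for option 3, Linux exposes which timer it is currently using as its
clocksource under sysfs, so a misbehaving TSC can be spotted or worked around.
A sketch of where I plan to look (standard sysfs paths, nothing specific to
these boards):

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
# switching to another source, e.g. hpet, helps isolate a broken TSC:
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
dmesg | grep -i -e tsc -e clocksource   # watch for "clocksource ... unstable"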

----- Original Message -----
> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
> To: ceph-users@xxxxxxx
> Sent: Tuesday, 3 September, 2019 17:48:18
> Subject:  Re: Strange hardware behavior

> That was my thought as well.  It would be interesting to see the results
> of a much longer write test (say 100GB).
> 
> 
> Mark
> 
> 
> On 9/3/19 9:40 AM, Fabian Niepelt wrote:
>> Hey,
>>
>> are these drives connected to a RAID controller with a write cache? I've seen
>> lots of weird behaviors with them. You said the problem persists when rebooting
>> but not when power cycling, which would reinforce a hardware component being the
>> culprit in this case.
>>
>> Greetings
>> Fabian
>>
>> On Tuesday, 03.09.2019, at 14:13 +0300, Fyodor Ustinov wrote:
>>> Hi!
>>>
>>> I understand that this question is not quite on topic for this mailing list,
>>> but nonetheless, experts who may have encountered this have gathered here.
>>>
>>> I have 24 servers, and on each of them, after six months of operation, the
>>> following began to happen:
>>>
>>> [root@S-26-5-1-2 cph]# uname -a
>>> Linux S-26-5-1-2 5.2.11-1.el7.elrepo.x86_64 #1 SMP Thu Aug 29 08:10:52 EDT
>>> 2019 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdc bs=1M count=1000 oflag=sync
>>> 1048576000 bytes (1.0 GB) copied, 3.76334 s, 279 MB/s
>>>
>>> [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdd bs=1M count=1000 oflag=sync
>>> 1048576000 bytes (1.0 GB) copied, 4.54834 s, 231 MB/s
>>>
>>> sdc is an SSD; sdd is an HDD.
>>>
>>> You can see that the SSD is somehow slow, while the HDD is suspiciously fast.
>>>
>>> A reboot changes nothing.
>>>
>>> Only a power-off/power-on cycle restores normal behavior:
>>>
>>> [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdc bs=1M count=1000 oflag=sync
>>> 1048576000 bytes (1.0 GB) copied, 3.24042 s, 324 MB/s
>>>
>>> [root@S-26-5-1-2 cph]# dd if=/dev/zero of=/dev/sdd bs=1M count=1000 oflag=sync
>>> 1048576000 bytes (1.0 GB) copied, 13.7709 s, 76.1 MB/s
>>>
>>> There is absolutely nothing about this in the system or Ceph logs (these
>>> servers are used as OSDs).
>>>
>>> Perhaps someone has encountered similar behavior?
>>>
>>> WBR,
>>>      Fyodor.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



