Re: Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?


 



How up to date is your VM environment? We saw something very similar last year with Linux VMs running newish kernels. It turned out that newer kernels support a new feature of the vmxnet3 adapter which had a bug in ESXi. The fix was released some time last year in ESXi 6.5 U1; alternatively, there is a workaround of setting an option in the VM config.

 

https://kb.vmware.com/s/article/2151480
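If upgrading ESXi isn't an option right away, the KB article's workaround is to disable the vmxnet3 revision-3 features in the affected VM's configuration. A rough sketch, assuming the option name is `vmxnet3.rev.30` as described in the KB (verify against the article before applying; `myvm.vmx` is a stand-in for the real datastore path):

```shell
# Power off the VM first, then append the workaround option to its .vmx file.
# "myvm.vmx" is a placeholder; the real file lives in the VM's datastore folder.
VMX="myvm.vmx"
echo 'vmxnet3.rev.30 = "FALSE"' >> "$VMX"
grep 'vmxnet3.rev.30' "$VMX"   # confirm the option is now present
```

The same setting can also be added through the vSphere UI as an advanced VM configuration parameter, which avoids editing the file by hand.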

 

 

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Youzhong Yang
Sent: 21 January 2018 19:50
To: Brad Hubbard <bhubbard@xxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

 

As someone suggested, I installed the linux-generic-hwe-16.04 package on Ubuntu 16.04 to get the 17.10 kernel, then rebooted all VMs. Here is what I observed:

- the ceph monitor node froze upon reboot; in another case it froze after a few minutes

- the ceph OSD hosts froze easily

- the ceph admin node (which runs no ceph service except ceph-deploy) never froze

- the ceph rgw and ceph mgr nodes have been fine so far
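For anyone wanting to reproduce this, the kernel upgrade described above amounts to the standard Ubuntu HWE install (the package name is from the message; the exact kernel version pulled in depends on the current HWE stack):

```shell
# Install the Ubuntu 16.04 hardware-enablement kernel stack,
# then reboot into it and verify which kernel is running.
sudo apt-get update
sudo apt-get install -y linux-generic-hwe-16.04
sudo reboot
# after the reboot:
uname -r   # confirm the HWE kernel is the one running
```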

 

Here are two images I captured:

 

 

Thanks.

 

On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:

On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang <youzhong@xxxxxxxxx> wrote:
> I don't think it's hardware issue. All the hosts are VMs. By the way, using
> the same set of VMWare hypervisors, I switched back to Ubuntu 16.04 last
> night, so far so good, no freeze.

Too little information to make any sort of assessment I'm afraid but,
at this stage, this doesn't sound like a ceph issue.


>
> On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann <daniel.baumann@xxxxxx>
> wrote:
>>
>> Hi,
>>
>> On 01/19/18 14:46, Youzhong Yang wrote:
>> > Just wondering if anyone has seen the same issue, or it's just me.
>>
>> we're using debian with our own backported kernels and ceph, works rock
>> solid.
>>
>> what you're describing sounds more like hardware issues to me. if you
>> don't fully "trust"/have confidence in your hardware (and your logs
>> don't reveal anything), I'd recommend running some burn-in tests
>> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
>> cpu/ram/etc. issues.
>>
>> Regards,
>> Daniel
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>


--
Cheers,
Brad

 



