Re: Instance filesystem corrupt

We use OpenStack with Ceph.

Recently we have found many incidents of filesystem corruption on our instances.

The operating systems on these instances include Red Hat, CentOS, and Windows.

The filesystems include ext3, ext4, XFS, and NTFS.

We are trying to find the root cause.

From: Brian :: [mailto:bc@xxxxxxxx]
Sent: Friday, October 28, 2016 10:43 AM
To: Keynes Lee/WHQ/Wistron <Keynes_Lee@xxxxxxxxxxx>
Cc: ahmedmostafadev@xxxxxxxxx; dillaman@xxxxxxxxxx; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Instance filesystem corrupt

 

What is the issue exactly?

 

 

On Fri, Oct 28, 2016 at 2:47 AM, <Keynes_Lee@xxxxxxxxxxx> wrote:

I think this issue may not be related to your poor hardware.

Our cluster has 3 Ceph monitors and 4 OSD nodes.

Each server has 2 CPUs (Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz) and 32 GB of memory.

Each OSD node has 2 SSDs as journal disks and 8 SATA disks (6 TB, 7200 rpm).

All of them are connected to each other by 4 x 10 Gbps links bonded with 802.3ad.

The utilization of our Ceph cluster is only 13%, and most of the time IOPS stays under 1500.

We are still getting this issue.

From: Ahmed Mostafa [mailto:ahmedmostafadev@xxxxxxxxx]
Sent: Friday, October 28, 2016 6:30 AM
To: dillaman@xxxxxxxxxx
Cc: Keynes Lee/WHQ/Wistron <Keynes_Lee@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Instance filesystem corrupt

 

So I couldn't actually wait until the morning.

I set rbd cache to false and tried to create the same number of instances, but the same issue happened again.

I want to note that if I reboot any of the virtual machines that have this issue, it works without any problem afterwards.

Does this mean that over-utilization could be the cause of my problem? The cluster I have has bad hardware, and this is the only logical explanation I can reach.

By bad hardware I mean Core i5 processors, for instance, and I can see %wa reaching 50-60% too.

Thank you
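For context on the %wa figure mentioned above: the %wa that top reports is, roughly, the share of CPU time spent idle while I/O was outstanding, computed from the difference between two readings of the aggregate `cpu` line in /proc/stat. A high value points at storage latency rather than a lack of CPU. The minimal Python sketch below shows that arithmetic; the function names are hypothetical, and it assumes a Linux host with the usual /proc/stat field order.

```python
from typing import List, Sequence


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: Sequence[int], after: Sequence[int]) -> float:
    """Percentage of CPU time spent in iowait between two samples.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    Guest time is already counted inside user, so only the first 8 fields
    are summed for the total.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

On a hypervisor, one would take `parse_cpu_line(read_cpu_line())`, sleep a second or so, take a second sample, and pass both to `iowait_percent`.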

 

 

On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:

The only effect I could see from a highly overloaded system would be that the OSDs might appear unresponsive to the VMs. Are any of you using cache tiering or the librbd cache? For the latter, there was one issue [1] that can result in read corruption and that affects Hammer and prior releases.
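Both the librbd cache question above and Ahmed's test of setting `rbd cache` to false depend on what the client configuration actually says. The following Python sketch reads ceph.conf-style text and reports whether `rbd cache` is set explicitly. It is an illustration, not a Ceph tool: the function names are hypothetical, and it assumes the usual precedence of `[client.<name>]` over `[client]` over `[global]`, treats underscores and spaces in option names as equivalent, and strips `#`/`;` comments. It only sees the file; runtime overrides and the QEMU/libvirt cache mode can also influence the effective behaviour, so treat the result as a hint. It returns `None` when the option is absent rather than guessing a default.

```python
import re
from typing import Dict, Optional

TRUE_VALUES = {"true", "yes", "on", "1"}
FALSE_VALUES = {"false", "no", "off", "0"}


def normalize_key(key: str) -> str:
    """Lower-case an option name and treat '_' and runs of spaces alike."""
    return re.sub(r"[\s_]+", " ", key.strip().lower())


def parse_ceph_conf(text: str) -> Dict[str, Dict[str, str]]:
    """Parse ceph.conf-style INI text into {section: {normalized_key: value}}.

    Indented keys are accepted (configparser would treat them as
    continuation lines), and '#' or ';' start a comment.
    """
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections.setdefault(current, {})
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            value = re.split(r"[#;]", value, maxsplit=1)[0].strip()
            sections[current][normalize_key(key)] = value
    return sections


def rbd_cache_setting(text: str, client: Optional[str] = None) -> Optional[bool]:
    """Return True/False if 'rbd cache' is set explicitly, else None."""
    sections = parse_ceph_conf(text)
    # Search from most to least specific section.
    order = ([f"client.{client}"] if client else []) + ["client", "global"]
    for name in order:
        value = sections.get(name, {}).get("rbd cache")
        if value is None:
            continue
        value = value.lower()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
        raise ValueError(f"unrecognised boolean for rbd cache in [{name}]: {value!r}")
    return None
```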

 

 

On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <ahmedmostafadev@xxxxxxxxx> wrote:

This is more or less the same behaviour I see in my environment.

By any chance, is anyone running their OSDs and their hypervisors on the same machine?

And could a high workload, such as starting 40 to 60 or more virtual machines, have an effect on this problem?

 

 
On Thursday, 27 October 2016, <Keynes_Lee@xxxxxxxxxxx> wrote:

 

Most of the filesystem corruption caused the instances to crash, and we saw it after a shutdown or restart (triggered either by the OpenStack portal buttons or by OS commands inside the instances).

Some cases were detected early: we saw filesystem errors in the OS logs on the instances.

We then ran a filesystem check (fsck / chkdsk) immediately, and the issue was fixed.

 


Keynes Lee
Direct: +886-2-6612-1025
Mobile: +886-9-1882-3787
Fax: +886-2-6612-1991
E-Mail: keynes_lee@xxxxxxxxxxx
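Keynes notes that some corruption was caught early through filesystem errors in the instances' OS logs, followed by an immediate fsck / chkdsk. For Linux guests, that triage step could be sketched in Python as a scan of kernel log text (for example saved `dmesg` output) for messages that usually accompany filesystem damage. The patterns and the `scan_log_lines` helper are illustrative assumptions, not an exhaustive list, and message wording varies between kernel versions. Windows guests report NTFS problems through the Event Log instead (for example Ntfs event ID 55), which this sketch does not read.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative patterns only: kernel messages that commonly indicate
# filesystem or block-level trouble. Not exhaustive.
FS_ERROR_PATTERNS = [
    ("ext3", re.compile(r"EXT3-fs error")),
    ("ext4", re.compile(r"EXT4-fs error")),
    ("xfs", re.compile(r"XFS \([^)]+\):.*corruption", re.IGNORECASE)),
    # Matches both "Buffer I/O error on dev" and "... on device".
    ("block-io", re.compile(r"Buffer I/O error on dev")),
]


def scan_log_lines(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, label, text) for each line matching a pattern."""
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in FS_ERROR_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label, line.rstrip("\n")))
                break  # one label per line is enough for triage
    return hits
```

Any hit would be a cue to schedule a filesystem check on that instance before the next shutdown or restart, in line with the fsck-first approach described above.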

 

 

From: Jason Dillaman [mailto:jdillama@xxxxxxxxxx]
Sent: Wednesday, October 26, 2016 9:38 PM
To: Keynes Lee/WHQ/Wistron <Keynes_Lee@xxxxxxxxxxx>
Cc: Will.Boege@xxxxxxxxxx; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt

 

I am not aware of any similar reports against librbd on Firefly. Do you use any configuration overrides? Does the filesystem corruption appear while the instances are running, or only after a shutdown / restart of the instance?

 

On Wed, Oct 26, 2016 at 12:46 AM, <Keynes_Lee@xxxxxxxxxxx> wrote:

No, we are using Firefly (0.80.7).

We are using HPE Helion OpenStack 2.1.5, and the Ceph version embedded in it is Firefly.

An upgrade is planned, but it will not happen soon.

From: Will.Boege [mailto:Will.Boege@xxxxxxxxxx]
Sent: Wednesday, October 26, 2016 12:03 PM
To: Keynes Lee/WHQ/Wistron <Keynes_Lee@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt

 

Just out of curiosity, did you recently upgrade to Jewel?

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of "Keynes_Lee@xxxxxxxxxxx" <Keynes_Lee@xxxxxxxxxxx>
Date: Tuesday, October 25, 2016 at 10:52 PM
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: [EXTERNAL] [ceph-users] Instance filesystem corrupt

 

We are using OpenStack + Ceph.

Recently we have found many incidents of filesystem corruption on our instances.

Some of them are correctable and were fixed by fsck, but others were not so lucky: they are simply corrupt and can never start up again.

We have found this issue on instances running various operating systems: Red Hat 4, CentOS 7, and Windows 2012.

Could someone please advise us on a troubleshooting direction?

 

 


 

 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 

--

Jason



 

