Ceph / Debian 11 guest / corrupted file system

Dear Ceph users,
I have a Ceph cluster (version 16.2.7) running on Proxmox 7 (Debian 11), with a Debian 11 (11.3) guest.

On this particular guest, roughly every 3-4 weeks the file system is remounted read-only. I then have to reboot the system and check the filesystem manually, after which it works again.

In the syslog I find the following entries:

------------ snip ------------------
[Do Jul  7 13:11:43 2022] device-mapper: uevent: version 1.0.3
[Do Jul  7 13:11:43 2022] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@xxxxxxxxxx
[Do Jul  7 13:11:45 2022] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[Do Jul  7 13:11:45 2022] JFS: nTxBlock = 8192, nTxLock = 65536
[Do Jul  7 13:11:45 2022] QNX4 filesystem 0.2.3 registered.
[Do Jul  7 13:11:46 2022] raid6: sse2x4   gen()  8327 MB/s
[Do Jul  7 13:11:46 2022] raid6: sse2x4   xor()  3454 MB/s
[Do Jul  7 13:11:46 2022] raid6: sse2x2   gen()  6684 MB/s
[Do Jul  7 13:11:46 2022] raid6: sse2x2   xor()  7149 MB/s
[Do Jul  7 13:11:46 2022] raid6: sse2x1   gen()  6194 MB/s
[Do Jul  7 13:11:46 2022] raid6: sse2x1   xor()  5981 MB/s
[Do Jul  7 13:11:46 2022] raid6: using algorithm sse2x4 gen() 8327 MB/s
[Do Jul  7 13:11:46 2022] raid6: .... xor() 3454 MB/s, rmw enabled
[Do Jul  7 13:11:46 2022] raid6: using intx1 recovery algorithm
[Do Jul  7 13:11:46 2022] xor: measuring software checksum speed
[Do Jul  7 13:11:46 2022]    prefetch64-sse  : 12793 MB/sec
[Do Jul  7 13:11:46 2022]    generic_sse     :  9978 MB/sec
[Do Jul  7 13:11:46 2022] xor: using function: prefetch64-sse (12793 MB/sec)
[Do Jul  7 13:11:46 2022] Btrfs loaded, crc32c=crc32c-generic
[Fr Jul  8 14:11:35 2022] EXT4-fs (vdc): error count since last fsck: 7
[Fr Jul  8 14:11:35 2022] EXT4-fs (vdc): initial error at time 1654815634: ext4_check_bdev_write_error:215
[Fr Jul  8 14:11:35 2022] EXT4-fs (vdc): last error at time 1657148419: ext4_journal_check_start:83
[Sa Jul  9 15:49:30 2022] EXT4-fs (vdc): error count since last fsck: 7
[Sa Jul  9 15:49:30 2022] EXT4-fs (vdc): initial error at time 1654815634: ext4_check_bdev_write_error:215
[Sa Jul  9 15:49:30 2022] EXT4-fs (vdc): last error at time 1657148419: ext4_journal_check_start:83
[So Jul 10 17:27:26 2022] EXT4-fs (vdc): error count since last fsck: 7
[So Jul 10 17:27:26 2022] EXT4-fs (vdc): initial error at time 1654815634: ext4_check_bdev_write_error:215
[So Jul 10 17:27:26 2022] EXT4-fs (vdc): last error at time 1657148419: ext4_journal_check_start:83
------------ snip ------------------

First of all, I don't know why there are any messages about raid6 at all, since I don't use RAID on the guest.

Anyway, what troubles me are the EXT4 errors above. Why are these file system errors happening? It seems they pile up, and after some time there are too many and the file system switches to read-only.
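
For reference, the two timestamps from the messages can be decoded, and the error record that ext4 keeps in the superblock can be read directly; a minimal sketch of what I looked at, assuming the affected device really is /dev/vdc:

------------ snip ------------------
# Decode the unix timestamps from the log messages above
date -d @1654815634    # initial error
date -d @1657148419    # last error

# Show the error fields stored in the ext4 superblock
dumpe2fs -h /dev/vdc | grep -iE 'error|mount'
------------ snip ------------------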

The strange thing is that the other guests running on this cluster (and on the same node) show no such errors at all.

What I can think of is the following:

- The underlying storage consists of 4 nodes, each with 2 × 8 TB Toshiba hard disks. None of the disks shows any SMART errors, by the way.
- However, the storage can be VERY slow from time to time (see the commands below).
- Maybe Debian 11 has some timeout, so that when the storage is temporarily slow, this "ext4_check_bdev_write_error" happens?
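
When the storage gets slow, it might be worth checking whether the cluster itself reports slow requests at that moment; a rough sketch of what I would look at (standard Ceph commands, "osd.0" is just a placeholder):

------------ snip ------------------
# Overall cluster state and any slow-ops warnings
ceph -s
ceph health detail

# Per-OSD latencies, to spot a single slow disk
ceph osd perf

# Requests currently in flight on a suspect OSD (run on that OSD's node)
ceph daemon osd.0 dump_ops_in_flight
------------ snip ------------------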

Is this a reasonable explanation, and if so, is there a way to increase this timeout?
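
Independent of the timeout question, it might also be worth checking which error policy the filesystem is mounted with (errors=remount-ro vs. errors=continue); a small sketch, again assuming /dev/vdc:

------------ snip ------------------
# Current mount options for the affected filesystem (look for errors=remount-ro)
mount | grep vdc

# Default error behavior stored in the superblock; it can be changed with tune2fs -e
tune2fs -l /dev/vdc | grep -i 'errors behavior'
------------ snip ------------------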

Maybe it makes sense to switch from VirtIO (block) to VirtIO SCSI, since such errors might be handled better there?
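
One practical difference I am aware of: disks attached via VirtIO SCSI show up as sdX and expose a per-device command timeout in sysfs, while virtio-blk devices (vdX) do not; a sketch of how that could be checked and raised on a VirtIO SCSI disk (the device name is just an example):

------------ snip ------------------
# Only exists for SCSI devices (sdX), not for virtio-blk (vdX)
cat /sys/block/sda/device/timeout      # default is usually 30 seconds

# Raise it temporarily, e.g. to 180 seconds
echo 180 > /sys/block/sda/device/timeout
------------ snip ------------------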

Or could it be that there is some kind of strange data loss in my Ceph setup, and if so, what can I do?
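
Regarding the data-loss question, one way to get more confidence would be to check whether scrubbing reports any inconsistencies on the pool backing this image; a rough sketch (the pool name "vm-pool" is just a placeholder):

------------ snip ------------------
# Any inconsistent PGs would show up here
ceph health detail | grep -i inconsist

# Deep-scrub the pool backing the image, then list inconsistent PGs, if any
ceph osd pool deep-scrub vm-pool
rados list-inconsistent-pg vm-pool
------------ snip ------------------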

Best Regards,
Hermann


--
hermann@xxxxxxx
PGP/GPG: 299893C7 (on keyservers)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


