Could it be that, since this volume is carved out of a thin pool, the free space in the underlying pool is nearly or completely exhausted, and the device mapper is busier doing garbage collection than servicing actual application I/O? I've seen many nightmares with thin-provisioned volumes when this is not managed properly.

Have you run "fstrim", or is the filesystem mounted with the "-o discard" option? Check the pool state with, for example, "lvs -o lv_full_name,lv_health_status,lv_when_full".

If it is a space issue, make sure that you have enough free space in your volume groups and configure "thin_pool_autoextend_threshold" and "thin_pool_autoextend_percent" in lvm.conf. Also turn on "monitoring" if it is not enabled by default.

Another possible cause: the filesystem has the "-o discard" mount option set, device mapper tries to propagate the discards to the underlying hardware (or, in your case, a hypervisor, which may in turn pass them through to the underlying storage array), and that hardware does not support them. The "Inappropriate ioctl for device" message hints in that direction.

Have any volumes been moved to other provisioned disks, or have there been changes on the hypervisor?

Cheers
Erwin

On 10/10/24 00:52, David Teigland wrote:
> On Wed, Oct 09, 2024 at 06:28:26AM -0300, Fabricio Winter wrote:
>> Hello people, we have been experiencing an issue with lvm2-thin on
>> _some_ of our production servers where out of nowhere
>> lvm2/device-mapper starts spamming error logs and I can't really seem
>> to trace down the root cause.
>>
>> This is what the logs look like;
>> Oct 9 06:25:02 U5bW8JT7 lvm[8020]: device-mapper: waitevent ioctl on
>> LVM-CP5Gw8QrWLqwhBcJL87R1mc9Q9KTBtQQmOowipTAFuM7hqzHz6pRVvUaNO9FGzeq-tpool
>> failed: Inappropriate ioctl for device
>> Oct 9 06:25:02 U5bW8JT7 lvm[8020]: waitevent: dm_task_run failed:
>> Inappropriate ioctl for device
> It appears related to dmeventd monitoring the thin pools, and the kernel
> returning ENOTTY when dmeventd does the DM_DEV_WAIT ioctl. Maybe there's
> a fast retry loop in dmeventd on that error case rather than quitting.
> I wonder if there's a way you could kill dmeventd when this happens.
>
> Dave
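
As a rough illustration of the free-space check suggested at the top of this reply, here is a minimal sketch that flags thin pools and thin volumes whose data usage has reached a given percentage. The function name "over_threshold" and the 80 percent limit are assumptions made for this example, not part of LVM; the only LVM pieces used are the "lv_full_name" and "data_percent" report fields of "lvs".

```shell
#!/bin/sh
# Hypothetical helper (not part of LVM): print every LV whose data usage
# is at or above the limit given as the first argument (in percent).
# Input on stdin: lines of "<vg/lv> <data_percent>", as produced by
#   lvs --noheadings -o lv_full_name,data_percent
# LVs that report no data_percent (non-thin LVs) have only one field
# and are skipped.
over_threshold() {
    awk -v limit="$1" 'NF >= 2 && $2 + 0 >= limit + 0 { print $1 }'
}
```

On the affected host this could be run as "lvs --noheadings -o lv_full_name,data_percent | over_threshold 80" (root is normally required for lvs). If the tpool volume shows up in the output, the space theory becomes more likely; if nothing is listed, the discard path is the better place to look.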