Re: VM hangs when overwriting a file on erasure coded RBD

Hi all,

I would like to follow up on this: it turns out that overwriting the file doesn't actually hang, it is just extremely slow, on the order of several minutes. The process is busy in a syscall, reading large amounts of what I assume is filesystem metadata, until the operation finally completes.
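For anyone hitting something similar, a quick way to see where the process is spending its time (a sketch; <pid> and the device name are placeholders):

    # Which syscall the process is blocked in, with per-call timing
    strace -f -T -p <pid>

    # Kernel-side stack of the blocked task (needs root)
    cat /proc/<pid>/stack

    # Confirm the time goes to reads on the backing device
    iostat -x 1 /dev/vdb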

The thing is, this VM has been randomly creating, deleting, overwriting, and reading random files on a 25TB image for almost 2 years, with the image staying around 50% full. I have copied all the data to a new image (with a fresh filesystem) and things are fast again, so I don't think this is Ceph's fault any more. Instead it looks like something in ext4 that degraded over time.
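If anyone wants to inspect an aging ext4 for this kind of degradation, these read-only checks are a starting point (a sketch; /dev/vdb1 and /mnt/data are placeholders for your device and mount point):

    # Non-destructive consistency check (-n makes no changes)
    e2fsck -fn /dev/vdb1

    # Superblock and allocation statistics
    dumpe2fs -h /dev/vdb1

    # Filesystem-wide fragmentation report (-c only reports, changes nothing)
    e4defrag -c /mnt/data

    # Extent layout of a file that overwrites slowly
    filefrag -v /mnt/data/some-file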

Sending this to the list in case someone else has a similar problem in the future.

/Peter


On 2023-09-29 at 19:02, peter.linder@xxxxxxxxxxxxxx wrote:
Yes, this is all set up. It was working fine until the problem where the OSD host lost the cluster/sync network occurred.

There are a few other VMs that keep running along fine without this issue. I've restarted the problematic VM without success (that is, creating a file works, but overwriting it still hangs right away). fsck runs fine, so reading the whole image works.
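One way to narrow it down further would be to test the write path at the RBD layer, outside the guest filesystem entirely. A sketch, with placeholder pool/image names, using a throwaway image since bench writes destroy data:

    # Scratch image in the same metadata/data pools as the VM disk
    rbd create --size 10G --data-pool rbd_data rbd_meta/benchtest

    # Small random writes, exercising the EC overwrite path
    rbd bench --io-type write --io-pattern rand --io-size 4K --io-total 256M rbd_meta/benchtest

    rbd rm rbd_meta/benchtest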

I'm kind of stumped as to what can cause this.

Because of the lengthy recovery, and the PG autoscaler currently doing things, there are lots of PGs that haven't been scrubbed, but I doubt that is an issue here.


On 2023-09-29 at 18:52, Anthony D'Atri wrote:
EC for RBD wasn't possible until Luminous, IIRC, so I had to ask. Do you have a replicated metadata pool defined? Does Proxmox know that this is an EC pool? When connecting, it needs to know both the metadata and data pools.
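For readers following along, an EC-backed RBD image is typically arranged along these lines (a sketch; pool names, PG counts, and sizes are placeholders):

    # Replicated pool that holds the image metadata
    ceph osd pool create rbd_meta 64 64 replicated

    # Erasure coded data pool, with overwrites enabled so RBD can use it
    ceph osd pool create rbd_data 64 64 erasure
    ceph osd pool set rbd_data allow_ec_overwrites true

    ceph osd pool application enable rbd_meta rbd
    ceph osd pool application enable rbd_data rbd

    # The image lives in the replicated pool; its data goes to the EC pool
    rbd create --size 25T --data-pool rbd_data rbd_meta/vmdisk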

On Sep 29, 2023, at 12:49, peter.linder@xxxxxxxxxxxxxx wrote:

(sorry for duplicate emails)

This turns out to be a good question actually.

The cluster is running Quincy, 17.2.6.

The compute node running the VM is Proxmox, version 7.4-3. Supposedly this is fairly new, but librbd1 claims to be version 14.2.21 when I check with "apt list". We are not using Proxmox's own Ceph release. We haven't had any issues with this setup before, but until now we had been using neither erasure coded pools nor had a node half-dead for such a long time.
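To see which librbd the running qemu process has actually loaded, rather than what the package index reports (a sketch; <qemu-pid> is a placeholder):

    # Package version according to dpkg
    dpkg -l librbd1

    # The librbd actually mapped into the running qemu process
    grep librbd /proc/<qemu-pid>/maps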

The VM is configured through Proxmox, which is not libvirt but similar, and krbd is not enabled, so it should be going through librbd. I don't know for sure whether Proxmox links its own librbd into qemu/kvm.
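In Proxmox the kernel client is a per-storage setting, so it should be visible in the storage definition (a sketch; the storage stanza and <vmid> are placeholders):

    # A 'krbd 1' line in the rbd storage stanza would force the kernel client
    grep -A6 '^rbd:' /etc/pve/storage.cfg

    # Disk configuration of the VM in question
    qm config <vmid>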

"ceph features" looks like this:

{
     "mon": [
         {
             "features": "0x3f01cfbf7ffdffff",
             "release": "luminous",
             "num": 5
         }
     ],
     "osd": [
         {
             "features": "0x3f01cfbf7ffdffff",
             "release": "luminous",
             "num": 24
         }
     ],
     "client": [
         {
             "features": "0x3f01cfb87fecffff",
             "release": "luminous",
             "num": 4
         },
         {
             "features": "0x3f01cfbf7ffdffff",
             "release": "luminous",
             "num": 12
         }
     ],
     "mgr": [
         {
             "features": "0x3f01cfbf7ffdffff",
             "release": "luminous",
             "num": 2
         }
     ]
}

Regards,

Peter


On 2023-09-29 at 17:55, Anthony D'Atri wrote:
Which Ceph releases are installed on the VM and the back end?  Is the VM using librbd through libvirt, or krbd?

On Sep 29, 2023, at 09:09, Peter Linder <peter.linder@xxxxxxxxxxxxxx> wrote:

Dear all,

I have a problem: after an OSD host lost connection to the sync/cluster (rear) network for many hours (the public network stayed online), a test VM using RBD can't overwrite its files. I can create a new file inside it just fine, but not overwrite one; the process just hangs.
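A minimal way to show the difference inside the guest (a sketch; the path is a placeholder):

    # Writing a fresh file completes quickly
    dd if=/dev/zero of=/mnt/test/file bs=1M count=100 oflag=direct

    # Rewriting the same blocks in place is what hangs
    # (conv=notrunc skips truncation, so existing blocks are overwritten)
    dd if=/dev/zero of=/mnt/test/file bs=1M count=100 oflag=direct conv=notrunc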

The VM's disk is on an erasure coded data pool, with a replicated pool in front of it. EC overwrites are on for the pool.
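This can be confirmed on the pool itself (placeholder pool name):

    ceph osd pool get rbd_data allow_ec_overwrites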

The cluster consists of 5 hosts with 4 OSDs each, plus separate hosts for compute. There are separate public and cluster networks. In this case, the AOC cable to the cluster network went link-down on one host and had to be replaced, and the host was rebooted. The host was half-down like this for about 12 hours; recovery then took about a week to complete.

I have some other VMs as well, with images in the same pool (4 in total), and they seem to work fine; it is just this one that can't overwrite.

I'm thinking there is somehow something wrong with just this image?
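Some image-level checks that might tell (a sketch; pool/image names are placeholders):

    # Image layout, features, and which data pool it uses
    rbd info rbd_meta/vmdisk

    # Clients currently watching (holding open) the image
    rbd status rbd_meta/vmdisk

    # Validate the object map, if that feature is enabled
    rbd object-map check rbd_meta/vmdisk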

Regards,

Peter
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



