On 12/05/2017 11:36, Niels de Vos wrote:
> On Thu, May 11, 2017 at 03:49:27PM +0200, Alessandro Briosi wrote:
>> On 11/05/2017 14:09, Niels de Vos wrote:
>>> On Thu, May 11, 2017 at 12:35:42PM +0530, Krutika Dhananjay wrote:
>>>> Niels,
>>>>
>>>> Alessandro's configuration does not have shard enabled. So it has
>>>> definitely not got anything to do with shard not supporting the seek FOP.
>>> Yes, but if sharding had been enabled, the seek FOP would be
>>> handled correctly (detected as not supported at all).
>>>
>>> I'm still not sure how arbiter prevents doing shards though. We normally
>>> advise to use sharding *and* (optionally) arbiter for VM workloads;
>>> arbiter without sharding has not been tested much. In addition, the seek
>>> functionality is only available in recent kernels, so there has been
>>> little testing on CentOS or similar enterprise Linux distributions.
>> Where is it stated that arbiter should be used with sharding?
>> Or that arbiter functionality without sharding is still in a "testing" phase?
>> I thought that having a 3-way replica on a 3-node cluster would have been
>> a waste of space. (I only need to tolerate losing 1 host at a time, and
>> that's fine.)
> There is no "arbiter should be used with sharding"; our recommendation
> is to use sharding for VM workloads, with an optional arbiter. But we
> still expect VMs on non-sharded volumes to work just fine, with or
> without arbiter.

Sure, and I'd like to use it. Though, as there was a corruption bug
recently, I preferred not to use it yet.

>> Anyway, I had this happen before as well with the same VM, when there was
>> no arbiter, and I thought it was for some strange reason a "quorum" thing
>> which made the file unavailable in gluster, though there were no clues
>> in the logs.
>> So I added the arbiter brick, but it happened again last week.
> If it is always the same VM, I wonder if there could be a small
> filesystem corruption in that VM? Were there any actions done on the
> storage of that VM, like resizing the block device (VM image) or
> something like that? Systems can sometimes try to access data outside of
> the block device when it was resized but the filesystem on the block
> device was not. This would 'trick' the filesystem into thinking it has
> more space to access than the block device has. If the filesystem access
> in the VM is 'past the block device', and this gets through to Gluster,
> which does a seek with that too-large offset, the log you posted would
> be a result.
>
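Checking for that kind of mismatch from inside the guest could look
roughly like the sketch below (device names are only examples; the
tune2fs line assumes an ext4 filesystem, with xfs_info being the XFS
equivalent):

  # size of the block device backing the filesystem, in bytes
  blockdev --getsize64 /dev/vda1

  # size the filesystem believes it has: "Block count" * "Block size"
  tune2fs -l /dev/vda1 | grep -Ei 'block count|block size'

  # for XFS, report blocks and block size of a mounted filesystem instead
  xfs_info /

If the filesystem claims more blocks than the device actually provides,
a read near its end would translate into exactly the kind of
out-of-range seek described above.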
The problem was on only 1 VM, but now it has extended to another one;
that's why I started reporting it.

>> The first VM I reported about going down was created on a volume with
>> arbiter enabled from the start, so I doubt it has anything to do with
>> arbiter.
>>
>> I think it might have something to do with a load problem? Though the
>> hosts are really not being used that much.
>>
>> Anyway, this is a brief description of my setup.
>>
>> 3 Dell servers with RAID 10 SAS disks.
>> Each server has 2 bonded 1 Gbps ethernets dedicated to gluster (plus 2
>> dedicated to the Proxmox cluster and 2 for communication with the hosts
>> on the LAN), each pair on its own VLAN in the switch.
>> Jumbo frames are also enabled on the NICs and switches.
>>
>> Each server is a Proxmox host which has gluster installed and configured
>> as both server and client.
> Do you know how Proxmox accesses the VM images? Does it use QEMU+gfapi
> or is it all over a FUSE mount? New versions of QEMU+gfapi have seek
> support, and only new versions of the Linux kernel support seek over
> FUSE. In order to track where the problem may be, we need to look into
> the client (QEMU or FUSE) that does the seek with an invalid offset.

It uses QEMU+gfapi AFAIK:

-drive file=gluster://srvpve1g/datastore1/images/101/vm-101-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on
-device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100

>> The RAID holds a thin-provisioned LVM pool which is divided into 3 bricks
>> (2 big ones for the data and 1 small one for the arbiter).
>> Each thin LV is XFS-formatted and mounted as a brick.
>> There are 3 volumes configured as replica 3 with arbiter (so 2 bricks
>> really holding the data).
>> The volumes are:
>> datastore1: data on srv1 and srv2, arbiter on srv3
>> datastore2: data on srv2 and srv3, arbiter on srv1
>> datastore3: data on srv1 and srv3, arbiter on srv2
>>
>> On each datastore there is basically one main VM (plus some others
>> which, though, are not so important). (3 VMs are the important ones.)
>>
>> datastore1 was converted from replica 2 to replica 3 with arbiter; the
>> other 2 were created as described.
>>
>> The VM on the first datastore crashed several times (even when there was
>> no arbiter, which made me think there was for some reason a split brain
>> which gluster could not handle).
>>
>> Last week the 2nd VM (on datastore2) also crashed, and that's when I
>> started this thread (before that, as there were no special errors logged,
>> I thought it could have been caused by something in the VM).
>>
>> Till now the 3rd VM has never crashed.
>>
>> Still, any help on this would be really appreciated.
>>
>> I know it could also be a problem somewhere else, but I have other
>> setups without gluster which simply work.
>> That's why I want to start the VM with gdb, to check next time why the
>> kvm process shuts down.
> If the problem in the log from the brick is any clue, I would say that
> QEMU aborts when the seek failed. Somehow the seek got executed with a
> too-high offset (past the size of the file), and that returned an
> error.
>
> We'll need to find out what makes QEMU (or FUSE) think the file is
> larger than it actually is on the brick. If you have a way of reproducing
> it, you could enable more verbose logging on the client side (the
> diagnostics.client-log-level volume option), but if you run many VMs,
> that may accumulate a lot of logs.
>
> You probably should open a bug so that we have all the troubleshooting
> and debugging details in one location. Once we find the problem we can
> move the bug to the right component.
> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
>
> HTH,
> Niels

The thing is that when the VM is down and I check the logs, there's
nothing. Then, when I start the VM, the logs get populated with the
seek error.

Anyway, I'll open a bug for this.

Alessandro
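For reference, the two debugging steps discussed above (more verbose
client-side logging and catching the abort in gdb) could look roughly
like the sketch below; the volume name and the pgrep pattern are only
taken from the examples earlier in the thread, and attaching gdb pauses
the VM until "continue" is issued:

  # raise the client log level for one volume while reproducing,
  # then reset it, since DEBUG output accumulates quickly
  gluster volume set datastore1 diagnostics.client-log-level DEBUG
  gluster volume reset datastore1 diagnostics.client-log-level

  # attach gdb to the kvm process of the affected VM (the pgrep pattern
  # matches the disk image name from the -drive line above)
  gdb -p $(pgrep -f vm-101-disk-1)
  (gdb) set pagination off
  (gdb) continue
  # once the process aborts, collect backtraces from all threads
  (gdb) thread apply all bt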