Re: VM going down

Alessandro Briosi <ab1@xxxxxxxxxxx> · Thu, 11 May 2017 15:49:27 +0200

    On 11/05/2017 14:09, Niels de Vos wrote:

      On Thu, May 11, 2017 at 12:35:42PM +0530, Krutika Dhananjay wrote:

        Niels,

Alessandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.

      Yes, but if sharding had been enabled, the seek FOP would have been
handled correctly (detected as not supported at all).

I'm still not sure how arbiter prevents sharding, though. We normally
advise using sharding *and* (optionally) an arbiter for VM workloads;
arbiter without sharding has not been tested much. In addition, the seek
functionality is only available in recent kernels, so there has been
little testing on CentOS or similar enterprise Linux distributions.

    Where is it stated that arbiter should be used with sharding?

    Or that arbiter functionality without sharding is still in "testing"
    phase?

    I thought that having a full replica 3 on a 3-node cluster would
    have been a waste of space. (I can only tolerate losing 1 host at a
    time, and that's fine.)

    Anyway, this also happened before with the same VM when there was
    no arbiter, and I thought that for some strange reason it was a
    "quorum" issue that made the file unavailable in gluster, though
    there were no clues in the logs.

    So I added the arbiter brick, but it happened again last week.

    The VM I first reported as going down was created on a volume
    with arbiter enabled from the start, so I doubt it has anything to
    do with arbiter.

    I think it might have something to do with a load problem, though
    the hosts are really not being used that much.

    Anyway this is a brief description of my setup.

    3 Dell servers with RAID 10 SAS disks.

    Each server has 2 bonded 1Gbps Ethernet ports dedicated to gluster
    (2 more are dedicated to the proxmox cluster and 2 to communication
    with the hosts on the LAN), each group on its own VLAN in the switch.

    Jumbo frames are also enabled on the Ethernet ports and switches.

    Each server is a proxmox host which has gluster installed and
    configured as both server and client.

    The RAID holds a thin-provisioned LVM setup, which is divided into
    3 bricks (2 big ones for the data and 1 small one for the arbiter).

    Each thin LVM volume is formatted with XFS and mounted as a brick.

    There are 3 volumes configured as replica 3 with arbiter (so only 2
    bricks really hold the data).

    Volumes are:

    datastore1: data on srv1 and srv2, arbiter srv3

    datastore2: data on srv2 and srv3, arbiter srv1

    datastore3: data on srv1 and srv3, arbiter srv2
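
    As a rough illustration of the layout above (only a sketch: the
    dictionary and function names are made up for this message and are
    not produced by any gluster tool), the brick placement can be written
    down and checked so that each server carries 2 data bricks and 1
    arbiter brick:

```python
from collections import Counter

# Brick placement as described above: two data bricks plus one arbiter
# per volume. The structure and names are illustrative only.
VOLUMES = {
    "datastore1": {"data": ["srv1", "srv2"], "arbiter": "srv3"},
    "datastore2": {"data": ["srv2", "srv3"], "arbiter": "srv1"},
    "datastore3": {"data": ["srv1", "srv3"], "arbiter": "srv2"},
}


def bricks_per_server(volumes):
    """Count the data and arbiter bricks hosted on each server."""
    data = Counter()
    arbiter = Counter()
    for layout in volumes.values():
        data.update(layout["data"])
        arbiter[layout["arbiter"]] += 1
    return data, arbiter


def volumes_hit_by(server, volumes):
    """For each volume, report which brick role (if any) lives on `server`."""
    result = {}
    for name, layout in volumes.items():
        if server in layout["data"]:
            result[name] = "data"
        elif server == layout["arbiter"]:
            result[name] = "arbiter"
        else:
            result[name] = None
    return result


if __name__ == "__main__":
    data, arbiter = bricks_per_server(VOLUMES)
    for srv in sorted(data):
        print(srv, "data bricks:", data[srv], "arbiter bricks:", arbiter[srv])
    print(volumes_hit_by("srv1", VOLUMES))
```

    With this layout, losing any single host takes away one brick from
    every volume (a data brick on two of them and the arbiter on the
    third), which is why I can only afford to lose 1 host at a time.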

    Each datastore basically hosts one main VM, plus some others that
    are less important, so there are 3 important VMs in total.

    datastore1 was converted from replica 2 to replica 3 with arbiter;
    the other 2 were created as described.

    The VM on the first datastore crashed several times (even when there
    was no arbiter, which made me think that for some reason there was a
    split brain which gluster could not handle).

    Last week the 2nd VM (on datastore2) also crashed, and that's when I
    started the thread (before that, as no special errors were logged, I
    thought it could have been caused by something in the VM).

    So far the 3rd VM has never crashed.

    Still, any help on this would be really appreciated.

    I know it could also be a problem somewhere else, but I have other
    setups without gluster which simply work.

    That's why I want to start the VM with gdb, to check next time why
    the kvm process shuts down.

    Alessandro
