Hi

On 2017-02-01 05:35, Vijay Bellur wrote:
> On Mon, Jan 30, 2017 at 6:25 AM, Niklaus Hofer
> <niklaus.hofer@xxxxxxxxxxxxxxxxx> wrote:
>
>> Hi
>>
>> I have a question concerning the 'correct' behaviour of GlusterFS:
>> We have a nice Gluster setup up and running. Most things are working
>> nicely. Our setup is as follows:
>>  - Storage is a 2+1 Gluster setup (2 replicating hosts + 1 arbiter)
>>    with a volume for virtual machines.
>>  - Two virtualisation hosts running libvirt / qemu / kvm.
>
> Are you using something like oVirt or proxmox for managing your
> virtualization cluster?
We are using our own fork of FOSS-Cloud [0]; FOSS-Cloud uses libvirt underneath. We are currently on Qemu 2.1.3 and libvirt 1.2.8.
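For reference, the 2+1 layout mentioned above is a replica-3 volume with one arbiter brick; assuming placeholder host and brick names (ours differ), it would have been created roughly like this:

  # two data bricks plus one arbiter brick (names are placeholders)
  gluster volume create vm-storage replica 3 arbiter 1 \
      stor1:/bricks/vm stor2:/bricks/vm arb1:/bricks/vm
  gluster volume start vm-storage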
>> Now the question is, what is supposed to happen when we unplug one of
>> the storage nodes (aka power outage in one of our data centers)?
>> Initially we were hoping that the virtualisation hosts would
>> automatically switch over to the second storage node and keep all VMs
>> running. However, during our tests, we have found that this is not
>> the case. Instead, when we unplug one of the storage nodes, the
>> virtual machines run into all sorts of problems: being unable to
>> read/write, crashing applications and even corrupting the filesystem.
>> That is of course not acceptable.
>>
>> Reading the documentation again, we now think that we have
>> misunderstood what we're supposed to be doing. To our understanding,
>> what should happen is this:
>>  - If the virtualisation host is connected to the storage node which
>>    is still running:
>>    - everything is fine and the VM keeps running
>>  - If the virtualisation host was connected to the storage node which
>>    is now absent:
>>    - qemu is supposed to 'pause' / 'freeze' the VM
>>    - Virtualisation host waits for ping timeout
>>    - Virtualisation host switches over to the other storage node
>>    - qemu 'unpauses' the VMs
>>    - The VM is fully operational again
>>
>> Does my description match the 'optimal' GlusterFS behaviour?
>
> Can you provide more details about your gluster volume configuration
> and the options enabled on the volume?
We upgraded Gluster from 3.5.x to 3.8.8. After the upgrade, we switched from a two-node replicated setup to two nodes plus an arbiter.
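In case it matters: the conversion from replica 2 to replica 2 + arbiter was done by adding an arbiter brick to the existing volume, roughly like this (volume, host and brick names are placeholders, not our real ones):

  # add a third, arbiter-only brick to the existing replica-2 volume
  gluster volume add-brick vm-storage replica 3 arbiter 1 arb1:/bricks/vm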
The volume hosting the virtual disks uses the settings from the 'virt' group, except for 'features.shard', which we disabled. The exact list of options is included below.
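That was applied roughly like this (the volume name is again a placeholder):

  # apply the recommended virtualisation option group, then disable sharding
  gluster volume set vm-storage group virt
  gluster volume set vm-storage features.shard off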
[0] http://www.foss-cloud.org/en/wiki/FOSS-Cloud

Greetings
Niklaus Hofer
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Phone: +41 31 332 53 63
www.stepping-stone.ch
niklaus.hofer@xxxxxxxxxxxxxxxxx
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  off
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  2
cluster.background-self-heal-count  8
cluster.metadata-self-heal  on
cluster.data-self-heal  on
cluster.entry-self-heal  on
cluster.self-heal-daemon  on
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  full
cluster.eager-lock  enable
disperse.eager-lock  on
cluster.quorum-type  auto
cluster.quorum-count  (null)
cluster.choose-local  true
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.stripe-block-size  128KB
cluster.stripe-coalesce  true
diagnostics.latency-measurement  off
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  off
diagnostics.brick-log-level  INFO
diagnostics.client-log-level  INFO
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  32
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  32
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.least-rate-limit  0
performance.cache-size  128MB
performance.flush-behind  on
performance.nfs.flush-behind  on
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  off
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.lazy-open  yes
performance.read-after-open  no
performance.read-ahead-page-count  4
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
features.encryption  off
encryption.master-key  (null)
encryption.data-key-size  256
encryption.block-size  4096
network.frame-timeout  1800
network.ping-timeout  15
network.tcp-window-size  (null)
features.lock-heal  off
features.grace-timeout  10
network.remote-dio  enable
client.event-threads  2
network.ping-timeout  15
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  10.1.120.11,10.1.120.12,10.1.120.16,10.1.120.13,10.1.120.14,10.1.120.15
auth.reject  (null)
transport.keepalive  (null)
server.allow-insecure  On
server.root-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
features.lock-heal  off
features.grace-timeout  10
server.ssl  (null)
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  2
ssl.own-cert  (null)
ssl.private-key  (null)
ssl.ca-list  (null)
ssl.crl-path  (null)
ssl.certificate-depth  (null)
ssl.cipher-list  (null)
ssl.dh-param  (null)
ssl.ec-curve  (null)
performance.write-behind  on
performance.read-ahead  off
performance.readdir-ahead  off
performance.io-cache  off
performance.quick-read  off
performance.open-behind  on
performance.stat-prefetch  off
performance.client-io-threads  off
performance.nfs.write-behind  on
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.limit-usage  (null)
features.quota-timeout  0
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.enable-ino32  no
nfs.mem-factor  15
nfs.export-dirs  on
nfs.export-volumes  on
nfs.addr-namelookup  off
nfs.dynamic-volumes  off
nfs.register-with-portmap  on
nfs.outstanding-rpc-limit  16
nfs.port  2049
nfs.rpc-auth-unix  on
nfs.rpc-auth-null  on
nfs.rpc-auth-allow  all
nfs.rpc-auth-reject  none
nfs.ports-insecure  off
nfs.trusted-sync  off
nfs.trusted-write  off
nfs.volume-access  read-write
nfs.export-dir
nfs.disable  On
nfs.nlm  on
nfs.acl  on
nfs.mount-udp  off
nfs.mount-rmtab  /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd  /sbin/rpc.statd
nfs.server-aux-gids  off
nfs.drc  off
nfs.drc-size  0x20000
nfs.read-size  (1 * 1048576ULL)
nfs.write-size  (1 * 1048576ULL)
nfs.readdir-size  (1 * 1048576ULL)
nfs.rdirplus  on
nfs.exports-auth-enable  (null)
nfs.auth-refresh-interval-sec  (null)
nfs.auth-cache-ttl-sec  (null)
features.read-only  off
features.worm  off
features.worm-file-level  off
features.default-retention-period  120
features.retention-mode  relax
features.auto-commit-period  180
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  -1
storage.owner-gid  -1
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  off
storage.bd-aio  off
cluster.server-quorum-type  server
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  (null)
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
cluster.write-freq-threshold  0
cluster.read-freq-threshold  0
cluster.tier-pause  off
cluster.tier-promote-frequency  120
cluster.tier-demote-frequency  3600
cluster.watermark-hi  90
cluster.watermark-low  75
cluster.tier-mode  cache
cluster.tier-max-promote-file-size  0
cluster.tier-max-mb  4000
cluster.tier-max-files  10000
features.ctr-enabled  off
features.record-counters  off
features.ctr-record-metadata-heat  off
features.ctr_link_consistency  off
features.ctr_lookupheal_link_timeout  300
features.ctr_lookupheal_inode_timeout  300
features.ctr-sql-db-cachesize  1000
features.ctr-sql-db-wal-autocheckpoint  1000
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
ganesha.enable  off
features.shard  off
features.shard-block-size  4MB
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  off
features.cache-invalidation-timeout  60
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  round-robin
cluster.shd-max-threads  1
cluster.shd-wait-qlength  1024
cluster.locking-scheme  full
cluster.granular-entry-heal  no
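The options that seem most relevant to the failover behaviour described above are cluster.quorum-type (auto), cluster.server-quorum-type (server) and network.ping-timeout (15), so with one storage node gone, client quorum should still be met by the remaining data brick plus the arbiter. One way to double-check these settings and to watch whether qemu actually pauses the guests during the ping-timeout window would be something like the following (the volume and domain names are placeholders):

  # confirm the quorum / timeout settings on the volume
  gluster volume get vm-storage cluster.quorum-type
  gluster volume get vm-storage cluster.server-quorum-type
  gluster volume get vm-storage network.ping-timeout

  # while one storage node is unplugged, check whether a guest is
  # paused due to an I/O error and resumes again afterwards
  virsh domstate vm1 --reason
  virsh list --all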
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users