Oh, you really don't want to go below 30s, I was told. I'm using 30
seconds for the timeout, and indeed when a node goes down the VMs freeze
for 30 seconds, but I've never seen them go read-only because of that.
I _only_ use virtio though, maybe it's that. What are you using?

On Fri, Sep 08, 2017 at 11:41:13AM +0200, Pavel Szalbot wrote:
> Back to replica 3 w/o arbiter. Two fio jobs running (direct=1 and
> direct=0), rebooting one node... and the VM dmesg looks like:
>
> [ 483.862664] blk_update_request: I/O error, dev vda, sector 23125016
> [ 483.898034] blk_update_request: I/O error, dev vda, sector 2161832
> [ 483.901103] blk_update_request: I/O error, dev vda, sector 2161832
> [ 483.904045] Aborting journal on device vda1-8.
> [ 483.906959] blk_update_request: I/O error, dev vda, sector 2099200
> [ 483.908306] blk_update_request: I/O error, dev vda, sector 2099200
> [ 483.909585] Buffer I/O error on dev vda1, logical block 262144, lost sync page write
> [ 483.911121] blk_update_request: I/O error, dev vda, sector 2048
> [ 483.912192] blk_update_request: I/O error, dev vda, sector 2048
> [ 483.913221] Buffer I/O error on dev vda1, logical block 0, lost sync page write
> [ 483.914546] EXT4-fs error (device vda1): ext4_journal_check_start:56: Detected aborted journal
> [ 483.916230] EXT4-fs (vda1): Remounting filesystem read-only
> [ 483.917231] EXT4-fs (vda1): previous I/O error to superblock detected
> [ 483.917353] JBD2: Error -5 detected when updating journal superblock for vda1-8.
> [ 483.921106] blk_update_request: I/O error, dev vda, sector 2048
> [ 483.922147] blk_update_request: I/O error, dev vda, sector 2048
> [ 483.923107] Buffer I/O error on dev vda1, logical block 0, lost sync page write
>
> The root fs is read-only even with a 1s ping-timeout...
>
> I really hope I have been an idiot for almost a year now and that
> someone will show me what I am doing completely wrong, because I dream
> about joining the hordes of fellow colleagues who store multiple VMs
> on gluster and have never had a problem with it. I also suspect the
> CentOS libvirt version to be the cause.
>
> -ps
>
>
> On Fri, Sep 8, 2017 at 10:50 AM, Pavel Szalbot <pavel.szalbot@xxxxxxxxx> wrote:
> > FYI I set up replica 3 (no arbiter this time) and did the same thing -
> > rebooted one node during lots of file I/O on the VM - and the I/O stopped.
> >
> > As I mentioned either here or in another thread, this behavior is
> > caused by the high default of network.ping-timeout. My main problem
> > used to be that setting it to low values like 3s or even 2s did not
> > prevent the FS from being mounted read-only in the past (at least
> > with arbiter), and the docs describe a reconnect as very costly. If I
> > set ping-timeout to 1s, the read-only mount disaster is now prevented.
> >
> > However, I find it very strange, because in the past I actually did
> > end up with a read-only filesystem despite the low ping-timeout.
> >
> > With replica 3, after a node reboot iftop shows data flowing to only
> > one of the two remaining nodes, and there is no entry in heal info
> > for the volume. An explanation would be very much appreciated ;-)
> >
> > A few minutes later I reverted back to replica 3 with arbiter (group
> > virt, ping-timeout 1). All nodes are up. During the first fio run the
> > VM disconnected my ssh session, so I reconnected and saw ext4 problems
> > in dmesg. I deleted the VM and started a new one. Glustershd.log fills
> > up with metadata heals shortly after the fio job starts, but this time
> > the system is stable.
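As an aside, the "group virt, ping-timeout 1" setup described above
would be applied with something along these lines; "gv0" is only a
placeholder volume name, not Pavel's actual one:

    # apply the virt group profile shipped with gluster
    # (the exact settings in that profile vary by release)
    gluster volume set gv0 group virt

    # lower network.ping-timeout from its high default to 1 second
    gluster volume set gv0 network.ping-timeout 1

    # verify the option and watch pending heals after a node reboot
    gluster volume get gv0 network.ping-timeout
    gluster volume heal gv0 info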
> > Rebooting one of the nodes does not cause any problems (watching the
> > heal log and I/O on the VM).
> >
> > So I decided to put more stress on the VM's disk - I added a second
> > job with direct=1 and started it (so both were running) while one
> > gluster node was still booting. What happened? One fio job reports
> > "Bus error" and the VM segfaults when trying to run dmesg...
> >
> > Is this gfapi related? Is this a bug in the arbiter?
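For reference, the two-job fio workload described above (one buffered
job and one with direct=1) can be approximated inside the guest with
something like the following; the filenames, sizes and runtimes here
are invented for illustration and are not Pavel's actual job
definitions:

    # buffered random-write job (direct=0), started first
    fio --name=buffered --filename=/root/fio-buffered.dat --size=1G \
        --rw=randwrite --bs=4k --ioengine=libaio --iodepth=16 \
        --direct=0 --time_based --runtime=300 &

    # O_DIRECT random-write job (direct=1), added while one gluster
    # node is rebooting
    fio --name=direct --filename=/root/fio-direct.dat --size=1G \
        --rw=randwrite --bs=4k --ioengine=libaio --iodepth=16 \
        --direct=1 --time_based --runtime=300 &

    wait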