Hi Xavi,

Thank you for the suggestions, these are extremely helpful. I hadn't thought it could be a ZFS problem. I went back and checked a longer monitoring window, and now I can see a pattern. Please see the attached Grafana screenshot (also available here: https://cl.ly/070J2y3n1u0F ). Note that the gaps in the data are from when I took the server down for rebooting.

Between 8/4 and 8/6 I ran two transfer tests and hit the Gluster hanging problem twice: once during the first transfer, and once shortly after the second transfer. I marked both with pink lines. It looks like free memory was almost exhausted during my transfer tests.

The system has a very large amount of cached memory, which I think is due to the ZFS ARC. However, I was under the impression that ZFS releases space from the ARC when it sees that available system memory is low, and I am not sure why it didn't do that. I didn't tweak any of the related ZFS parameters. zfs_arc_max was set to 0 (the default value). According to the doc, it is "Max arc size of ARC in bytes. If set to 0 then it will consume 1/2 of system RAM." So it appears that this setting didn't work. When the server was under heavy IO, used memory actually decreased, which I can't explain.

May I ask if you, or anyone else in this group, has a recommendation on ZFS settings for my setup? My server has 64GB of physical memory and 150GB of SSD space reserved for L2ARC. The zpool has 6 vdevs, each a raidz2 of ten 12TB hard drives. Total usable space in the zpool is 482TB.

Thank you,
Yuhao
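One way to check whether the ARC is really staying within the documented "1/2 of system RAM" limit is to compare the ARC counters with the kernel's memory figures while a transfer is running. The following is only a minimal sketch, assuming ZFS on Linux exposes its counters at /proc/spl/kstat/zfs/arcstats in the usual "name type data" layout and that /proc/meminfo reports MemAvailable. The function names (parse_arcstats, parse_meminfo, summarize) are made up for this illustration and are not part of any ZFS tool.

```python
import os

# Assumed locations on a ZFS-on-Linux host; adjust if your system differs.
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"
MEMINFO_PATH = "/proc/meminfo"


def parse_arcstats(text):
    """Parse kstat 'name type data' rows into a dict of integer values."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns and an integer in the third;
        # the two header lines do not match and are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of values (kB rows become bytes)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            value = int(fields[0])
            if len(fields) > 1 and fields[1] == "kB":
                value *= 1024
            info[key.strip()] = value
    return info


def summarize(arc, mem):
    """Return ARC size, ARC limit and system memory in GiB, plus ARC/RAM ratio."""
    gib = 1024 ** 3
    return {
        "arc_size_gib": arc["size"] / gib,
        "arc_c_max_gib": arc["c_max"] / gib,
        "mem_total_gib": mem["MemTotal"] / gib,
        "mem_available_gib": mem["MemAvailable"] / gib,
        "arc_fraction_of_ram": arc["size"] / mem["MemTotal"],
    }


def main():
    if not os.path.exists(ARCSTATS_PATH):
        print("arcstats not found; is the zfs module loaded?")
        return
    with open(ARCSTATS_PATH) as f:
        arc = parse_arcstats(f.read())
    with open(MEMINFO_PATH) as f:
        mem = parse_meminfo(f.read())
    for name, value in summarize(arc, mem).items():
        print(f"{name}: {value:.2f}")


if __name__ == "__main__":
    main()
```

Running this periodically during a transfer test would show whether "size" stays at or below "c_max" and whether it shrinks as MemAvailable drops. On a 64GB machine with zfs_arc_max left at 0, the quoted documentation implies a c_max of about 32 GiB.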
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users