Re: GFS2 DLM problem on NVMes


 



Hi Dave and Steven,

Thank you for the assistance.

 

We made some progress here and would like to share with you.

 

#1.

We’ve set ‘vm.vfs_cache_pressure’ to zero and ran tests. With that setting the problem no longer occurred, and we observed that the slab grew slowly and saturated at 25 GB during the overnight test. We will keep running tests with this setting, but we would appreciate any advice on the risks of sticking with this configuration (the commands for both settings are sketched after #2 below).

#2.

We’ve tested different ‘toss_secs’ values as advised. When we configured it as 1000, we saw the ‘send_repeat_remove’ log after 1000 seconds. We can test other values for ‘toss_secs’, but we suspect the same problem would potentially recur whenever the slab is freed after the configured interval.
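For reference, the two settings above correspond roughly to the following commands on each node (a sketch only; the exact values and ordering in our test scripts may differ):

  # #1: stop the kernel from reclaiming dentry/inode caches under memory
  # pressure (the kernel docs warn that 0 can easily lead to out-of-memory
  # conditions, which is part of why we are asking about the risks)
  sysctl -w vm.vfs_cache_pressure=0

  # #2: raise the DLM toss interval to 1000 seconds; set before the gfs2
  # mount, while dlm_controld is running so the configfs path exists
  echo 1000 > /sys/kernel/config/dlm/cluster/toss_secs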

 

Do our results make sense to you?

 

Best Regards,

Eric Chang (Hong-seok), Manager | Software Defined Storage Lab | SK Telecom Co., LTD.

echang@xxxxxx | mobile: +82-10-4996-3690 | skype: ehschang

 

 

Re: GFS2 DLM problem on NVMes

 

•From: David Teigland <teigland redhat com>

•To: bj sung sk com

•Cc: linux-cluster redhat com

•Subject: Re: GFS2 DLM problem on NVMes

•Date: Mon, 20 Nov 2017 13:09:32 -0600

 

> We are developing storage systems using 10 NVMes (current test set).
> Using MD RAID10 + CLVM/GFS2 over four hosts achieves 22 GB/s (Max. on Reads).

 

Does MD RAID10 work correctly under GFS2?  Does the RAID10 make use of the recent md-cluster enhancements (which also use the dlm)?

 

> However, a GFS2 DLM problem occurred. The problem is that each host
> frequently reports dlm: gfs2: send_repeat_remove kernel messages,
> and I/O throughput becomes unstable and low.

 

send_repeat_remove is a mysterious corner case, related to the resource directory becoming out of sync with the actual resource master.  There's an inherent race in this area of the dlm which is hard to solve because the same record (mapping of resource name to master nodeid) needs to be changed consistently on two nodes.  Perhaps in the future the dlm could be enhanced with some algorithm to do that better.  For now, it just repeats the change (logging the message you see).  If the repeated operation is working, then things won't be permanently stuck.

 

The most likely cause, it seems to me, is that the speed of storage relative to the speed of the network is triggering pathological timing issues in the dlm.  Try adjusting the "toss_secs" tunable, which controls how long a node will hold on to an unused resource before giving up mastery of it (the master change is what leads to the inconsistency mentioned above).

 

  echo 1000 > /sys/kernel/config/dlm/cluster/toss_secs

 

The default is 10; I'd try 100/1000/10000.  A number too large could have the negative consequence of not freeing dlm resources that will never be used again, e.g. if you are deleting a lot of files.  Set this number before mounting gfs for it to take effect.
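For example, a minimal per-node sequence might look like this (just a sketch; the device and mount point are placeholders, and the configfs path only exists once dlm_controld is running):

  echo 1000 > /sys/kernel/config/dlm/cluster/toss_secs    # set first
  mount -t gfs2 /dev/vg_cluster/lv_gfs2 /mnt/gfs2         # then mount gfs2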

 

In the past, I think that send_repeat_remove has tended to appear when there's a huge volume of dlm messages, triggered by excessive caching done by gfs when there's a large amount of system memory.  The huge volume of dlm messages results in the messages appearing in unusual sequences, reversing the usual cause-effect.
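One rough way to check whether that's the situation here is to watch the gfs2/dlm slab caches grow while the test runs (a sketch; exact slab cache names vary between kernel versions):

  # sample the gfs2/dlm related slab caches once a minute
  watch -n 60 'grep -iE "gfs2|dlm" /proc/slabinfo'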

 

Dave

 

 

Re: GFS2 DLM problem on NVMes

 

•From: Steven Whitehouse <swhiteho redhat com>

•To: linux-cluster redhat com, Mark Ferrell <mferrell redhat com>, David Teigland <teigland redhat com>

•Subject: Re: GFS2 DLM problem on NVMes

•Date: Mon, 20 Nov 2017 10:40:54 +0000

   

 

Hi,

 

On 20/11/17 04:23, 성백재 (Jay Sung) wrote:

 

 

Hello, List.

 

We are developing storage systems using 10 NVMes (current test set).

 

Using MD RAID10 + CLVM/GFS2 over four hosts achieves 22 GB/s (Max. on Reads).

However, a GFS2 DLM problem occurred. The problem is that each host frequently reports “dlm: gfs2: send_repeat_remove” kernel messages, and I/O throughput becomes unstable and low.

I found a GFS2 commit message about “send_repeat_remove” function.

(https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?id=96006ea6d4eea73466e90ef353bf34e507724e77)

 

 

Information about the test environment.

 

Four hosts share 10 NVMes, and each host deploys CLVM/GFS2 on top of the cluster MD RAID1 + MD RAID0.

GFS2 has 2,000 directories, each with 1,900 media files (3 MB on average).

Each host runs 20 threads of NGINX, and each thread randomly reads media files on demand.

The Linux kernel version is 4.11.8.

 

 

Can you offer suggestions or directions to solve these problems?

 

Thank you in advance :)

 

 

Best regards,

/Jay Sung

 

I'm copying in our DLM experts. It would be good to open a bug at Red Hat's bugzilla to track this issue (and a customer case too, if you are a customer). It looks like something that will need some investigation to get to the bottom of what is going on. I suspect that a tcpdump of the DLM traffic when the issue occurs would be the first thing to try, so that we can try to match the messages to the protocol dump (a possible capture command is sketched below). That may not be easy, since I suspect there is a large quantity of DLM traffic in your setup, and that will make finding the specific messages trickier.
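Something along these lines would do for the capture (a sketch only; substitute your cluster interconnect interface for eth0, and 21064 is just the DLM's default port, so adjust it if you've configured a different one):

  # capture DLM traffic on this node for later analysis
  tcpdump -i eth0 -w /tmp/dlm-traffic.pcap port 21064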

 

Just out of interest, what kind of network is this running over? How much bandwidth is DLM taking up?

Steve.

 

GFS2 DLM problem on NVMes

 

•From: 성백재 <bj sung sk com>

•To: "linux-cluster redhat com" <linux-cluster redhat com>

•Subject: GFS2 DLM problem on NVMes

•Date: Mon, 20 Nov 2017 04:23:35 +0000

 

Hello, List.

 

 

We are developing storage systems using 10 NVMes (current test set).

Using MD RAID10 + CLVM/GFS2 over four hosts achieves 22 GB/s (Max. on Reads).

However, a GFS2 DLM problem occurred. The problem is that each host frequently reports “dlm: gfs2: send_repeat_remove” kernel messages, and I/O throughput becomes unstable and low.

I found a GFS2 commit message about “send_repeat_remove” function.

(https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?id=96006ea6d4eea73466e90ef353bf34e507724e77)

 

Information about the test environment.

 

Four hosts share 10 NVMes, and each host deploys CLVM/GFS2 on top of the cluster MD RAID1 + MD RAID0.

GFS2 has 2,000 directories, each with 1,900 media files (3 MB on average).

 

Each host runs 20 threads of NGINX, and each thread randomly reads media files on demand.

The Linux kernel version is 4.11.8.

Can you offer suggestions or directions to solve these problems?

 

Thank you in advance :)

 

Best regards,

/Jay Sung

Jay Sung (Baegjae), Manager | Software Defined Storage Lab | SK Telecom Co., LTD.

bj sung sk com | mobile: +82-10-2087-5637


