Re: GFS2 DLM problem on NVMes

"Eric H. Chang" <echang@xxxxxx> · Thu, 23 Nov 2017 05:36:02 +0000

Hi Dave,

When the errors started to appear, the system slowed down (performance degraded) and many error messages showed up repeatedly.
Specifically, when a large amount of slab memory was reclaimed, e.g. from 9GB down to 6GB, about 30 error messages came out.
‘send_repeat_remove’ messages were also printed about 5 times, intermittently. But the system didn’t get stuck.
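
To line up the error bursts with slab reclaim, the slab counters can be sampled from /proc/meminfo. This is only a minimal sketch: the helper names `parse_slab_kb` and `slab_drop_gb` are made up for illustration, and it assumes the standard `Slab`, `SReclaimable` and `SUnreclaim` fields reported in kB.

```python
import os


def parse_slab_kb(meminfo_text):
    """Return the Slab, SReclaimable and SUnreclaim values (in kB) found in /proc/meminfo text."""
    wanted = {"Slab", "SReclaimable", "SUnreclaim"}
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # Lines look like "Slab:   9437184 kB"; the first token is the number.
            values[key] = int(rest.split()[0])
    return values


def slab_drop_gb(before_kb, after_kb):
    """Size of a slab shrink in GiB, given two samples in kB."""
    return (before_kb - after_kb) / (1024 * 1024)


if __name__ == "__main__":
    # Only meaningful on Linux; skipped elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(parse_slab_kb(f.read()))
```

Sampling this every few seconds and logging it next to the kernel log timestamps would show whether each batch of messages follows a drop like the 9GB to 6GB one described above.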

We are running the JMeter tool to simulate CDN workloads, and there are 2 million files (3MB per file) in my storage
that are read by 4 host servers.
We reached 160Gbps of bandwidth using 16 client servers at 10Gb each and 4 host servers at 40Gb each that run GFS. Hope
this helps you understand my usage.
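
As a sanity check on those numbers, here is a rough back-of-the-envelope sketch. It assumes 10Gb and 40Gb are per-server link speeds and that MB means 10^6 bytes; the function names are only for illustration.

```python
CLIENTS, CLIENT_LINK_GBPS = 16, 10   # 16 client servers at 10Gb each
HOSTS, HOST_LINK_GBPS = 4, 40        # 4 host servers at 40Gb each
FILES, FILE_SIZE_MB = 2_000_000, 3   # 2 million files, 3MB per file


def aggregate_gbps(nodes, link_gbps):
    """Total line rate of a group of servers, in Gbps."""
    return nodes * link_gbps


def dataset_bytes(files, file_size_mb):
    """Total data set size in bytes, taking 1 MB as 10**6 bytes (assumption)."""
    return files * file_size_mb * 10**6


def full_read_seconds(total_bytes, gbps):
    """Time to stream the whole data set once at the given aggregate rate."""
    return total_bytes * 8 / (gbps * 10**9)


if __name__ == "__main__":
    client_side = aggregate_gbps(CLIENTS, CLIENT_LINK_GBPS)
    host_side = aggregate_gbps(HOSTS, HOST_LINK_GBPS)
    total = dataset_bytes(FILES, FILE_SIZE_MB)
    print(client_side, host_side)                 # both sides of the test bed
    print(total / 10**12, "TB")                   # data set size
    print(full_read_seconds(total, host_side), "s per full pass")
```

Under these assumptions both sides come to 160Gbps, the data set is about 6TB, and one full pass over all files at line rate takes about 300 seconds.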

eric

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx]
Sent: Thursday, November 23, 2017 12:04 AM
To: 장홍석/SW-Defined Storage Lab <echang@xxxxxx>
Cc: linux-cluster@xxxxxxxxxx; swhiteho@xxxxxxxxxx; mferrell@xxxxxxxxxx; 성백재/SW-Defined Storage Lab <bj.sung@xxxxxx>;
 윤진혁/SW-Defined Storage Lab <jhyoon01@xxxxxx>;
 민항준/SW-Defined Storage Lab <hangjun.min@xxxxxx>
Subject: Re: [Linux-cluster] GFS2 DLM problem on NVMes

On Wed, Nov 22, 2017 at 04:32:13AM +0000, Eric H. Chang wrote:
> We’ve tested with different ‘toss_secs’ as advised. When we
> configured it as 1000, we saw the ‘send_repeat_remove’ log after
> 1000sec. We can test with other values on ‘toss_secs’, but we think
> it would have the same problem potentially when freeing up the slab
> after the configured sec.

Do you see many of these messages?  Do gfs operations become stuck after they appear?

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster