Re: can not load more than 51 nvme rdma device

On 5/16/2018 8:57 AM, 李春 wrote:
Hi:

Hi,


I encountered a problem with nvme-rdma on a Mellanox network card.
Thanks in advance for your help.


# Problem Description

## Steps to Reproduce
* Two nodes (nodeA, nodeB) are linked through Mellanox 56Gb
ConnectX-3 network cards.
* nodeA exports 100 disks through 100 subsystems via nvmet-rdma, one disk
per subsystem.

Any reason for this 1:1 configuration?
Can you expose all 100 disks using 1 subsystem, or 10 disks per subsystem?
Do you understand the difference in resource allocation between the two cases?
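
As a sketch of what the alternative looks like, several disks can be exposed as namespaces of a single nvmet subsystem through configfs; the subsystem NQN, namespace IDs, and device paths below are illustrative only:

```
# Illustrative: multiple namespaces under one nvmet subsystem.
cd /sys/kernel/config/nvmet/subsystems
mkdir s01.4421.01
echo 1 > s01.4421.01/attr_allow_any_host
for i in 1 2 3; do
    mkdir s01.4421.01/namespaces/$i
    echo -n /dev/sdb$i > s01.4421.01/namespaces/$i/device_path
    echo 1 > s01.4421.01/namespaces/$i/enable
done
```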

* Load the disks with "nvme connect" on nodeB. When connecting the
51st disk, it fails with ```Failed to write to
/dev/nvme-fabrics: Cannot allocate memory```. The connect command is as
follows, using 10 queues per disk:

This is because you are trying to allocate more MRs than the maximum the device supports.
In NVMe/RDMA we create "queue-size" MRs for each I/O queue created.
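
A back-of-the-envelope count for the parameters in this report (10 I/O queues, default queue-size of 128); the numbers are only this rough accounting:

```
# Rough MR accounting for this report's connect parameters:
echo "$((10 * 128)) MRs per controller"          # nr-io-queues x queue-size
echo "$((51 * 10 * 128)) MRs for 51 controllers" # if this exceeds the HCA's
                                                 # max_mr, queue allocation
                                                 # fails with ENOMEM
```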


```
nvme connect -t rdma -a 172.16.128.51 -s 4421 -n s01.4421.01 --nr-io-queues=10 -k 1 -l 6000 -c 1 -q woqu
```

Try using --queue-size=16 in your connect command.
You don't really need so many resources (10 I/O queues with a queue-size of 128 each) to saturate a 56Gb wire.
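
For example, keeping the target parameters from the original command, the connect line with the reduced queue depth would look like this (an illustrative sketch; all other flags are left as the reporter had them):

```
nvme connect -t rdma -a 172.16.128.51 -s 4421 -n s01.4421.01 \
    --nr-io-queues=10 --queue-size=16 -k 1 -l 6000 -c 1 -q woqu
```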

* If the disk is loaded with 1 queue at this point, it loads
successfully without error.

* Additional information: this problem does not occur when we load
the disks with a 100GbE network adapter.

The max_mr for that adapter is much bigger.
If the above suggestions are not enough, we can dig deeper into the low-level drivers...
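
To see how far apart the two adapters are, the verbs limits can be queried directly; the device name mlx4_0 below is an assumption for the ConnectX-3:

```
# Query the HCA's MR limits (device name mlx4_0 is an assumption):
ibv_devinfo -v -d mlx4_0 | grep -i max_mr
```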



## Log Information
* When nodeB loads a disk normally, /var/log/messages shows the
following:

```
May 8 19:10:37 qdata-lite52-dev kernel: nvme nvme47: creating 10 I/O queues.
May 8 19:10:37 qdata-lite52-dev kernel: nvme nvme47: new ctrl: NQN "s01.4421.48", addr 172.16.128.51:4421
```


* Warning message when loading the 50th disk:
```
May 8 15:26:55 qdata-lite52-dev kernel: nvme nvme50: creating 10 I/O queues.
May 8 15:26:55 qdata-lite52-dev kernel: blk-mq: reduced tag depth (128 -> 16)
May 8 15:26:55 qdata-lite52-dev kernel: nvme nvme50: new ctrl: NQN "s01.4421.45", addr 172.16.128.51:4421
```


* Error reported when loading the 51st disk:
```
May 8 15:26:55 qdata-lite52-dev kernel: blk-mq: reduced tag depth (31 -> 15)
May 8 15:26:55 qdata-lite52-dev kernel: nvme nvme51: creating 10 I/O queues.
May 8 15:26:55 qdata-lite52-dev kernel: blk-mq: failed to allocate request map
```


## Environment

* OS: RHEL 7.4
* Network card: 56Gb ConnectX-3
```
44:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
```




