Re: Fwd: pyverbs test_resize_cq fails in some cases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/23/23 21:38, Bob Pearson wrote:
> 
> 
> 
> -------- Forwarded Message --------
> Subject: pyverbs test_resize_cq fails in some cases
> Date: Thu, 23 Mar 2023 21:37:28 -0500
> From: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> To: Edward Srouji <edwards@xxxxxxxxxx>
> CC: Jason Gunthorpe <jgg@xxxxxxxxxx>, Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
> 
> Edward,
> 
> The pyverbs test: test_resize_cq fails for the rxe driver in some cases.
> 
> The code is definitely racy:
> 
> def test_resize_cq(self):
>         """
>         Test resize CQ, start with specific value and then increase and decrease
>         the CQ size. The test also check bad flow of decrease the CQ size when
>         there are more completions on it than the new value.
>         """
>         self.create_players(CQUDResources, cq_depth=3)
>         # Decrease the CQ size.
>         new_cq_size = 1
>         try:
>             self.client.cq.resize(new_cq_size)
>         except PyverbsRDMAError as ex:
>             if ex.error_code == errno.EOPNOTSUPP:
>                 raise unittest.SkipTest('Resize CQ is not supported')
>             raise ex
>         self.assertTrue(self.client.cq.cqe >= new_cq_size,
>                         f'The actual CQ size ({self.client.cq.cqe}) is less '
>                         'than guaranteed ({new_cq_size})')
> 
>         # Increase the CQ size.
>         new_cq_size = 7
>         self.client.cq.resize(new_cq_size)
>         self.assertTrue(self.client.cq.cqe >= new_cq_size,
>                         f'The actual CQ size ({self.client.cq.cqe}) is less '
>                         'than guaranteed ({new_cq_size})')
> 
>         # Fill the CQ entries except one for avoid cq_overrun warnings.
>         send_wr, _ = u.get_send_elements(self.client, False)
>         ah_client = u.get_global_ah(self.client, self.gid_index, self.ib_port)
>         for i in range(self.client.cq.cqe - 1):
>             u.send(self.client, send_wr, ah=ah_client)
> 
> 	### This posts 6 send work requests but does not wait for them to complete
> 	### The following proceeds while the sends are executing in the background
> 	### and the test can fail. An easy fix is to add the line
>         time.sleep(1/1000)  ### or maybe something a little larger but this works for me.
> 	### alternatively you could test the data at the destination.
> 
>         # Decrease the CQ size to less than the CQ unpolled entries.
>         new_cq_size = 1
>         with self.assertRaises(PyverbsRDMAError) as ex:
>             self.client.cq.resize(new_cq_size)
>         self.assertEqual(ex.exception.error_code, errno.EINVAL)
> 
> Bob

Edward,

This is showing up again. The software drivers are slower than the hardware drivers.
Now that the send side logic in the rxe driver is handled by a workqueue and if it is
scheduled instead of running inline this test will fail some percentage of the time.
Especially when the test is run for the first time but 5-10% of the time after that.
If you run the send side logic inline it is fast enough that the test doesn't fail but
that doesn't allow the workqueue tasks to spread out on the available cores so for high
QP count it is much better to schedule them.

There are a few time.sleep(1) calls in the pyverbs test suite but that is wasteful and
better is time.sleep(0.001) which works fine for this test and doesn't slow down the
run time for the whole suite noticeably.

Bottom line this test case is racy.

If you want I can submit a 2 line patch that does this. Or if you would that would be
better.

Bob



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux