Re: [RFC net-next 0/2] Optimize the parallelism of SMC-R connections

"D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx> · Mon, 25 Sep 2023 18:10:46 +0800

On 9/21/23 8:36 PM, Alexandra Winter wrote:
On 18.09.23 05:58, D. Wythe wrote:
Hi Alexandra,

Sorry for the late reply. I have been thinking about the question you mentioned for a while, and this is a great opportunity to discuss this issue.
My point is that the purpose of the locks is to keep the number of link groups from growing more than necessary.

As we all know, the SMC-R protocol has the following specifications:

  * An SMC-R connection MUST be mapped into one link group.
  * A link group is usually created by a connection, which is also known
    as "First Contact."

If we start from scratch, we can design the connection process as follows:

1. Check if there are any available link groups. If so, map the
    connection into it and go to step 3.
2. Mark this connection as "First Contact," create a link group, and
    mark the new link group as unavailable.
3. Finish connection establishment.
4. If the connection is "First Contact," mark the new link group as
    available and map the connection into it.

I think there is no logical problem with this process, but there is a practical issue: a burst of traffic can result in a burst of new link groups.

For example, if there are 10,000 incoming connections, based on the above logic, the most extreme scenario would be to create 10,000 link groups.
This can cause significant memory pressure and even be used for security attacks.
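
A minimal Python sketch of that worst case, as a model only and not the kernel code. It assumes every connection in the burst runs steps 1-2 before any "First Contact" reaches step 4; the names `LinkGroup` and `burst_without_lock` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkGroup:
    # Hypothetical stand-in for a link group: only the fields the model needs.
    available: bool = False
    conns: list = field(default_factory=list)


def burst_without_lock(n_conns):
    """Return how many link groups a burst of n_conns connections creates
    when all of them run steps 1-2 before any of them reaches step 4."""
    link_groups = []
    first_contacts = []
    for conn in range(n_conns):
        # Step 1: look for an available link group.
        lgr = next((g for g in link_groups if g.available), None)
        if lgr is not None:
            lgr.conns.append(conn)
            continue
        # Step 2: "First Contact" creates a new, still unavailable link group.
        lgr = LinkGroup(available=False)
        link_groups.append(lgr)
        first_contacts.append((conn, lgr))
    # Steps 3-4: only now do the first contacts finish and publish their groups.
    for conn, lgr in first_contacts:
        lgr.available = True
        lgr.conns.append(conn)
    return len(link_groups)
```

Under these assumptions `burst_without_lock(1000)` returns 1000: no connection ever sees an available link group, so each one creates its own.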

To address this problem, the simplest way is to make connection setup mutually exclusive, giving the following process:

1. Block other incoming connections.
2. Check if there are any available link groups. If so, map the
    connection into it and go to step 4.
3. Mark this connection as "First Contact," create a link group, and
    mark it as unavailable.
4. Finish connection establishment.
5. If the connection is "First Contact," mark the new link group as
    available and map the connection into it.
6. Allow other connections to come in.

And this is our current process!
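
The six steps above can be sketched in Python as follows. This is a model under stated assumptions, not the smc implementation: `threading.Lock` stands in for the pending lock, the handshake in step 4 is elided, and `LinkGroup` and `burst_with_pending_lock` are hypothetical names.

```python
import threading


class LinkGroup:
    # Hypothetical stand-in for a link group.
    def __init__(self):
        self.available = False
        self.conns = []


def burst_with_pending_lock(n_conns):
    """Run n_conns concurrent connection setups, fully serialized by one lock.
    Returns (number of link groups, number of mapped connections)."""
    link_groups = []
    pending_lock = threading.Lock()  # models the lock held across steps 1-6

    def connect(conn):
        with pending_lock:  # step 1 on entry, step 6 on exit
            # Step 2: reuse an available link group if there is one.
            lgr = next((g for g in link_groups if g.available), None)
            first_contact = lgr is None
            if first_contact:
                # Step 3: create a new, still unavailable link group.
                lgr = LinkGroup()
                link_groups.append(lgr)
            else:
                lgr.conns.append(conn)
            # Step 4: finish connection establishment (elided in this model).
            if first_contact:
                # Step 5: publish the link group and map the connection.
                lgr.available = True
                lgr.conns.append(conn)

    threads = [threading.Thread(target=connect, args=(i,)) for i in range(n_conns)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(link_groups), sum(len(g.conns) for g in link_groups)
```

Whatever order the threads run in, only the first one through the lock finds no available link group, so `burst_with_pending_lock(200)` returns `(1, 200)`. The cost is that every setup, including the reuse path, waits behind the one in front of it.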

As for the purpose of the locks: it is to limit the growth in the number of link groups. If we agree on this point, we can observe that
when step 2 goes directly to step 4, the process never creates a new link group. Obviously, the lock is not needed there.
Well, you still have the issue of a link group going away. Thread 1 is deleting the last connection from a link group and shutting it down. Thread 2 is adding a 'second' connection (from its point of view) to the link group.

Hi Alexandra,

That's right. But even if we do nothing, the current implementation
still has this problem.
And this problem can be solved by the spinlock inside smc_conn_create,
rather than by the pending lock.

Also, deleting the last connection from a link group does not shut the
link group down right away; it usually waits for 10 minutes of idle
time first.

Then the last question: why is the lock needed until after smc_clc_send_confirm in the new-LGR case? We can try to move step 6 earlier, as follows:

1. Block other incoming connections.
2. Check if there are any available link groups. If so, map the
    connection into it and go to step 4.
3. Mark this connection as "First Contact," create a link group, and
    mark it as unavailable.
4. Allow other connections to come in.
5. Finish connection establishment.
6. If the connection is "First Contact," mark the new link group as
    available and map the connection into it.

There is also no problem with this process! However, note that this logic does not address burst issues.
Burst traffic will still result in burst link groups because a new link group can only be marked as available when the "First Contact" is completed,
which is after sending the CLC Confirm.
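
A rough Python model of this early-unlock variant, under stated assumptions: `in_flight` is a hypothetical parameter for how many connections get through steps 1-4 before the first "First Contact" reaches step 6, and the function name is made up for illustration. The model is sequential, so the lock around steps 1-4 is implicit.

```python
def burst_with_early_unlock(n_conns, in_flight):
    """Return how many link groups are created when the lock covers only
    steps 1-4 and in_flight connections arrive before any group is published."""
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    link_groups = []  # each entry: {"available": bool, "conns": [...]}
    unfinished = []   # first contacts that have not reached step 6 yet

    def publish():
        # Steps 5-6 for every outstanding first contact.
        for first_conn, lgr in unfinished:
            lgr["available"] = True
            lgr["conns"].append(first_conn)
        unfinished.clear()

    for conn in range(n_conns):
        if conn == in_flight:
            publish()  # the early arrivals finish their handshakes here
        # Steps 1-4: serialized, but groups created here are still unavailable.
        lgr = next((g for g in link_groups if g["available"]), None)
        if lgr is None:
            lgr = {"available": False, "conns": []}
            link_groups.append(lgr)
            unfinished.append((conn, lgr))
        else:
            lgr["conns"].append(conn)
    publish()
    return len(link_groups)
```

In this model `burst_with_early_unlock(1000, 50)` returns 50 and `burst_with_early_unlock(1000, 1000)` returns 1000: the number of link groups tracks how many connections arrive before the first handshake completes, which is why releasing the lock before the CLC Confirm does not bound it.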

Hope my point is helpful to you. If you have any questions, please let me know. Thanks.

Best wishes,
D. Wythe
You are asking exactly the right questions here. Creation of new connections is on the critical path,
and if the design can be optimized for parallelism that will increase performance, while insufficient
locking will create nasty bugs.
Many programmers have dealt with these issues before us. I would recommend consulting existing proven
patterns; e.g. the ones listed in Paul McKenney's book
(https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/)
e.g. 'Chapter 10.3 Read-Mostly Data Structures' and of course the kernel documentation folder.
Improving an existing codebase like smc without breaking it is not trivial. Obviously a step-by-step approach
works best. So try to identify actions that can be done under a smaller (as in more granular) lock
instead of under a global lock, or change a mutex into an R/W lock or RCU.
Smaller changes are easier to review (and bisect in case of regressions).

I have to say it's quite hard to make the lock smaller. We did
consider the impact of the patch's complexity on review,
and this might be the simplest solution we can think of. If this
solution is not okay for you, perhaps we can discuss
whether there is a better one?

Best wishes,
D. Wythe