On 2024/11/1 18:52, Dust Li wrote:
> On 2024-11-01 16:23:42, liqiang wrote:
>> connections based on redis-benchmark (test in smc loopback-ism mode):
>
> I think you can run test wrk/nginx test with short-lived connection.
> For example:
>
> ```
> # client
> wrk -H "Connection: close" http://$serverIp
>
> # server
> nginx
> ```

I tested with nginx. The test commands are:

# server
smc_run nginx

# client
smc_run wrk -t <2,4,8,16,32,64> -c 200 -H "Connection: close" http://127.0.0.1

Requests/sec
--------+---------------+---------------+
req/s   | without patch | apply patch   |
--------+---------------+---------------+
-t 2    |6924.18        |7456.54        |
--------+---------------+---------------+
-t 4    |8731.68        |9660.33        |
--------+---------------+---------------+
-t 8    |11363.22       |13802.08       |
--------+---------------+---------------+
-t 16   |12040.12       |18666.69       |
--------+---------------+---------------+
-t 32   |11460.82       |17017.28       |
--------+---------------+---------------+
-t 64   |11018.65       |14974.80       |
--------+---------------+---------------+

Transfer/sec
--------+---------------+---------------+
trans/s | without patch | apply patch   |
--------+---------------+---------------+
-t 2    |24.72MB        |26.62MB        |
--------+---------------+---------------+
-t 4    |31.18MB        |34.49MB        |
--------+---------------+---------------+
-t 8    |40.57MB        |49.28MB        |
--------+---------------+---------------+
-t 16   |42.99MB        |66.65MB        |
--------+---------------+---------------+
-t 32   |40.92MB        |60.76MB        |
--------+---------------+---------------+
-t 64   |39.34MB        |53.47MB        |
--------+---------------+---------------+

>
>>
>> 1. On the current version:
>> [x.832733] smc_buf_get_slot cost:602 ns, walk 10 buf_descs
>> [x.832860] smc_buf_get_slot cost:329 ns, walk 12 buf_descs
>> [x.832999] smc_buf_get_slot cost:479 ns, walk 17 buf_descs
>> [x.833157] smc_buf_get_slot cost:679 ns, walk 13 buf_descs
>> ...
>> [x.045240] smc_buf_get_slot cost:5528 ns, walk 196 buf_descs
>> [x.045389] smc_buf_get_slot cost:4721 ns, walk 197 buf_descs
>> [x.045537] smc_buf_get_slot cost:4075 ns, walk 198 buf_descs
>> [x.046010] smc_buf_get_slot cost:6476 ns, walk 199 buf_descs
>>
>> 2. Apply this patch:
>> [x.180857] smc_buf_get_slot_free cost:75 ns
>> [x.181001] smc_buf_get_slot_free cost:147 ns
>> [x.181128] smc_buf_get_slot_free cost:97 ns
>> [x.181282] smc_buf_get_slot_free cost:132 ns
>> [x.181451] smc_buf_get_slot_free cost:74 ns
>>
>> It can be seen from the data that it takes about 5~6us to traverse 200
>
> Based on your data, I'm afraid the short-lived connection
> test won't show much benificial. Since the time to complete a
> SMC-R connection should be several orders of magnitude larger
> than 100ns.

Sorry, I didn't explain my test data well before.

The main function optimized by this patch is the following:

```
struct smc_buf_desc *smc_buf_get_slot(...)
{
	struct smc_buf_desc *buf_slot;

	down_read(lock);
	list_for_each_entry(buf_slot, buf_list, list) {
		if (cmpxchg(&buf_slot->used, 0, 1) == 0) {
			up_read(lock);
			return buf_slot;
		}
	}
	up_read(lock);
	return NULL;
}
```

The data above is the time spent in this function. If the system
currently has 200 active links, then while a new SMC connection is being
established this function has to walk the buffer descriptors of all 200
active links, which takes 5~6us. If there are already 1,000 active links,
it takes about 30us.

After the optimization, the function takes roughly 100ns (74 to 147 ns in
the samples above), and the cost no longer depends on the number of active
links. Moreover, the lock has been removed, which is friendly to
multi-threaded parallel scenarios.

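To make the difference in cost concrete, here is a minimal user-space
sketch in C. It is not the kernel code: the names `struct desc`,
`build_list`, `scan_get`, `free_push` and `free_pop` are hypothetical and
exist only for this illustration, and the sketch is single-threaded, so it
does not model the rwsem, cmpxchg() or the llist primitives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smc_buf_desc (illustration only). */
struct desc {
	int used;               /* 1 while a connection owns the buffer */
	struct desc *next;      /* links every descriptor, used or not */
	struct desc *free_next; /* links only the unused descriptors */
};

/* Link arr[0..n-1] into one list and mark the first n_used as in use. */
struct desc *build_list(struct desc *arr, size_t n, size_t n_used)
{
	size_t i;

	assert(n_used <= n);
	for (i = 0; i < n; i++) {
		arr[i].used = i < n_used;
		arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
		arr[i].free_next = NULL;
	}
	return n ? &arr[0] : NULL;
}

/* Scan scheme: walk the full list until an unused descriptor turns up.
 * *walked reports how many descriptors were inspected. */
struct desc *scan_get(struct desc *head, size_t *walked)
{
	struct desc *d;

	*walked = 0;
	for (d = head; d; d = d->next) {
		(*walked)++;
		if (!d->used) {
			d->used = 1;
			return d;
		}
	}
	return NULL;
}

/* Free-list scheme: a released descriptor is pushed onto the free list. */
void free_push(struct desc **free_head, struct desc *d)
{
	d->used = 0;
	d->free_next = *free_head;
	*free_head = d;
}

/* Free-list scheme: take the head of the free list, no walk needed. */
struct desc *free_pop(struct desc **free_head)
{
	struct desc *d = *free_head;

	if (!d)
		return NULL;
	*free_head = d->free_next;
	d->free_next = NULL;
	d->used = 1;
	return d;
}
```

With 200 descriptors in use ahead of a single unused one, `scan_get()`
inspects 201 entries before it succeeds, while `free_pop()` only touches
the head of the free list, no matter how many descriptors exist.
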
The optimized code is as follows:

```
static struct smc_buf_desc *smc_buf_get_slot_free(struct llist_head *buf_llist)
{
	struct smc_buf_desc *buf_free;
	struct llist_node *llnode;

	if (llist_empty(buf_llist))
		return NULL;
	/* the lock-less linked list does not need a lock */
	llnode = llist_del_first(buf_llist);
	buf_free = llist_entry(llnode, struct smc_buf_desc, llist);
	WRITE_ONCE(buf_free->used, 1);
	return buf_free;
}
```

--
Cheers,
Li Qiang