Re: [PATCH v4 6/7] hugetlb: parallelize 2M hugetlb allocation and initialization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/1/22 19:30, Muchun Song wrote:
On Jan 22, 2024, at 18:12, Gang Li <gang.li@xxxxxxxxx> wrote:

On 2024/1/22 15:10, Muchun Song wrote:> On 2024/1/18 20:39, Gang Li wrote:
+static void __init hugetlb_alloc_node(unsigned long start, unsigned long end, void *arg)
   {
-    unsigned long i;
+    struct hstate *h = (struct hstate *)arg;
+    int i, num = end - start;
+    nodemask_t node_alloc_noretry;
+    unsigned long flags;
+    int next_node = 0;
This should be first_online_node which may be not zero.

That's right. Thanks!

-    for (i = 0; i < h->max_huge_pages; ++i) {
-        if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
+    /* Bit mask controlling how hard we retry per-node allocations.*/
+    nodes_clear(node_alloc_noretry);
+
+    for (i = 0; i < num; ++i) {
+        struct folio *folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+                        &node_alloc_noretry, &next_node);
+        if (!folio)
               break;
+        spin_lock_irqsave(&hugetlb_lock, flags);
I suspect there will more contention on this lock when parallelizing.

In the worst case, there are only 'numa node number' of threads in
contention. And in my testing, it doesn't degrade performance, but
rather improves performance due to the reduced granularity.

So, the performance does not change if you move the lock out of
loop?


If we move the lock out of loop, then multi-threading becomes single-threading, which definitely reduces performance.

```
+       spin_lock_irqsave(&hugetlb_lock, flags);
        for (i = 0; i < num; ++i) {
struct folio *folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY], &node_alloc_noretry, &next_node);
                if (!folio)
                        break;
-               spin_lock_irqsave(&hugetlb_lock, flags);
                __prep_account_new_huge_page(h, folio_nid(folio));
                enqueue_hugetlb_folio(h, folio);
-               spin_unlock_irqrestore(&hugetlb_lock, flags);
                cond_resched();
        }
+       spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
```





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux