On Thu, 23 Sep 2021 10:53:44 -0700 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> Two new sysfs files are added to demote hugetlb pages.  These files are
> both per-hugetlb page size and per node.  Files are:
>   demote_size - The size in Kb that pages are demoted to. (read-write)
>   demote - The number of huge pages to demote. (write-only)
> 
> By default, demote_size is the next smallest huge page size.  Valid huge
> page sizes less than the current huge page size may be written to this
> file.  When huge pages are demoted, they are demoted to this size.
> 
> Writing a value to demote will result in an attempt to demote that
> number of hugetlb pages to an appropriate number of demote_size pages.
> 
> NOTE: Demote interfaces are only provided for huge page sizes if there
> is a smaller target demote huge page size.  For example, on x86 1GB huge
> pages will have demote interfaces.  2MB huge pages will not have demote
> interfaces.
> 
> This patch does not provide full demote functionality.  It only provides
> the sysfs interfaces.
> 
> It also provides documentation for the new interfaces.
> 
> ...
> 
> +static ssize_t demote_store(struct kobject *kobj,
> +	       struct kobj_attribute *attr, const char *buf, size_t len)
> +{
> +	unsigned long nr_demote;
> +	unsigned long nr_available;
> +	nodemask_t nodes_allowed, *n_mask;
> +	struct hstate *h;
> +	int err;
> +	int nid;
> +
> +	err = kstrtoul(buf, 10, &nr_demote);
> +	if (err)
> +		return err;
> +	h = kobj_to_hstate(kobj, &nid);
> +
> +	/* Synchronize with other sysfs operations modifying huge pages */
> +	mutex_lock(&h->resize_lock);
> +
> +	spin_lock_irq(&hugetlb_lock);
> +	if (nid != NUMA_NO_NODE) {
> +		nr_available = h->free_huge_pages_node[nid];
> +		init_nodemask_of_node(&nodes_allowed, nid);
> +		n_mask = &nodes_allowed;
> +	} else {
> +		nr_available = h->free_huge_pages;
> +		n_mask = &node_states[N_MEMORY];
> +	}
> +	nr_available -= h->resv_huge_pages;
> +	if (nr_available <= 0)
> +		goto out;
> +	nr_demote = min(nr_available, nr_demote);
> +
> +	while (nr_demote) {
> +		if (!demote_pool_huge_page(h, n_mask))
> +			break;
> +
> +		/*
> +		 * We may have dropped the lock in the routines to
> +		 * demote/free a page.  Recompute nr_demote as counts could
> +		 * have changed and we want to make sure we do not demote
> +		 * a reserved huge page.
> +		 */

This comment doesn't become true until patch #4, and is a bit confusing
in patch #1.  Also, saying "the lock" is far less helpful than saying
"hugetlb_lock"!

> +		nr_demote--;
> +		if (nid != NUMA_NO_NODE)
> +			nr_available = h->free_huge_pages_node[nid];
> +		else
> +			nr_available = h->free_huge_pages;
> +		nr_available -= h->resv_huge_pages;
> +		if (nr_available <= 0)
> +			nr_demote = 0;
> +		else
> +			nr_demote = min(nr_available, nr_demote);
> +	}
> +
> +out:
> +	spin_unlock_irq(&hugetlb_lock);

How long can we spend with IRQs disabled here (after patch #4!)?

> +	mutex_unlock(&h->resize_lock);
> +
> +	return len;
> +}
> +HSTATE_ATTR_WO(demote);
> +