On 12/12/2023 14:54, David Hildenbrand wrote: > On 07.12.23 17:12, Ryan Roberts wrote: >> In preparation for adding support for anonymous multi-size THP, >> introduce new sysfs structure that will be used to control the new >> behaviours. A new directory is added under transparent_hugepage for each >> supported THP size, and contains an `enabled` file, which can be set to >> "inherit" (to inherit the global setting), "always", "madvise" or >> "never". For now, the kernel still only supports PMD-sized anonymous >> THP, so only 1 directory is populated. >> >> The first half of the change converts transhuge_vma_suitable() and >> hugepage_vma_check() so that they take a bitfield of orders for which >> the user wants to determine support, and the functions filter out all >> the orders that can't be supported, given the current sysfs >> configuration and the VMA dimensions. The resulting functions are >> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders() >> respectively. Convenience functions that take a single, unencoded order >> and return a boolean are also defined as thp_vma_suitable_order() and >> thp_vma_allowable_order(). >> >> The second half of the change implements the new sysfs interface. It has >> been done so that each supported THP size has a `struct thpsize`, which >> describes the relevant metadata and is itself a kobject. This is pretty >> minimal for now, but should make it easy to add new per-thpsize files to >> the interface if needed in future (e.g. per-size defrag). Rather than >> keep the `enabled` state directly in the struct thpsize, I've elected to >> directly encode it into huge_anon_orders_[always|madvise|inherit] >> bitfields since this reduces the amount of work required in >> thp_vma_allowable_orders() which is called for every page fault. >> >> See Documentation/admin-guide/mm/transhuge.rst, as modified by this >> commit, for details of how the new sysfs interface works. >> >> Reviewed-by: Barry Song <v-songbaohua@xxxxxxxx> >> Tested-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> >> Tested-by: John Hubbard <jhubbard@xxxxxxxxxx> >> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> >> --- > > [...] > >> + >> +static ssize_t thpsize_enabled_store(struct kobject *kobj, >> + struct kobj_attribute *attr, >> + const char *buf, size_t count) >> +{ >> + int order = to_thpsize(kobj)->order; >> + ssize_t ret = count; >> + >> + if (sysfs_streq(buf, "always")) { >> + spin_lock(&huge_anon_orders_lock); >> + clear_bit(order, &huge_anon_orders_inherit); >> + clear_bit(order, &huge_anon_orders_madvise); >> + set_bit(order, &huge_anon_orders_always); >> + spin_unlock(&huge_anon_orders_lock); >> + } else if (sysfs_streq(buf, "inherit")) { >> + spin_lock(&huge_anon_orders_lock); >> + clear_bit(order, &huge_anon_orders_always); >> + clear_bit(order, &huge_anon_orders_madvise); >> + set_bit(order, &huge_anon_orders_inherit); >> + spin_unlock(&huge_anon_orders_lock); >> + } else if (sysfs_streq(buf, "madvise")) { >> + spin_lock(&huge_anon_orders_lock); >> + clear_bit(order, &huge_anon_orders_always); >> + clear_bit(order, &huge_anon_orders_inherit); >> + set_bit(order, &huge_anon_orders_madvise); >> + spin_unlock(&huge_anon_orders_lock); >> + } else if (sysfs_streq(buf, "never")) { >> + spin_lock(&huge_anon_orders_lock); >> + clear_bit(order, &huge_anon_orders_always); >> + clear_bit(order, &huge_anon_orders_inherit); >> + clear_bit(order, &huge_anon_orders_madvise); >> + spin_unlock(&huge_anon_orders_lock); > > Why not perform lock/unlock only once in surrounding code? :) I was nervous that sysfs_streq() may be unhappy in atomic context... Unfounded? > > > Much better > > Acked-by: David Hildenbrand <david@xxxxxxxxxx> >