On Fri, Jul 21, 2023 at 3:31 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Jul 20, 2023 at 4:34 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
> > > <senozhatsky@xxxxxxxxxxxx> wrote:
> > > >
> > > > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > > > The purpose of this series is to define a dedicated memory descriptor for
> > > > > zsmalloc, instead of re-using various fields of struct page. This is part
> > > > > of the effort to reduce the size of struct page to unsigned long and
> > > > > enable dynamic allocation of memory descriptors.
> > > > >
> > > > > While [1] outlines this ultimate objective, the current use of struct page
> > > > > is highly dependent on its definition, making it challenging to separately
> > > > > allocate memory descriptors.
> > > >
> > > > I glanced through the series and it all looks pretty straightforward to
> > > > me. I'll have a closer look. And we definitely need Minchan to ACK it.
> > > >
> > > > > Therefore, this series introduces a new descriptor for zsmalloc, called
> > > > > zsdesc. It overlays struct page for now, but will eventually be allocated
> > > > > independently in the future.
> > > >
> > > > So I don't expect zsmalloc memory usage to increase. On one hand, for
> > > > each physical page that a zspage consists of we will allocate a zsdesc
> > > > (extra bytes), but at the same time struct page gets slimmer. So we
> > > > should be even, or am I wrong?
> > >
> > > Well, it depends. Here is my understanding (which may be completely wrong):
> > >
> > > The end goal would be to have an 8-byte memdesc for each order-0 page,
> > > and then allocate a specialized struct per folio according to the use
> > > case. In this case, we would have a memdesc and a zsdesc for each
> > > order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
> > > net loss. The savings only start kicking in with higher-order folios.
> > > As of now, zsmalloc only uses order-0 pages as far as I can tell, so
> > > the usage would increase if I understand correctly.
> >
> > I partially agree with you that the point of the memdesc work is
> > allocating a use-case-specific descriptor per folio, but I thought the
> > primary gain from memdesc was from anon and file pages (where
> > high-order pages are more usable), rather than from zsmalloc.
> >
> > And I believe enabling a memory descriptor per folio would be
> > impossible (or inefficient) if zsmalloc and other subsystems kept
> > using struct page in the current way (or please tell me if I'm wrong?)
> >
> > So I expect the primary gain would be from high-order anon/file folios,
> > while this series is a prerequisite for them to work sanely.
>
> Right, I agree with that, sorry if I wasn't clear. I meant that,
> generally speaking, we see gains from memdesc with higher-order
> folios, so for zsmalloc specifically we probably won't be seeing any
> savings, and *might* see some extra usage (which I might be wrong
> about, see below).

Yeah, even though I said "oh, we don't necessarily need to use extra
memory for zsdesc" below, a slight increase wouldn't hurt too much
from that perspective, because there will be savings from other users
of memdesc.

> > > It seems to me, though, that sizeof(zsdesc) is actually 56 bytes (on
> > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > current size of struct page. If that's true, then there is no loss,
> >
> > Yeah, zsdesc would be 56 bytes on 64-bit CPUs, as the memcg_data field
> > is not used in zsmalloc. More fields in the current struct page might
> > not be needed in the future, although it's hard to say at the moment,
> > but it's not a loss.
>
> Is page->memcg_data something that we can drop? Aren't there code
> paths that will check page->memcg_data even for kernel pages (e.g.
> __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge())?

zsmalloc pages are not accounted via __GFP_ACCOUNT, and IIUC the
current implementation of zswap memcg charging does not use memcg_data
either, so I think it can be dropped.

I think we don't want to increase memdesc to 16 bytes by adding
memcg_data. It should be in the use-case-specific descriptors if the
memory can be charged to a memcg?
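To make that concrete, here is a rough userspace sketch of the idea.
This is not the actual code from the series; the field names and
layout are illustrative assumptions, and only the size arithmetic
matters: dropping the one memcg_data word is what lands zsdesc at 56
rather than 64 bytes on LP64.

/*
 * Illustrative sketch only -- not the zsdesc layout from the series.
 * The point: without a memcg_data word, the descriptor stays one
 * unsigned long smaller than today's 64-byte struct page.
 */
struct list_head {
        struct list_head *next, *prev;
};

struct zspage;

struct zsdesc {
        unsigned long flags;            /*  8: overlaid page flags       */
        struct list_head list;          /* 16: links pages of a zspage   */
        struct zspage *zspage;          /*  8: owning zspage             */
        unsigned long first_obj_offset; /*  8: offset of first object    */
        union {                         /*  8: free-list link or handle  */
                struct zsdesc *next;
                unsigned long handle;
        };
        unsigned int refcount;          /*  4 (+4 bytes tail padding)    */
        /* no memcg_data word here -> 56 bytes, not 64, on LP64 */
};

_Static_assert(sizeof(struct zsdesc) == 56,
               "zsdesc should stay one word smaller than struct page");

With the 8-byte memdesc on top, an order-0 page would then cost
8 + 56 = 64 bytes of metadata, i.e. exactly today's struct page, which
is the break-even point discussed above.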
> > > and there's potential gain if we start using higher-order folios in
> > > zsmalloc in the future.
> >
> > AFAICS zsmalloc should work even when system memory is fragmented,
> > so we may implement fallback allocation (as currently discussed in
> > the large anon folios thread).
>
> Of course, any usage of higher-order folios in zsmalloc must have
> fallback logic, although it might be simpler for zsmalloc than for
> anon folios.

I agree that's off topic here.

> > It might work, but IMHO the purpose of this series is to enable memdesc
> > for large anon/file folios, rather than seeing a large gain in zsmalloc
> > itself. (But even in zsmalloc, it's not a loss.)
> >
> > > (That is of course unless we want to maintain cache line alignment for
> > > the zsdescs, then we might end up using 64 bytes anyway.)
> >
> > We already don't require cache line alignment for struct page. The
> > current alignment requirement is due to SLUB's cmpxchg128 operation,
> > not cache line alignment.
>
> I thought we wanted struct page to be cache line aligned (to avoid
> having to fetch two cache lines for one struct page), but I can easily
> be wrong.

Right. I admit that even if it's not required to be cache line
aligned, struct page is 64 bytes in commonly used configurations, and
changing that could affect some workloads.

But I think it would be better not to align zsdesc to the cache line
size before we observe degradation caused by the lack of alignment. By
the time zsmalloc is used intensively, it shouldn't be a huge issue.

> > I might be wrong in some aspects, so please tell me if I am.
> > And thank you and Sergey for taking a look at this!
>
> Thanks to you for doing the work!

No problem! :)
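P.S. For anyone following along, here is a minimal self-contained
sketch of the break-even arithmetic from this thread. The 8-byte
memdesc is an assumption about the future design, and
overhead_per_page() is just an illustrative helper, not a kernel
function:

#include <stdio.h>

#define SIZEOF_MEMDESC          8   /* assumed per-page memdesc size   */
#define SIZEOF_STRUCT_PAGE      64  /* today's struct page on 64-bit   */

/*
 * Metadata bytes per order-0 page in the memdesc world: one memdesc
 * per page, plus one use-case descriptor shared by the whole folio.
 */
static long overhead_per_page(int folio_order, long desc_size)
{
        long pages_per_folio = 1L << folio_order;

        return SIZEOF_MEMDESC + desc_size / pages_per_folio;
}

int main(void)
{
        /* order-0 with a 64-byte zsdesc: 8 + 64 = 72 > 64, a net loss */
        printf("order-0, 64B zsdesc: %ld bytes/page\n",
               overhead_per_page(0, 64));

        /* order-0 with a 56-byte zsdesc: 8 + 56 = 64, break-even      */
        printf("order-0, 56B zsdesc: %ld bytes/page\n",
               overhead_per_page(0, 56));

        /* order-2 folio, 56-byte descriptor: 8 + 14 = 22, a clear win */
        printf("order-2, 56B desc:   %ld bytes/page\n",
               overhead_per_page(2, 56));

        return 0;
}

This matches the observation above: with order-0 pages zsmalloc at
best breaks even, and real savings only appear once higher-order
folios are in play.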