Re: Regression with kernel 6.3 "kernel BUG at include/linux/swapops.h:472!"

Helge Deller <deller@xxxxxx> · Sun, 14 May 2023 02:09:10 +0200

On 5/14/23 01:32, David Hildenbrand wrote:

This fix makes it work like the layout I documented.

Yes, and your layout looks good for me.

Good :)

What I originally tried doing was reusing one of the spare bits instead of reworking
the layout. Apparently, I got the old layout wrong. :(

Don't worry! Your patch harmonizes parisc to the other platforms, which is good.

So if I understood the layout right this time, maybe we can just use one of the two
spare bits: _PAGE_HUGE (or alternatively, _PAGE_DIRTY_BIT)?

Yes, or keep what you suggested.

What I don't understand yet is the original code:
#define __swp_type(x)                     ((x).val & 0x1f)
#define __swp_offset(x)                   ( (((x).val >> 6) &  0x7) | \
                                            (((x).val >> 8) & ~0x7) )
#define __swp_entry(type, offset)         ((swp_entry_t) { (type) | \
                                              ((offset &  0x7) << 6) | \
                                              ((offset & ~0x7) << 8) })

Don't we loose one of the offset bits?

Let's assume we have the offset 0xff. Encoding it with type 0 would be

((0xff & 0x7) << 6) | ((0xff & ~0x7) << 8)
-> (0x7 << 6) | (0xf8 << 8)
-> 0x1c0 | 0xf800
-> 0xf9c0

Extracting the offset:

((0xf9c0 >> 6) & 0x7) | ((0xf9c0 >> 8) & ~0x7)
-> (0x3e7 & 0x7) | (0xf9 & ~0x7)
-> 0x7 | 0xf8
-> 0xff

I think it's correct.

Yes. Seems good.

The confusing part (that resulted in the BUG here) is that we end up wasting bit #26, because there is a spare bit between the type and the offset.

Maybe a relic from the past -- or copy-and-paste, because some archs supported types with > 5 bits, but core-MM only ever uses 5 bits.

Hard to say. It has been as such since a long time....

Mask 0x7 is 3 bits, but we shift by 6 and 8 (=2 bits difference), so I believe the second shift should be 9.
If it would be 9, then no &0x07 is needed and only one shift would be sufficient.

I don't know much in the swap pte area, but isn't the previous original code wrong?
Which bits of the swp_entry are used where?
I think the old code was correct. There are apparently two spare bits that we can use. I just messed up the old layout, thinking there is only one.

So we can either use the new layout I documented (with the fix you propose), or use another layout.

I think I prefer the layout which you documented.

In any case, we *gain* one more bit for the offset compared to the old layout.

I'm more than happy to keep the new layout. Regarding your fix, maybe avoid the other ~0x7 as well by using similar shifting in __swp_entry()

  #define __swp_entry(type, offset)         ((swp_entry_t) { \
                          ((type) & 0x1f) | \
                         ((offset &  0x7) << 5) | \
                         ((offset >> 3) << 10) })

So it's easier to match to the logic/values in __swp_offset().

Yes, it's much better.
I fixed it up like this in my current git tree.

In any case,
Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

Great. Thanks for your help & suggestions!

Helge