On Thu, 10 Aug 2023 at 08:55, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> Heh, I liked the bitfields - I prefer that to open coding structs, which
> is a major pet peeve of mine. But the text size went down a lot
> without them (would like to know why the compiler couldn't constant fold
> all that stuff out, but... not enough to bother).

Bitfields are actually absolutely *horrible* for many reasons. The bit
ordering being undefined is just one of them.

Yes, they are convenient syntax, but the downsides mean that you should
basically only use them for code that has absolutely zero subtle issues.

Avoid them like the plague for anything with "data transfer issues": in
the kernel, avoid using them for user interfaces unless you are willing
to deal with the duplication and horror of __LITTLE_ENDIAN_BITFIELD etc.

Also avoid them if there is any chance of "raw format" issues: either
saving binary data formats, or - as in your original code - using
unions. As I pointed out, your code was actively buggy, because you
thought it was little-endian. That's not even true on little-endian
machines (ie POWERPC is big-endian in bitfields, even when being
little-endian in bytes!).

Finally, as you found out, it also easily generates horrid code. It's
just _harder_ for compilers to do the right thing, particularly when
it's not obvious that other parts of the structure may be "don't care"
because they got initialized earlier (or will be initialized later).
Together with threading requirements, compilers might do a bad job
either because of the complexity, or simply because of subtle
consistency rules.

End result: by all means use bitfields for the *simple* cases where they
are used purely for internal C code with no form of external exposure,
but be aware that even then the syntax convenience easily comes at a
cost.

> > On x86, you'd never see that as an issue, since all writes are
> > releases, so the 'barrier()' compiler ordering ends up forcing the
> > right magic.
>
> Yep, agreed.

But you should realize that on other architectures, I think that
"barrier() + plain write" is actively buggy. On x86 it's safe, but on
arm (and in fact pretty much anything but s390), the barrier() does
nothing in SMP. Yes, it constrains the compiler, but the earlier writes
to remove the entry from the list may happen *later* as far as other
CPUs are concerned.

Which can be a huge problem if the "struct six_lock_waiter" is on the
stack - which I assume it is - and the waiter is just spinning on
w->lock_acquired. The data structure may be re-used as regular stack
space by the time the list removal code happens.

Debugging things like that is a nightmare. You'll never see it on x86,
it doesn't look possible when looking at the code, and the oopses on
other architectures will be completely random stack corruption some
time after the waiter got the lock.

So this is kind of why I worry about locking. It's really easy to write
code that works 99.9% of the time, but then breaks when you are
unlucky. And depending on the pattern, the "when you are unlucky" may or
may not be possible on x86. It's not like x86 has total memory ordering
either - it's just stronger than most.
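To make that w->lock_acquired case concrete, here's roughly the pattern
I'm talking about. This is only a sketch with made-up names (struct
waiter, wake_waiter() etc.), not the actual six-lock code:

/* Illustration only - made-up names, not the actual six-lock code */
#include <linux/list.h>
#include <linux/compiler.h>   /* barrier() */
#include <linux/atomic.h>     /* smp_store_release()/smp_load_acquire() */
#include <asm/processor.h>    /* cpu_relax() */

struct waiter {
        struct list_head list;
        bool lock_acquired;        /* lives on the waiter's stack */
};

/* Buggy handoff: fine on x86, broken on weakly ordered CPUs */
static void wake_waiter_buggy(struct waiter *w)
{
        list_del(&w->list);        /* stores into the waiter's stack */
        barrier();                 /* compiler-only, no SMP ordering */
        w->lock_acquired = true;   /* may become visible before list_del()! */
}

/* Correct handoff: the release orders the list_del() stores first */
static void wake_waiter(struct waiter *w)
{
        list_del(&w->list);
        smp_store_release(&w->lock_acquired, true);
}

/* Waiter side, spinning on its own stack frame; pairs with the release */
static void wait_for_lock(struct waiter *w)
{
        while (!smp_load_acquire(&w->lock_acquired))
                cpu_relax();
        /* only now is it safe to return and reuse this stack frame */
}

The acquire/release pair is what actually guarantees the list removal
is visible before the waiter can return and reuse that stack.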
> > Some of the other oddity is around the this_cpu ops, but I suspect
> > that is at least partly then because we don't have acquire/release
> > versions of the local cpu ops that the code looks like it would want.
>
> You mean using full barriers where acquire/release would be sufficient?

Yes. That code looks like it should work, but it's hugely less
efficient than it could be. "smp_mb()" tends to be expensive
everywhere, even on x86.

Of course, I might be missing some other cases. That percpu reader
queue worries me a bit just because it ends up generating ordering
based on two different things - the lock word _and_ the percpu word.
And I get very nervous if the final "this gets the lock" isn't some
obvious "try_cmpxchg_acquire()" or similar, just because we've
historically had *so* many very subtle bugs in just about every single
lock we've ever had.

> Matthew was planning on sending the iov_iter patch to you - right around
> now, I believe, as a bugfix, since right now
> copy_page_from_iter_atomic() silently does crazy things if you pass it a
> compound page.
>
> Block layer patches aside, are there any _others_ you really want to go
> via maintainers?

It was mainly just the iov_iter and the block layer patches.

As for the superblock case, I really don't understand why you insist on
just being different from everybody else. Your exclusivity arguments
make no sense to me. Just open the damn thing. No other filesystem has
ever had the fundamental problems you describe.

You can do any exclusivity test you want in the "test()/set()" functions
passed to sget() (see the PS below for the kind of thing I mean). You
say that it's a problem because of a "single spinlock", but it hasn't
been a problem for anybody else. I don't understand why you are so
special. The whole problem seems made-up.

> More broadly, it would make me really happy if we could get certain
> people to take a more constructive, "what do we really care about here
> and how do we move forward" attitude instead of turning every
> interaction into an opportunity to dig their heels in on process and
> throw up barriers.

Honestly, I think one huge problem here is that you've been working on
this for a long time (what - a decade by now?) and you've made all these
decisions that you explicitly wanted to make independently and
intentionally outside the kernel.

And then you feel that "now it's ready to be included", and think that
all those decisions you made outside of the mainline kernel now *have*
to be done that way, and you basically sent your first pull request as a
fait accompli.

The six-locks showed some of that, but as long as they are
bcachefs-internal, I don't much care. The sget() thing really just
smells like "this is how I designed things, and that's it".

Linus
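PS: just so we're talking about the same thing on the sget() side: the
exclusivity policy goes in the callbacks, which sget() runs for you
under its own locking. This is only a sketch with made-up names -
roughly the shape of the generic bdev helpers in fs/super.c, not what
bcachefs specifically would do:

#include <linux/fs.h>
#include <linux/blkdev.h>

/* "is this an existing superblock for the same device?" */
static int my_sb_test(struct super_block *sb, void *data)
{
        return sb->s_bdev == data;
}

/* first mount of this device: claim the new superblock for it */
static int my_sb_set(struct super_block *sb, void *data)
{
        struct block_device *bdev = data;

        sb->s_bdev = bdev;
        sb->s_dev = bdev->bd_dev;
        return 0;
}

/*
 * The mount path then just does
 *
 *      sb = sget(fs_type, my_sb_test, my_sb_set, flags, bdev);
 *
 * and looks at sb->s_root to tell a fresh superblock from an existing
 * one.
 */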