On Mon, Sep 30, 2013 at 5:36 PM, Michael Neuling <mikey@xxxxxxxxxxx> wrote: > > The scary part is that we to make all register volatile. You were not > that keen on doing this as there are a lot of places in exception > entry/exit where we only save/restore a subset of the registers. We'd > need to catch all these. Ugh. It's very possible it's not worth using for the kernel then. The example I posted is normally fine *without* any transactional support, since it's a very local per-dentry lock, and since we only take that lock when the last reference drops (so it's not some common directory dentry, it's a end-point file dentry). In fact, on ARM they just made the cmpxchg much faster by making it entirely non-serializing (since it only updates a reference count, there is no locking involved apart from checking that the lock state is unlocked) So there is basically never any contention, and the transaction needs to basically be pretty much the same cost as a "cmpxchg". It's not clear if the intel TSX is good enough for that, and if you have to save a lot of registers in order to use transactions on POWER8, I doubt it's worthwhile. We have very few - if any - locks where contention or even cache bouncing is common or normal. Sure, we have a few particular loads that can trigger it, but even that is becoming rare. So from a performance standpoint, the target always needs to be "comparable to hot spinlock in local cache". >> They also have interesting ordering semantics vs. locks, we need to be >> a tad careful (as long as we don't access a lock variable >> transactionally we should be ok. If we do, then spin_unlock needs a >> stronger barrier). > > Yep. Well, just about any kernel transaction will at least read the state of a lock. Without that, it's generally totally useless. My dput() example sequence very much verified that the lock was not held, for example. I'm not sure how that affects anything. The actual transaction had better not be visible inside the locked region (ie as far as any lock users go, transactions better all happen fully before or after the lock, if they read the lock and see it being unlocked). That said, I cannot see how POWER8 could possibly violate that rule. The whole "transactions are atomic" is kind of the whole and only point of a transaction. So I'm not sure what odd lock restrictions POWER8 could have. > FWIW eg. > > tbegin > beq abort /* passes first time through */ > .... > transactional stuff > .... > tend > b pass > > abort: > > pass: That's fine, and matches the x86 semantics fairly closely, except "xbegin" kind of "contains" that "jump to abort address". But we could definitely use the same models. Call it "transaction_begin/abort/end()", and it should be architecture-neutral naming-wise. Of course, if tbegin then acts basically like some crazy assembly-level setjmp (I'm guessing it does exactly, and presumably precisely that kind of compiler support - ie a function with "__attribute((returns_twice))" in gcc-speak), the overhead of doing it may kill it. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html