I have just finished Appendix B of perfbook (2017.01.02a), but the
function smp_read_barrier_depends(), and how it makes the code below
correct, really confuses me.
In B.7, paragraph 2, it says:
(1) "Yes, this does mean that Alpha can in effect fetch the data
pointed to before it fetches the pointer itself,..."
and after presenting the code example:
 1 struct el *insert(long key, long data)
 2 {
 3   struct el *p;
 4   p = kmalloc(sizeof(*p), GFP_ATOMIC);
 5   spin_lock(&mutex);
 6   p->next = head.next;
 7   p->key = key;
 8   p->data = data;
 9   smp_wmb();
10   head.next = p;
11   spin_unlock(&mutex);
12 }
13
14 struct el *search(long key)
15 {
16   struct el *p;
17   p = head.next;
18   while (p != &head) {
19     /* BUG ON ALPHA!!! */
20     if (p->key == key) {
21       return (p);
22     }
23     p = p->next;
24   };
25   return (NULL);
26 }
it says:
(2) "On Alpha, the smp_wmb() will guarantee that the cache invalidates
performed by lines 6-8 will reach the interconnect before that of line
10 does, but makes absolutely no guarantee about the order in which the
new values will reach the reading CPU's core."
My question is: how exactly does this code break on Alpha, and how does
smp_read_barrier_depends() help make it correct, as follows?
18   while (p != &head) {
19     smp_read_barrier_depends();
20     if (p->key == key) {
21       return (p);
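
For comparison, the same publish/search pattern can be sketched in
user-space C11. This is only an analogy under stated assumptions, not the
kernel code from the book: the release store stands in for smp_wmb()
followed by the store to head.next, and the acquire loads stand in for
the pointer load followed by smp_read_barrier_depends() (acquire is
stronger than a pure dependency barrier). malloc() replaces kmalloc(),
and the spinlock is dropped by assuming a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct el {
	_Atomic(struct el *) next;
	long key;
	long data;
};

/* Circular list with a dummy header, as in the book's example. */
static struct el head = { .next = &head };

/* Assumption: only one thread ever calls insert(), so no lock is taken. */
struct el *insert(long key, long data)
{
	struct el *p = malloc(sizeof(*p));

	assert(p != NULL); /* sketch only: no allocation-failure handling */
	p->key = key;
	p->data = data;
	atomic_store_explicit(&p->next,
			      atomic_load_explicit(&head.next,
						   memory_order_relaxed),
			      memory_order_relaxed);
	/* Release: the initialization above is ordered before publication. */
	atomic_store_explicit(&head.next, p, memory_order_release);
	return p;
}

struct el *search(long key)
{
	/* Acquire: reads of p->key and p->next are ordered after this load. */
	struct el *p = atomic_load_explicit(&head.next, memory_order_acquire);

	while (p != &head) {
		if (p->key == key)
			return p;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return NULL;
}
```

With plain (unordered) loads in search(), nothing in the language would
forbid the reader from seeing the new pointer but stale contents of the
element, which is the situation the book describes for Alpha.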
According to (2), I guess that the code breaks because the "new values"
arrive at the reading CPU out of order, even though the "cache
invalidation messages" arrive in order. That is, in Figure B.10, even
though the reading CPU core gets the invalidation messages for
p->next
p->key
p->data
before the invalidation message for `head.next', it might not get the
values of
p->next
p->key
p->data
before that of `head.next', which results in the code breaking. Is that
correct? The paragraphs in question do not refer to any exact line of
code, so I am really confused.
And, if that is correct, can I infer that all CPUs other than Alpha
guarantee that the "new values" and the "cache invalidation messages"
arrive at the reading CPU in order, given proper memory barriers like
the one at line 9?
Frankly, I find some of the narrative in Appendix B pretty
confusing (no offense):
1. In paragraph 4 on page 350 of the two-column perfbook.2017.01.02a, it
says:
"Figure B.10 shows how ... Assume that the list header `head' will
be processed by cache bank 0, and that the new element will be processed
by cache bank 1 ... For example, it is possible that reading CPU's cache
bank 1 is very busy, but cache bank 0 is idle..."
As there is a bank 0 and a bank 1 in both the writing CPU and the
reading CPU, it is hard to tell which cache bank 0 is processing the
header `head' and which cache bank 1 is processing the new element, and
as a result I don't understand how the misordering happens.
2. In Figure B.10, both CPUs have an "(r)mb Sequencing" and a "(w)mb
Sequencing" box, but not all of these seem necessary. So, what do those
sequencing boxes mean?
I have read the mail at
http://h41379.www4.hpe.com/wizard/wiz_2637.html
but cannot find anything directly related to Alpha's weird feature. Can
anyone provide a hint (for example, which paragraph to look at)?
regards,
Yubin Ruan