Both schedulers have a race condition that happens in the following
situation:

We have an entry in our table that has already expired according to its
last use time.  Then we need to schedule a new connection that uses this
entry.

CPU 1                           CPU 2

ip_vs_lblc_schedule()
  ip_vs_lblc_get()
    lock table for read
    find entry
    unlock table
                                ip_vs_lblc_check_expire()
                                  lock table for write
                                  kfree() expired entry
                                  unlock table
    return invalid entry

The problem is that we assign the last use time outside of our critical
region.  We can make hitting this race more difficult, if not impossible,
if we assign the last use time while still holding the lock for reading.
That gives us six minutes during which it's safe to use the entry, which
should be enough for our use case, as we're going to use it immediately
and don't keep a long reference to it.

We're holding the lock for reading and not for writing.  The last use
time is an unsigned long, so the assignment should be atomic by itself.
And we don't care if some other user sets it to a slightly different
value.  The read_unlock() implies a barrier, so other CPUs see the new
last use time during cleanup, even though we're just using a read lock.

Other solutions would be:

1) protect the whole ip_vs_lblc_schedule() by taking the lock with
   write_lock(),
2) add reference counting for the entries,
3) protect each entry with its own lock.

All of them are bad for performance.

Comments? Ideas?
---
 net/ipv4/ipvs/ip_vs_lblc.c  |    4 +++-
 net/ipv4/ipvs/ip_vs_lblcr.c |    4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c
index 0efa3db..65f8414 100644
--- a/net/ipv4/ipvs/ip_vs_lblc.c
+++ b/net/ipv4/ipvs/ip_vs_lblc.c
@@ -144,6 +144,8 @@ ip_vs_lblc_new(__be32 daddr, struct ip_vs_dest *dest)
 	atomic_inc(&dest->refcnt);
 	en->dest = dest;
 
+	en->lastuse = jiffies;
+
 	return en;
 }
 
@@ -214,6 +216,7 @@ ip_vs_lblc_get(struct ip_vs_lblc_table *tbl, __be32 addr)
 	list_for_each_entry(en, &tbl->bucket[hash], list) {
 		if (en->addr == addr) {
 			/* HIT */
+			en->lastuse = jiffies;
 			read_unlock(&tbl->lock);
 			return en;
 		}
@@ -519,7 +522,6 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 			en->dest = dest;
 		}
 	}
-	en->lastuse = jiffies;
 
 	IP_VS_DBG(6, "LBLC: destination IP address %u.%u.%u.%u "
 		  "--> server %u.%u.%u.%u:%d\n",
diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c
index 8e3bbeb..3e8cec7 100644
--- a/net/ipv4/ipvs/ip_vs_lblcr.c
+++ b/net/ipv4/ipvs/ip_vs_lblcr.c
@@ -328,6 +328,8 @@ static inline struct ip_vs_lblcr_entry *ip_vs_lblcr_new(__be32 daddr)
 	INIT_LIST_HEAD(&en->list);
 	en->addr = daddr;
 
+	en->lastuse = jiffies;
+
 	/* initilize its dest set */
 	atomic_set(&(en->set.size), 0);
 	en->set.list = NULL;
@@ -399,6 +401,7 @@ ip_vs_lblcr_get(struct ip_vs_lblcr_table *tbl, __be32 addr)
 	list_for_each_entry(en, &tbl->bucket[hash], list) {
 		if (en->addr == addr) {
 			/* HIT */
+			en->lastuse = jiffies;
 			read_unlock(&tbl->lock);
 			return en;
 		}
@@ -708,7 +711,6 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 				ip_vs_dest_set_erase(&en->set, m);
 			}
 		}
-	en->lastuse = jiffies;
 
 	IP_VS_DBG(6, "LBLCR: destination IP address %u.%u.%u.%u "
 		  "--> server %u.%u.%u.%u:%d\n",