On Tue, 17 Mar 2015, Sergey Popovich wrote: > I did not say that you should do anything, I just give an option (like > branch) to let you quickly see changes I made if you wish. Maybe you > find them totally useless:) I see - a branch costs nothing, so that's no problem. Then create a numbered list of the patches, put them in a tar file and let me know from where I can download it. > On the other hand I'm worry about if I start supplying my patch series > (say this takes one month) there is no major changes will be entered to > the master ipset git tree, so I get rebase again and again, as actually > happened before commit 920ddfa09 where I have made some changes (bug > fixes at that moment). Such things may happen when multiple patches flow in and some parts are just modified. But in such a case like this, changes can be communicated back and forth. > > > For example quick overview of the current RCU implementation in the > > > ip_set_core.c gives no hint on how all further readers are prevented > > > from entering set before it is being destroyed (i.e. we can destroy > > > core set data structures with ip_set_destroy_set() while they being > > > accessed by the ip_set_test(). > > > > That's excluded by the set references: set can be destroyed only when it's > > reference counter is zero and ip_set_test() can be called for referenced > > sets only. > > I did not agree with this since ip_set.c:ip_set_destroy_set() after > checking of reference counters is called, but ip_set(inst, index) = NULL > (which itself looks unclear since pointer assignments alone not > guaranteed to be atomic, thus rcu_assign_pointer() and friends exists) > used without any RCU grace period waiters like synchronize_rcu(). No. Please check xt_set.c and all the checkentry functions: sets are referenced there and until such reference is not created, no ip_set_test|add|del() function can use the set. The reference is released in the match/target destroy functions - and the set can be destroyed only after that. When the pointer assigment happens above, there cannot be any user of the set, there's no need for any protection. > Yes I see rcu_barrier() is used in each set module, however it is more > for waiting of callbacks to complete (or kfree_rcu). It's called from the module unload path. Actually, at that time there isn't any set belonging to the given set type (because otherwise the module could not be removed). > Therefore ip_set_test() could easily enter set structures before > ip_set(inst, index) = NULL and destroyed in the ip_set_destroy_set() at > the same time. That simply cannot happen due to the reference counter protection. > There is other things to consider: ip_set_add()/ip_set_del() and of course > timeout enabled set gc routine (I known del_timer_sync() is the first > in ->destroy() method). > > set_bit(j, n->used); > if (old != ERR_PTR(-ENOENT)) { > rcu_assign_pointer(hbucket(t, key), n); > > This looks as data publishing prior to using it where bit is set before > pointer to the new data assigned to the bucket. When new hashbucket is created, the set_bit() call is actually an initialization. > Calling ip_set_ext_destroy() concurrently with readers seems also not > friendly. Currently this is only for commens, but if any other extension, > used by the readers is added problems will arise. As I mentioned, that's the only part which needs fixing. > In general any modification to the data should be consistent for > concurrent readers all times, and should be only modified/released after > RCU grace period (or in case of RCU-bh after one softirq scheduling). > Modification data inline seems worse to me. When data can safely be modified, then that's the better approach. > > > Also I don't think there is a good idea to modify cidr book keeping > > > data without any protection from the concurrent readers. > > > > cidr book keeping was changed so that it is safe to modify it without > > any protection. The worst case is that a reader loops at the same cidr > > value twice. > > Anyway this seems worse to me. Why? No data corruption can happen and that's the fastest way. The worst case scenario is not the common case. > > The main point of the ipset git tree is to keep and maintain support > > kernel versions back to the last (i.e. 2.6.32) longterm kernel tree. > > However debugging locking issues without lockdep is much harder and > having one when it is available seems good. It can probably be added in a wrapper which gives noop for older kernels. > > > Using test_bit() in the fastpath loops to skip empty entries with > > > bitmap probably wastes more time than using ffs()/ffz(). > > > > I'm sure patches for that can be applied without any problem. > > With minimal, since ffs()/ffz() may be architecture specific, or even > generic taking unsigned long. However on 32 bit arches sizeof(unsigned > long) == 4, but net:iface uses up to 64 slots, so u64 type should be > used with __ffs64(). > > As for optimizations I have many other small ones:) Great, I'm looking forward for improvements! > > > Providing additionall variant callback ->uref() to handle references > > > seems confusing in aspects of set dump: we release set items prior to > > > dumping them. Furthermore due to netlink_dump_start() nature (callbacks) > > > there is no guarantee that we do not get pointer to the new table after > > > resize (for hashes) where all dumps become inconsistent anyway. Running > > > via callbacks implies no locking on the entire ipset structures (checked > > > with lockdep), only first call of the callback may be locked with NFNL. > > > > The uref() callback is required for the parallel resizing and dumping: > > dumping keeps the original hash table and the uref referecen counter tells > > the system which function should destroy it: resizing (because there are > > no parallel dump using the old hash table) or the dumping (because > > resizing successfully created a new hash table). > > Yes, I'm already aware of this problem. The problem is already solved in the current ipset git tree. Best regards, Jozsef - E-mail : kadlec@xxxxxxxxxxxxxxxxx, kadlecsik.jozsef@xxxxxxxxxxxxx PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences H-1525 Budapest 114, POB. 49, Hungary -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html