Hello, Mikulas.

On Tue, Feb 18, 2014 at 08:57:11PM -0500, Mikulas Patocka wrote:
> Hi Tejun
>
> Two years ago, I reported a bug in workqueues - a work item that is
> supposed to be bound to a specific CPU can be migrated to a different CPU
> when the original CPU is disabled by writing zero to
> /sys/devices/system/cpu/cpu*/online
>
> This causes crashes in dm-crypt, because it assumes that a work item stays
> on the same CPU.

For better or worse, per-cpu workqueues have never guaranteed that
cpus won't go down while a work item is executing.  If a workqueue
user needs such a guarantee, it's required to use one of the CPU down
hooks to cancel and flush such work items.

This is partly because workqueue itself doesn't distinguish work items
which need to be bound for correctness from those which just use
affinity as an optimization.  The distinction is made by the user.
This has certain benefits, as it makes clear in the code local to the
specific user that it's incurring latency in CPU down operations,
which happen to be fairly hot in certain configurations.

Besides, it's not really clear what behavior workqueue can enforce -
should it try to drain as in the wq shutdown sequence, or should it
trigger WARN if work items are requeueing, or should it just leave
them hanging until the CPU comes back again?  If we do the last, what
about the ones which are using percpu workqueues as an optimization?

So, if dm-crypt is depending on affinity and not taking care of it via
cpu hotplug hooks, it's something which should be fixed from the
dm-crypt side.

Thanks.

--
tejun

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
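
To illustrate the pattern described above (the workqueue user flushes its
own per-CPU work items from a CPU down hook, before the CPU goes away),
here is a rough userspace toy model in C.  It is not kernel code and not
dm-crypt's actual fix: model_init, queue_work_on_cpu,
user_cpu_down_prepare, cpu_down and record_cpu are all hypothetical names
invented for this sketch, and "running" a work item is just a direct
function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are real kernel APIs. */
#define NR_MODEL_CPUS 4
#define MAX_PENDING   8

typedef void (*work_fn)(int cpu, void *data);

struct work_item {
    work_fn fn;
    void *data;
};

struct cpu_queue {
    bool online;
    size_t nr_pending;
    struct work_item pending[MAX_PENDING];
};

static struct cpu_queue queues[NR_MODEL_CPUS];

static bool valid_cpu(int cpu)
{
    return cpu >= 0 && cpu < NR_MODEL_CPUS;
}

/* Bring every modelled CPU online with an empty queue. */
void model_init(void)
{
    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
        queues[cpu].online = true;
        queues[cpu].nr_pending = 0;
    }
}

/* Queue a work item bound to @cpu; fails if the CPU is offline or full. */
bool queue_work_on_cpu(int cpu, work_fn fn, void *data)
{
    if (!valid_cpu(cpu) || !queues[cpu].online ||
        queues[cpu].nr_pending == MAX_PENDING)
        return false;
    queues[cpu].pending[queues[cpu].nr_pending++] =
        (struct work_item){ .fn = fn, .data = data };
    return true;
}

/*
 * The user's CPU down hook: run everything still pending on @cpu while
 * the CPU is still online, so no item can end up on another CPU.
 * Returns the number of items flushed.
 */
static size_t user_cpu_down_prepare(int cpu)
{
    struct cpu_queue *q = &queues[cpu];
    size_t flushed = 0;

    for (size_t i = 0; i < q->nr_pending; i++) {
        q->pending[i].fn(cpu, q->pending[i].data);
        flushed++;
    }
    q->nr_pending = 0;
    return flushed;
}

/* Take @cpu offline: call the user's hook first, then mark it offline. */
bool cpu_down(int cpu)
{
    if (!valid_cpu(cpu) || !queues[cpu].online)
        return false;
    user_cpu_down_prepare(cpu);
    assert(queues[cpu].nr_pending == 0); /* nothing left to migrate */
    queues[cpu].online = false;
    return true;
}

/* Example work function: records which CPU it ran on. */
void record_cpu(int cpu, void *data)
{
    *(int *)data = cpu;
}
```

The point of the model is the ordering in cpu_down(): the user's hook runs
while the CPU is still online, so every bound item executes on the CPU it
was queued for.  In a kernel of that era the analogous step would
presumably be a CPU notifier handling CPU_DOWN_PREPARE and calling
flush_work() or cancel_work_sync() on the affected items, but that mapping
is an assumption of this sketch rather than something stated in the thread.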