Hi all, On Mon, Jan 31, 2011 at 12:50:39PM +0100, Petr Uzel wrote: > On Fri, Jan 28, 2011 at 08:53:55PM -0500, Ted Ts'o wrote: > > On Fri, Jan 28, 2011 at 03:12:23PM +0100, Petr Uzel wrote: > > > > > > Sometimes, when there are multiple processes generating time-based > > > UUIDs in a loop (each of them using multiple threads), it can happen > > > that the socket backlog in uuidd daemon is full and so the connection > > > request is refused. uuid_generate_time() then fallbacks to unsafe way of > > > generating UUID, which may cause duplicate time-based UUIDs to be > > > generated. > > > > It's probably also a good idea to increase the socket backlog queue > > length in uuidd, which is currently set to 5 for no good reason. We > > should probably just change that to be SOMAXCONN. > > > > I don't have a problem with engineering solutions which wait until you > > can talk to the uuidd, and returning an error to the processes if it > > fails, but it's better not to fail in the first place, and so > > increasing the listen backlog parameter would be a good thing to do. > > I agree; we should try to make libuuid not to fail in the first place. After another week of staring at the code, I think I have finally found the root cause of the problem. First, I'd like to describe the tests I did. As I wrote in my previous mail, I couldn't reproduce the problem anywhere else except the VirtualBox VM (with SLES) at first. I tried also another virtual machine (with different system) and it was even more weird - sometimes I could reproduce, but after a reboot, I couldn't. Comparing the 'dmesg's, I figured out that it depends on the clocksource being used by the kernel. The problem is reproducible only using 'jiffies' as the clocksource, but I haven't seen it on a machine/VM using 'tsc' or any other clocksource. The problem with jiffies is that the granularity of timestamps is quite high (above 10ms on my testing VM). As far as I understand, the kernel uses jiffies only as the last resort, if other possible clocksources can not be used (TSC unstable), which sometimes happened, sometimes not. After booting my workstation with 'clocksource=jiffies' boot parameter, the problem started to appear there as well. As far as I understand the libuuid/uuidd code, the high clocksource granularity should not cause troubles, as long as the application/uuidd have access to the clock.txt (I ran all the tests as root). But apparently, the clocksource=jiffies made the duplicate UUIDs more likely. Next, to make the tests as simple as possible, I also disabled the uuidd daemon (removed the binary). If I understand the libuuid code correctly, all the processes/threads generating UUIDs synchronize using the clock.txt (if they have access), so the duplicate UUIDs shouldn't appear even theoretically (is this true?) However, I still got duplicates. Now to the point finally: I think the locking of clock.txt actually does not work as expected. Let's have one process, running as root, with multiple threads, each of them calling uuid_generate_time() and uuidd daemon isn't installed. The threads should serialize accesses to clock.txt using fcntl(2) locks. However, due to how fcntl works, it is not possible to use it for synchronization of threads that belong to the same process - it behaves as if no lock was there (it does not matter that the threads access/lock the file using different - thread specific - file descriptor). So the accesses to clock.txt are not serialized -> race condition -> duplicate UUIDs. OTOH, flock(2) can be used for this purpose. I have replaced the fcntl with flock and can not reproduce it anymore. I looked into the list of limitations of flock(2) - I don't see any disadvantages of using flock() instead of fcntl() in this context, so I propose switching to flock()-based locking of clock.txt. (I'll send the patch shortly). What do you think? I would appreciate any feedback on this matter. Best regards, Petr -- Petr Uzel IRC: ptr_uzl @ freenode
Attachment:
pgpHKQUEHqsVN.pgp
Description: PGP signature