Re: LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000.

I opened this Bugzilla issue for tracking purposes:

https://bugzilla.redhat.com/show_bug.cgi?id=1481085

On Sun, Aug 13, 2017 at 8:05 AM, Mark Mielke <mark.mielke@gmail.com> wrote:
I searched around for this a bit, and although other users may have hit this, I didn't find a good explanation offered. I suspect the users clean it up manually and then it disappears for another 2 years. I hope this message will get captured by Google, and help somebody else out. Also, I hope to have some discussion about this as it seems like an easily preventable problem.

The archive file names are generated like:

                if (dm_snprintf(archive_name, sizeof(archive_name),
                                 "%s/%s_%05u-%d.vg",
                                 dir, vg->name, ix, rnum) < 0) {
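
To make the resulting names concrete, here is a minimal sketch that applies the same format string with the standard snprintf instead of LVM's dm_snprintf. The helper name format_archive_name and the sample values in the note below are assumptions for illustration, not part of the LVM source.

```c
#include <stdio.h>

/* Hypothetical helper: builds an archive file name with the same format
 * string as the excerpt above, but using standard snprintf. */
int format_archive_name(char *buf, size_t len, const char *dir,
                        const char *vg_name, unsigned ix, int rnum)
{
    /* %05u pads the index to at least five digits; larger values simply
     * take more digits, which is where the sort order breaks later. */
    return snprintf(buf, len, "%s/%s_%05u-%d.vg", dir, vg_name, ix, rnum);
}
```

With dir "/etc/lvm/archive", a volume group called "vg0", index 42 and random number 1234, this produces "/etc/lvm/archive/vg0_00042-1234.vg". Index 100000 with random number 7 produces "/etc/lvm/archive/vg0_100000-7.vg", one character longer in the index field.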

The directory scanning code that loads the archive file names into memory recognizes a problem, although it isn't explicit about what the problem is:

        /* Sort fails beyond 5-digit indexes */
        if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
                log_error("Couldn't scan the archive directory (%s).", dir);
                return 0;
        }

The file names encode the index zero-padded to five digits, like "00000". The sorting code uses "alphasort", which only orders the names correctly as long as the index stays within 5 digits. As soon as the index exceeds 5 digits, "100000" sorts to the beginning and "99999" to the end. After that, new archives seem to *all* get index "100000". We had some 40,000 archives with index "100000" before we noticed. And, because the index is followed by a random number, expiry would only remove a few of the "100000" files before it hit one that was younger than the 30-day retention period set by default. When I reduced the retention period to 7 days, it expired only about 12 of the 40,000 archive files. This behaviour is probably due to the random number distribution ensuring that there are always some recent records near 0?
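
The misordering is easy to reproduce outside LVM. The sketch below sorts a few made-up archive names with qsort and strcmp, which is an assumption standing in for alphasort (alphasort compares d_name with strcoll, which behaves like strcmp in the C locale). The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Byte-wise string comparison for qsort, standing in for alphasort. */
int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts an array of file names in place, as a plain string sort would. */
void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof(*names), cmp_names);
}
```

Given "vg_99998-5.vg", "vg_99999-9.vg", "vg_100000-3.vg" and "vg_100000-1.vg", the sorted order is "vg_100000-1.vg", "vg_100000-3.vg", "vg_99998-5.vg", "vg_99999-9.vg". The newest index lands first, and entries sharing an index are ordered by the text of the random suffix rather than by age.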

This issue eventually affects everyone, although obviously the people that use features like snapshots more frequently (we use it every 15 minutes, across multiple volumes) will hit it sooner.

There are a few possible fixes. Probably "alphasort" should not be used at all; instead, a context-aware sort could filter and sort as it goes, decoding the index as a number and comparing it numerically. Then, if performance and scalability matter, it would ideally do this in a single pass, buffering only the minimum needed to expire the correct archive files.
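
As a rough sketch of the comparison part only (not the single-pass expiry), a comparator along these lines could be passed to scandir in place of alphasort. All function names here are hypothetical, and the parsing assumes every name has the "<vg>_<index>-<rnum>.vg" shape shown earlier; a real fix would also want a scandir filter that rejects anything else.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the numeric index from "<vg>_<index>-<rnum>.vg".
 * Returns 1 on success, 0 if the name does not have that shape. */
int archive_index(const char *name, unsigned long *ix)
{
    const char *us = strrchr(name, '_');   /* last '_' precedes the index */
    char *end;

    if (!us || us[1] < '0' || us[1] > '9')
        return 0;
    *ix = strtoul(us + 1, &end, 10);
    return *end == '-';                    /* index must be followed by '-' */
}

/* Orders names by numeric index; falls back to strcmp when the indexes
 * are equal or either name cannot be parsed. */
int cmp_archive_names(const char *a, const char *b)
{
    unsigned long ia, ib;

    if (archive_index(a, &ia) && archive_index(b, &ib) && ia != ib)
        return ia < ib ? -1 : 1;
    return strcmp(a, b);
}

/* Comparator with the signature scandir() expects. */
int archive_sort(const struct dirent **a, const struct dirent **b)
{
    return cmp_archive_names((*a)->d_name, (*b)->d_name);
}
```

With this comparator, "vg0_99999-9.vg" sorts before "vg0_100000-1.vg", whereas a plain strcmp puts it after. Ties on the index still fall back to string order, so ordering files within one index by age would need the file timestamps as well.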

We hit this on RHEL 7.2. I wasn't surprised to find it in RHEL 7.2, but I was surprised that it still exists on "master". "git blame" says this has been an issue since 2002:

5be981bab5 (Alasdair Kergon  2002-05-07 12:47:11 +0000 139)     /* Sort fails beyond 5-digit indexes */
59d6420b9a (Joe Thornber     2002-02-08 11:58:18 +0000 140)     if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
b8f47d5f69 (Alasdair Kergon  2009-07-15 20:02:46 +0000 141)             log_error("Couldn't scan the archive directory (%s).", dir);
952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 142)             return 0;
952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 143)     }

Ouch... :-)

For anybody that does hit this: pruning the archive files with index < 100000 is effective. It then keeps counting from 100000, and you have 9X more life before it happens again... :-)

--
Mark Mielke <mark.mielke@gmail.com>

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/