On Mon, Oct 25, 2010 at 2:32 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote: > Karel, Michael, > > In kernel 2.6.36 (and in stable kernel 2.6.35.5) I made a tiny change > to the swapon(const char *path, int swapflags) system call interface: > kernel commit 3399446632739fcd05fd8b272b476a69c6e6d14a > swap: discard while swapping only if SWAP_FLAG_DISCARD > > I added this definition to the kernel's include/linux/swap.h: > #define SWAP_FLAG_DISCARD 0x10000 /* discard swap cluster after use */ > (which I think needs to find its way into /usr/include/sys/swap.h somehow); > and if that bit is set in swapflags (previously devoted to swap priority), > then swapping will discard clusters of swap pages in between freeing them > and re-writing to them, if the swap device supports that (discard or trim > being notification from OS to device that stale data is no longer needed). > > (No error occurs if the device does not support discard, and the only sign > that it is discarding is "D" at the end of the kernel's "Adding XXXk swap" > message. And that flag only governs discarding while swapping: whether > or not it is supplied, every swapon tries to discard the entire swap area, > less its swap header page - that will normally be an efficient operation. > Earlier kernels than 2.6.35.5 will just ignore the flag.) > > The story behind is this. In 2.6.29 I implemented this behaviour for any > device that supports discard, before having any real SSD that supported > it: in the hope that it would give a useful boost to swapping on SSD once > device support arrived. About a year ago, OCZ implemented discard in their > Vertex, then Intel in their X25: testing around 2.6.32, the OCZ showed > measurable (though disappointing) benefit, the Intel arguable benefit. > > But since then, the kernel's block-layer support for discard has gone > through a series of changes (with most recently a shift from reliance > on I/O barriers to waiting on completions), and the SSD firmwares have > been updated too. With the result that once I tested again, shortly > after 2.6.35 came out (alerted by a bugreport from Nigel Cunningham), > it proved to be a disadvantage on the Intel, and a disaster on the OCZ > (even after a patch from hch removed the most obvious overhead). > > As things stand at present, we could just remove the swap discard > support; but since several filesystems (including ext4 and btrfs and > fat) are offering a "discard" mount option, I thought swap should take > the same course, and offer a "--discard" or "-d" option to swapon(8). > > Below is the patch I tested with. I could sign it off, except for that > #ifndef SWAP_FLAG_DISCARD business at the start (I'm rather hoping that > you, Karel, will have a hotline to getting that addition to sys/swap.h). > Something else questionable about it, is that final "if (all && discard) > swapon_usage(stderr, 1)" hunk: I'm not sure what's best when you > "swapon -a -d", given that /etc/fstab can itself say "discard" or not; > I took the easiest way to code it up, and made "swapon -a -d" an error. > > I apologize to you both for taking so long to alert you (and sorry for > being so verbose! but I think the background makes more sense of it). > But as you can see from the above, there is no great urgency to adding > and documenting this support. Hugh, would you write a patch for the man-page? Or just supply me with the raw text that should be added to the man page? Thanks, Michael > --- util-linux-ng-2.18/mount/swapon.c.0 2010-06-14 02:19:19.000000000 -0700 > +++ util-linux-ng-2.18/mount/swapon.c 2010-09-05 22:07:48.000000000 -0700 > @@ -29,6 +29,9 @@ > #ifdef HAVE_SYS_SWAP_H > # include <sys/swap.h> > #endif > +#ifndef SWAP_FLAG_DISCARD > +#define SWAP_FLAG_DISCARD 0x10000 /* discard swap cluster after use */ > +#endif > > #ifndef SWAPON_HAS_TWO_ARGS > /* libc is insane, let's call the kernel */ > @@ -54,6 +57,7 @@ enum { > > int all; > int priority = -1; /* non-prioritized swap by default */ > +int discard; > > /* If true, don't complain if the device/file doesn't exist */ > int ifexists; > @@ -65,6 +69,7 @@ char *progname; > static struct option longswaponopts[] = { > /* swapon only */ > { "priority", required_argument, 0, 'p' }, > + { "discard", 0, 0, 'd' }, > { "ifexists", 0, 0, 'e' }, > { "summary", 0, 0, 's' }, > { "fixpgsz", 0, 0, 'f' }, > @@ -92,7 +97,7 @@ static void > swapon_usage(FILE *fp, int n) { > fprintf(fp, _("\nUsage:\n" > " %1$s -a [-e] [-v] [-f] enable all swaps from /etc/fstab\n" > - " %1$s [-p priority] [-v] [-f] <special> enable given swap\n" > + " %1$s [-p priority] [-d] [-v] [-f] <special> enable given swap\n" > " %1$s -s display swap usage summary\n" > " %1$s -h display help\n" > " %1$s -V display version\n\n"), progname); > @@ -493,6 +498,8 @@ do_swapon(const char *orig_special, int > << SWAP_FLAG_PRIO_SHIFT); > } > #endif > + if (discard) > + flags |= SWAP_FLAG_DISCARD; > status = swapon(special, flags); > if (status < 0) > warn(_("%s: swapon failed"), orig_special); > @@ -582,6 +589,8 @@ swapon_all(void) { > opt = strtok(NULL, ",")) { > if (strncmp(opt, "pri=", 4) == 0) > pri = atoi(opt+4); > + if (strcmp(opt, "discard") == 0) > + discard = 1; > if (strcmp(opt, "noauto") == 0) > skip = 1; > } > @@ -605,6 +614,7 @@ swapon_all(void) { > } > fclose(fp); > > + discard = 0; > return status; > } > > @@ -632,7 +642,7 @@ main_swapon(int argc, char *argv[]) { > int status = 0; > int c, i; > > - while ((c = getopt_long(argc, argv, "ahefp:svVL:U:", > + while ((c = getopt_long(argc, argv, "ahdefp:svVL:U:", > longswaponopts, NULL)) != -1) { > switch (c) { > case 'a': /* all */ > @@ -650,6 +660,9 @@ main_swapon(int argc, char *argv[]) { > case 'U': > addu(optarg); > break; > + case 'd': > + discard = 1; > + break; > case 'e': /* ifexists */ > ifexists = 1; > break; > @@ -680,6 +693,9 @@ main_swapon(int argc, char *argv[]) { > if (ifexists && (!all || strcmp(progname, "swapon"))) > swapon_usage(stderr, 1); > > + if (all && discard) > + swapon_usage(stderr, 1); > + > if (all) > status |= swapon_all(); > > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/ -- To unsubscribe from this list: send the line "unsubscribe util-linux-ng" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html