On Tue, Aug 26, 2014 at 12:28 AM, David Horner <ds2horner@xxxxxxxxx> wrote: > On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@xxxxxxxx> wrote: >> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@xxxxxxxxx> wrote: >>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote: >>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote: >>>>> > Hello David, >>>>> > >>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote: >>>>> >> > Since zram has no control feature to limit memory usage, >>>>> >> > it makes hard to manage system memrory. >>>>> >> > >>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>>>> >> > a limit so that zram could fail allocation once it reaches >>>>> >> > the limit. >>>>> >> > >>>>> >> > In addition, user could change the limit in runtime so that >>>>> >> > he could manage the memory more dynamically. >>>>> >> > >>>>> >> - Default is no limit so it doesn't break old behavior. >>>>> >> + Initial state is no limit so it doesn't break old behavior. >>>>> >> >>>>> >> I understand your previous post now. >>>>> >> >>>>> >> I was saying that setting to either a null value or garbage >>>>> >> (which is interpreted as zero by memparse(buf, NULL);) >>>>> >> removes the limit. >>>>> >> >>>>> >> I think this is "surprise" behaviour and rather the null case should >>>>> >> return -EINVAL >>>>> >> The test below should be "good enough" though not catching all garbage. >>>>> > >>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>>>> > not caller if it is really problem so I don't want to touch it in this >>>>> > patchset. It's not critical for adding the feature. >>>>> > >>>>> >>>>> I've looked into the memparse function more since we talked. >>>>> I do believe a wrapper function around it for the typical use by sysfs would >>>>> be very valuable. >>>> >>>> Agree. >>>> >>>>> However, there is nothing wrong with memparse itself that needs to be fixed. >>>>> >>>>> It does what it is documented to do very well (In My Uninformed Opinion). >>>>> It provides everything that a caller needs to manage the token that it >>>>> processes. >>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >>>> >>>> Maybe strict_memparse would be better to protect such things so you >>>> could find several places to clean it up. >>>> >>>>> >>>>> The fact that other callers don't check the return pointer value to >>>>> see if only a null >>>>> string was processed, is not its fault. >>>>> Nor that it may not be ideally suited to sysfs attributes; that other store >>>>> functions use it in a given manner does not means that is correct - >>>>> nor that it is >>>>> incorrect for that "knob". Some attributes could be just as valid with >>>>> null zeros. >>>>> >>>>> And you are correct, to disambiguate the zero is not required for the >>>>> limit feature. >>>>> Your original patch which disallowed zero was full feature for mem_limit. >>>>> It is the requested non-crucial feature to allow zero to reestablish >>>>> the initial state >>>>> that benefits from distinguishing an explicit zero from a "default zero' >>>>> when garbage is written. >>>>> >>>>> The final argument is that if we release this feature as is the undocumented >>>>> functionality could be relied upon, and when later fixed: user space breaks. >>>> >>>> I don't get it. Why does it break userspace? >>>> The sysfs-block-zram says "0" means disable the limit. >>>> If someone writes *garabge* but work as if disabling the limit, >>>> it's not a right thing and he already broke although it worked >>>> so it would be not a problem if we fix later. >>>> (ie, we don't need to take care of broken userspace) >>>> Am I missing your point? >>>> >>> >>> Perhaps you are missing my point, perhaps ignoring or dismissing. >>> >>> Basically, if a facility works in a useful way, even if it was designed for >>> different usage, that becomes the "accepted" interface/usage. >>> The developer may not have intended that usage or may even considered >>> it wrong and a broken usage, but it is what it is and people become >>> reliant on that behaviour. >>> >>> Case in point is memparse itself. >>> >>> The developer intentionally sets the return pointer because that is the >>> only value that can be validated for correct performance. >>> The return value allows -ve so the standard error message passing is not valid. >>> Unfortunately, C allows the user to pass a NULL value in the parameter. >>> The developer could consider that absurd and fundamentally broken. >>> But to the user it is a valid situation, because (perhaps) it can't be >>> bothered to handle error cases. >>> >>> So, who is to blame. >>> You say memparse, that it is fundamentally broken, >>> because it didn't check to see that it was used correctly. >>> And I say mem_limit_store is fundamentally broken, >>> because it didn't check to see that it was used correctly. >> >> I think we should look at what the rest of the kernel does as far as >> checking memparse results. It appears to be a mix of some code >> checking memparse while others don't. The most common way to check >> appears to be to verify that memparse actually parsed at least 1 >> character, e.g.: >> oldp = p; >> mem_size = memparse(p, &p); >> if (p == oldp) >> return -EINVAL; >> >> although other places where 0 isn't valid can simply check for that: >> mem_size = memparse(p, &p); >> /* don't remove all of memory when handling "mem={invalid}" param */ >> if (mem_size == 0) >> return -EINVAL; >> >> or even the other memparse use in zram_drv.c: >> disksize = memparse(buf, NULL); >> if (!disksize) >> return -EINVAL; >> >> >> And there seem to be other places where (maybe?) there's no checking >> at all. However, it also seems like many cases of memparse usage are >> looking for a non-zero value, and therefore they can either >> immediately check for zero/invalid or (possibly) later code has checks >> to avoid using any zero value. In this case though, 0 is a valid >> value. So, while I agree that if a user passes an invalid (i.e. >> non-numeric) value it's clearly user error, it might be closer to the >> apparent (although unwritten AFAICT) memparse usage api to check the >> result for validity; in our case a simple check if at least 1 char was >> parsed is all that's needed, e.g.: >> >> { >> u64 limit; >> char *tmp = buf; >> struct zram *zram = dev_to_zram(dev); >> >> limit = memparse(buf, &tmp); >> if (buf == tmp) /* no chars parsed, invalid input */ >> return -EINVAL; >> down_write(&zram->init_lock); >> ... >> >> >> Separate from this patch, it would also help if the lib/cmdline.c >> memparse doc was at least updated to clarify when the result should be >> checked for validity > > FWIW: > I was pondering why I thought this was the wrong place. > On reflection the best explanation is that it is not validity - > the program does what it does quite well. > (although it does have flaws for use by sysfs > 1) it uses simple_strtoull which according to kernel.h#L269 is obsolete > 2) it checks for a suffix in the null zero case > (that means G,K,M are all valid memory size constants, > and I think that should not be in the definition of > valid mem parms) > 3) it does nothing to enforce termination of the input. > Both simple_strtoull and its successor kstrtoull are not > buffer overrun safe. > And so neither is memparse. > It may be the sysfs buffer management does some magic here > - but I have not seen it documented nor in code.) > > Rather than _validity_ it is _applicability_ that needs explaining. > And that is not documented in the function that does its thing. > But rather in the code that uses it, and more specifically in the framework > established for its specific use - as in sysfs for numeric memory parameters. Well, sysfs isn't the only user of memparse, over half of its usage is from arch/, presumably for kernel boot param parsing. So the doc on its usage shouldn't only be for sysfs. > >> and how best to do that (e.g. if 0 is an invalid value, just check if >> the result is 0; if 0 is a possible valid value, check if any chars >> were parsed). >> >> >>> >>> The difference is that memparse cannot stop being abused >>> (C allows the NULL argument and extensive tricks are required to address that) >>> however, we can readily fix mem_limit_store and ensure >>> 1) no regression when the interface IS fixed and >>> 2) predictable behaviour when accidental or "fuzzy" input arrives. >>> >>> >>>>> They say getting API right is a difficult exercise. I suggest, if we >>>>> don't insisting on >>>>> an explicit zero we have the API wrong. >>>>> >>>>> I don't think you disagreed, just that the burden to get it correct >>>>> lay elsewhere. >>>>> >>>>> If that is the case it doesn't really matter, we cannot release this >>>>> interface until >>>>> it is corrected wherever it must be. >>>>> >>>>> And my zero check was a poor hack. >>>>> >>>>> I should have explicitly checked the returned pointer value. >>>>> >>>>> I will send that proposed revision, and hopefully you will consider it >>>>> for inclusion. >>>>> >>>>> >>>>> >>>>> >>>>> >> >>>>> >> > >>>>> >> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> >>>>> >> > --- >>>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >>>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >>>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >>>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >>>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >>>>> >> > >>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > index 70ec992514d0..b8c779d64968 100644 >>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > @@ -119,3 +119,13 @@ Description: >>>>> >> > efficiency can be calculated using compr_data_size and this >>>>> >> > statistic. >>>>> >> > Unit: bytes >>>>> >> > + >>>>> >> > +What: /sys/block/zram<id>/mem_limit >>>>> >> > +Date: August 2014 >>>>> >> > +Contact: Minchan Kim <minchan@xxxxxxxxxx> >>>>> >> > +Description: >>>>> >> > + The mem_limit file is read/write and specifies the amount >>>>> >> > + of memory to be able to consume memory to store store >>>>> >> > + compressed data. The limit could be changed in run time >>>>> >> > - and "0" is default which means disable the limit. >>>>> >> > + and "0" means disable the limit. No limit is the initial state. >>>>> >> >>>>> >> there should be no default in the API. >>>>> > >>>>> > Thanks. >>>>> > >>>>> >> >>>>> >> > + Unit: bytes >>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>>>> >> > --- a/Documentation/blockdev/zram.txt >>>>> >> > +++ b/Documentation/blockdev/zram.txt >>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >>>>> >> > size of the disk when not in use so a huge zram is wasteful. >>>>> >> > >>>>> >> > -5) Activate: >>>>> >> > +5) Set memory limit: Optional >>>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>>>> >> > + The value can be either in bytes or you can use mem suffixes. >>>>> >> > + In addition, you could change the value in runtime. >>>>> >> > + Examples: >>>>> >> > + # limit /dev/zram0 with 50MB memory >>>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # Using mem suffixes >>>>> >> > + echo 256K > /sys/block/zram0/mem_limit >>>>> >> > + echo 512M > /sys/block/zram0/mem_limit >>>>> >> > + echo 1G > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # To disable memory limit >>>>> >> > + echo 0 > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > +6) Activate: >>>>> >> > mkswap /dev/zram0 >>>>> >> > swapon /dev/zram0 >>>>> >> > >>>>> >> > mkfs.ext4 /dev/zram1 >>>>> >> > mount /dev/zram1 /tmp >>>>> >> > >>>>> >> > -6) Stats: >>>>> >> > +7) Stats: >>>>> >> > Per-device statistics are exported as various nodes under >>>>> >> > /sys/block/zram<id>/ >>>>> >> > disksize >>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. >>>>> >> > compr_data_size >>>>> >> > mem_used_total >>>>> >> > >>>>> >> > -7) Deactivate: >>>>> >> > +8) Deactivate: >>>>> >> > swapoff /dev/zram0 >>>>> >> > umount /dev/zram1 >>>>> >> > >>>>> >> > -8) Reset: >>>>> >> > +9) Reset: >>>>> >> > Write any positive value to 'reset' sysfs node >>>>> >> > echo 1 > /sys/block/zram0/reset >>>>> >> > echo 1 > /sys/block/zram1/reset >>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>>>> >> > index f0b8b30a7128..370c355eb127 100644 >>>>> >> > --- a/drivers/block/zram/zram_drv.c >>>>> >> > +++ b/drivers/block/zram/zram_drv.c >>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>>>> >> > } >>>>> >> > >>>>> >> > +static ssize_t mem_limit_show(struct device *dev, >>>>> >> > + struct device_attribute *attr, char *buf) >>>>> >> > +{ >>>>> >> > + u64 val; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + down_read(&zram->init_lock); >>>>> >> > + val = zram->limit_pages; >>>>> >> > + up_read(&zram->init_lock); >>>>> >> > + >>>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>>>> >> > +} >>>>> >> > + >>>>> >> > +static ssize_t mem_limit_store(struct device *dev, >>>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > +{ >>>>> >> > + u64 limit; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + limit = memparse(buf, NULL); >>>>> >> >>>>> >> if (limit = 0 && buf != "0") >>>>> >> return -EINVAL >>>>> >> >>>>> >> > + down_write(&zram->init_lock); >>>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>>>> >> > + up_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + return len; >>>>> >> > +} >>>>> >> > + >>>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>>>> >> > struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > { >>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >>>>> >> > ret = -ENOMEM; >>>>> >> > goto out; >>>>> >> > } >>>>> >> > + >>>>> >> > + if (zram->limit_pages && >>>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>>>> >> > + zs_free(meta->mem_pool, handle); >>>>> >> > + ret = -ENOMEM; >>>>> >> > + goto out; >>>>> >> > + } >>>>> >> > + >>>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>>>> >> > >>>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>>>> >> > struct zram_meta *meta; >>>>> >> > >>>>> >> > down_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + zram->limit_pages = 0; >>>>> >> > + >>>>> >> > if (!init_done(zram)) { >>>>> >> > up_write(&zram->init_lock); >>>>> >> > return; >>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>>>> >> > + mem_limit_store); >>>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>>>> >> > max_comp_streams_show, max_comp_streams_store); >>>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>>>> >> > &dev_attr_orig_data_size.attr, >>>>> >> > &dev_attr_compr_data_size.attr, >>>>> >> > &dev_attr_mem_used_total.attr, >>>>> >> > + &dev_attr_mem_limit.attr, >>>>> >> > &dev_attr_max_comp_streams.attr, >>>>> >> > &dev_attr_comp_algorithm.attr, >>>>> >> > NULL, >>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>>>> >> > --- a/drivers/block/zram/zram_drv.h >>>>> >> > +++ b/drivers/block/zram/zram_drv.h >>>>> >> > @@ -112,6 +112,11 @@ struct zram { >>>>> >> > u64 disksize; /* bytes */ >>>>> >> > int max_comp_streams; >>>>> >> > struct zram_stats stats; >>>>> >> > + /* >>>>> >> > + * the number of pages zram can consume for storing compressed data >>>>> >> > + */ >>>>> >> > + unsigned long limit_pages; >>>>> >> > + >>>>> >> > char compressor[10]; >>>>> >> > }; >>>>> >> > #endif >>>>> >> > -- >>>>> >> > 2.0.0 >>>>> >> > >>>>> >> >>>>> >> -- >>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >>>>> >> see: http://www.linux-mm.org/ . >>>>> >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> >>>>> > >>>>> > -- >>>>> > Kind regards, >>>>> > Minchan Kim >>>>> >>>>> -- >>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >>>>> see: http://www.linux-mm.org/ . >>>>> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> >>>> >>>> -- >>>> Kind regards, >>>> Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>