The need for performing read disturb scrubbing is determined according to
new statistics collected per eraseblock:
- read counter: incremented at each read operation and reset at each erase
- last erase time stamp: updated at each erase

This patch adds the infrastructure for the above statistics.

Signed-off-by: Tanya Brokhman <tlinder@xxxxxxxxxxxxxx>
---
Changes from V1:
- Documentation file was added

 Documentation/mtd/ubi/ubi-read-disturb.txt | 145 +++++++++++++++++++++++++++++
 drivers/mtd/ubi/build.c                    |  57 ++++++++++++
 drivers/mtd/ubi/fastmap.c                  |  14 ++-
 drivers/mtd/ubi/ubi-media.h                |  32 ++++++-
 drivers/mtd/ubi/ubi.h                      |  34 +++++++
 drivers/mtd/ubi/wl.c                       |   6 ++
 6 files changed, 280 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/mtd/ubi/ubi-read-disturb.txt

diff --git a/Documentation/mtd/ubi/ubi-read-disturb.txt b/Documentation/mtd/ubi/ubi-read-disturb.txt
new file mode 100644
index 0000000..4d3efef
--- /dev/null
+++ b/Documentation/mtd/ubi/ubi-read-disturb.txt
@@ -0,0 +1,145 @@
+
+1. Introduction
+===============
+Raw NAND flash memories are one of the most common storage devices in present
+day embedded systems. The most common devices in which raw NAND flash can be
+found are mobile phones.
+One of the limitations of NAND devices is that the method used to read NAND
+flash memory may cause bit-flips on the surrounding cells and result in
+uncorrectable ECC errors. This is known as read disturb. A second limitation,
+data retention, is that programmed cells lose their state over time.
+The current Linux NAND driver implementation doesn’t address the read disturb
+and data retention limitations of NAND devices.
+
+
+2. The problem
+==============
+There are two characteristics of the raw NAND that are not addressed by the
+NAND driver at the moment:
+
+2.1 Read Disturb
+----------------
+The method used to read NAND flash memory can cause nearby cells in the same
+memory block to change their value over time (become programmed). This
+phenomenon is known as read disturb. The threshold number of reads that leads
+to this issue is generally in the hundreds of thousands between intervening
+erase operations. When reading continuously from one cell, that cell will not
+fail but rather one of the surrounding cells may fail on a subsequent read. If
+read disturb is not addressed, there is a high possibility of data loss once
+the errors are too numerous to correct.
+
+2.2 Data Retention
+------------------
+Another NAND flash limitation is Data Retention (of rarely accessed blocks).
+The ability of the NAND device to remain in its programmed state decreases over
+time.
+
+To date these issues could be overlooked since the possibility of their
+occurrence in today’s NAND devices is very low. With the evolution of NAND
+devices and the requirement for a “long life” NAND flash, read disturb and data
+retention can no longer be ignored; otherwise data will be lost over time.
+
+
+3. The Solution
+===============
+Both types of blocks described above (read disturb and data retention) are
+handled by means of scrubbing. Scrubbing in essence is:
+- Copy the data from block X to new block Y
+- Erase block X
+
+3.1 Handling Read disturb blocks
+--------------------------------
+3.1.1 Identification
+In order to identify potential read-disturb blocks, a read counter is
+maintained for each PEB. The read counter is incremented as part of each read
+operation, and is reset on every erase operation.
+In each read operation the read counter is checked. This counter is also
+checked at the initialization phase, when attaching UBI to an MTD device.
+
+3.1.2 Saving on NAND
+Due to the physical characteristics of the NAND flash memory, write operations
+can only be performed on an erased block. Due to this, the read counter can’t
+be saved as part of the meta-data that is saved on flash for each erase block,
+and therefore can exist only in RAM. Once we power off the device, the read
+counter will no longer be valid. In order to overcome this issue and to save
+the read counter’s value through reboots of the system, it is saved as part of
+the fastmap data on the flash.
+
+3.1.3 Error recovery
+It is possible that the fastmap data won’t be valid on boot up - for example if
+a sudden power cut occurred. In such a case a default value will be assigned to
+each PEB. The default value for the read counter will be assigned as follows:
+- Free erase blocks: It’s safe to assume that the read counter for free
+  blocks was 0 prior to the power off since a block is marked as “free”
+  after it was erased. Such blocks will be assigned read counter 0.
+- Allocated erase blocks: We can make no assumptions about the number of
+  reads performed on allocated data blocks. To be on the safe side the
+  default read counter assigned to these blocks is
+  read_disturb_threshold/2.
+
+3.1.4 Enhancements to Fastmap (work in progress)
+In order to lower the possibility of fastmap being invalid on boot up we
+increase the pool of events which trigger the fastmap data being saved on
+flash. A global read counter is maintained per UBI device. It is incremented as
+part of each read operation that is performed on any of the device PEBs. When
+a pre-defined threshold is reached, a fastmap flush will be scheduled. This
+counter is reset on each flush of the fastmap data.
+
+3.1.5 "Fixing" the Read disturbed blocks
+If the read counter reaches a pre-defined threshold the block will be scheduled
+for scrubbing.
+
+
+3.2 Data Retention blocks
+-------------------------
+3.2.1 Identification
+In order to identify rarely accessed blocks a “last erase timestamp” is
+maintained per PEB. The resolution of this timestamp is in days and it is
+updated during each erase operation performed on a PEB.
+This timestamp is checked at the initialization phase, when attaching UBI to
+an MTD device. If the delta between the time of the check and the
+last_erase_timestamp is higher than a pre-defined threshold, the PEB will be
+scheduled for scrubbing.
+In order to identify data retention blocks, outside intervention is required
+in the form of a user space application. This application will be periodically
+activated by the user and will trigger a scan of all of the flash PEBs and the
+verification of the last erase timestamp of each PEB against a pre-defined
+threshold.
+When activating the user space utility, one should keep in mind that this
+process will take some time. It is therefore recommended to activate it
+during device idle time.
+
+3.2.2 Saving on NAND
+The last erase timestamp is saved as part of the PEB meta-data on NAND, per
+each PEB. It is saved as part of the fastmap meta-data as well. In case no
+fastmap is available, it will be retrieved from the PEB meta-data on flash.
+If it’s missing on the flash as well, a default value equal to the average of
+the erase timestamps of the other PEBs of the device will be assigned.
+
+
+4. Backward compatibility of the proposed solution
+==================================================
+As mentioned before, read counters can only be saved as part of the fastmap
+meta-data. Since the fastmap layout changes, a new fastmap version is defined,
+one that supports read disturb meta-data.
+When loading an older image, which doesn’t support read disturb, the fastmap
+(if present) will be found invalid and the attach process will trigger a
+scan of the whole device. A default read counter will be assigned to each PEB,
+as described in section 3.1.3.
+The default last erase timestamp will be set according to the average timestamp
+of all PEBs of the device. In case of an old image, where no last erase
+timestamp is present, a default value of last_erase_timestamp_threshold/2 will
+be assigned.
+
+
+5. Conclusions
+==============
+The described solution addresses both the read disturb and the data retention
+issues, thereby allowing long-life usage of NAND devices.
+The downside of the proposed solution is that the meta-data increases, and as
+a result the size of the fastmap data also increases.
+In our testing no performance impact was observed since the verification or
+saving of the counters/timestamp is performed in O(1).
+The solution above is implemented with minimal code changes since it
+reuses the already implemented scrubbing mechanism of the UBI wear
+leveling subsystem.
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 6e30a3c..34fe23a 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -1,6 +1,9 @@
 /*
  * Copyright (c) International Business Machines Corp., 2006
  * Copyright (c) Nokia Corporation, 2007
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -118,6 +121,10 @@ static struct class_attribute ubi_version =
 static ssize_t dev_attribute_show(struct device *dev,
 				  struct device_attribute *attr, char *buf);
 
+static ssize_t dev_attribute_store(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count);
+
 /* UBI device attributes (correspond to files in '/<sysfs>/class/ubi/ubiX') */
 static struct device_attribute dev_eraseblock_size =
 	__ATTR(eraseblock_size, S_IRUGO, dev_attribute_show, NULL);
@@ -141,6 +148,12 @@ static struct device_attribute dev_bgt_enabled =
 	__ATTR(bgt_enabled, S_IRUGO, dev_attribute_show, NULL);
 static struct device_attribute dev_mtd_num =
 	__ATTR(mtd_num, S_IRUGO, dev_attribute_show, NULL);
+static struct device_attribute dev_dt_threshold =
+	__ATTR(dt_threshold, (S_IWUSR | S_IRUGO), dev_attribute_show,
+	       dev_attribute_store);
+static struct device_attribute dev_rd_threshold =
+	__ATTR(rd_threshold, (S_IWUSR | S_IRUGO), dev_attribute_show,
+	       dev_attribute_store);
 
 /**
  * ubi_volume_notify - send a volume change notification.
@@ -378,6 +391,10 @@ static ssize_t dev_attribute_show(struct device *dev,
 		ret = sprintf(buf, "%d\n", ubi->thread_enabled);
 	else if (attr == &dev_mtd_num)
 		ret = sprintf(buf, "%d\n", ubi->mtd->index);
+	else if (attr == &dev_dt_threshold)
+		ret = sprintf(buf, "%d\n", ubi->dt_threshold);
+	else if (attr == &dev_rd_threshold)
+		ret = sprintf(buf, "%d\n", ubi->rd_threshold);
 	else
 		ret = -EINVAL;
 
@@ -385,6 +402,38 @@ static ssize_t dev_attribute_show(struct device *dev,
 	return ret;
 }
 
+static ssize_t dev_attribute_store(struct device *dev,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	int value;
+	struct ubi_device *ubi;
+
+	ubi = container_of(dev, struct ubi_device, dev);
+	ubi = ubi_get_device(ubi->ubi_num);
+	if (!ubi)
+		return -ENODEV;
+
+	if (kstrtos32(buf, 10, &value))
+		return -EINVAL;
+	/* Consider triggering full scan if thresholds change */
+	else if (attr == &dev_dt_threshold) {
+		if (value < UBI_MAX_DT_THRESHOLD)
+			ubi->dt_threshold = value;
+		else
+			pr_err("Max supported threshold value is %d",
+			       UBI_MAX_DT_THRESHOLD);
+	} else if (attr == &dev_rd_threshold) {
+		if (value < UBI_MAX_READCOUNTER)
+			ubi->rd_threshold = value;
+		else
+			pr_err("Max supported threshold value is %d",
+			       UBI_MAX_READCOUNTER);
+	}
+
+	return count;
+}
+
 static void dev_release(struct device *dev)
 {
 	struct ubi_device *ubi = container_of(dev, struct ubi_device, dev);
@@ -445,6 +494,12 @@ static int ubi_sysfs_init(struct ubi_device *ubi, int *ref)
 	if (err)
 		return err;
 	err = device_create_file(&ubi->dev, &dev_mtd_num);
+	if (err)
+		return err;
+	err = device_create_file(&ubi->dev, &dev_dt_threshold);
+	if (err)
+		return err;
+	err = device_create_file(&ubi->dev, &dev_rd_threshold);
 	return err;
 }
 
@@ -455,6 +510,8 @@ static int ubi_sysfs_init(struct ubi_device *ubi, int *ref)
 static void ubi_sysfs_close(struct ubi_device *ubi)
 {
 	device_remove_file(&ubi->dev, &dev_mtd_num);
+	device_remove_file(&ubi->dev, &dev_dt_threshold);
+	device_remove_file(&ubi->dev, &dev_rd_threshold);
 	device_remove_file(&ubi->dev, &dev_bgt_enabled);
 	device_remove_file(&ubi->dev, &dev_min_io_size);
 	device_remove_file(&ubi->dev, &dev_max_vol_count);
diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 0431b46..5399aa2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -1,5 +1,7 @@
 /*
  * Copyright (c) 2012 Linutronix GmbH
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ *
  * Author: Richard Weinberger <richard@xxxxxx>
  *
  * This program is free software; you can redistribute it and/or modify
@@ -727,9 +729,9 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		}
 
 		for (j = 0; j < be32_to_cpu(fm_eba->reserved_pebs); j++) {
-			int pnum = be32_to_cpu(fm_eba->pnum[j]);
+			int pnum = be32_to_cpu(fm_eba->peb_data[j].pnum);
 
-			if ((int)be32_to_cpu(fm_eba->pnum[j]) < 0)
+			if ((int)be32_to_cpu(fm_eba->peb_data[j].pnum) < 0)
 				continue;
 
 			aeb = NULL;
@@ -757,7 +759,8 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 				}
 
 				aeb->lnum = j;
-				aeb->pnum = be32_to_cpu(fm_eba->pnum[j]);
+				aeb->pnum =
+					be32_to_cpu(fm_eba->peb_data[j].pnum);
 				aeb->ec = -1;
 				aeb->scrub = aeb->copy_flag = aeb->sqnum = 0;
 				list_add_tail(&aeb->u.list, &eba_orphans);
@@ -1250,11 +1253,12 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
 			vol->vol_type == UBI_STATIC_VOLUME);
 
 		feba = (struct ubi_fm_eba *)(fm_raw + fm_pos);
-		fm_pos += sizeof(*feba) + (sizeof(__be32) * vol->reserved_pebs);
+		fm_pos += sizeof(*feba) +
+				2 * (sizeof(__be32) * vol->reserved_pebs);
 		ubi_assert(fm_pos <= ubi->fm_size);
 
 		for (j = 0; j < vol->reserved_pebs; j++)
-			feba->pnum[j] = cpu_to_be32(vol->eba_tbl[j]);
+			feba->peb_data[j].pnum = cpu_to_be32(vol->eba_tbl[j]);
 
 		feba->reserved_pebs = cpu_to_be32(j);
 		feba->magic = cpu_to_be32(UBI_FM_EBA_MAGIC);
diff --git a/drivers/mtd/ubi/ubi-media.h b/drivers/mtd/ubi/ubi-media.h
index ac2b24d..da418ad 100644
--- a/drivers/mtd/ubi/ubi-media.h
+++ b/drivers/mtd/ubi/ubi-media.h
@@ -1,5 +1,8 @@
 /*
  * Copyright (c) International Business Machines Corp., 2006
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -38,6 +41,15 @@
 /* The highest erase counter value supported by this implementation */
 #define UBI_MAX_ERASECOUNTER 0x7FFFFFFF
 
+/* The highest read counter value supported by this implementation */
+#define UBI_MAX_READCOUNTER 0x7FFFFFFD /* (0x7FFFFFFF - 2)*/
+
+/*
+ * The highest data retention threshold value supported
+ * by this implementation
+ */
+#define UBI_MAX_DT_THRESHOLD 0x7FFFFFFF
+
 /* The initial CRC32 value used when calculating CRC checksums */
 #define UBI_CRC32_INIT 0xFFFFFFFFU
 
@@ -130,6 +142,7 @@ enum {
  * @vid_hdr_offset: where the VID header starts
  * @data_offset: where the user data start
  * @image_seq: image sequence number
+ * @last_erase_time: time stamp of the last erase operation
  * @padding2: reserved for future, zeroes
  * @hdr_crc: erase counter header CRC checksum
  *
@@ -162,7 +175,8 @@ struct ubi_ec_hdr {
 	__be32  vid_hdr_offset;
 	__be32  data_offset;
 	__be32  image_seq;
-	__u8    padding2[32];
+	__be64  last_erase_time; /*curr time in sec == unsigned long time_t*/
+	__u8    padding2[24];
 	__be32  hdr_crc;
 } __packed;
 
@@ -413,6 +427,8 @@ struct ubi_vtbl_record {
  * @used_blocks: number of PEBs used by this fastmap
  * @block_loc: an array containing the location of all PEBs of the fastmap
  * @block_ec: the erase counter of each used PEB
+ * @block_rc: the read counter of each used PEB
+ * @block_let: the last erase timestamp of each used PEB
  * @sqnum: highest sequence number value at the time while taking the fastmap
  *
  */
@@ -424,6 +440,8 @@ struct ubi_fm_sb {
 	__be32 used_blocks;
 	__be32 block_loc[UBI_FM_MAX_BLOCKS];
 	__be32 block_ec[UBI_FM_MAX_BLOCKS];
+	__be32 block_rc[UBI_FM_MAX_BLOCKS];
+	__be64 block_let[UBI_FM_MAX_BLOCKS];
 	__be64 sqnum;
 	__u8 padding2[32];
 } __packed;
@@ -469,13 +487,17 @@ struct ubi_fm_scan_pool {
 /* ubi_fm_scan_pool is followed by nfree+nused struct ubi_fm_ec records */
 
 /**
- * struct ubi_fm_ec - stores the erase counter of a PEB
+ * struct ubi_fm_ec - stores the erase/read counter of a PEB
  * @pnum: PEB number
  * @ec: ec of this PEB
+ * @rc: rc of this PEB
+ * @last_erase_time: last erase time stamp of this PEB
  */
 struct ubi_fm_ec {
 	__be32 pnum;
 	__be32 ec;
+	__be32 rc;
+	__be64 last_erase_time;
 } __packed;
 
 /**
@@ -506,10 +528,14 @@ struct ubi_fm_volhdr {
  * @magic: EBA table magic number
  * @reserved_pebs: number of table entries
  * @pnum: PEB number of LEB (LEB is the index)
+ * @rc: Read counter of the LEBs PEB (LEB is the index)
  */
 struct ubi_fm_eba {
 	__be32 magic;
 	__be32 reserved_pebs;
-	__be32 pnum[0];
+	struct {
+		__be32 pnum;
+		__be32 rc;
+	} peb_data[0];
 } __packed;
 #endif /* !__UBI_MEDIA_H__ */
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 7bf4163..6c7e53e 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -1,6 +1,9 @@
 /*
  * Copyright (c) International Business Machines Corp., 2006
  * Copyright (c) Nokia Corporation, 2006, 2007
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -84,6 +87,22 @@
 #define UBI_UNKNOWN -1
 
 /*
+ * This parameter defines the maximum read counter of eraseblocks
+ * of UBI devices. When this threshold is exceeded, UBI schedules the
+ * eraseblock for scrubbing, i.e. its data is copied to another eraseblock
+ * and the eraseblock is erased.
+ */
+#define UBI_RD_THRESHOLD 100000
+
+/*
+ * This parameter defines the maximum interval (in days) between two
+ * erasures of an eraseblock. When this interval is reached, UBI schedules
+ * the eraseblock for scrubbing, i.e. its data is copied to another
+ * eraseblock and the eraseblock is erased.
+ */
+#define UBI_DT_THRESHOLD 120
+
+/*
  * The UBI debugfs directory name pattern and maximum name length (3 for "ubi"
  * + 2 for the number plus 1 for the trailing zero byte.
  */
@@ -155,6 +174,8 @@ enum {
  * @u.rb: link in the corresponding (free/used) RB-tree
  * @u.list: link in the protection queue
  * @ec: erase counter
+ * @last_erase_time: time stamp of the last erase operation
+ * @rc: read counter
  * @pnum: physical eraseblock number
  *
  * This data structure is used in the WL sub-system. Each physical eraseblock
@@ -167,6 +188,8 @@ struct ubi_wl_entry {
 		struct list_head list;
 	} u;
 	int ec;
+	long last_erase_time;
+	int rc;
 	int pnum;
 };
 
@@ -451,6 +474,10 @@ struct ubi_debug_info {
  * @bgt_thread: background thread description object
  * @thread_enabled: if the background thread is enabled
  * @bgt_name: background thread name
+ * @rd_threshold: read counter threshold. See UBI_RD_THRESHOLD
+ *				for more info
+ * @dt_threshold: data retention threshold. See UBI_DT_THRESHOLD
+ *				for more info
  *
  * @flash_size: underlying MTD device size (in bytes)
  * @peb_count: count of physical eraseblocks on the MTD device
@@ -553,6 +580,9 @@ struct ubi_device {
 	struct task_struct *bgt_thread;
 	int thread_enabled;
 	char bgt_name[sizeof(UBI_BGT_NAME_PATTERN)+2];
+	int rd_threshold;
+	int dt_threshold;
+
 
 	/* I/O sub-system's stuff */
 	long long flash_size;
@@ -588,6 +618,8 @@ struct ubi_device {
 /**
  * struct ubi_ainf_peb - attach information about a physical eraseblock.
  * @ec: erase counter (%UBI_UNKNOWN if it is unknown)
+ * @rc: read counter (%UBI_UNKNOWN if it is unknown)
+ * @last_erase_time: last erase time stamp (%UBI_UNKNOWN if it is unknown)
  * @pnum: physical eraseblock number
  * @vol_id: ID of the volume this LEB belongs to
  * @lnum: logical eraseblock number
@@ -604,6 +636,8 @@ struct ubi_device {
  */
 struct ubi_ainf_peb {
 	int ec;
+	int rc;
+	long last_erase_time;
 	int pnum;
 	int vol_id;
 	int lnum;
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 20f4917..33d33e43 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -1,5 +1,8 @@
 /*
  * Copyright (c) International Business Machines Corp., 2006
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -1898,6 +1901,9 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
 		INIT_LIST_HEAD(&ubi->pq[i]);
 	ubi->pq_head = 0;
 
+	ubi->rd_threshold = UBI_RD_THRESHOLD;
+	ubi->dt_threshold = UBI_DT_THRESHOLD;
+
 	list_for_each_entry_safe(aeb, tmp, &ai->erase, u.list) {
 		cond_resched();
 
-- 
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
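
Illustration (not part of the patch): the bookkeeping described in
ubi-read-disturb.txt can be tried out in userspace with a minimal sketch
like the one below. It uses no UBI API. struct peb_stats and the peb_*
helpers are hypothetical names, the two threshold values are copied from
UBI_RD_THRESHOLD and UBI_DT_THRESHOLD in this patch, and keeping the time
stamp in seconds is an assumption based on the comment in struct ubi_ec_hdr.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults copied from UBI_RD_THRESHOLD / UBI_DT_THRESHOLD in the patch. */
#define RD_THRESHOLD    100000          /* reads allowed between two erases */
#define DT_THRESHOLD    120             /* days allowed between two erases */
#define SECONDS_PER_DAY (24L * 60 * 60)

/* Hypothetical per-PEB statistics, modelled on the new ubi_wl_entry fields. */
struct peb_stats {
	int rc;                 /* read counter */
	long last_erase_time;   /* assumed to be in seconds */
};

/* An erase resets the read counter and refreshes the time stamp. */
static void peb_on_erase(struct peb_stats *p, long now)
{
	p->rc = 0;
	p->last_erase_time = now;
}

/* Count one read; returns true once the PEB should be scrubbed. */
static bool peb_on_read(struct peb_stats *p, int rd_threshold)
{
	assert(rd_threshold > 0);
	p->rc++;
	return p->rc >= rd_threshold;
}

/* Data retention check: true if the last erase is more than dt_days old. */
static bool peb_retention_expired(const struct peb_stats *p, long now,
				  int dt_days)
{
	return (now - p->last_erase_time) / SECONDS_PER_DAY > dt_days;
}

/* Default read counter when no valid fastmap is found (section 3.1.3). */
static int peb_default_rc(bool is_free, int rd_threshold)
{
	return is_free ? 0 : rd_threshold / 2;
}
```

With the default thresholds, an allocated PEB that is attached without a
valid fastmap starts at a read counter of 50000 in this sketch, so it would
reach the scrub threshold after another 50000 reads.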