On Tue, Apr 03 2018 at 12:07am -0400,
Dennis Yang <dennisyang@xxxxxxxx> wrote:

> Hi,
>
> Recently we have come across an issue where a dm-thin pool is
> switched to READ_ONLY mode because dm_pool_alloc_data_block() returns
> -ENOSPC. AFAIK, this should not happen, since alloc_data_block()
> checks whether there is any free space (and commits metadata if it first
> reports no free space) before it allocates a pool block. In addition,
> the total virtual space of all thin volumes is smaller than the pool's
> physical space in my testing environment, which should make it impossible
> for the pool to run out of space.
>
> This issue can be easily reproduced with the following steps.
>
> 1) Create a thin pool and a slightly smaller thin volume
>
> sudo dmsetup create meta --table "0 40000000 linear /dev/sdf 0"
> sudo dmsetup create data --table "0 10240000 linear /dev/md125 0"
> sudo dd if=/dev/zero of=/dev/mapper/meta bs=1M count=1
> sudo dmsetup create pool --table "0 10240000 thin-pool /dev/mapper/meta /dev/mapper/data 1024 0 2 skip_block_zeroing error_if_no_space"
> sudo dmsetup message pool 0 "create_thin 0"
> sudo dmsetup create thin --table "0 10238976 thin /dev/mapper/pool 0"
>
> 2) Make a filesystem and mount it
>
> sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/thin
> sudo mount /dev/mapper/thin /mnt
>
> 3) Write a file to the mount point until it takes all the space
>
> sudo dd if=/dev/zero of=/mnt/zero.img bs=1M
>
> 4) Remove this file and trim the mount point
>
> sudo rm /mnt/zero.img
> sudo fstrim /mnt
>
> Repeat steps 3 and 4 multiple times and the pool will be switched to
> READ_ONLY mode and the needs_check flag will be set. The kernel log
> shows the following messages.
>
> [ 3952.723937] device-mapper: thin: 252:2: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
> [ 3952.723940] device-mapper: thin: 252:2: aborting current metadata transaction
> [ 3952.725860] device-mapper: thin: 252:2: switching pool to read-only mode
>
> The root cause of this issue is that dm-thin first removes the
> mapping and increases the corresponding blocks' reference counts to
> prevent them from being reused before the DISCARD bios get processed by
> the underlying layers. However, increasing a block's reference count can
> also increase nr_allocated_this_transaction in struct sm_disk,
> which makes smd->old_ll.nr_allocated +
> smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
> In this case, alloc_data_block() will never commit metadata to reset
> the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
> always returns an underflowed value.
>
> If you need more information, please feel free to let me know.

FYI, I just staged the following fix:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.18&id=2b21877316f3a517554c1b34e6b32f4d1ad10493
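
For readers following the arithmetic in the report, here is a minimal
userspace sketch of the underflow it describes. This is not the kernel
code: struct sm_model and the model_* helpers are hypothetical names
invented for illustration, and the only thing taken from the report is
the relationship between nr_blocks, old_ll.nr_allocated and
nr_allocated_this_transaction. It assumes, as the report states, that
the free count is an unsigned subtraction and that a reference-count
increment also bumps the per-transaction counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three counters named in the report. */
struct sm_model {
	uint64_t nr_blocks;                     /* models smd->old_ll.nr_blocks */
	uint64_t old_nr_allocated;              /* models smd->old_ll.nr_allocated */
	uint64_t nr_allocated_this_transaction; /* models smd->nr_allocated_this_transaction */
};

/*
 * Free-block count as the report describes it. All operands are
 * unsigned, so the result wraps around instead of going negative.
 */
static uint64_t model_nr_free(const struct sm_model *m)
{
	return m->nr_blocks -
	       (m->old_nr_allocated + m->nr_allocated_this_transaction);
}

/*
 * Assumption taken from the report: pinning a block for a pending
 * discard (a reference-count increment) is also counted as an
 * allocation in the current transaction.
 */
static void model_inc_ref(struct sm_model *m)
{
	m->nr_allocated_this_transaction++;
}

/*
 * The report says the allocator commits metadata only when the free
 * count first reads as zero; a wrapped value never satisfies that.
 */
static bool model_would_commit(const struct sm_model *m)
{
	return model_nr_free(m) == 0;
}
```

With the pool from the reproduction steps (10240000 sectors in
1024-sector blocks, i.e. 10000 blocks) and 9999 blocks already committed
as allocated, one such increment brings the modelled free count to 0 and
a commit would be triggered. A second increment before that commit
pushes the sum to 10001, model_nr_free() wraps to UINT64_MAX, and
model_would_commit() stays false from then on.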