Re: On allowing remounting the partition containing dm-mapped boot disk as read-write

Firstly, thank you for your reply. However, I still think you are mistaken: there is no double re-mounting of the filesystem.

I will try to explain again, but in case my writing isn't convincing, let's agree to disagree until I have more conclusive proof confirming either what you say or what I say.

I have captured some logs from the initramfs; they are included near the end of this post. I also tried my antiventoy idea, and I believe it worked! It is also included at the end of this post. The logs help show what's going on.

> Besides caching at the dm layer, there is caching at the filesystem
> layer (files, directories and other stuff).  The caching at the DM
> layer is not the issue.
If there is an issue at all, it is at the DM layer. But if you say there is no issue at the DM layer, then maybe there is no issue at all.
The two filesystems in question, the one inside a partition of the
dm-mapped vdisk file and the one on the outside partition,
do not touch or interfere with each other's operation and metadata structures
as they modify their own filesystem structures. Any block write to the
dm-mapped disk does not go through the outside filesystem driver; it goes
directly to the blocks on disk.
>
> If rootfs mounts the dm-device someplace as a
> (ntfs)filesystem(readonly), and then later you also mount that same
> (ntfs)filesystem (read-write) someplace then you have mounted the
> filesystem twice (not a shared bind mount that appears to be 2 mounts
> but in reality is only a single mount), and the filesystem code is not
> safe when there are 2 unique separate mount instances, the fs caching
> is not shared.
This is not what is happening.
The same filesystem is not being re-mounted.
In fact, by using rd.break=pre-mount in the initramfs, before pivot, I have confirmed that
no mounts have been done. Other than the initramfs filesystem, no disk filesystem exists.
But, using a block table, a device map has been created directly to the blocks on disk,
so one can issue `dmsetup table /dev/mapper/ventoy` and see the block map.
The blocks correspond to the exact block locations of the file on disk.
After pivot, the filesystem in one of the partitions inside /dev/mapper/ventoy (specifically
/dev/mapper/ventoy4) becomes the rootfs of the OS being booted up.
The outside /dev/sdc1 is still unmounted.
But after booting up and logging in, it cannot be mounted,
because, I think, the fs driver cannot get exclusive access to all the blocks that constitute
/dev/sdc1, since the device mapper is holding onto some of them.
That, I think, is what the ventoy dm_patch overrides.
(You say it is because of a clash with a prior filesystem mount, which is incorrect.)
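To make the mapping concrete, here is a minimal sketch of how a contiguous file's on-disk extent becomes a one-target linear dm table. The extent numbers below are hypothetical (though chosen so they reproduce the `dmsetup table` output in the logs further down); in reality Ventoy derives the file's extent itself at boot.

```shell
# Sketch only: turn a contiguous file's extent into a linear dm table line.
# The extent numbers are hypothetical; Ventoy computes the real extent.
FS_BLOCK=4096                # filesystem block size in bytes
EXT_START_BLK=246634611      # first physical block of the vdisk file
EXT_LEN_BLK=12173312         # file length in filesystem blocks
SECTORS_PER_BLK=$((FS_BLOCK / 512))

START_SEC=$((EXT_START_BLK * SECTORS_PER_BLK))
LEN_SEC=$((EXT_LEN_BLK * SECTORS_PER_BLK))

# One linear target addressed on /dev/sdc1 (major:minor 8:33):
echo "0 $LEN_SEC linear 8:33 $START_SEC"
# prints: 0 97386496 linear 8:33 1973076888
# A table like this could be fed to:  dmsetup create ventoy
```

Such a mapping never involves the NTFS driver at all; it is pure sector arithmetic handed to device-mapper.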
> 

> ie if the filesystem is already mounted as a filesystem someplace
It is not already mounted as a filesystem anywhere.
> (mapping a dm device is not mounting) and you mount it a 2nd time then
> you are at risk of corruption.
As the filesystems are separate, there won't be corruption for that reason. It is not a remount.

> 
> For the mount command to get a busy indicates that something has
> already mounted it someplace and has locked the device(to prevent a
> 2nd corrupting mount from happening), and the entire purpose of the
> module appears to be to disable the safety provided by this locking
> and allow one to mount the fs again.
I've grabbed /proc/mounts to be sure of what I am saying.
I think this mount-busy is not due to a filesystem-vs-filesystem clash but a
dm-map-vs-filesystem clash. My guess is that the dm_patch is a crude way to make
the device mapper report the blocks as still available.

> And note that with double mounts (usually I have seen/debugged the
> crash/ouage from them with shared SAN storage)  they can "APPEAR" to
> kind of work if you do not use them too much, but eventually the 2
> instances of filesystem code step on each other and it ends badly
> (kernel crash, data corruption, loss of the filesystem).
I know how terrible a double mount can get. Not exactly a double mount, but I once
experienced something similar. I had a VirtualBox VM in Windows accessing a btrfs filesystem
on a disk partition (pass-through). I put the VM to sleep/suspend (not shutdown), forgot about it,
shut down Windows, and rebooted into Linux. There were probably uncommitted writes in
its journal. In Linux, a simple mount failed and Linux warned me, but I unwittingly
force-mounted the btrfs partition, which mounted the filesystem from some earlier state,
and each successive modification put the filesystem in a more inconsistent
state, until it was a complete mess. I don't remember the specific btrfs terminology, but I had to sift through
a binary dump of the btrfs filesystem to find superblocks of the older btrfs root tree from
that previous state, and then use special btrfs commands to extract the files.

This ventoy situation is more like this: the filesystem on the outside partition has full control of itself
and all its files and folders; no one is touching or interfering with it. There exists a file,
the vdisk file, which by self-contract the user is not supposed to touch. Unbeknownst to the
outside partition's filesystem driver, the device-mapper device, which was created even
before the filesystem on the outside partition was mounted, writes directly into the file's blocks
without invoking that partition's filesystem driver. So the containing partition's filesystem
driver would never know that the contents of the file are sneakily changing. NTFS doesn't
do hashes/checksums; if it did, it would know something is up. Whenever the filesystem
driver sees the file by way of a directory traversal, it just sees a static, unchanged, fixed-size file.

> If you need this module so that you can use another (non-bind) mount
> command someplace on the same device then you are double mounting.
> The point of the busy device is that it is mounted someplace, and the
> entire point of the module is to bypass the protection.
> 
> It is the filesystem code and filesystem caching that is not safe.
> Both filesystems assume they have total control of filesystem data
> structures, and at the very least the mount writing will change those
> structures, and if the read-only mount may read blocks that were part
> of one file but now aren't part of that file returning the wrong data.
> 
I get your point, but I think it is not a double mounting of a filesystem.
I'll keep an open mind and try to find evidence that it is the way you described.

=== LOGS
# rd.break=pre-mount was added to the kernel command line in the grub menu
# the logs below were collected from the initramfs of the vdisk fedora-37 m02_lnx.raw.img.vtoy
[root@sirius tmp]# cat vtoy3_m02.log
# cat /proc/mounts
rootfs / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=1048576,mode=755,inode64 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=3261448k,nr_inodes=819200,mode=755,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
#
# cat /etc/mtab
rootfs / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=1048576,mode=755,inode64 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=3261448k,nr_inodes=819200,mode=755,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
#
# ls -l /etc/mtab
lrwxrwxrwx 1 root root 17 Feb 16 14:59 /etc/mtab -> /proc/self/mounts
#
# ls -l /proc/mounts
lrwxrwxrwx 1 root root 11 Feb 16 14:59 /proc/mounts -> self/mounts
#
# dmsetup status
ventoy: 0 97386496 linear
ventoy1: 0 520192 linear
ventoy2: 0 4096 linear
ventoy3: 0 1572864 linear
ventoy4: 0 95285248 linear
#
# dmsetup ls
ventoy  (253:0)
ventoy1 (253:1)
ventoy2 (253:2)
ventoy3 (253:3)
ventoy4 (253:4)
#
# dmsetup info ventoy
Name:              ventoy
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        4
Event number:      0
Major, minor:      253, 0
Number of targets: 1

#
# dmsetup status -v
Name:              ventoy
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        4
Event number:      0
Major, minor:      253, 0
Number of targets: 1

0 97386496 linear

Name:              ventoy1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 1
Number of targets: 1

0 520192 linear

Name:              ventoy2
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1

0 4096 linear

Name:              ventoy3
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1

0 1572864 linear

Name:              ventoy4
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1

0 95285248 linear

#
# dmsetup table
ventoy: 0 97386496 linear 8:33 1973076888
ventoy1: 0 520192 linear 253:0 2048
ventoy2: 0 4096 linear 253:0 522240
ventoy3: 0 1572864 linear 253:0 526336
ventoy4: 0 95285248 linear 253:0 2099200
#
# dmsetup table -v
Name:              ventoy
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        4
Event number:      0
Major, minor:      253, 0
Number of targets: 1

0 97386496 linear 8:33 1973076888

Name:              ventoy1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 1
Number of targets: 1

0 520192 linear 253:0 2048

Name:              ventoy2
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1

0 4096 linear 253:0 522240

Name:              ventoy3
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1

0 1572864 linear 253:0 526336

Name:              ventoy4
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1

0 95285248 linear 253:0 2099200

#
# ls -l /dev/sdc1
brw-rw---- 1 root disk 8, 33 Feb 16 14:59 /dev/sdc1
#
# ls -l /dev/mapper/ventoy
lrwxrwxrwx 1 root root 7 Feb 16 14:59 /dev/mapper/ventoy -> ../dm-0
#
# ls -l /dev/dm-0
brw-rw---- 1 root disk 253, 0 Feb 16 14:59 /dev/dm-0
#
# cat /proc/cmdline
BOOT_IMAGE=(hd3,gpt3)/vmlinuz-6.1.11-200.fc37.x86_64 root=UUID=27857c40-0518-4c8e-872a-a55bc80c9847 ro rootflags=subvol=root_01 rhgb quiet rd.break=pre-mount
#
# ls -l /dev/sda
brw-rw---- 1 root disk 8, 0 Feb 16 14:59 /dev/sda
#
# ls -l /dev/sdb
brw-rw---- 1 root disk 8, 16 Feb 16 14:59 /dev/sdb
#
# ls -l /dev/sdc
brw-rw---- 1 root disk 8, 32 Feb 16 14:59 /dev/sdc
#
# cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 21 sg
 29 fb
128 ptm
136 pts
180 usb
188 ttyUSB
189 usb_device
202 cpu/msr
203 cpu/cpuid
226 drm
234 megaraid_sas_ioctl
235 megadev_legacy
236 pmcsas
237 nvme-generic
238 nvme
239 uio
240 binder
241 hidraw
242 ttyDBC
243 usbmon
244 wwan_port
245 bsg
246 watchdog
247 ptp
248 pps
249 lirc
250 rtc
251 dma_heap
252 dax
253 tpm
254 gpiochip
509 aux
510 cec
511 aac

Block devices:
  8 sd
  9 md
 11 sr
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
253 device-mapper
254 mdp
259 blkext
#
# mount /dev/sda12 /tmp/a12
#
# cat /proc/mounts | grep a12 -B 3
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
/dev/sda12 /tmp/a12 btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
#
# cp /tmp/vtoy3_m02.log /tmp/a12/gana/vtoy3_m02.log
#
# cat /tmp/vtoy.log
ventoy_do_dm_patch
get_addr=0xffffffff92a9f7a0 get_size=496
put_addr=0xffffffff92a9f990 put_size=224
kprobe_reg_addr=ffffffff921ed630 kprobe_unreg_addr=ffffffff921ecc20
ro_addr=ffffffff92089430 rw_addr=ffffffff92089460 printk_addr=0
template module is /lib/modules/6.1.11-200.fc37.x86_64/kernel/fs/zonefs/zonefs.ko.xz zonefs.ko.xz
insmod: ERROR: could not insert module /tmp/dm_patch.ko: Invalid parameters
#
# /usr/bin/vtoydump
=== ventoy runtime data ===
disk name : /dev/sdc
disk size : 2000398934016
disk part : 1
filesystem: ntfs
image size: 49861885952
image path: /transcend/m02_lnx.raw.img.vtoy
#
# dmesg | grep dm
[    0.197961] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    1.441071] device-mapper: ioctl: 4.47.0-ioctl (2022-07-28) initialised: dm-devel@xxxxxxxxxx
[    6.094644] dm_patch: module verification failed: signature and/or required key missing - tainting kernel
[    7.071137] BTRFS: device fsid 27857c40-0518-4c8e-872a-a55bc80c9847 devid 1 transid 7440 /dev/dm-4 scanned by systemd-udevd (656)
#
# dmesg | grep sd
[    1.744888] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.745132] sd 0:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/477 GiB)
[    1.745148] sd 0:0:0:0: [sda] Write Protect is off
[    1.745151] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.745184] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.745218] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[    1.753036]  sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15 sda16 sda17
[    1.753900] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.149445] sd 2:0:0:0: Attached scsi generic sg1 type 0
[    2.149646] sd 2:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    2.149656] sd 2:0:0:0: [sdb] 4096-byte physical blocks
[    2.149734] sd 2:0:0:0: [sdb] Write Protect is off
[    2.149741] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    2.149844] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.149927] sd 2:0:0:0: [sdb] Preferred minimum I/O size 4096 bytes
[    2.150312] sd 4:0:0:0: Attached scsi generic sg2 type 0
[    2.150511] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    2.150542] sd 4:0:0:0: [sdc] 4096-byte physical blocks
[    2.150588] sd 4:0:0:0: [sdc] Write Protect is off
[    2.150596] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    2.150652] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.150717] sd 4:0:0:0: [sdc] Preferred minimum I/O size 4096 bytes
[    2.238833]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
[    2.239285] sd 2:0:0:0: [sdb] Attached SCSI disk
[    2.298831]  sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7 sdc8 sdc9 sdc10 sdc11
[    2.299987] sd 4:0:0:0: [sdc] Attached SCSI disk
[    3.975488] BTRFS: device fsid 487f55dc-4a7c-4424-9941-352386ecc749 devid 1 transid 230715 /dev/sda9 scanned by systemd-udevd (651)
[    3.979333] BTRFS: device fsid 38d6229a-c049-45df-b6eb-dc254746fb6b devid 1 transid 538 /dev/sda10 scanned by systemd-udevd (641)
[    3.982654] BTRFS: device fsid 3562d3c6-ea67-4e3b-8231-ce6756f4d3bf devid 1 transid 31788 /dev/sda8 scanned by systemd-udevd (640)
[    3.992185] BTRFS: device fsid aaa67901-4412-47cd-b93f-49e758bfc50c devid 1 transid 1060161 /dev/sda12 scanned by systemd-udevd (657)
[    4.707518] BTRFS: device fsid 96fe0ea4-ea54-4a58-925c-c978a9b36e09 devid 1 transid 1147 /dev/sdc7 scanned by systemd-udevd (658)
[    4.769927] BTRFS: device fsid 5c4cc4a0-55a2-4cc7-ae39-e0aada6ba74b devid 1 transid 2493 /dev/sdb4 scanned by systemd-udevd (637)
[ 1230.681698] BTRFS info (device sda12): using crc32c (crc32c-intel) checksum algorithm
[ 1230.681706] BTRFS info (device sda12): disk space caching is enabled
[ 1230.686197] BTRFS info (device sda12): enabling ssd optimizations
[ 1305.232037] BTRFS info (device sda12): using crc32c (crc32c-intel) checksum algorithm
[ 1305.232045] BTRFS info (device sda12): disk space caching is enabled
[ 1305.236469] BTRFS info (device sda12): enabling ssd optimizations
#
# ls -l /sys/firmware/efi/efivars | grep toy
-rw-r--r-- 1 root root  516 Feb 16 14:59 VentoyOsParam-77772020-2e77-6576-6e74-6f792e6e6574
#

# the efivar file was copied out and this is its hexdump
# ventoy-grub perhaps uses this to pass info to the ventoy scripts
[root@sirius tmp]# cat VentoyOsParam-77772020-2e77-6576-6e74-6f792e6e6574 | od -A x -t x1z -v
000000 06 00 00 00 20 20 77 77 77 2e 76 65 6e 74 6f 79  >....  www.ventoy<
000010 2e 6e 65 74 38 a9 77 92 6c 88 80 4e 3a a0 29 77  >.net8.w.l..N:.)w<
000020 b8 3e 63 06 35 00 60 11 c1 d1 01 00 00 01 00 01  >.>c.5.`.........<
000030 00 2f 74 72 61 6e 73 63 65 6e 64 2f 6d 30 32 5f  >./transcend/m02_<
000040 6c 6e 78 2e 72 61 77 2e 69 6d 67 2e 76 74 6f 79  >lnx.raw.img.vtoy<
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0001b0 00 00 00 00 9c 0b 00 00 00 00 70 92 9c 00 00 00  >..........p.....<
0001c0 00 2c 00 00 00 00 00 00 00 00 01 00 f3 d9 b6 2d  >.,.............-<
0001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0001e0 00 00 00 00 00 f3 d9 b6 2d 00 00 00 00 00 00 00  >........-.......<
0001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000200 00 00 00 00                                      >....<
000204

=== ATTEMPTING THE ANTIVENTOY IDEA 
> > An idea just came to me. There maybe a way around the disk sector caching.
> > Will this work or be more safer?
> > What if, in addition to /dev/mapper/ventoy, a second dm device
> > /dev/mapper/antiventoy of the same size of the host partition /dev/sdc1 is created
> > by stitching together the other remaining sectors of the host partition /dev/sdc1,
> > with the vdisk sectors swapped for with null sectors. Then the two dm created disks:
> > /dev/mapper/ventoy and /dev/mapper/antiventoy can be mounted independently,
> > without overlap of disk sectors, separating their caching.
> > The self-contract will still be needed, to not alter the location/size of fs-entry.
> > I'll suggest the above to ventoy. Your thoughts will be helpful.

The logs below were collected after the fedora vdisk boots up.

[root@sirius tmp]# cat vtoy4_m02_fedora_postboot.txt
https://www.kernel.org/doc/Documentation/device-mapper/linear.txt
https://www.kernel.org/doc/Documentation/device-mapper/zero.txt

[root@fedora gana]# sgdisk /dev/sdc  -p | grep -i toy
   1            2048      2566852607   1.2 TiB     0700  15KJ_Ventoy
   2      2566852608      2566918143   32.0 MiB    0700  15KJ_VTOYEFI

[root@fedora gana]# dmsetup table
ventoy: 0 97386496 linear 8:33 1973076888
ventoy1: 0 520192 linear 253:0 2048
ventoy2: 0 4096 linear 253:0 522240
ventoy3: 0 1572864 linear 253:0 526336
ventoy4: 0 95285248 linear 253:0 2099200

size of partition
[root@fedora notes]# echo $((2566852607-2048+1))
2566850560
end of vdisk
[root@fedora notes]# echo $((1973076888+97386496))
2070463384
blocks after vdisk
[root@fedora notes]# echo $((2566850560-2070463384))
496387176
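The arithmetic above can be scripted. A sketch, using the numbers from the sgdisk and dmsetup output (8:33 being /dev/sdc1):

```shell
# Build the three-segment antiventoy table: linear head, zero hole over the
# vdisk's sectors, linear tail. All numbers are taken from the logs above.
PART_START=2048           # first LBA of /dev/sdc1 (from sgdisk)
PART_END=2566852607       # last LBA of /dev/sdc1 (from sgdisk)
VDISK_OFF=1973076888      # vdisk start, in sectors relative to the partition
VDISK_LEN=97386496        # vdisk length in sectors

PART_SIZE=$((PART_END - PART_START + 1))
VDISK_END=$((VDISK_OFF + VDISK_LEN))
TAIL_LEN=$((PART_SIZE - VDISK_END))

printf '0 %s linear 8:33 0\n'   "$VDISK_OFF"
printf '%s %s zero\n'           "$VDISK_OFF" "$VDISK_LEN"
printf '%s %s linear 8:33 %s\n' "$VDISK_END" "$TAIL_LEN" "$VDISK_END"
# The three printed lines are exactly the contents of tmp/dmtab_antiventoy.
```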

[root@fedora ~]# cat  tmp/dmtab_antiventoy
0 1973076888 linear 8:33 0
1973076888  97386496 zero
2070463384 496387176 linear 8:33 2070463384

[root@fedora ~]# cat tmp/dmtab_antiventoy | dmsetup create antiventoy
[root@fedora ~]# dmsetup table
antiventoy: 0 1973076888 linear 8:33 0
antiventoy: 1973076888 97386496 zero
antiventoy: 2070463384 496387176 linear 8:33 2070463384
ventoy: 0 97386496 linear 8:33 1973076888
ventoy1: 0 520192 linear 253:0 2048
ventoy2: 0 4096 linear 253:0 522240
ventoy3: 0 1572864 linear 253:0 526336
ventoy4: 0 95285248 linear 253:0 2099200

[root@fedora ~]# mount -t ntfs3 /dev/mapper/antiventoy /mnt/t1 -o ro
[root@fedora ~]#

[gana@fedora ~]$ ls /mnt/t1
'$RECYCLE.BIN'       HQ                           tmpq        win10inst
 clonezilla          Recovery                     transcend
 DumpStack.log.tmp  'System Volume Information'   ventoy
 freshwin            test                         vstorage

[gana@fedora ~]$ df | grep t1
/dev/mapper/antiventoy 1283425276 1047620704 235804572  82% /mnt/t1

[gana@fedora ~]$ find /mnt/t1 | wc
  23941   24019 1617981

[root@fedora ~]# umount /dev/mapper/antiventoy
[root@fedora ~]#

By side-stepping the blocks that were allocated to /dev/mapper/ventoy, the contention
was removed, and the new device /dev/mapper/antiventoy could be created and mounted.

Long ago, I defragmented the NTFS partition in Win10, so the vdisk file is a single
contiguous, unfragmented sequence of blocks. It would have been more work if it
were fragmented, but still doable; ventoy does not require the file to be contiguous.
However, I think contiguous dm maps are more performant.
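For a fragmented file, the dm table would simply carry one linear target per extent. A sketch with made-up extent pairs (a real extent list could come from a tool such as filefrag):

```shell
# Sketch: multiple extents -> multi-target linear table. The extent pairs
# (start_sector length_in_sectors) below are made up for illustration only.
EXTENTS="1973076888 50000000
900000000 47386496"

OFFSET=0
while read -r START LEN; do
    # Each extent maps the next $LEN sectors of the virtual device.
    echo "$OFFSET $LEN linear 8:33 $START"
    OFFSET=$((OFFSET + LEN))
done <<EOF
$EXTENTS
EOF
# Total mapped length should equal the vdisk size (97386496 sectors here).
```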

The advantage of this method is that no dm-patching is required, and hence no kernel taint.
/dev/mapper/ventoy gives access to the filesystems inside the designated blocks (the location of the file on disk);
/dev/mapper/antiventoy gives access to the outside filesystem.

These block-boundary calculations need to be perfect, in order to present a filesystem
as perfect as the original. The job of the antiventoy dm map is just to substitute the vdisk
file's blocks with zero blocks, in order to avoid the contention that prevents the mount.
Some testing will help with assurance of safety; I mounted the fs "ro" to be safe.
I don't know what tool I can use to check the sanity of an NTFS filesystem under Linux;
ntfsck from ntfsprogs was not helpful, as it is an incomplete implementation which, for
the moment, does nothing.

thx
-Gana
 
> On Wed, Feb 15, 2023 at 11:23 PM Ganapathi Kamath <hgkamath@xxxxxxxxxxx> wrote:
> >
> >
> > Firstly, thankyou for your reply. I'm not a kernel expert, so I value what you say.
> > but as I raised the issue I felt I had to defend the usefulness, userbase and the need.
> >
> > > Typically double mounts are done via bind mounts (not really double
> > > mounted just the device showing someplace else).   Or one would do a
> > > mount -o remount,rw <originaldir> and remount it rw so you could write
> > > to it.
> >
> > > A real double mount where the kernel fs modules manages both mounts as
> > > if it was a separate device won't work reliably, as the filesystem
> > > module/caching assumes it has total control.   So writing to a
> > > read-write device that has a separate read-only mount with some data
> > > in the read cache will not return the true data in all cases.    2
> > > read-write (done any way--except with a clustering fs modules) are
> > > going to cause corruption of the underlying filesystem.
> >
> > I want to clarify, even though ventoy uses the word 'remount' to describe the
> > feature, the host file system isn't mounted twice.  There is no loopback-fs
> > to bind mount. and dmsetup status shows linear sequences of blocks allocated
> > to the devmapper device.
> >
> > For this feature to work, the Linux being booted up, creates the devmapper
> > device by after having first somehow determined the sectors occupied by
> > the file in the filesystem. Then mounts the partitions inside devmapper device
> > and then pivots to the discovered rootfs and continues booting.
> >
> > So what I think you are saying is that a mount of /dev/sdc1 and
> > /dev/mapper/ventoy are claiming to use the hard-disk sectors, and asking the
> > kernel to consider them as part of its disk-sector caching mechanism.
> >
> > Booting virtual-disks this way is also called nativeboot.
> > The way this nativeboot so far works, has a little danger.
> > Three safe guards are to be followed by self contract:
> > 1) The virtual-disk-file must be a fixed size, it cannot be allowed to grow or shrink.
> > 2) The virtual-disk-file must not be manipulated/touched/read from the host-partition.
> > 3) The filesystem driver shouldn't defrag, scrub or relocate the virtual-disk-file.
> > This is so that the the file entry in the outside fs remains untouched.
> > Usually, as this is done by root and such administrative user knows what he is
> > doing, so it is not so much of a problem.
> > If one adheres to the above self-contract, the filesystem code for partitions inside
> > the dm-device does not interfere with the filesystem code for the outside partition.
> >
> > An idea just came to me. There maybe a way around the disk sector caching.
> > Will this work or be more safer?
> > What if, in addition to /dev/mapper/ventoy, a second dm device
> > /dev/mapper/antiventoy of the same size of the host partition /dev/sdc1 is created
> > by stitching together the other remaining sectors of the host partition /dev/sdc1,
> > with the vdisk sectors swapped for with null sectors. Then the two dm created disks:
> > /dev/mapper/ventoy and /dev/mapper/antiventoy can be mounted independently,
> > without overlap of disk sectors, separating their caching.
> > The self-contract will still be needed, to not alter the location/size of fs-entry.
> > I'll suggest the above to ventoy. Your thoughts will be helpful.
> >
> > > Given that the use case for the module is dangerous and the use case
> > > is of questionable usefulness I would think that is no point of the
> > > module.  The module's intent seems to be to get around the exclusive
> > > locks that the filesystem (for good reason) is putting on the device.
> >
> > I believe that the danger can be mitigated with a good idea and proper coding.
> > But the door shouldn't be shut.
> >
> > Its usefulness and base of users is really there. The use case is really important
> > 1) to those users who dualboot windows/linux, multi boot other OS-es
> > and juggle between them for want of SSD space,
> > 2) to multiboot alternate OS. but have limited on-machine disk-space
> > 2) to mounting different live isos often, which are often re-downloaded due to updates.
> > 3) to those keeping a host of recovery isos-s at hand for emergency like
> > WINRE, WIN-installer, linux-installer, HBCD-PE, gparted, clonezilla,
> > rescuezilla, seabios, samsung-wizard at hand,
> >
> > Why not a VM?:
> > VM-s are nice but a bit slower than nativeboot. Many things cannot be done
> > inside a VM such as get full native experience of a live iso, GPU support and all.
> > Some system level recovery and repair tools must be booted as native.
> >
> > In the old days Harddisks, USB drives, iso files were small.
> > vdisks were inexistent.
> > One had to burn live-isos to cd/dvd. Disks have grown larger.
> > Burning DVDs is such a waste now.
> >
> > At one point I considered having a small number of 8GB microsd cards to function
> > just like tiny dvds/floppies. But managing them is also a hassle, as they are stored
> > external.
> >
> > Disadvantages of flashing USB drive
> > * flashing a USB drive, which say is 32gb, with a tiny <3gb ISO file. can result in it wasting
> > space as it creates a small partition, limiting the drive's usefulness.
> > * One doesn't want too many usb drives lying around to boot different iso-s
> > * In my experience, flashing seems to have a higher frequency of bricking the USB key.
> >
> > With multiboot solutions, Its much easier to copy in and out liveisos between
> > large filesytems ExFAT, NTFS, ext4 . Linux (as of 6.1) has mature fs driver for
> > ExFAT(5.4) and NTFS (5.15)
> >
> > I've have tried creating my own grub2 configurations to loop mount isos.
> > but then its too much work to maintain. One has to update grub2 config files
> > everytime one downloads and deletes ISOs. Its preferable, that this is
> > auto-detected or dynamically done.
> >
> > Then I considered other multiboot solutions like
> > YUMI, Unetbootin, MultiBootUSB, supergrub2
> >
> > Ventoy seems to best them, by
> > - automatic detection of isos with grub2 menu
> > - also being able to nativeboot vdisks.
> > Ventoy seems to be fork of grub2 with some additional code to handle vdisks.
> >
> > One problem with partitioning systems for multi-OS machines is that the
> > limited disk space of a 512gb SSD drive gets broken and underutilized between partitions.
> > This leaves less usable space in a home partition.
> > Filesystems like btrfs allow having one big volume, installing an OS to
> > subvolumes and allow booting from subvolumes. Thereby sharing unused
> > space. One can then backup-up, offload and restoring subvolumes on need.
> > but this isn't cross platform. and requires more mental cognitive involvement.
> >
> > Consider having windows and linux dual boot. and a data partition.
> > One might give 64 gb to each OS. that means when booted into one OS, the
> > space occupied by the other OS is a waste. before you know it, there is very
> > little space on the SSD  due to all the OS partitions.
> >
> > Ventoy allows one to just keep a few 40gb vdisk files in the 512 gb partition.
> > User can easily move all unused images an external backup, until later use.
> >
> > Now onto ventoy dm_patch itself.
> > I tried reading the patch code dm_patch.c .
> > It is strange to me. seems to be finding specifc addresses in memory;
> > blanking them, inserting opcodes like 0x80, etc.
> > The method taken (kernel/initramfs patching) is spooky.
> > It also inserts code/scripts/files into the initramfs for liveisos, which it can do, because it
> > is the bootloader, and its MOK (machine owner key) has been added to the UEFI.
> > Even though ventoy seems to be honest trustable opensource GPL developer, you never
> > know if the mechanism of side-patching initramfs and kernel allows for future exploits by
> > malicious entities/governments.
> > One wants to be tension free on that front and kernel developers have a responsibility
> > to keep linux-users from straying over to risky solutions.
> > This is apart from how duplicated work it seems to maintain it that I mentioned in the bug.
> > A legit desirable feature shouldn't have to rely on these techniques.
> > Its better, for the feature to exist with the blessing of kernel code review and signing
> >
> > Here, I'm not giving a 'because windows does, linux should do so too',
> > justification. But, windows does do nativeboot of vhdx now since Win10-1903.
> > Only, to mention, that nativebooting vdisks is a useful enough thing, that Microsoft
> > also allows for it. As is the case for ventoy, the vhdx should be fixed-size and not be
> > touched in the hosting drive. They note their common scenarios and benefits.
> > https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/deploy-windows-on-a-vhd--native-boot?view=windows-11
> > I configured a BCDBOOT/BCDEDIT entry to boot a HBCD-PE vhdx this way.
> >
> > If one completely abandoned windows, without the need for cross-platform usability,
> > foregoing VM-attachable/host-mountable mountable vdisks. there could be other
> > solutions such as btrfs subvolumes. But most home users laptops, come with windows,
> > have limited space and getting rid of windows and windows compatible technology
> > completely may not be an option.
> >
> > Are there dm-devel kernel developers, who have tried ventoy or explored ventoy like solutions.
> > I do want to be assured that some dm-devel developer is put their great mind to this
> > if at least on the backburner.
> >
> > Sorry if too verbose. I value your time.
> > Thanks
> > -Gana
> >
> > On Wed, Feb 15, 2023 at 3:33 AM Ganapathi Kamath <hgkamath@xxxxxxxxxxx> wrote:
> > >>
> > >> I am just an ordinary user of Linux and ventoy .
> > >> Q)
> > >> https://github.com/ventoy/Ventoy/issues/2234
> > >> Is what I have suggested here, meaningful?
> > >> Is there contra-indications to not do it or an alternative suggestions?
> > >> thoughts?
> > >>
> > >> Ventoy, a GPL software, uses a small kernel patch to achieve a small remountability feature.
> > >> It seemed to me, that patching the kernel per distribution is sub-optimal.
> > >> I couldn't find a previous relevant discussion on this on dm-devel, but it seems like a requirement would've been well known and this would have already been discussed. What does the actually patch do?
> > >>
> > >> Thx
> > >> -Gana
> > >>
> > > --
> > > dm-devel mailing list
> > > dm-devel@xxxxxxxxxx
> > > https://listman.redhat.com/mailman/listinfo/dm-devel
> > >

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel




