On Thu, Feb 16, 2012 at 12:50 PM, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
> On Thu, Feb 16, 2012 at 12:33 PM, Piergiorgio Sartor
> <piergiorgio.sartor@xxxxxxxx> wrote:
>> Hi Alex,
>>
>>> Oh sure, the cache is persistent. But device discovery order is undefined, and
>>> if the backing device is no different from one without a cache and writeback
>>> caching is enabled the kernel has no *possible* way to know that a caching
>>> device is needed or even exists. So it mounts it, but it doesn't have any of the
>>> data in the writeback cache meaning it thinks the filesystem is corrupted.
>>> Depending on the filesystem and exactly what is missing, it may run some
>>> in-kernel recovery code that alters the disk. You just lost your data.
>>
>> nonono, I believe I wrote that the kernel
>> should *first* look for caching devices
>> and later for the others...
>>
>> The formatting thing is, clearly, a much
>> standard approach, for the current kernel
>> architecture, but nothing forbids to have
>> a hierarchical search of devices.
>> This could be done, for example, by assigning
>> different classes to each device type, to
>> be scanned in a specific order.
>>
>> In this scope (not bcache, but device discovery)
>> it is already a problem a layered software RAID
>> with metadata 1.0 together with 1.2 (or 1.1).
>> Where the first lies at the end and the second
>> at the beginning of the HDDs, making it difficult
>> (but not impossible) to find out which is the
>> outer and which is the inner one.
>
> The difference is that for MD devices, both types
> of metadata are on the same block device. You're
> prioritizing which *type of metadata* is checked
> for first in that case. For bcache, you'd have to
> scan /dev/sdz before /dev/sda if sdz is the cache
> and sda is the backing device. Now consider a
> few things:
>
> 1.) SCSI/SATA devices may be probed in parallel
>
> 2.) udev gets events when each device is probed,
> *not* after all devices have been probed
>
> 3.) The bcache device may not even be attached
> to the system at the time
>
> 4.) Even in the MD case, there is still *some*
> change to the backing device, there is still some
> sort of data there that says "hey, there's more."
> A totally unchanged backing device won't do that.
> Even if it doesn't invalidate the other metadata, it
> still tells the kernel that it's not enough - think of
> it as invalidating it at the logical rather than the
> physical level
>
> 3 and 4 are the really critical ones. If the cable
> that connects the SSD to the computer is flaky,
> and it never gets probed, and there is *no*
> metadata on the backing device, there is
> *exactly* zero information available to the kernel
> to inform it that a backing device ever existed at all.

Er, to inform it that a *cache* device ever existed

>
> Also, you say that the cache must be scanned
> before the backing device - but how do you know
> it's a cache or a backing device until you've probed it?
> You could delay sending any uevents until all
> devices are probed, except there are some devices
> that take 30sec timeouts and fail, or iscsi, or devices
> that get plugged in at runtime, or...
>
> And since you can't do that, you have a chicken
> and egg problem. You can't probe the backing
> device before the cache, but you don't know which
> is the cache until you probe it. And there may be
> more than one of each. You can have one cache
> and 200 backing devices, in theory. Want to take
> the odds that the cache gets probed first at random?
> Because the kernel doesn't have enough information
> for it to be anything other than random.
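
The ordering argument above can be sketched as a toy model. This is only a minimal illustration under stated assumptions, not how the kernel, udev, or bcache actually probe devices. `Device`, `unsafe_mounts`, `make_devices`, `random_probe_order`, and `chance_cache_first` are hypothetical names; `has_superblock` stands in for any on-disk marker on the backing device that says "a cache is required"; and the probe order is assumed to be uniformly random, which real hardware need not be.

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Device:
    name: str
    role: str             # "cache" or "backing" (hypothetical labels)
    has_superblock: bool  # assumed on-disk marker meaning "I need a cache"


def make_devices(n_backing, formatted):
    """One cache plus n_backing backing devices, formatted or left untouched."""
    devices = [Device("cache0", "cache", True)]
    devices += [Device(f"backing{i}", "backing", formatted) for i in range(n_backing)]
    return devices


def random_probe_order(devices, rng):
    """Return a shuffled copy, modelling an undefined discovery order."""
    order = list(devices)
    rng.shuffle(order)
    return order


def unsafe_mounts(probe_order):
    """Names of backing devices that would be used raw, without their cache.

    A backing device is treated as unsafe when it is probed before any cache
    has been seen AND carries no marker telling the system to wait.
    """
    cache_seen = False
    unsafe = []
    for dev in probe_order:
        if dev.role == "cache":
            cache_seen = True
        elif not dev.has_superblock and not cache_seen:
            # Looks like an ordinary disk; nothing says a cache exists.
            unsafe.append(dev.name)
    return unsafe


def chance_cache_first(n_backing):
    """Probability the single cache is probed first under a uniform random order."""
    return Fraction(1, n_backing + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    raw = make_devices(200, formatted=False)
    fmt = make_devices(200, formatted=True)
    print("unformatted, random order:", len(unsafe_mounts(random_probe_order(raw, rng))))
    print("formatted, cache never probed:", len(unsafe_mounts(fmt[1:])))
    print("chance cache is probed first:", chance_cache_first(200))
```

Under these assumptions, `chance_cache_first(200)` is 1/201 for the "one cache and 200 backing devices" case, and an unformatted backing device is at risk whenever the cache shows up late or never shows up at all (the flaky-cable case in point 3). A marker on the backing device removes the dependence on ordering, which is the point of item 4.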