> > True, a single front-end won't see all of those LUNs/devices. So not
> > a big concern about the front-end hosts.
> >
> > I am thinking of a use-case where folks can use a Linux box to manage
> > their different storage arrays. So this Linux box with
> > 'libstoragemgmt + app' needs to manage (scan/create/delete/and so on)
> > all those LUNs.
>
> People do have boxes with thousands of LUNs though, with file systems
> in active use. Both for SAN and NAS volumes.
>
> One of the challenges is what to do when just one LUN (or NFS server)
> crashes and burns.

The FS needs to go read-only (plain and simple), because you don't know
what's going on and you can't risk writing data anymore. Let the apps
fail. You can make this happen even today; it's a simple exercise (for
example, ext4's 'errors=remount-ro' mount option remounts the file
system read-only as soon as an error is detected).

Like others, I have seen/debugged enough weirdness when it comes to
resets/aborts (FYI - 200+ hosts in a cluster). For NDA reasons I can't
disclose a whole lot, but folks have fixed/enhanced the SCSI stack to
make resets/aborts fully robust. And you need folks who can debug
apps/FS/block/initiator/wire-protocol/target-side in one shot. Simple.
So when you say 'crash and burn', any or all of the above (minus the
protocol handling) might need fixing.

> You simply cannot "reboot" the server to clean up after one bad mount
> when you have thousands of other happy users runs on
> thousands/hundreds of other mount points :)

Again, can't the front-end go read-only and limit the outage without
disturbing thousands of users?

Chetan Loke
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html