On Mon, 2007-06-04 at 00:51 +0200, Alexander Staubo wrote: > True, and when they overlap you tend to want to describe the > validation errors in one place, not two -- either the database or the > app, not both. Relational databases have traditionally argued that > these rules should be in the former, so that there's one layer through > which every single change has to go. > I disagree here. You often _do_ want to describe some types of errors twice. You check the same thing in different ways at different points in the code, and that isolates errors and assures developers that certain assumptions are safe. In the database world, it's particularly important to use declarative constraints. If developer A inserts bad data and developer B uses that bad data, it could compound the problem and yet remain invisible until the problem is almost impossible to debug. Constraints assure the developers that they are starting with some known state. Applications should check for inconsistencies when it makes sense, as well. Every important API that I can think of checks the input, and reports some kind of useful error when the assumptions are violated. Every system call has all sorts of useful error codes. For example, read() can return EBADF. Nobody thinks "Hey, I'll send a random integer for the file descriptor", but I'd be willing to bet that the error condition has been reached by accident before, and probably saved someone a lot of time versus just filling the target buffer with random bytes and returning success. I would argue it's more important in a database, because the error conditions can persist for a longer period of time and cause more damage in the process, but the idea is the same. > I agree. In my experience, however, the best kind of data model is the > one that is immediately mappable to user-level concepts -- to human > concepts. A "user" relation has attributes like "name", "birth_date", > etc. If you manage to keep the model flat and friendly enough, you can > map the attributes to forms and translate attribute-level errors > directly to form error messages. > I think you're oversimplifying. What you say works when user input is a separate, contained, isolated chunk of data. In that case, any error is only related to the self-consistency of the input, and can easily be mapped back to a user-level error. However, it breaks down when you have constraints on the interrelationships between pieces of data. These interrelationships are what can be broken from multiple points in the application code, and there is no way to map backwards from the constraint violation to a specific user error. Hence, the application must translate. Try to imagine some of the complexities in a scheduling system, and what kind of constraints that might involve. Then, think about how some of the same constraints might be broken in very different ways. Time conflicts could come about either by participants overscheduling themselves, or by the event itself shifting in time such that some participants are overscheduled. Perhaps someone tries to sign up for an event that's already full, or perhaps the venue moves to a new location with a lower capacity. I can't think of any way to map backwards from the constraint violation to the user level error. You could probably imagine similar problems with an inventory system. > > The application has much more information about the user and the context > > of the error that the database shouldn't have. For instance, the > > language that the user speaks might affect the error message. > > Localization is easily accomplished by piping the error message through gettext. > And what about the two-column unique index that can be violated from two different aspects of the UI? You only get one database error, but you really should present two different errors to the user. Any time that a constraint can be violated through two completely different paths, your one-to-one constraint-to-application-error breaks down. The root of the issue is that the database knows that an inconsistent state has been reached, it does not know _how_, nor should it. The how might change as new code is added or perhaps as new bugs are introduced. Constraints in an RDBMS should be declarative which is very important (you don't need to prove that a column is always in a unique state, you look, and it says it is unique). You can add procedural code to an RDBMS, but you can end up making your database your application that way. User-level errors are heavily dependent on _how_ the error occurred. The database doesn't know this, so the constraint violation message shouldn't presume to know how it happened. I'll use the analogy to read() again: who passes EBADF back to the user directly? Does that mean it's a bad API? No, it just means it had no idea why you passed it a bad file descriptor, but it knows it's bad, and it tells the caller. Similarly with exceptions in java/ruby/python: you shouldn't pass those exceptions back to the user. Regards, Jeff Davis