Saturday, June 27, 2009

Boolean columns suck as statuses or types

A boolean column in a database table can usually have two possible values, TRUE or FALSE (I say usually, for one could make the column in question NULLable, allowing for the absence of a value). The problem with boolean columns is that in any non-trivial database, they're not expressive enough to describe the status or type of an entity. Consider the following simple requirement.

Users may access the system by typing in their username and password.

Easy enough; here's the table (we might even imagine an index on username and password to speed up logins):


As they often do, the requirements changed:

We'll also need the ability to disable certain users' access.

"That's easy enough", thinks the new developer, "I'll add a boolean field indicating whether or not the User is active. After all they're either active or they're not." Here's his design.


We can imagine that an inactive user attempts to login, and the system politely informs them that:

We're sorry but your account is no longer active. If you feel this is an error, please contact support at: support@domain.com.

Requirements changed again (at least in this example the requirements are still reasonable):

We're worried about spammers, so new users will have to verify their accounts before we allow them to login.

This is a standard process for lots of websites, the system sends you an email with a link containing a code which verifies your account. I am an English speaker and have learnt the meaning of the terms "active" and "inactive". Certainly un-verified new users are not active as they cannot log-in, but they're not really inactive either for we've not explicitly disabled them. Imagine showing an unverified new user the no-longer-active message above, their response would be some variation of "but I just registered!" (shortly thereafter the user stops attempting to use your crappy amateur-hour application).

Unfortunately there are lots of bad solutions to this problem and only a few good ones . Here's a bad solution:

Bad Solution 1 - another fun boolean field

"That's just another field, isVerified." Here's his design.


Look at the complexity we've caused.

Active users can be defined as isActive = TRUE AND isVerified = TRUE, New un-verified users can be defined as isActive = TRUE AND isVerfieid = FALSE, disabled users have the status of isActive = FALSE AND isVerified = FALSE. This is not maintainable.

Bad Solution 2 - NULLable boolean field

"Let's set isActive NULL for un-verified users" I'm not including the table graphic for this design, as I recently ate.




Great now our data-model says you're either inactive, active or we have no idea what your status is. I'd say the isActive column no longer really means "is-active". This is not maintainable either.

If you've read this far and are relatively sober, you'll realize that I've been building an argument to support my earlier assertion that in any non-trivial database, boolean columns are not expressive enough to describe the status or type of an entity. If you're not already convinced of this fact, imagine handling the further requirement via boolean columns:

We'd like to expire the accounts of users who've not paid within the last thirty-days. We also need to know the users who've expired so we can notify them.

What we need is a design which corresponds to how we think about the data. Most English speakers do not, for example, describe someone standing as isWalking = FALSE AND isSitting = FALSE, so why should your data-model act that way? Consider this simple design:

This design handles all the above requirements. Possible values in UserStatus are:

1 - Active
2 - Inactive
3 - Unverified
4 - Expired for non-payment
5 - Mocked for bad db design
6 - sent to an Intro. to Databases class

This design is flexible for if requirements change and we realize we need to account for a new status, just add a row to UserStatus. If it turns out that its possible for a user to be in more than one UserStatus (I can see a few un-named developers as being in statuses 5 and 6), we only need to add that as a new UserStatus ("7 - Mocked and re-educated" or perhaps "7 - Lucky to be working as a Developer").

If you keep things simple and model your data in an expressive way you'll be fine.

0 comments: