When designing a data schema to store a variety of entities it can be common to require a selection of flags to be stored. There are different ways in which the data can be designed to accommodate this which I wanted to look at here.
Examples of these fields could be an ‘Is Active’ flag against an account, or a ‘Has Shipped’ indicator for an order. These are not the types of data which may be enumerations such as an order status being ‘Ordered’, ‘Processing’, ‘Shipped’, etc. The flags we’re looking at here can typically be identified where we have field names which start with ‘Was…’, ‘Is…’, ‘Has…’ etc. indicating a boolean value.
Using bit values
The first option we’re going to look at is using the BIT data type values. With this option we’ll add each field individually using a BIT data type. This will keep things clear and distinct in the schema, and similarly when referencing the fields it’ll be clear what they’re being used for and how. A simple table may look like the below:
CREATE TABLE dbo.BitFlags (
ID INT IDENTITY(1, 1) PRIMARY KEY,
AccountName VARCHAR(20),
IsActive BIT NOT NULL,
IsSuspended BIT NOT NULL,
AllowDiscount BIT NOT NULL
);
Extending this further we can add a constraint to make sure that an account can only be either Active or Suspended. Due to the way we’ve designed our data this is very clear:
ALTER TABLE dbo.BitFlags
ADD CONSTRAINT ActiveOrSuspended
CHECK ((IsActive = 1 AND IsSuspended = 0)
OR (IsActive = 0 AND IsSuspended = 1));
Taking a step back from the usability it’s worth digging a little more into how it impacts our data storage using this option. We’ve previously looked at how these values are stored by the engine and we know that when adding a BIT value to a table we’ll actually allocate a whole byte of data.
This will mean sacrificing a small amount of overhead on our storage for one or two values, however the benefit of this is that we add additional fields to the table which can utilise this storage without needing to rewrite our pages when populating the values. Nice and efficient.
Overall I like this approach for consistency, clarity, and familiarity, particularly when working with a shared code base.
Using a bit mask
Another approach we could consider is using a bit mask where we’d condense all of the values down into a single field. In this instance we’ll manually be allocating the bits inside a larger numeric value to create a new combined value. The table structure with this approach would look a little different:
CREATE TABLE dbo.BitMask (
ID INT IDENTITY(1, 1) PRIMARY KEY,
AccountName VARCHAR(20),
Flags TINYINT NOT NULL
);
For anyone unfamiliar with how this approach works for storing our values, we’ll essentially assign a single bit to each piece of data we want to store. When we want to retrieve or modify these values we then need to do some binary maths to figure extract the data. In this case our values are:
- Active = 1
- Suspended = 2
- Allow Discount = 4
Based on the value we store in our Flags field we’ll indicate which combination of are applicable. For example the value of 5 would indicate an Active account which Allows Discount, whereas a value of 2 would be for a Suspended account which does not Allow Discount.
You can immediately see that this isn’t as easy to pick up for folks who are unfamiliar with the approach. On top of this is the obscurity of the data without knowing the mapping of fields to the appropriate binary values the data is worthless.
The complexity extends to logic surrounding the table too. If we wanted to replicate the check constraint above it would now look a little different:
ALTER TABLE dbo.BitMask
ADD CONSTRAINT ActiveOrSuspendedMask
CHECK (Flags & 3 IN (1, 2));
If you aren’t familiar with these operators that’s fine – we’ll come back to these at a later point. The takeaway here is that the logic becomes much more obscure, albeit slightly more concise.
When we create these masks we also have the responsibility for defining the maximum number of values we’ll be using from the outset. We’ll need to choose an appropriate numeric data type for the number of flags we plan to use, such as TINYINT, SMALLINT or INT if we planned to use 8, 16 or 32 values respectively.
This can be an advantage as we can plan ahead and allocate the storage ahead of time. Conversely any changes in the future would need to change the data type manually. On top of this, there aren’t certain increments such as 3, 5, 6, or 7 bytes available.
Putting aside these challenges, this option can be very useful when you have a large number of flags which you need to use. This can massively simplify the complexity of a table schema and extracting or manipulating values can be done with careful forethought.
This is an approach I prefer when we’re dealing with a larger number of flags when you can get benefits from planning the additional storage ahead of time. One of the key requirements for this approach is accurate and up to date documentation to support the flags being used to allow developers to make sure they can reference what they need to effectively.
Wrap up
In this post we’ve look at two approached for storing multiple boolean flags – via individual fields or a single combined field.
Having individual fields is a typical design pattern as the schema is more self documenting than when needing an external source to define the masking applied in the single-value approach. Additionally when using the fields we’ve seen the added complexity in defining a simple CONSTRAINT against our data.
Whilst using the masking approach can lead to more challenges with the schema (think: indexing) there are benefits to the approach which come at scale. This includes being able to plan ahead to provision storage up front, as well as simplifying the schema when we have a large number of other fields to manage.
Personally I land on the side of using separate fields for each of the values. I feel the clarity which that provides gives much more benefit as the solution evolves over time and it needs to change or new developers come to maintain it.
With that said, one of my favourite aspects of using a mask is the ability to use and manipulate the data within the combined flag. This requires heavy usage of bitwise operators which is something I want to touch on in a follow up post.
How about you, what’s your typical approach? Have you used a mask before and have any war stories from the experience?
One reply on “Schema Design for Bit Flags”
[…] few weeks back we looked at options for storing multiple boolean flag values in a table. One of the options we considered was the use of a Bit Mask where we packed […]