Common Mistakes Made in Database Design

Nick Youngson

Whether you are working with a database that holds hundreds of records or millions of records, proper database design is always important. Not only will it make retrieving the information much easier, it will also simplify expanding the database in the future. Unfortunately, it's easy to fall into a few traps that can make things difficult in the future.

There are entire books written on the subject of normalizing a database, but if you simply avoid these common mistakes, you will be on the right track to good database design.

Mistake #1: Repeating Fields in a Table

A basic rule of thumb for good database design is to recognize repeating data and to put those repeating columns in their own table. Repeating fields in a table is common for those who have come from the world of spreadsheets, but while spreadsheets tend to be flat by design, databases should be relational. It's like going from 2D to 3D.

Luckily, repetitive fields are usually easy to spot. Just take a look at this table:

OrderIDProduct1Product2Product3
1Teddy BearsJelly Beans 
2Jelly Beans  

 

What happens when an order contains four products? We would need to add another field to the table to support more than three products. And if we've built a client application around the table to help us input data, we may need to modify it with the new product field. And how do we find all of the orders with Jellybeans in the order? We would be forced to query every product field in the table with an SQL statement that might look like: SELECT * FROM Products WHERE Product1='Jelly Beans' OR Product2='Jelly Beans' OR Product3='Jelly Beans'.

Instead of having a single table that stuffs all the information together, we should have three tables that each hold a distinct piece of information. In this example, we would want an Orders table with information about the order itself, a Products table with all of our products and a ProductOrders tablet that linked products to the order.

OrderIDCustomerIDOrder DateTotal
171/24/1719.99
291/25/1724.99

 

ProductIDProductCount
1Teddy Bears1
2Jelly Beans100

 

ProductOrderIDProductIDOrderID
10111
10221

 

Notice how each table has its own unique ID field. This is the primary key. We link tables by using a primary key value as a foreign key in another table. Read more about primary keys and foreign keys.

Mistake #2: Embedding a Table in a Table 

This is another common mistake, but it doesn't always stand out quite as much as repetitive fields. When designing a database, you want to make sure all of the data in a table relates to itself. It's like that child's game about spotting what is different. If you have a banana, a strawberry, a peach and a television set, the television set probably belongs somewhere else.

Along the same lines, if you have a table of sales people, all of the information in that table should relate specifically to that sales person. Any extra information that isn't unique to that sales person may belong somewhere else in your database. 

SalesIDFirstLastAddressPhoneNumberOfficeOfficeNumber
1SamElliot118 Main St, Austin, TX(215) 555-5858Austin Downtown(212) 421-2412
2AliceSmith504 2nd Street, New York, NY(211) 122-1821New York (East)(211) 855-4541
3JoeParish428 Aker St, Austin, TX(215) 545-5545Austin Downtown(212) 421-2412

 

While this table might look like it is all related to the individual salesperson, it actually has a table embedded within the table. Notice how the Office and OfficeNumber repeat with "Austin Downtown". What if an office phone number changes? You would need to update a whole set of data for one single piece of information changing, which is never a good thing. These fields should be moved to their own table.

SalesIDFirstLastAddressPhoneNumberOfficeID
1SamElliot118 Main St, Austin, TX(215) 555-58581
2AliceSmith504 2nd Street, New York, NY(211) 122-18212
3JoeParish428 Aker St, Austin, TX(215) 545-55451

 

OfficeIDOfficeOfficeNumber
1Austin Downtown(212) 421-2412
2New York (East)(211) 855-4541

 

This type of design also gives you the ability to add additional information to the Office table without creating a nightmare of clutter in the sales person table. Imagine how much work it would be to simply keep track of the street address, city, state and zip code if all of that information was in the sales person table!

Mistake #3: Putting Two or More Pieces of Information Into a Single Field

Embedding the office information into the sales person table wasn't the only problem with that database. The address field contained three pieces of information: the street address, the city and the state. Each field in the database should only contain one single piece of information. When you have multiple pieces of information in a single field, it can become harder to query the database for information.

For example, what if we wanted to run a query on all sales people from Austin? We would need to search within the address field, which is not only inefficient, but can return bad information. After all, what happens if someone lived on Austin street in Portland, Oregon?  

Here's what the table should look like:

SalesIDFirstLastAddress1Address2CityStateZipPhone
1SamElliot118 Main St AustinTX787202155555858
2AliceSmith504 2nd St New YorkNY100222111221821
3JoeParish428 Aker StApt 304AustinTX787162155455545

 

There are a couple of things to note here. First, "Address1" and "Address2" would seem to fall under the repetitive fields mistake.

However, in this case they are referring to separate pieces of data that relate directly to the sales person rather than a repeating group of data that should go in its own table.

Also, as a bonus mistake to avoid, notice how the formatting for the phone number has been stripped out of the table. You should avoid storing the format of fields when at all possible. In the case of phone numbers, there are multiple ways people write a phone number: 215-555-5858 or (215) 555-5858. This would make searching for a sales person by their phone number or doing a search of sales people in the same area code more difficult.

Mistake #4: Not Using a Correct Primary Key

In most instances, you will want to use an automatically incrementing number or some other generated number or alphanumeric for your primary key. You should avoid using any actual information for the primary key even if it sounds like it would make a good identifier.

For example, we each have our own individual social security number, so using the social security number for an employee database might sound like a good idea. But while rare, it's possible for even a social security number to change, and we never want our primary key to change.

And that is the problem with using actual information as a key value. It can change.

Mistake #5: Not Using a Naming Convention

This might not sound like a big deal when you first get started designing your database, but once you get to the point of writing queries against the database to retrieve information, having a naming convention will help as you memorize field names.

Just imagine how much more difficult that process would be if names were stored as FirstName,LastName in one table and first_name, last_name in another table.

The two most popular naming conventions are capitalizing the first letter of every word in the field or separating words using an underscore. You may also see some developers capitalizing the first letter of every word except the first word: firstName, lastName.

You will also want to decide on using singular table names or plural table names. Is it an Order table or an Orders table? Is it a Customer table or Customers table? Again, you don't want to be stuck with an Order table and a Customers table.

The naming convention you choose isn't as important as the process of actually choosing and sticking to a naming convention.

Mistake #6: Improper Indexing

Indexing is one of the hardest things to get right, especially for those new at database design. All primary keys and foreign keys should be indexed. These are what link tables together, so without an index, you will see very poor performance out of your database.

But what are too often missed are the other fields. These are the "WHERE" fields. If you are often going to narrow your search by using a field in a WHERE clause, you want to think about putting an index on that field. However, you don't want to overly index the table, which can also hurt performance.

How to decide? This is part of the art of database design. There are no hard limits on how many indexes you should put on a table. Primarily, you want to index any field that is frequently used in a WHERE clause. Read more about properly indexing your database.