Dealing with a Bad Design – #1

I am sure that there will be many posts about dealing with bad design. So, I am going to number them as we go so I can see just how many different bad designs need a solution!

Duplicate Fields

Like many shops, the Court never had a DBA.  If you look at the Court’s PDB (Platypus Database) it is obvious that developers with no idea on how to create a relational database had “designed” it.  For over 20 years, developers had just added tables as they needed for their specific task.  To make matters worse, the platform is an IBM iSeries (formerly known as the AS400) where, depending on how a program is written, a change to a table design requires the recompilation of all programs that use the table.  And since there was no source control process it was impossible to tell WHICH programs would need to be recompiled.

The solution of course was to copy an existing table that had most of what the new process needed, rename it, and just add a few new fields…viola!!
That was the database design process.

This causes havoc when trying to write applications that need to interface with the database. The first issue, how do you keep all the data synchronized? If you update data in one table, how does it get updated in the other? The second issue, which fields need to be included in a query? The ones from the first table or the second, almost identical, table? Unfortunately in cases like this the only way to know is to ask someone and learn the structure. Eventually you will figure out when to use which table.

Fake Dates

Another example, back in the day when storage was a main consideration and before there were specific date fields, the Court decided to use a six digit INTEGER data type as the ‘Date’ field, YYMMDD. So looking at the date 111 – what date would you figure that to be? Did you guess January 11, 2000? No? huh, I’m surprised that you didn’t see that 😉

Then there was the Y2K issue. A perfect time to fix that bad decision because by then the AS400 DID have a Date datatype. Nope…let’s just add an EIGHT long INTEGER data type to every table that duplicates the six digit date and store the dates as YYYYMMDD.

And again we have to rely on someone making sure that all the dates are updated correctly. But of course there are exceptions. There’s at least one table that the eight digit date did not get added. It has to be joined into another table using the six digit date…but sometimes that data isn’t updated so the query returns no data.

Committing Spreadsheet

But I think one of the more prevalent bad design decisions is not creating the proper many to many structure – committing spreadsheet. Remember in the last post we talked about normalization and the many-to-many relationship of Jurors to Trials. The WRONG way to store this information is to do something like:

TrialID DefendantName Juror1 Juror2 Juror3 Juror4 Juror5 Juror6 ....Juror 25

Here’s why this is wrong. It is rigid – what if someone wants 26 jurors? Now you have to change the table design (and maybe the application) just to add one more person to a Jury Panel. What if they only want 20 Jurors? now there are NULLs in the data for Juror21 – Juror25. (While NULL has its place in databases this is not one of them – NULL means ‘unknown’ – these Jurors are not unknown – they don’t exist, that is a different state.)

Other Platypus Designs

I would love to hear how your PDB is designed! Send me an email or comment on your design issues and I will see if I have a solution to post for you!

Developer Golden Egg

When you have to work with a database where a non-PDB Hunter has committed spreadsheet, you can use SQL to create what SHOULD have been done. Using a UNION we can create the many-to-many table like so:

SELECT 1 as ID, TrialID, Juror1 As JurorID FROM BadTable Where Juror1 is not null
UNION
SELECT 2, TrialID, Juror2 FROM BadTable Where Juror2 is not null
...
SELECT 25, TrialID, Juror25 FROM BadTable Where Juror25 is not null

Now you have a result set that is normalized and can be used as part of another query, to join into, display results from, the same things you could do with it if it HAD been designed correctly.

First Weapon – Understanding the Relational Model

For years I have been and will continue to recommend that anyone working with databases and queries read the following articles

The Fundamentals of Relational Database Design by Paul Litwin

and

Understanding SQL Joins @ devshed.com

These two articles helped solidify my formal education in databases and for years I thought I understood the Relational Model and some of the more advanced SQL Joins…which I did, to a point.  However, to be Platypus Hunters, we must delve deeper and understand how databases are represented by the relational model.

Over the past few weeks I have been reading Exam 70-461: Querying MS SQL Server 2012 Training Kit preparing to become a Microsoft Certified Solutions Associate: SQL Server 2012.  After reading Chapter 1 – Foundations of Querying I feel I have an even better understanding of the relational model and what makes something relational.  So, my Hunters-to-be, let’s discover Cobb and his relations.

There were several “aha” moments reading this first chapter.  The first came from reading this:

A relation in the relational model is what SQL calls a table.  The two are not synonymous.  You could say that a table is an attempt by SQL to represent a relation….

That made me re-think some of my preconceived notions about what I knew about databases, tables and the relational model.

So going forward with an open mind, I read:

Some of the most important principals to understand about T-SQL stem from the relational model’s core foundations-set theory and predicate logic.

Remember that the heading of a relation is a set of attributes, and the body a set of tuples.  So what is a set?  According to the creator of mathematical set theory, Georg Cantor, a set is described as follows:

“By a ‘set’ we mean any collection of M into a whole of definite, distinct objects m (which are called the ‘elements’ of M) of our perception or of our thought.”

Set Theory

Reading this chapter exposed me to concepts that I had knowledge of but no understanding.  I had heard “real” database people refer to tuples but had never really “gotten it”.  I had to go look up “tuple” on the All-Knowing Wikipedia.  As it relates to math it

is an ordered list of elements.   Tuples are usually written by listing the elements within parentheses ‘()’ and separated by commas; for example, (2,7,4,1,7) denotes a 5-tuple.

That was another “aha” for me.  I could see that this was a set of data, like what is returned from a query.  That made sense to me.

Looking into set theory I found that a relation has a heading and a body.  The heading is a set of attributes.  These attributes are referred to by name and type .  The body is a set of tuples.  These concepts become transformed by SQL into something recognizable.  The relation is a table showing the heading as column names and tuples as rows in the table.

(So, remember in the last episode when we said that thinking of a database like a spreadsheet was ok for non-Hunters, this concept is the real deal.)

Do you remember back in elementary school when you learned about VENN diagrams?  You had to draw all those circles that represented MAMMALS and EGG LAYING CREATURES and come up with the INTERSECTION showing Mammals that are Egg Laying Creatures?

Venn Diagram

Venn Diagram

That’s Set Theory – see you know this – you just learned it so long ago, you’ve forgotten!

Back to Cantor’s definition of a set.  It should be considered as a single, whole thing….a SET of something, in this case tuples.  It should be a set with no duplicates (there are exceptions to this in the real world, but theoretically…) and while not explicitly stated, no order to the set – as one of the tek-tip gurus would say, think of a result set as a bag of marbles, all the records jumbled up together – (when, in a query, you add sorting you have created a sequence, not a relation – a topic for another day!)

Predicate Logic

A predicate is defined as an expression that when applied to an object makes a proposition evaluate to either true or false. For instance, ‘hire date greater than 1/1/2012’ is an example of a predicate. For each employee you can evaluate this predicate and create a proposition that returns true if that employee was hired after that date.  Any expression that you can write to evaluate criteria to true or false is a predicate.  Is it greater than another number, does a name match, any other criteria.

Predicate Logic is a core element of the relational model.  Using predicates you can enforce data integrity, filter data, and even define the data model.

While I didn’t study these concepts formally, I did have an intuitive understanding of them. Understanding these concepts will be the first weapon in your arsenal to fight the Platypus Database.

Developer’s Golden Egg

I have decided that when I have some tidbit that might be helpful to a certain group of my readership, the Platypus will give you a Golden Egg – the first one goes to the Developers!  In fact, for a while, most of them will probably be for developers…that is what I did for a long time!

One thing that I tried to do as an application developer was iterate through the results of my queries in my application and not on the database.  Remember, think of the results of your query as a set.  So, if you need to get all the rows that meet some criteria try to figure out a way using a query to just return to your program the information in which you are interested.   Conversely, try not to return ALL the rows and have your application determine which ones meet your criteria, let the database do what databases are good at doing.

Also. try to use aggregate functions when possible.  Don’t bring back a whole record set to your application and count the number of rows, if you just need to know the total number of rows use the COUNT function and just return what you need.