Check out our new training course on AngularJS for Flex Developers

Ask the Author: Date Formatting in SQL

The question came from a reader.


how do I format a data that looks like 2006-03-12 00:00:00 to MMDDYYYY format and have the month and days all be two characters long?

In most cases, one would just return a date field from the database and format it for display using ColdFusion (or your language of choice).

But, if for whatever reason you need to format a date in SQL, there are some options. The reader was using SQL Server, so the rest of this article will assume that we are dealing with SQL Server.

First, you should look at the convert function. The convert function is used for date formatting in SQL. Unfortunately, it does not have the flexibility that CF offers you with its' mask parameter. It accepts three parameters:

  • Date Type: The data type is the type of the result after a convert. Since we're trying to format a date, most likely this will be char or varchar.
  • Expression: In the case of dates, this is the date (or a SQL expression that results in date)
  • Format Code: This is a code that specifies the format of the resulting date. A full list of codes, and their returned date formats, is listed in documentation.

The reader was asking about formatting the date using MMDDYYYY. Unfortunately, none of the date formats offer that specifically. 112 is the closest, which is YYYYMMDD. You can use this:


select CONVERT(varchar(8),GETDATE(),112)

( Note: For the data type, I'm going to use a varchar(8), since we want 8 characters in the result )

This will give us 20061213 . Close, but not what was asked for. 101 and 110 will give us the data in the right order, but not in the required format:


select CONVERT(varchar(10),GETDATE(),110)
select CONVERT(varchar(10),GETDATE(),101)

Will give us '12-13-2006' and '12/13/2006' respectively. I could have used the SUBSTRING function to wipe out the '-' or '/' but I didn't chose that route. Instead I chose to use the datepart function.

Datepart accepts 2 arguments:

  • The Date Part: The date part specifies the piece of the date you want to get. In this example, we are worried about the year (yyyy), the month (mm), and the day (dd)
  • The Date: this is the date you want to extra parts of.

So, using the datepart function we can piece together something like this:


select cast(datepart(month,getdate()) as varchar(2)) + cast(datepart(day,getdate()) as varchar(2)) + cast(datepart(year,getdate()) as varchar(4))

Which will return 12132006. I'm using cast to return all various date parts as string values. And I'm using the plus (+) sign to concatenate the results.

I've solved the problem solved, right?

Almost. It turns out that datepart will not return leading zeros for one digit days or one digit months. That would violate our requested format.

You can use a case inside SQL to solve this. A case in SQL is just like a cfif statement in ColdFusion. Here is an example that will give us the month:


select
case when datepart(month,getdate()) <= 9 then cast('0' + cast(datepart(month,getdate()) as varchar(1)) as varchar(2)) else cast(datepart(month,getdate()) as varchar(2)) end

This will return the month. It, basically, says if the current month is less than or equal to 9 (AKA A single digit) return the character '0' concatenated with month. Else return the month w/ no extra characters because it's already two digits.

Now add in the day:


select
case when datepart(month,getdate()) <= 9 then cast('0' + cast(datepart(month,getdate()) as varchar(1)) as varchar(2)) else cast(datepart(month,getdate()) as varchar(2)) end +

case when datepart(day,getdate()) <= 9 then cast('0' + cast(datepart(day,getdate()) as varchar(1)) as varchar(2)) else cast(datepart(day,getdate()) as varchar(2)) end

The day code is, basically, the same code except the datepart gets the day, not the month.

And finally, add in the year:


select
case when datepart(month,getdate()) <= 9 then cast('0' + cast(datepart(month,getdate()) as varchar(1)) as varchar(2)) else cast(datepart(month,getdate()) as varchar(2)) end +

case when datepart(day,getdate()) <= 9 then cast('0' + cast(datepart(day,getdate()) as varchar(1)) as varchar(2)) else cast(datepart(day,getdate()) as varchar(2)) end +

cast(datepart(year,getdate()) as varchar(4))

And bingo, I have a happy reader with a happy client.

Now, this is a lot of code and it will make your SQL statement look sloppy and hard to read and hard to maintain. My next step was to move this information into a SQL User Defined Function. I created a SQL User Defined Function to format the date like this:


CREATE FUNCTION FormatDate ( @mydate datetime )
RETURNS varchar(8) AS
BEGIN

return (
case when datepart(month,@mydate) <= 9 then
cast('0' + cast(datepart(month,@mydate) as varchar(1)) as varchar(2))
else
cast(datepart(month,@mydate) as varchar(2))
end +
case when datepart(day,@mydate) <= 9 then
cast('0' + cast(datepart(day,@mydate) as varchar(1)) as varchar(2))
else
cast(datepart(day,@mydate) as varchar(2))
end +
cast(datepart(year,@mydate) as varchar(4)) )

END

You can call it from code like this:


select dbo.FormatDate(getdate())
( If I didn't mention that this is only tested in SQL Server 2000 and your mileage may vary, let me disclaim that now )

Badly Named Columns

Pretend you are dealing with an e-commerce site. The orders are stored in two tables. One table is the order table, containing the customer information and overall order information. The other is an item table, which contains information on each item in the order.

Pretty standard stuff, right?

Now, let's pretend the orders table has a field named "total". What does this field contain?

I bet you said "It contains the total order amount." You probably know that an order total is calculated by taking the Item Price * Quantity for all items in the order and adding them together. ( this is called a sum, and is basic math )

A few of you might say that the site doesn't need to store the order total. You can create a view that contains the sale price, and is calculated from the items table. It's actually pretty easy to do this:


select orderID, sum(salesprice*quantity) as count
from items
group by orderID

( That code is off the cuff and completely untested ). I'd agree with those that thought this far through it.

Back to the original question: How many said that "orders.total" probably stores the total dollar amount of unfilled items? The app I'm working on is using the column to store that. It took me about an hour of code reviews to figure it out.

If the field was named "UnfilledTotal" or something similar, it would have been much more obvious.

More SQL

I was talking about Advanced Insert Statements a while ago. In my example, i was dealing with users and their status. I was writing a scheduled task to automatically change a user from a 'hold' status back to an active status.

I'm trying to write an attrition report on the same data. Attrition is a fancy way to say you want to know why users are 'leaving'. In a single query I'd like to get:

  • The names of users who left
  • The reason they left
  • The number of people for each reason
  • The total number of people who left

I thought this would a relatively trivial thing to write. Unless I'm missing something completely too obvious, it wasn't. My final solution is not intuitive (at least not to me), so I thought some people might wanna hear about it and tuck it in the back of their mind. Let's examine the relevant table structure:


Users (UserID, Firstname, lastname)

UserStatusHistory (UserStatusHistoryID, UserID, ClientStatus, StatusDate, ReasonCancelledID )

ReasonCancelled (ReasonCancelledID, Reason)

The first table is the users table. The second contains the user's status history. The third table contains the reason that the user cancelled. For the sake of this example, I'm going to assume that we don't care about selecting the user's current status. Anyone who has cancelled is fine to return, even if they later became 'active'.

query.recordcount will give us the total number of people who left, so that leaves us with three items to retrieve. First, we can grab the reason cancelled:


select reasoncancelled.reason
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )

No big deal. One down. Then we can add the count of users for each reason:


select reasoncancelled.reason, count(UserStatusHistory.userID)
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
group by reasoncancelled.reason

This is gonna be so freakin' easy, right? Let's add the users


select reasoncancelled.reason, count(UserStatusHistory.userID), users.firstname, users.lastname
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
join users on (UserStatusHistory .userID = users.UserID)
group by reasoncancelled.reason, users.firstname, users.lastname

We can assume that if the ReasonCancelledID is filled out, they must have left. So, I'm All done! I can write some grouped output and bam, it's all good, right? Unfortunately no. Once you start grouping on the firstname and lastname, your count goes down to 1. And because of the aggregate function, you can't add the user information to the select clause without also adding it to the group clause.

So, what is the answer? I added a subquery in the select clause which will select the total based on the reason. This will make more sense when you see it.


select reasoncancelled.reason, users.firstname, users.lastname,

(select count(sush.userID) from userstatushistory sush
where sush.reasoncancelledID = reasoncancelled.reasoncancelledID and
    sush.StatusDate >
= #form.startdate# and
    sush.StatusDate <= #form.enddate#        
) as numUsersWhoLeftGroup
        

from reasoncancelled join userstatushistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
    join users on (userstatushistory.userID = users.userID)

group by reasoncancelled.reason, userstatushistory.statusdate, users.firstname, users.lastname

And bingo, it started working as expected. I'm getting the proper sub totals. The subquery is just a select of its own, except that it references the reasoncancelledID from the parent query.

Last night I was getting internal SQL errors and I'm not sure why. Today, I revisited this concept from scratch and it started working. Nothing like a good night's sleep to solve all your problems, right? If we wanted to get really adventurous, we could use a subquery to get the total number of users who left


select reasoncancelled.reason, users.firstname, users.lastname,

(select count(sush.userID) from userstatushistory sush
where sush.reasoncancelledID = reasoncancelled.reasoncancelledID and
    sush.StatusDate >
= #form.startdate# and
    sush.StatusDate <= #form.enddate#        
) as numUsersWhoLeftGroup,

(select count(tush.userID) from userstatushistory tush where tush.reasoncancelledID is not null ) as numUsersWhoLeftTotal

from reasoncancelled join userstatushistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
    join users on (userstatushistory.userID = users.userID)

group by reasoncancelled.reason, userstatushistory.statusdate, users.firstname, users.lastname

Probably using the recordcount is more efficient / easier than writing a subquery to get the total number of users who cancelled.

As always, this is a simplified example of a real world situation, and the code may not work flawlessy (but the concept should be sound)

Advanced Insert Statements

So, let's say you have an application with (oh the horror) users. This application keeps track of a user status. Let's say this is a sales application. For the sake of simplicity we can assume that there are two statuses, 'active' and 'on hold'. They might be stored in a table like this:


Statuses (StatusID, Status)

And User Status Information, might be in a table like this:


UserStatusHistory(UserSTatusHistoryID, UserID, StatusID, StatusDate, endDate)

Why did I put an end date in there? For the purpose of this example we're going to assume that the "On hold" status can expire. When the status expires, the system should automatically change the user status back to Active. You'd implement that with a scheduled task, right? I would too! But, what is the code?

The easiest end route to the problem is something like this:


<cfquery>
select UserStatusHistory.*
from UserStatusHistory
where UserStatusHistory.StatusID = [on hold ID] and
        userstatushistory.userstatushistoryID in (select top 1 ushsub.userstatushistoryID
                        from userstatushistory ushsub
                        where ushsub.userID = userstatushistory.userID
                        order by statusdate desc
                            ) and
UserStatusHistory.enddate <= GETDATE()
</cfquery>
<cfloop query>
<cfquery>
insert into userstatushistory (UserID, StatusID, StatusDate, endDate)
values(query.userID, [active status], now(), null)
</cfquery>

</cfloop>

The code does this, run a query to get all the current statuses that are on hold and the on-hold end date is less than the current date (AKA we want to switch them back). Then it loops over the query and inserts a new history row for each user that needs status updating.

You run a lot of queries in the first method. There is a better (or at least different) way:


<cfquery>
insert into userstatushistory (UserID, StatusID, StatusDate, endDate)
select UserStatusHistory.userID, UserStatusHistory.StatusID, now(), null
from UserStatusHistory
where UserStatusHistory.StatusID = [on hold ID] and
        userstatushistory.userstatushistoryID in (select top 1 ushsub.userstatushistoryID
                        from userstatushistory ushsub
                        where ushsub.userID = userstatushistory.userID
                        order by statusdate desc
                            ) and
UserStatusHistory.enddate <= GETDATE()
</cfquery>

All done within a single cfquery statement. There is no looping and no multiple queries. The query runs in one fowl swoop, theoretically without bogging down your server.

This post was not proof-read, and the code may have typos.

Alternatives to Database Recursion

Sean Corfield recently posted an entry about recursionin the database. This got me thinking. I once had a teacher who was adamantly against recursion for performance reasons, saying that Towers of Hanaoi was the only example where recursion was beneficial from a performance standpoint.

I have been unable to prove him wrong; can anyone else do so?

As such, I've always been cautious of recursion.

I recently spoke to a client about an application with a similar structure (to what Sean describes in his post). It was storing part information in a self-referencing table. (One part can be made to build other parts, etc.. ). It turned out that this application was a drag, because in some cases it was recursing up to 300 times to get all the data in the 'recursive tree'. This is a situation where recursion was just not working. We spoke, and I suggested this type of solution.

For simplicity, I'll use a table like this:


RecurisveExample(PKID, Data, RecurseOrder)

PKID is the primary key. Data is whatever data we are storing (whether it is part numbers or cat pedigree) RecurseOrder is a text string

For each element in the table, you give it a 3-character identifier (In my example, I'll use 3 characters, but the identifier can actually be 4 or more). The RecurseOrder field contains 'Parent's identifier' + 'Current identifier'

So, to put some data in this table:
1 Root 1 001
2 Root 2 002
3 Root 3 003
4Child 1.1 001001
5 Child 1.2 001002
6 Child 1.3 001003
7 Child 1.4 001004
8 Child 2.1 002001
9 Child 2.2 002002
10 Child 2.3 002003
11 Child 2.3.1 002003001
12 Child 2.3.2 002003002
13 Child 2.3.2.1 002003002001
14 Child 2.3.2.2 002003002002
15 Child 2.2.1 002002001
16 Child 2.2.2 002002002

The root is assigned '001', '002', '003'. The children of root 1 are assigned '001001','001002', '001003'. The children of child 1 are assigned '001001001', '001001002', '001001003', and so on.

Given any root, you can get the tree in recursive order just by sorting on the recurseorder column. To get the recurseorder on root 2, for example, you can do this:


select * from recursiveexample
where recurseorder like '002%'
order by recurseorder

To get the tree, starting at the third child of root 2, you can do this:


select * from recursiveexample
where recurseorder like '002003%'
order by recurseorder

In some cases, such as discussion groups (where you want to retrieve a whole thread in post order) I have found this to be an elegant solution.

With a three-character identifier, you are limited to 1000 elements before you run out of thread strings. But you use a larger identifier string if you need more.

Don't do it in the Database!!!

"Common Sense" dictates that you should do as much logic in the database as possible. 'Put it all in a query" the experts say. "Try to mimize the number of queries." would often be said. The web server is under more load than the database server, so off-load logic to the DB.

This thread of advice probably stems from a lot of code that is something like this:


<cfquery name="MyQuery">
Get User Data
</cfquery>

<cfoutput query="MyQuery">
<cfquery name="MyQuery2">
Get Other Data
where UserID=#Myquery.UserID#
</cfquery>
</cfoutput>

I know we've all written stuff like this (in our earlier days). go ahead, raise your hands and admit. (I thought so).

In cases like this a join would have gotten the information in one database query. Less queries means less round trips to the database, and everything is more effecient. So, we go on our merry way, with the thought that we should try to minimize database queries for application efficiency.

Unfortunately, the "Do it all in a single query" mind-set can be pushed too far. Suppose you designed a database for a CD cataloging system (my favorite example). You might have structured the database something like this:


Genre (GenreID, Genre)
Band (BandID, BandName)
Albums (AlbumID, Albumname)
Songs (SongID, SongName)

And of course, you need intersection tables:


Band_Genre (BandID, GenreID)
Band_Albums (BandID, AlbumID)
Band_Songs (BandID, SongID)

Sorry, I didn't create an database diagram image for ya'll. Now, in this structure, Genre and Songs and Albums have no relation to each other. They are all just "branches" hanging off the "trunk" of a Band.

Now, you may think to yourself, I'm gonna load up all this information because that is the most effecient way to do it.


select * from band
join Band_Genre on (Bands.bandID = Band_Genre.BandID)
join Genre on (Genre.GenreID = Band_Genre.BandID)
join band_songs on (bands.bandID = band_songs.bandID)
join songs on (songs.bandID = band_songs.songID)

That query will load up the band info, the genre info, and all the songs, right? Nope, wrong. What if the band has no genre defined? You won't get any information returned, even if song information exists. That is the nature of the join.

"But, hey Jeff." you might say. "We can fix that with an outer join, right?" Yes, of course you can.


select * from band
left outer join Band_Genre on (Bands.bandID = Band_Genre.BandID)
left outer join Genre on (Genre.GenreID = Band_Genre.BandID)
left outer join band_songs on (bands.bandID = band_songs.bandID)
left outer join songs on (songs.bandID = band_songs.songID)

Okay, so our updated query will retrieve all the proper information. If our band has no genres, we'll still get the all of the songs returned. That's good right? Empty genre columns will be returned, even if there are only 10 songs. It's not data, and maybe is not a big deal.

But, with the outer joins, what happens if all data is there? Suppose there are 2 genres and 10 songs. For each genre, you'll get all 10 songs, meaning you're returning the song data twice instead of once. Instead of the "12" rows that you need, you're getting 20 (That's a 60% increase if I know anything about Math, which I may not).

That may not be a big deal when dealing with the 4 tables in this example. But, when you're drawing data out of 54 different tables all in one query, the problem quickly balloons out of control. The code is not effecient or maintainable. You got the worst of all words.

So, before overdosing on outer joins, think twice. Maybe two queries is quicker.

Tell me I'm wrong.

All Content Copyright 2005, 2006, 2007, 2008, 2009 Jeffry Houser. May not be reused without permission
BlogCFC was created by Raymond Camden. This blog is running version 5.9.2.002.