More SQL

I was talking about Advanced Insert Statements a while ago. In my example, i was dealing with users and their status. I was writing a scheduled task to automatically change a user from a 'hold' status back to an active status.

I'm trying to write an attrition report on the same data. Attrition is a fancy way to say you want to know why users are 'leaving'. In a single query I'd like to get:

  • The names of users who left
  • The reason they left
  • The number of people for each reason
  • The total number of people who left

I thought this would a relatively trivial thing to write. Unless I'm missing something completely too obvious, it wasn't. My final solution is not intuitive (at least not to me), so I thought some people might wanna hear about it and tuck it in the back of their mind. Let's examine the relevant table structure:


Users (UserID, Firstname, lastname)

UserStatusHistory (UserStatusHistoryID, UserID, ClientStatus, StatusDate, ReasonCancelledID )

ReasonCancelled (ReasonCancelledID, Reason)

The first table is the users table. The second contains the user's status history. The third table contains the reason that the user cancelled. For the sake of this example, I'm going to assume that we don't care about selecting the user's current status. Anyone who has cancelled is fine to return, even if they later became 'active'.

query.recordcount will give us the total number of people who left, so that leaves us with three items to retrieve. First, we can grab the reason cancelled:


select reasoncancelled.reason
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )

No big deal. One down. Then we can add the count of users for each reason:


select reasoncancelled.reason, count(UserStatusHistory.userID)
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
group by reasoncancelled.reason

This is gonna be so freakin' easy, right? Let's add the users


select reasoncancelled.reason, count(UserStatusHistory.userID), users.firstname, users.lastname
from ReasonCancelled join UserStatusHistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
join users on (UserStatusHistory .userID = users.UserID)
group by reasoncancelled.reason, users.firstname, users.lastname

We can assume that if the ReasonCancelledID is filled out, they must have left. So, I'm All done! I can write some grouped output and bam, it's all good, right? Unfortunately no. Once you start grouping on the firstname and lastname, your count goes down to 1. And because of the aggregate function, you can't add the user information to the select clause without also adding it to the group clause.

So, what is the answer? I added a subquery in the select clause which will select the total based on the reason. This will make more sense when you see it.


select reasoncancelled.reason, users.firstname, users.lastname,

(select count(sush.userID) from userstatushistory sush
where sush.reasoncancelledID = reasoncancelled.reasoncancelledID and
    sush.StatusDate >
= #form.startdate# and
    sush.StatusDate <= #form.enddate#        
) as numUsersWhoLeftGroup
        

from reasoncancelled join userstatushistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
    join users on (userstatushistory.userID = users.userID)

group by reasoncancelled.reason, userstatushistory.statusdate, users.firstname, users.lastname

And bingo, it started working as expected. I'm getting the proper sub totals. The subquery is just a select of its own, except that it references the reasoncancelledID from the parent query.

Last night I was getting internal SQL errors and I'm not sure why. Today, I revisited this concept from scratch and it started working. Nothing like a good night's sleep to solve all your problems, right? If we wanted to get really adventurous, we could use a subquery to get the total number of users who left


select reasoncancelled.reason, users.firstname, users.lastname,

(select count(sush.userID) from userstatushistory sush
where sush.reasoncancelledID = reasoncancelled.reasoncancelledID and
    sush.StatusDate >
= #form.startdate# and
    sush.StatusDate <= #form.enddate#        
) as numUsersWhoLeftGroup,

(select count(tush.userID) from userstatushistory tush where tush.reasoncancelledID is not null ) as numUsersWhoLeftTotal

from reasoncancelled join userstatushistory on (userstatushistory.reasoncancelledID = reasoncancelled.reasoncancelledID )
    join users on (userstatushistory.userID = users.userID)

group by reasoncancelled.reason, userstatushistory.statusdate, users.firstname, users.lastname

Probably using the recordcount is more efficient / easier than writing a subquery to get the total number of users who cancelled.

As always, this is a simplified example of a real world situation, and the code may not work flawlessy (but the concept should be sound)

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
All Content Copyright 2005, 2006, 2007, 2008, 2009 Jeffry Houser. May not be reused without permission
BlogCFC was created by Raymond Camden. This blog is running version 5.9.2.002.