Don't do it in the Database!!!

"Common Sense" dictates that you should do as much logic in the database as possible. 'Put it all in a query" the experts say. "Try to mimize the number of queries." would often be said. The web server is under more load than the database server, so off-load logic to the DB.

This thread of advice probably stems from a lot of code that is something like this:


<cfquery name="MyQuery">
Get User Data
</cfquery>

<cfoutput query="MyQuery">
<cfquery name="MyQuery2">
Get Other Data
where UserID=#Myquery.UserID#
</cfquery>
</cfoutput>

I know we've all written stuff like this (in our earlier days). go ahead, raise your hands and admit. (I thought so).

In cases like this a join would have gotten the information in one database query. Less queries means less round trips to the database, and everything is more effecient. So, we go on our merry way, with the thought that we should try to minimize database queries for application efficiency.

Unfortunately, the "Do it all in a single query" mind-set can be pushed too far. Suppose you designed a database for a CD cataloging system (my favorite example). You might have structured the database something like this:


Genre (GenreID, Genre)
Band (BandID, BandName)
Albums (AlbumID, Albumname)
Songs (SongID, SongName)

And of course, you need intersection tables:


Band_Genre (BandID, GenreID)
Band_Albums (BandID, AlbumID)
Band_Songs (BandID, SongID)

Sorry, I didn't create an database diagram image for ya'll. Now, in this structure, Genre and Songs and Albums have no relation to each other. They are all just "branches" hanging off the "trunk" of a Band.

Now, you may think to yourself, I'm gonna load up all this information because that is the most effecient way to do it.


select * from band
join Band_Genre on (Bands.bandID = Band_Genre.BandID)
join Genre on (Genre.GenreID = Band_Genre.BandID)
join band_songs on (bands.bandID = band_songs.bandID)
join songs on (songs.bandID = band_songs.songID)

That query will load up the band info, the genre info, and all the songs, right? Nope, wrong. What if the band has no genre defined? You won't get any information returned, even if song information exists. That is the nature of the join.

"But, hey Jeff." you might say. "We can fix that with an outer join, right?" Yes, of course you can.


select * from band
left outer join Band_Genre on (Bands.bandID = Band_Genre.BandID)
left outer join Genre on (Genre.GenreID = Band_Genre.BandID)
left outer join band_songs on (bands.bandID = band_songs.bandID)
left outer join songs on (songs.bandID = band_songs.songID)

Okay, so our updated query will retrieve all the proper information. If our band has no genres, we'll still get the all of the songs returned. That's good right? Empty genre columns will be returned, even if there are only 10 songs. It's not data, and maybe is not a big deal.

But, with the outer joins, what happens if all data is there? Suppose there are 2 genres and 10 songs. For each genre, you'll get all 10 songs, meaning you're returning the song data twice instead of once. Instead of the "12" rows that you need, you're getting 20 (That's a 60% increase if I know anything about Math, which I may not).

That may not be a big deal when dealing with the 4 tables in this example. But, when you're drawing data out of 54 different tables all in one query, the problem quickly balloons out of control. The code is not effecient or maintainable. You got the worst of all words.

So, before overdosing on outer joins, think twice. Maybe two queries is quicker.

Tell me I'm wrong.

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
Mimi's Gravatar HI, I am a new database programming student though I have been a strong user in my other life. My teacher asked us to create 3 tables for a quiz one of which was an intersection table of the other two. He asked us to populate all the tables. I was under the impression that one does not "Populate" an intersection table, as I understand the reason to create the table is to create the relationship between two tables. So my question is, was I wrong to turn in an unpopulated intersection table? I did "pop" the other two and then ran the required queries but i am now not sure that my understanding of relationships is correct. anybody got a good explanation why he would want me to put data in or am I right for not doing it in the first place?
THanks a bunch Mimi
# Posted By Mimi | 6/25/08 1:08 PM
Jeffry Houser's Gravatar Mimi,

Intersection tables usually contain ids from one table to the other and are used to demonstrate a many to many relationship. It sounds like your understanding of them is correct.

The data that they contain is usually the primary keys from the two tables they intersect. If you didn't put any data in the intersection table, you would have no relationship defined between the two tables. To me that sounds like an error.
# Posted By Jeffry Houser | 6/26/08 9:14 AM
David's Gravatar Jeffry, I'm confused. Surely you'll get the same result here using all inner joins? I.e. if there are 2 genres and 10 songs you'll still get 20 rows. I think your argument here is about single queries versus multiple queries, rather than inner versus outer joins.

BTW I would write the query as

SELECT *
FROM
band
LEFT OUTER JOIN (band_genre JOIN genre ON band_genre.genreID = genre.genreID)
ON band.bandID = band_genre.bandID
LEFT OUTER JOIN (band_songs JOIN songs ON band_songs.songID = songs.songID)
ON band.bandID = band_songs.bandID

That's from my academic background in formal logic (I'm relatively new to using SQL in anger) - I assume the query will be a little more efficient but haven't confirmed that yet.

I write a lot of queries like this in my job, but agree with your point for cases like genre/songs where duplication will occur that the multiple query path is more sensible.
# Posted By David | 9/29/09 10:40 PM
Jeffry Houser's Gravatar David,

Yes the argument was intended to be about multiple queries vs single queries.

I you use an inner join you will not necessarily get 20 rows. It depends on the relationship between songs and genre. If a genre has no songs; then it will not be returned in an inner join. If a song has no genre, then it will not be returned in an inner join.

An outer join would return all info.

The left outer joins you specify would work, but will potentially return a lot of duplicate data or empty fields.

Your query is an interesting mix of inner and outer joins, but I believe the end result is that you're doing an inner join with real complicated syntax.

In some situations the data trickled through multiple tables / most of the database. This was ballooning up the query significantly and unneedlessly, which was causing a huge processing slow down as I parsed the data and turned it into Value Objects.
# Posted By Jeffry Houser | 9/29/09 10:58 PM
All Content Copyright 2005, 2006, 2007, 2008, 2009 Jeffry Houser. May not be reused without permission
BlogCFC was created by Raymond Camden. This blog is running version 5.9.2.002.