Bug with Query of Query: using LIKE incorrectly finds records with null values
Note: This blog post is from 2008. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.Someone reported on a mailing list that they'd noticed a mistake in the results of a Query or Query operation when using a LIKE. It finds a record whose value for the column is null, which is of course wrong (that doesn't match the LIKE pattern--indeed a null never equals or doesn't equal anything).
The problem can be worked around by adding "and column is not null", but it really surprised me (and him, obviously) to discover the bug. Maybe someone else has written of it before.
But I tested it in 8.01 and can confirm it's still an issue. While I've recommended that he file it as a bug, I wanted to share it here until it's resolved. Hope this may help someone. (If we're missing something, feel free to comment.)
Here's some code you can run which uses the example app database that comes with CF. See the comments for what's wrong:
select * from employees
</CFQUERY>
<!--- note that by default, all records in the example employees table have values in all columns --->
<cfdump var="#demoq#" label="before insert">
<!--- insert a record with only one column (causing the others to be null)
--->
<CFQUERY NAME="insert" DATASOURCE="cfdocexamples" >
insert into employees (firstname) values ('charlie')
</CFQUERY>
<CFQUERY NAME="test2" DATASOURCE="cfdocexamples">
select * from employees
</CFQUERY>
<cfdump var="#test2#" label="after insert">
<!--- do a Q of Q of that query, for a column which would have a null for that last added records, so you'd expect it not to find it --->
<cfquery dbtype="query" name="test3">
select * from test2
where department like 'Sales%'
</cfquery>
<!--- that record with the null value still shows up! --->
<cfdump var="#test3#" label="dump of Q of Q">
Again, the problem is that the record with the null value still shows up in the Q of Q result, and it definitely should not. A solution for now is that if you change the Where clause to "department is not null and department like 'Sales%'", then it properly does not show the null-valued record(s) anymore.
(If anyone plays with this on their own code, note as well that Q of Q is case-sensitive, so no records would be found at all if you mistakenly used 'sales%' rather than 'Sales%', above.)
For more content like this from Charlie Arehart:Need more help with problems?
- Signup to get his blog posts by email:
- Follow his blog RSS feed
- View the rest of his blog posts
- View his blog posts on the Adobe CF portal
- If you may prefer direct help, rather than digging around here/elsewhere or via comments, he can help via his online consulting services
- See that page for more on how he can help a) over the web, safely and securely, b) usually very quickly, c) teaching you along the way, and d) with satisfaction guaranteed
Just had to throw that in there.
Null = unknown
So technically, as far as CF is concerned, null could be correct and selects the record..
If you don't want the nulls, select where [column name] IS NOT null (the correct way to ask).
Null is a beautiful thing, and nothing to be afraid of using, as long as its understood and used correctly.
Null may mean unknown, but it's also true that it's never *equal* or *not equal* to anything. So you cannot say (nor will a real database determine) that "null could be correct". Sorry, but that's just bunk.
Did you run the code? Either way, if you changed the test3 query from being a Q of Q to just being a query of the real database, did you notice that the DBMS does NOT return the records with the null values? Nor should it. The NULL values are not LIKE the pattern offered--nor are they like them. So they should be ignored (as indeed they are by a query against the real DB, just not in the Q of Q.) As I noted, you can workaround things by adding "department is not null", but we shouldn't have to. The records shouldn't show up.
So this is very definitely a bug. I feel like I've made (and restated) my case pretty clearly here. If you still think I'm wrong, please show me how that's so.
As for whether null's a beautiful thing, well that's debatable. There are situations where people can get confused because of them. Say one has 100 records in a table and 10 have a value of "x", and 5 have a value of null. If one asks the DB to select those that are not "x", it won't find 90 but 85, because the records with null are not found (again, like I said earlier in this comment). I wonder if that's really the sort of thing you were thinking of.
Ben offered another perspective on problems with nulls (that others share). But I wasn't debating that in this post.
I still stand by my assertion that this is a bug. I'm willing to be persuaded, but please use proof cases, not mere accusations and opinion. If you or someone proves me wrong, I absolutely will retract my contention.
I did not mean to offend you. But I'm afraid that you are making the confusing statements. This is not an accusation, but a statement of fact.
You said, "Null may mean unknown, but it's also true that its never "equal" or "not equal" to anything".
I'm afraid its either one or the other, and my definition is correct. How do you say that something is "like unknown", that simply doesn't make since. You have to ask the correct way ("is" or "is not") to get the correct results.
The block of data that is at a space that the database knows as null, could indeed be the value that you are trying to match or filter.
The proof is in the definition.
Yes, whether or not null is beautiful is certainly an opinion, and one that I will hold forever. What null is and how it is handled is certainly a subject of much debate, as is evident in how the different db vendors handle null. SQL Server would see it your way, Oracle would see it mine. That's ok, its these type of discussions that make computer science fun and exciting.
I'll download Oracle and try it out. That would indeed be interesting and if it's so, I'll thank you for sharing it. As you say, then this discussion has helped clarify things for people coming from different backgrounds.
I have reread my original post, I first want to apologize for sounding so terse. My wife gets on me about it allot. Not to worry, she’ll have me straight soon ;). I've read a bunch of your books and material and truly respect you as a pro.
If it’s cool with you, I'd like to expound a little on why I believe NULL is a good thing and should be handled as I've been recommending. I had been thinking about NULL a lot after having been doing some MS SQL to Oracle conversions recently and receiving the same results as you. I’ve always been an Oracle developer so just always used IS NULL and IS NOT NULL, and my SQL Server friends were shocked when they found they had to start using IS NULL and IS NOT NULL in their queries if they wanted reliable results.
So here is Scott’s “Why to NULL”
Large DBMS systems are first and foremost concerned with speed. Everybody seems to want lots of data, and they all expect it almost instantaneously.
Writing a record to a database is an expensive task. There are usually constraints to check, values to add to indexes, foreign keys to check, ect.. This being the case, the system should avoid all unnecessary activity.
If when one creates a record, and does not give a value for one of the attributes, a database has a couple of options on how to handle it:
a.) It could format the space or use a default value. This way if per chance (all be, I'll admit the chance is very, very close to nil) the operating system or some other program had written a value in the space we would be using if we were writing a value, we know that it is not a value that is going to be used incorrectly by our database system. The biggest problem here is what if all 0's is a legitimate value in your system? How do you separate the real values from attributes that are simply formatted to all 0's? You can't and hence you restrict what your attribute’s possible target domain can be. Also the system has to write to disk something that is not really valid in our system, using unnecessary time and disk space.
b.) My personal favorite way of handling it being the lazy programmer I am, is it could do nothing. But my laziness has its advantages. I don't spend extra time and disk space writing values to the disk that are not really valid in my system. Also my attribute's target domain is in no way restricted. This kind of puts us back at square one though; we don't know what value is in the space we would be holding the value.
So what to do now? Let’s go back to our main goal, obtain as much speed as possible. This is done at several different levels, including a lovely little language that speaks to the database that can be quickly parsed. The language specification is the logical place to put in place an allocation for my lazy way of handling writing new records with missing values for attributes. A noun NULL, a single verb IS and a simple modifier NOT. Now, not only am I not writing extra values, and saving disk space, but I have a super fast way to find if the database has written anything for that attributes value because its written into the language.
So why would Oracle, any DBMS, or CF return these columns if not specifically asked not to? I truly cannot speak for them except to know they want to be as fast as possible and to speculate that they might feel that since they cannot guarantee that the value is not what you are looking for they return it. They might be trying to force you to write more proper code. Or they could be doing it because they think it’s funny we have to type 15 extra letters at the end of our SQL statements (of course since they would be typing them too, this is probably not the case ;)).
So I guess all this is to say, I believe using NULL is good because it saves time and space, two very valuable commodities. Using NULL widens my possible target domain. Having to say IS NULL or IS NOT NULL is not a work around because it’s part of the language that speaks to our database. Using unnecessary default values is in fact a work around, and doing so will cost one in query execution time and a bloated database size. The fact that some DBMS systems and some applications force us use IS NULL and IS NOT NULL by retuning NULL records when we don't ask for them is good too as it makes us better programmers by forcing us to write more proper code.
I’ll get off my Soap box now. Thanks for letting me rant. Let me know what you think and how your tests go.
Kind Regards,
Scott
In fact, I wish I'd gotten to write here before you had, because I have a pretty significant point to make which again I think will cause you to reconsider your statements.
I installed Oracle, and I tested the code I discussed in previous comments, running the LIKE query directly against Oracle (not a Q of Q), and it DID NOT return the null values. So again I have to ask, why are you going on like this?
I still stand by my assertion that Q of Q is wrong to present those records with null values when a LIKE clause it used. It shouldn't. It doesn't in SQL Server, Oracle, or other databases I've tried.
Thanks for your kind regards in your last note. I'm really not trying to get into a p*ssing contest. I hope you'll consider what I have to say. If you might like to discuss things by email rather than here on the blog, I can be reached at charlie (at) carehart.org. Perhaps then you or I can post whatever conclusion that may lead to.
I'm afraid we will simply have to agree to disagree, again that's ok. I know that using IS NULL or IS NOT NULL is not a bug or a work around, and is required for reliable results. I also know that I will never run into your "bug" because I write proper ANSI SQL. It’s part of the ANSI SQL Specification for a reason, which was the main point that I was trying to get across.
Thank you for your comments and listening to me rant.
I digress...
Look, answer the simple questions: are you or are you not contending that a LIKE should find records with null values in the column being searched? Your first comment and those since have asserted that you think it should. It does not. You then said, "well, it does in Oracle", but I showed that still, it does not. This is why I'm confused about your continuing to defend your position.
Yes, the original post is about CF's query of queries feature, and I was pointing out a bug in that. The "workaround", of using IS NOT NULL, was mentioned just to make the null records go away (as they should not show up, and would not in a real database).
This has nothing to do with "writing proper ansi sql", and it sounds like maybe you don't use query of queries. You've also not referred to any testing you've done with LIKE even against a real database (as opposed to Q of Q). I don't want to have any further discussion here about issues with nulls outside of Q of Q. Again, that's not the point. Unless you have something to clarify this confusion, yes, let's leave at at we agree to disagree. :-)
I guess because to me it's like saying that since most people understand what I mean when I just use a sentence fragment, I shouldn't have to write out the full sentence.
SQL is a language, I'm simply advocating proper SQL grammar. If you leave off the IS NULL or IS NOT NULL from a sql statement on a table that has NULLs in it and you are trying to select or filter the columns with NULLs, leaving off the IS NULL or IS NOT NULL is simply an incomplete statement, not a bug. You get the NULL results in your Q of Q because you are not using a complete sentence. You really need an example for that? I believe you gave one in your original post.
I promise that if you stick w/ programming Oracle, especially older versions (I've not run into a production implementation of 11g yet), you will run into sporadic results if you don't use a complete SQL sentence also. And Charlie come on, I think even you would agree that running one query on a DBMS is hardly a true test ;-).
Thanks again.
I suppose I can see where you might assert that, to make it clear then for people and the DBMS what you mean (or what you do not mean), rather than leaving things open to determination solely by the DBMS.
Fair enough, but then if one DOES understand what the DBMS does with NULLs and then DOES expect always for it to work as they understand (and it will), then one could argue that adding these all the time would be simple verbosity and redundancy.
I think it's there that perhaps I see now how you feel this is a matter of debate, regardless of whether one uses Q of Q or not. OK, uncle. :-) Point taken, and left for future readers.
(But I'll still assert that it's a bug that Q of Q finds records with nulls in a column searched with a LIKE, since no other DBMS I've tested, or heard of, does that. And yes, I think a single test is enough. It would shock me to learn that there would be times when it would or would not, but I'll leave that for others to explore.)
Cheers.
I certainly would not expect the NULL to match the value (or, as in the case of the issue in query of query, the pattern) when using WHERE myfield = "myvalue" but I would have expected it to NOT match the value or pattern when using WHERE myfield NOT "myvalue" such that the rows with the NULL value would have appeared in the results (so in other words, the part where it doesn't match something is intuitive to me, but the part where it doesn't not match something is not intuitive to me).
I thought I'd play with it a little bit in Access (people knock Access, but in my opinion it's a great DB for playing around in) and found that the results are all as you described, Charlie, but there's one interesting (and to me, frustrating) case that I noticed: the NULL values do appear in the results if I use WHERE myfield NOT LIKE "*" (in Access, you use "*" where in many other DBs you use "%").
I guess the thought is that effectively what I'm asking is for everything that's not like anything and NULL fits that bill. Still, it seems inconsistent.
To make matters even more confusing, if I use WHERE myfield NOT LIKE "**" then the NULL values do not appear in the results! For sake of clarity, let me note that the records that appear in the results of queries using WHERE myfield LIKE "*" and WHERE myfield LIKE "**" are the same (that is, it doesn't matter if I use "*" or "**"). The machine with Access on it is an old one so the version of Access is 2002; perhaps the results will be different in newer versions of Access.
But if so, then all the more reason that Scott's advice to "complete the sentence" and specify exactly how to handle NULL values is wise counsel. I know this deal with using no real pattern and only the wildcard is very much an edge case, but it's still enough to convince me of the wisdom of being specific of how NULL values should be handled.
It would be nice to be able to rely on NULL values never either matching nor not matching anything, but here's a place where we see that isn't true and Scott is referencing, albeit less specifically, others and so it seems pretty clear that, even if broadly true, it's not universally true that NULL values will neither match nor not match.
In case anyone's screen resolution may make it hard to see, in his 2nd paragraph,the difference between his 2 sample LIKE clauses (that produced different results) were that the first had one asterisk, the 2nd had two. At first, I couldn't see that and thought it was a typo and they were the same. Thought I'd save anyone else the concern. :-)
I'm sure Scott will appreciate your vindication of his assertion, and I thank you (both) for adding to the knowledge on this topic.