Today I spent a good chunk of the afternoon writing an SQL query, crashing MySQL many times by changing the criteria in the where clause. First, a bit of background. Currently there is a database table with various demographic data, and privacy flags for how different pieces should appear in directory. There are multiple applications that use this data, and currently they apply the flags independently in the code before output. I need to add another replication of this data, and as I need to verify the proper use of the privacy flags in relation to types of users anyways, and now the database supports views, it seemed a natural time to rework this and make it centralized so we only had one code base to support.
The goal of the view is to find directory data on an individual regardless of account existence, and to find account base name and email address, even if directory data is suppressed. Background on MySQL views is there are two algorithms - merge and temptable. Merge substitutes view columns and tables into your query and then runs it, while temptable runs it as written outputting into a temp table, and then uses that in your query. Unless you have are doing certain types of processing on the data or have only small amounts of data, merge is much faster and more efficient (MySQL can optimize it).
There are two primary tables with data, the first being accounts and the second being demographics. There is also a database view (using the merge algorithm) which pulls a person's primary email address based on the account table and some other tables. One of the privacy flags is to suppress all directory information. Unfortunately as there's no support for outer join (and union pushes the view algorithm from merge to temptable), the way to get the account data is either to add an if on every field, or do a left self-join on the demographic data (first being basis to find no account, second being conditional for directory data). The self-join works awesome, until you add a criteria to search for say someone with a username x and ZIP y. If you do that the database crashes.
After much poking to determine which combinations of fields in the where clause cause this, and a stack trace indicating it's crashing in a join optimize function, I at least have an idea what's happening. It appears when two or more tables are involved in a query at least twice each (remember the merge view pulling the account table in a second time?), and the optimizer function determines that the where criteria referencing them are both used to narrow the output before the other defined join criteria, it can't handle determining how to connect them and segfaults.
The "quick" fix would be to convert the thing to temptable format, as then it runs the entire thing, and can't optimize it. That makes queries take a lot longer, and one shouldn't have to force the database engine not to optimize to prevent it from crashing. There are a couple ways I can fix this by making the query longer, but I want to track down the bug so it can get fixed first. And my boss told me I'm not allowed to think about work this weekend and it's past time to go home. So I get to document my thoughts on it so I remember Monday, meaning you get that fun story...
It has its shortcomings, but it's come a long way in the last few years (transactions, triggers, views). It's biggest advantages are it's fast out of the box (for basic selects, it blows away pretty much everything else even when they're tweaked), has a big enough feature set for most users, and it seems nearly everything these days has a client for it (both stand alone client and open-source app support). Oh yea, and the free part...
The biggest drawback was it wasn't forgiving for crap queries, as the optimizer sucked. It used to be if you did a straight join on two large tables and tell it how they line up in the where, and it'll generate the cartisian product and then filter. I just tried it with a more recent version, and it now handles it well though. Obviously it has issues on some more complex stuff (thus this post), but I guess none of my old big complaints seem to be issues anymore.
That is good. Yeah, the whole free bit is rather appealing... plus enough people use it Google is useful.
I have heard from more than one person that joins tend to be slow. I am guessing that sort of depends on what, exactly, the queries are.
I still remember killing the server during one of my finals... if I remember correctly it was the Web Programming final. One of my project-mates built a rather awful query that put a load well beyond what was reasonable. I cannot remember if we actually crashed it or just slowed it down enough to have the rest of the Happy Lab mad at us.
If it's an old version I could see slow joins, otherwise probably check indexes (put "explain" before the query, and fix anything that doesn't get an index or ref).
As for killing the server, I wouldn't be surprised if it didn't work so they "canceled" it, which doesn't necessarily stop the server from finishing the query. Reloading a page because it doesn't come up because of bad SQL can quickly cause problems.
Hehe, yeah.
I remember spamming one of my roommates projects by holding down Command-R in Safari. He was adding new data using PHP + Get data in the URL and it was fun to watch the entries pile in. Not terribly destructive, but it sure annoyed him. :)
Copyright ©2000-2008 Jeremy Mooney (jeremy-at-qux-dot-net)
Bah, I am still a true nerd and like reading about things like this. :)
Though I first read "temptable" as "tempt" + "-able" instead of "temp" + "table". I was a little confused...
What are your opinions on mySQL in general? Are there any good alternatives? I have heard random people online complaining about it but I just figured they were being obsessive.