Blog all the things!

The Fallacy of "Realistic" Testing

I've heard it a lot over the years: "That's not a realistic test!"  You know something?  You're right, it's not a realistic test.  It's a lab test.  It's a carefully-constructed scenario, designed to examine the performance characteristics of one specific aspect of the product you asked me to test.  You can't simulate the real world in the lab.  Furthermore, you wouldn't want to.  There are far too many uncontrolled and uncontrollable variables present in the real world, for one thing.  For another, simulating real-world conditions in the lab in my particular area of expertise would require Google-datacenter-levels of computing power and host count.

"But can't you do X?"  No, no I can't.  I fully understand that you want to be able to tell people that our tests faithfully represent the performance you'll see in the real world.  I do; I get that.  It's a powerful statement.  But you hired me for my expertise and knowledge.  Please face facts when I tell you that the tests I can perform in the lab are not representative of real-world behavior, nor can they be.  If they were, you'd be faced with a result nobody could explain, and you'd wind up asking me to do what I already do: systematically vary parameters in a controlled setting to tease apart the variables under our direct control that affect performance.  This is at least approaching the general area of the scientific method.  Throwing messy input at something just to see what happens and then demanding explanations thereof is about as close to sound science as elk dung is to mango pudding, and about as appetizing.

Consider for a moment my arena: DNS performance testing.  We have tools that will play back, like a recording, DNS queries in the order and at the speed they happened.  Interesting?  Perhaps.  But nobody in the real world has query rates that approach the levels of performance I test at.  When I'm testing a product that can handle several hundred thousand queries a second, throwing 5,000 queries per second at it just isn't an interesting test for me, nor does it fall within my purview.  That's a functional QA test, and I'm not a functional QA tester.

Okay, so let's run tests that push the limits of the product's performance.  Now we're talking!  Wait, what's that?  You say the synthetic datasets I use aren't "real-world" enough?  True, the label count, label length, QTYPE distribution, and so on might not be representative of the "real world", but that's easy to fix: just tell me how you'd like those parameters tweaked, and I'll gin up a synthetic dataset tailored to those values.
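To make "gin up a synthetic dataset" concrete, here's a rough sketch of the idea in Python.  It is not the tooling I actually use; the function name, the example.com zone, the parameter names, and the QTYPE weights are all made up for illustration.  The point is only that every property of the dataset is a knob you can state, set, and repeat.

```python
import random
import string


def make_queries(count, label_count, label_length, qtype_weights,
                 zone="example.com", seed=0):
    """Return a list of (qname, qtype) tuples built from explicit parameters.

    All names and defaults here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)  # fixed seed: the same inputs give the same dataset
    qtypes = list(qtype_weights)
    weights = [qtype_weights[q] for q in qtypes]
    queries = []
    for _ in range(count):
        # Build label_count random labels, each label_length characters long.
        labels = [
            "".join(rng.choice(string.ascii_lowercase) for _ in range(label_length))
            for _ in range(label_count)
        ]
        qname = ".".join(labels + [zone]) + "."
        # Pick a QTYPE according to the requested relative weights.
        qtype = rng.choices(qtypes, weights=weights, k=1)[0]
        queries.append((qname, qtype))
    return queries


# Example: 1,000 queries, two 8-character labels under the zone,
# with a hypothetical 70/20/10 split of A, AAAA, and MX.
queries = make_queries(1000, label_count=2, label_length=8,
                       qtype_weights={"A": 70, "AAAA": 20, "MX": 10})
for qname, qtype in queries[:3]:
    print(qname, qtype)
```

Want longer labels, or a different QTYPE mix?  Change one argument, rerun, and compare.  That's the whole appeal: one variable moves at a time, and I know which one.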

"But it's not real world!  We have to use real data!"  Okay, now you're beginning to miss the point.  Either there are parameters that describe your precious "real world" data that you neglected to describe to me, or you've just latched onto a fallacy for dear life and refuse to let go.  Queries are ultimately just strings and an associated code representing the QTYPE.  There's nothing magical about queries obtained from a customer vs. synthetic data, unless those "real world" queries contain properties that might tickle functional bugs in the product.  Again, not my bailiwick; that's a functional QA task.  I'm 100% behind anyone who supports using real-world data for acceptance testing.  But the only thing you're doing by insisting I use what amounts to a dirty dataset as the one and only source of queries for performance testing is hamstringing my ability to do my job well.  "Why did you get that result?"  "Beats me.  Here's a page full of guesses; I can't tell you anything definitive because you insisted that, rather than doing controlled experiments, I should just throw 'real world' data at the product to make you feel better.  If you'd let me do it my way, I could have answered your questions."

Then there's the issue of test tools:  I realize there are myriad tools on the market that simulate thousands (and sometimes millions) of unique query sources for things like web server performance testing.  Please understand that this paradigm does not extend to performance testing of nameservers.  DNS performance testing tools come in two varieties:  those that create a sustained, protracted, constant load, and those that throw as many queries as possible at the server all at once, and wait to see how many replies they get back.  Neither of these is a "real world" test.  Not even remotely.  No, I can't finagle and finesse these tools into representing hundreds of thousands or millions of independent computers, each sending a low-rate but bursty series of queries to a server.  Why?  Because I would need hundreds of thousands to millions of computers to do so.  "But you can just fake the source address!"  Sure I can.  And I can just sniff the wire for the results.  But these tools have a limited pipeline for queries in flight.  And they behave in a particular way while waiting for responses to queries when the pipeline's full.  To be specific, they wait.  So I'd need a unique pipeline for each source address.  Suddenly, we've moved away from the single-interface, or even single-computer solution to this problem, and I'm back to needing a whole slew of machines.  Sure, I no longer need millions, but I sure as hell need a few thousand, at the barest minimum.  "Oh, just use VMs!"  Okay, sure.  I could do that.  And in so doing, introduce a gargantuan layer of abstraction and simulation that's not even remotely real-world, that may well interfere with both the test tool's behavior and the results I obtain, in part because I have to squeeze all that traffic out of the VM host and over to the server.  "So just host the nameserver in a VM too!  Why are you making this so difficult?"  
Well, for one, the abstraction layer and simulation are still there, so I can't trust the results, nor can I easily generalize them to anything, e.g., real-world equipment.  More importantly, VMs may work -- modulo all the problems I've raised here -- to simulate many thousands of individual clients each generating traffic at a low rate.  But VMs fail miserably when trying to host a high-performance nameserver receiving many hundreds of thousands of queries per second.  The results will be much lower than you'd like, and cannot be generalized beyond the VM environment.
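If the "limited pipeline" point above seems abstract, here's a toy model of it in Python.  It doesn't describe any particular tool; the function name, the window size, and the latency figure are all assumptions I picked for the example.  It models one sender that keeps at most a fixed number of queries in flight and simply waits when that window is full.

```python
import heapq


def simulate_sender(window, latency_ms, duration_ms):
    """Count queries sent by one pipeline limited to `window` queries in flight.

    A toy model with hypothetical parameters: sending is instantaneous and
    every response arrives exactly `latency_ms` after its query.
    """
    if window < 1 or latency_ms < 1:
        raise ValueError("window and latency_ms must be at least 1")
    in_flight = []  # min-heap of times (ms) at which outstanding replies arrive
    now = 0
    sent = 0
    while now < duration_ms:
        if len(in_flight) < window:
            # Room in the pipeline: send another query right away.
            heapq.heappush(in_flight, now + latency_ms)
            sent += 1
        else:
            # Pipeline full: the tool waits for the earliest reply.
            now = heapq.heappop(in_flight)
    return sent


# 20 queries in flight, 10 ms per reply, one second of testing.
print(simulate_sender(window=20, latency_ms=10, duration_ms=1000))  # 2000
```

In this model the send rate tops out at the window size divided by the response time, no matter how many source addresses get stamped on the packets.  Faking addresses changes what's written in the header; it doesn't give each "client" its own pipeline.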

In short, I'm not trying to be difficult.  I'm trying to give you useful information that will help you understand the performance characteristics of your product and some numbers that, when framed the right way by marketing and sales, will convey a general idea of how the product might perform in a particular customer's environment (assuming the customer is aware of the various parameters in their environment that would impact performance: things like cache hit ratio, average and peak query rates, and so forth).  There is no magic set of parameters that will accurately represent the "real world" in general, or a specific customer's environment in particular.  There is no mystical set of tests I can run that accurately represents our products' behavior in the field, regardless of configuration or deployment.  Believe me, if there were I'd be the first in line to run them for you.  If I could concoct such a test, I'd do it gleefully, just to stop hearing about how current testing isn't "real world" enough.

My job is to characterize the performance of the product as a whole, and of various product features.  My job is to explain how that performance varies, and with the help of the developers, why.  On those bases, decisions may be made regarding product and/or feature development.  On those bases, performance issues may be uncovered that need to be addressed.

My job is not to take a spot sampling of queries from a customer, run it against a product, and then let sales and marketing go sell the customer a bill of goods, only to hear complaints weeks or months later about not seeing the same performance in the field as we saw in the lab.  If you want to "test" the real world, then examine the real world.  It's the best model of it we're ever going to have.  If, however, you want to EXPLAIN the real world, please let me rely instead on a few thousand years' worth of accumulated wisdom on the matter of how to successfully approach such explanations. 

Made the switch from the iPhone to Android

Yesterday, after several months of research and deliberation, I finally decided to take the plunge and switch from my iPhone 3GS to the Samsung Infuse Android phone running Froyo (2.2). 

So far, I really like it.  There are only three things I miss from the iPhone and its OS: the hardware switch to set the phone to "silent" mode, the ability to swipe to delete something from a list, and the Visual Voice Mail feature.  And all but the voice mail just take an added step to accomplish.  I'm fairly certain that if I look around, I'll find a few apps that do something similar to VVM.

The skin over Froyo on the Infuse is minimal, which alleviates one of my complaints about Android phones in general.  And, if I want, I can always root it and install a custom ROM with either Froyo or Gingerbread.

Oh, another nice thing?  Flash 10.1.  Yeah, that's right:  I can finally use Flash on my phone!  And with a 4.5" AMOLED screen, it's a really nice thing to do.

Android has matured to the point where the app selection is on par with that for the iPhone, and most everything I used on the iPhone is available on the Android as well.  With a beefier processor, expandable storage, user-replaceable batteries, and so on, I find I'm not regretting the change one bit.

Relaunching the blog

It’s been years since I tried to maintain this website as a blog. Back then, I was trying to manage it as some form of technology blog. From now on, it’s just going to be used as a repository of things interesting to me. That means it’ll get updated on my own timeframe, rather than putting pressure on me to maintain a regular update schedule.
