« April 2007 | Main | June 2007 »

May 2007 Archives

May 2, 2007

First 10 digit prime in consecutive digits of e

A few years back, google did a recruitment drive whereby they put up a bill-board which said:

{First 10 digit prime in consecutive digits of e}.com

The idea being that you work out the first 10 prime digits in consecutive digits of e (as in euler's number), tack on '.com' and surf on over in your web-browser. What followed was another challenge, which appealed to the number theorists in the audience, and once that was solved, you finally got to an exam, whose successful completion would qualify you for an interview with google.

I was pretty happy with my job at the time, so I wasn't imediately looking for an 'in' to google. I thought it was a fun challenge though, so I did spend some time working on some of the questions. I just stumbled on some of the code I wrote to do this, so I thought it might be fun to run a series of articles here that work through the problems, providing solutions where I have them.

So this one is about the first part of the challenge: finding those 10 consecutive digits that are also prime. Like everyone else who worked on this challenge, I searched and downloaded n digits of e from somewhere. I then found a list of primes, and tested each one as a factor for every 10-digit substring of my n digits of e.

However, working out which ones were actually prime, rather than not trivially composite was sounding way-ay-ay too much like hard work. So I employed a slightly different approach: I used the DNS server as an IsPrime function. I reasoned that for this website to work, it would need to have a URL, so having reduced my set of 10-digit substrings of e to a set of likely candidates, I fired of a small zillion DNS queries to find which ones actually turned into resolvable addresses. The script unfortunately no longer works, since google have taken the site down (which is a shame), but the salient parts follow. If nothing else, it might serve as a good example of how to use Python's DNS module.


#!/usr/bin/python

# Solve google's "{first 10-digit prime in digits of e}.com" problem.

import DNS

DNS.DiscoverNameServers()

e = '''2718281828459''' # ...

distincts = {}

small_primes = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73
# ...
]

def obviousComposite(n) :
for f in small_primes :
if n % f == 0 :
return True
return False

if __name__ == '__main__' :
for i in range(len(e)) :
current = int(e[i:i + 10])
if obviousComposite(current) :
continue

# check if we've hit this before (which we won't) and check that the
# string representation is actually 10 digits long (numbers which start
# with 0s won't end up being 10 digits long)
if not distincts.has_key(current) and len(str(current)) == 10:
distincts[current] = i

sfn = lambda x, y: cmp(distincts[x], distincts[y])
nums = distincts.keys()
nums.sort(sfn)

# now, we're looking for the first of these numbers that is prime.
# Fortunately, google's DNS servers provide us with a convenient primality
# checker -- assuming the website they're advertising /exists/, there would
# need to be DNS records for it. Thus, we search through the list for the
# first number where there are zone records.com.
for n in nums :
host = '%d.com' % n
try :
host_info = DNS.Base.DnsRequest(host, qtype='a').req().answers[0]
print host
break
except :
pass

Of course, solving this in SQL should be a snap after my last post :)

The answer you end up getting is 7427466391.com, by the way. As I said before, the site is down. Here's what it said, when it was still online:

Congratulations. You've made it to level 2. Go to www.Linux.org and enter Bobsyouruncle as the login and the answer to this equation as the password.


f(1)= 7182818284
f(2)= 8182845904
f(3)= 8747135266
f(4)= 7427466391
f(5)= __________

If you're impatient, or less behind the times than I, you can go and find one of the zillions of others who have posted the solution. It's fairly simple though, so try it yourself if you don't know the answer (it's fun!). I'll post something about it soon.

May 9, 2007

On Bug-tracking

I wanted to share some thoughts of mine on using a bug-tracker effectively. I started using bug-trackers about a decade ago under duress, as a way of settling an argument with a company my employer was working with. At the time, it meant that there was a more reliable TODO list than before, but we were still left with the fundamental problem which was that the TODO list wasn't being effectively managed. Over the years I've changed the way I've used bug-trackers, as my needs from them have evolved. No doubt my requirements will continue to evolve, and my usage paterns will evolve with them, so this is just where I stand now.

Ok, so, the basics. In short, every bug must be:


  • reproducible

  • actionable

  • testable

It's just absolutely gotta be reproducible. If there's not enough information in the bug to reproduce the problem, you should treat it as a non-bug, so flick it back to the reporter. I concede that some issues are hard to reproduce and you might not get it right the first time. My advice: still close the bug as invalid, but if it gets re-opened, start a dialog with your long-suffering bug-reporter to try to extract the missing information. If they don't re-open it, well, if it's not important enough for them, it's not important enough for you.

Every bug has to be actionable. A bug that doesn't have a clear path to resolution is useless. Some examples: a crasher bug is actionable -- the action is to find out why your program is crashing under the described conditions and make it cease doing so. It sounds obvious, but there are plenty of bugs opened which don't really say to do anything at all. The most common ones are the ones that just lie outside the scope of the project; it might well be true that the ZingMaster 2000 really needs another 5000 units of zing, but at the end of the day that's another product with all the normal business considerations that go with that territory. It doesn't belong in the bug-tracker.

Finally, it's got to be testable. If there's no way to test that it's fixed, it's probably not really reproducible, but worse yet: there's no way of concluding when you've actually closed the bug.

Each bug has a discrete lifetime: it is opened, analysed, fixed, tested and finally resolved. Any bug that doesn't follow this process is really just wasting your programmers' time, and programmers are expensive. Which leads me onto the next part, which is how you can extract the maximum value from your bug-tracker, inside the framework for "legitimate" bug-tracking outlined above.

Logging

You need to make sure that you and your developers make extensive use of the "comment" feature in your bug-tracker. Put in theories, conjecture, the tests you do, anything that's relevant. Keep it up to date, and not just at the conclusion of the bug, do it as you go along. This serves a few purposes: first, the process of writing stuff down has value in and of itself, sometimes just writing things down is enough to find new ways of thinking about a problem. Similarly, anyone watching the messages from the bug-tracker might read them and offer a new insight. But the most important reason to do this is that people and bugs get re-assigned all the time. Without an accurate log of work that has been done thus far, they'll need to start from scratch, and that's wasted time.

Triage

It boils down to ruthlessness. Close bugs. It'll feel a bit strange to begin with, but after a while it'll start to feel kinda good. The bug-tracker is a work-queue, and if there's something in there that's not super-important, you should ditch it. Rely on your QA processes here, there's a reason they get paid. If there's really an issue there, it's their job to find it, so don't feel like it's all up to you.

I started this year with the goal of having no more than 10 unclassified bugs at any point in time. That was motivated by inheriting a bug database with about 200 itinerant bugs. These days I'll tolerate up to about 50 unassigned bugs, but try to stay on the low end of that. Honestly, there's just other stuff you need to do. Close bugs if you're in doubt, push them to a future version if they're going to interrupt your release schedule. The stuff that needs to be in your bug-tracker for the current version is the stuff that's actually going to stop you getting stuff out the door. Nothing more and nothing less.

Scheduling

The next thing to note is that your bug-tracker is a management tool as well as a development tool. Get everyone in your team on board with the idea that every bug that gets opened needs to have a time estimate logged against it. Then (and this is a little harder), make sure that everyone logs the time they spent fixing each bug. You shouldn't use this data to point fingers; that's not really productive. You should use it to improve your estimates on subsequent projects and that's really important for heading in the direction of the fabled "on-time software delivery". Programmers are generally pretty conscientious, so if they understand that it's not just writing quality code, but writing it to a timeline that is a deliverable, they'll generally want to use the data themselves to improve their own estimates.

There's a bit of a trap in doing this though: when it's clear that time is of the essence, the corner that tends to get cut first is testing, and that's a killer. Untested bugfixes are anathema to on-time delivery of software, so make sure your team understands that the estimates they put on the bug need to include time for testing as well as time for development.

SCM Integration

So this is something I started doing fairly recently. We started needing to manage multiple versions of the codebase simoultaneously. That meant integrating bugfixes between branches, and that meant we started losing track of what issues were fixed on what branches. The solution I came up with was to ask everyone to put a specially formatted string in the commit log that noted the bug number that was resolved in that commit. The version control system will keep track of the code that gets integrated between branches, and carries the commit messages with it. That means getting a reliable list of bugs fixed in any given version is as simple as enumerating the changesets integrated into that branch, searching for the magic string that identifies the bug numbers and looking up the details in your bug-tracker. We use Jira for bug-tracking and Perforce for version control. With those tools at my disposal, I was able to construct a Python script to generate accurate changelogs for any release branch of the code in about an hour.

This has a multitude of benefits, but mostly it means that everyone is on the same page with regard to what is fixed in what version. That helps QA, it helps your beta-testers, and it gives you some confidence about what's fixed where.

Conclusion

Your bug-tracker is the primary conduit for communication between developers and the next layer of management. If you use it properly, it can be a communication tool for other functions in your company: it can give a reliable reading on the status of a list of requested features to your products guys, and it can feed your QA guys with reliable data about the delta between revisions that they test. But most importantly, keep it lean and mean, otherwise you'll never keep your project on track.

May 30, 2007

Defect-free Code

The other day I found a webpage that I thought was worth sharing, which presents a set of practices you can follow to deliver defect-free software. I'm going to leave aside in-depth discussions of exactly what "defect-free" means right now, but as you may have guessed, I started researching the topic in the interests of improving the code-quality of my own team's product. I've also been thinking about the topic independently (indeed, that line of research is what led me to the page), so I'm writing this down to hopefully impart some new ideas, gather some new ones from you, and ultimately improve what everyone's doing. Basically what we're interested in is writing code that meets the standards of:

  • Our penn-testers
  • Ourselves
  • Our QA department

More or less in that order. As an aside, our penn-testers have far and away the hardest standards to meet -- they do the things we don't think about, and they do the things we hope no-one thinks about. We've learned a couple of valuable lessons from them, but I'll get back to that later.

I found the page particularly curious because all the literature I've read thus far (and my own experience) has said, more or less, "software is really, really, really hard, so you'll get it wrong. Here's some strategies for mitigating the damage you'll cause"; the idea that you could skip that whole "make bugs, find bugs, fix bugs" step in the software development process is pretty appealing, nay, seductive. The "find bugs, fix bugs" part is the absolute bane of my existence as a program manager, because it makes every schedule I write meaningless and that's not good for anyone. I'm persuaded enough by it that I've put some of those ideas in place as policies inside my team at work to see how they go.

There are a few salient points that resonated somewhat with me, and I'd like to chat about them in some more depth.

Point 4: Design your code for Simplicity and Reliability.

That almost needs to be two bullet points. It will either sound obvious, or it will sound needlessly beurocratic, but designing your code at all makes a huge difference to the end product. For most problems you should use the format that is most convenient for you: I like to scribble boxes in my notebook, other people I've worked with go in for massive great specs on webpages that are very... long. Whatever suits you, just make sure you take the necessary time. I've worked with people who do this, and people who have enough natural genius to get away with working it out as they go along, but both types spend *less time* getting it right when they think about it a bit first.

The second virtual bullet point -- designing for simplicity -- well, that one's a trap. As software-engineers we delight in simplicity, but the problem is that we delight in complexity even more. I was discussing a piece of code today, which is tiny in comparison to the grand master-works my words to this point may have suggested, but I think it demonstrates the point nicely. We had:

  • a fixed pool of buffers
  • a widget (w), in need of a buffer
  • a widget (v), whose buffer we want to steal to sate the need of (w)

So, to make it explicit, the reason we're stealing buffers from (v) (for 'victim') is that the pool was exhausted. Now, the really tricky part of this was when we made the realisation that you could end up in the situation where v == w. Those of you who write multi-threaded code for a living might be starting to see the problem.

The problem is this: to add the buffer to (w), you need to lock it to protect its internal consistency. To get the buffer from (v), you need to lock it to protect its internal consistency. If you build this naively, such that:

  • you lock (w)
  • you lock the pool
  • if the pool has a buffer, claim it and add it to (w)
  • if not, lock (v), evict a buffer and add that to (w)

then when you get to that last step, and when v == w, you're going to deadlock. Our first solution to this, well, it sucked. It went something like this:


int alreadyLocked = false;
lock(w)
if(!trylock(v))
{
        if(v == w)
        {
                alreadyLocked = true;
        }
        else
        {
                lock(v)
        }
}
// ...
if(!alreadyLocked)
{
        unlock(v);
}

Now, points are due for realising that a deadlock situation potentially existed (and we picked that one before it bit us. Phew), but man, that's horrible. The hard call to make is that the alternative was to break it up into a bunch of non-overlapped critical sections, using our buffer pool as the mediator between our various bits of code, and that sounded like more CPU work.

The real killer here is that we'd engineered it the way we had because we were worried that we'd be adding CPU work where it wasn't strictly necessary and ultimately slowing our product down (and that's a big deal where I work). I didn't let it go in in the form above because I thought it was just too prone to bugs. The really cool thing is that our re-engineered version which does that extra work actually goes faster. Why is a whole new article.

I'm not against performance optimizations, by the way. Just don't do them prematurely; aim for simplicity first and foremost. If nothing else, it will make the optimization step simpler.

Point 5: Trace Every Line of Code When Written

The next point, number 5, tracing every line of code in a debugger as it is written, is an interesting one. I've never had the discipline to do this (and honestly, the debugger in the kernel absolutely sucks), but reading his justification, it makes a lot of sense. I've been looking into this some more and the solution I'm going to try is using UML inside TotalView (which is a kick-ass debugger, by the way). The problem I have right now is that all of our code needs specialised hardware, which means you can't use UML. However, I realised this morning that it's an opportunity to provide another tool for debugging, which is that every component you can switch out for an alternative implementation gives you a way to confirm the functionality of that component. Having a software implementation of our hardware will almost certainly help us find new problems, so that's what I plan to do.

Point 6: Review Code by Programmer Peers

Code review: definitely. In my group we had been doing this by emailing diffs, but I realised recently that actually there's no good reason to do it on email: we're all 5 metres apart, so I've instituted the policy that code reviews happen in person. I'll be interested to see how this one works out, because whilst I agree that the process of articulating the algorithm behind an implementation engages more of the brain, and thus presents another opportunity for finding bugs, I fear that bugs may be missed because the reviewer has been guided by the coder's walk-through.

This fear is not entirely irrational either: I'm strongly reminded of some code I wrote about a year ago which was highly venn-diagram like in nature. I knew it was hairy, so I dragged over no less than 3 engineers to scrutinise it independently. I wrote test-cases which enumerated what I thought (and these reviewers agreed) was every possible input and output. And we all got it wrong. I repeated the process and got it wrong again. I honestly don't know if that code is right now; I just know that I've stopped getting inexplicable results from it. What I've come to learn since then is that that process's failure probably did not stem from its existence, just its implementation, and the evidence has been that we've already avoided some bugs from this policy, so in my mind it's paid for itself already.

I think that really, a code review shouldn't be much different from tracing the code in question. Going forward I want to try doing our reviews in the debugger. I'll report back when I have a good system for that.

Automated testing is a little hard for us right now. Today I enlisted our penn-tester in helping us get an automated version of himself. It's a shot-gun approach, sure, but it'll run every day, and if it passes, that's a whole day's worth of traffic we've tested on that we wouldn't have had otherwise. The thing to realise here is that any automated testing is so, so much better than no automated testing. If you're stuck in a rut now, because you don't have such a system, my advice is to not put off putting something in place because you won't have time to make it perfect, get cracking on putting *anything* in place, TODAY. You can build on that, or replace it, it really doesn't matter. You have bugs TODAY, that's the problem you need to think about.

Additionally I realised again today that end-to-end testing is different from unit-testing. I'm not sure what direction I want to take on this, but LTP looks like an ok starting point. My other option is to provide a way to get our kernel code running in userspace, which has other desirable properties, so I'll probably try to do that if I can.

Annnyway, it's worth checking out; my problem is subtly different, but there's good ideas in there for everyone. To wrap up, a couple of pieces of advice from what we've found from the issues our penn-testers have found since we started this drive for "high-quality, correct code". This is pretty heavily geared toward kernel-mode networking code, but there's analogues in other disciplines as well, just look out for them.

Use statically allocated pools for everything that you possibly can: one of the killer issues with network programming is that your attacker can easily exhaust your supply of memory if you allow him or her to invoke your memory allocator unwatched. Adding a watcher is a solution to be avoided, because as I learned this week, there's a bunch of different ways running out of memory can affect you, and frankly, watchers are just another bit of code you can get wrong. Your best strategy is to use a pool of memory that you allocate at startup. That way you can be absolutely certain that you have enough memory for any situation that your attacker can throw at you. The big win on this one though, is that it makes catastrophic "I freed the pointer too early!" type bugs far less disastrous -- if you're looking at memory from your pool and do this, you'll get a wrong result. If you do it when the MMU is involved, your kernel will panic (arguably, also a wrong result).

Which brings me nicely to point number two: since you're now circumventing the MMU as a debugging tool, you'll miss a bunch of conditions in your testing that you would otherwise see in the form of kernel panics. The trouble with kernel panics is that you then have to do work to find out what went wrong. So my next suggestion is to use "poison values". The idea is that when you return items to your pool, you set the pointer to some value which you know to be invalid, but which is unique to that particular path through your code. Then, you hammer your box with whatever tool is most appropriate and wait for it to break -- the address where the bad dereference occurred should point you in the right direction to fix your bug. Wise to build macros so that this only happens in debug mode though -- no point inflicting 0xdeadbeef on your customers.

It almost goes without saying (but I'll say it anyway) that you should make your code re-entrant and threadsafe as a matter of practice. Even if you're the only caller, static variables will bite you at some point.

And *please*, use simple locking. Not so brain-dead that you cripple your software's performance, but when you've got more than one lock on a structure, think really hard about whether you're actually going to save on contention there. AVOID OVERLAPPING LOCKS. They're always harder than you think.

Make some ASSERT_* macros to test pre-conditions on your functions. These save time, and let you communicate your assumptions to future readers of your code. I'm generally against the use of the 'assert' macro provided by C, because it's just not that good, and it's too easy to leave in release builds. Roll something that makes sense for your own environment.

I also can't recommend enough running with PREEMPT, on an SMP machine, with every debug option your kernel provides enabled and using whatever tools you have at your disposal to actively seek out flaws. kmemleak is on my list to investigate. We used SGI's lockmeter last week, which allowed us to make about a 20% performance improvement to our codebase and showed up some bits of code that just weren't working the way we thought they were. Oprofile is similarly good for this.

Finally, I highly recommend tcpsic et al as tools to ensure your network code is resilient against all possible inputs. They won't generate all inputs of course (well, they will, but you don't have enough time), but they will generate the kind of inputs that will break your computer, so they'll help you identify a bunch of test-cases you should be interested in. This is one of the tools our automated-penn-tester system will use.

It's a pretty paltry list, huh? Maybe I should add "don't write kernel code" to the list. I'll keep you guys posted. This is not the end, this is only the beginning...

Ideas? Mail me at codelore@james.id.au, if you dare...

About May 2007

This page contains all entries posted to Code Lore in May 2007. They are listed from oldest to newest.

April 2007 is the previous archive.

June 2007 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.35