July 12, 2008

GasBag

One of the reasons I haven't had much time for writing here lately is that I've been working on starting a company. We're all perfectionists in this company, so there's still a bit more work to do before we actually make a release, but we've put out a video to let people know what's coming.

There's a high-quality version here on blip.tv, and there's a lower-quality (but more compatible, it seems) version here on youtube.

You can sign up to be informed of updates over at our website.

May 8, 2008

OCRopus (tm) Packages for Ubuntu Hardy

This is a poor substitute for an entry, I realise, but I've just made packages of Google's OCRopus(tm) project for Ubuntu Hardy Heron. You can grab it here. No support for apt-get/aptitude, as I'm sure it will be in Hardy+1 and don't want to dilute your /etc/apt/sources.list file with entries that will rapidly become stale. Also, it's unsigned. Sorry. The md5sum of the file is 9aee9459a6dc120a5a5537b49a67db0e if you want to verify it. It has a handful of dependencies that are all in the main distribution, so you should be able to sort those out in short order.

So, why? Well, the broad problem I'm currently trying to solve is to eliminate the massive pile of paper that has been growing over the course of years in my house. See, there was a time when I just threw out any piece of paper that came my way, with no regard for whether I'd need it again in the future. It was a care-free and happy time, and one that I shall fondly remember for the rest of my days. Then, something terrible happened: I started earning enough money that I had to file tax returns. The first few of those were painful, mostly because I had none of the bits of paper that I needed in order to handle the onslaught of uninteresting questions posed by the ATO's various forms.

So these days, I don't throw any piece of paper out, because I'm not sufficiently familiar with tax to confidently decide whether I'll need it in the future. That's not entirely true; it's not just the tax office that ask me to retrieve pieces of paper that I take no personal interest in; I've had cause to retrieve all manner of documents containing information that initially looked pretty ephemeral to me.

So this has come to a point where there's this whole shelf of my book-case devoted to pieces of paper that I have no personal interest in, but at one time in my life I thought might be valuable. Once any problem reaches the size of a bookshelf, it's a problem large enough that I think a computer would be a helpful tool to solve it.

My solution is one that I thought would be pretty standard: buy a sheet-feeding scanner, shove the documents into said scanner, scan them and throw away the originals. But of course, that's not actually enough; for this to actually solve the problem, I need to be able to locate any of these documents relatively trivially; I want to be able to search through them similarly to how I search my email: by full text search, tagging, date ranges etc.

Anyway, this last part requires me to find a way to extract the text from my scanned documents for indexing and searching. That means OCR. I'll spare you the details, but everything I tried was, well, appalling. Embarrassingly bad. Seriously. My test image was a pretty clean scan of a white page with plain black sans-serif text on it, and I had one app generate a page of punctuation as its output!

The bottom line is that OCRopus was far and away the best of the tools available, and since I'm anal about software installation I packaged it. I hope it's useful to you.

The other part of this story is that I also bought a scanner as part of this project. It's the first one I've ever owned. It turns out that these days you don't just buy scanners, you get a printer and fax machine and photo-copier too! The device I bought was the HP OfficeJet 6310 All-in-One, and it's awesome. I won't go on too much about what's so great about it, except to say that if you run Linux, install hplip and you'll have it working in seconds. It has an ethernet port on it, so you just plug it straight into your network and everyone can use it straight away.

But it gets better, you don't actually have to install any software at all! You can just punch in its IP address in a web-browser, and it has a web-GUI thing that will let you scan documents without using any client-side software at all! Neato! And for ~$150, you can't go wrong.

Hopefully I'll be back with something more substantial to say soon. Thanks for your patience throughout my hiatus.

November 23, 2007

Buying a Laptop

I've been trying to buy a new laptop recently, a project which I have not yet succeeded in. However, I've learnt a few things about buying a laptop that I thought might be worth sharing.

Buy it from America

Laptops are seriously half-price if you can get them anywhere that's not Australia, and I don't know why. The US is an especially good place to buy because of the Aussie dollar's relative strength. Don't believe me? Compare US Lenovo's Thinkpad price and Australian Lenovo's Thinkpad X61 price. At the time of writing, the base-model X61 is $AU2199 or $US1025 (about $AU1,175.73). There'll be some tax too, but it's not going to be $1000. Insanity!

Get an American Express card

After having decided on a TP x61, I went through Lenovo's ordering page and got all the way to the end and discovered that it wouldn't let me use my Australian credit card -- the billing address doesn't let you specify a country, and all the states it lists are American states. However, if you read their unlinkable FAQ (you get to it by going through the ordering process, getting to entering your billing address and clicking the "credit card profile" link), it says:

If the billing information provided does not match what is on file with the credit card provider then your order will be held until we can verify that you are the authorized card user.

...

International credit cards
An international credit card is a credit card issued outside of the United States with a primary billing address in another country. To place an order using an international American Express card please call 1-866-96-THINK. American Express is the only international credit card currently accepted by Lenovo.

I don't actually know if American Express have a special "international" card or not, but it seems like Amex is the card to have if you want to buy stuff from the US.

The closest thing to a redbook.com.au equivalent

redbook is a site listing all the cars, ever, along with detailed specifications and expected pricing and so on. It's immensely useful for buying a used car, or even a new one. Unfortunately I haven't found an equivalent for laptops, but at some point I did stumble upon CompUSA's laptop page which has a fairly representative set of current computers and a highly usable interface for narrowing down available options based on your requirements. I'd recommend buying direct from the laptop's manufacturer over buying from CompUSA, but they do have a very useful webpage.

Use discount coupons

It turns out that, like every other strange, perverted practice imaginable, there is an internet community committed to collecting and publicising the details of discount coupons for various laptop manufacturers. The one I stumbled onto was Fat Wallet. As an example, they have this page detailing discount codes for Lenovo laptops, including a coupon for a further 20% off!

Buy it at Thanksgiving

I haven't quite broken the code of when Thanksgiving is, or what it is, but it seems to be a period of the year where no-one in America makes any money. It seems to be around November (i.e. now). Whatever it is, it's a time of year when there's all kinds of discount offers to be had. Take advantage of it. See also bfads.

Get a "business" model

The "business" models from most vendors are typically much better machines, but that's not the reason I give this piece of advice. The reason is that since you're buying the machine from outside your home country, you'll want an international warranty; an extravagant extra normally only afforded to the "business" models (though exceptions exist).

Get it on Salary Sacrifice

I don't have a reference for it handy, but if you utter the words "Salary Sacrifice" to whoever does payroll at your employer, you'll get the machine on pre-income-tax money. Which is great.

Can you Depreciate a Salary Sacrificed Laptop?

I don't know. I struggled with this on my last tax return. The argument for is that no matter what it cost you to acquire, you start out with an asset worth X, and it will lose a percentage of that value each year. So you should depreciate it. The argument against is that you didn't pay income tax when you bought it, so there's no income tax to claim. I don't know what the answer is, but I will point out that the tax pack lists a laptop computer as one of their examples in this section, and they're generally pretty careful about noting exceptions such as this. Make your own decision, but it's worth noting that there's a fairly defensible case either way.

Don't have time to get an Amex?

This is the unfortunate situation I'm in, and it renders most of my previous advice moot. However, you're not completely lost; it turns out that there are some vendors who sell laptops on eBay, and will accept PayPal payment. They'll deliver it to anywhere in the US, and some of them will deliver internationally as well. If you can pick it up in the US, that's preferable, as you won't have to pay customs on it.

That's it for now. I'll be sure to let you know if I learn anything else before the conclusion of my laptop odyssey.

November 8, 2007

PCC Package for Ubuntu Gutsy

There was a bit of a stir not so long back about PCC (there's also a more informative Wikipedia page) being imported into OpenBSD. For reasons I hope to elaborate on in a follow-up to this article, I was interested in giving PCC a try, but there were no packages for my OS of choice: Ubuntu.

So tonight, I built such a package. You can get it here, possibly for a limited time only. It's pretty basic, I've just used the extremely handy dh-make and forward-ported the patches from the FC7 package.

So far I've confirmed that it can compile "Hello, World". Hopefully I'll get it to do something a little more useful soon. In the interim, enjoy!

October 15, 2007

E: Dynamic MMap ran out of room

The solution to this one always seems to be harder to find than I remember, so I'm writing it down here. The problem is that you add some sources to apt, and then you do an update, and apt helpfully reports:

E: Dynamic MMap ran out of room

As far as I can tell, it's complaining because it has tried to use my favourite syscall (no, really, it's awesome), mmap, to load its dependency database into memory lazily, and a sanity limit has been reached.
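For anyone who hasn't met mmap before, here is a minimal sketch of what "loading a file lazily" looks like in C. To be clear, this is not apt's code: count_lines_mmap is a name made up for illustration, and counting newlines is just a stand-in for whatever apt really does with its cache. The point is only that mmap returns immediately and the kernel reads pages in as they are touched.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count the newlines in a file by mapping it into
 * memory instead of read()ing it. Returns -1 on error. */
long count_lines_mmap(const char *path)
{
    struct stat st;
    const char *data;
    long lines = 0;
    off_t i;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        /* mmap refuses zero-length mappings, so handle empty files here. */
        close(fd);
        return 0;
    }

    /* Nothing is read from disk yet; pages are faulted in on first touch. */
    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    for (i = 0; i < st.st_size; i++) {
        if (data[i] == '\n')
            lines++;
    }

    munmap((void *)data, (size_t)st.st_size);
    return lines;
}
```

Calling count_lines_mmap("/etc/apt/sources.list") from a small main() should give the same answer as wc -l, assuming that file exists on your system.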

The solution, on Ubuntu, is to put the following into a file called /etc/apt/apt.conf.d/00local (previously I've used /etc/apt/preferences; Ubuntu seems to adopt a more structured configuration system):

APT::Cache-Limit 50000000;

And generally you want that number to be something smaller than the amount of memory you have available on the system, though I suspect in practice it rarely matters (the kernel will over-commit on memory, but it's only a problem if your request touches more memory than you have; single package installation should thus be fine, dist-upgrades probably won't be. Note: haven't tested that theory).

This, incidentally, is an example of a bad error message. It's not appalling; the message does uniquely identify a particular fault, and as such you can find what you need with a minimum of googling. But it could be a lot better: in this case the application could have known what limit to use, and could have informed you that it's doing this for your own safety (the limit will be in place to prevent swapping), and could have even told you how to disable this behaviour (as above). More puzzling to me is why it doesn't do this dynamically based on the amount of physical memory available; it could even adopt a different algorithm when it detects low-memory conditions.

But anyway, there's the solution. I hope that you never have to go looking for it, but there it is nonetheless.

October 6, 2007

Reading and Writing Image Files with imlib

I thought it had been a bit too long since we'd had an entry on codelore about code. So, here it is. A while ago, I was trying to write some code to do some image processing. The first step, to read in an image file, proved to be immensely, horribly, painfully difficult. There's a bunch of libraries around, but the problem of "please give me a 2D array of pixels" is somehow not a use-case they anticipated.

I don't quite know how this dire situation came about, and I'm not that interested (though I expect one could learn a lot from API design by doing that research). Instead I just wanted to share my solution to this problem in case someone else wants to be able to read and write image files.

The solution that I found is a library called imlib (sources here), or more precisely Imlib2, which is the version the code below is written against. Now, it's not perfect; notably it's thread-unsafe by really quite extensive design (why?!), but it does have a simple way to load images of pretty much any interesting type, and it has methods to get pixels and draw lines (which I will use as a poor but workable substitute for having a pixel setting method). I won't talk too much more about it; I really just wanted this to get out there so that the next person to search for "reading and writing images 2d array" will not be as disappointed as I was. Note also that the industry-standard ImageMagick comes pretty close to this level of simplicity, and if you're working in C++, the Magick++ interface actually looks pretty nifty. What follows is the simplest thing I could get to work in C.

/**
 * Image loading and saving example code. This code is free for any purpose;
 * knock yourself out. No warranties expressed or implied. Build this example
 * program with a Makefile like this (the libraries go after main.c so that
 * linkers which resolve symbols left to right can find them):
 * 
 *  img: main.c
 *      gcc `imlib2-config --cflags` -o img main.c `imlib2-config --libs`
 * 
 * Author: James Gregory <codelore@james.id.au>
 */

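#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libgen.h>

#include <Imlib2.h>

/* Pixel values are stored as doubles. */
typedef double pixelVal;

/* One colour channel, indexed as vals[x][y]. */
struct matrix
{
    int width, height;
    pixelVal **vals;
};

/* An image held as three separate channels of identical size. */
struct image
{
    int width, height;

    struct matrix *red;
    struct matrix *green;
    struct matrix *blue;
};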

/* Allocate a zero-filled width x height matrix, one array per column. */
struct matrix *newMatrix(int width, int height)
{
    int i;
    struct matrix *mat = (struct matrix *)calloc(1, sizeof(struct matrix));

    mat->vals = (pixelVal **)calloc(width, sizeof(pixelVal *));
    mat->width = width;
    mat->height = height;

    for(i = 0; i < width; i++)
    {
        mat->vals[i] = (pixelVal*)calloc(height, sizeof(pixelVal));
    }
    return mat;
}

/* Load an image file into three per-channel matrices; returns NULL on failure. */
struct image *loadImage(const char *filename)
{
    Imlib_Image img;
    int i, j, width, height;
    struct image *result;

    img = imlib_load_image(filename);
    if(img == NULL)
    {
        return NULL;
    }
    imlib_context_set_image(img);

    width = imlib_image_get_width();
    height = imlib_image_get_height();

    result = (struct image*)calloc(1, sizeof(struct image));
    result->width = width;
    result->height = height;

    result->red = newMatrix(width, height);
    result->green = newMatrix(width, height);
    result->blue = newMatrix(width, height);

    /* Copy each pixel out of imlib's image into our own matrices. */
    for(i = 0; i < width; i++)
    {
        for(j = 0; j < height; j++)
        {
            Imlib_Color col;
            imlib_image_query_pixel(i, j, &col);

            result->red->vals[i][j] = (pixelVal)col.red;
            result->green->vals[i][j] = (pixelVal)col.green;
            result->blue->vals[i][j] = (pixelVal)col.blue;
        }
    }

    /* The pixel data has been copied, so imlib's copy can be released. */
    imlib_free_image();

    return result;
}

/* Write the three channels out as an image file via imlib. */
void saveImage(struct image *img, const char *filename)
{
    Imlib_Image out;
    int i, j;

    out = imlib_create_image(img->width, img->height);
    imlib_context_set_image(out);

    for(i = 0; i < img->width; i++)
    {
        for(j = 0; j < img->height; j++)
        {
            int red = (int)img->red->vals[i][j];
            int green = (int)img->green->vals[i][j];
            int blue = (int)img->blue->vals[i][j];

            /* Clamp to 0..255 in case processing pushed a value out of range. */
            red = red < 0 ? 0 : (red > 255 ? 255 : red);
            green = green < 0 ? 0 : (green > 255 ? 255 : green);
            blue = blue < 0 ? 0 : (blue > 255 ? 255 : blue);

            /* A one-pixel line stands in for a "set pixel" call. */
            imlib_context_set_color(red, green, blue, 255);
            imlib_image_draw_line(i, j, i, j, 0);
        }
    }

    imlib_save_image(filename);
    imlib_free_image();
}

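int main(int argc, char *argv[])
{
    struct image *img;

    if(argc != 3)
    {
        fprintf(stderr, "Usage: %s <input file> <output file>\n", basename(argv[0]));
        exit(EXIT_FAILURE);
    }

    img = loadImage(argv[1]);
    if(img == NULL)
    {
        fprintf(stderr, "Could not load %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }

    saveImage(img, argv[2]);
    return 0;
}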


(apologies for the awful code listing. Anyone with killer ideas for posting code on Movable Type blogs, let me know)

That should build a little program to convert an image file between image formats extremely slowly; something every programmer should have in their arsenal.

Now, some of you might be asking why I chose to split the red, green and blue channels into separate matrices. An excellent question. The particular application I was working on had the luxury of being able to treat each channel separately, so from a code clarity perspective it made sense: I wrote my function to process one channel and then a wrapper to call it three times. Perhaps more interestingly, it also gave me a speed advantage, because I was able to fit a much larger portion of the working set into L2 cache. Unfortunately, in the example above, where you read and then immediately write the file, you get almost the worst possible memory access pattern. If you are in the business of writing programs to do that, let me give you some advice: Stop. It's already been done.
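The "one function per channel plus a wrapper" shape described above might look something like the sketch below. The names (struct channel, struct planarImage, invertChannel, invertImage) are invented for illustration, inverting the values is just a placeholder operation, and each channel is stored as one contiguous array rather than the array-per-column layout in the listing. That last part is my assumption about what is kindest to the cache, not something I measured.

```c
#include <assert.h>
#include <stddef.h>

typedef double pixelVal;

/* One colour channel as a single contiguous block of width * height values
 * (hypothetical layout; the listing above allocates one array per column). */
struct channel
{
    int width, height;
    pixelVal *vals;
};

/* Three independent channels making up one image. */
struct planarImage
{
    struct channel red, green, blue;
};

/* Process a single channel: invert every value within the 0..255 range. */
void invertChannel(struct channel *ch)
{
    size_t i, n;

    assert(ch != NULL && ch->vals != NULL);

    n = (size_t)ch->width * (size_t)ch->height;
    for (i = 0; i < n; i++)
    {
        ch->vals[i] = 255.0 - ch->vals[i];
    }
}

/* Wrapper: finish one channel completely before touching the next, so only
 * one channel's worth of data needs to be hot in cache at a time. */
void invertImage(struct planarImage *img)
{
    invertChannel(&img->red);
    invertChannel(&img->green);
    invertChannel(&img->blue);
}
```

Any real per-channel operation (a blur, a threshold, whatever) would slot in where invertChannel is, and the wrapper stays the same.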

Anyway, I hope that proves useful to someone. Feel free to use it however you wish.

August 25, 2007

More CV Advice

Mary just read my previous piece CV Advice and mailed me with this question:

why not just advise people with a non-Anglo surname to list their visa status? Is there some reason why "Mohammed al-Rashid, 42 Waratah Way, Sydney" is more reassuring than "Mohammed al-Rashid, present location: Sydney, visa status: Australian citizen"?

I must confess that that particular piece of advice didn't come from my own reading of CVs; I make a point of leaving aside such considerations of "are they legally allowed to work for me?" until my second pass, because I realise that it can be a somewhat difficult question. No, that piece of advice came from watching more than one of my superiors over the years discard CVs after reading only the name of the applicant. I've been shocked and appalled at this practice, but as an author of a CV, you can't lecture the reader on their unreasonable prejudices unless you get the interview, so you deal with that problem first.

To address the first part of the question -- why not just list your visa status -- don't get me wrong; if it's relevant, you must list your visa status. But that's trying to achieve a separate goal -- in stage two when the reader is considering interviewing you, they'll need to work out the implications of your visa status and make sure you'll actually be able to work under the necessary conditions.

But listing your address is about beating the possible prejudices of your reader, who will no doubt have reviewed a significant minority of CVs from people outside the country, who claim to be able to work in Australia immediately, only to discover after some closer inspection that actually they're eligible for a visa, haven't got it yet, and will need 3 months to relocate before being able to start in your organisation. I know I've read CVs like that. I understand what might motivate someone to do that in a CV, but the bottom line is that I need someone now, and as the author of your CV you have 30 seconds to make sure that your honesty in the matter of your visa status and availability is understood and appreciated. You need to give your reader all the help that you can.

Just to clarify, I think everyone should put their whole address in, because it adds to your credibility no matter what your name is. But, if your name "sounds foreign" (I'll leave it to you to decide if your name does), the benefit is multiplied because there's a possible fear, which might not be justified, but is probably with precedent, that they'll be wasting their time on you. No-one wants to waste time, it's as simple as that.

As for the second part -- why not just list a present location and visa status -- that's probably an ok way of going about it. My preference for listing an actual address is that the more detail you give, the more you sound like a real person. You need to consider that if you omit a piece of information from your CV, there's a good chance your reader will pick up on that and wonder why it was omitted. Personally, that's not a risk I like to run, so for anything that I think is important enough for my CV, I ensure that it's documented as completely as could be considered relevant, in the concise fashion that I outlined in the previous article.

But, as Mary has put it, I think I'd read that and feel relatively safe about giving Mohammed an interview, so that's another way to approach the problem, should you not wish to list your address (useful I guess in the case of a "web CV").

Thanks Mary for picking up on that.

CV Advice

It just so happens that over the last few weeks, I've been spending crazy amounts of time editing CVs. I didn't really think I was much of an expert on the subject until sitting down and finding just how easily it comes to me to tear one of these no doubt carefully constructed documents to shreds. I guess I took a bit for granted the idea that everyone possessed some basic level of knowledge about what to write in a CV, but having thought about it some more I've realised that I only have this knowledge because of the sheer number of them that I've read and the tiny number of them that I've read twice.

I think that probably, some people under-estimate the effect a CV has on your chances of getting a job. There's also a bit of a misunderstanding of what these potential employers will actually use your CV for, even amongst those who understand that it's really, really important to have a good CV.

So, before I get into plying you with generous serves of simple advice to make your CV awesome, let me try to explain what happens when an employer needs staff.

Basically this is what happens: you write an ad, you post it on some large job site like seek, then you come in the next morning with 50-100 more emails than you'd normally get, all from applicants who want the job you're offering.

Now, the thing is, I've got 100 applicants, I need one. Reading a CV properly takes about half an hour, but I don't have 50 hours to read CVs. So what follows is a process where each one of those applications gets at most 30 seconds of consideration. This is just a short-listing process, to throw away the people who are just wrong for the job. To give you some idea of the brutality of this step, I generally go from 100 to 5 CVs in this step. I've read about other people who do this multi-stage thing, but it's not for me, and I don't think I'm uncommon in this regard (to be honest, some people make such a bad impression in their application that I don't even get as far as the CV).

There's a very important message in that: you need to make sure that all the reasons that I should hire you are completely obvious in a 30 second glance. I've read a lot of other pundits who have tried to prescribe what that information is, but I actually think that you've probably got a better idea of that than I do. I will say this: the section that will get most of my attention in those 30 seconds is your work history. I want to very quickly get a feel for what you've done. There's a few reasons for that, but mostly it's just the quickest way to get a feel for what you're capable of, and I do pay a little bit of attention to the general trajectory of your career to date. The thing that will most quickly endear you to me as an applicant is seeing some tangible experience listed, which very closely mirrors the work I want you to be doing. Having a short list of skills early on is good here -- it will somewhat guide my reading of your work history.

So having got my list down to 5, that's when I'll start looking through your CV a bit more closely, trying to imagine you in my team. This is likely to be about 5 minutes per CV, and I'll have made a decision to interview or not at the end of that 5 minutes.

In the hour before your interview, I'll generally spend half an hour combing over every line, highlighting the bits that motivated me to bring you in for an interview, scribbling notes about things I want to ask about, and forming a map in my brain of how I want the interview to go. That's the third thing to remember.

Before moving on, I want to review those three purposes your CV needs to serve:

  1. The "elevator pitch" -- it needs to express enough in 30 seconds that I'll think about interviewing you. In this pass I'm trying to extract from your CV the stuff that you're good at, whether you're credible, and how can I tie you to the requirements of my job.
  2. To interview, or not to interview -- you passed the 30-second test, if I slow down and take a closer look, do I think you're a close enough match that I should spend some time talking to you? This is likely the part where I will google you, and flip through anything you've written, if it's easy enough to get to.
  3. What should I explore in the interview? -- to some extent this is the least important of the three because it's largely outside your control. If you keep it in the back of your mind though, you'll have some chance of influencing what questions you'll get asked. If you're not talented enough at CV writing to do that, no problem, but do remember that anything on your CV is fair game, so be prepared to talk about it.

Alright, I'll dive straight into sharing with everyone the advice I've been handing out. The point of the above snapshot into my world of recruiting is that keeping in mind the process, and the distinct types of scrutiny that your CV will be subject to, will help guide a lot of the decisions you make. The stuff that follows is just the basics, really, it's bland and generally applicable, and forms a sort of baseline standard.

Here goes:

  • The filename of your CV should be 'FirstnameLastnameCV.doc' -- employers get hundreds of these things, so you should make it as convenient as possible to find yours amongst the myriad that will be sitting on their filesystem. Also, while I sympathise with an affection for pdf, most recruiters have an affection for Microsoft Word and there's just no point fighting it.
  • Put complete contact details in. Anyone who's actually interested in hiring you will want to talk to you, so put a mobile phone number there at least. And, while it's absolutely racial discrimination, if you've got a slightly unusual name, putting a street address will put employers' minds at rest that you are in fact an Australian resident and they don't have to worry about visas or whatever. It's just better to put it in there and not worry about missing jobs because of it. They won't letter bomb you.
  • In your employment history, in each of your jobs, each sentence needs to start with a verb that expresses what you did. Not just any verb! "worked" is a useless verb. I know you worked on stuff, that's why it's in your work history. Verbs like developed, managed, led etc are the kinds of words you want to use. As an example, here's a line from a CV I read recently:
    Worked as a member of a small team of Engineers developing the XXX server in Felix.
    "Worked" is bad. The thing you did was "develop", but that's at the end of the sentence; it's too likely I'll miss it in the 30-second pass. The teamwork aspect is in there, but reading this, I don't get any idea of the scale of the team (small means many things to many people; saying "small" comes across as "so small that I'm embarrassed to say"), and communicating that kind of detail will lend credibility to what you're saying, as well as make it more meaningful. Finally, the skill that this line should express to me, that you can write Felix code for server apps, is buried right at the end of the sentence. I would write that line like so:
    Developed server application in Felix for XXX project in a team of n people.
    This puts what you did at the start (developed), what you did it to next (a server application), and then the other details, wrapping up with an objective statement about the team size. And remember: talk yourself up.
  • On layout: your CV will likely get mangled by recruiters, so you need to make it resistant to hack-and-slash reformatting. Use 'Times New Roman' as the font. The only possible exception to this is on headings, but make sure whatever you use is a font available on a default install of Windows. I suggest you eliminate any reliance on horizontal formatting. Make the body and the title of each section align to the left margin of your page. Put the job-titles in bold in your employment history so it's easy to navigate the document.
  • Eliminate redundancy, it really bothers people in a hurry. Saying 'Wrote unit tests...' is not sufficiently different to 'Wrote a regression test framework...'. I'm sure both are true, but you can get that into a single bullet point. Your aim is to get a chance to clarify that in the interview that you're hoping to get.
  • Make your sentences shorter. When you've got to read hundreds of these things, you really only pay attention to the first couple of words. Worse, you tend to stop reading bullet points toward the end, so put the most important stuff at the top of the list. Aiming for brevity will force you to prioritise important stuff.
  • It's good to list a team sport in your interests if you can, it suggests that you'll work well in a team.
  • Only put a 'career objective' section in if it's going to help you get this job. If you're just applying for the same kind of job you had before, it's probably not so important. The reader will know what kind of job you want, because that's what you're applying for. Stuff like '... further my experience in ... to pursue my primary professional interest of ass-kicking' or something is what you want to say here. I generally don't advise having such a section unless there's a good reason to explain why you're applying for what you're applying for. If you do have such a section, think very hard about what you want it to say. This is the first thing you're putting on your CV -- make it count.
  • Spell-check!
  • Punctuate correctly. Make correct use of apostrophes at least.

Finally, be prepared to have a few versions of your CV that highlight different aspects of yourself. Odds are that there'll be jobs that you're thoroughly qualified for, but this CV won't say that. If there's a job you really want, take the time to re-work your CV for that particular job. Aside from that, identify a few different focuses you're interested in and prepare CVs for each one. It'll mostly be re-ordering stuff and eliminating irrelevant sections, but it's worth doing.

Good luck! Mail me if you've got any questions or suggestions.

July 10, 2007

Metrics

Carl, over at cysquatch, who is more types of ninja than most people are aware exist, has written about something I think is pretty important in his recent article: metrics. This is awesome because I've been trying to find a point to start talking about metrics for a while.

Metrics in a project are super-important, because anything you can't measure, you don't have. Every important aspect of your project needs to have a system for tracking its performance, preferably an automated system. There's all manner of ways to think about this, but at a very basic level, if you can't measure it, you can't put it on a brochure, and brochures sell stuff and make you money.

This was expressed to me somewhat more succinctly by one of my superiors some time ago as "you get what you measure". I've seen this phenomenon first hand, because my employer has historically been seriously concerned with performance. This came right from the top, so what it meant in practical terms is that our CEO would stalk the cubicles randomly asking engineers "how fast does it go?", and to answer that we needed to measure the speed. The net result of this is that everyone understood that there was a priority on speed, and we had this extremely simple test against which we could measure every change we made. Over time, people ran hundreds or thousands of experiments with this single variable, speed, as the result. The software got faster, and people got better at writing fast software.

Now that's pretty awesome when you think about it: the simple act of asking how fast software went produced, only slightly indirectly, fast software. It communicated the most important goals for the project more efficiently and more effectively than any spec ever will. And it's a motivator as well -- having a number that says "my code this week made us 10% better" is fantastic -- you can see tangible results from your work.

The catch is that when you ask only for fast software, you get fast software, but you might also get unstable software or inaccurate software, or resource-hungry software, and odds are, unstable, inaccurate, or resource-hungry software isn't software that's good enough.

The other problem is that you might think that all that stuff will somehow "just be ok"; you might even think that some of those variables aren't so important for your target market. You might even be right, but those are all things you should keep an eye on; thinking isn't good enough, you need to know.

Back to Carl's article -- he's listed some bitchin' tools to monitor code complexity. The goal then, is to reduce code complexity through the act of performing measurements. This is a killer idea, and it's one that really does work, and better yet it's easy to implement: you can set it up to be an artifact of your nightly build system and get up-to-date data on this stuff every day! Indeed, the nightly build system itself is a massively useful metric -- the question of "can you build release packages?" is pretty important, because one day you're gonna want to do that.

So what else? Memory leaks are a good one to keep an eye on, and similarly straightforward -- run your unit-tests under valgrind, and Electric Fence for good measure. And while we're talking about unit-tests, there's metrics for those as well. Code-coverage tests let you measure how much of your code your unit-tests are actually checking. gcov is a good tool for this stuff. Making sure you have unit-tests for every line of code will give you much greater confidence in the quality of your code, and so in turn, you will get a metric of code-functionality. Setting up a metric that the unit-tests meet the spec is one of those domain specific problems, but it can be done, and you should think about how you can do it for your project.
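
As a rough sketch of how the valgrind and gcov steps might slot into a nightly build: this assumes a single-file C test program (tests.c and unit_tests are made-up names), with gcc, valgrind and gcov installed, and the function names are mine rather than anything standard. It builds the tests with coverage instrumentation, runs them under valgrind, and prints the line-coverage percentage that gcov reports.

```shell
# Sketch only: tests.c, unit_tests and both function names are hypothetical.

# Pull the percentage out of gcov's "Lines executed:NN.NN% of M" summary lines.
coverage_percent() {
    sed -n 's/^Lines executed:\([0-9.]*\)% of .*/\1/p'
}

# Build the tests with coverage instrumentation, run them under valgrind,
# then print the line-coverage percentage for the given source file.
nightly_metrics() {
    src="$1"                      # e.g. tests.c, in the current directory
    obj="${src%.c}.o"
    gcc --coverage -g -O0 -c "$src" -o "$obj" || return 1
    gcc --coverage "$obj" -o unit_tests || return 1
    # --error-exitcode makes valgrind exit non-zero when it reports errors
    valgrind --leak-check=full --error-exitcode=1 ./unit_tests || return 1
    gcov "$src" | coverage_percent
}
```

Running something like "nightly_metrics tests.c" each night and appending the number to a file is probably enough to start graphing coverage over time.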

There's other metrics you should watch as well: the deficit between a programmer's estimate on time to fix a bug and actual time taken is honestly the only way to start getting good time-estimates. Which is another important point: metrics don't just let you improve your product now, they help your staff get better at their jobs.
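
A minimal sketch of that estimate-versus-actual number, assuming you can export one line per fixed bug in the form "bug-id estimated-hours actual-hours" (a made-up format; your bug-tracker's will differ). The function name is hypothetical too.

```shell
# Input: one bug per line as "bug-id estimated-hours actual-hours" (assumed format).
# Output: the average of actual/estimate. 1.00 means estimates are spot on,
# 2.00 means work takes twice as long as estimated, on average.
estimate_ratio() {
    awk '$2 > 0 { sum += $3 / $2; n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}
```

Fed three bugs estimated at 2, 4 and 1 hours that actually took 4, 4 and 3 hours, it prints 2.00. Tracked per person over a few months, that ratio is the thing you'd hope to see drift towards 1.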

Similarly, though it's imperfect, time-lines of bugs open and rates of bugs opened per unit time show interesting trends. I've found these to be good indicators of project completeness in the past, but these metrics are dangerous: it's easy to be misled by this stuff, so treat such metrics with suitably large salt-grains.
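
For the bug time-lines, here's one way the numbers could be pulled together, assuming an export with one event per line as "YYYY-MM-DD opened" or "YYYY-MM-DD closed" (again a made-up format, and a made-up function name). It prints bugs opened and closed per month, plus a running count of bugs still open.

```shell
# Input: one event per line as "YYYY-MM-DD opened" or "YYYY-MM-DD closed" (assumed format).
# Output: per month, bugs opened, bugs closed, and the running total still open.
bug_trend() {
    sort | awk '
        { month = substr($1, 1, 7) }
        !(month in seen) { seen[month] = 1; order[++n] = month }
        $2 == "opened" { opened[month]++ }
        $2 == "closed" { closed[month]++ }
        END {
            for (i = 1; i <= n; i++) {
                m = order[i]
                open_now += opened[m] - closed[m]
                printf "%s opened=%d closed=%d open=%d\n", m, opened[m], closed[m], open_now
            }
        }'
}
```

The salt-grains still apply: a falling "open" count can mean the project is converging, or just that people have stopped filing bugs.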

And most important of all is that you need to have metrics for all the stuff you're going to sell your software on. If you're going to push out huge ads that claim your software is user-friendly, you wanna be damn sure you've conducted the usability studies to ensure that it actually is. Secure? Get some kick-ass pen-testers, and measure holes found per unit time. Fast? Buy an Avalanche. Sexy? Well, I'll leave that one as an exercise for the reader.

I've been pretty lucky in my thus far short-lived management career and worked with universally awesome people who were all able to grasp these benefits straight off the bat, and embraced any new metrics I tried to put in place. I imagine not everyone will be so lucky, so introducing these things slowly, and getting people used to the idea that they're there to help everyone are likely the strategies you want to use.

I don't have a reference to the article at hand, but Joel has previously submitted that there are some metrics you should not measure. The example he gave was in reference to his company's bug-tracking software, and he submitted that once you can measure the bugs-closed per unit time value, programmers will start artificially inflating their count by lying to the bug-tracker and to you. I frankly find this position a bit offensive, but if such situations exist it reflects poorly on management rather than the engineers. You need to be accepting of metrics that show problems, because if people hide the problems, you can't solve them, but similarly: if you can't measure, or don't look for, the problems, you can't solve them either. Now that takes some discipline, but building that kind of trust is the only way you'll be able to make your metrics valuable, and as I alluded to before: if your metrics can't demonstrate value, your program is without value.

Anyway, I've spent enough time on this one already, though there's probably still much more to say. So I'll suggest this as something to try: make a list of all the things that are important to you about your product. Speed? Stability? Memory-footprint? If there are any items on your list that you can't put a number next to, have a ponder about why it is so and if you could. If nothing else you'll get some killer graphs out of it.

June 21, 2007

ctags and cscope

If you're a programmer, and you use the excellent vim editor, you really need to get into ctags (the 'exuberant-ctags' package on debian, not the normal 'ctags' one) and cscope. These are both awesome tools, and there's already plenty of info on making effective use of them for your own builds (cscope info here, and ":help ctags" for ctags, or google around), so I thought I'd share a couple of other tid-bits on how I use them which have improved my productivity.

The Alias

Pretty simple really, I throw this:

alias t='ctags -R; find . -name "*.c" -o -name "*.cc" -o -name "*.hpp" -o -name "*.hh" -o -name "*.h" -o -name "*.cpp" -o -name "*.py" -o -name "*.pl" -o -name "*.pm" | cscope -Rb -i-'

into my .bashrc, and run 't' whenever I need to update my tags and cscope files. The -i argument to cscope works around issues you'd otherwise have with the file list being too long.
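
If the one-liner gets unwieldy, the same thing can be written as a pair of shell functions. This is only a sketch: list_sources and retag are names I've made up, and the optional directory argument is an addition so the same code can be pointed at a tree other than the current one. The ctags and cscope flags are the ones from the alias.

```shell
# Sketch only: list_sources and retag are hypothetical names.

# Print every source file under the given directory (default: current one).
list_sources() {
    find "${1:-.}" -type f \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' \
        -o -name '*.h' -o -name '*.hh' -o -name '*.hpp' \
        -o -name '*.py' -o -name '*.pl' -o -name '*.pm' \)
}

# Rebuild the tags and cscope.out files in the given directory.
retag() {
    (
        cd "${1:-.}" || exit 1
        ctags -R .
        # -i- makes cscope read the file list from stdin, as in the alias
        list_sources . | cscope -Rb -i-
    )
}
```

Adding a new file extension is then a one-line change to list_sources, and "retag some/other/tree" rebuilds the index there without changing directory in your shell.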

System-header tags

This one is a bit more useful. Add this to your .vimrc:

set tags=./tags,./TAGS,tags,TAGS,/usr/include/tags

and then go off and run the body of that 't' alias in /usr/include (or do it from cron if you want it to stay up to date). That way, when you're trying to remember which header file it is that defines, for example, IPPROTO_TCP (I can never remember that one), you're not reduced to grep -r; vim will tab-complete when you do the search, and even remind you which IPPROTOs there are. For example, if I fire up vim, type ":ts IPPR" and then press ctrl-d, I'm greeted with this:

:ts IPPR
IPPROTO_AH        IPPROTO_HOPOPTS   IPPROTO_MAX       IPPROTO_ROUTING
IPPROTO_COMP      IPPROTO_ICMP      IPPROTO_MTP       IPPROTO_RSVP
IPPROTO_DSTOPTS   IPPROTO_ICMPV6    IPPROTO_NONE      IPPROTO_SCTP
IPPROTO_EGP       IPPROTO_IDP       IPPROTO_OSPF      IPPROTO_TCP
IPPROTO_ENCAP     IPPROTO_IGMP      IPPROTO_OSPF_LSA  IPPROTO_TP
IPPROTO_ESP       IPPROTO_IP        IPPROTO_PIM       IPPROTO_UDP
IPPROTO_FRAGMENT  IPPROTO_IPIP      IPPROTO_PUP       IPPROTO_VRRP
IPPROTO_GRE       IPPROTO_IPV6      IPPROTO_RAW

I can keep typing, or press tab to complete the first one and hit enter, which then tells me this:

  # pri kind tag               file
  1 F   e    IPPROTO_AH        /usr/include/linux/in.h
               IPPROTO_AH = 51,             /* Authentication Header protocol                   */
  2 F   e    IPPROTO_AH        /usr/include/netinet/in.h
               IPPROTO_AH = 51,       /* authentication header.  */
  3 F   d    IPPROTO_AH        /usr/include/netinet/in.h

That is, I have definitions for that constant in netinet/in.h and linux/in.h -- for maximum portability you should use the netinet version -- and that's a lot easier than other ways I can think of to get that info. It also works for structure definitions and function prototypes, and if you want to dive into the definition itself, type the number on the left and press enter. Zap, you're there. ctrl-t to bounce back. I also keep a copy of the Linux kernel source code indexed with ctags and cscope for when I do kernel development. I only pull that in when I need it though, because it generates a lot of noise for userspace stuff.

Make the best use of the tools you have; sometimes they're a distraction but used well, they can be a huge productivity boost.

Got a favourite tool? Mail me about it.
