Saturday, November 7, 2009

Charity

Well one of my two favorite charities has started their 2009 drive. Child's Play gives toys to children's hospitals. As someone that recently spent a good deal of time in a hospital I can tell you, there is ABSOLUTELY nothing to do there except watch tv. Personally I played Civilization 4 to pass the time but not all of the kids in these hospitals have the luxury of a laptop.

So if you've a mind to make a charitable donation this holiday season please consider Child's Play for all your altruistic needs. The banner at the bottom of the page also links to their website.

On a related note if your in the Atlanta area, my other charity should start up soon too. Which is Clark's Christmas Kids, which a local radio personality Clark Howard runs. Like Child's Play Clark's Christmas Kids buys toys for children but in this case its foster children in Georgia. Christmas Kids for details of this charity.

And now stays faith, hope, charity, these three; but the greatest of these is charity. - 1 cor 13

Monday, November 2, 2009

Volatile the unpredictable keyword

Some time back I talked about static being a rarely used keyword of C. Today we are learning about Volatile which is even less used. However when you need volatile you generally cannot do without it. Volatile comes into play in two situations that I have encountered; Multi-threaded programming and dealing with hardware directly.

So what does the keyword do? It actually tells the compiler to not optimize access to a variable in a specific way. One of the biggest things an optimizing compiler tries do is keep often referenced variables in registers. However this means that if the value is changed in memory by something outside the current thread of control you can never tell when the change will be picked up.

By declaring the variable volatile you basically turn that behavior off. The compiler will load the value from memory every time it is referenced; There by picking up on changes made outside the current thread. Now the compiler makes this optimization for good reason. So using volatile incurs a speed penalty and you must use it judiciously.

In particular a guy at work had this problem recently. He was using a global variable to control when a thread exited. Something like:

int keep_going = true;

void thread(void) {
   while(keep_going) {
      ....do stuff
   } //end while
} //end thread
He had a problem that his main thread would set keep_going to false in an attempt to stop the thread. However the compiler had determined that keep_going was a lively variable and kept it entirely in the register. So the thread would never exit because the program was not reading the updated value from memory. When he described the problem to me I told him, basically what I've said about volatile and told him to use it. So theres another of C's more esoteric keywords explained.

Saturday, October 31, 2009

The language I would like to see

So if you read this blog very much you know that I talk about C a good bit. And you might ask "Well why C why not Python, Java or a more modern language?" The answer is that C sits in a niche neglected by most other programming languages. My job is in that niche of embedded systems.

C places very minimal requirements on the environment to the point that its actually fairly easy to write a C program that works directly with the computer's hardware with no OS to help. In fact by using GNU GRUB all you must do is conform to the multi-boot specification, avoid some sections of the standard library and compile with -ffreestanding -fno-builtin and maybe another flag or two. Which is surprisingly easy because few other languages can do that. Which is exactly why C/C++ is the dominate language of embedded systems (C++ is a little harder to get going but it can be done).

However i'm a little unsatisfied with that. I really wish that there was a language with somewhat higher level constructs that could be used. C++ is close to what I want but it often feels ...kludgy that things just could have been done better. Another candidate is D which is a language that's goal is to be C++ "done right". It is a reboot of adding these higher level features to C but with less weird syntax and avoiding known problems in the design of C++.

But anyway some of the features I want would be:

  • Manual Memory Management with compiler supported hooks for library based garbage collectors
  • Arbitrary memory manipulation through pointers or something similar
  • Inline assembly
  • Object Oriented
  • The ability to create objects with sub-byte packing of data elements
  • Very minimal support environment
Most of those features are targeted at working directly with the hardware of a computer. The language would be mainly for operating system/driver developers, embedded system developers and anyone else that ends up working very closely with the hardware.

Wednesday, October 21, 2009

I'm a rel bad spelr

There is a tool that I would like to see. I don't know about you but I am a really bad speller. Its happens quite often I am so far off that spell checking software does not even know what I'm trying to type. So I know code that I write is littered with spelling mistakes. What I would like to see is a Programmer's spell checker.

Now I don't mean one that spell checks just comments, because VIM already does that for me. But one that understands common programmer naming idioms like Hungarian notation, UNDER_BARS and CammelCase. It would split symbol names up and spell check the names against a dictionary like any other spell checker. The spell checker would probably need to be able to do some limited syntax analysis of the program it is spell checking, to separate out keywords and such. To really be useful it would also need very strong support for per-project user dictionary files and the ability to automatically add words from the interface of modules you are using.

Tuesday, October 20, 2009

And The 121 Dollar Word Is!

Idempotent, which is an adjective that describes something that has the same effect no matter how many times you do it. Which is something very impotent to achieve in clean up functions. Because when using manual resource management it can often be nearly impossible to tell whether something has been released or not at any one particular point in the code. So programmers tend to "over release" things to be safe. That is they release things again because they are not sure if it has been done yet or not.

This is much less of a problem in object oriented languages because you can use RAII. But in classic C where I spend most of my day we don't have destructors so that technique is not helpful. For some reason people sometimes write clean up functions that cannot be run twice without problems. Which inevitably causes problems. Because of what I described before. When writing any such function you should very carefully make sure that if it is run on an already released resource can detect that case and do nothing.

Design From Your User's Standpoint

Something that often seems to be lacking in software architecture is designing for usability of the API. Personally when I design something for other programmers (or even myself) to use I am always envisioning how the API will be used. It has brought me into disagreement with some of the other programmers occasionally in design pow-wows. They would be pushing for some interface that they think is easy to use, and I would be trying to steer them away because of issues envisioned in it's use.

One particularity annoying culprit is one of the hardware vendors used by my company. Recently I was working on using their SSL API but was having problems with the connections failing regularly. As it turns out the problem was that in some of my error handling routines I was releasing some of the SSL's resources before it was apparently done with them. Which caused their library to error and fail to connect from then on.

Now you might say "Well just use the library correctly and there wont be a problem". while thats one way of looking at it I believe it to be a poor opinion to have. What the implementers of the SSL library should have done is to more carefully designed their API and library to be easier to use and harder to break.

As one of my previous posts says "Fail safe but fail loudly!" which is one of my overriding design principles. Shift your point of view to being a client of the interface you are implementing and anticipate as best you can the errors another programmer would make. Those errors should be handled in such a way as it does not put your module/object whatever in an inconstant state. Then make sure the programmer knows that they are doing something wrong. Even better redesign the interface so that the number of errors that can be caused by misuse are minimized.

The weak link in software development is not the program but the programmer. We are all fallible. So code to your weakest link and try to make your APIs Idiot proof. Because God knows, when it comes down to it, we're all idiots in our own little ways.

Tuesday, October 13, 2009

Please Be Nice and Include Everyone

Well I saw something today that made my hair stand on end. One of my coworkers had done something he should have never, never, never, never....never, never (did I mention never) done. If you are not familiar with C you should know this. In C a functions' signature must be declared before it is used. That is you cannot call a function without first at least telling the compiler what it's interface looks like. This was a decision made in the early days of the language to make C easer to parse.

To this end C has two concepts Deceleration or saying something exists but not what it is and Definition which is saying what it is. You can't use anything, be it variable, function or whatever without at least saying it exists first. C also has two different file types the *.h file is reserved for the interface of a module and the *.c for the implementation. Which fits into the Deceleration/Definition scheme perfectly. Basically the *.h file is to be used for Decelerations only. It occasionally can contain some Definitions. To see a Definition in a *.h file usually signals a programming error however.

So ideally a C program should be structured so that there is one "module" per *.c file. Each *.c file has a corresponding *.h file with the module's public interface declared in it. Notice that I said "public interface" something that should be done is to take utility functions that have no bearing on the module's purpose out of the header file. You can define/declare them entirely in the *.c file as static. Or if they are more generally useful they can be moved to a sub-module that other modules can use and be declared/defined normally.

This is where the break down comes in. The developer had taken my advice but not really understood it. He had half way done both things. A function had been forward declared in the *.c file that used it but was defined in a separate *.c file. Which is a big no-no. The C compiler uses the deceleration in the *.h file to check and make sure that you are calling the function with the right number and types of parameters.

By forward declaring the function like he did the compiler had no chance what-so-ever of catching this programming error. While in the file where the function is used the compiler would take the forward deceleration as gospel and compile without even a warning. Then while compiling the file the function was in it would take the definition as gospel and compiled fine. However the generated code now is mismatched. If the two function signatures are ever different the compiler will happily call the functions with the wrong parameters. Another of C's many shotguns to blow your foot off.

This is the second time I've seen this in the company. The first time, was in a long existing code base, I was a little flabbergasted and spent a good bit of time cleaning it up. This time all I could do was shake my head and tell them to fix it. So the moral of the story is ...DON'T DO IT!!! Its a huge headache to find when it breaks and it will break. If you are putting a function in a *.c file and calling it from another *.c then make a header for the module. If the function is never used outside the current module then have it entirely inside the module. There is no other right way to do this!

Monday, October 5, 2009

Betting Against the Tortoise and Hare

So apparently one question that comes up pretty often in programmer interviews is "how do you find a loop in a singly linked list?" I've never personally been asked this question but thats the hearsay. The common answer is Floyd's cycle detection algorithm (also called the tortoise and the hare algorithm). It works by having two iterators the tortoise and the hare, where the hare moves through the list twice as fast as the tortoise. The algorithm can find not only a cycle in the list but where it starts (check Wikipedia's article for details). The algorithm takes a few passes through the list to find the information though.

That got me thinking about a slightly different problem. Can we find the first element of the loop with only a single pass in a list that wont fit in memory. The answer that I came up with is: Sort of. At least we can find it with some small probability that we are wrong. Is this optimal ...I doubt it but it was fun to devise. The algorithm relies on a data structure called a bloom filter.

Bloom filters have the ability to keep a tab on things in a set with very little space overhead. However bloom filters have a non-zero probability of saying something is in a set when it is not. So here comes the algorithm (I'm glossing over the exact details of the filter here).

BloomFilter set;
Iterator loop_beginning = null;
float prob = 1.0;

//go through the entire list
for(Iterator iter = list.beginning();
    iter.hasNext(); ++iter) {

    //is the node in the set
    if(set.contains(*iter)) {
        //it is! we may be in a loop.

        //is this the first node in this possible loop?
        if(loop_beginning == null) {
            prob = set.falsePositiveRate();
            loop_beginning = iter;
        } else {
            prob *= set.falsePositiveRate();
        } //end if

        //is the probability low enough?
        if(prob < errorThreshold) {
            //good enough for us return the
            //first node in the loop.
            return loop_beginning;
        } //end if
    } else {
        //nope not in the set. We have not seen
        //this node before so add it to the set.
        set.add(*iter);

        //no loop
        loop_beginning = null;
    } //end if else
} //end for

return null; 

The algorithm is pretty easy it goes around the list until the node being examined is in the bloom filter. We could stop there but it could be that we just found a false positive. Bloom filters never say something is not in the set when it is so we check to see that the next few nodes are in the set also. Every contiguous positive result reduces the probability that we are examining a false positive. So we just go until the probability that this is a false positive drops to an acceptable level.

So when would you use this? You probably wouldn't; not unless you have a known upper bound on the number of nodes and you can't have two iterators walking the sequence for some contrived reason. The size of the bloom filter will grow linearly with the upper bound of links but it will be considerably smaller then storing the keys themselves. Even with that space advantage the bloom filter might not fit in memory. But like I said it was fun coming up with this.

edit: I must have been really fried when I wrote that algorithm the first time. So its better now.

edit-edit: Argh! one day ill get that pseudo-code right.

Thursday, October 1, 2009

A Bit on Bits

Long time no post, try to do better...so on and so forth. With that out of the way.

A problem came up recently that I found interesting. The problem is an old one that of counting the number of bits set in a given integer. So I went looking for the correct algorithm to use. The one that I found was elegant and short but took a minute of inspection to understand how it worked.

The algorithm used a strategy called Dynamic Programming. First off Dynamic Programming is about the worst name for this algorithmic strategy because it conveys nothing about how it works. For that algorithmic strategy to be applicable the problem at hand must meet a few conditions. First you must be able to divide the problem into sub problems. Second you must be able to find answer to your problem by solving and putting together the sub problems.

Now you might say "But that sounds like Divide and Conquer" and you would be right. The difference is that in dynamic programming the identical sub-problem comes up multiple times. If you blindly divide and conquer such a problem you end up recomputing the sub-problems multiple times which often leads to exponential time algorithms. So in dynamic programming we keep a table of the answers computed and the problem becomes putting together the sub-answers to find the solution.

So here is the algorithm in pseudo-code ('&' is the bitwise operation a'la C).

onesCount[0] = 0

for i = 1 to 255
    onesCount[i] = onesCount[i / 2] + (i & 0x1)

See, short. This follows the basic structure of any Dynamic Programming algorithm. Basically this builds a look-up table for a single byte (i.e. onesCount[x] is the number of bits set in the integer x). The first line sets up the trivial answer to the problem's base case. Being that the integer zero contains no set bits. Then the for-loop precedes to fill in the rest of the table with the correct values. How does that work?

Now since we start at 0 and work our way upward we know for sure that the value for i / 2 has already been computed. So we know the number of bits set in that number. Well i / 2 basically chops the right most bit off the number (its a one position right bit shift). So that tells us how many bits have been set in all but the one's position and all that remains is to check out if that last bit is set or not. If it is the number of bits set in i is one more then in i / 2. So i & 0x1 is one in exactly that situation.

So thats the dynamic programming algorithm for constructing that look-up table. Do you see how it fits into the dynamic programming strategy. Be on the lookout for other algorithms from the dynamic programming family. They are always interesting.

Sunday, August 9, 2009

Trifles

Just last night I was reading a book I borrowed from a friend of mine. That book was The Sandman: Season of Mists by Neil Gaiman. In the introduction of the book by Mr. Harlan Ellison quoted a phrase attributed to Michelangelo: "Trifles make perfection and perfection is no trifle." Mr. Ellison wrote about this: "Hardly a sentiment for our times, for a world of assembly lines and buck passing and litterbugs." I immediately thought these quotes described my sentiments perfectly.

There seems to be little attention payed to craftsmanship at least from what I've seen. No pride placed in the quality of work. Perhaps this is true of all professions, and that all professionals are too pressed for time to deal with the trifles of perfection. While I agree with Dijkstra about the impossibility of bug free software. Every trifle handled every bug fixed is one step closer toward that unattainable goal.

Friday, July 31, 2009

Vote Openly

It is my opinion that the voting kiosks used in elections should be using open source software. I know that I am not the first to come up with this idea but there seems to be little mention of this on the internet (not that I have looked particularly hard). Some might ask, what are the benefits of open source voting machines. The answer is simply transparency.

It is far too easy to say one thing in software and do another. Most of the time you will never be able to tell that this has happened either. In fact its probably harder, to do exactly what you intend most of the time. Software is difficult to get correct even in the best of times. And even though I doubt there is any attempt at the voting machine companies to corrupt the vote it probably happens just as simple computer bugs. A computer is (usually) perfect in its execution of code but its only as good as the instructions its given.

Making the source code of the voting machines open goes a long way to alleviating these problems. Having the eyes of hundreds of programmers in effect doing a peer review of the code produces some of the highest quality code in existence. And we will know that the software is at least trying to do what it is intended to do. It would also give security experts the ability to find and fix vulnerabilities that would probably otherwise go unnoticed.

This project would probably need to be taken up by a decent sized government body to be taken seriously. I think it would need the backing of at least one state to gain the widespread use required to achieve the benefits mentioned. The project would also be relativity cheap for the sponsors. They would need to provide space for web presence, source control and hire a small technical administration team to oversee the project. All things most state level governments have already. So the outlay is small compared so say, building a bridge.

Tuesday, July 28, 2009

Peering Into Static

There are several rarely used keywords in C. There is asm, register, volatile, union, extern, static.... Each is useful in certain situations but those situations arise with varying frequencies. Extern and static are the most used of the least common keywords but many people do not know what they really mean.

Part of the problem is that these keywords effectively mean different things depending on what they are used on. Static, when applied to a storage definition in a function, means "shared". To declare a variable static causes it to become part of the global data. So if you declare

static int foo;
foo will be shared by every invocation of the enclosing function. Whatever value you set foo to will still be there when the function is called the next time. I don't particularly recommend this use of static. The strtok function from the C standard library uses this trick to remember where the last token ended. However this means that you can only strtok one thing at a time and the function cannot be used by two threads at the same time. Because of that shared state the invocations would trample each other and neither thread get the correct information.

Static when applied to a function deceleration means something different. In that case static means that the function is not visible outside the current file (translation unit is the exact terminology). So even if you put a forward deceleration of the function in the header file the linker will be unable to find it if you try to use it. Global variables declared static follow the same rules as when a function is declared static. By making functions and globals that are not part of the interface of that module static, you reduce name-space pollution (of the compilation symbols) and can speed up the linking step of your compilations.

Extern is the sort of the opposite of static. When applied to global data it makes the variable visible outside the file. Think of it like this; an extern deceleration is exactly like a forward deceleration of a function in a header file. And should be used in the same way. If you have global data that is to be used in a public interface you should use extern to forward declare it in the header file.

It is apparent from what I've seen at work that many really don't know how to use extern and static. So here are my simple rules.

  1. If you write a function that is not and should not be used outside the current file declare it static. Also put it's forward deceleration (if you need one) in the C file not the h file.
  2. If you have global data that you want to make visible. Declare the data as extern in the header file and as normal global data in the file C file.

Thursday, July 23, 2009

Turtle Power!

Do you remember the programming language LOGO? I sure as heck don't that was way before my time. LOGO was designed to be used as a teaching language. It's most enduring feature was not the language though. LOGO has built in, a little graphics section of the language called turtle. The drawing commands were very easy things such as "move forward 5 units", "turn left 30 degrees", "pen up" and so forth. Python has a built in library that allows you to do turtle graphics and its quite fun to play with ( python's turtle package reference). So give it a shot. But I didn't write this post to say just that.

Thinking back to my first CS class it was pretty boring. We did very simple math for our first project. We calculated the area of some rectangles of all things. Which seemed a little trivial. A better waste of our time (At least I think) would have been drawing some simple geometric shapes with turtle graphics. The visual feedback of the little turtles moving along would be much more satisfying then just printing out some simple number to the console.

Not only that the turtle graphics could be used through out the entire class. So you begin with drawing simple polygons taking in the size to draw as a parameter. Next as the class moves into control structures they draw the Fibbonacci Spiral. Then while working on lists or arrays begin drawing fractals such as the Dragon Curve and the Levy C Curve both of which are easily made using list operations. Toward the end of the class they could even touch recursion by drawing the Sierpinski Triangle and the Koch Snowflake which both have natural recursive definitions.

The use of directed, hands on exprimentation is as applicable to computer science as it is to physics or chemistry. Programming is usually such a cranial abstract thing that many have trouble with their first class. The concrete, visual feedback of computer processing, would be a boon to true beginners in picking up on how their programming affects the state of the computer. And watching that little turtle draw lines on screen under your command will be much more satisfying to the beginner then std.out.println(length * width).

Monday, July 20, 2009

Go ahead punk, Make my day

As a follow up to When Just Works, Just Fails I thought that I might post a Makefile and explain what is going on. Now I am by no means a make master so there are probably better ways to write this script. So take this with a grain of salt and go research your own.

Now something to be aware of if this is your first rodeo with make. The characters you use to indent are important. A tab begins a command passed to the build shell and this script mixes tabs and spaces for indention. This is a somewhat unfortunate "feature" of make because it can really bite you in the end. Also, well, thats a pretty advanced script, so take your time. Even better go read some beginner tutorials on make then come back we will wait...........Back? ok Good.

CC := g++

The first somewhat non-obvious thing (if you are not familiar with make) is the assignment of "g++" to the variable "CC". Make has an entire database of built in rules. They are sometimes almost enough by them selves to build a simple project. CC is a variable in the rule to compile a plain old C source file, it is the name of the compiler used to build (gcc on my system). Normally gcc uses the file extension to tell what whether it is compiling c or c++ but the extension *.h is associated with c and will choke on c++. So by setting CC I have told gcc to compile all the code as c++.

INCLUDE_DIRS := ../inc
SRC_DIRS := ../src

SRC_FILES := $(notdir $(foreach d, $(addsuffix /*.cpp, $(SRC_DIRS)), $(wildcard $d)))
OBJ_FILES := $(addsuffix .o, $(basename $(SRC_FILES)))

Now this is some of my favorite magic in this script. This little script searches all the directories listed in SRC_DIRS for files with the extension .cpp. Then it uses the function notdir to get the names of the files without their paths. So we just found all of the source files for this project. (If the SRC_DIRS list is complete). OBJ_FILES is similar its the file name of every .o file this project will output in the building process by replacing .cpp with .o for every item in the SRC_FILES variable.

vpath %.h $(INCLUDE_DIRS)
vpath %.cpp $(SRC_DIRS)

The vpath lines tell make where to search for files it needs. The list of directories to search is associated with a pattern for what kind of files to find there. As an example vpath %.cpp SRC/ tells make that when it is looking for a .cpp file to include the SRC/ directory in it's search.

CXXFLAGS := $(addprefix -I , $(INCLUDE_DIRS))
CPPFLAGS = -MMD -MP -Wall

These two variables are flags passed to the compiler when it builds. They are just passed to g++ so look them up (man gcc is your friend). The most important for this script are -MMD and -MP these cause gcc to create files containing make rules that specify the dependency information for each file. This is makes main purpose. It uses the information from these files to decide what files it must rebuild.

.PHONY: all clean
all : Oddysey

Oddysey : $(OBJ_FILES)

This is pretty standard boiler plate make stuff. It says that the Oddysey project depends on all of the files in OBJ_FILES.

ifneq "$(MAKECMDGOALS)" "clean"
    include $(wildcard *.d)
endif

$(foreach src, $(SRC_FILES), \
    $(eval $(src:.cpp=.o) : $(src)))

Here is more magic for automatic dependency generation. The first part includes all those dependency files that have been generated by the build process. The second requires a little more explanation. The eval function runs any string you give it as part of the script. The foreach takes each source file in turn from the SRC_FILES variable, and creates a dependency rule linking a .o file with every .cpp file that the make script found in its search. This information is actually redundant most of the time. The .d dependency files included encode the same information. So why do we need this? Its a boot strap so that when you add a new .cpp file to the project it is automatically picked up and it's dependency information generated.

clean :
 rm $(OBJ_FILES)

This is more boiler-plate make. The clean rule traditionally deletes all the intermediate files created when building the project. So thats it my little make file. It actually outputs all the files built in the current directory. So to run it you must move into the build directory and invoke make with make -f ../Makefile

Thursday, July 9, 2009

When Just Works, Just Fails

I have a rough history with IDEs. It seems like every time that I try to use one it seems to get in the way more then it helps me. Generally I eschew the monstrosities for that exact reason. I have tried Netbeans, Eclipse, various Visual Studios, IDLE and probably many others. Generally my development environment consists of gVIM, bash, cscope and GNU make. Some might think me a little crazy to do things the "hard way" but I like the control it gives me.

Case in point, just for fun, last night I started working on an Android application. I installed the proper plug-in for Eclipse and started hacking away at the android version of "hello world". Well after writing the code I began to try to run it on the G1 simulator and "It Just Works" didn't. For the next four or so hours I was banging my head up against the opaque build process. Finally I fixed the problem. It was my own fault for skipping a step in the installation of the tools. Now I am not claiming that doing things with make, ant or whatever would have been any faster. However I feel like those hours are wasted because I don't really know anything more about the build environment. Had I been using my usual method of development I would probably have learned the entire build process in that time.

I am not recommending that you ditch your IDE "to each his own". But every developer should at sometime do at least a toy project the hard way. Learn how building works it will make you a better programmer in the long run. Once you do that learn your favorite IDE's build process, what tools it uses and how it invokes them. Then when something breaks you will better know how to fix it and will waste less time banging your head against "It Just Works".

Tuesday, July 7, 2009

Hold the Macro

It seems a little odd to me but I am probably the most knowledgeable C programmer where I work. Odd in that I've only had my CS degree for about 2 years now. But with that title comes a great deal of questions from the other programmers. Often having to do with my help debugging a problem they are having. One such problem was of such subtlety that it took quite a while to find and fix. The problem is a well known one but perhaps by discussing it here I can help some programmer avoid a bug in the future. Specifically it had to do with #defined macros.

Something many inexperienced C-programmers miss is the difference between a macro invocation and a call to a function. Traditional C is primarily what is called a Call-By-Value language. What this means is that when a function is called like foo(1 + 2). 1 + 2 is evaluated and the value 3 is given to the function.

That is not true for macros. Macros actually replace the text of your code with their bodies. This makes macros act like they use a parameter passing method called Call-By-Name which is very different to Call-By-Value. But I digress let me just show you the problem code and explain what happened. The offending code was something like this:

    #define abs(val) ((val > 0)? val : -val)

    ....

    if(abs(lhs - rhs) < 45) {
        do something;
    } //end if
Did you spot the problem there?
Hint 1: If you didn't think about replacing the abs(lhs - rhs) with the macro text and see if you see it.
Hint 2: It has to do with when lhs - rhs <= 0
Answer: After the macro expansion the if construct looks like this
    if(((lhs - rhs > 0)? lhs - rhs: -lhs - rhs) < 45) {
        do stuff;
    } //end if
see the false clause of the "( )? : " (Which is called the ternary operator if you want to look it up) it's wrong. So when (lhs - rhs) is zero or less you get something other then the negation of the value calculated like was intended. This one problem could have been fixed by:
    #define abs(val) ((val < 0)? (val): -(val))
Which makes that particular define act more like any normal C-function.

There is another problem with this macro however; if the expression passed in val had side effects (changed the state of memory and not just computed a value) things would still be wrong. Say you had a function:

    int only_run_me_once(void);
And you used even the corrected macro above to take the absolute value of it's return. Guess what you just ran that function twice, because the code path through that macro has val in it twice. And if only_run_me_once is called that because it breaks something the second time its run you've a different very difficult to find bug. My advice is that if you don't know exactly what you are doing don't mess with macros (except for maybe to learn exactly what you are doing). Create a small function and trust your compiler to inline it for you.

Friday, July 3, 2009

Fail Safe, but Fail Loudly

I am starting this blog on the advice of one of my coworkers. He tells me that I might be a great help to other programmers by talking about my thoughts and experiences as a software engineer. Reader beware however I am a poor writer as far as spelling and grammar go. Hopefully my computer will help me with the former but readers will just have to bare with the latter. With that preamble done on to the meat of my first entry.

Recently I have been creating an application that communicates with a credit card processor. The processor uses a proprietary ASCII based data format. The structure of the protocol is not bad consisting of variable-length positional fields separated by file separator characters. The difficulty is that the software that accepts the messages at the processor only very weakly validates the input. This simple fact makes it very difficult to debug the communication. The processor will gleefully accept even wildly, obviously wrong input and approve the transaction. Even though the input message might be missing some information necessary to the transaction. The only way to be sure that things are right is to contact the employee that does certification of your application and ask them to validate the input. She generally responds quickly but is very busy herself and is not available all of the time. I cannot count how many hours that I have wasted on this project for this reason. Several milestones have been missed because I cannot tell if my stuff is correct and the certification employee is not available for a few days. So now comes the lesson that I want you to take away from this anecdote.

When designing and implementing any software interface for use by other developers; one of your goals should be to fail safe but fail spectacularly. Never just squelch errors, silent failures can cause extremely difficult to find bugs. Your software should wail like a banshee that something has gone wrong and it could not handle it. To me this seems a obvious but I have met several developers who never learned this lesson. The main reason for this is to show developers where the bugs in their software are; "the squeaky wheel gets the grease" so to speak. By making the failures loud you create an interface that is easy to debug and promote good programming practices.