Spectemur Agendo: July 2009

Friday, July 31, 2009

Vote Openly

It is my opinion that the voting kiosks used in elections should run open source software. I know that I am not the first to come up with this idea, but there seems to be little mention of it on the internet (not that I have looked particularly hard). Some might ask: what are the benefits of open source voting machines? The answer is simply transparency.

It is far too easy to say one thing in software and do another, and most of the time you will never be able to tell that it has happened. In fact, it's probably harder to do exactly what you intend. Software is difficult to get correct even in the best of times. Even though I doubt there is any attempt by the voting machine companies to corrupt the vote, corruption probably happens anyway through simple computer bugs. A computer is (usually) perfect in its execution of code, but it's only as good as the instructions it's given.

Making the source code of the voting machines open goes a long way toward alleviating these problems. Having hundreds of programmers in effect peer review the code is how some of the highest quality code in existence gets produced. We would also know that the software is at least trying to do what it is intended to do. And it would give security experts the ability to find and fix vulnerabilities that would probably otherwise go unnoticed.

This project would probably need to be taken up by a decent sized government body to be taken seriously. I think it would need the backing of at least one state to gain the widespread use required to achieve the benefits mentioned. The project would also be relatively cheap for the sponsors. They would need to provide space for a web presence and source control, and hire a small technical administration team to oversee the project, all things most state level governments have already. So the outlay is small compared to, say, building a bridge.

Tuesday, July 28, 2009

Peering Into Static

There are several rarely used keywords in C: asm, register, volatile, union, extern, static.... Each is useful in certain situations, but those situations arise with varying frequencies. Extern and static are the most used of the least common keywords, but many people do not know what they really mean.

Part of the problem is that these keywords effectively mean different things depending on what they are applied to. Static, when applied to a variable definition inside a function, means "shared". Declaring a variable static makes it part of the global data, even though its name is still only visible inside the function. So if you declare

static int foo;

foo will be shared by every invocation of the enclosing function. Whatever value you set foo to will still be there the next time the function is called. I don't particularly recommend this use of static. The strtok function from the C standard library uses this trick to remember where the last token ended. However, this means that you can only strtok one thing at a time, and the function cannot be used by two threads at the same time. Because of that shared state the invocations would trample each other and neither thread would get the correct information.
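
To make the shared-state problem concrete, here is a minimal sketch. The names next_ticket and next_ticket_r are made up for illustration; they are not from any library. The first function hides its counter in a static local, the second makes the caller own the state, which is the same idea POSIX uses for strtok_r.

```c
/* Hidden state: the static local keeps its value between calls,
 * so every caller (and every thread) shares the same counter. */
int next_ticket(void)
{
    static int last = 0;   /* set to 0 once, at program start */
    last += 1;
    return last;
}

/* Caller-owned state: the same logic, but the caller passes in the
 * counter, so two independent sequences cannot trample each other. */
int next_ticket_r(int *last)
{
    *last += 1;
    return *last;
}
```

With next_ticket_r, two parts of a program can each keep their own int and count independently. With next_ticket there is only ever one sequence, no matter who calls it.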

Static when applied to a function declaration means something different. In that case static means that the function is not visible outside the current file (translation unit is the exact terminology). So even if you put a forward declaration of the function in the header file, the linker will be unable to find it if you try to use it from another file. Global variables declared static follow the same rules as functions declared static. By making functions and globals that are not part of the interface of that module static, you reduce namespace pollution (of the compilation symbols) and can speed up the linking step of your compilations.

Extern is sort of the opposite of static. When applied to global data it tells the compiler that the variable is defined in some other file, which is what lets code outside the defining file use it. Think of it like this: an extern declaration is exactly like a forward declaration of a function in a header file, and should be used in the same way. If you have global data that is to be used in a public interface, you should use extern to forward declare it in the header file.

It is apparent from what I've seen at work that many really don't know how to use extern and static. So here are my simple rules.

If you write a function that is not and should not be used outside the current file, declare it static. Also put its forward declaration (if you need one) in the C file, not the h file.
If you have global data that you want to make visible, declare the data as extern in the header file and as normal global data in the C file.
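
Here is a rough sketch of the two rules applied together. It is squeezed into a single listing so it compiles on its own. The file names tally.h and tally.c, and every identifier in it, are hypothetical.

```c
/* ---- what would go in a hypothetical tally.h ---- */
extern int tally_total;        /* declaration only: the definition lives in tally.c */
void tally_add(int amount);    /* public function, visible to other files */

/* ---- what would go in a hypothetical tally.c ---- */
static int clamp_to_zero(int amount);   /* forward declaration stays in the C file */

int tally_total = 0;           /* the one real definition of the global */

void tally_add(int amount)
{
    tally_total += clamp_to_zero(amount);
}

/* static: internal linkage, so no other file can call this helper */
static int clamp_to_zero(int amount)
{
    return amount < 0 ? 0 : amount;
}
```

Any file that includes the header can read tally_total and call tally_add, but clamp_to_zero never shows up in the symbols other files can link against.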

Thursday, July 23, 2009

Turtle Power!

Do you remember the programming language LOGO? I sure as heck don't; that was way before my time. LOGO was designed to be used as a teaching language. Its most enduring feature was not the language itself, though. Built into LOGO is a little graphics section of the language called turtle. The drawing commands were very easy things such as "move forward 5 units", "turn left 30 degrees", "pen up" and so forth. Python has a built-in library that allows you to do turtle graphics, and it's quite fun to play with (see Python's turtle package reference). So give it a shot. But I didn't write this post to say just that.

Thinking back to my first CS class, it was pretty boring. We did very simple math for our first project. We calculated the area of some rectangles, of all things, which seemed a little trivial. A better waste of our time (at least I think) would have been drawing some simple geometric shapes with turtle graphics. The visual feedback of the little turtles moving along would be much more satisfying than just printing out some simple number to the console.

Not only that, the turtle graphics could be used throughout the entire class. So you begin with drawing simple polygons, taking in the size to draw as a parameter. Next, as the class moves into control structures, they draw the Fibonacci spiral. Then, while working on lists or arrays, they begin drawing fractals such as the Dragon Curve and the Levy C Curve, both of which are easily made using list operations. Toward the end of the class they could even touch recursion by drawing the Sierpinski Triangle and the Koch Snowflake, which both have natural recursive definitions.

The use of directed, hands-on experimentation is as applicable to computer science as it is to physics or chemistry. Programming is usually such a cranial, abstract thing that many have trouble with their first class. The concrete, visual feedback of computer processing would be a boon to true beginners in picking up on how their programming affects the state of the computer. And watching that little turtle draw lines on screen under your command will be much more satisfying to the beginner than System.out.println(length * width).

Monday, July 20, 2009

Go ahead punk, Make my day

As a follow-up to When Just Works, Just Fails, I thought that I might post a Makefile and explain what is going on. Now, I am by no means a make master, so there are probably better ways to write this script. So take this with a grain of salt and go do your own research.

Now, something to be aware of if this is your first rodeo with make: the characters you use to indent are important. A tab begins a command passed to the build shell, and this script mixes tabs and spaces for indentation. This is a somewhat unfortunate "feature" of make because it can really bite you in the end. Also, well, that's a pretty advanced script, so take your time. Even better, go read some beginner tutorials on make and then come back. We will wait... Back? OK, good.

CC := g++

The first somewhat non-obvious thing (if you are not familiar with make) is the assignment of "g++" to the variable "CC". Make has an entire database of built-in rules, which are sometimes almost enough by themselves to build a simple project. CC is a variable in the rule to compile a plain old C source file; it is the name of the compiler used to build (gcc on my system). Normally gcc uses the file extension to tell whether it is compiling C or C++, but the extension *.h is associated with C and gcc will choke on C++ in such a file. So by setting CC I have told make to run g++, which treats all the code as C++. (GNU make's built-in rule for .cpp files actually compiles with CXX; CC is also what the built-in rule for linking object files uses, so pointing it at g++ makes the final link use the C++ driver as well.)

INCLUDE_DIRS := ../inc
SRC_DIRS := ../src

SRC_FILES := $(notdir $(foreach d, $(addsuffix /*.cpp, $(SRC_DIRS)), $(wildcard $d)))
OBJ_FILES := $(addsuffix .o, $(basename $(SRC_FILES)))

Now this is some of my favorite magic in this script. These two lines search all the directories listed in SRC_DIRS for files with the extension .cpp, then use the function notdir to get the names of the files without their paths. So we have just found all of the source files for this project (if the SRC_DIRS list is complete). OBJ_FILES is similar: it's the file name of every .o file this project will output during the build, made by replacing .cpp with .o for every item in the SRC_FILES variable.

vpath %.h $(INCLUDE_DIRS)
vpath %.cpp $(SRC_DIRS)

The vpath lines tell make where to search for files it needs. The list of directories to search is associated with a pattern for what kind of files to find there. As an example, vpath %.cpp SRC/ tells make that when it is looking for a .cpp file it should include the SRC/ directory in its search.

CXXFLAGS := $(addprefix -I , $(INCLUDE_DIRS))
CPPFLAGS = -MMD -MP -Wall

These two variables are flags passed to the compiler when it builds. They are just passed to g++, so look them up (man gcc is your friend). The most important for this script are -MMD and -MP; these cause gcc to create files containing make rules that specify the dependency information for each file. Tracking dependencies is make's main purpose. It uses the information from these files to decide what files it must rebuild.

.PHONY: all clean
all : Oddysey

Oddysey : $(OBJ_FILES)

This is pretty standard boilerplate make stuff. It says that the Oddysey project depends on all of the files in OBJ_FILES.

ifneq "$(MAKECMDGOALS)" "clean"
    include $(wildcard *.d)
endif

$(foreach src, $(SRC_FILES), \
    $(eval $(src:.cpp=.o) : $(src)))

Here is more magic for automatic dependency generation. The first part includes all those dependency files that have been generated by the build process. The second requires a little more explanation. The eval function runs any string you give it as part of the script. The foreach takes each source file in turn from the SRC_FILES variable and creates a dependency rule linking a .o file with every .cpp file that the make script found in its search. This information is actually redundant most of the time, because the included .d dependency files encode the same information. So why do we need this? It's a bootstrap, so that when you add a new .cpp file to the project it is automatically picked up and its dependency information generated.

clean :
	rm $(OBJ_FILES)

This is more boilerplate make. The clean rule traditionally deletes all the intermediate files created when building the project. So that's it, my little makefile. It actually outputs all the built files into the current directory, so to run it you must move into the build directory and invoke make with make -f ../Makefile

Thursday, July 9, 2009

When Just Works, Just Fails

I have a rough history with IDEs. It seems like every time I try to use one it gets in the way more than it helps me. Generally I eschew the monstrosities for that exact reason. I have tried Netbeans, Eclipse, various Visual Studios, IDLE and probably many others. Generally my development environment consists of gVIM, bash, cscope and GNU make. Some might think me a little crazy to do things the "hard way", but I like the control it gives me.

Case in point: just for fun, last night I started working on an Android application. I installed the proper plug-in for Eclipse and started hacking away at the Android version of "hello world". Well, after writing the code I tried to run it on the G1 simulator, and "It Just Works" didn't. For the next four or so hours I was banging my head against the opaque build process. Finally I fixed the problem. It was my own fault for skipping a step in the installation of the tools. Now, I am not claiming that doing things with make, ant or whatever would have been any faster. However, I feel like those hours were wasted because I don't really know anything more about the build environment. Had I been using my usual method of development I would probably have learned the entire build process in that time.

I am not recommending that you ditch your IDE; to each his own. But every developer should at some time do at least a toy project the hard way. Learn how building works; it will make you a better programmer in the long run. Once you do that, learn your favorite IDE's build process, what tools it uses and how it invokes them. Then when something breaks you will better know how to fix it and will waste less time banging your head against "It Just Works".

Tuesday, July 7, 2009

Hold the Macro

It seems a little odd to me, but I am probably the most knowledgeable C programmer where I work. Odd in that I've only had my CS degree for about 2 years now. But with that title comes a great many questions from the other programmers; often they want my help debugging a problem they are having. One such problem was of such subtlety that it took quite a while to find and fix. The problem is a well known one, but perhaps by discussing it here I can help some programmer avoid a bug in the future. Specifically, it had to do with #defined macros.

Something many inexperienced C programmers miss is the difference between a macro invocation and a call to a function. Traditional C is primarily what is called a Call-By-Value language. What this means is that when a function is called like foo(1 + 2), the expression 1 + 2 is evaluated and the value 3 is given to the function.

That is not true for macros. Macros actually replace the text of your code with their bodies. This makes macros act like they use a parameter passing method called Call-By-Name, which is very different from Call-By-Value. But I digress; let me just show you the problem code and explain what happened. The offending code was something like this:

    #define abs(val) ((val > 0)? val : -val)

    ....

    if(abs(lhs - rhs) < 45) {
        do something;
    } //end if

Did you spot the problem there?
Hint 1: If you didn't, think about replacing the abs(lhs - rhs) with the macro text and see if you see it.
Hint 2: It has to do with when lhs - rhs <= 0
Answer: After the macro expansion the if construct looks like this:

    if(((lhs - rhs > 0)? lhs - rhs: -lhs - rhs) < 45) {
        do stuff;
    } //end if

See the false clause of the "( )? : " (which is called the ternary operator if you want to look it up)? It's wrong: -lhs - rhs negates only lhs, so it is not the same as -(lhs - rhs). So when (lhs - rhs) is zero or less you get something other than the negation of the value, which was what was intended. This one problem could have been fixed by:

    #define abs(val) (((val) > 0)? (val): -(val))

Which makes that particular define act more like any normal C-function.
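
If you want to see the difference for yourself, here is a small sketch with both versions side by side. I've renamed them ABS_BUGGY and ABS_PAREN (names picked just for this example) so they don't collide with the abs in stdlib.h, and within_45 is a made-up stand-in for the original if statement.

```c
/* The macro from the bug report: the parameter is pasted in bare. */
#define ABS_BUGGY(val) ((val > 0) ? val : -val)

/* Every use of the parameter wrapped in parentheses. */
#define ABS_PAREN(val) (((val) > 0) ? (val) : -(val))

/* The comparison from the post, written against the parenthesized macro.
 * Returns 1 when lhs and rhs are less than 45 apart, 0 otherwise. */
int within_45(int lhs, int rhs)
{
    return ABS_PAREN(lhs - rhs) < 45;
}
```

With lhs = 3 and rhs = 5, ABS_BUGGY(3 - 5) expands to ((3 - 5 > 0) ? 3 - 5 : -3 - 5), which is -8, while ABS_PAREN(3 - 5) gives the 2 you were expecting.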

There is another problem with this macro however; if the expression passed in val had side effects (changed the state of memory and not just computed a value) things would still be wrong. Say you had a function:

    int only_run_me_once(void);

And you used even the corrected macro above to take the absolute value of its return value. Guess what: you just ran that function twice, because the code path through that macro has val in it twice. And if only_run_me_once is called that because it breaks something the second time it's run, you have a different, very difficult to find bug. My advice is that if you don't know exactly what you are doing, don't mess with macros (except maybe to learn exactly what you are doing). Create a small function and trust your compiler to inline it for you.
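
A quick sketch of the double evaluation, assuming a made-up function read_sensor standing in for only_run_me_once. It bumps a counter every time it runs so you can see how often it was really called. ABS_MACRO and abs_int are also names invented for this example.

```c
#define ABS_MACRO(val) (((val) > 0) ? (val) : -(val))

int call_count = 0;   /* how many times read_sensor has run */

/* Hypothetical function with a side effect: it counts its own calls. */
int read_sensor(void)
{
    call_count += 1;
    return -4;
}

/* Function version: the argument is evaluated exactly once, before
 * the call, because C passes arguments by value. */
int abs_int(int val)
{
    return val > 0 ? val : -val;
}
```

ABS_MACRO(read_sensor()) runs read_sensor twice, once for the comparison and once for whichever branch is taken, so call_count goes up by 2. abs_int(read_sensor()) returns the same 4 but call_count only goes up by 1.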

Friday, July 3, 2009

Fail Safe, but Fail Loudly

I am starting this blog on the advice of one of my coworkers. He tells me that I might be a great help to other programmers by talking about my thoughts and experiences as a software engineer. Reader beware, however: I am a poor writer as far as spelling and grammar go. Hopefully my computer will help me with the former, but readers will just have to bear with the latter. With that preamble done, on to the meat of my first entry.

Recently I have been creating an application that communicates with a credit card processor. The processor uses a proprietary ASCII based data format. The structure of the protocol is not bad, consisting of variable-length positional fields separated by file separator characters. The difficulty is that the software that accepts the messages at the processor only very weakly validates the input. This simple fact makes it very difficult to debug the communication. The processor will gleefully accept even wildly, obviously wrong input and approve the transaction, even though the input message might be missing some information necessary to the transaction. The only way to be sure that things are right is to contact the employee who does certification of your application and ask her to validate the input. She generally responds quickly but is very busy herself and is not available all of the time. I cannot count how many hours I have wasted on this project for this reason. Several milestones have been missed because I cannot tell if my stuff is correct and the certification employee is not available for a few days. So now comes the lesson that I want you to take away from this anecdote.

When designing and implementing any software interface for use by other developers, one of your goals should be to fail safe but fail spectacularly. Never just squelch errors; silent failures can cause extremely difficult to find bugs. Your software should wail like a banshee that something has gone wrong and it could not handle it. To me this seems obvious, but I have met several developers who never learned this lesson. The main reason for this is to show developers where the bugs in their software are; "the squeaky wheel gets the grease", so to speak. By making the failures loud you create an interface that is easy to debug and promote good programming practices.
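
As a rough illustration of what "loud" could look like, here is a sketch of a validator for a separator-delimited message. This is not the processor's real format, which I can't share. The function check_message, the status names, and the rule that every field must be non-empty are all assumptions made up for the example.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_SEP '\034'   /* ASCII file separator (0x1C) */

enum parse_status {
    PARSE_OK = 0,
    PARSE_NULL_MESSAGE,
    PARSE_EMPTY_FIELD,
    PARSE_WRONG_FIELD_COUNT
};

/* Check that msg holds exactly `expected` non-empty fields. Every
 * failure is reported on stderr and returned to the caller; nothing
 * is silently accepted. */
enum parse_status check_message(const char *msg, int expected)
{
    if (msg == NULL) {
        fprintf(stderr, "check_message: message is NULL\n");
        return PARSE_NULL_MESSAGE;
    }

    int fields = 0;
    const char *start = msg;
    for (;;) {
        const char *end = strchr(start, FIELD_SEP);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        fields += 1;
        if (len == 0) {
            fprintf(stderr, "check_message: field %d is empty\n", fields);
            return PARSE_EMPTY_FIELD;
        }
        if (end == NULL)
            break;          /* that was the last field */
        start = end + 1;    /* step past the separator */
    }

    if (fields != expected) {
        fprintf(stderr, "check_message: expected %d fields, got %d\n",
                expected, fields);
        return PARSE_WRONG_FIELD_COUNT;
    }
    return PARSE_OK;
}
```

The point is not the parsing itself. A caller who sends a message with a missing field gets a distinct status code and a line on stderr naming the problem, instead of an approval that hides the bug.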
