And now stays faith, hope, charity, these three; but the greatest of these is charity. - 1 cor 13
Saturday, November 7, 2009
Charity
Monday, November 2, 2009
Volatile the unpredictable keyword
int keep_going = true;
void thread(void) {
while(keep_going) {
....do stuff
} //end while
} //end thread
He had a problem that his main thread would set keep_going to false in an attempt to stop the thread. However the compiler had determined that keep_going was a lively variable and kept it entirely in the register. So the thread would never exit because the program was not reading the updated value from memory. When he described the problem to me I told him, basically what I've said about volatile and told him to use it. So theres another of C's more esoteric keywords explained.
Saturday, October 31, 2009
The language I would like to see
- Manual Memory Management with compiler supported hooks for library based garbage collectors
- Arbitrary memory manipulation through pointers or something similar
- Inline assembly
- Object Oriented
- The ability to create objects with sub-byte packing of data elements
- Very minimal support environment
Wednesday, October 21, 2009
I'm a rel bad spelr
Tuesday, October 20, 2009
And The 121 Dollar Word Is!
Design From Your User's Standpoint
Tuesday, October 13, 2009
Please Be Nice and Include Everyone
Monday, October 5, 2009
Betting Against the Tortoise and Hare
BloomFilter set;
Iterator loop_beginning = null;
float prob = 1.0;
//go through the entire list
for(Iterator iter = list.beginning();
iter.hasNext(); ++iter) {
//is the node in the set
if(set.contains(*iter)) {
//it is! we may be in a loop.
//is this the first node in this possible loop?
if(loop_beginning == null) {
prob = set.falsePositiveRate();
loop_beginning = iter;
} else {
prob *= set.falsePositiveRate();
} //end if
//is the probability low enough?
if(prob < errorThreshold) {
//good enough for us return the
//first node in the loop.
return loop_beginning;
} //end if
} else {
//nope not in the set. We have not seen
//this node before so add it to the set.
set.add(*iter);
//no loop
loop_beginning = null;
} //end if else
} //end for
return null;
The algorithm is pretty easy it goes around the list until the node being examined is in the bloom filter. We could stop there but it could be that we just found a false positive. Bloom filters never say something is not in the set when it is so we check to see that the next few nodes are in the set also. Every contiguous positive result reduces the probability that we are examining a false positive. So we just go until the probability that this is a false positive drops to an acceptable level.
So when would you use this? You probably wouldn't; not unless you have a known upper bound on the number of nodes and you can't have two iterators walking the sequence for some contrived reason. The size of the bloom filter will grow linearly with the upper bound of links but it will be considerably smaller then storing the keys themselves. Even with that space advantage the bloom filter might not fit in memory. But like I said it was fun coming up with this.
edit: I must have been really fried when I wrote that algorithm the first time. So its better now.
edit-edit: Argh! one day ill get that pseudo-code right.
Thursday, October 1, 2009
A Bit on Bits
onesCount[0] = 0
for i = 1 to 255
onesCount[i] = onesCount[i / 2] + (i & 0x1)
See, short. This follows the basic structure of any Dynamic Programming algorithm. Basically this builds a look-up table for a single byte (i.e. onesCount[x] is the number of bits set in the integer x). The first line sets up the trivial answer to the problem's base case. Being that the integer zero contains no set bits. Then the for-loop precedes to fill in the rest of the table with the correct values. How does that work?
Now since we start at 0 and work our way upward we know for sure that the value for i / 2 has already been computed. So we know the number of bits set in that number. Well i / 2 basically chops the right most bit off the number (its a one position right bit shift). So that tells us how many bits have been set in all but the one's position and all that remains is to check out if that last bit is set or not. If it is the number of bits set in i is one more then in i / 2. So i & 0x1 is one in exactly that situation.
So thats the dynamic programming algorithm for constructing that look-up table. Do you see how it fits into the dynamic programming strategy. Be on the lookout for other algorithms from the dynamic programming family. They are always interesting.
Sunday, August 9, 2009
Trifles
Friday, July 31, 2009
Vote Openly
Tuesday, July 28, 2009
Peering Into Static
static int foo;foo will be shared by every invocation of the enclosing function. Whatever value you set foo to will still be there when the function is called the next time. I don't particularly recommend this use of static. The strtok function from the C standard library uses this trick to remember where the last token ended. However this means that you can only strtok one thing at a time and the function cannot be used by two threads at the same time. Because of that shared state the invocations would trample each other and neither thread get the correct information. Static when applied to a function deceleration means something different. In that case static means that the function is not visible outside the current file (translation unit is the exact terminology). So even if you put a forward deceleration of the function in the header file the linker will be unable to find it if you try to use it. Global variables declared static follow the same rules as when a function is declared static. By making functions and globals that are not part of the interface of that module static, you reduce name-space pollution (of the compilation symbols) and can speed up the linking step of your compilations. Extern is the sort of the opposite of static. When applied to global data it makes the variable visible outside the file. Think of it like this; an extern deceleration is exactly like a forward deceleration of a function in a header file. And should be used in the same way. If you have global data that is to be used in a public interface you should use extern to forward declare it in the header file. It is apparent from what I've seen at work that many really don't know how to use extern and static. So here are my simple rules.
- If you write a function that is not and should not be used outside the current file declare it static. Also put it's forward deceleration (if you need one) in the C file not the h file.
- If you have global data that you want to make visible. Declare the data as extern in the header file and as normal global data in the file C file.
Thursday, July 23, 2009
Turtle Power!
Monday, July 20, 2009
Go ahead punk, Make my day
CC := g++The first somewhat non-obvious thing (if you are not familiar with make) is the assignment of "g++" to the variable "CC". Make has an entire database of built in rules. They are sometimes almost enough by them selves to build a simple project. CC is a variable in the rule to compile a plain old C source file, it is the name of the compiler used to build (gcc on my system). Normally gcc uses the file extension to tell what whether it is compiling c or c++ but the extension *.h is associated with c and will choke on c++. So by setting CC I have told gcc to compile all the code as c++.
INCLUDE_DIRS := ../inc SRC_DIRS := ../src SRC_FILES := $(notdir $(foreach d, $(addsuffix /*.cpp, $(SRC_DIRS)), $(wildcard $d))) OBJ_FILES := $(addsuffix .o, $(basename $(SRC_FILES)))Now this is some of my favorite magic in this script. This little script searches all the directories listed in SRC_DIRS for files with the extension .cpp. Then it uses the function notdir to get the names of the files without their paths. So we just found all of the source files for this project. (If the SRC_DIRS list is complete). OBJ_FILES is similar its the file name of every .o file this project will output in the building process by replacing .cpp with .o for every item in the SRC_FILES variable.
vpath %.h $(INCLUDE_DIRS) vpath %.cpp $(SRC_DIRS)The vpath lines tell make where to search for files it needs. The list of directories to search is associated with a pattern for what kind of files to find there. As an example vpath %.cpp SRC/ tells make that when it is looking for a .cpp file to include the SRC/ directory in it's search.
CXXFLAGS := $(addprefix -I , $(INCLUDE_DIRS)) CPPFLAGS = -MMD -MP -WallThese two variables are flags passed to the compiler when it builds. They are just passed to g++ so look them up (man gcc is your friend). The most important for this script are -MMD and -MP these cause gcc to create files containing make rules that specify the dependency information for each file. This is makes main purpose. It uses the information from these files to decide what files it must rebuild.
.PHONY: all clean all : Oddysey Oddysey : $(OBJ_FILES)
This is pretty standard boiler plate make stuff. It says that the Oddysey project depends on all of the files in OBJ_FILES.
ifneq "$(MAKECMDGOALS)" "clean"
include $(wildcard *.d)
endif
$(foreach src, $(SRC_FILES), \
$(eval $(src:.cpp=.o) : $(src)))
Here is more magic for automatic dependency generation. The first part includes all those dependency files that have been generated by the build process. The second requires a little more explanation. The eval function runs any string you give it as part of the script. The foreach takes each source file in turn from the SRC_FILES variable, and creates a dependency rule linking a .o file with every .cpp file that the make script found in its search. This information is actually redundant most of the time. The .d dependency files included encode the same information. So why do we need this? Its a boot strap so that when you add a new .cpp file to the project it is automatically picked up and it's dependency information generated.
clean : rm $(OBJ_FILES)
This is more boiler-plate make. The clean rule traditionally deletes all the intermediate files created when building the project. So thats it my little make file. It actually outputs all the files built in the current directory. So to run it you must move into the build directory and invoke make with make -f ../Makefile
Thursday, July 9, 2009
When Just Works, Just Fails
I am not recommending that you ditch your IDE "to each his own". But every developer should at sometime do at least a toy project the hard way. Learn how building works it will make you a better programmer in the long run. Once you do that learn your favorite IDE's build process, what tools it uses and how it invokes them. Then when something breaks you will better know how to fix it and will waste less time banging your head against "It Just Works".
Tuesday, July 7, 2009
Hold the Macro
It seems a little odd to me but I am probably the most knowledgeable C programmer where I work. Odd in that I've only had my CS degree for about 2 years now. But with that title comes a great deal of questions from the other programmers. Often having to do with my help debugging a problem they are having. One such problem was of such subtlety that it took quite a while to find and fix. The problem is a well known one but perhaps by discussing it here I can help some programmer avoid a bug in the future. Specifically it had to do with #defined macros.
Something many inexperienced C-programmers miss is the difference between a macro invocation and a call to a function. Traditional C is primarily what is called a Call-By-Value language. What this means is that when a function is called like foo(1 + 2). 1 + 2 is evaluated and the value 3 is given to the function.
That is not true for macros. Macros actually replace the text of your code with their bodies. This makes macros act like they use a parameter passing method called Call-By-Name which is very different to Call-By-Value. But I digress let me just show you the problem code and explain what happened. The offending code was something like this:
#define abs(val) ((val > 0)? val : -val)
....
if(abs(lhs - rhs) < 45) {
do something;
} //end if
Did you spot the problem there?
Hint 1: If you didn't think about replacing the abs(lhs - rhs) with the macro text and see if you see it.
Hint 2: It has to do with when lhs - rhs <= 0
Answer: After the macro expansion the if construct looks like this
if(((lhs - rhs > 0)? lhs - rhs: -lhs - rhs) < 45) {
do stuff;
} //end if
see the false clause of the "( )? : " (Which is called the ternary operator if you want to look it up) it's wrong. So when (lhs - rhs) is zero or less you get something other then the negation of the value calculated like was intended. This one problem could have been fixed by:
#define abs(val) ((val < 0)? (val): -(val))
Which makes that particular define act more like any normal C-function.
There is another problem with this macro however; if the expression passed in val had side effects (changed the state of memory and not just computed a value) things would still be wrong. Say you had a function:
int only_run_me_once(void);
And you used even the corrected macro above to take the absolute value of it's return. Guess what you just ran that function twice, because the code path through that macro has val in it twice. And if only_run_me_once is called that because it breaks something the second time its run you've a different very difficult to find bug. My advice is that if you don't know exactly what you are doing don't mess with macros (except for maybe to learn exactly what you are doing). Create a small function and trust your compiler to inline it for you.
Friday, July 3, 2009
Fail Safe, but Fail Loudly
I am starting this blog on the advice of one of my coworkers. He tells me that I might be a great help to other programmers by talking about my thoughts and experiences as a software engineer. Reader beware however I am a poor writer as far as spelling and grammar go. Hopefully my computer will help me with the former but readers will just have to bare with the latter. With that preamble done on to the meat of my first entry.
Recently I have been creating an application that communicates with a credit card processor. The processor uses a proprietary ASCII based data format. The structure of the protocol is not bad consisting of variable-length positional fields separated by file separator characters. The difficulty is that the software that accepts the messages at the processor only very weakly validates the input. This simple fact makes it very difficult to debug the communication. The processor will gleefully accept even wildly, obviously wrong input and approve the transaction. Even though the input message might be missing some information necessary to the transaction. The only way to be sure that things are right is to contact the employee that does certification of your application and ask them to validate the input. She generally responds quickly but is very busy herself and is not available all of the time. I cannot count how many hours that I have wasted on this project for this reason. Several milestones have been missed because I cannot tell if my stuff is correct and the certification employee is not available for a few days. So now comes the lesson that I want you to take away from this anecdote.
When designing and implementing any software interface for use by other developers; one of your goals should be to fail safe but fail spectacularly. Never just squelch errors, silent failures can cause extremely difficult to find bugs. Your software should wail like a banshee that something has gone wrong and it could not handle it. To me this seems a obvious but I have met several developers who never learned this lesson. The main reason for this is to show developers where the bugs in their software are; "the squeaky wheel gets the grease" so to speak. By making the failures loud you create an interface that is easy to debug and promote good programming practices.