Friday, January 29, 2010

MICR madness

Have you ever looked at the bottom of a check? Ever noticed that string of characters in a weird blocky font? Well that's a MICR line. MICR stands for "Magnetic Ink Character Recognition" and is a technology hailing from the 50's. Its sort of interesting actually. At the time digital cameras were unheard of, CCD was probably at best glimmer in some researcher's eye. But banks wanted a way for computers to get information off check without human intervention. So MICR was developed.

MICR reads the signature of the magnetic ink and uses it to recognise characters of certain specially designed fonts. That part of the technology works beautifully. There numerals and a few special characters in the MICR fonts being deemed sufficient to encode the information on a check. However something smells in the world of MICR.

The standard format for a MICR line in the United States is too lax for my taste. There are two common formats for MICR lines personal checks (anything other then a number is a special MICR character translated to ASCII):

:111111111: 2222222< 3333 ;44444444444;
Field 1 would be the routing number which is the most tightly controlled of the fields. It is an ABA number, has a check-sum and is pretty reliably just like I typed. Field 2 is the account number, and three the check number. These two have all kinds of problems. Field four is an optional amount of the check. That is what most personal checks look like. Business checks generally take the form:
<3333< :11111111: 2222222< ;44444444;
With the field numbers being the same thing as in the personal check.

This is all well and good. If that was the whole story things would be easy. But things are rarely so easy. Many banks, especially community banks, printed their checks with just about any format of the above fields they could come up with. Now the routing number and amount are consistent (they were specified by the original spec) but the order of fields, what separators are used where, where there are spaces and everything else seemingly nearly random. So producing a parser that can understand even a small subset of the checks in circulation can be a pretty good task.

If you can't tell I've been fighting the things lately and am a little frustrated. There is still a good bit of time spent by tellers hand-keying check information because of this wonky bit of banking history. Too bad the ABA did not come up with a much more exact specification allowing easy interoperability between banks. Of course some of the checks I've seen do not even follow the layout that the MICR standard did create, so the problem would still exist. However that's life.

Wednesday, January 27, 2010

The power of plain text

I am going to echo something from one of my favorite books about programming. In the Pragmatic Programmer a recommendation is made: Pick an advanced text editor and use it! Really learning an advanced text editor pays dividends, not only coding speed but in reduction of hand strain. Personally I use VIM as my text editor of choice but what editor really does not matter.

Try:

But the impotent thing is to pick one and learn it. Really spend some time in the learning it. I cannot count how many key strokes that VIM has saved me with things like:

  • :s/[ \t]+$//g
  • A,j.j.
The first eats all white space at the end of all of the lines in a file, the second inserted a comma at the end of three lines. In fact the '.' command is my absolute favorite in VIM it means repeat the last edit command again.

Learn how to customize your environment also VIM has it's scripting language (there are versions that can use Lua, Python and other scripting languages too). Emacs has a full blown LISP interpreter that can do anything you want it to. But the most important thing is to stick with it. If your company requires you use this IDE or that editor then learn that one like you would your main editor. your editor is your tool as a hammer to a carpenter or a brush to an artist, treat it with the respect any good professional treats his tools of the trade.

Monday, January 25, 2010

Cowabunga!

A while back in my Turtle Power! post I mentioned some fractals that I thought suitable for use teaching beginning programmers. I thought that I would present the code written to generate one of them, specifically the Levy C Curve. The code specifies the fractal as a L-system. Basically you start with a string of symbols then use certain rules recursively expand the string. Each symbol in the string becomes an instruction and the string itself a program for drawing the fractal. The Dragon Curve is generated in the same way. It simply has a slightly different formula.

The structure of an an L-system lends itself to implementation with very basic structures. The code below uses only two non-nested loops and a mulitway branch to do all its work. So I feel that it (and any L-system fractal) make good teaching material for control structures.

The C Curve's formula is this:

Start: "F"
Rules: "F" becomes "+F--F+"
Symbol meanings:
F - draw forward
+ - turn clockwise 45 degrees
- - turn counter-clockwise 45 degrees

And the required python code is:

import turtle 

SEGMENT_LENGTH = 2 
WINDOW_WIDTH = 772 
WINDOW_HEIGHT = 501 

if __name__ == "__main__": 

    move_list = "F" 

    for i in range(15): 
        move_list = move_list.replace("F", "+F--F+") 

    turtle.setup(width=WINDOW_WIDTH, height=WINDOW_HEIGHT) 
    turtle.tracer(False) 
    turtle.up() 
    turtle.goto(-180, -135) 

    turtle.down() 

    for i in move_list: 
        if i == "F": 
            turtle.forward(SEGMENT_LENGTH) 

        elif i == "+": 
            turtle.left(45) 

        else: 
            turtle.right(45) 

    turtle.Tkinter.mainloop()

Which is amazingly short for the complex beauty of the drawing it produces. That code was produced before the last major version number update of python and has not been upgraded. I think it will run with a new python installation but it might not.

Any numbers not mentioned in the curve formula are "Magic" (i.e. I fiddled with them until they produced a good looking picture). The structure should be obvious the first thing I do is produce the "program" to draw the curve using nothing but iteration and sub-string replacement. The rest of the program is the mini interpreter that takes the program and draws the picture. Remember the more times you iterate the string replace the more complex the fractal becomes. So go ahead and play with that a little and see what you think.

Friday, January 22, 2010

Multi-threaded nightmare

Over the past many months I have been working on a project for a major company (I'll not say who but their pretty big). The project was for a DLL that would communicate with outside hardware to run a financial transaction. But recently a problem with the DLL came up. It took quite some time to figure out what was going on. So I thought I would post it here so that someone finding themselves in a similar situation will have a less painful time finding the problem.

The bug in question is a classic Threading dead-lock. But one achieved through somewhat unusual means. There are two classes of functions in the DLL synchronous and asynchronous functions (there is really only one async function). Well the async functions signal certain events by using C# delegates as call backs.

The original API actually only had the synchronous functions exposed, however at the customer's request the async function was added. There in lies my downfall. I did not Analise the consequences of having the mishmash of functions adequately and now my DLL's interface has a dead-lock staring me right in the face.

What happens is this (I need a diagram to best explain but don't have the time to make one). There are two threads involved. The main thread of control in the customer's software we will call T1, the thread spawned to handle the async code T2. So T1 calls the async function and the afore mentioned T2 is spawned and begins doing it's thing. Properly T1 should wait for T2 to send the final "OK I'm done" event before doing anything with the interface, however "The best laid schemes o' mice an' men / Gang aft agley". When T1 decides it's tired of waiting they call a synchronous function, however the mutex that keeps the functions from trampling over T2's work is locked. So T1 begins waiting for T2 to unlock the mutex. At some point following this T2 reaches an event point and uses the callback to signal the fact. Well because of the way the customer's software must work this callback actually cross invokes a function on T1, but T1 is blocked, so the callback never returns and we have a deadlock.

That took forever to find and was a classic eureka moment when I did find it. But now I am stuck. Originally the code would just say "Busy leave me alone" when the mutex was locked, which alleviated the problem. The customer was unsatisfied with that though and required that each function not return until it's work was finished. Which given the semantics of the interface brought the deadlock to the fore. I also offered that I could delay the event call-backs until T1 was unblocked and able to handle them, which was also shot down. So now I need to break the interface to fix the problem but can't.... rock and a hard place indeed.

Thursday, January 21, 2010

ARGH again!?!

Man I thought people knew better.... That thing I mentioned back in please be nice and include everyone happened again! It's a different programmer and a different company this time but, come on! This time the programmer was linking against an external binary C library without a header file but still. If that function ever changes will the .c file forward declaration keep up? I doubt it very much. So its a big time bug waiting to happen. Please please please don't make the mistake this programmer has.

Wednesday, January 20, 2010

Make Pythagoras Proud

I am interested in a somewhat lesser known branch of algorithms. Geometric algorithms are algorithms that (duh) deal with geometry. In all of my schooling the only algorithm that I was ever exposed to from this class was Quick Hull. Which calculates the convex hull of a set of points (Go read that if you don't know what a convex hull is otherwise this probably won't make sense).

Quick Hull is optimal (in the big-O sense) for calculating the convex hull. It turns out that that the optimal class for such an algorithm is O(n log n) just like comparison based sorting! Why? Because we can use a convex hull to sort a set of points!

How you ask? Well first you should know that the hull algorithms produces edges of the polygon in sorted order. If you have a list of unique numbers to be sorted L[N]. First make a list of points P[N] = new point(L[i], L[i]^2) and make the convex hull of the point cloud. If you start from the lowest coordinate the sorted order of points will just be a traversal of the generated polygon (which direction depends on how you constructed it but its traditionally counter-clockwise).

So that means that if we could convex hull faster then O(n log n) we could sort faster too! It often amazes me how results from such disparate parts of computer science can be related so closely. Further it turns out that if the points are pre-sorted in certain ways you can build the hull in O(n). Which means most of the work done by the general hull algorithms is in sorting the edges.

Now there are hull algorithms that are "output sensitive". The best of those produce a hull in O(nh) with h being the number of points that form the hull. h is generally much smaller then n, because for random data most of the points in the set get thrown away when it is determined they are inside the hull. But we contrived the situation such that every point is on the hull (h == n) so those algorithms would be slower.

Convex Hulls are by no means the only algorithms in Computational Geometry (though they are one of the most important). Some of the algorithms have application in areas that you would never think of. So if you feel inclined I recommend you check Computational Geometry out.

Monday, January 18, 2010

Ordering Custom Software for non-programmers

This post is a little different from most of my others. I am directing it at marketing and management types that order software developed but are not themselves actively involved in software. That might be a little unusual because by Google analytics I know that right now I only have one regular reader (Hi Joseph), but if a management type happens to come across this, more the better.

It's a sad fact that something like 80% of software projects fail (are never completed, miss their target, come in late, over budget or a combination of these). The truth is that software is hard and most (even the development team themselves) underestimate the effort required to complete a project.

The other big problem is that software is almost always a moving target, requirements change constantly. The biggest help a software customer can be is to try and limit these changes to only those necessary. Know what you really want and discuss these wants with the architect for the team doing the work. Talking with marketing is not enough you really must get to the architect or risk your vision being mangled in translation from marketing to development.

First sit down and brain storm, what is the need the software is filling. Think hard on this, figure out exactly the need is. Come up with all the features that would be nice to have, maybe even draw up some quick mock-ups of the UI (if your program needs one). Now that you have that, go back and look at what you REALLY need, tear the software down to it's core features. Those features are your 1.0 software. Go over the whole list, including the wish-list you came up with first, with the architect but set your sight on that basic functionality.

By limiting the scope of the first version you make it significantly more likely that the project will succeed and your software need be fulfilled. Now go back and start meeting with the architect about your wish-list. In subsequent versions add more and more of those features. Until you have the software you dream of. This in my opinion is the best way to get satisfaction out of the software development process.

Wednesday, January 13, 2010

Interface vs. implementation dependencies

Every c/c++ programmer knows the difference between a *.h file and a *.c file. The h file is for the interface and the c file for the implementation. That is basic c. However many people miss one important aspect of this. That is classifying the module's dependencies based on whether they are dependencies of the interface or the implementation.

Where the rubber meets the road is in how you include your header files. Many developers do not take the time to separate their include directives into interface vs. implementation dependencies. Interface dependencies should be included in the header file, implementation dependencies in the source file. While violating this rule will not cause compile stopping problems (usually) it slows your compilation waaaay down.

I will admit this issue is about knee deep into the realm of nit-picking but every little bit helps. So next time you feel compelled to say #include in a header file think: is this really an interface dependency?

P.S. This applies to c++ too but the situation is somewhat more complicated.