Tag Archive: C


A coding dilemma.

This happens quite a lot in programming I find. There’s the ideal way of doing something and the way you actually end up doing it, be it due to buggy libraries, quirks of code, time or inexperience. I found one earlier today that had caught me out before. Modern cameras and images can have the orientation of the picture embedded. For example older cameras held normally may produce an image that is (widthxheight) 1024×768 and vertically an image of 768×1024. However a lot of the time now both images will internally be 1024×768 and have an orientation field (probably held in exif data I expect although in UIImage it’s a @!£%£@$^ read only property called imageOrientation) that informs you which way up the camera was when the shot was taken. Excellent. Not.

The problem with this occurs if you grab the raw image data to manipulate it and then convert it back. Say from a UIImage in cocoa/ukit whatever to an IplImage in opencv. What happens, if you had the camera held vertically, when you display your processed image is that it has rotated 90º ccw because you have not ported back the orientation data. That’s fine, copy the orientation data, then port it back. Except you can’t as apple in their infinite wisdom protect you from manipulating such things! I worked around this by avoiding having to manipulate my image that way within opencv. Many many others have worked around these problems by copying the orientation data and then using that to rotate their image before copying it back thus nullifying the need for orientation.

So to my specific problem. Opencv provides many many useful tools for discovering image data. It tends to present a lot of this data back to us as linked lists of points and the like. It also provides you with tools to draw said points in a meaningful way. One of which is cvDrawContours which takes a sequence of points and draws them using ‘pen’ information you provide (size, colour, etc.) onto your image. If this is done with a vertically taken iPhone camera image, copied to an IplImage, drawn on with cvDrawContours and then copied back to a UIImage to display your picture rotates 90º ccw. There are two potential solutions here:

  • Write aforementioned code that takes the old orientation into account and rotates the image post manipulation.
  • Rewrite the functions you need to to do any image drawing direct on your UIImage or Image object.

It might not be obvious but the former method is probably the worst of the two. It’s the quickest in terms of implementation. I think there are around 8 types of orientation of which 4 are mirrors so you only have to write code to rotate and/or mirror your image. However the other option is to simply doodle on your original using the data from opencv. Far more complicated for sure but it should be more efficient than rotating all those pixels about. I’ll find out when I’m done.

To do this I took the cvDrawImage code directly out of cxdrawing.cpp along with it’s helper functions and inline data structures in opencv and pasted it into my ImageView.m file. I renamed the function to uiDrawImage and altered the first parameter away from an IplImage to a UIImage. I then changed all function parameters away from a c++ style to an objective-c++ style. Of course it’s c++ so I had to rename the source file to ImageView.mm to get it compiled as objective-c++. No other changes needed except judicious deleting of functions that I shouldn’t have copied. The linker warnings gave them away so no problem there.

Now, a sequence of points in opencv is thankfully a doubly linked list with no tail and end. It’s effectively a circle of co-ordinates. This makes it easy to traverse and as it’s a list of structures that contain x&y floats or ints or whatever we should be able to rip out a lot of the code from cvDrawImage and get it so when it’s got the points it just adds them to a CGMutablePathRef. That’s effectively the same thing in cocoa/ukit whatever it is. After that there is a function provided that does the same thing as cvDrawImage but with a UIImage or Image. I’ve not looked that far yet.

If you hadn’t guessed this post is mostly laying out my current thinking on how to tackle this in the best way, not the way I’ll probably end up doing it. We’ll see.

OpenCV is pretty darn excellent.

It’s a heck of a lot faster than my code, the canny/sobel operators especially. Also it works. See photos for more.

The first is the original with 5×5 blur to reduce the noise, the second is adaptive threshold and the third is a canny edge detection. Takes about a 1/100th of the time to run on my netbook compared to image magick and my own stuff. Awesome.

Nothing to see here….

I’ve hit somewhat of a brick wall. I managed to get some morphological hit&miss code that thinned the images down to 1px wide and it worked fine on the previously posted test. However after that I started getting horrible loops and things attached to the digits ruining all chance of OCR from them. Coupled with the fact that additional photos taken with an actual iPhone camera as opposed to my Lumix don’t come out too well it’s a bit of a dead loss right now. On the plus side I have implemented my own code to write out bitmap files so once I’ve got my thresholded quantized image I’m free of MagickWand. Potentially I can implement the magic wand stuff myself now too and do away with it completely. However these problems are in dire need of solving:

1) Identifying some criteria upon which to threshhold and image (currently what radius to use in the process). Minor changes make the world of difference although 40-45px is still somewhat optimal.
2) Some way of RIPing the image. (Remove Isolated Pixels).
3) Some way of perhaps filling in the obvious holes, morphological thickening sucks.
4) A better approach to thinning, perhaps, if 2&3 don’t help with the current method.
5) A corner detection algorithm that works and isn’t either: (c)Dera/QinetiQ, Some Student or Patented and aww screw let’s face it, I need to do my own.

On the plus side a trip to the Central Resources Library for Hertfordshire has yielded useful material.

A sample of a ‘scanned’ puzzle.

So this is a section of a sudoku puzzle as the code sees it:

A small section of a Sudoku converting 1's and 0's.

It’s a screen cap of some debug output put into GEdit with a small font. It’s just 1′s and 0′s. A few interesting things here apart from the noise around the lower left of 8 is that it’s clearly obvious which numbers were the ones printed in the paper, the 2 and 3, along with the fact that the perfect horizontal lines are visibly distorted compared to the verticals. This is entirely down to the angle that I suspect most people will photograph things from unless they try for an exactly overhead shot. Now to test my detection routines.

EDIT@22:14 I take it back. The data was getting borked on exit from the function. Turns out that the code in use had RETURN() as a macro which freed the pointer I was using when I called RETURN(TRUE). Haha…. still doesn’t answer my original question below though.
/EDIT

I thought they were essentially the same. A pointer to a pointer is the address of an address and surely the address of a pointer is an address of an address? I’ve clearly forgotten something fundamental here so on the off chance somebody can help. I’ve done this a few times now and fixed it without really thinking what I’m doing. This time however it caught me out. For the whole of today I’ve been fiddling with one routine only to find a final problem when I’d hashed out the algorithm to use. I have this defined as a function:


bool read_bitmap_file_data(unsigned char *cstream, unsigned int *width, unsigned int *height, unsigned char **data);

It used to take a file handle but it’s easier to bung it a stream of characters as variable cstream. data is what I receive back. A pointer to a pointer. Within the function it goes through and parses the stream adding bits to data as follows:


unsigned char *ptr;
int bytes;

for (bytes = 0, ptr = bits; bytes < size; bytes++, ptr++) {
if ((value = next_int (&p_cstream)) < 0)
RETURN (FALSE);
*ptr=value;
}

The above snippet is actually stolen from an old file called xbm.c (google it). After that loop it does this assignment if bits has data:


*data = bits;

If I define the variable passed to it as a pointer to a pointer like so:


unsigned char **hex_sudoku;
read_bitmap_file_data(xbm_sudoku,&width,&height,hex_sudoku);

It segfaults on the assignment mentioned above. If I declare the parameter and call the function like so:


unsigned char *hex_sudoku;
read_bitmap_file_data(xbm_sudoku,&width,&height,&hex_sudoku);

it is fine. In both instances the contents of the variable bits is identical so it is something about declaring a pointer to a pointer and just passing it as opposed to declaring a pointer and passing the address. What am I actually passing in the first instance if not an address?

Ah well... it all works now and I have not exactly wasted my time here.

Well that was sodding hard work….

OK so I’ve praised MagicWand now I feel I can curse it. I want a 1bit BMP out of the thing. A bitmap is easy, it’s smart. I write something ending in jpg it writes a jpg, I want a bmp it writes one. However it was writing 24bit images. If I want to faff with it on a binary level myself I need as little data as humanly possible. So I call all sorts of functions to set the bit depth and such like:

MagickSetImageDepth(contrast_wand,8);
or
MagickSetType(contrast_wand,GrayscaleType);
or
MagickMANYOTHERS

None work. I get the same 24bit image. So I finally thought fine… I’ll write my own routing to strip everything out and call it as a function:


static MagickBooleanType FadeToBlack(PixelView *black_view, void *context)
{
MagickPixelPacket pixel;
PixelWand **pixels;
register long x;

pixels=GetPixelViewPixels(black_view);
for (x=0; x < (long) GetPixelViewWidth(black_view); x++)
{
/* Do magic */
PixelGetMagickColor(pixels[x],&pixel);
pixel.red=0;
pixel.green=0;
pixel.blue=0;
PixelSetMagickColor(pixels[x],&pixel);
}
}
--SNIP--
black_view=NewPixelView(contrast_wand);
if(black_view == (PixelView *)NULL)
ThrowWandException(contrast_wand);
status=UpdatePixelViewIterator(black_view,FadeToBlack,(void*) NULL);
if(status == MagickFalse)
ThrowWandException(contrast_wand);
black_view=DestroyPixelView(black_view);

Well you can guess how that ended up… with a black image (I thought wrongly being CMYK there might be a pixel.black. The above routine would have worked with a bit of analysis in it. Still with a 24bit image though. So I thought… well WWPD? (Photoshop) Quantize! So it turns out it’s one line:

/* Quantize image to 2? Please! */
MagickQuantizeImage(contrast_wand,2,GRAYColorspace,0,MagickFalse,MagickFalse);

And I get my precious:

diziet@ono-sendai:~/Programming/C/OCRSudoku$ file sudoku2bt50.bmp
sudoku2bt50.bmp: PC bitmap, Windows 3.x format, 800 x 739 x 1

I am hoping that on a binary level this is really simple to parse. I mean… It doesn’t have much data surely? 800×739 bits plus header? Nope sadly not. Ah well it’s only 73kB.

EDIT

Oh fucksocks, wish it’d dawned on me sooner. Isn’t the X11 xbm format err… ascii? Old school icons etc… So…..:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./thresh-sigmoidal sudoku.jpg sudoku2bt50.xbm 50

diziet@ono-sendai:~/Programming/C/OCRSudoku$ file sudoku2bt50.xbm
sudoku2bt50.xbm: ASCII C program text

Did that say C? C? Hell I can do C!


#define sudoku2bt50_width 800
#define sudoku2bt50_height 739
static char sudoku2bt50_bits[] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x30, 0xC3, 0xFC, 0xFF, 0x0F, 0x1C, 0x04, 0x00,
0x00, 0x00, 0xC0, 0xFB, 0x7F, 0x1F, 0x00, 0xC0, 0xE1, 0xFF, 0xFE, 0x00,
etc.

Oh baby! Right… time for more Modnations Racer or Final Fantasy ∞.

EDIT

OK I can load and display OK:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./testxbm test_bits.xbm
1111111111111111
1000000000000001
1011111111111101
1010000000000101
1010111111110101
1010100000010101
1010101111010101
1010101001010101
1010101001010101
1010101111010101
1010100000010101
1010111111110101
1010000000000101
1011111111111101
1000000000000001
1111111111111111

Obv. it’s a bit messy to display the 800pixel sudoku:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./testxbm sudoku2bt50.xbm |more
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000110011000011001111111111111111110000001110000010000000000000000000000000000000000011110111111111111011111000000000
00000000111000011111111111011111110000000011111111111100001111000011110000

etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. for another 738 lines… It works! Now for the hard part.

MagickWand and OCR

The problem with photographs and OCR is they contain lots of noise and colour that is really not helpful.  Levels and Posterise alongside other tools in the GIMP or Photoshop help but they’re hardly automatic and can’t be compiled into code so I looked at ImageMagick.  It turns out it has a new simpler API called MagickWand.  One of the examples on the site is a Sigmoidal contrast enhancer, whatever that really does.  It helps,  I decided to take that code and call an adaptive threshold function in MagicWand that takes a number of pixels as a square to apply it’s algorithm to.  Thinking there would be a sweet spot where it occluded most of the noise but picked up on the grid and the pen strokes of the letters.  I was right.  It’s anywhere between 40 and 100 pixels as the sequence below shows, mouse over for a bit of info.

Now I just need to implement the idea I have, which as ever is cribbed.   We detect the grid and the boundaries of each cell.  Then we detect the edges of the contents of that cell and quarter it.  We should therefore have in 1/4′s our number.  By comparing the shape of each quarter to a database (nearest match?) we can determine the number.  I’ll post some images as examples below later.  You’ll note that these posts are very light on the math. :P

A strange bug moving from 64bit to 32bit.

So I installed an SSH client on my iPhone today, TouchTerm (which so far I think is excellent, especially how it handles ctrl keys etc.), and the first thing I do? SSH to my server and compile the previously mentioned code. I load it and after changing the font to a fixed width one it all looks good. The puzzle loads, I hit ‘g’ to solve it and …. oh… it got part of the way through the recursive solution and stopped saying ‘Solution Found’. Except it hadn’t. I coded a refresh routine with vi on the phone to make sure it wasn’t a terminal or iphone issue, nope. It really had got part way in and decided the problem was fixed. How odd. The code was written in 64bit Intrepid running in VirtualBox on 64bit Win 7. The server was (it’s upgrading now) running 32bit Jaunty and is only a 32bit system anyhow.

Anyone? Strange strange bug. If the upgrade fixes it I will be surprised as the only external library of note is ncurses and it really isn’t an ncurses problem. My data structures are fairly boring:


struct _cell {
bool readonly;
int value;
};

struct _sudoku {
bool complete;
int iteration;
struct _cell cells[SUDOKU_VALUES][SUDOKU_VALUES];
};

And of that I only actually use the two dimensional array and the value. The readonly, complete and iteration fields are there in case I do implement weirder algorithms as they will be necessary then for distinguishing user input from generated across generations and keeping track of generations, so ancient incomplete ones can be culled as evolutionary dead ends. :) Of course the second one is marked complete the program would end and display it. Anyhow I find the above problem odd, when the upgrades are complete I’ll check and if it persists drop it into gdb for a bit of an autopsy.

It lives!

Finally. Stripped out loads of code having realised that keeping a list of possible answers to each square is all well and good but the computational time spent maintaining it outweighs the benefits of just err.. well figuring it out each time you need to know. So

For some reason the above thumbs are truncated on the left. The clickable images are not.


diziet@ono-sendai:~/Programming/C/Sudoku$ time ./sudoku

real 0m4.071s
user 0m0.390s
sys 0m0.110s

in an ubuntu virtualbox instance on a windows7 box that’s watching a movie. :P

Yay. What next? The backtrack algorithm I used is essentially a brute force. I may now implement my non brute force version to compare. Although I have a feeling I am painting a new room at the village hall tomorrow.

Now with added video?

Sudoku Solver

So ncurses operates on a y,x co-ordinate system.  All of it’s functions expect y first, for example:

getyx(y,x);

which gets the location of the cursor putting the values into y and x.  This sounds a bit counter intuitive to us, we are used to x then y. We think of going across the screen and then down.  So throughout my code I’ve thought of ncurses as counting down the screen then across and my code as counting across the screen and then down.  Which is right and the way I want it. Except when you declare an array as:

int fish[12][9];

You have 12 rows and 9 columns. It is also Y and then X. So a loop such as:


for(int x=0;x {
for(int y=0;y {
array[x][y]=0;
}
}

Does go down and across. So although conceptually you are going across then down if you print the array out with the same logic via say printf("%d ",array[x][y]) you'll find you print down and across. I think.... :P

So I have a sudoku puzzle where the first row at the start is effectively:

7 1

Yet my save routine saved out:


diziet@ono-sendai:~/Programming/C/Sudoku$ cat dm3.sud
0 0 0 0 7 1 0 8 3
0 0 0 0 0 0 2 5 0
0 1 0 0 0 3 4 0 0
7 0 3 5 0 0 0 0 0
0 0 0 0 0 8 9 1 0
0 0 0 9 0 0 0 0 0
0 0 0 0 6 0 1 0 8
1 6 0 0 0 4 0 0 0
0 4 0 0 2 0 3 0 0

So all of my loops have held the data going down and then across the array.   This is not a problem, it changes nothing really except it makes the save files look funny.   So I changed it all thus producing a patch from git with:

git format-patch origin/master

all because ncurses functions put y before x thus making me think it was odd. A snippet from the patch:


@@ -547,12 +547,12 @@ int display_sudoku(WINDOW *wi_sudoku, struct _sudoku *st_sudoku)

getyx(wi_sudoku,old_dy,old_dx);

- for(int x=0; x
+ for(int y=0; y
{
- for(int y=0; y
+ for(int x=0; xcells[x][y].value;
+ cell_value = st_sudoku->cells[y][x].value;

Ho hum.  At least now I can type in a potential sudoku once and keep loading it to test various ideas.  Of course we all know the backtracking algorithm will be the best but it's not the point of this entire exercise anyway.

Powered by WordPress | Theme: KLG based on Motion by 85ideas.