Tag Archive: Optical Character Recognition


A sample of a ‘scanned’ puzzle.

So this is a section of a sudoku puzzle as the code sees it:

A small section of a Sudoku converted to 1's and 0's.

It’s a screen cap of some debug output pasted into gedit with a small font. It’s just 1's and 0's. A few interesting things here, apart from the noise around the lower left of the 8: it’s obvious which numbers were the ones printed in the paper (the 2 and the 3), and the supposedly perfect horizontal lines are visibly distorted compared to the verticals. That is entirely down to the camera angle, which I suspect is the one most people will photograph from unless they try for an exactly overhead shot. Now to test my detection routines.

Well that was sodding hard work….

OK, so I’ve praised MagickWand; now I feel I can curse it. I want a 1-bit BMP out of the thing. Writing a bitmap is easy, because it’s smart: give it a filename ending in jpg and it writes a JPEG, ask for bmp and it writes a BMP. However, it was writing 24-bit images. If I want to faff with it on a binary level myself, I need as little data as humanly possible. So I called all sorts of functions to set the bit depth and suchlike:

MagickSetImageDepth(contrast_wand,8);
or
MagickSetType(contrast_wand,GrayscaleType);
or
MagickMANYOTHERS

None work. I get the same 24-bit image. So I finally thought, fine… I’ll write my own routine to strip everything out and call it as a function:


static MagickBooleanType FadeToBlack(PixelView *black_view, void *context)
{
  MagickPixelPacket pixel;
  PixelWand **pixels;
  register long x;

  (void) context;  /* unused */
  pixels=GetPixelViewPixels(black_view);
  for (x=0; x < (long) GetPixelViewWidth(black_view); x++)
  {
    /* Do magic: force every colour channel of every pixel to zero */
    PixelGetMagickColor(pixels[x],&pixel);
    pixel.red=0;
    pixel.green=0;
    pixel.blue=0;
    PixelSetMagickColor(pixels[x],&pixel);
  }
  return(MagickTrue);  /* the iterator checks this status */
}
--SNIP--
black_view=NewPixelView(contrast_wand);
if(black_view == (PixelView *)NULL)
ThrowWandException(contrast_wand);
status=UpdatePixelViewIterator(black_view,FadeToBlack,(void*) NULL);
if(status == MagickFalse)
ThrowWandException(contrast_wand);
black_view=DestroyPixelView(black_view);

Well, you can guess how that ended up… with a black image. (I wrongly thought that, this being CMYK, there might be a pixel.black.) The above routine would have worked with a bit of analysis in it. It would still have been a 24-bit image, though. So I thought… well, WWPD? (Photoshop) Quantize! It turns out it’s one line:

/* Quantize image to 2? Please! */
MagickQuantizeImage(contrast_wand,2,GRAYColorspace,0,MagickFalse,MagickFalse);

And I get my precious:

diziet@ono-sendai:~/Programming/C/OCRSudoku$ file sudoku2bt50.bmp
sudoku2bt50.bmp: PC bitmap, Windows 3.x format, 800 x 739 x 1
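
Going back to FadeToBlack for a moment: the "bit of analysis" it was missing is, presumably, a per-pixel choice between black and white rather than black for everything. Here is a minimal sketch of that choice in plain C. It deliberately avoids MagickWand types: RgbPixel, black_or_white and the cutoff parameter are all made up for illustration, and the brightness weights are the usual Rec. 601 luma approximation, not anything taken from my actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one RGB pixel with 8-bit channels;
   this is not MagickWand's MagickPixelPacket. */
typedef struct {
    unsigned char red, green, blue;
} RgbPixel;

/* Force a pixel to pure black or pure white depending on its brightness.
   cutoff (0..255) is an assumed tuning parameter. */
void black_or_white(RgbPixel *p, unsigned char cutoff)
{
    unsigned int luma;
    unsigned char v;

    assert(p != NULL);
    /* Integer approximation of Rec. 601 luma: 0.299 R + 0.587 G + 0.114 B */
    luma = (299u * p->red + 587u * p->green + 114u * p->blue) / 1000u;
    v = (luma < cutoff) ? 0 : 255;
    p->red = p->green = p->blue = v;
}
```

Even with this in place the file would still come out as 24 bits per pixel, which is why quantizing was the real fix.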

I am hoping that on a binary level this is really simple to parse. I mean… it doesn’t have much data, surely? 800×739 bits plus a header? Nope, sadly not. Ah well, it’s only 73kB.
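
The 73kB figure is at least consistent with a plain uncompressed 1-bit BMP. The sketch below does the arithmetic, assuming the common layout of a 14-byte file header, a 40-byte info header and a two-entry palette of 4 bytes each, with every row padded to a multiple of 4 bytes. The function names are mine, and I have not checked that ImageMagick writes exactly this layout.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per row of a 1-bit BMP: 8 pixels per byte, then padded up
   to a multiple of 4 bytes. */
size_t bmp1_row_bytes(size_t width)
{
    return ((width + 31) / 32) * 4;
}

/* Expected size of an uncompressed 1-bit BMP, assuming a 14-byte file
   header, a 40-byte info header and a 2-entry palette (4 bytes each). */
size_t bmp1_file_size(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return 14 + 40 + 2 * 4 + bmp1_row_bytes(width) * height;
}
```

For 800 x 739 that is 100 bytes per row and 73,962 bytes in total under those assumptions, so the pixel data really is just the bits plus 62 bytes of header and palette.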

EDIT

Oh fucksocks, wish it’d dawned on me sooner. Isn’t the X11 XBM format, err… ASCII? Old-school icons etc… So…:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./thresh-sigmoidal sudoku.jpg sudoku2bt50.xbm 50

diziet@ono-sendai:~/Programming/C/OCRSudoku$ file sudoku2bt50.xbm
sudoku2bt50.xbm: ASCII C program text

Did that say C? C? Hell I can do C!


#define sudoku2bt50_width 800
#define sudoku2bt50_height 739
static char sudoku2bt50_bits[] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x30, 0xC3, 0xFC, 0xFF, 0x0F, 0x1C, 0x04, 0x00,
0x00, 0x00, 0xC0, 0xFB, 0x7F, 0x1F, 0x00, 0xC0, 0xE1, 0xFF, 0xFE, 0x00,
etc.

Oh baby! Right… time for more ModNation Racers or Final Fantasy ∞.

EDIT

OK I can load and display OK:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./testxbm test_bits.xbm
1111111111111111
1000000000000001
1011111111111101
1010000000000101
1010111111110101
1010100000010101
1010101111010101
1010101001010101
1010101001010101
1010101111010101
1010100000010101
1010111111110101
1010000000000101
1011111111111101
1000000000000001
1111111111111111
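
For the curious, here is roughly what a dump like this involves. This is a sketch rather than the actual testxbm source, and the function names are invented. It relies on two properties of XBM data: each row is padded to a whole number of bytes, and within a byte the leftmost pixel is the least significant bit. Since XBM files declare the array as char, cast it to unsigned char before passing it in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes per row in XBM data: each row is padded to a whole byte. */
size_t xbm_row_bytes(size_t width)
{
    return (width + 7) / 8;
}

/* Return pixel (x, y) as 0 or 1. Within each byte the leftmost pixel
   is stored in the least significant bit. */
int xbm_get_pixel(const unsigned char *bits, size_t width, size_t x, size_t y)
{
    assert(bits != NULL && x < width);
    return (bits[y * xbm_row_bytes(width) + x / 8] >> (x % 8)) & 1;
}

/* Print the image as text, one line of '1'/'0' characters per pixel row. */
void xbm_print(const unsigned char *bits, size_t width, size_t height, FILE *fp)
{
    size_t x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++)
            fputc(xbm_get_pixel(bits, width, x, y) ? '1' : '0', fp);
        fputc('\n', fp);
    }
}
```

With the .xbm file #included, a call along the lines of xbm_print((const unsigned char *) test_bits, test_width, test_height, stdout) would do the job, assuming the file uses those identifier names.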

Obv. it’s a bit messy to display the 800-pixel-wide sudoku:


diziet@ono-sendai:~/Programming/C/OCRSudoku$ ./testxbm sudoku2bt50.xbm |more
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000110011000011001111111111111111110000001110000010000000000000000000000000000000000011110111111111111011111000000000
00000000111000011111111111011111110000000011111111111100001111000011110000

etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. etc. for another 738 lines… It works! Now for the hard part.

MagickWand and OCR

The problem with photographs and OCR is that they contain lots of noise and colour that is really not helpful. Levels and Posterise, alongside other tools in the GIMP or Photoshop, help, but they’re hardly automatic and can’t be compiled into code, so I looked at ImageMagick. It turns out it has a new simpler API called MagickWand. One of the examples on the site is a sigmoidal contrast enhancer, whatever that really does. It helps. I decided to take that code and add a call to MagickWand’s adaptive threshold function, which takes the size in pixels of the square it applies its algorithm to, thinking there would be a sweet spot where it occluded most of the noise but picked up on the grid and the pen strokes of the digits. I was right. It’s anywhere between 40 and 100 pixels, as the sequence below shows.
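
To give a feel for what an adaptive threshold does with that square, here is a naive local-mean version in plain C. It is a sketch of the general technique, not ImageMagick's implementation. The function name, the offset parameter and the "1 means ink" output convention are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive local-mean adaptive threshold on an 8-bit greyscale image.
   For each pixel, average the window x window neighbourhood centred on
   it (clipped at the image edges). The output is 1 where the pixel is
   darker than that mean minus offset, otherwise 0. */
void adaptive_threshold(const unsigned char *gray, unsigned char *out,
                        size_t width, size_t height,
                        size_t window, int offset)
{
    size_t half = window / 2;
    size_t x, y, xx, yy;

    assert(gray != NULL && out != NULL && window > 0);
    for (y = 0; y < height; y++) {
        size_t y0 = (y > half) ? y - half : 0;
        size_t y1 = (y + half < height) ? y + half : height - 1;
        for (x = 0; x < width; x++) {
            size_t x0 = (x > half) ? x - half : 0;
            size_t x1 = (x + half < width) ? x + half : width - 1;
            unsigned long sum = 0, count = 0;
            long mean;

            for (yy = y0; yy <= y1; yy++)
                for (xx = x0; xx <= x1; xx++) {
                    sum += gray[yy * width + xx];
                    count++;
                }
            mean = (long)(sum / count);
            out[y * width + x] =
                ((long)gray[y * width + x] < mean - offset) ? 1 : 0;
        }
    }
}
```

A small window reacts to every speck of noise, while a very large one starts to behave like a single global threshold, which is presumably why there is a sweet spot. This version redoes the whole sum for every pixel, so it is slow on an 800-pixel image with a 50-pixel window; a running sum or integral image would be the obvious speed-up.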

Now I just need to implement the idea I have, which as ever is cribbed. We detect the grid and the boundaries of each cell. Then we detect the edges of the contents of that cell and quarter it. We should therefore have our number in quarters. By comparing the shape of each quarter to a database (nearest match?) we can determine the number. I’ll post some images as examples below later. You’ll note that these posts are very light on the math. :P
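
None of this is written yet, but the quartering idea could be sketched along these lines. Everything here is an assumption: the cell is taken to be a small array of 0/1 values, the "shape" of each quarter is reduced to the fraction of inked pixels in it, and the database is just a flat array of four-number templates compared by squared distance. A real version would want a richer description of each quarter.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of set pixels in each quarter of the ink's bounding box, in
   the order top-left, top-right, bottom-left, bottom-right. cell is a
   w x h array of 0/1 values. Returns 0 if the cell contains no ink. */
int quarter_features(const unsigned char *cell, size_t w, size_t h,
                     double feat[4])
{
    size_t x, y, mx, my, x0 = w, y0 = h, x1 = 0, y1 = 0;
    size_t ink[4] = {0, 0, 0, 0}, area[4] = {0, 0, 0, 0};
    int q;

    assert(cell != NULL && feat != NULL);
    /* Find the bounding box of the ink. */
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            if (cell[y * w + x]) {
                if (x < x0) x0 = x;
                if (x > x1) x1 = x;
                if (y < y0) y0 = y;
                if (y > y1) y1 = y;
            }
    if (x0 == w)
        return 0;

    /* Split the box at its midpoint and count ink per quarter. */
    mx = x0 + (x1 - x0 + 1) / 2;
    my = y0 + (y1 - y0 + 1) / 2;
    for (y = y0; y <= y1; y++)
        for (x = x0; x <= x1; x++) {
            q = (y >= my ? 2 : 0) + (x >= mx ? 1 : 0);
            area[q]++;
            if (cell[y * w + x])
                ink[q]++;
        }
    for (q = 0; q < 4; q++)
        feat[q] = area[q] ? (double)ink[q] / (double)area[q] : 0.0;
    return 1;
}

/* Index of the template nearest to feat by squared Euclidean distance.
   templates holds n entries of 4 doubles each, stored back to back. */
size_t nearest_template(const double feat[4], const double *templates,
                        size_t n)
{
    size_t i, best = 0;
    double best_d = -1.0;
    int q;

    assert(feat != NULL && templates != NULL && n > 0);
    for (i = 0; i < n; i++) {
        double d = 0.0;
        for (q = 0; q < 4; q++) {
            double diff = feat[q] - templates[i * 4 + q];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

Using the bounding box of the ink rather than the whole cell means the result should not depend much on where in the cell the digit was printed or written.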
