comp.lang.ruby

finding blocks in black-and-white images (efficiently)

Axel Etzold

9/2/2008 11:52:00 AM

Dear all,

I have a number of black-and-white scanned pages. To prepare them for OCR,
I have to split them into columns and rows. In addition, there are pictures scattered
between the text blocks, which also need to be separated.

So, in a page that might look like this:

Text1 Text4 Text6

Text2 Pict1 Text7

Text3 Text5 Pict2

I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
and vertically.

Right now, I would use RMagick with export_pixels_to_str and then regular expressions to find the
zeros, but I am not sure whether there's a more effective way for this purpose....
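A minimal sketch of the underlying idea, independent of RMagick: once the page is available as rows of 0/1 values, the separators are the runs of rows (or columns) that contain no ink at all. The helper names `blank_runs` and `white_gaps`, and the convention that 0 means white and 1 means ink, are assumptions made for this illustration only.

```ruby
# Return [start, length] pairs for every run of true values in flags.
def blank_runs(flags)
  result = []
  start = nil
  flags.each_with_index do |blank, i|
    if blank
      start ||= i                     # open a run if none is open
    elsif start
      result << [start, i - start]    # close the run that just ended
      start = nil
    end
  end
  result << [start, flags.size - start] if start
  result
end

# page: array of equally long rows; 0 = white, 1 = ink (assumed convention).
def white_gaps(page)
  blank_rows = page.map { |row| row.all?(&:zero?) }
  blank_cols = page.transpose.map { |col| col.all?(&:zero?) }
  { rows: blank_runs(blank_rows), cols: blank_runs(blank_cols) }
end

page = [
  [1, 1, 0, 0, 1],
  [1, 0, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [1, 1, 0, 1, 1]
]

gaps = white_gaps(page)
# gaps[:rows] is [[2, 2]]: rows 2 and 3 are completely white
# gaps[:cols] is [[2, 1]]: column 2 is completely white
widest_row_gap = gaps[:rows].max_by { |_, len| len }
widest_col_gap = gaps[:cols].max_by { |_, len| len }
```

On a real scan a small amount of noise would probably have to be tolerated, for example by treating a row as blank when its ink count stays below some threshold rather than requiring exactly zero.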

Do you have any suggestions ?

Thank you very much,

Best regards,

Axel


6 Answers

Tim Hunter

9/2/2008 3:00:00 PM

Axel Etzold wrote:
> Dear all,
>
> I have a number of black-and-white scanned pages. To prepare them for OCR,
> I have to split them in columns and rows. Additionally, somewhere in between, there
> are pictures, which also need to be separated.
>
> So, in a page that might look like this:
>
> Text1 Text4 Text6
>
> Text2 Pict1 Text7
>
> Text3 Text5 Pict2
>
> I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
> and vertically.
>
> Right now, I would use RMagick with export_pixels_to_str and then regular expressions to find the
> zeros, but I am not sure whether there's a more effective way for this purpose....
>
> Do you have any suggestions ?
>
> Thank you very much,
>
> Best regards,
>
> Axel
>
>

I took the liberty of posting your question to the ImageMagick forum
[http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&a...].
There are some pretty good IM users on that forum, and usually it's not
hard to convert IM commands and options to RMagick code. If they have
any suggestions I'll let you know.

--
RMagick: http://rmagick.ruby...

ara.t.howard

9/2/2008 3:21:00 PM

On Sep 2, 2008, at 5:52 AM, Axel Etzold wrote:

> Dear all,
>
> I have a number of black-and-white scanned pages. To prepare them for OCR,
> I have to split them in columns and rows. Additionally, somewhere in between, there
> are pictures, which also need to be separated.
>
> So, in a page that might look like this:
>
> Text1 Text4 Text6
>
> Text2 Pict1 Text7
>
> Text3 Text5 Pict2
>
> I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
> and vertically.
>
> Right now, I would use RMagick with export_pixels_to_str and then regular expressions to find the
> zeros, but I am not sure whether there's a more effective way for this purpose....
>
> Do you have any suggestions ?
>
> Thank you very much,
>
> Best regards,
>
> Axel

you are attempting to roll your own image segmentation. google for
'computer vision'. some helpful links

http://kogs-www.informatik.uni-hamburg.de/~koe...

http://ww...

http://camellia.source...

it can be quite a different domain than normal image processing


a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama




Tim Hunter

9/4/2008 6:25:00 PM

Axel Etzold wrote:
> Dear all,
>
> I have a number of black-and-white scanned pages. To prepare them for OCR,
> I have to split them in columns and rows. Additionally, somewhere in between, there
> are pictures, which also need to be separated.
>
> So, in a page that might look like this:
>
> Text1 Text4 Text6
>
> Text2 Pict1 Text7
>
> Text3 Text5 Pict2
>
> I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
> and vertically.

Anthony, one of the IM team, has a suggestion you can read here:
http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=11980&p=39....
If this is something you want to pursue let me know and we can work on
converting his shell script to Ruby.


Axel Etzold

9/4/2008 8:18:00 PM

-------- Original Message --------
> Date: Fri, 5 Sep 2008 03:25:04 +0900
> From: Tim Hunter <TimHunter@nc.rr.com>
> To: ruby-talk@ruby-lang.org
> Subject: Re: finding blocks in black-and-white images (efficiently)

> Axel Etzold wrote:
> > Dear all,
> >
> > I have a number of black-and-white scanned pages. To prepare them for OCR,
> > I have to split them in columns and rows. Additionally, somewhere in between, there
> > are pictures, which also need to be separated.
> >
> > So, in a page that might look like this:
> >
> > Text1 Text4 Text6
> >
> > Text2 Pict1 Text7
> >
> > Text3 Text5 Pict2
> >
> > I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
> > and vertically.
>
> Anthony, one of the IM team, has a suggestion you can read here:
> http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=11980&p=39....
> If this is something you want to pursue let me know and we can work on
> converting his shell script to Ruby.


Dear Tim,

thank you very much for your help. This script does indeed look very interesting -- and very heroic!
It would be very nice to have it in RMagick, as far as I am concerned. I fear that my knowledge of shell scripting
and RMagick will not suffice to get it done quickly, so I'd appreciate some help converting it to
Ruby. Also, more generally, how do you wrap ImageMagick functions in RMagick? Do you call C functions?
When installing, I was lazy and took the gem option ;)

Thanks again and looking forward to your answer!

Best regards,

Axel





Tim Hunter

9/4/2008 9:24:00 PM

Axel Etzold wrote:
> thank you very much for your help. This script does indeed look very interesting -- and very heroic !
> It would be very nice to have it in RMagick,as far as I am concerned. I fear that my shell scripting
> capabilities/knowledge of RMagick will not suffice to get it done in a very short time, so I'd some help to convert it into
> Ruby. Also, more generally, how do you wrap ImageMagick functions in RMagick ? Do you call C functions ?
> At the install, I was lazy and took the gem option ;)

Okay, I'll see what I can do. I'll follow up with you directly. I'm
going out of town tomorrow so it may be a couple of days.

ImageMagick is essentially a library with a C-level API. (Actually there
are two APIs, MagickCore and MagickWand, but that's neither here nor
there.) The ImageMagick utilities (convert, mogrify, etc.) are
stand-alone programs that call into the library via the API. RMagick
uses the library, too.

Of course since RMagick is Ruby you get much more use out of the
ImageMagick library - access to individual pixels, for example - than
you can via the utilities, and Ruby makes it easier to use the API than
a shell scripting language does.
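To illustrate what pixel-level access could look like on the Ruby side, here is a minimal sketch that turns a raw pixel string into rows of 0/1 values. It assumes the string holds one byte per pixel in row-major order, which is what `export_pixels_to_str` is expected to return for a single-channel map such as "I" (worth checking against the RMagick documentation). The helper name `rows_from_bytes`, the threshold of 128 and the white = 0 convention are assumptions, and a literal string stands in for the RMagick call so that the example runs without the gem.

```ruby
# Hypothetical helper: split a raw pixel string into rows of 0/1 values.
# Assumes one byte per pixel in row-major order. Bytes at or above
# `threshold` count as white (0), everything else as ink (1).
def rows_from_bytes(bytes, width, threshold = 128)
  raise ArgumentError, "width must be positive" unless width.positive?
  unless (bytes.bytesize % width).zero?
    raise ArgumentError, "length is not a multiple of width"
  end

  bytes.unpack("C*")                            # one integer (0..255) per byte
       .map { |v| v >= threshold ? 0 : 1 }      # binarize
       .each_slice(width)                       # regroup into rows
       .to_a
end

# Stand-in for the string an RMagick export would return: a 3x2 image
# whose only dark pixel is the middle one of the first row.
raw = [255, 0, 255, 255, 255, 255].pack("C*")
bitmap = rows_from_bytes(raw, 3)
# bitmap is [[0, 1, 0], [0, 0, 0]]
```

Working on arrays like this is convenient but not necessarily fast for full-page scans; it is meant to show the data layout, not to settle the efficiency question.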

This page http://studio.imagemagick.org/RMagick/doc/opt...
describes some of the RMagick API that corresponds to the ImageMagick
commands and options.


Axel Etzold

9/4/2008 10:07:00 PM

-------- Original Message --------
> Date: Fri, 5 Sep 2008 06:23:54 +0900
> From: Tim Hunter <TimHunter@nc.rr.com>
> To: ruby-talk@ruby-lang.org
> Subject: Re: finding blocks in black-and-white images (efficiently)

> Axel Etzold wrote:
> > thank you very much for your help. This script does indeed look very interesting -- and very heroic !
> > It would be very nice to have it in RMagick,as far as I am concerned. I fear that my shell scripting
> > capabilities/knowledge of RMagick will not suffice to get it done in a very short time, so I'd some help to convert it into
> > Ruby. Also, more generally, how do you wrap ImageMagick functions in RMagick ? Do you call C functions ?
> > At the install, I was lazy and took the gem option ;)
>
> Okay, I'll see what I can do. I'll follow up with you directly. I'm
> going out of town tomorrow so it may be a couple of days.
>
> ImageMagick is essentially a library with a C-level API. (Actually there
> are two APIs, MagickCore and MagickWand, but that's neither here nor
> there.) The ImageMagick utilities (convert, mogrify, etc.) are
> stand-alone programs that call into the library via the API. RMagick
> uses the library, too.
>
> Of course since RMagick is Ruby you get much more use out of the
> ImageMagick library - access to individual pixels, for example - than
> you can via the utilities, and Ruby makes it easier to use the API than
> a shell scripting language does.
>
> This page http://studio.imagemagick.org/RMagick/doc/opt...
> describes some of the RMagick API that corresponds to the ImageMagick
> commands and options.


Tim,

Thank you very much for the pointers !

Best regards,

Axel
