comp.lang.ruby

String 'close-to' comparison.

Kyle Hunter

4/3/2008 4:32:00 AM

Hello, I have an array. It contains approximately twenty elements which
are strings. I also have one string - this string was obtained using an
OCR system. One of the strings in the array should 'match' the string
gotten using the OCR system - unfortunately OCRs aren't perfect!

I want to take this string, and compare it to every string in the array,
and attempt to return the closest match.

For example:
array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = "What did you d0 over your brcak?"


And then have my comparison function return array[1]. As you can see,
string has some 'OCR errors' - it's usually 80-95% accurate, if not
dead-on.

--
Thanks, Kyle 'Phenax' Hunter
http://keletech...
--
Posted via http://www.ruby-....

5 Answers

Brendan Stennett

4/3/2008 5:26:00 AM

If you know all the possibilities that your OCR system *could* pick up
then you could always do something like this...

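known_strings = ['Hello', 'Goodbye']
ocr_strings = []  # fill with the strings returned by the OCR system
out = []

ocr_strings.each do |ocr|
  known_strings.each do |known|
    # Count the positions at which both strings hold the same character,
    # stopping at the end of the shorter string.
    limit = [known.length, ocr.length].min
    matches = (0...limit).count { |i| ocr[i] == known[i] }

    # to_f is needed here: integer division would truncate the ratio to 0 or 1.
    if matches.to_f / known.length > 0.85
      out << known
    else
      out << "!#{known}"
    end
  end
end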


...completely untested, but I think you know what I'm getting at.

Brendan Stennett

4/3/2008 5:28:00 AM

> if matches / known.length > 0.85
> out << known
> else
> out << "!#{known}"
> end

should be more like

if matches.to_f / known.length > 0.85
  out << known
end



Heesob Park

4/3/2008 5:48:00 AM

Hi,

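Here is some simple score-matching code:

array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = "What did you d0 over your brcak?"

# Score = (distinct characters found in either string) divided by
# (distinct characters in str1 + distinct characters in str2).
# Identical character sets give 0.5; completely disjoint sets give 1.0.
# Character order is ignored.
def comp(str1, str2)
  a = str1.split('').uniq
  b = str2.split('').uniq
  (a + b).uniq.length * 1.0 / (a.length + b.length)
end

# A lower score means more similar, so the first element after sorting
# is the closest match.
puts array.sort_by { |x| comp(string, x) }.first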

Regards,
Park Heesob

Chris Shea

4/3/2008 6:13:00 AM

It sounds like what you want is something like the Levenshtein
distance (http://en.wikipedia.org/wiki/Levenshtei...).
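For illustration, here is a minimal sketch of how that could look in plain Ruby, using the standard two-row dynamic-programming formulation. The method names levenshtein and closest_match are made up for this example and do not come from any library; the sample data is the array and string from the original question.

```ruby
# Levenshtein (edit) distance: the minimum number of single-character
# insertions, deletions, and substitutions needed to turn a into b.
def levenshtein(a, b)
  a = a.chars
  b = b.chars
  # prev[j] is the distance between the prefix of a processed so far
  # and the first j characters of b. Initially the prefix of a is empty.
  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1] # distance from the first i+1 chars of a to an empty b
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,   # delete ca
               curr[j] + 1,       # insert cb
               prev[j] + cost].min # substitute (or keep if equal)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: return the candidate with the smallest distance.
def closest_match(candidates, target)
  candidates.min_by { |c| levenshtein(c, target) }
end

array = ['Hello there, how are you?', 'What did you do over your break?',
         'I like my coffee brown.', 'I just bought a new car.']
string = "What did you d0 over your brcak?"

puts closest_match(array, string)
# prints: What did you do over your break?
```

The OCR string differs from array[1] by two substitutions ("o" to "0" and "e" to "c"), so its distance is 2, far lower than for the other candidates. With only about twenty short strings, the quadratic cost per comparison should not matter.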

HTH,
Chris

ara.t.howard

4/3/2008 7:13:00 AM


http://amatch.ruby...


a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama