
comp.lang.ruby

want to compare data in two files.

viupljindal

8/21/2008 8:02:00 PM

I have two CSV files and I want to find the duplicate records.

For example:

Sheet1    Sheet2
Vipul     Anthony
John      Wayne
Mac       Bill
smith     randy
Nick      thalia
          Trishi
          ricky
          sachin
          Nick


So I want to check whether each record of Sheet1 appears in Sheet2;
for example, I want to check whether Vipul exists in Sheet2 or not.

Is there an automated tool for this, or do I have to do it manually?


Thanks in advance.

Vipul jindal
4 Answers

Robert Klemme

8/21/2008 8:30:00 PM

On 21.08.2008 22:02, viupljindal@gmail.com wrote:
> I have two CSV files and i want to find the duplicates records.
>
> For ex.
>
> Sheet1 Sheet2
> Vipul Anthony
> John Wayne
> Mac Bill
> smith randy
> Nick thalia
> Trishi
> ricky
> sachin
> Nick
>
>
> So i want to check the appearance of each record of Sheet1 in the
> sheet2, like i want to check if vipul exist in sheet2 or not.
>
> Is there any automated tool for it or do i have to do it manually.

Not sure what you want (or rather, why you are asking here if you just
want a quick solution) but there is of course

>> diff -u <(sort Sheet1) <(sort Sheet2)

Note, you need a decent shell and OS for this (bash on Linux will do).
Otherwise you need temporary files.

Cheers

robert

Martin DeMello

8/22/2008 12:48:00 AM

On Thu, Aug 21, 2008 at 1:01 PM, <viupljindal@gmail.com> wrote:
> I have two CSV files and i want to find the duplicates records.
>
> For ex.
>
> Sheet1 Sheet2
> Vipul Anthony
> John Wayne
> Mac Bill
> smith randy
> Nick thalia
> Trishi
> ricky
> sachin
> Nick

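If it's just one word per line and the files aren't huge:

# chomp drops the trailing newline so the last line matches even without one
a = IO.readlines('sheet1').map { |line| line.chomp }
b = IO.readlines('sheet2').map { |line| line.chomp }
puts a & b   # Array#& keeps only the elements present in both arrays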

martin
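
A minimal sketch building on the snippet above, assuming one name per line and that matches should ignore case and surrounding whitespace (the sample data mixes "Vipul" and "smith"). The helper names `normalize` and `common_names` are hypothetical, not part of Ruby:

```ruby
# Hypothetical helper: trim whitespace, fold case, and drop blank lines,
# so that "Nick \n" and "nick" compare as equal.
def normalize(lines)
  lines.map { |line| line.strip.downcase }.reject(&:empty?)
end

# Hypothetical helper: names present in both lists, compared case-insensitively.
def common_names(lines1, lines2)
  normalize(lines1) & normalize(lines2)
end

# Made-up input in the shape IO.readlines would return (newlines included).
sheet1 = ["Vipul\n", "John\n", "Mac\n", "smith\n", "Nick\n"]
sheet2 = ["Anthony\n", "Wayne\n", "nick \n", "\n"]

puts common_names(sheet1, sheet2)
```

With real files, the two arrays could come from `IO.readlines('sheet1')` and `IO.readlines('sheet2')` instead of the literals used here.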

Mayuresh Kathe

8/22/2008 4:23:00 AM

I'm not a master of the language yet, but based on what I've learned,
can't you create two arrays out of those files, let's say sheet1 and
sheet2, and then simply do a "sheet2 - sheet1"?

~Mayuresh
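
Since the question asks whether each record of Sheet1 appears in Sheet2, a per-record report may be closer to what is wanted than a plain array difference. A rough sketch, assuming both sheets are already loaded as arrays of strings; `presence_report` is a hypothetical name, and `Set` comes from Ruby's standard library:

```ruby
require 'set'

# Hypothetical helper: map each record of sheet1 to true/false depending on
# whether it also appears in sheet2. The Set avoids rescanning sheet2 for
# every record.
def presence_report(sheet1, sheet2)
  lookup = Set.new(sheet2)
  sheet1.each_with_object({}) do |name, report|
    report[name] = lookup.include?(name)
  end
end

sheet1 = %w[vipul john mac smith nick]
sheet2 = %w[anthony wayne bill randy thalia trishi ricky sachin nick]

presence_report(sheet1, sheet2).each do |name, found|
  puts "#{name}: #{found ? 'in sheet2' : 'not in sheet2'}"
end
```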

timr

8/22/2008 7:17:00 AM

To identify duplicates the 'array1 & array2' solution given above is
perfect. See examples below:

# setting up two arrays with your names

irb(main):004:0> sheet1 = %w[vipul john mac smith nick]
=> ["vipul", "john", "mac", "smith", "nick"]
irb(main):005:0> sheet2 = %w[anthony wayne bill randy thalia trishi ricky sachin nick]
=> ["anthony", "wayne", "bill", "randy", "thalia", "trishi", "ricky", "sachin", "nick"]

# finding common elements, note the order is inconsequential
irb(main):006:0> sheet2 & sheet1
=> ["nick"]
irb(main):007:0> sheet1 & sheet2
=> ["nick"]

# Determining what items are unique to the first array. (What items
# are in the first list that are not in the second?) Note order matters
# here.
irb(main):008:0> sheet2 - sheet1
=> ["anthony", "wayne", "bill", "randy", "thalia", "trishi", "ricky", "sachin"]
irb(main):009:0> sheet1 - sheet2
=> ["vipul", "john", "mac", "smith"]
irb(main):010:0>

-Tim
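
The examples in this thread treat each file as a plain list of names. If the files are real CSV with several columns, one option is Ruby's `csv` library (on recent Ruby versions it ships as a bundled gem rather than a default one). A hedged sketch, assuming the name sits in the first column and there is no header row; `column_values` and the sample data, including the second column, are made up for illustration:

```ruby
require 'csv'

# Hypothetical helper: pull one column out of CSV text, skipping rows
# where that cell is missing (for example, blank lines).
def column_values(csv_text, index = 0)
  CSV.parse(csv_text).map { |row| row[index] }.compact
end

# Made-up sample data; the numeric column is invented for illustration.
sheet1_text = "vipul,10\njohn,20\nnick,30\n"
sheet2_text = "anthony,1\nnick,2\n"

names1 = column_values(sheet1_text)
names2 = column_values(sheet2_text)

puts names1 & names2   # records present in both files
puts names1 - names2   # records found only in the first file
```

For files on disk, the text could come from `File.read` on each path; the same `&` and `-` operations shown in the irb session then apply to the extracted columns.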