
microsoft.public.sqlserver.programming

Returning rows with duplicate fields

Rick

3/24/2007 6:13:00 PM

I have a database table like this:

Unique  Pdf      Destination
1       one.pdf  a
2       one.pdf  b
3       one.pdf  b
4       two.pdf  x
5       two.pdf  y
6       two.pdf  z

I want to return all of the rows where a Pdf field has duplicate
Destination values. If I use this:

SELECT tblNewlinks.Pdf, tblNewlinks.Destination, COUNT(*) AS NumRows
FROM tblNewlinks
GROUP BY tblNewlinks.Pdf, tblNewlinks.Destination
HAVING COUNT(*) > 1
ORDER BY tblNewlinks.Pdf, tblNewlinks.Destination;

I get

one.pdf b 2

but this is not exactly what I want. What I want is:

2 one.pdf b
3 one.pdf b

so I can examine the duplicate values. Any help would be appreciated.

Rick Quatro

8 Answers

Tom Cooper

3/24/2007 6:46:00 PM


One way would be

Select n.[Unique], n.Pdf, n.Destination
From tblNewlinks As n
Inner Join (Select n1.Pdf, n1.Destination
            From tblNewlinks As n1
            Group By n1.Pdf, n1.Destination
            Having Count(*) > 1) As x
    On n.Pdf = x.Pdf And n.Destination = x.Destination
Order By n.[Unique];
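As a sanity check, this derived-table approach can be run as-is against an in-memory SQLite copy of Rick's table (SQLite stands in for SQL Server here; the SQL itself is portable, with `"Unique"` quoted because it is a reserved word):

```python
import sqlite3

# In-memory stand-in for the SQL Server table; schema and data are
# taken from Rick's post.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblNewlinks ("Unique" INTEGER PRIMARY KEY,
                          Pdf TEXT, Destination TEXT);
INSERT INTO tblNewlinks VALUES (1,'one.pdf','a'),(2,'one.pdf','b'),
                               (3,'one.pdf','b'),(4,'two.pdf','x'),
                               (5,'two.pdf','y'),(6,'two.pdf','z');
""")

# Find the duplicate (Pdf, Destination) groups, then join back to the
# base table to recover the full rows, including the key column.
rows = con.execute("""
SELECT n."Unique", n.Pdf, n.Destination
FROM tblNewlinks AS n
INNER JOIN (SELECT Pdf, Destination
            FROM tblNewlinks
            GROUP BY Pdf, Destination
            HAVING COUNT(*) > 1) AS x
  ON n.Pdf = x.Pdf AND n.Destination = x.Destination
ORDER BY n."Unique";
""").fetchall()
print(rows)  # [(2, 'one.pdf', 'b'), (3, 'one.pdf', 'b')]
```

This returns exactly the two rows Rick asked for, with the key column intact so each duplicate can be examined individually.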

Tom

"Rick" <rickquatro@gmail.com> wrote in message
news:1174759966.329510.216630@l75g2000hse.googlegroups.com...
>I have a database table like this:
>
> Unique Pdf Destination
> 1 one.pdf a
> 2 one.pdf b
> 3 one.pdf b
> 4 two.pdf x
> 5 two.pdf y
> 6 two.pdf z
>
> I want to return all of the rows where a Pdf field has duplicate
> Destination values. If I use this:
>
> SELECT tblNewlinks.Pdf, tblNewlinks.Destination, Count(*) AS NumRows
> FROM tblNewlinks
> GROUP BY tblNewlinks.Pdf, tblNewlinks.Destination
> HAVING (((Count(*))>1))
> ORDER BY tblNewlinks.Pdf, tblNewlinks.Destination;
>
> I get
>
> one.pdf b 2
>
> but this is not exactly what I want. What I want is:
>
> 2 one.pdf b
> 3 one.pdf b
>
> so I can examine the duplicate values. Any help would be appreciated.
>
> Rick Quatro
>


--CELKO--

3/24/2007 7:31:00 PM


Have you considered a UNIQUE (pdf, destination) constraint to prevent
the problem instead?
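A minimal sketch of that constraint, using SQLite to stand in for SQL Server (the declaration is the same idea in both dialects): the duplicate row is rejected at insert time, so the cleanup query is never needed.

```python
import sqlite3

# UNIQUE (Pdf, Destination) makes the duplicate pairs impossible to
# store in the first place.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblNewlinks (
                 "Unique" INTEGER PRIMARY KEY,
                 Pdf TEXT, Destination TEXT,
                 UNIQUE (Pdf, Destination))""")
con.execute("INSERT INTO tblNewlinks VALUES (1,'one.pdf','a')")
con.execute("INSERT INTO tblNewlinks VALUES (2,'one.pdf','b')")
try:
    # Same (Pdf, Destination) pair as row 2 -- the constraint fires.
    con.execute("INSERT INTO tblNewlinks VALUES (3,'one.pdf','b')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```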

Uri Dimant

3/25/2007 9:26:00 AM


Rick
http://dimantdatabasesolutions.blogspot.com/2007/02/dealing-with-dupli...




"Rick" <rickquatro@gmail.com> wrote in message
news:1174759966.329510.216630@l75g2000hse.googlegroups.com...
>I have a database table like this:
>
> Unique Pdf Destination
> 1 one.pdf a
> 2 one.pdf b
> 3 one.pdf b
> 4 two.pdf x
> 5 two.pdf y
> 6 two.pdf z
>
> I want to return all of the rows where a Pdf field has duplicate
> Destination values. If I use this:
>
> SELECT tblNewlinks.Pdf, tblNewlinks.Destination, Count(*) AS NumRows
> FROM tblNewlinks
> GROUP BY tblNewlinks.Pdf, tblNewlinks.Destination
> HAVING (((Count(*))>1))
> ORDER BY tblNewlinks.Pdf, tblNewlinks.Destination;
>
> I get
>
> one.pdf b 2
>
> but this is not exactly what I want. What I want is:
>
> 2 one.pdf b
> 3 one.pdf b
>
> so I can examine the duplicate values. Any help would be appreciated.
>
> Rick Quatro
>


Steve Dassin

3/26/2007 2:18:00 AM


Hi Uri,

I thought I'd bounce this off of you :)

When someone presents a table with numeric columns C(N) like:

ID C1 C2 C3 C4 C5 C6....C10

and asks how to get the maximum value for each ID, most times the person is
told the problem is most easily solved if the data was normalized. Why?
Simply because then the data fits more cogently into the framework of the
sql language. But also because the solution is more conceptually appealing
logically. The data and the solution make the most sense when normalized.
Using a case statement, the solution looks like gibberish.
For a lot of other problems the solution must look like the maximum case
statement to a lot of developers. It isn't common practice, in general, to
tell developers that if you follow general relational ideas, even if just in
a query, the solution will appear more logical and understandable. Just as
the GROUP BY nicely solves the maximum value problem when the data is
normalized, the language should support other constructs to do the same kind
of thing in other kinds of problems.

Take Rick's try at a query or Tom's solution, which seems to be the most
common solution for duplicates. I call this Celko SQL. While his idea of
everything happening 'all at once' in sql is nonsense, it is a very good
description of sql queries. The logic appears to be happening all at once,
it's very hard for developers to see a beginning and an end. As they would
see in a 'cursor' solution.

But I think SQL 2005's APPLY construct is something that will appeal to
application developers, those who are not expert sql programmers. With a
little thought and some basic understanding of relational ideas, APPLY can
conceptually simplify a lot of sql 'all at once' queries. It allows a clear
progression of steps, like a cursor, to a lot of queries.

Here is a solution to the problem with APPLY starting with the basic idea of
a unique table of Destinations (just in the query).

create table ##tblNewlinks ([Unique] int primary key,
Pdf varchar(20),Destination char(1))
insert ##tblNewlinks values(1,'one.pdf','a')
insert ##tblNewlinks values(2,'one.pdf','b')
insert ##tblNewlinks values(3,'one.pdf','b')
insert ##tblNewlinks values(4,'two.pdf','x')
insert ##tblNewlinks values(5,'two.pdf','y')
insert ##tblNewlinks values(6,'two.pdf','z')
-- Some additional data
insert ##tblNewlinks values(7,'two.pdf','y')
insert ##tblNewlinks values(8,'two.pdf','y')

-- Step 1
-- Unique Destinations from a (e.[Unique] and e.Pdf are added in step 3).
select distinct a.Destination, e.[Unique], e.Pdf
from ##tblNewlinks as a
-- Step 2
-- Use cross apply to keep only rows whose Destination occurs more than once.
-- Cross apply here is only a restriction; we don't add any data.
cross apply
    (select b.[Unique]
     from ##tblNewlinks as b
     where a.Destination = b.Destination and b.[Unique] < a.[Unique]) as c ([Dummy])
-- Step 3
-- Add columns Unique and Pdf (from e) for each distinct Destination.
cross apply
    (select d.[Unique], d.Pdf
     from ##tblNewlinks as d
     where d.Destination = a.Destination) as e ([Unique], Pdf)

Destination Unique Pdf
----------- ----------- --------------------
b 2 one.pdf
b 3 one.pdf
y 5 two.pdf
y 7 two.pdf
y 8 two.pdf

APPLY is an excellent idea by MS. Too bad its write-up in BOL is so poor.
And unfortunately, developers will probably become confused by the idea
of APPLY versus a subquery. A subquery is just a special case of APPLY. The
situation is not unlike GROUP BY with 'HAVING' and the 'WHERE' clause.
The redundancy is silly, but that's SQL -:). Still, kudos to MS for APPLY.
(And yes there will be performance issues with it but what doesn't have
them? :)

best,
steve

I'm an advocate of the relational system Dataphor @
www.alphora.com

www.beyondsql.blogspot.com




Steve Dassin

3/26/2007 4:31:00 AM


"Uri Dimant" <urid@iscar.co.il> wrote in message
news:OjNxk02bHHA.588@TK2MSFTNGP06.phx.gbl...
> Hi Steve
> I like your posts :-))

Thanks and ditto!

> I absolutely agree with you that SQL 2005's APPLY finally resolves so many
> problems that we could not solve until SQL Server 2005. One of them is a scalar
> UDF: by using CROSS APPLY we can supply columns as parameters, like
>
> SELECT *
> FROM T1 CROSS APPLY fn_scalar_min_max(col1, col2) AS M

Great minds think alike -:)

> > (And yes there will be performance issues with it but what doesn't have
> > them? :)
>
> :-))

-:)

best,
steve


Uri Dimant

3/26/2007 6:07:00 AM


Hi Steve
I like your posts :-))
I absolutely agree with you that SQL 2005's APPLY finally resolves so many
problems that we could not solve until SQL Server 2005. One of them is a scalar
UDF: by using CROSS APPLY we can supply columns as parameters, like

SELECT *
FROM T1 CROSS APPLY fn_scalar_min_max(col1, col2) AS M
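fn_scalar_min_max here is a hypothetical UDF, and CROSS APPLY itself is SQL Server syntax; still, the per-row min/max effect it describes can be sketched with SQLite's multi-argument scalar min()/max() functions (an illustration only, not a portable equivalent):

```python
import sqlite3

# T1 and its columns follow Uri's example; the values are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T1 (col1 INTEGER, col2 INTEGER);
INSERT INTO T1 VALUES (3, 7), (9, 2);
""")

# SQLite's scalar min()/max() accept multiple arguments, computing the
# per-row minimum and maximum across the supplied columns.
rows = con.execute(
    "SELECT col1, col2, min(col1, col2), max(col1, col2) FROM T1"
).fetchall()
print(rows)  # [(3, 7, 3, 7), (9, 2, 2, 9)]
```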

> (And yes there will be performance issues with it but what doesn't have
> them? :)

:-))





"Steve Dassin" <rac4sqlnospam@net> wrote in message
news:%23DYqrz1bHHA.2188@TK2MSFTNGP04.phx.gbl...
> Hi Uri,
>
> I thought I'd bounce this off of you :)
>
> When someone presents a table with numeric columns C(N) like:
>
> ID C1 C2 C3 C4 C5 C6....C10
>
> and asks how to get the maximun value for each ID, most times the person
> is
> told the problem is most easily solved if the data was normalized. Why?
> Simply because then the data fits more cogently into the framework of the
> sql language. But also because the solution is more conceptually appealing
> logically. The data and the solution make the most sense when normalized.
> Using a case statement, the solution looks like jibberish.
> For a lot of other problems the solution must look like the maximum case
> statement to a lot of developers. It isn't common practice, in general, to
> tell developers that if you follow general relational ideas, even if just
> in
> a query, the solution will appear more logical and understandable. Just as
> the GROUP BY nicely solves the maximum value problem when the data is
> normalized, the language should support other constructs to do the same
> kind
> of thing in other kinds of problems.
>
> Take Ricks try at a query or Toms solution, which seems to be the most
> common solution for duplicates. I call this Celko sql. While his idea of
> everything happening 'all at once' in sql is nonsense, it is a very good
> description of sql queries. The logic appears to be happening all at once,
> it's very hard for developers to see a beginning and an end. As they would
> see in a 'cursor' solution.
>
> But I think S2005s APPLY construct is something that will appeal to
> application developers, those who are not expert sql programmers. With a
> little thought and some basic understanding of relational ideas, APPLY can
> conceptually simplify a lot of sql 'all at once' queries. It allows a
> clear
> progression of steps, like a cursor, to a lot of queries.
>
> Here is a solution to the problem with APPLY starting with the basic idea
> of
> a unique table of Destinations (just in the query).
>
> create table ##tblNewlinks ([Unique] int primary key,
> Pdf varchar(20),Destination char(1))
> insert ##tblNewlinks values(1,'one.pdf','a')
> insert ##tblNewlinks values(2,'one.pdf','b')
> insert ##tblNewlinks values(3,'one.pdf','b')
> insert ##tblNewlinks values(4,'two.pdf','x')
> insert ##tblNewlinks values(5,'two.pdf','y')
> insert ##tblNewlinks values(6,'two.pdf','z')
> -- Some additional data
> insert ##tblNewlinks values(7,'two.pdf','y')
> insert ##tblNewlinks values(8,'two.pdf','y')
>
> -- Step 1
> --Unique Destinations from a (e.[Unique] and e.Pdf are added from step 3).
> select distinct a.Destination,e.[Unique],e.Pdf
> from ##tblNewlinks as a
> -- Step 2
> -- Use cross apply to limit rows based on Destination>1
> -- Cross apply here is only a restriction, we don't add any data.
> cross apply
> (select b.[Unique]
> from ##tblNewlinks as b
> where a.Destination=b.Destination and b.[Unique]<a.[Unique]) as c
> ([Dummy])
> -- Step 3
> -- Add columns Unique and Pdf (from e) for each distinct Destination.
> cross apply
> (select d.[Unique],d.Pdf
> from ##tblNewlinks as d
> where d.Destination=a.Destination) as e ([Unique],Pdf)
>
> Destination Unique Pdf
> ----------- ----------- --------------------
> b 2 one.pdf
> b 3 one.pdf
> y 5 two.pdf
> y 7 two.pdf
> y 8 two.pdf
>
> APPLY is a excellent idea by MS. Too bad it's write up in bol is so poor.
> And unfortuneately, developers will probably become confused with the idea
> of APPLY and a subquery. A subquery is just a special case of APPLY. The
> situation is not unlike GROUP BY with 'HAVING' and the 'WHERE' statement.
> The redundancy is silly, but that's sql -:). Still kudos to MS for APPLY.
> (And yes there will be performance issues with it but what doesn't have
> them? :)
>
> best,
> steve
>
> I'm an advocate of the relational system Dataphor @
> www.alphora.com
>
> www.beyondsql.blogspot.com
>
>
>
>


Anith Sen

3/26/2007 2:17:00 PM


>> When someone presents a table with numeric columns C(N) like:
>>
>> ID C1 C2 C3 C4 C5 C6....C10
>>
>> and asks how to get the maximun value for each ID, most times the person
>> is told the problem is most easily solved if the data was normalized.
>> Why? Simply because then the data fits more cogently into the framework
>> of the sql language. But also because the solution is more conceptually
>> appealing logically. The data and the solution make the most sense when
>> normalized.

There are many among us who consider any design problem as a byproduct of
the data not being normalized. What we often fail to consider is the normal
form that is being violated in each case. Case in point, the quoted example
above.

--
Anith


Steve Dassin

3/26/2007 8:23:00 PM


As someone who is not part of the 'us' here, you, sir, are a mystery.
It is not possible to write this post on the one hand and not
recognize the hypocrisy I write about on the other. For what it's
worth, when I am critical it is for a reason, a reason that is never
personal.

"Anith Sen" <anith@bizdatasolutions.com> wrote in message
news:OO8k6F7bHHA.2408@TK2MSFTNGP02.phx.gbl...
>>> When someone presents a table with numeric columns C(N) like:
>>>
>>> ID C1 C2 C3 C4 C5 C6....C10
>>>
>>> and asks how to get the maximun value for each ID, most times the person
>>> is told the problem is most easily solved if the data was normalized.
>>> Why? Simply because then the data fits more cogently into the framework
>>> of the sql language. But also because the solution is more conceptually
>>> appealing logically. The data and the solution make the most sense when
>>> normalized.
>
> There are many among us who consider any design problem as a byproduct of
> the data not being normalized. What we often fail to consider is the
> normal form that is being violated in each case. Case in point, the quoted
> example above.
>
> --
> Anith
>