Sunday, February 26, 2012

For the SQL Gurus out there, a question

I am not good at SQL, so here goes.

Consider this scenario of two tables, X and Y, with a many-to-many
relationship.

Table X (name,weightage)
X1 2
X2 1
X3 5
X4 1

Table Y (name, attrib)
Y1 attrib1
Y2 attrib2
Y3 attrib3
Y4 attrib4

Relationship table Z (id,id)
X1 Y1
X1 Y2
X2 Y1
X3 Y1
X3 Y2
X3 Y4
X4 Y4

So based on Z the distribution (count of Y / total Y) is like
Y1 = 100*3/7, Y2 = 100*2/7, Y3 = 100*0/7, Y4 = 100*2/7.
Now I want to factor in the weightage column in X also.

So
X1 Y1 2*1
X1 Y2 2*1
X2 Y1 1*1
X3 Y1 5*1
X3 Y2 5*1
X3 Y4 5*1
X4 Y4 1*1

Sum: 21

Given these tables, how would I factor in the weights and get the
distribution in a single SQL query?

Desired Output of SQL Query

Id  Attrib   Distrib
---------------------
Y1 attrib1 100*8/21
Y2 attrib2 100*7/21
Y3 attrib3 0
Y4 attrib4 100*6/21
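(Spelling out the arithmetic behind those numbers: Y1 gets X1+X2+X3 =
2+1+5 = 8, Y2 gets X1+X3 = 2+5 = 7, Y3 gets nothing, and Y4 gets X3+X4 =
5+1 = 6, out of the total weight of 21.)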

Thx
JP|||Oracle or SQLServer? This was tested on SQLServer 2000 but is ANSI-92, so I
think it should be OK on Oracle 9.

CREATE TABLE TableX (xcol CHAR(2) PRIMARY KEY, weightage INTEGER NOT NULL)

INSERT INTO TableX (xcol, weightage) VALUES ('X1',2)
INSERT INTO TableX (xcol, weightage) VALUES ('X2',1)
INSERT INTO TableX (xcol, weightage) VALUES ('X3',5)
INSERT INTO TableX (xcol, weightage) VALUES ('X4',1)

CREATE TABLE TableY (ycol CHAR(2) PRIMARY KEY, attrib VARCHAR(7) NOT NULL)

INSERT INTO TableY (ycol, attrib) VALUES ('Y1','attrib1')
INSERT INTO TableY (ycol, attrib) VALUES ('Y2','attrib2')
INSERT INTO TableY (ycol, attrib) VALUES ('Y3','attrib3')
INSERT INTO TableY (ycol, attrib) VALUES ('Y4','attrib4')

CREATE TABLE TableZ (xcol CHAR(2) REFERENCES TableX (xcol), ycol CHAR(2)
REFERENCES TableY (ycol), PRIMARY KEY (xcol,ycol))

INSERT INTO TableZ (xcol, ycol) VALUES ('X1','Y1')
INSERT INTO TableZ (xcol, ycol) VALUES ('X1','Y2')
INSERT INTO TableZ (xcol, ycol) VALUES ('X2','Y1')
INSERT INTO TableZ (xcol, ycol) VALUES ('X3','Y1')
INSERT INTO TableZ (xcol, ycol) VALUES ('X3','Y2')
INSERT INTO TableZ (xcol, ycol) VALUES ('X3','Y4')
INSERT INTO TableZ (xcol, ycol) VALUES ('X4','Y4')

SELECT Y.ycol, Y.attrib,
       COALESCE(100 * SUM(X.weightage) /
                (SELECT CAST(SUM(X.weightage) AS REAL)
                 FROM TableX AS X
                 JOIN TableZ AS Z
                 ON X.xcol = Z.xcol), 0) AS distrib
FROM TableY AS Y
LEFT JOIN TableZ AS Z
ON Y.ycol = Z.ycol
LEFT JOIN TableX AS X
ON Z.xcol = X.xcol
GROUP BY Y.ycol, Y.attrib
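
With the sample data this should work out to 100*8/21 (about 38.1) for Y1,
100*7/21 (about 33.3) for Y2, 0 for Y3 thanks to the COALESCE, and 100*6/21
(about 28.6) for Y4.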

--
David Portas
SQL Server MVP
--|||John Pifer wrote:

> I am not good at SQL - hence here goes
> Consider this scenario of 2 tables X and Y with a many to many
> relationship

Let's just stop right here!

The example is, by definition, bad design. All many-to-many
relationships must be resolved when moving from a logical design to a
physical design.

Fix the design problem!

That is the only acceptable solution.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||Daniel,

Re-read John's post or check out the DDL I posted. John has a joining table
for the many-to-many relationship.

--
David Portas
SQL Server MVP
--|||>> All many-to-many relationships must be resolved when moving from a
logical design to a physical design. <<

Can you explain what you mean by this? What does a many-to-many relationship
between two entities, which is a logical issue, have to do with physical
design?

--
- Anith
( Please reply to newsgroups only )|||Anith Sen wrote:
>>>All many-to-many relationships must be resolved when moving from a
> logical design to a physical design. <<
> Can you explain what you mean by this? What does a many-to-many relationship
> between two entities, which is a logical issue, have to do with physical
> design?

A typical many-to-many relationship would be ... a student can be in
many classes and a class has many students. This is true in the
logical model but should never be implemented in a relational database
using two tables; rather, using three.

STUDENT ... PK = person_id
CLASS .... PK = class_id

STUDENT_CLASS (intersecting entity) ... PK person_id and class_id.
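
In rough DDL, a minimal sketch might look like this (the non-key columns
and data types are illustrative only):

CREATE TABLE Students
(person_id INTEGER NOT NULL PRIMARY KEY,
 student_name VARCHAR(50) NOT NULL);

CREATE TABLE Classes
(class_id INTEGER NOT NULL PRIMARY KEY,
 class_name VARCHAR(50) NOT NULL);

-- the intersecting entity resolves the many-to-many relationship
CREATE TABLE Student_Classes
(person_id INTEGER NOT NULL REFERENCES Students (person_id),
 class_id INTEGER NOT NULL REFERENCES Classes (class_id),
 PRIMARY KEY (person_id, class_id));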

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||Representation of data in a relational database is by definition, logical. I
am more interested in knowing why this schema, irrespective of the number of
tables, has anything to do with physical design.

--
- Anith
( Please reply to newsgroups only )|||Anith Sen wrote:
> Representation of data in a relational database is by definition, logical. I
> am more interested in knowing why this schema, irrespective of the number of
> tables, has anything to do with physical design.

We are talking here about Database 101. If you need a course in
relational theory and relational database architecture take it at a
local college or university. If you are not familiar with the work of
E.F. Codd and Chris Date you have no business getting behind the keyboard.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||>>If you are not familiar with the work of
>>E.F. Codd and Chris Date you have no business getting behind the keyboard.

If you lack social skills...you have no business getting behind the keyboard.

BZ

Daniel Morgan <damorgan@.x.washington.edu> wrote in message news:<1076311321.975718@.yasure>...
> Anith Sen wrote:
> > Representation of data in a relational database is by definition, logical. I
> > am more interested in knowing why this schema, irrespective of the number of
> > tables, has anything to do with physical design.
> We are talking here about Database 101. If you need a course in
> relational theory and relational database architecture take it at a
> local college or university. If you are not familiar with the work of
> E.F. Codd and Chris Date you have no business getting behind the keyboard.|||xAvailx wrote:

>>>If you are not familiar with the work of
>>>E.F. Codd and Chris Date you have no business getting behind the keyboard.
>
> If you lack social skills...you have no business getting behind the keyboard.
> BZ

So because you are unqualified to do the job you take money for, you
think it appropriate to disparage others that are qualified and point
out the minimal qualification set: Fascinating.

I presume you are not a hypocrite and when you need heart surgery some
day you will not go to a board certified cardiologist but rather will go
to some guy that has a great personality.

It is your heart ... it is not your database. Do your employer a favor
and either learn what you are doing or do the ethical thing and quit.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||>> We are talking here about Database 101. <<

Good. Since you are clear about the topic we are talking about, all I need
to do is provide some directives for you to either learn or recollect.

ISO/IEC document 2382, International Standard Database Vocabulary, details the
three levels of discourse for fundamental data representation:
i) A conceptual level which captures an information model that can be
assimilated by humans (the business model)
ii) A logical level that represents, in an abstract form, some or all of a
conceptual model suitable for machine manipulation without regard to actual
media or physical devices (the logical model)
iii) A physical level that maps the logical level to actual storage
structures and locations. This involves the actual
encodings of data on physical media or devices (the physical model)

Here is Pascal, clearing it up, with examples:
http://www.dbdebunk.com/page/page/622537.htm
http://www.inconcept.com/JCM/May2003/Pascal.html

Codd's RM/V2 book, Chapter 29 (Fundamental Laws in Database Management),
describes the three levels of concept -- the user level, the
logical/semantic level, and the storage-oriented or physical level. A similar
classification of data representation has been given by C.J. Date in the 2nd
chapter of his Intro to DB book.

During the relational database design endeavor, the external predicate
(meaning understood by humans) in the conceptual model is translated to the
internal predicate (meaning understood by the DBMS) in the logical model as
values in relations. The physical model, which is transparent to the user,
encompasses the implementation-specific details like the internal data
structures, file system, bytes on disk, etc. & DBMS-specific details like
access paths, indexes, clusters, etc.

So, instead of making judgements, I suggest you rethink your statement, "All
many-to-many relationships must be resolved when moving from a logical
design to a physical design," since data representation in a relational
database is, by definition, logical & database design using the relational
approach is, by definition, logical design.

>> If you need a course in relational theory and relational database
architecture take it at a local college or university. <<

If this is your general attitude, I suggest you read the following link and
get a clue.
http://www.apa.org/journals/psp/psp7761121.html

>> If you are not familiar with the work of E.F. Codd and Chris Date you
have no business getting behind the keyboard. <<

May I get on with my business?

--
- Anith
( Please reply to newsgroups only )|||
Daniel Morgan wrote:

> Anith Sen wrote:
>>>> All many-to-many relationships must be resolved when moving from a
>>
>>
>> logical design to a physical design. <<
>>
>> Can you explain what you mean by this? What does a many-to-many
>> relationship
>> between two entities, which is a logical issue, have to do with physical
>> design?
>
> A typical many-to-many relationship would be ... a student can be in
> many classes and a class has many students. This is true in the
> logical model but should never be implemented in a relational database
> using two tables; rather, using three.
> STUDENT ... PK = person_id
> CLASS .... PK = class_id
> STUDENT_CLASS (intersecting entity) ... PK person_id and class_id.

Yes. In the original post, I missed how his setup was different. It seems
he has exactly that, the two tables and an association table whose two
columns were the primary keys of the other two.
Joe|||> So because you are unqualified to do the job you take money for, you
> think it appropriate to disparage others that are qualified and point
> out the minimal qualification set: Fascinating.

I would say that Anith Sen is one of the most qualified people in this
newsgroup. He certainly helps more people out than you do, and in a polite
manner.

> It is your heart ... it is not your database. Do your employer a favor
> and either learn what you are doing or do the ethical thing and quit.

Disagreeing with your attitude has little to do with my
qualifications. Maybe you should do the ethical thing and show some
respect to others? Just because you are a professor(?) doesn't mean
everyone (outside of your courses) has to bend over for you.

BZ

Daniel Morgan <damorgan@.x.washington.edu> wrote in message news:<1076371954.930049@.yasure>...
> xAvailx wrote:
> >>>If you are not familiar with the work of
> >>>E.F. Codd and Chris Date you have no business getting behind the keyboard.
> > If you lack social skills...you have no business getting behind the keyboard.
> > BZ
> So because you are unqualified to do the job you take money for, you
> think it appropriate to disparage others that are qualified and point
> out the minimal qualification set: Fascinating.
> I presume you are not a hypocrite and when you need heart surgery some
> day you will not go to a board certified cardiologist but rather will go
> to some guy that has a great personality.
> It is your heart ... it is not your database. Do your employer a favor
> and either learn what you are doing or do the ethical thing and quit.|||I really hate your table names, which are almost impossible to
understand (blah, blah, sample data -- but let's make life easier for
people anyway). How about weighted scores in various contests as the
human-friendly version?

The lack of DDL is also a pain, since we have to make all kinds of
assumptions about NULLs and keys. But is this what you wanted to say?

CREATE TABLE Scores
(score_type CHAR(2) NOT NULL PRIMARY KEY,
wgt INTEGER NOT NULL CHECK (wgt > 0));

INSERT INTO Scores VALUES ('X1', 2);
INSERT INTO Scores VALUES ('X2', 1);
INSERT INTO Scores VALUES ('X3', 5);
INSERT INTO Scores VALUES ('X4', 1);

CREATE TABLE Contests
(contest_id CHAR(2) NOT NULL PRIMARY KEY,
contest_name CHAR(10) NOT NULL);

INSERT INTO Contests VALUES ('Y1', 'Contest-1');
INSERT INTO Contests VALUES ('Y2', 'Contest-2');
INSERT INTO Contests VALUES ('Y3', 'Contest-3');
INSERT INTO Contests VALUES ('Y4', 'Contest-4');

CREATE TABLE Standings
(score_type CHAR(2) NOT NULL REFERENCES Scores (score_type),
contest_id CHAR(2) NOT NULL REFERENCES Contests (contest_id),
PRIMARY KEY (score_type, contest_id));

INSERT INTO Standings VALUES ('X1', 'Y1');
INSERT INTO Standings VALUES ('X1', 'Y2');
INSERT INTO Standings VALUES ('X2', 'Y1');
INSERT INTO Standings VALUES ('X3', 'Y1');
INSERT INTO Standings VALUES ('X3', 'Y2');
INSERT INTO Standings VALUES ('X3', 'Y4');
INSERT INTO Standings VALUES ('X4', 'Y4');

>> So based on Z the distribution (count of Y/ Total Y) is like
Y1 = 100*3/7 Y2=100*2/7 Y3=100*0/7 Y4=100*2/7 <<

I am feeling pedantic, so let's do this in steps to show the reasoning
behind each part and then assemble the final query from them.

SELECT C1.contest_id,
       (1.0 * COUNT(ST1.contest_id) /
        (SELECT 1.0 * COUNT(*) FROM Standings)) AS distribution
FROM Contests AS C1
     LEFT OUTER JOIN
     Standings AS ST1
     ON C1.contest_id = ST1.contest_id
GROUP BY C1.contest_id;

The 1.0 is to avoid integer math; it could have been a CAST()
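For instance, the CAST version of the same query would look something like
this (just a sketch of the alternative, not tested):

SELECT C1.contest_id,
       CAST(COUNT(ST1.contest_id) AS REAL) /
       (SELECT COUNT(*) FROM Standings) AS distribution
FROM Contests AS C1
     LEFT OUTER JOIN
     Standings AS ST1
     ON C1.contest_id = ST1.contest_id
GROUP BY C1.contest_id;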

>> Now I want to factor in the wgt column in Scores also <<

Here is where I get confused; you show a constant "wgt * 1" for each
row of the standings table, but if this table is all key, isn't the
one redundant? On that assumption:

SELECT ST1.score_type, ST1.contest_id, S1.wgt
FROM Scores AS S1,
Standings AS ST1
WHERE ST1.score_type = S1.score_type;

>> Sum: 21 <<

SELECT SUM(S1.wgt)
FROM Scores AS S1,
Standings AS ST1
WHERE ST1.score_type = S1.score_type;
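
With the sample data above this should return 21 -- the same total the
original post shows, and the denominator used in the final query below.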

>> Given these tables, How would I factor in the weights and get the
distribution in a single SQL query. <<

SELECT C1.contest_id, C1.contest_name,
       SUM(S1.wgt) /
       (1.0 * (SELECT SUM(S1.wgt)
               FROM Scores AS S1, Standings AS ST1
               WHERE ST1.score_type = S1.score_type)) AS wgt_distrib
FROM (Contests AS C1
      LEFT OUTER JOIN
      Standings AS ST1
      ON ST1.contest_id = C1.contest_id)
     LEFT OUTER JOIN
     Scores AS S1
     ON ST1.score_type = S1.score_type
GROUP BY C1.contest_id, C1.contest_name;

The optimizer ought to pick out the scalar subquery and do it once.
The big worry is preserving the Contests table via nested LEFT OUTER
JOINs.

Results
contest_id contest_name wgt_distrib
===================================
Y1         Contest-1    .38095238095238
Y2         Contest-2    .33333333333333
Y3         Contest-3    NULL
Y4         Contest-4    .28571428571428

Notice I left a NULL in the answer. This means that **nobody** got a
score in contest 'Y3', while a zero implies that **somebody** got a
score of zero in contest 'Y3' -- big difference.
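
(If you would rather see the 0 from the original desired output than the
NULL, wrapping the weighted sum as COALESCE(SUM(S1.wgt), 0) in the final
query -- the same trick David's earlier query uses -- should turn the empty
'Y3' group into a zero.)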

Does it help to explain things to this detail or do you need more?|||pwned by Anith Sen tbh.
What a storm over a small slip of semantics.|||WangKhar wrote:

> pwned by Anith Sen tbh.
> What a storm over a small slip of semantics.

I would suggest you not minimize it. The difference is significant.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||Daniel Morgan <damorgan@.x.washington.edu> wrote in message news:<1076532770.994229@.yasure>...
> WangKhar wrote:
> > pwned by Anith Sen tbh.
> > What a storm over a small slip of semantics.
> I would suggest you not minimize it. The difference is significant.

Well in that case I suggest you remember the difference in future, and
regard this as a learning event.|||Bottom line: we should create designs with M-M relationships
implemented with fewer than three tables?
Good. You stay with that one. I'll stay with the
delivered, working systems.

I'd rather not learn from this one...
--
Cheers
Nuno Souto
wizofoz2k@.yahoo.com.au.nospam
"WangKhar" <Wangkhar@.yahoo.com> wrote in message
news:bb269444.0402120748.4ae59054@.posting.google.c om...
> Well in that case I suggest you remember the difference in future, and
> regard this as a learning event.|||"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:KaYWb.64294$%72.34500@.twister.nyroc.rr.com...

> Au contraire, I think you're the one that needs to review here.
> Codd would say that the whole point is that you don't NEED to worry about
> the physical design.
> The database could be stored via pieces of paper in a pigeon-hole desk for
> all the dba cares about.
> So again, what does the logical design have to do with the physical design?

It all depends on what one calls logical and physical, doesn't it?
Using some nomenclatures, you only talk about tables in the physical design.

This is because at this stage you may well decide that the best way
to implement a given data store is, for example, to use a flat file,
as opposed to an r-table. It could even be an object in Oracle, for example,
which is not exactly a table, although it approaches one. Physical design, all
of it, with that approach.

Logical is where you talk about entities, relationships, attributes. Or object
classes and methods. It doesn't matter if the relationship is 1-1, 1-M or M-M
at all, other than by design requirements. And how it is implemented is far
from being of any concern.

So yes, in some methodologies tables *are* part of the physical design, while in
others they are exclusively part of the logical design.
As for how all that applies to the OP, I lost track of the whole thing
as well as losing interest when the slagging started.
Forgive me.

--
Cheers
Nuno Souto
wizofoz2k@.yahoo.com.au.nospam|||Thanks much, that helped - sorry for the bad names :)

joe.celko@.northface.edu (--CELKO--) wrote in message news:<a264e7ea.0402101535.4f7b8af5@.posting.google.com>...
> Does it help to explain things to this detail or do you need more?
