Sunday, February 26, 2012

FOR XML and performance

Does anyone have any suggestions or pointers for performance optimizations
for SELECT queries that take relational data and
return it as XML columns?
For example, I have a Person table. In our system, a person may have
multiple names, addresses, phone numbers, and email
addresses. So, I have a PersonNames table, PersonAddresses table,
PersonPhones table and a PersonEmails table. I'm
returning all of the person information at once, instead of having to make
multiple database calls.
One way to do this is to return multiple resultsets: the first is the person
base information, and each of the associated
collections of informataion would be in the next result sets.
There are some problems with this, though: if I ever want to bring back
multiple people with all of their supporting
information, I have to return them with multiple result sets in the same
order:
Person1
Person1Names
Person1Addresses
..
..
..
Person2
Person2Names
Person2Addresses
..
..
..
etc.
I've dealt with this by defining a view that selects the various information
as XML columns, like this:
SELECT personId,
dateOfBirth,
gender,
(SELECT DISTINCT nameId,
salutation,
firstName,
middleName,
surName,
suffix,
nameTypeId
FROM Names AS [Name]
WHERE personId = People.personId
FOR XML AUTO, TYPE, Root('Names')) AS Names,
(SELECT Address.addressId,
PeopleAddresses.addressTypeId AS addressTypeId,
cityId,
countyId,
stateId,
streetAddr1,
streetAddr2,
cityName,
stateAbbrev,
zipCode,
stateName,
countyName,
areaCode,
timeZone,
useDST
FROM PeopleAddresses INNER JOIN
VAddresses AS [Address] ON PeopleAddresses.addressId =
Address.addressId
WHERE PeopleAddresses.personId = People.personId AND
PeopleAddresses.addressId = Address.AddressId
FOR XML RAW('Address'), TYPE, Root('Addresses')) AS Addresses,
(SELECT emailAddressId,
emailAddress,
emailAddressTypeId
FROM EmailAddresses AS [EmailAddress]
WHERE EmailAddress.personId = People.personId FOR XML AUTO, TYPE,
Root('EmailAddresses')) AS EmailAddresses,
(SELECT PhoneNumbers.phoneTypeId,
PhoneNumber.phoneNumberId,
PhoneNumber.areaCode,
PhoneNumber.phoneNumber,
PhoneNumber.extension,
PhoneNumber.prefix
FROM PeoplePhones PhoneNumbers INNER JOIN
PhoneNumbers PhoneNumber ON PhoneNumbers.phoneNumberId =
PhoneNumber.phoneNumberId
WHERE PhoneNumbers.personId = People.personId
FOR XML RAW('PhoneNumber'), TYPE, Root('PhoneNumbers')) AS PhoneNumbers
FROM dbo.People
This works really well - I can reach into the XML columns of the result set,
and pull back the data. I can also have a
SINGLE piece of code that knows how to read addresses, phone numbers, etc.
for other entities that have these. I just return
them in the queries for these entities as XML columns.
However, the performance is not optimal. I know that XML introduces a
performance penalty, and I'm willing to pay one for
this flexibility - but I was wondering if anyone has any performance
optimization tips for this scenario?
I could obviously change how I'm doing things: store the names, addresses,
etc. as XML, so that I only construct it once,
have special cases for returning this information as multiple result sets,
etc. I don't expect this to be the
best-performing method, but I'd like to eliminate as many performance
bottlenecks as possible.
Any comments are appreciated. Thanks!
Hello PMarino,
If you need to read all that data then thats your option. However you are
readd all people in this example normally an app would only select 1 person.
Also you need to make sure the query is optimal so the subqueries are using
good plans
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons

> Does anyone have any suggestions or pointers for performance
> optimizations for SELECT queries that take relational data and
> return it as XML columns?
> For example, I have a Person table. In our system, a person may have
> multiple names, addresses, phone numbers, and email
> addresses. So, I have a PersonNames table, PersonAddresses table,
> PersonPhones table and a PersonEmails table. I'm
> returning all of the person information at once, instead of having to
> make multiple database calls.
> One way to do this is to return multiple resultsets: the first is the
> person base information, and each of the associated
> collections of informataion would be in the next result sets.
> There are some problems with this, though: if I ever want to bring
> back multiple people with all of their supporting
> information, I have to return them with multiple result sets in the
> same order:
> Person1
> Person1Names
> Person1Addresses
> .
> .
> .
> Person2
> Person2Names
> Person2Addresses
> .
> .
> .
> etc.
> I've dealt with this by defining a view that selects the various
> information as XML columns, like this:
> SELECT personId,
> dateOfBirth,
> gender,
> (SELECT DISTINCT nameId,
> salutation,
> firstName,
> middleName,
> surName,
> suffix,
> nameTypeId
> FROM Names AS [Name]
> WHERE personId = People.personId
> FOR XML AUTO, TYPE, Root('Names')) AS Names,
> (SELECT Address.addressId,
> PeopleAddresses.addressTypeId AS addressTypeId,
> cityId,
> countyId,
> stateId,
> streetAddr1,
> streetAddr2,
> cityName,
> stateAbbrev,
> zipCode,
> stateName,
> countyName,
> areaCode,
> timeZone,
> useDST
> FROM PeopleAddresses INNER JOIN
> VAddresses AS [Address] ON PeopleAddresses.addressId =
> Address.addressId
> WHERE PeopleAddresses.personId = People.personId AND
> PeopleAddresses.addressId = Address.AddressId
> FOR XML RAW('Address'), TYPE, Root('Addresses')) AS Addresses,
> (SELECT emailAddressId,
> emailAddress,
> emailAddressTypeId
> FROM EmailAddresses AS [EmailAddress]
> WHERE EmailAddress.personId = People.personId FOR XML AUTO,
> TYPE,
> Root('EmailAddresses')) AS EmailAddresses,
> (SELECT PhoneNumbers.phoneTypeId,
> PhoneNumber.phoneNumberId,
> PhoneNumber.areaCode,
> PhoneNumber.phoneNumber,
> PhoneNumber.extension,
> PhoneNumber.prefix
> FROM PeoplePhones PhoneNumbers INNER JOIN
> PhoneNumbers PhoneNumber ON PhoneNumbers.phoneNumberId =
> PhoneNumber.phoneNumberId
> WHERE PhoneNumbers.personId = People.personId
> FOR XML RAW('PhoneNumber'), TYPE, Root('PhoneNumbers')) AS
> PhoneNumbers
> FROM dbo.People
> This works really well - I can reach into the XML columns of the
> result set, and pull back the data. I can also have a
> SINGLE piece of code that knows how to read addresses, phone numbers,
> etc. for other entities that have these. I just return
> them in the queries for these entities as XML columns.
> However, the performance is not optimal. I know that XML introduces a
> performance penalty, and I'm willing to pay one for
> this flexibility - but I was wondering if anyone has any performance
> optimization tips for this scenario?
> I could obviously change how I'm doing things: store the names,
> addresses, etc. as XML, so that I only construct it once,
> have special cases for returning this information as multiple result
> sets, etc. I don't expect this to be the
> best-performing method, but I'd like to eliminate as many performance
> bottlenecks as possible.
> Any comments are appreciated. Thanks!
>
|||Did you try to analyze why your query is expensive? Can it be because of the
SELECT DISTINCT?
You can find some details of FOR XML implementation that could potentially
help optimize FOR XML query performance at
http://blogs.msdn.com/sqlprogrammability/pages/576095.aspx .
Best regards,
Eugene
"PMarino" <PMarino@.discussions.microsoft.com> wrote in message
news:B9127CC7-2413-4047-9E50-C779032E1586@.microsoft.com...
> Does anyone have any suggestions or pointers for performance optimizations
> for SELECT queries that take relational data and
> return it as XML columns?
> For example, I have a Person table. In our system, a person may have
> multiple names, addresses, phone numbers, and email
> addresses. So, I have a PersonNames table, PersonAddresses table,
> PersonPhones table and a PersonEmails table. I'm
> returning all of the person information at once, instead of having to make
> multiple database calls.
> One way to do this is to return multiple resultsets: the first is the
> person
> base information, and each of the associated
> collections of informataion would be in the next result sets.
> There are some problems with this, though: if I ever want to bring back
> multiple people with all of their supporting
> information, I have to return them with multiple result sets in the same
> order:
> Person1
> Person1Names
> Person1Addresses
> .
> .
> .
> Person2
> Person2Names
> Person2Addresses
> .
> .
> .
> etc.
>
> I've dealt with this by defining a view that selects the various
> information
> as XML columns, like this:
> SELECT personId,
> dateOfBirth,
> gender,
> (SELECT DISTINCT nameId,
> salutation,
> firstName,
> middleName,
> surName,
> suffix,
> nameTypeId
> FROM Names AS [Name]
> WHERE personId = People.personId
> FOR XML AUTO, TYPE, Root('Names')) AS Names,
> (SELECT Address.addressId,
> PeopleAddresses.addressTypeId AS addressTypeId,
> cityId,
> countyId,
> stateId,
> streetAddr1,
> streetAddr2,
> cityName,
> stateAbbrev,
> zipCode,
> stateName,
> countyName,
> areaCode,
> timeZone,
> useDST
> FROM PeopleAddresses INNER JOIN
> VAddresses AS [Address] ON PeopleAddresses.addressId =
> Address.addressId
> WHERE PeopleAddresses.personId = People.personId AND
> PeopleAddresses.addressId = Address.AddressId
> FOR XML RAW('Address'), TYPE, Root('Addresses')) AS Addresses,
> (SELECT emailAddressId,
> emailAddress,
> emailAddressTypeId
> FROM EmailAddresses AS [EmailAddress]
> WHERE EmailAddress.personId = People.personId FOR XML AUTO, TYPE,
> Root('EmailAddresses')) AS EmailAddresses,
> (SELECT PhoneNumbers.phoneTypeId,
> PhoneNumber.phoneNumberId,
> PhoneNumber.areaCode,
> PhoneNumber.phoneNumber,
> PhoneNumber.extension,
> PhoneNumber.prefix
> FROM PeoplePhones PhoneNumbers INNER JOIN
> PhoneNumbers PhoneNumber ON PhoneNumbers.phoneNumberId =
> PhoneNumber.phoneNumberId
> WHERE PhoneNumbers.personId = People.personId
> FOR XML RAW('PhoneNumber'), TYPE, Root('PhoneNumbers')) AS PhoneNumbers
> FROM dbo.People
>
> This works really well - I can reach into the XML columns of the result
> set,
> and pull back the data. I can also have a
> SINGLE piece of code that knows how to read addresses, phone numbers, etc.
> for other entities that have these. I just return
> them in the queries for these entities as XML columns.
> However, the performance is not optimal. I know that XML introduces a
> performance penalty, and I'm willing to pay one for
> this flexibility - but I was wondering if anyone has any performance
> optimization tips for this scenario?
> I could obviously change how I'm doing things: store the names, addresses,
> etc. as XML, so that I only construct it once,
> have special cases for returning this information as multiple result sets,
> etc. I don't expect this to be the
> best-performing method, but I'd like to eliminate as many performance
> bottlenecks as possible.
> Any comments are appreciated. Thanks!
>

No comments:

Post a Comment