Monday, March 19, 2012

Functions

I have a subquery that is running really slow when i use a date from the parent query

* the function udfMinContact returns a table of the earliest contact for every child that occured after the date passed to the function

* the function udfMaxReferral returns a table of the latest Referral for every child that occured before the date passed to the function

* the referral happens first, then the child is contacted. i'm looking for contacts that happed over 45 days after the referral
*************************************************************************************
DECLARE
@.EndDate DateTime,
@.StartDate DateTime

SET @.StartDate = '4/1/2007'
SET @.EndDate = '6/30/2007'

SELECT c.ChildId, c.FN, c.LN, c.DOB

FROM Child c INNER JOIN udfMinContact(@.StartDate) ct ON c.ChildID = ct.ChildId

WHERE ct.ContactDate BETWEEN @.StartDate AND @.EndDate

AND EXISTS
(
SELECT ChildId
FROM udfMaxReferral(ct.ContactDate) r
WHERE r.ChildId = c.ChildId
AND r.ReferralDate < DATEADD(dd, -45, ct.ContactDate)

)
*********************************************************************************************

If i run as is, it takes over 40 min. If i replace 'ct.ContactDate' which i highlighted with a static date like '1/1/2007' it runs in just a few seconds.

any idea why the drastic time difference and any suggestions on how to speed it up?

thanks

Check the Execution Plan.

Your FUNCTION has to fully execute for each and every row in the table, perhaps two times per row if it needs to re-calculate for the sub-query..

You may be able to substanially improve execution speed if you JOIN with the data that has the earliest contact INSTEAD of using the function. (I'm assuming that the function is a query.)

Please post the entire FUNCTION code and we can better determine the optimal way to deal with your issue.

|||

Is udfMaxReferral a multi-statement or inline TVF? Look at the query plan to see how the join is being done. If it is a nested loop join then it is possible that the TVF is invoked for every row that is being joined in the outer SELECT statement. Also, if the TVF is multi-statement then there are no statistics on the rows being returned so the plan will be sub-optimal. You should consider using inline TVF so that the query can be optimized as a whole.

Now, as for the question why if you use a variable or column in the TVF it is slower than a value or constant is due to plan caching and query optimization. When you specify a constant in a predicate or parameter to SP or function etc then the query optimizer can use that value and determine the best plan based on the available statistics. On the other hand, if you specify a variable or column then the value is not known and it can be any value within the domain of a data type so the query optimizer will pick a plan that works optimally for any search value. See the white paper on compilation, recompilation for more details on how this works.

|||Function: udfMinContact
Description: Returns info on the earliest 'IF' Contact (that occured on or after the StartDate) for every child

ALTER FUNCTION [dbo].[udfMinContact]
(
@.StartDate datetime
)
RETURNS @.retChildList TABLE
(
ChildId uniqueidentifier,
ContactId uniqueidentifier,
ContactDate datetime
)

AS
BEGIN

INSERT @.retChildList
SELECT ChildId, ContactId, ContactDate
FROM Contact ct
WHERE ContactId =
(
SELECT TOP(1) ContactId
FROM Contact
WHERE ChildId = ct.ChildId
AND ContactDate >= @.StartDate
ORDER BY ContactDate ASC, CREATE_TIME ASC
)

RETURN
END;

*********************************************************************************

Function: udfMaxReferral
Description: Returns info on the most current Referral (that occured on or before the EndDate) for every child

ALTER FUNCTION [dbo].[udfMaxReferral]
(
@.EndDate datetime
)
RETURNS @.retChildList TABLE
(
ChildId uniqueidentifier,
ReferralId uniqueidentifier,
ReferralDate datetime
)

AS
BEGIN

INSERT @.retChildList
SELECT ChildId, ReferralId, ReferralDate
FROM Referral r
WHERE ReferralId =
(
SELECT TOP(1) ReferralId
FROM Referral
WHERE ChildId = r.ChildId
AND ReferralDate <= @.EndDate
ORDER BY ReferralDate DESC, CREATE_TIME DESC
)

RETURN
END;

*********************************************************************************************************

I know the functions are not ideal, but the way the database is set up, it's the best i could do. there are many cases where a child will have several contacts or referrals on the same day so this is how i forced only 1 to be returned
|||

you can try this, and check if it can change your query speed,

notice, the get_datetime is a function needed you defined it.

i think ORDER BY clause always waste resource!!!

ALTER FUNCTION [dbo].[udfMaxReferral]
(
@.EndDate datetime
)
RETURNS @.retChildList TABLE
(
ChildId uniqueidentifier,
ReferralId uniqueidentifier,
ReferralDate datetime
)

AS
BEGIN

INSERT @.retChildList
SELECT ChildId, ReferralId, ReferralDate
FROM Referral a inner join

(

SELECT ChildId,ReferralId,MAX(get_datetime(ReferralDate,ReferralTime) datetime
FROM Referral

WHERE ReferralDate <= @.EndDate

GROUP BY ChildId,ReferralId

) b on a.ChildId=b.ChildId and a.ReferralId=ReferralId.ReferralId and get_datetime(a.ReferralDate,a.ReferralTime)=b.datetime

WHERE ReferralDate <= @.EndDate

RETURN
END;

|||

As per Uma's suggestion you have to convert your Multilined TVF to Inline TVF,

Use the following functions,

Code Snippet

CREATE FUNCTION [dbo].[udfMinContact]

(

@.StartDate datetime

)

RETURNS TABLE

AS

RETURN (

SELECT ChildId, ContactId, ContactDate

FROM Contact ct

WHERE ContactId =

(

SELECT TOP(1) ContactId

FROM Contact

WHERE ChildId = ct.ChildId

AND ContactDate >= @.StartDate

ORDER BY ContactDate ASC, CREATE_TIME ASC

)

)

GO

CREATE FUNCTION [dbo].[udfMaxReferral]

(

@.EndDate datetime

)

RETURNS TABLE

AS

RETURN

(

SELECT ChildId, ReferralId, ReferralDate

FROM Referral r

WHERE ReferralId =

(

SELECT TOP(1) ReferralId

FROM Referral

WHERE ChildId = r.ChildId

AND ReferralDate <= @.EndDate

ORDER BY ReferralDate DESC, CREATE_TIME DESC

)

)

|||

Manivannan.D.Sekaran wrote:

As per Uma's suggestion you have to convert your Multilined TVF to Inline TVF,

What is the difference? Sorry I'm still pretty new to anything more the simple SQL

No comments:

Post a Comment