FullText Ranking: forceseek performance and cost...

Hi,

I have just run the 2 queries from the BOL to test the forceseek option:

USE AdventureWorks;

SELECT*

FROM Sales.SalesOrderHeader AS h

INNERJOIN Sales.SalesOrderDetail AS d

ON h.SalesOrderID = d.SalesOrderID

WHERE h.TotalDue > 100

AND(d.OrderQty > 5 OR d.LineTotal < 1000.00);

SELECT*

FROM Sales.SalesOrderHeader AS h

INNERJOIN Sales.SalesOrderDetail AS d WITH(FORCESEEK)

ON h.SalesOrderID = d.SalesOrderID

WHERE h.TotalDue > 100

AND(d.OrderQty > 5 OR d.LineTotal < 1000.00);

When I take a look at the execution plan, the first query has a cost of 20% and the forceseek take 80% of the total cost.

Also the number of logical read jump from 1238 to 66194!!!

So the forceseek option has a bad impact on the query.

The BOL says if there is a lot of IO, using the forceseek can provide better performance... but in this case its not so clear...

can you explain more in detail when its better to use the forceseek?

Thanks.

The cost increases,but if you look, the index change of clustered index scan to clustered index seek and if you compare the estimated I/O cust and estimated CPU Cost you can see the differences.The number reduce....

|||

Actually this is expected behavior. We have approx 30k rows of Order Header with approx 120k with order lines. What you tell SQL Server in the latter query, is to ignore the fact that we are retrieving all the rows, and still use a lookup instead of a scan. So what SQL Server does (on your demand) is to take each and every order one by one and lookup the corresponding order lines. If you looked carefully at the execution plan, you'ld see that the merge join was replaced with a inner loop join. The latter is more efficient if you lookup only a few values, but when you lookup all the values a merge join is way much faster.

As I understand the FORCESEEK its a way of forcing seeking in the index the few times that SQL Server believes it has to read more or less the whole table, while it actually is going to retrieve just a few rows.

Anybody is free to correct me if I'm wrong on this one.

Edit: That said, when I run a trace it actually seems that the last query is the faster, I would guess that is since all the data already are in memory.

|||

Hints are there to be used when the optimizer produces a less than optimal plan and, despite over two decades of effort, query optimizers still can't get the 100% best plan 100% of the time.

So what is FORCESEEK intended to deal with? The optimizer is always working with imperfect data about the values in the table. And when you start doing a join using multiple predicates that include operators such as "<" and ">" its ability to accurately estimate the number of rows that will be touched declines. Particularly true if you don't have up to date statistics on some of the columns, or if you encounter one of the rare instances where the histograms don't capture enough information to do accurate estimates. Historically (dating back to the System R research project) if you don't have statistics that indicate otherwise then an operator such as ">" is considered to be true for 1/3 of the data. Thus ORing two such conditions would lead to an assumption that 2/3 of the table was going to be retrieved which clearly would lead to a scan over a seek. This is a grand oversimplification of how things work in this day and age, but the basics help you understand why the optimizer might choose the wrong plan. It might think that 2/3 of the rows must be touched while you know that based on the actual data it is more like .01%. The bottom line being, there are cases where the optimizer will think that a scan offers the best performance but where human knowledge of the data makes it clear that a seek is the better alternative.

Using the optimizer's own statistics to see if FORCESEEK is better or not is a mistake. The reason the optimizer didn't choose the seek strategy on its own is that it estimated it would take a lot more I/O and rejected that strategy. So when you use FORCESEEK and look at the estimates what you see are the (incorrect) estimates that the optimizer saw when it considered this strategy on its own. You just can't use that data, you need to look at the actual performance of the query.

Hal

Friday, March 23, 2012

forceseek performance and cost...

No comments:

Post a Comment

FullText Ranking

Blog Archive

About Me