Filter in the Join vs Filter in the Where

Discussion in 'microsoft.public.sqlserver.programming' started by Mike H, Oct 28, 2009.

  1. Mike H

    Mike H Guest

    To start off, this is with SQL Server 2005 data.

    The next question comes from running into someone that likes to write
    their
    code a different way than I do. I've long ago established how I do
    it, and I'm sure I had a good reason at the time, but I can't recall.
    And this other person, I'll call Bob, also likes his way, and feels
    it's better.

    The actual data this question is based on is tens of millions of rows
    in 3nf normalized database that stores data. Not quite OLTP, but not
    quite datawarehouse. Typically people are working with 4 or 5 sets of
    data from this database at any one time, with each set of data having
    hundreds of thousands of rows. We can assume (for this discussion)
    that appropriate indexes are in place, with sufficient memory, cpu,
    and i/o available. Of course we simplify here for an example:
    CREATE TABLE #One (CarBrand NVARCHAR(40), Employees INT)
    INSERT INTO #one VALUES ('Ford',625)
    INSERT INTO #one VALUES ('Chevrolet',865)
    INSERT INTO #one VALUES ('Porche',700)
    INSERT INTO #one VALUES ('Citroën',250)

    CREATE TABLE #two (CarBrand NVARCHAR(40), PopularityRank SMALLINT)
    INSERT INTO #two VALUES ('Ford',1)
    INSERT INTO #two VALUES ('Chevrolet',3)
    INSERT INTO #two VALUES ('Porche',2)
    INSERT INTO #two VALUES ('Citroën',4)

    CREATE INDEX #myindex ON #one(Carbrand)
    CREATE INDEX #myindex3 on #one(Employees)
    CREATE INDEX #myindex2 ON #two(Carbrand)

    The conflict is that creating a query that joins multiple tables, and
    requires filtering to be applied to various columns from these various
    tables, I will more often than not use the Join On to join the common
    fields, and place any filters in the Where clause. For example I
    might write a query like this:

    SELECT a.carbrand, a.Employees, b.Rank
    FROM #One a
    JOIN #Two b
    ON a.CarBrand = b.carbrand
    WHERE a.Employees > 500

    Bob will write a query that looks like this:

    SELECT a.carbrand, a.Employees, b.Rank
    FROM (select 'x' as x) as x
    JOIN #One a
    on a.Employees > 500
    JOIN #Two b
    ON a.CarBrand = b.carbrand

    Bob's theory being that since the order of operations for sql
    databases is:
    1. FROM clause
    2. WHERE clause
    3. GROUP BY clause
    4. HAVING clause
    5. SELECT clause
    6. ORDER BY clause

    sticking any filtering conditions into the FROM clause is going to
    lead to higher performance through less data needing to be processed
    as it is filtered before the where needs to deal with it. I've found
    others preaching this as well:http://www.bennadel.com/blog/70-SQL-
    Query-Order-of-Operations.htm

    In general, that's never really something I've worried about. Should
    I be? Won't SQL Server, Oracle, DB2 optimize your query and run it
    however it feels is most optimal, regardless of where you put
    filtering conditions? Or is there some incremental benefit that might
    appear, so you should just try and avoid the where?

    It was news to me and I hadn't found much discussion online so I
    thought I'd see what folks around here say.
     
    Mike H, Oct 28, 2009
    #1
    1. Advertisements

  2. Mike H

    Tom Cooper Guest

    I like your method, I don't like Bob's method. But that's a matter of
    style, not efficiency. Your method is by far the most commonly used.

    But they are both equally efficient. If you examine the execution plans,
    you will see that they are identical. And, yes, the optimizer is allowed to
    and does change the order of the operations to the most efficient method it
    can find that returns the correct result. And since your query and Bob's
    query both return the same result, it is not suprising that the optimizer
    generates the same plan. And since the query plan is the same, they are
    equally efficient.

    Tom

    To start off, this is with SQL Server 2005 data.

    The next question comes from running into someone that likes to write
    their
    code a different way than I do. I've long ago established how I do
    it, and I'm sure I had a good reason at the time, but I can't recall.
    And this other person, I'll call Bob, also likes his way, and feels
    it's better.

    The actual data this question is based on is tens of millions of rows
    in 3nf normalized database that stores data. Not quite OLTP, but not
    quite datawarehouse. Typically people are working with 4 or 5 sets of
    data from this database at any one time, with each set of data having
    hundreds of thousands of rows. We can assume (for this discussion)
    that appropriate indexes are in place, with sufficient memory, cpu,
    and i/o available. Of course we simplify here for an example:
    CREATE TABLE #One (CarBrand NVARCHAR(40), Employees INT)
    INSERT INTO #one VALUES ('Ford',625)
    INSERT INTO #one VALUES ('Chevrolet',865)
    INSERT INTO #one VALUES ('Porche',700)
    INSERT INTO #one VALUES ('Citroën',250)

    CREATE TABLE #two (CarBrand NVARCHAR(40), PopularityRank SMALLINT)
    INSERT INTO #two VALUES ('Ford',1)
    INSERT INTO #two VALUES ('Chevrolet',3)
    INSERT INTO #two VALUES ('Porche',2)
    INSERT INTO #two VALUES ('Citroën',4)

    CREATE INDEX #myindex ON #one(Carbrand)
    CREATE INDEX #myindex3 on #one(Employees)
    CREATE INDEX #myindex2 ON #two(Carbrand)

    The conflict is that creating a query that joins multiple tables, and
    requires filtering to be applied to various columns from these various
    tables, I will more often than not use the Join On to join the common
    fields, and place any filters in the Where clause. For example I
    might write a query like this:

    SELECT a.carbrand, a.Employees, b.Rank
    FROM #One a
    JOIN #Two b
    ON a.CarBrand = b.carbrand
    WHERE a.Employees > 500

    Bob will write a query that looks like this:

    SELECT a.carbrand, a.Employees, b.Rank
    FROM (select 'x' as x) as x
    JOIN #One a
    on a.Employees > 500
    JOIN #Two b
    ON a.CarBrand = b.carbrand

    Bob's theory being that since the order of operations for sql
    databases is:
    1. FROM clause
    2. WHERE clause
    3. GROUP BY clause
    4. HAVING clause
    5. SELECT clause
    6. ORDER BY clause

    sticking any filtering conditions into the FROM clause is going to
    lead to higher performance through less data needing to be processed
    as it is filtered before the where needs to deal with it. I've found
    others preaching this as well:http://www.bennadel.com/blog/70-SQL-
    Query-Order-of-Operations.htm

    In general, that's never really something I've worried about. Should
    I be? Won't SQL Server, Oracle, DB2 optimize your query and run it
    however it feels is most optimal, regardless of where you put
    filtering conditions? Or is there some incremental benefit that might
    appear, so you should just try and avoid the where?

    It was news to me and I hadn't found much discussion online so I
    thought I'd see what folks around here say.
     
    Tom Cooper, Oct 28, 2009
    #2
    1. Advertisements

  3. Mike H

    Dan Guzman Guest

    Bob's theory being that since the order of operations for sql
    This logical order of query processing is not necessarily the same as the
    physical order. Like Tom said, it is the execution plan that will tell the
    real story. Semantically identical queries often yield the same execution
    plan since the optimizer's job is to plan the declared query as efficiently
    as possible. Even if you do get different plans do to happenstance (this
    can tends to happen for more complex queries and the optimizer isn't
    perfect), the plan could very well change between service packs or SQL
    Server versions as the optimizer improves.

    --
    Hope this helps.

    Dan Guzman
    SQL Server MVP
    http://weblogs.sqlteam.com/dang/
     
    Dan Guzman, Oct 28, 2009
    #3
  4. Mike H

    Mike H Guest


    Thank you for the feedback. I've done some testing with Query analyzer
    and have found that for the most part, execution plans end up being
    the same regardless, even on the more complex queries. It is Bob's
    insistence that the order of operations dictate that you are
    essentially a fool for using the where clause that prompted my
    question. He seems so fixed in his position that I thought that maybe
    I'm missing something so basic, folks don't even talk about it.
     
    Mike H, Oct 28, 2009
    #4
  5. One thing to note is that there can be a difference in results when choosing
    between a filter in the WHERE and a filter in an OUTER JOIN. (Basically,
    you are either (WHERE) filtering the rows to be returned, or (OUTER JOIN)
    filtering the rows to be joined.) You want to be sure that Bob is aware of
    this.

    For one discussion:
    http://www.sqlservercentral.com/Forums/Topic449462-338-1.aspx

    RLF


    Thank you for the feedback. I've done some testing with Query analyzer
    and have found that for the most part, execution plans end up being
    the same regardless, even on the more complex queries. It is Bob's
    insistence that the order of operations dictate that you are
    essentially a fool for using the where clause that prompted my
    question. He seems so fixed in his position that I thought that maybe
    I'm missing something so basic, folks don't even talk about it.
     
    Russell Fields, Oct 28, 2009
    #5
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.