4th December, 2007

Indexing isn’t hard

Filed under: database — admin @ 7:45 am

I found this blog post via Reddit and was happy to see the positive benefits the author is getting from profiling his SQL and indexing his database. He could have avoided the slow running query in the first place by following Andy’s simple rules of indexing;

  1. If your database doesn’t do it automatically, add indexes to all of your primary key columns
  2. Add indexes to all of your natural key columns
  3. Add indexes to all of your foreign key columns

When primary or foreign keys are composites (i.e. contain more than one column) create an index containing all of the columns in the same order that they are defined in the corresponding constraint.

If none of this makes any sense and you are responsible for your application’s database give me a call, my consulting rates are very reasonable.

8 Comments

  1. “If your database doesn’t do it automatically, add indexes to all of your primary key columns”
    Shouldn’t this be
    “If your database doesn’t automatically index primary keys, change your database software”

    Seriously, if you have multi-column primary keys, you want one index on all columns. So the classic “order line” table would have one index of “order_no, line_no”. Having a separate index on line_no is probably unhelpful.

    Comment by gary — 04/12/2007 @ 9:08 am

  2. “If your database doesn’t do it automatically, add indexes to all of your primary key columns”
    Shouldn’t this be
    “If your database doesn’t automatically index primary keys, change your database software”

    Seriously, if you have multi-column primary keys, you want one index on all columns. So the classic “order line” table would have one index of “order_no, line_no”. Having a separate index on line_no is probably unhelpful.

    Comment by gary — 04/12/2007 @ 9:08 am

  3. Hi, Andrew.

    In MS SQL Server, (Since SQL Server 2000) the query optimizer will usually aggregate separately indexed columns for you and take advantage of them as if they were composite indexes. So, at least in that RDBMS, the idea of _having_ to have composite keys is not so relevant.

    I am not sure about MYSQL, PostgreSQL and Oracle as I primarily work in MSSQL.

    Just thought you might want to know.

    Cheers.

    G. D. Milner
    Portland, OR

    Comment by G D Milner — 04/12/2007 @ 11:06 am

  4. Hi, Andrew.

    In MS SQL Server, (Since SQL Server 2000) the query optimizer will usually aggregate separately indexed columns for you and take advantage of them as if they were composite indexes. So, at least in that RDBMS, the idea of _having_ to have composite keys is not so relevant.

    I am not sure about MYSQL, PostgreSQL and Oracle as I primarily work in MSSQL.

    Just thought you might want to know.

    Cheers.

    G. D. Milner
    Portland, OR

    Comment by G D Milner — 04/12/2007 @ 11:06 am

  5. I think you have to be careful with the compound indexes. The two largest objects in my accounting package’s db are indexes on the main AR tables.

    Comment by Dave — 04/12/2007 @ 11:45 am

  6. I think you have to be careful with the compound indexes. The two largest objects in my accounting package’s db are indexes on the main AR tables.

    Comment by Dave — 04/12/2007 @ 11:45 am

  7. Gary:

    You don’t necessarily want to ditch your database if it doesn’t automatically create a primary index. For example. if you have a bulk load of millions of rows to perform then adding the data first and creating the index after the load will typically be quicker than creating the index as the data is added.

    Much depends on the indexing and the data access patterns. The classic order line actually represents a may-to-many relationship between order and product, with a primary key of (orderID, productID). If you build an index on the primary key that’s great for looking up the products for a particular order, but doesn’t help if you want to know about all the orders for a particular product.

    In that case, a separate index on the productID foreign key would be exactly what you’d need, but no additional index would be needed on orderID because the primary key index would serve (since orderID is the dominant column). Andrew’s advice is sound because it implies specific lookup patterns.

    So, as is so often the case in the database world, you can’t be too dogmatic about what’s right for a specific application without measuring performance under real application load. That’s why it’s a good thing to have a DBA who can do the performance measurement and tune the physical model long after the logical model designers have moved on to the next project. The line number isn’t that relevant in most cases, though you are correct that a separate index that allowed you to locate all the line ones wouldn’t be useful, and in fact not all order processing systems bother to number the lines.

    Comment by Steve Holden — 04/12/2007 @ 1:50 pm

  8. Gary:

    You don’t necessarily want to ditch your database if it doesn’t automatically create a primary index. For example. if you have a bulk load of millions of rows to perform then adding the data first and creating the index after the load will typically be quicker than creating the index as the data is added.

    Much depends on the indexing and the data access patterns. The classic order line actually represents a may-to-many relationship between order and product, with a primary key of (orderID, productID). If you build an index on the primary key that’s great for looking up the products for a particular order, but doesn’t help if you want to know about all the orders for a particular product.

    In that case, a separate index on the productID foreign key would be exactly what you’d need, but no additional index would be needed on orderID because the primary key index would serve (since orderID is the dominant column). Andrew’s advice is sound because it implies specific lookup patterns.

    So, as is so often the case in the database world, you can’t be too dogmatic about what’s right for a specific application without measuring performance under real application load. That’s why it’s a good thing to have a DBA who can do the performance measurement and tune the physical model long after the logical model designers have moved on to the next project. The line number isn’t that relevant in most cases, though you are correct that a separate index that allowed you to locate all the line ones wouldn’t be useful, and in fact not all order processing systems bother to number the lines.

    Comment by Steve Holden — 04/12/2007 @ 1:50 pm

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress