Today we tried to add an attribute to our Sphinx index definition that uses an association with GROUP BY and HAVING. This was our first attempt:


  # association definition
  has_many :regularly_ordered_by, :class_name => "Order",
    :finder_sql => 'SELECT DISTINCT(orders.user_id)' +
      'FROM orders LEFT OUTER JOIN line_items ON (orders.id = line_items.order_id)' +
      'WHERE line_items.product_id = #{id} AND orders.user_id IS NOT NULL' +
      'GROUP BY orders.user_id HAVING COUNT(*) > 1'

  # index definition
  define_index do
    ...

    has regularly_ordered_by(:user_id), :as => :regular_for_users

    ...
  end

Unfortunately, TS generates a huge SQL query to generate the index and uses a GROUP BY clause by default, so it ignores any of the slightly advanced SQL features we needed. After a couple hours experimenting we came up with a TS index query, that worked. But this solution was rather ugly, because we couldn't rely on TS to generate the config for us anymore.

So because we're developers, we decided to fork and extend TS - you can get a copy at Tricycle's fork of TS. With the extension you can specify any SQL that will be appended after the generated joins. In the end, our solution looked something like this:


  # association definition
  # no association needed

  # index definition
  define_index do
    ...

    has "`temp_regularly_ordered_by`.user_ids", :as => :regular_for_users, :type => :multi

    from <<-SQL
LEFT OUTER JOIN (
  SELECT product_id, GROUP_CONCAT(DISTINCT user_id) user_ids FROM
    (
      SELECT `products`.`id` product_id, `orders`.user_id FROM `products`
        LEFT OUTER JOIN `line_items`
          ON (`products`.`id` = `line_items`.`product_id`)
        LEFT OUTER JOIN `orders`
          ON (`orders`.`id` = `line_items`.`order_id` AND `orders`.user_id IS NOT NULL)
      GROUP BY `products`.id, `orders`.user_id HAVING COUNT(`orders`.id) > 1
      ORDER BY NULL
    ) AS regular_products
  WHERE USER_ID IS NOT NULL
  GROUP BY product_id
) AS `temp_regularly_ordered_by` ON (`temp_regularly_ordered_by`.product_id = `products`.id)
      SQL

    ...
  end