I could see it working like this: take a typical day's worth of transactions, create a new test db from that log, and then let a tool try out different indexes based on some common heuristics, re-running the queries and benchmarking each one. Getting the right indexes often takes experimentation, so automating that loop would help.
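
To make the idea concrete, here is a rough sketch of that loop using Python's built-in `sqlite3` module. Everything specific in it is an assumption for illustration: the `orders` table, the synthetic rows, the three logged queries in `QUERY_LOG`, and the candidate indexes in `CANDIDATES` are all made up, and a real tool would load the actual log and derive candidates from it. It also only replays reads, so it ignores the write cost that extra indexes add.

```python
import sqlite3
import time

# Hypothetical stand-in for one day's logged queries; a real tool would
# parse these from the database's query log.
QUERY_LOG = [
    ("SELECT * FROM orders WHERE customer_id = ?", (42,)),
    ("SELECT * FROM orders WHERE status = ? AND customer_id = ?", ("open", 7)),
    ("SELECT COUNT(*) FROM orders WHERE created_day = ?", (3,)),
]

# Hypothetical candidates a simple heuristic might propose,
# e.g. one index per column set seen in a WHERE clause.
CANDIDATES = {
    "idx_customer": "CREATE INDEX idx_customer ON orders(customer_id)",
    "idx_status_customer": "CREATE INDEX idx_status_customer ON orders(status, customer_id)",
    "idx_created_day": "CREATE INDEX idx_created_day ON orders(created_day)",
}


def build_test_db(rows=20000):
    """Create a throwaway in-memory db filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "status TEXT, created_day INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, status, created_day) VALUES (?, ?, ?)",
        ((i % 500, "open" if i % 3 else "closed", i % 30) for i in range(rows)),
    )
    conn.commit()
    return conn


def replay(conn, repeats=5):
    """Re-run every logged query and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sql, params in QUERY_LOG:
            conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


def benchmark_candidates(rows=20000):
    """Time the log with no extra index, then with each candidate alone."""
    conn = build_test_db(rows)
    results = {"baseline": replay(conn)}
    for name, ddl in CANDIDATES.items():
        conn.execute(ddl)
        results[name] = replay(conn)
        conn.execute(f"DROP INDEX {name}")  # reset before the next candidate
    conn.close()
    return results


if __name__ == "__main__":
    ranked = sorted(benchmark_candidates().items(), key=lambda kv: kv[1])
    for name, seconds in ranked:
        print(f"{name}: {seconds:.4f}s")
```

Each candidate is tested on its own and dropped afterwards, so the timings compare one index at a time against the baseline. Trying combinations of indexes, or weighing in insert and update cost, would be the obvious next steps.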