ACM SIGMOD
SIGMOD 2010 Programming Contest
Distributed Query Engine

Platinum Sponsors

NSF Microsoft

Gold Sponsors

Amazon Web Services INRIA Saclay – Île-de-France

Silver Sponsors

Exalead Yahoo!

This Web site covers the 2010 edition of the SIGMOD programming contest. See the dedicated Web site for the SIGMOD 2011 contest.

A programming contest was organized in parallel with the ACM SIGMOD 2010 conference, following the success of the first annual SIGMOD programming contest held the previous year. Student teams from degree-granting institutions were invited to compete to develop a distributed query engine over relational data. Submissions were judged on the overall performance of the system on a variety of workloads. A shortlist of finalists was invited to present their implementations at the SIGMOD conference in June 2010 in Indianapolis, USA. The winning team, selected during the conference, was awarded a prize of 5,000 USD and invited to a research visit in Paris. The winning system, released as open source, will form a building block of a complete distributed database system to be built over the years through successive programming contests.

Results

News

2010-12-01: Report on the Competition
A report on the competition, written by the organizers and the two leading teams, has been published in SIGMOD Record.
2010-06-21: Winning implementation source code
The source code of the winning implementation is available for download.
2010-06-09: Announcement of the winning team
Hyunjung Park (cardinality, Stanford University) wins the competition.
2010-06-03: Posters of finalists
The finalist teams have designed posters to present their implementations.
2010-05-30: End of the competition
The competition is now over. Results will be announced at the SIGMOD 2010 business meeting, on June 9th.
2010-04-15: Selection of the finalists
The list of finalist teams is now available!
2010-04-11: End of the first phase of the competition
The deadline for submitting implementations has now passed. Some statistics about the teams that entered the competition are available.
2010-02-14: Evaluation cluster made available
The evaluation and submission cluster is now available.
2010-02-04: Small modification in the dates
The date on which the evaluation cluster is made available, as well as the submission deadlines, have been delayed by one week.
2009-12-13: Task details made available
The full task details are made available. The contest is now open!
2009-12-13: Mailing list
A mailing list has been created to publish technical information about the competition. Contestants are encouraged to subscribe to it. This complements the Atom feed that will provide more general news.
2009-11-19: Poster
A letter-sized poster is made available for advertising the contest.
2009-11-17: Prize announced
We can now announce the amount of the prize awarded to the winner of the contest: 5,000 USD. This comes in addition to an invited research visit in Paris.
2009-10-09: Initial description of the contest
The initial description of the contest is available on the SIGMOD 2010 programming contest website.

Contestants and other interested parties are invited to subscribe to the Atom feed of the SIGMOD 2010 programming contest, which serves general news about the competition.

Task Overview

The system to implement is a simple distributed query executor built on top of last year's main-memory index (an implementation of which is provided). Centralized query plans are supplied and have to be translated into distributed query plans, to be executed on each peer of a cluster of machines. An initial computation of statistics can be run over each peer in order to optimize the distributed query plan. The system must be able to efficiently execute the query over each peer, with the help of the in-memory index, gather the results from each peer, and perform any other necessary operation on a monitoring peer.

The system is tested on a collection of synthetic and real-world datasets, with appropriate query loads. The interface is designed so that the distributed query engine can be tested either on a single machine (with local processes acting as peers), on an ad-hoc cluster of peers, or on an evaluation cluster made available to test the performance of the system under the conditions of the final evaluation.

To help contestants test their implementation, any team whose system passes a collection of unit tests can be provided with an Amazon Web Services account worth 100 USD (up to 30 accounts are available on a first-come, first-served basis; up to two accounts per team can be provided, depending on demand). This is made possible thanks to the support of Amazon.

Important Dates

Sunday, December 13, 2009
Interfacing code and example workload made available.
Saturday, February 13, 2010
Evaluation infrastructure made available.
Sunday, April 11, 2010
Submissions due.
Friday, April 16, 2010
Notification of a shortlist of finalists.
Sunday, May 30, 2010
Final submission by finalists due.
6–11 June, 2010
SIGMOD conference in Indianapolis, USA. Presentation of the works of finalists, and announcement of the winner.

All deadlines are 5:00pm UTC (10:00am PDT, 1:00pm EDT).

Task Details

Given a parsed SQL query, you have to return the correct results as fast as possible. The data is stored on disk; the indexes are all in memory. The SQL queries always have the following form:

SELECT alias_name.field_name, ...
FROM table_name AS alias_name, ...
WHERE condition1 AND ... AND conditionN

A condition may be either:

The data is distributed over multiple nodes and can be replicated. The distribution of data is horizontal: a given row of a table is never fragmented. The implementation of the indexes is provided and cannot be changed. Up to 50 queries are sent at the same time by 50 different threads, but only the total elapsed time is measured. You do not have to take care of the partitioning, replication, or creation of the indexes: these are done before your client is benchmarked.

Before the actual start of the benchmarks, you are given a predefined number of seconds to run preprocessing on the data. You are also given a set of queries representative of the benchmark workload to guide this preprocessing.

There are seven methods to implement. They are fully described in the client.h file. The following diagrams show the way they are called.

In the initialization phase, the startPretreatmentMaster function is called on the master node, while the StartSlave function is called on slave nodes. In the connection phase, the createConnection function is called on the master node. In the query phase, the performQuery and fetchRow functions are called in succession on the master node. In the closing phase, the closeConnection and then closeProcess functions are called on the master node.

For more details, see the README file inside the ZIP archive.

ZIP archive with all instructions, example workload, and example implementation

Regulations

The contest is open to undergraduate and graduate students registered at degree-granting institutions around the world (contestants must be registered students at the beginning of the competition, Sunday, December 13, 2009). A team can be formed of one or several students from one or several institutions (for teams of several students, only a fixed number of members receive travel grants to attend the conference). Several teams from the same institution can compete independently. Students can work under the guidance of a professor or researcher, but the implementation must be written exclusively by the students. Contestants must supply the source code for their entries and agree to license their code under the BSD or MIT open-source license should their system win the contest. Using open-source third-party libraries is allowed, as long as the core of the system is novel and the licenses of the libraries are compatible with the BSD or MIT licenses. Students supervised by the organizers cannot enter the contest.

Organization

Organizers
Advisory board

For any information, please contact Pierre Senellart at pierre@senellart.com. The actual dataset and queries used during the competition are available upon request.