Potentially raising the bar on SQL scalability, Facebook has released an open source SQL query engine called Presto, built to work with petabyte-sized data warehouses.
Currently, more than 1,000 Facebook employees use Presto daily to run 30,000 interactive queries, which together involve more than a petabyte of processing. The company has scaled the software to run on a 1,000-node cluster.
Unlike Hive, which translates queries into MapReduce jobs that write intermediate results back to disk, Presto does not use MapReduce at all. Instead, it compiles parts of each query on the fly and does all of its processing in memory. As a result, Facebook claims Presto is 10 times better in CPU efficiency and latency than the combination of Hive and MapReduce.
Now Facebook wants other data-driven organizations to use Presto and, it hopes, to refine it. The company has posted the software's source code at https://github.com/Facebook/presto. The software is already being tested by a number of other large Internet services, including Airbnb and Dropbox.