Data | Benchmark of database-like systems
[1] is a benchmark of database-like systems (e.g. dplyr in R, pandas in Python) across a number of different platforms. It is rerun regularly and is useful as an up-to-date resource for comparing the speed and memory usage of these systems with each other.
However, there are two caveats to note before looking at the results:
- The benchmarks were created as an extension of an original suite written by the main author of data.table, Matt Dowle, and are maintained by a major contributor to the same package, Jan Gorecki. Putting aside any ulterior motives, this still means that the benchmarks are more likely to have optimized code for data.table than for other systems.
- It is also important to look at what exactly the benchmark code [2] is doing and whether it is relevant to the tasks that your own code needs to perform.
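One way to act on the second caveat is to time a workload that resembles your own rather than relying only on the published numbers. The snippet below is a minimal sketch using only the Python standard library; it is not taken from the benchmark repo. The names measure and group_sum are hypothetical, and group_sum is a stand-in that you would replace with the pandas (or other) operation you actually care about.

```python
import time
import tracemalloc
from collections import defaultdict


def measure(task, repeats=3):
    """Run task() several times; return (best wall-clock seconds, peak traced bytes)."""
    best_seconds = float("inf")
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        task()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_seconds = min(best_seconds, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return best_seconds, peak_bytes


def group_sum(rows):
    """Stand-in workload (hypothetical): sum the value column per key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    # Synthetic data: 100,000 rows spread over 100 groups.
    rows = [(i % 100, i) for i in range(100_000)]
    seconds, peak = measure(lambda: group_sum(rows))
    print(f"best of 3: {seconds:.4f} s, peak traced memory: {peak / 1e6:.2f} MB")
```

Note that tracemalloc only sees allocations made through Python's memory allocator and slows the code down while tracing, so the figures are best used for relative comparisons between alternatives run under the same harness.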
Resources:
[1] benchmark results: https://h2oai.github.io/db-benchmark/
[2] benchmark code repo: https://github.com/h2oai/db-benchmark