by Kent Marten, Michael Johns, Menelaos Karavelas and Desmond Cheong
For working with geospatial data, see this post from Databricks announcing support for Spatial SQL: Introducing Spatial SQL in Databricks: 80+ Functions for High-Performance Geospatial Analytics
On the heels of the initial release of H3 support in Databricks Runtime (DBR), we are happy to share ground-breaking performance improvements with H3, support for four additional expressions, and availability in Databricks SQL. In this blog, you will learn about the new expressions, performance benchmarks from our vectorized columnar implementation, and multiple approaches for point-in-polygon spatial joins using H3.
When we implemented built-in H3 capabilities in Databricks [AWS | ADB | GCP], we committed to making them best-in-class. Ultimately, this comes down to useful APIs and performance. Our original goal was to improve the performance of H3 expressions in Photon by at least 20%. The results are far more exciting and impressive. In the table below, we have grouped the H3 expressions by functional category and measured each expression's performance against the Java H3 library implementation (essentially, what you would get when importing the H3 library).
We strongly recommend using the BIGINT representation of H3 cell IDs, along with the H3 expression overloads that take BIGINTs as inputs. Comparing H3 cell IDs in H3-based joins is faster with the BIGINT representation than with the STRING representation. Moreover, for the expressions typically used in H3-based joins, namely the traversal and predicate expressions, the BIGINT-based expressions run several times faster in absolute terms than the STRING-based ones.
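To make the two representations concrete, here is a minimal Python sketch, assuming only the standard H3 convention that the STRING form of a cell ID is the lowercase hexadecimal rendering of its 64-bit integer value. The helper names `h3_string_to_bigint` and `h3_bigint_to_string` are hypothetical and exist only for illustration; in Databricks the corresponding conversions are the `h3_stringtoh3` and `h3_h3tostring` expressions.

```python
# Hypothetical helpers (not a Databricks API): they only illustrate that the
# STRING and BIGINT forms of an H3 cell ID encode the same 64-bit value.

def h3_string_to_bigint(cell: str) -> int:
    """Parse the hexadecimal STRING form of an H3 cell ID into an integer."""
    return int(cell, 16)


def h3_bigint_to_string(cell: int) -> str:
    """Render an integer H3 cell ID as its lowercase hexadecimal STRING form."""
    return format(cell, "x")


cell_str = "85283473fffffff"
cell_int = h3_string_to_bigint(cell_str)

# The integer form fits in a signed 64-bit BIGINT.
fits_in_bigint = 0 <= cell_int < 2**63

# Converting back yields the original STRING form.
round_trip = h3_bigint_to_string(cell_int)
```

Since both forms carry the same information, keeping cell IDs as BIGINT end to end means join keys can be compared as fixed-width integers instead of character strings.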
| Functional Category | Average Performance Gain | Expression | Performance Gain (DOUBLE inputs for longlat expressions; WKB, WKT, GeoJSON for point or polyfill) |
|---|---|---|---|
| Import | 1.4x | h3_longlatash3 | 1.3x |
| | | h3_longlatash3string | 1.8x |
| | | h3_pointash3 | 1.7x |
| | | h3_pointash3string | 1.7x |
| | | h3_polyfillash3 | 1.2x |
| | | h3_polyfillash3string | 1.2x |
| | | h3_try_polyfillash3 | 1.2x |
| | | h3_try_polyfillash3string | 1.2x |