[SEDONA-756] feat: raster Python serde and with_bands() support #2956
Hi @prantogg — heads up, I rebased this branch onto current master to resolve the conflict in Resolution choices for the UDF section:
If you have local commits on top of
"""Serialize an InDbSedonaRaster to the Sedona binary format.

The output bytes are compatible with the JVM's Serde.deserialize().
Only InDbSedonaRaster is supported. OutDb and LazyLoad rasters
Please remove references to outdb since they don't exist here
Done — the rebase you did already removed these references. Confirmed there are no remaining out-db mentions in the current branch.
Add Python-side serialize() for InDbSedonaRaster, enabling Python UDFs to return raster objects directly instead of the lossy .tolist() + RS_MakeRaster workaround. Rasters now round-trip as contiguous bytes preserving native dtypes and all metadata (CRS, nodata, affine, etc.).

Add with_bands() to InDbSedonaRaster for replacing pixel data (NumPy array) while preserving spatial metadata. Band count and dtype may differ from the source raster.

Add reconcileColorModel() to DeepCopiedRenderedImage (JVM) to fix colorModel/sampleModel mismatches at deserialization when Python UDFs change band count or dtype.

Cherry-picked from wherobots/wherobots-compute@e08bde1da08 with vectorized UDF wiring excluded.
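To make the cost difference concrete, here is a minimal NumPy-only sketch of the two data paths described above. It does not call any Sedona API: the tile contents, the uint8 dtype, and the variable names are assumptions chosen for illustration, and the float64 cast stands in for the promotion that the .tolist() + RS_MakeRaster path forces.

```python
import numpy as np

# Hypothetical single-band 512x512 tile; the uint8 dtype is only an example.
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

# List-based path: promote to float64, then build one Python float per pixel.
as_list = tile.astype(np.float64).ravel().tolist()

# Contiguous path: a single bytes object in the tile's native dtype.
as_bytes = tile.tobytes()
restored = np.frombuffer(as_bytes, dtype=tile.dtype).reshape(tile.shape)

print(len(as_list), type(as_list[0]).__name__)  # 262144 float
print(len(as_bytes), restored.dtype)            # 262144 uint8
```

The list path creates 262,144 Python objects for one band and drops the original dtype, while the bytes path keeps one byte per pixel here and restores an array equal to the input.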
Did you read the Contributor Guide?
Is this PR related to a ticket?
[SEDONA-XXX] my subject.

What changes were proposed in this PR?
Returning raster data from Python UDFs currently requires .tolist() + RS_MakeRaster, which forces Float64 promotion, creates 262K Python float objects per 512×512 tile, and loses all raster metadata (CRS, nodata, affine transform).

This PR adds:
- raster_serde.serialize(): Writes InDbSedonaRaster to Sedona's binary format, byte-compatible with JVM Serde.deserialize(). Uses cache-and-replay for opaque Kryo blobs (categories, properties, colorModel).
- InDbSedonaRaster.with_bands(): Creates a new raster with replaced pixel data (NumPy array) but preserved spatial metadata. Band count and dtype may differ from the source.
- RasterType.serialize(): Delegates to raster_serde.serialize() instead of raising NotImplementedError.
- DeepCopiedRenderedImage.reconcileColorModel() (JVM): Fixes colorModel/sampleModel mismatches at deserialization time when Python UDFs change band count or dtype.
- KryoUtil.skipUTF8String() and GridSampleDimensionSerializer.skip(): Utility methods for navigating Kryo streams without full deserialization.

Benchmarked on Apple M2 Pro, 4-band rasters, median of 50 iterations:
| .tolist() (ms) | serialize() (ms) |

How was this patch tested?
- with_bands() tests (band count changes, dtype changes, metadata survival)

Did this PR include necessary documentation updates?
docs/tutorial/raster.md to show the new raster-to-raster UDF pattern using with_bands().
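For readers who want a feel for the with_bands() contract described in this PR (new pixel data, same spatial metadata, band count and dtype free to change), the following is a rough sketch using a hypothetical stand-in class. TileSketch, its fields, and the height/width check are all assumptions made for illustration; this is not InDbSedonaRaster and may not match its actual signature or validation rules.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class TileSketch:
    """Hypothetical stand-in for a raster; NOT Sedona's InDbSedonaRaster."""
    crs: str
    affine: tuple       # six affine coefficients (assumed layout)
    nodata: float
    bands: np.ndarray   # shape (band_count, height, width)

    def with_bands(self, new_bands: np.ndarray) -> "TileSketch":
        # Accept a single 2-D band by adding a leading band axis.
        if new_bands.ndim == 2:
            new_bands = new_bands[np.newaxis, ...]
        # Assumption of this sketch: the pixel grid must keep its size.
        if new_bands.shape[1:] != self.bands.shape[1:]:
            raise ValueError("height and width must match the source tile")
        # Copy every metadata field unchanged; swap only the pixel data.
        return replace(self, bands=new_bands)


# 4-band uint16 input reduced to a 1-band float32 index (NDVI-style).
src = TileSketch("EPSG:4326", (0.1, 0.0, -180.0, 0.0, -0.1, 90.0), -9999.0,
                 np.ones((4, 4, 4), dtype=np.uint16))
red = src.bands[0].astype(np.float32)
nir = src.bands[3].astype(np.float32)
out = src.with_bands((nir - red) / (nir + red))

print(out.bands.shape, out.bands.dtype, out.crs)  # (1, 4, 4) float32 EPSG:4326
```

Returning a new object rather than mutating the source keeps the input raster usable after the UDF runs, which matches the "creates a new raster" wording above.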