feat: add g.to_file() / graphistry.from_file() for Plottable serialization#923
…ation
Add bundle-based serialization for saving and loading Plottable graphs to disk with integrity verification, backwards-compatible golden fixtures, and tripwire tests to catch field drift.
- Directory (default) or zip archive format
- Parquet for DataFrames, JSON manifest for bindings/settings/metadata
- SHA256 integrity checks on all artifacts
- Remote server state opt-in restoration (dropped by default with warning)
- Pydantic 2 as optional dep: pip install graphistry[serialization]
- Reuses existing serialize/deserialize_plottable_metadata for bindings
- Tier 1 (edges/nodes/bindings), Tier 2 (embeddings/features/algo config), Tier 3 (model objects, deferred to later PR)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
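The integrity check described in the commit message can be illustrated with a minimal sketch using only the standard library. The manifest layout (`manifest.json` with an `artifacts` mapping) and the helper names `sha256_of`, `write_manifest`, and `verify_bundle` are assumptions made for illustration; they are not taken from the PR's actual bundle code.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path, artifacts: list) -> Path:
    """Record one digest per artifact in a hypothetical manifest.json."""
    manifest = {
        "artifacts": {name: sha256_of(bundle_dir / name) for name in artifacts}
    }
    out = bundle_dir / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out


def verify_bundle(bundle_dir: Path) -> list:
    """Return the names of artifacts whose digest no longer matches."""
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    return [
        name
        for name, expected in manifest["artifacts"].items()
        if sha256_of(bundle_dir / name) != expected
    ]


def demo() -> tuple:
    """Write a bundle, verify it, tamper with it, and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp)
        # Stand-in bytes; a real bundle would hold an actual Parquet file.
        (bundle / "edges.parquet").write_bytes(b"stand-in bytes")
        write_manifest(bundle, ["edges.parquet"])
        clean = verify_bundle(bundle)
        (bundle / "edges.parquet").write_bytes(b"tampered")
        dirty = verify_bundle(bundle)
    return clean, dirty
```

Calling `demo()` returns the mismatches before and after tampering, which is the kind of check a loader could run before trusting any artifact in a bundle.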
can we move the tests to like tests/io/test_bundle.py ?
      'spanner': ['google-cloud-spanner'],
-     'kusto': ['azure-kusto-data', 'azure-identity']
+     'kusto': ['azure-kusto-data', 'azure-identity'],
+     'serialization': ['pydantic>=2.0'],
should the test extra add serialization, and should ci.yml install it?
also unsure about mypy implications
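One common way to keep an optional extra such as `serialization` out of the import path until it is needed is a guarded lazy import. The sketch below is an assumption about how that could look; `require_optional` is a hypothetical helper, not a function in this PR.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency lazily, or raise an actionable error.

    `extra` is the name of the pip extra that provides the module.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install graphistry[{extra}]"
        ) from exc
```

A serialization entry point could call `require_optional("pydantic", "serialization")` on first use, so a base install without the extra still imports cleanly and the error only surfaces when the feature is actually used.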
    g: 'Plottable',
    path: str,
    format: Optional[str] = None,
) -> Tuple['Plottable', BundleWriteReport]:
- should we also do to_folder/from_folder?
- should from_/to_ accept bytes, not just a path, for diskless flows like http?
- for to_, allow selection of parquet vs json/csv, and extra kwargs passthrough? This can be good for bigger files, like picking snappy compression, and avoiding the indirection of .zip.
Some of this is scope creep so I can imagine landing without
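The bytes-based, diskless flow raised above could look roughly like the following sketch, which packs and unpacks a zip bundle entirely in memory with the standard library. `bundle_to_bytes` and `bundle_from_bytes` are hypothetical names, and the `compression` passthrough only illustrates the kwargs idea; none of this is the PR's API.

```python
import io
import zipfile


def bundle_to_bytes(artifacts: dict, compression: int = zipfile.ZIP_DEFLATED) -> bytes:
    """Pack {name: payload bytes} into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        for name, payload in artifacts.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def bundle_from_bytes(data: bytes) -> dict:
    """Unpack an in-memory zip archive back into {name: payload bytes}."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

The resulting bytes can be sent as an HTTP body or stored in a blob store without touching the filesystem, and a path-based API can be layered on top by writing the same bytes to disk.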
Serialization is great, thank you! I believe this largely stems from already-reviewed louie pydantics + some changes.
See comments
Change request:
- test/folder organization to use test/io/*
Discussion (non-blocking?):
- pydantic dep: agreed on making optional, see comments on setup.py, ci. Maybe eventually we can do serialization without it, but not a blocker imo: pydantic is neutral as ~no format implications. I believe our notebooks/streamlit have pydantic2? (cc @mj3cheun @aucahuasi )
- Adding io/ vs reusing models/ is sensible, eg, arrow_uploader bits should probably migrate to that too
- I had some non-blocking requests like format control (ex: picking parquet / zip compression settings, in-memory, and csv/json instead of only binary formats)
- Bigger discussion: Format standardization. upload / remote_gfql / remote_python do plottable -> remote nexus/fep, so I have to wonder what the relationship between the payload formats is / should be. If these bundles work with the python client, I'd imagine they should work with js / REST / etc. Eg, mandating parquet/arrow can be friction for browser users. But maybe it's ok, esp for v1.
Thinking more: we have a Using the same format, but converting to reading/writing via pydantic, would be great, and a step forward in autodocs here vs current manual: https://hub.graphistry.com/docs/api/2/rest/upload/#createdataset2 . The various repos have validators for them too. A lot can be derived based on the REST docs, this repo, and FEP's or nexus's dataset loaders
@exrhizo while the dev team figures out the long-lived version, maybe we mark this experimental somehow just to get it landed and unblock things? We don't have a standard convention right now to mark things this way, so maybe we do _experimental on the method name? Then we can support it going forward a bit longer, and after the stable version is official, mark this deprecated, etc, and eventually phase it out for the official method name
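The experimental-then-deprecate lifecycle suggested here could be sketched as below. The `_experimental` suffix placement, the `deprecated_alias` helper, and the stand-in `to_file` function are all assumptions for illustration; the repo has no such convention yet.

```python
import functools
import warnings


def deprecated_alias(stable_fn, old_name: str):
    """Build an alias under `old_name` that warns and forwards to `stable_fn`."""
    @functools.wraps(stable_fn)
    def alias(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {stable_fn.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return stable_fn(*args, **kwargs)

    alias.__name__ = old_name
    return alias


def to_file(path: str) -> str:
    """Stand-in for the eventual stable method (hypothetical body)."""
    return f"wrote {path}"


# Once the stable name is official, the experimental name becomes a thin alias.
to_file_experimental = deprecated_alias(to_file, "to_file_experimental")
```

Existing callers of the experimental name keep working while seeing a `DeprecationWarning`, and the alias can be deleted in a later release.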
I want to add that the
Oh that is a little trippy... Interesting.
Summary
- g.to_file(path) and graphistry.from_file(path) for saving/loading Plottable graphs to disk
- Remote server state is dropped by default; restoration is opt-in via restore_remote=True
- Pydantic 2 as an optional dependency: pip install graphistry[serialization]
- Golden fixtures (v1_bundle) committed for backwards-compatibility testing
New files
- graphistry/io/bundle.py
- graphistry/io/plottable_bundle.py: to_file, from_file
- graphistry/tests/test_io_bundle.py
- graphistry/tests/test_io_plottable_bundle.py
- graphistry/tests/fixtures/v1_bundle/
Modified files
- Plottable.py: add _node_target_encoder field + to_file() protocol signature
- PlotterBase.py: add to_file() method (lazy import)
- pygraphistry.py: add standalone from_file() function
- __init__.py: export from_file
- io/__init__.py: lazy import comment
- setup.py: add serialization extra
Test plan
- g.to_file() → graphistry.from_file() roundtrip
- cd docker && WITH_BUILD=0 WITH_LINT=0 WITH_TYPECHECK=0 ./test-cpu-local.sh graphistry/tests/test_io_bundle.py graphistry/tests/test_io_plottable_bundle.py
- cd docker && WITH_BUILD=0 ./test-cpu-local.sh
🤖 Generated with Claude Code