
feat: add g.to_file() / graphistry.from_file() for Plottable serialization#923

Open
exrhizo wants to merge 1 commit into master from feat/serialization-bundle

Conversation

@exrhizo

@exrhizo exrhizo commented Feb 21, 2026

Summary

  • Adds g.to_file(path) and graphistry.from_file(path) for saving/loading Plottable graphs to disk
  • Uses parquet for DataFrames, JSON manifest for bindings/settings/metadata, with SHA256 integrity verification
  • Supports directory (default, debuggable) and zip archive formats
  • Remote server state dropped by default with warning; opt-in via restore_remote=True
  • Pydantic 2 as optional dep: pip install graphistry[serialization]
  • Tripwire tests catch field drift when new attrs are added to Plottable/PlotterBase
  • Golden fixture (v1_bundle) committed for backwards-compatibility testing
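The save/load scheme described above (a directory of artifacts plus a JSON manifest carrying SHA256 digests) can be sketched with the stdlib alone; this uses JSON files in place of parquet, and all names here are illustrative, not the PR's actual API:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def write_bundle(out_dir: Path, artifacts: dict) -> None:
    """Write each artifact as a file, then a manifest with SHA256 digests."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"version": 1, "files": {}}
    for name, payload in artifacts.items():
        data = json.dumps(payload).encode()
        (out_dir / name).write_bytes(data)
        manifest["files"][name] = hashlib.sha256(data).hexdigest()
    (out_dir / "manifest.json").write_text(json.dumps(manifest))

def read_bundle(in_dir: Path) -> dict:
    """Load artifacts, refusing any file whose digest has drifted."""
    manifest = json.loads((in_dir / "manifest.json").read_text())
    out = {}
    for name, digest in manifest["files"].items():
        data = (in_dir / name).read_bytes()
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"integrity check failed for {name}")
        out[name] = json.loads(data)
    return out

with tempfile.TemporaryDirectory() as d:
    write_bundle(Path(d), {"edges.json": [{"s": "a", "d": "b"}]})
    loaded = read_bundle(Path(d))
```

The real bundle additionally handles parquet DataFrames and an optional zip wrapper, but the manifest-plus-digest shape is the core idea.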

New files

| File | Purpose |
| --- | --- |
| graphistry/io/bundle.py | Generic bundle engine (SHA256, parquet I/O, manifest, zip) |
| graphistry/io/plottable_bundle.py | Plottable adapter with field groups, to_file, from_file |
| graphistry/tests/test_io_bundle.py | 17 tests for bundle engine |
| graphistry/tests/test_io_plottable_bundle.py | 17 tests for plottable roundtrip, tripwire, golden fixture |
| graphistry/tests/fixtures/v1_bundle/ | Golden fixture (~10KB) |

Modified files

  • Plottable.py — add _node_target_encoder field + to_file() protocol signature
  • PlotterBase.py — add to_file() method (lazy import)
  • pygraphistry.py — add standalone from_file() function
  • __init__.py — export from_file
  • io/__init__.py — lazy import comment
  • setup.py — add serialization extra

Test plan

  • 34 unit tests pass locally (17 bundle + 17 plottable)
  • Existing metadata tests pass (no regressions)
  • Smoke test: g.to_file() → graphistry.from_file() roundtrip
  • cd docker && WITH_BUILD=0 WITH_LINT=0 WITH_TYPECHECK=0 ./test-cpu-local.sh graphistry/tests/test_io_bundle.py graphistry/tests/test_io_plottable_bundle.py
  • Full CI: cd docker && WITH_BUILD=0 ./test-cpu-local.sh

🤖 Generated with Claude Code

feat: add g.to_file() / graphistry.from_file() for Plottable serialization

Add bundle-based serialization for saving and loading Plottable graphs
to disk with integrity verification, backwards-compatible golden fixtures,
and tripwire tests to catch field drift.

- Directory (default) or zip archive format
- Parquet for DataFrames, JSON manifest for bindings/settings/metadata
- SHA256 integrity checks on all artifacts
- Remote server state opt-in restoration (dropped by default with warning)
- Pydantic 2 as optional dep: pip install graphistry[serialization]
- Reuses existing serialize/deserialize_plottable_metadata for bindings
- Tier 1 (edges/nodes/bindings), Tier 2 (embeddings/features/algo config),
  Tier 3 (model objects, deferred to later PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
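The "tripwire tests to catch field drift" mentioned above can be sketched as a test that freezes the known attribute list and fails whenever a new field appears that the serializer has never been told about; class and field names here are illustrative stand-ins, not the PR's code:

```python
# Hypothetical stand-in for Plottable, to show the tripwire pattern only.
class FakePlottable:
    def __init__(self):
        self._nodes = None
        self._edges = None
        self._node_target_encoder = None  # field added by this PR

# Fields the serializer knowingly handles (or knowingly skips).
KNOWN_FIELDS = {"_nodes", "_edges", "_node_target_encoder"}

def check_field_drift(obj) -> set:
    """Return any instance attributes absent from the known-field list."""
    return set(vars(obj)) - KNOWN_FIELDS

drift = check_field_drift(FakePlottable())
```

A test asserting `drift == set()` then breaks the build the moment someone adds an attribute without deciding how it serializes.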
can we move the tests to like tests/io/test_bundle.py ?

```diff
     'spanner': ['google-cloud-spanner'],
-    'kusto': ['azure-kusto-data', 'azure-identity']
+    'kusto': ['azure-kusto-data', 'azure-identity'],
+    'serialization': ['pydantic>=2.0'],
```
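An extras entry like this usually pairs with an import guard so the base install keeps working without pydantic; a minimal sketch of that pattern (illustrative, not the PR's actual code):

```python
def require_serialization():
    """Raise a helpful error when the optional extra is missing."""
    try:
        import pydantic  # noqa: F401  # pulled in by graphistry[serialization]
    except ImportError as e:
        raise ImportError(
            "to_file()/from_file() need the serialization extra: "
            "pip install graphistry[serialization]"
        ) from e
```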
should the test setup add serialization / should ci.yml add it?

also unsure about mypy implications

```python
    g: 'Plottable',
    path: str,
    format: Optional[str] = None,
) -> Tuple['Plottable', BundleWriteReport]:
```
  1. should we also do to_folder / from_folder ?
  2. should from_ / to_ accept bytes, not just a path, for diskless flows like http?
  3. for to_, allow selection of parquet vs json/csv, and extra kwargs passthrough? This can be good for bigger files, like picking snappy compression, and avoiding the indirection of .zip.

Some of this is scope creep, so I can imagine landing without it
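Point 2 above (diskless flows) could look roughly like this with an in-memory zip, using the stdlib only; `to_bytes`/`from_bytes` are hypothetical names, not part of the PR:

```python
import io
import zipfile

def to_bytes(artifacts: dict) -> bytes:
    """Pack named payloads into a zip held entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in artifacts.items():
            zf.writestr(name, data)
    return buf.getvalue()

def from_bytes(blob: bytes) -> dict:
    """Unpack a zip received over, e.g., HTTP, without touching disk."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

blob = to_bytes({"manifest.json": b"{}", "edges.parquet": b"\x00"})
round_tripped = from_bytes(blob)
```

A path-based `to_file` could then be a thin wrapper that writes these bytes to disk, keeping one codepath for both flows.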

@lmeyerov lmeyerov left a comment


Serialization is great, thank you! I believe this largely stems from the already-reviewed louie pydantics + some changes.

See comments

Change request:

  • test/ folder organization to use test/io/*

Discussion (non-blocking?):

  • pydantic dep: agreed on making optional, see comments on setup.py, ci. Maybe eventually we can do serialization without it, but not a blocker imo: pydantic is neutral as ~no format implications. I believe our notebooks/streamlit have pydantic2? (cc @mj3cheun @aucahuasi )
  • Adding io/ vs reusing models/ is sensible, eg, arrow_uploader bits should probably migrate to that too
  • I had some non-blocking requests like format control (ex: picking parquet / zip compression settings, in-memory, and csv/json instead of only binary formats)
  • Bigger discussion: format standardization. upload / remote_gfql / remote_python do plottable -> remote nexus/fep, so I have to wonder what the relationship between the payload formats is / should be. If these bundles work with the Python client, I'd imagine they should work with JS / REST / etc. E.g., mandating parquet/arrow can be friction for browser users. But maybe it's ok, esp. for v1.

cc @aucahuasi @mj3cheun

@lmeyerov

lmeyerov commented Feb 21, 2026

Thinking more: we have a datasets.json format that we already upload... can/should that be what we're using?

Using the same format, but converting to reading/writing via pydantic, would be great, and a step forward in autodocs here vs the current manual docs: https://hub.graphistry.com/docs/api/2/rest/upload/#createdataset2 . The various repos have validators for them too.

A lot can be derived based on the REST docs, this repo, and FEP's or nexus's dataset loaders

@lmeyerov

@exrhizo while the dev team figures out the long-lived version, maybe we mark this experimental somehow, just to get it landed and unblock things?

We don't have a standard convention right now to mark things this way; maybe we do _experimental on the method name? Then we can support it going forward a bit longer, and after the stable version is official, mark this deprecated, etc., and eventually phase it out for the official method name
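One lightweight way to realize the `_experimental` convention floated above is a warning shim around the eventual stable name; this is purely a sketch of the idea, not an agreed design:

```python
import warnings

def to_file_experimental(*args, **kwargs):
    """Shim: warn that the API is unstable, then delegate."""
    warnings.warn(
        "to_file_experimental is experimental and may change or be renamed",
        FutureWarning,
        stacklevel=2,
    )
    return _to_file_impl(*args, **kwargs)

def _to_file_impl(path):
    # Placeholder for the real implementation.
    return path
```

Once the stable `to_file` lands, the shim can keep working for a deprecation window and then be removed.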

@exrhizo

exrhizo commented Feb 28, 2026

I want to add that the remote data that is serialized includes which Graphistry server & dataset id, so that a client can find an upload

@lmeyerov

Oh that is a little trippy... Interesting.

