HDF container (reference)
HDF container reference
An HdfContainer is the payload kind for HDF5 scientific data,
backed by the HSDS sidecar. Phase
1 of the rollout (this slice — backlog ID A5a) ships
create / read / delete of containers; the data-path mirroring of
HSDS’s dataset / value / attribute surface, the per-DataObject
HdfReference, the byte-identical download fallback, and the
shared-Keycloak token relay all arrive in later phases (A5b – A5e —
see aidocs/35
for the rollout plan).
Opt-in feature
The HDF / HSDS surface is off by default. With the toggle off,
every /v2/hdf-containers/... endpoint returns 404 Not Found and
no HSDS HTTP client is ever instantiated. Operators flip the toggle
on by:
-
Bringing up the
hdfcompose profile so theshepard-hsdssidecar is running:docker compose --profile hdf up -d shepard-hsds -
Setting the four config keys on the shepard backend:
shepard.hdf.enabled=true shepard.hdf.hsds.endpoint=http://shepard-hsds:5101 shepard.hdf.hsds.username=admin # match HSDS_USERNAME from compose shepard.hdf.hsds.password=admin # match HSDS_PASSWORD from compose -
Restarting the backend. Startup fails fast if the toggle is on but credentials are blank — Phase 1 deliberately refuses to run in “ambient auth” mode.
The HTTP Basic credential pair is the only auth Phase 1 supports. Per-user OIDC token relay arrives in A5e.
Read-on note for admins: see
docs/admin.md§”HDF5 (HSDS)” for the host-side details (volume mount, capacity-planning rule of thumb, where to put credentials in production).
Shape
HdfContainer ────► HSDS domain ( /shepard/<container-appId>/ )
│
│ (permissions)
▼
Owner / Readers / Writers / Managers (standard shepard ACL)
| Field | Type | Notes |
|---|---|---|
appId |
string (UUID v7) | Server-minted. The HSDS domain path is /shepard/{appId}/. |
name |
string | Required on create. |
description |
string? | Optional free-form description. |
hsdsDomain |
string | Server-minted. Read-only on the wire. Carved out so admins can grep the path out of HSDS-side audit logs. |
attributes |
Map<string,string> |
Free-form key/value metadata. Same delimiter idiom as the rest of the codebase. |
permissions |
Permissions | Standard shepard ACL (owner / readers / writers / managers). In Phase 1 these are not yet flowed to HSDS-side ACLs — A5b lights up the bridge. |
REST surface
All endpoints live on the /v2/ shelf (this fork’s development
surface — see API version policy).
Upstream shepard 5.2.0 has no HDF support; nothing lands on
/shepard/api/....
| Verb | Path | What it does | Status codes |
|---|---|---|---|
GET |
/v2/hdf-containers/{appId} |
Read one container. Permission-checked: caller needs READ on the container. | 200 / 401 / 403 / 404 |
POST |
/v2/hdf-containers |
Create a new container. Provisions the HSDS domain via the sidecar; rolls back the HSDS side if the Neo4j commit fails. | 201 / 400 / 401 / 503 |
DELETE |
/v2/hdf-containers/{appId} |
Soft-delete the container + drop the HSDS domain. Owner-only. | 204 / 401 / 403 / 404 / 503 |
POST /v2/hdf-containers request body:
{
"name": "primary",
"description": "Hot-fire run 2026-05-12",
"attributes": {
"project": "rocket-x",
"instrument": "thrust-bench"
}
}
Response (201):
{
"appId": "019e1cee-654f-7554-8543-0ba62ae14113",
"name": "primary",
"description": "Hot-fire run 2026-05-12",
"hsdsDomain": "/shepard/019e1cee-654f-7554-8543-0ba62ae14113/",
"attributes": { "project": "rocket-x", "instrument": "thrust-bench" }
}
Permission model
shepard is the source of truth for HDF container ACLs (aidocs/63
ADR-0020). When you change permissions on a :Collection that contains
:HdfContainer rows, the PermissionsChangedEvent CDI seam fires the
HdfPermissionBridge observer, which rewrites the matching HSDS
domain’s ACL via the sidecar’s REST API. The mapping is:
| shepard role | HSDS POSIX bits |
|---|---|
| Reader | read |
| Writer | read,update,create,delete |
| Manager | read,update,create,delete,update_acl |
| (no role) | (entry removed; container becomes invisible to that user) |
Sync is best-effort. Failures (HSDS unreachable, auth misconfig) are logged + queued in a size-capped in-memory retry list, then attempted again on the next permission write. A failed bridge call never blocks the shepard write — the casual UX always wins.
Direct HSDS-side mutation gets clobbered. Tools that edit HSDS
ACLs out of band (h5pyd admin paths, hsadmin CLI) will see their
changes overwritten on the next shepard permission change for the
affected container. If you’ve been editing HSDS ACLs directly and
want shepard’s graph to take over cleanly, run the drift-recovery
admin endpoint once:
curl -X POST https://shepard.example.dlr.de/v2/admin/hdf/rebuild-acls \
-H "X-API-KEY: <instance-admin-api-key>"
Response shape:
{
"containersProcessed": 42,
"containersSynced": 41,
"errors": [
{"appId": "01HF…", "reason": "hsds.connect-timeout"}
]
}
The endpoint is idempotent — re-running after a transient HSDS outage finishes the job.
What’s deferred
The full E7 vision is rolling out across A5a – A5e:
| Phase | Status | What it adds |
|---|---|---|
| A5a (this slice) | shipped | HSDS sidecar + HdfContainer create/read/delete + V25 migration. HTTP Basic auth (admin-managed). |
| A5b | queued | Permission bridge — shepard permission changes flow to HSDS ACLs via a PermissionsService post-commit hook. |
| A5c | queued | HdfReference per-DataObject anchor at a specific dataset path; annotation hookup via E6 (AnnotatableHdfDataset). |
| A5d | queued | Download-original-file fallback — GET /v2/hdf-containers/{appId}/file returns the byte-identical HDF5 via HSDS bulk-export. Unblocks the offline h5py.File(local) path. |
| A5e | queued | Auth bridge — shepard API keys mint short-lived JWTs signed by a shared Keycloak realm. Three-line clients/python helper that returns an h5pyd.File. |
See also
aidocs/35-hdf5-hsds-implementation-design.md— implementation design for the entire A5 series.docs/admin.md§”HDF5 (HSDS)” — operator-side install steps.- HSDS — upstream HDF Group project that powers the sidecar.