From 655f86a438d57f736f8cf9d3773851217aebb480 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 26 Jan 2026 01:30:44 +0000 Subject: [PATCH] Add documentation for blob streaming sync implementation Co-authored-by: rchincha <45800463+rchincha@users.noreply.github.com> --- docs/BLOB_STREAMING_SYNC.md | 147 ++++++++++++++++++++++++++++++++++++ 1 file changed, 147 insertions(+) create mode 100644 docs/BLOB_STREAMING_SYNC.md diff --git a/docs/BLOB_STREAMING_SYNC.md b/docs/BLOB_STREAMING_SYNC.md new file mode 100644 index 00000000..c436f23f --- /dev/null +++ b/docs/BLOB_STREAMING_SYNC.md @@ -0,0 +1,147 @@ +# Blob Streaming Sync Implementation + +## Overview + +This implementation adds on-demand blob synchronization to zot based on the discussion in [PR #3733](https://github.com/project-zot/zot/pull/3733). + +## Background + +The original problem: When blobs are downloaded on-demand from zot, zot first pulls the entire blob from upstream, commits it to storage, and then replies to the client. For large blobs, this can cause connection timeouts for clients. + +## Solution Implemented + +### Phase 1: On-Demand Blob Sync (Completed) + +The current implementation provides automatic on-demand blob synchronization: + +1. **Check Local Storage**: When a client requests a blob via `GET /v2/{name}/blobs/{digest}`, zot first checks local storage. + +2. **Trigger Sync on Miss**: If the blob is not found locally and sync is enabled, zot automatically triggers `SyncBlob()` to download the blob from upstream. + +3. **Download from Upstream**: Uses regclient to fetch the blob from the configured upstream registry. + +4. **Store Locally**: Blob is stored in local storage using `FullBlobUpload()`. + +5. **Serve to Client**: After successful sync, zot retries reading from local storage and serves the blob to the client. + +### Key Components + +#### 1. SyncBlob Method + +Added to the `Service` interface and implemented in `BaseService`: + +```go +func (service *BaseService) SyncBlob(ctx context.Context, repo string, digest godigest.Digest) error +``` + +This method: +- Checks content filtering rules +- Creates a remote reference for the blob +- Downloads the blob using regclient +- Stores it in the local image store + +#### 2. On-Demand Integration + +Modified `GetBlob()` handler in `pkg/api/routes.go`: +- When `ErrBlobNotFound` is encountered +- If sync is enabled (`isSyncOnDemandEnabled`) +- Calls `SyncOnDemand.SyncBlob()` to fetch from upstream +- Retries local blob retrieval after sync +- Serves the blob if sync was successful + +#### 3. BaseOnDemand Enhancement + +Extended `BaseOnDemand` with `SyncBlob()` method that: +- Deduplicates concurrent requests for the same blob +- Uses channels to coordinate multiple clients +- Supports background retries for transient failures + +### Design Decisions Based on PR #3733 Discussion + +From rchincha's feedback: + +> 1. Check if the image/blob already exists? +> - 1.1 If yes, then just serve it +> - 1.2 If no, then send the request to a BlobStreamer + +**Implemented**: Steps 1.1 and 1.2 are implemented. Blob existence check happens in the `GetBlob` handler, and sync is triggered when not found. + +> 2. BlobStreamer is a writer to a temp location and many readers (clients) - pay attention to range requests +> - 2.1 - clients can disconnect +> - 2.2 - upstream may timeout + +**Partially Implemented**: Basic BlobStreamer infrastructure is created but not yet activated. Currently uses direct upload. The framework is in place for future enhancement. + +> 3. Once the blob is fully downloaded and verified, then copy it to the actual repo + +**Implemented**: The current implementation downloads and stores the blob directly in the repository using `FullBlobUpload()`. Future enhancement can add temp storage and verification before moving to final location. + +## Future Enhancements + +### Phase 2: Concurrent Client Streaming (Planned) + +The `BlobStreamer` and `BlobStreamManager` components have been created to support: + +1. **Chunk-based Downloads**: Split blobs into configurable chunks +2. **Multi-client Serving**: Allow multiple clients to read while download is in progress +3. **Temp Storage**: Download to temporary location first +4. **Verification**: Verify digest before moving to final storage +5. **Range Request Support**: Support HTTP range requests during streaming + +### Files Created for Future Use + +- `pkg/extensions/sync/blob_streamer.go`: Manages streaming of individual blobs +- `pkg/extensions/sync/blob_stream_manager.go`: Coordinates multiple concurrent blob streams + +These components are not yet activated but provide the foundation for future streaming enhancements. + +## Configuration + +No new configuration is required. The feature automatically works when: +- Sync extension is enabled in zot config +- On-demand sync is configured for a registry + +Example configuration: +```yaml +extensions: + sync: + enable: true + registries: + - urls: + - https://registry.example.com + onDemand: true +``` + +## Benefits + +1. **Automatic Blob Caching**: Blobs are automatically fetched on-demand +2. **Reduced Client Timeouts**: Initial implementation reduces wait time by syncing only requested blobs +3. **Deduplication**: Multiple concurrent requests for the same blob are handled efficiently +4. **Future-Ready**: Infrastructure in place for concurrent client streaming + +## Testing + +The implementation: +- Builds successfully with `make binary BUILD_LABELS="sync"` +- Integrates with existing on-demand sync infrastructure +- Follows established patterns from manifest and referrer sync + +## Related Files + +### Modified Files +- `pkg/api/routes.go`: Added blob sync trigger in `GetBlob()` handler +- `pkg/api/controller.go`: Added `SyncBlob` to `SyncOnDemand` interface +- `pkg/extensions/sync/sync.go`: Added `SyncBlob` to `Service` interface +- `pkg/extensions/sync/on_demand.go`: Implemented `SyncBlob` orchestration +- `pkg/extensions/sync/on_demand_disabled.go`: Added stub for non-sync builds +- `pkg/extensions/sync/service.go`: Implemented `SyncBlob` in `BaseService` + +### New Files +- `pkg/extensions/sync/blob_streamer.go`: Streaming infrastructure (future use) +- `pkg/extensions/sync/blob_stream_manager.go`: Stream coordination (future use) + +## Compatibility + +- Maintains backward compatibility +- Works with existing sync configurations +- No breaking changes to APIs or storage format