* feat(metrics): add Prometheus GC metrics
Track garbage collection activity with three new metrics:
- zot_gc_runs_total (counter, label: error) — GC run count
- zot_gc_duration_seconds (summary) — GC run duration
- zot_gc_deleted_total (counter, label: type) — items deleted
by type: blob, manifest, upload
MetricServer is added to GarbageCollect and wired through
all callers (controller, verify-feature retention, tests).
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): add missing metrics var in GCS GC tests
TestGCSGarbageCollectImageIndex and
TestGCSGarbageCollectChainedImageIndexes were missing the
metrics variable required by NewGarbageCollect after the
MetricServer parameter was added.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): add defer metrics.Stop() in GC tests
Prevent goroutine/port leaks by stopping MetricsServer in
storage_test.go (3 functions) and gcs_test.go (also add
missing metrics declaration in TestGCSGarbageCollectImageManifest).
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): cover `CleanRepo` error path
Add test that exercises the error branch in
`CleanRepo` where `cleanRepo` fails, covering
the metrics calls and log lines flagged by Codecov.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test: Cover GC error paths for codecov
Add three tests in gc_internal_test.go to cover previously
untested error branches in `removeBlobUploads` and
`removeUnreferencedBlobs`: `ListBlobUploads` failure,
`addIndexBlobsToReferences` failure, and `PathNotFoundError`
from `GetAllBlobs`.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): cover remaining error paths
Cover `StatBlobUpload`, `digest.Validate()`,
`isBlobOlderThan`, and `CleanupRepo` error branches
in `removeBlobUploads` and `removeUnreferencedBlobs`.
`removeUnreferencedBlobs` now at 100% coverage,
`removeBlobUploads` from 78.3% to 91.3%.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test: cover `sanityChecks` label name mismatch
Try to avoid -0.09% coverage regression on `minimal.go`
by exercising the uncovered branch in `sanityChecks`
where label names have correct count but wrong values.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): exercise real GC path in metrics test
TestGCMetrics was calling metric helpers directly instead of
running actual garbage collection, so it couldn't catch wiring
regressions where `CleanRepo` stops recording metrics.
Now uploads an orphaned blob and runs `gc.CleanRepo` end-to-end,
verifying metrics appear on the Prometheus endpoint.
Suggestion from Copilot: https://github.com/project-zot/zot/pull/3863#discussion_r3129324719
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(gc): skip deletion metrics when DryRun is enabled
https://github.com/project-zot/zot/pull/3863#discussion_r3129324684
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): stop leaked MetricsServer goroutines in GCS tests
https://github.com/project-zot/zot/pull/3863#discussion_r3129324657
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* refactor(test): drop unnecessary zlog import alias
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(monitoring): expose metric types outside build tag
`MetricsCopy` and related types were only visible under `\!metrics`,
causing a typecheck failure when golangci-lint runs with `-tags metrics`.
Moving the type definitions to `common.go` makes them unconditionally available.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(monitoring): remove extra blank line for gci
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): cover both dry-run and real deletion metrics
And fix issue with build tag with metrics
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* Satisfy testpackage linter for gc metrics test
The `testpackage` linter allows `package gc` only in files named
`*_internal_test.go`; rename to follow that convention.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
---------
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(storage): resolve double-prefixing issue for GCS rootdirectory
Preserve double-prefixing for S3 to maintain backward compatibility with existing data. For GCS, always use "/" as rootDir to avoid double-prefixing, as GCS rootdirectory usage is a newer feature without legacy data.
Signed-off-by: Sebastian Thees <thees@users.noreply.github.com>
* fix(gcs): handle io.EOF correctly in Walk method
Ensure io.EOF is returned unwrapped to allow proper error handling with errors.Is() upstream.
Signed-off-by: Sebastian Thees <thees@users.noreply.github.com>
* fix(storage): set sensible default ("/zot") for GCS when storageDriver.rootdirectory is unset or empty or "/"
Signed-off-by: Sebastian Thees <thees@users.noreply.github.com>
* fix(imagestore): avoid warning logs for expected cache miss scenarios
Refine logging to use debug level for expected cache misses, preventing unnecessary warnings.
Signed-off-by: Sebastian Thees <thees@users.noreply.github.com>
---------
Signed-off-by: Sebastian Thees <thees@users.noreply.github.com>
feat(storage): add a GCS driver
test(storage): add unit tests for GCS driver
test(storage): add missing unit tests for GCS driver & resolve lint issues
fix: configuration validation for GCS Storage
test(storage): resolve panic by test due to setupGCS ignoring returned error
test(storage): add dummy gcs credentials
test: add darwin support for macos to run tests
ci: update workflows to pin gcs emulator version
lint: resolve long line lengths & formatting issues
test: move error for gcs mock earlier with an error
test: stop test using local google credentials and use mock instead
test: add missing dummy creds
test(storage): use storage-testbench for GCS, isolate GCS tests, fix driver Delete
- Switch GCS emulator from fake-gcs-server to storage-testbench in CI.
Run the GCS emulator only in the privileged-test job; remove it from
minimal and extended test jobs.
- Consolidate GCS tests under pkg/storage/gcs (needprivileges,linux).
Add TestMain with HTTPS proxy and /etc/hosts so tests talk to
storage-testbench; move GCS-specific cases from storage_test.go and
scrub_test.go into gcs_test.go. Run GCS tests via a second privileged-test
invocation and collect coverage in coverage-needprivileges-gcs.txt.
- Make GCS driver Delete idempotent and normalize errors. Treat
PathNotFoundError from Delete as success so that deleting an already-gone
path (e.g. after GC under eventual consistency) does not fail. Add
formatErr to map 404/not found to PathNotFoundError and use it for all
driver methods so callers get consistent storage driver errors.
- Drop GCS branches and helpers from storage_test.go and scrub_test.go so
non-privileged tests only use local/S3; GCS is tested only in
pkg/storage/gcs with storage-testbench.
- Set GCSMOCK_ENDPOINT without /storage/v1/, as the rest of the URL is set in tests.
- Show errors in case of failure to create bucket.
- Consolidate StorageDriverMock structs inside the pkg/test/mocks package.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
Co-authored-by: Steven Marks <steve.marks@qomodo.io>
Replace MakeTempFile usage with MakeTempFilePath and MakeTempFileWithContent
helpers that automatically handle file lifecycle. This prevents resource
leaks by ensuring temporary files are properly closed.
Shoudld also make the tests easier to read.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
Require blob files to follow standard OCI image layout:
rootDir/repo/blobs/algorithm/digest
- Validate grandparent directory is ImageBlobsDir
- Validate parent directory is valid digest algorithm
- Update tests to use standard OCI structure
- Add blobPath() helper to reduce duplication and fix linting
This should reduce the number of uneeded digest computations
if other non-oci specific files are present in the layout.
Fix also a race condition when picking ports in monitoring tests.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
GC and scrub should not stop if a manifest or index is missing from storage.
Other similar changes are also included.
WRT metadb, the missing manifests cannot be added, and the results returned from metadb
do not include the descriptors for these manifests.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
See: https://github.com/project-zot/zot/issues/2506
Note we are not loosing anything functionality-wise by making this change.
Initially we considered the tags are in the annotations present in the referrers
but the only annotations we set on referrers are the ones inside the manifests themselves,
not the ones in the manifest descriptors, so the tags were not presetn anyway.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
feat: add verify-feature retention subcommand with comprehensive testing and validation
Add a `verify-feature retention` subcommand that allows users to preview and
validate retention policy changes without running the actual Zot server.
The command runs GC and retention tasks in dry-run mode for immediate feedback.
- Run verify-feature retention standalone without starting the server
- Preview retention policy decisions in dry-run mode
- Configurable GC interval override via command-line flag
- Optional timeout for task completion
- Configurable log output (stdout or file)
Basic usage:
```bash
zot verify-feature retention <config-file>
```
With log file output:
```bash
zot verify-feature retention -l /var/log/zot-retention-check.log <config-file>
```
With GC interval override (runs GC tasks every 30 seconds):
```bash
zot verify-feature retention -i 30s <config-file>
```
With timeout (wait up to 5 minutes for tasks to complete):
```bash
zot verify-feature retention -t 5m <config-file>
```
Combined flags:
```bash
zot verify-feature retention -l /var/log/zot-retention-check.log -i 1m -t 10m <config-file>
```
The command supports overriding GC settings from the config:
- `-i, --gc-interval`: Override the GC interval setting (applies to all storage paths including subpaths)
- Refactored `RunGCTasks` from `controller.go` to be reusable
- Added `checkServerRunning` validation to prevent conflicts
- Implemented signal handling for graceful shutdown
- Added configuration sanitization and logging
- Set GCMaxSchedulerDelay programmatically (not user-configurable)
Added tests for coverage on main function:
- Negative test cases (no args, bad config, GC disabled, server running)
- Both BoltDB and Redis
- Retention enabled scenarios with complex image setups
- Retention disabled scenarios
- Delete referrers functionality
- Subpaths configuration
- GC interval override validation
Run the verify-feature retention tests:
```bash
go test -v ./pkg/cli/server -run TestRetentionCheck
```
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* fix: make config read/write thread safe and fix some other similar issues
1. The config config has a lock, and safe methods to update and read the attributes
2. The config has methods to retrieve copies of specific attributes, such as the extyensions config, the auth config, and the authz config.
These are needed, as the config object may mutate in the middle of an auth/authz requests, and we avoid partial configuration being applied for that request.
3. Fix an issue with the monitoring server not stopping when the controller is shut down.
4. Fix an issue with the HTPasswdWatcher not stopping when the background tasks are supposed to finish.
5. Fix some tests using hardcoded ports.
Moved some of the methods which were on the main config to the auth, access control and extension configs
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* fix: migrate to Go module v2 for proper semantic versioning
This change updates the module path from 'zotregistry.dev/zot' to
'zotregistry.dev/zot/v2' to comply with Go's semantic versioning rules.
According to Go's module versioning requirements, major version v2+
must include the major version in the module path. The current
module path 'zotregistry.dev/zot' only supports v0.x.x and v1.x.x
versions, making existing v2.x.x tags (like v2.1.8) unusable.
Changes:
- Updated go.mod module path to zotregistry.dev/zot/v2
- Updated all internal import paths across 280+ Go source files
- Updated configuration files (golangcilint.yaml, gqlgen.yml)
- Updated README.md Go reference badge
This fix enables proper use of existing v2.x.x Git tags and allows
external packages to import zot v2+ versions without compatibility
errors.
Resolves: Go module import compatibility for v2+ versions
Fixes: #3071
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
* fix: regenerate GraphQL files with updated v2 import paths
The gqlgen tool needs to regenerate the GraphQL schema files after
the module path change to use the new v2 imports.
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
---------
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
Most users don't make the difference between retention deleting untagged manifests vs GC deleting other blobs.
This causes confusion since the GC delay and the retention delay (used for untagged manifests and orphan referrers) have different defaults, and are set separately in the zot configuration.
Most users don't configrue retention policies, and they still expect untagged manifests to be deleted at GC time.
With this change, if retention delay is not specified in the config file, the value used is the GC delay.
If GC delay is also unspecified in the config file, the default GC delay is used for both.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* fix: migrate from github.com/rs/zerolog to golang-native log/slog
We have been using zerolog for a really long time.
golang now has structured logging using slog.
Best to move to this in interests of long-term support.
This is a tech debt item.
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* fix: a few changes on top
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* fix: address comments
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
---------
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* chore: increase/stabilize coverage for the local storage driver
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* chore: add/stabilize coverage for soring ImageSummary objects
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* chore: stabilize coverage in sync tests
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
---------
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
- fixes#3347: removeUntaggedManifests() did not consider compatible manifest types
- add AsDockerImage() to Image and MultiarchImage for testing
- extend TestGarbageCollectAndRetentionMetaDB to test docker image and multiarch image
Signed-off-by: Stephan Merker <stephan.merker@sap.com>
Using just the last repository is not enough as in the case when it is deleted
(either by GC or some other way), GetNextRepository returns empty string
causing the generator to be marked completed without any errors.
An alternative would have been to start over from the first repository,
but this can take hours if multiple repositories need to be deleted,
not to mention the processing power and I/O and S3 load this could take.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* chore: bump github.com/olekukonko/tablewriter from 0.0.5 to 1.0.7
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* fix: zli failed to connect to https server using test certificates
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
---------
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
It is to fix#3185.
This fixes the case where MetaDB is not instantiated (none of the conditions match),
and we want to retain tags only by pattern (which should not need to use MetaBD).
Without this fix you could only use retention to delete untagged manifests.
If you specified only the key "patterns" under "keepTags", zot would crash.
It was possible to not specify "keepTags" all, which would retain all tags,
but it was not possible to retains specific tags.
Basically the case quoted below, from the documentation, was broken::
https://zotregistry.dev/v2.1.4/articles/retention/#configuration-example
```
When you specify a regex pattern with no rules other than the default, all tags matching the pattern are retained.
```
This would only work if MetaDb was instantiated by an unrelated configured feature.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
Fixes#3005
Previously, changing a image's media-type was disallowed.
However, "docker buildx" appears to first push an image manifest and
then an image index for the same image tag. So, allow this.
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* feat: add redis cache support
https://github.com/project-zot/zot/pull/2005
Fixes https://github.com/project-zot/zot/issues/2004
* feat: add redis cache support
Currently, we have dynamoDB as the remote shared cache but ideal only
for the cloud use case.
For on-prem use case, add support for redis.
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* feat(redis): added blackbox tests for redis
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
* feat(redis): dummy implementation of MetaDB interface for redis cache
Signed-off-by: Alexei Dodon <adodon@cisco.com>
* feat: check validity of driver configuration on metadb instantiation
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat: multiple fixes for redis cache driver implementation
- add missing method GetAllBlobs
- add redis cache tests, with and without mocking
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): redis implementation for MetaDB
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): use redsync to block concurrent write access to the redis DB
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): update .github/workflows/cluster.yaml to also test redis
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(metadb): add keyPrefix parameter for redis and remove unneeded method meta.Crate()
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): support RedisCluster configuration and add unit tests
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): more tests for redis metadb implementation
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): add more examples and update examples/README.md
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): move option parsing and redis client initialization under pkg/api/config/redis
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* chore(cachedb): move Cache interface to pkg/storage/types
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): reorganize code in pkg/storage/cache.go
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): call redis.SetLogger() with the zot logger as parameter
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat(redis): rename pkg/meta/redisdb to pkg/meta/redis
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
---------
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
Signed-off-by: Alexei Dodon <adodon@cisco.com>
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
Co-authored-by: a <a@tuxpa.in>
Co-authored-by: Ramkumar Chinchani <rchincha@cisco.com>
Co-authored-by: Petu Eusebiu <peusebiu@cisco.com>
Co-authored-by: Alexei Dodon <adodon@cisco.com>
* feat: add support for docker images
Issue #724
A new config section under "HTTP" called "Compat" is added which
currently takes a list of possible compatible legacy media-types.
https://github.com/opencontainers/image-spec/blob/main/media-types.md#compatibility-matrix
Only "docker2s2" (Docker Manifest V2 Schema V2) is currently supported.
Garbage collection also needs to be made aware of non-OCI compatible
layer types.
feat: add cve support for non-OCI compatible layer types
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
*
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* test: add more docker compat tests
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* feat: add additional validation checks for non-OCI images
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* ci: make "full" images docker-compatible
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
---------
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
Exit early in case of all folders and known non-blob file names.
This avoids the logic for validating digests, and log message generation.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
Looks like we didn't have many GC tests for retaining multiarch images.
I added more data to the existing image retention tests, besides the new GC tests.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
cache.HasBlob() looks in both buckets: duplicates and original blobs
Because we want to check if the blob is in original bucket let's use
cache.GetBlob() because it's looking only in original bucket.
Signed-off-by: Eusebiu Petu <petu.eusebiu@gmail.com>
In case of delete by tag only the tag is removed, the manifest itself would continue to be accessible by digest.
In case of delete by digest the manifest would be completely removed (provided it is not used by an index or another reference).
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>