* feat(metrics): add Prometheus GC metrics
Track garbage collection activity with three new metrics:
- zot_gc_runs_total (counter, label: error) — GC run count
- zot_gc_duration_seconds (summary) — GC run duration
- zot_gc_deleted_total (counter, label: type) — items deleted
by type: blob, manifest, upload
MetricServer is added to GarbageCollect and wired through
all callers (controller, verify-feature retention, tests).
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): add missing metrics var in GCS GC tests
TestGCSGarbageCollectImageIndex and
TestGCSGarbageCollectChainedImageIndexes were missing the
metrics variable required by NewGarbageCollect after the
MetricServer parameter was added.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): add defer metrics.Stop() in GC tests
Prevent goroutine/port leaks by stopping MetricsServer in
storage_test.go (3 functions) and gcs_test.go (also add
missing metrics declaration in TestGCSGarbageCollectImageManifest).
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): cover `CleanRepo` error path
Add test that exercises the error branch in
`CleanRepo` where `cleanRepo` fails, covering
the metrics calls and log lines flagged by Codecov.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test: Cover GC error paths for codecov
Add three tests in gc_internal_test.go to cover previously
untested error branches in `removeBlobUploads` and
`removeUnreferencedBlobs`: `ListBlobUploads` failure,
`addIndexBlobsToReferences` failure, and `PathNotFoundError`
from `GetAllBlobs`.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): cover remaining error paths
Cover `StatBlobUpload`, `digest.Validate()`,
`isBlobOlderThan`, and `CleanupRepo` error branches
in `removeBlobUploads` and `removeUnreferencedBlobs`.
`removeUnreferencedBlobs` now at 100% coverage,
`removeBlobUploads` from 78.3% to 91.3%.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test: cover `sanityChecks` label name mismatch
Try to avoid -0.09% coverage regression on `minimal.go`
by exercising the uncovered branch in `sanityChecks`
where label names have correct count but wrong values.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): exercise real GC path in metrics test
TestGCMetrics was calling metric helpers directly instead of
running actual garbage collection, so it couldn't catch wiring
regressions where `CleanRepo` stops recording metrics.
Now uploads an orphaned blob and runs `gc.CleanRepo` end-to-end,
verifying metrics appear on the Prometheus endpoint.
Suggestion from Copilot: https://github.com/project-zot/zot/pull/3863#discussion_r3129324719
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(gc): skip deletion metrics when DryRun is enabled
https://github.com/project-zot/zot/pull/3863#discussion_r3129324684
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(test): stop leaked MetricsServer goroutines in GCS tests
https://github.com/project-zot/zot/pull/3863#discussion_r3129324657
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* refactor(test): drop unnecessary zlog import alias
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(monitoring): expose metric types outside build tag
`MetricsCopy` and related types were only visible under `\!metrics`,
causing a typecheck failure when golangci-lint runs with `-tags metrics`.
Moving the type definitions to `common.go` makes them unconditionally available.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix(monitoring): remove extra blank line for gci
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* test(gc): cover both dry-run and real deletion metrics
And fix issue with build tag with metrics
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* Satisfy testpackage linter for gc metrics test
The `testpackage` linter allows `package gc` only in files named
`*_internal_test.go`; rename to follow that convention.
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
---------
Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
* fix: make config read/write thread safe and fix some other similar issues
1. The config config has a lock, and safe methods to update and read the attributes
2. The config has methods to retrieve copies of specific attributes, such as the extyensions config, the auth config, and the authz config.
These are needed, as the config object may mutate in the middle of an auth/authz requests, and we avoid partial configuration being applied for that request.
3. Fix an issue with the monitoring server not stopping when the controller is shut down.
4. Fix an issue with the HTPasswdWatcher not stopping when the background tasks are supposed to finish.
5. Fix some tests using hardcoded ports.
Moved some of the methods which were on the main config to the auth, access control and extension configs
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* fix: migrate to Go module v2 for proper semantic versioning
This change updates the module path from 'zotregistry.dev/zot' to
'zotregistry.dev/zot/v2' to comply with Go's semantic versioning rules.
According to Go's module versioning requirements, major version v2+
must include the major version in the module path. The current
module path 'zotregistry.dev/zot' only supports v0.x.x and v1.x.x
versions, making existing v2.x.x tags (like v2.1.8) unusable.
Changes:
- Updated go.mod module path to zotregistry.dev/zot/v2
- Updated all internal import paths across 280+ Go source files
- Updated configuration files (golangcilint.yaml, gqlgen.yml)
- Updated README.md Go reference badge
This fix enables proper use of existing v2.x.x Git tags and allows
external packages to import zot v2+ versions without compatibility
errors.
Resolves: Go module import compatibility for v2+ versions
Fixes: #3071
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
* fix: regenerate GraphQL files with updated v2 import paths
The gqlgen tool needs to regenerate the GraphQL schema files after
the module path change to use the new v2 imports.
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
---------
Signed-off-by: Luca Muscariello <muscariello@ieee.org>
Most users don't make the difference between retention deleting untagged manifests vs GC deleting other blobs.
This causes confusion since the GC delay and the retention delay (used for untagged manifests and orphan referrers) have different defaults, and are set separately in the zot configuration.
Most users don't configrue retention policies, and they still expect untagged manifests to be deleted at GC time.
With this change, if retention delay is not specified in the config file, the value used is the GC delay.
If GC delay is also unspecified in the config file, the default GC delay is used for both.
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
* fix: migrate from github.com/rs/zerolog to golang-native log/slog
We have been using zerolog for a really long time.
golang now has structured logging using slog.
Best to move to this in interests of long-term support.
This is a tech debt item.
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* fix: a few changes on top
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
* fix: address comments
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
---------
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
- fixes#3347: removeUntaggedManifests() did not consider compatible manifest types
- add AsDockerImage() to Image and MultiarchImage for testing
- extend TestGarbageCollectAndRetentionMetaDB to test docker image and multiarch image
Signed-off-by: Stephan Merker <stephan.merker@sap.com>
Using just the last repository is not enough as in the case when it is deleted
(either by GC or some other way), GetNextRepository returns empty string
causing the generator to be marked completed without any errors.
An alternative would have been to start over from the first repository,
but this can take hours if multiple repositories need to be deleted,
not to mention the processing power and I/O and S3 load this could take.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
It is to fix#3185.
This fixes the case where MetaDB is not instantiated (none of the conditions match),
and we want to retain tags only by pattern (which should not need to use MetaBD).
Without this fix you could only use retention to delete untagged manifests.
If you specified only the key "patterns" under "keepTags", zot would crash.
It was possible to not specify "keepTags" all, which would retain all tags,
but it was not possible to retains specific tags.
Basically the case quoted below, from the documentation, was broken::
https://zotregistry.dev/v2.1.4/articles/retention/#configuration-example
```
When you specify a regex pattern with no rules other than the default, all tags matching the pattern are retained.
```
This would only work if MetaDb was instantiated by an unrelated configured feature.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
* feat: add support for docker images
Issue #724
A new config section under "HTTP" called "Compat" is added which
currently takes a list of possible compatible legacy media-types.
https://github.com/opencontainers/image-spec/blob/main/media-types.md#compatibility-matrix
Only "docker2s2" (Docker Manifest V2 Schema V2) is currently supported.
Garbage collection also needs to be made aware of non-OCI compatible
layer types.
feat: add cve support for non-OCI compatible layer types
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
*
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* test: add more docker compat tests
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* feat: add additional validation checks for non-OCI images
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
* ci: make "full" images docker-compatible
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
---------
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
Looks like we didn't have many GC tests for retaining multiarch images.
I added more data to the existing image retention tests, besides the new GC tests.
Signed-off-by: Andrei Aaron <aaaron@luxoft.com>
wait for workers to finish before exiting
should fix tests reporting they couldn't remove rootDir because it's being
written by tasks
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
- MetaDB stores the time of the last update of a repo
- During startup we check if the layout has been updated after the last recorded change in the db
- If this is the case, the repo is parsed and updated in the DB otherwise it's skipped
Signed-off-by: Laurentiu Niculae <niculae.laurentiu1@gmail.com>