Commit Graph

4 Commits

Author SHA1 Message Date
Benoit Tigeot 6c1f1bdd40 feat(metrics): add Prometheus GC metrics (#3863)
* feat(metrics): add Prometheus GC metrics

Track garbage collection activity with three new metrics:
- zot_gc_runs_total (counter, label: error) — GC run count
- zot_gc_duration_seconds (summary) — GC run duration
- zot_gc_deleted_total (counter, label: type) — items deleted
  by type: blob, manifest, upload

MetricServer is added to GarbageCollect and wired through
all callers (controller, verify-feature retention, tests).

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(test): add missing metrics var in GCS GC tests

TestGCSGarbageCollectImageIndex and
TestGCSGarbageCollectChainedImageIndexes were missing the
metrics variable required by NewGarbageCollect after the
MetricServer parameter was added.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(test): add defer metrics.Stop() in GC tests

Prevent goroutine/port leaks by stopping MetricsServer in
storage_test.go (3 functions) and gcs_test.go (also add
missing metrics declaration in TestGCSGarbageCollectImageManifest).
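The leak comes from test helpers that start a background goroutine and never stop it. A minimal sketch of the `defer ...Stop()` pattern, assuming a hypothetical `metricsServer` whose `Stop` closes a channel and waits for the goroutine to exit:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// metricsServer is a hypothetical server that runs a background goroutine until stopped.
type metricsServer struct {
	stop     chan struct{} // closed to ask the background goroutine to exit
	done     chan struct{} // closed by the goroutine once it has exited
	stopOnce sync.Once     // makes Stop safe to call more than once
}

func newMetricsServer() *metricsServer {
	s := &metricsServer{stop: make(chan struct{}), done: make(chan struct{})}
	go s.run()
	return s
}

func (s *metricsServer) run() {
	defer close(s.done)
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-s.stop:
			return
		case <-ticker.C:
			// periodic work (for example refreshing gauges) would go here
		}
	}
}

// Stop signals the goroutine and blocks until it has exited, so nothing outlives the test.
func (s *metricsServer) Stop() {
	s.stopOnce.Do(func() { close(s.stop) })
	<-s.done
}

func exerciseGC() {
	metrics := newMetricsServer()
	defer metrics.Stop() // runs even if the test body returns early
	fmt.Println("running GC against a test store...")
}

func main() {
	exerciseGC()
	fmt.Println("metrics server stopped")
}
```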

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(test): cover `CleanRepo` error path

Add test that exercises the error branch in
`CleanRepo` where `cleanRepo` fails, covering
the metrics calls and log lines flagged by Codecov.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* test: cover GC error paths for Codecov

Add three tests in gc_internal_test.go to cover previously
untested error branches in `removeBlobUploads` and
`removeUnreferencedBlobs`: `ListBlobUploads` failure,
`addIndexBlobsToReferences` failure, and `PathNotFoundError`
from `GetAllBlobs`.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* test(gc): cover remaining error paths

Cover `StatBlobUpload`, `digest.Validate()`,
`isBlobOlderThan`, and `CleanupRepo` error branches
in `removeBlobUploads` and `removeUnreferencedBlobs`.

`removeUnreferencedBlobs` is now at 100% coverage;
`removeBlobUploads` goes from 78.3% to 91.3%.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* test: cover `sanityChecks` label name mismatch

Try to avoid a -0.09% coverage regression on `minimal.go`
by exercising the uncovered branch in `sanityChecks`
where label names have correct count but wrong values.
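The uncovered case is a label list of the right length whose names differ, which a count check alone never reaches. A sketch with a hypothetical `checkLabels` helper, assuming names are compared position by position:

```go
package main

import "fmt"

// checkLabels is a hypothetical sanity check: the given label names must match
// the metric's declared label names in both count and value.
func checkLabels(metric string, declared, given []string) error {
	if len(declared) != len(given) {
		return fmt.Errorf("metric %s: expected %d labels, got %d", metric, len(declared), len(given))
	}
	for i := range declared {
		if declared[i] != given[i] {
			// Right count, wrong names: the branch the commit targets.
			return fmt.Errorf("metric %s: label %d is %q, expected %q", metric, i, given[i], declared[i])
		}
	}
	return nil
}

func main() {
	err := checkLabels("zot_gc_deleted_total", []string{"type"}, []string{"kind"})
	fmt.Println(err)
}
```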

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* test(gc): exercise real GC path in metrics test

TestGCMetrics was calling metric helpers directly instead of
running actual garbage collection, so it couldn't catch wiring
regressions where `CleanRepo` stops recording metrics.

The test now uploads an orphaned blob and runs `gc.CleanRepo` end-to-end,
verifying that the metrics appear on the Prometheus endpoint.

Suggestion from Copilot: https://github.com/project-zot/zot/pull/3863#discussion_r3129324719

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(gc): skip deletion metrics when DryRun is enabled

https://github.com/project-zot/zot/pull/3863#discussion_r3129324684
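The intent of this fix, sketched with a hypothetical `collector` and `deletionRecorder` (not zot's types): in dry-run mode nothing is deleted, so nothing should be counted as deleted either.

```go
package main

import "fmt"

// deletionRecorder is a hypothetical metrics hook for deleted items.
type deletionRecorder interface {
	IncDeleted(itemType string)
}

// countingRecorder keeps per-type counts in memory.
type countingRecorder struct{ counts map[string]int }

func (r *countingRecorder) IncDeleted(itemType string) { r.counts[itemType]++ }

// collector is a hypothetical GC step that removes unreferenced blobs.
type collector struct {
	dryRun  bool
	metrics deletionRecorder
	store   map[string]bool // digest -> blob present
}

// removeBlob deletes one blob and records it. In dry-run mode it only reports
// what it would do, so the deletion counter reflects real deletions only.
func (c *collector) removeBlob(digest string) {
	if c.dryRun {
		fmt.Printf("dry-run: would delete blob %s\n", digest)
		return
	}
	delete(c.store, digest)
	c.metrics.IncDeleted("blob")
}

func main() {
	rec := &countingRecorder{counts: map[string]int{}}
	for _, dry := range []bool{true, false} {
		c := &collector{dryRun: dry, metrics: rec, store: map[string]bool{"sha256:abc": true}}
		c.removeBlob("sha256:abc")
	}
	fmt.Println("blobs counted as deleted:", rec.counts["blob"])
}
```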

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(test): stop leaked MetricsServer goroutines in GCS tests

https://github.com/project-zot/zot/pull/3863#discussion_r3129324657

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* refactor(test): drop unnecessary zlog import alias

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(monitoring): expose metric types outside build tag

`MetricsCopy` and related types were only visible under `!metrics`,
causing a typecheck failure when golangci-lint runs with `-tags metrics`.
Moving the type definitions to `common.go` makes them unconditionally available.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* fix(monitoring): remove extra blank line for gci

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* test(gc): cover both dry-run and real deletion metrics

Also fix an issue with the `metrics` build tag.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

* Satisfy testpackage linter for gc metrics test

The `testpackage` linter allows `package gc` only in files named
`*_internal_test.go`; rename to follow that convention.

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

---------

Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>
2026-05-16 23:03:36 -07:00
Andrei Aaron d33c1e3b22 fix: now attempt to bind to the zot server socket to check if the server is running (#3703)
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
2026-01-15 20:02:15 +02:00
Andrei Aaron da426850e7 chore: update golangci-lint and fix all issues (#3575)
* chore: Update golangci-lint

Signed-off-by: Lars Francke <git@lars-francke.de>

* chore: fix all golangci-lint issues

- Remove deprecated `// +build` tags
- Fix godoclint, modernize, wsl_v5, govet, lll, gci, noctx issues
- Update linter configuration
- Modernize code to use Go 1.22+ features (for range N, slices.Contains, etc.)
- Update `make check` to lint the privileged tests

Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>

---------

Signed-off-by: Lars Francke <git@lars-francke.de>
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
Co-authored-by: Lars Francke <git@lars-francke.de>
2025-11-22 23:36:48 +02:00
Andrei Aaron 41e10d4fe9 feat: add zot subcommand to enable testing retention policy settings (#3449)
feat: add verify-feature retention subcommand with comprehensive testing and validation

Add a `verify-feature retention` subcommand that allows users to preview and
validate retention policy changes without running the actual Zot server.
The command runs GC and retention tasks in dry-run mode for immediate feedback.

- Run verify-feature retention standalone without starting the server
- Preview retention policy decisions in dry-run mode
- Configurable GC interval override via command-line flag
- Optional timeout for task completion
- Configurable log output (stdout or file)

Basic usage:
```bash
zot verify-feature retention <config-file>
```

With log file output:
```bash
zot verify-feature retention -l /var/log/zot-retention-check.log <config-file>
```

With GC interval override (runs GC tasks every 30 seconds):
```bash
zot verify-feature retention -i 30s <config-file>
```

With timeout (wait up to 5 minutes for tasks to complete):
```bash
zot verify-feature retention -t 5m <config-file>
```

Combined flags:
```bash
zot verify-feature retention -l /var/log/zot-retention-check.log -i 1m -t 10m <config-file>
```

The command supports overriding GC settings from the config:
- `-i, --gc-interval`: Override the GC interval setting (applies to all storage paths including subpaths)

- Refactored `RunGCTasks` from `controller.go` to be reusable
- Added `checkServerRunning` validation to prevent conflicts
- Implemented signal handling for graceful shutdown
- Added configuration sanitization and logging
- Set GCMaxSchedulerDelay programmatically (not user-configurable)

Added tests covering the main function:
- Negative test cases (no args, bad config, GC disabled, server running)
- Both BoltDB and Redis
- Retention enabled scenarios with complex image setups
- Retention disabled scenarios
- Delete referrers functionality
- Subpaths configuration
- GC interval override validation

Run the verify-feature retention tests:
```bash
go test -v ./pkg/cli/server -run TestRetentionCheck
```

Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
2025-10-28 13:36:59 -07:00