Fast-path progressbar: ~24 ns/iter (was ~254), behavior-preserving by wolph · Pull Request #316 · wolph/python-progressbar

wolph · 2026-06-22T01:23:48Z

Fast-path progressbar: ~24 ns/iter (was ~254)

Makes wrapping a loop with progressbar2 an order of magnitude cheaper, without changing observable behavior.

Result — per-iteration overhead (the headline metric)

library	ns/iter
rich	19
progressbar2	24 (was 254)
tqdm	56
alive-progress	247
click	1878

progressbar2 goes from mid-pack to 2nd-fastest — ~11× faster than before, ~2.3× faster than tqdm, and within a few ns of rich. (Beating rich outright is the goal of a planned optional native extension, deliberately scoped as a separate follow-up.)

How

An integer "next-update gate": the common iteration is just value += 1; self.value = value; if value >= next_update: update(); yield. The expensive redraw machinery (clock read + widget formatting) only runs when a redraw could actually happen (rate-limited to ~20×/sec). The gate calibrates from a real timing measurement and self-corrects via a tqdm-style closed loop.

Iterator path: __iter__ rewritten as a single peek-first generator; the shortcuts.progressbar wrapper generator is collapsed away.
Manual path: update() / += skip the per-call clock read below the gate threshold.
The gate only ever skips iterations — whenever it enters the slow path, the unchanged _needs_update() makes the real redraw decision. It can never force a wrong redraw, only defer a check.

Correctness & backward compatibility

Public API unchanged. bar.value stays live every iteration (== current item index). Same widgets, same redraw cadence.
previous_value now tracks the value at the last redraw (needed for the pixel check once intermediate update()s are skipped).
A PROGRESSBAR_DISABLE_FASTPATH env var (and min_poll_interval=0) reverts to the original per-iteration behavior.
Equivalence tests assert the gated bar draws the same frame cadence as an ungated bar (catch any dropped redraw); developed TDD-first against characterization tests of the pre-change behavior.
Full suite: 471 passed, 100% branch coverage. pyright clean; zero new mypy errors.

Benchmark + regression guard

benchmarks/ — reproducible suite (bench.py + report.py) pitting progressbar2 against tqdm/rich/alive-progress/click across iteration overhead, forced-render cost, and import time, all rendered to a real pseudo-terminal.
tests/test_perf_budget.py + a CI job — fail the build if per-iteration overhead regresses toward the old regime.
README documents the claim with the generated chart.

Secondary (honest scope notes)

Render cost: 28.5 → 25.7 µs/update via a safe no_color/len_color fast path (skip the ANSI-strip regex on plain text). Getting under tqdm's ~11 µs needs deeper widget-render rework (deferred; negligible in real use at ~20 redraws/sec).
Import time: investigated and left unchanged — measurement shows the ~45 ms is dominated by the always-needed python_utils dependency (asyncio + typing_extensions ~30 ms), not by anything lazy-loadable here. Reducing it needs trimming the python_utils dependency or an upstream fix (separate effort).

🤖 Generated with Claude Code

Benchmarks progressbar2 against tqdm, rich, alive-progress and click across three dimensions (default iteration overhead, forced per-update render cost, cold import time), all rendered to a real drained pty. Produces benchmarks/report.md + benchmarks/chart.png.

Adds fixed_clock() helper (patches bar_module.timeit.default_timer via monkeypatch) and test_redraw_count_is_rate_limited. With dt=0.001 (1 ms per timer read) and the default 50 ms min_poll_interval, 20 000 iterations produced n_repaints=238 — well inside the assertion band 1 < n < 2000.

Adds `_next_update`, `_gate_step`, `_gate_last_value`, `_gate_last_timer`, and `_gate_enabled` fields to `init()`, plus `_recompute_gate()` for tqdm-style closed-loop step calibration. Gate is dead code this task; wired in by later tasks.

…bles - Move `self.previous_value = self.value` into the `_update_parents` branch (inside a `finally` so it runs even if the draw raises), so previous_value reflects the value at the last *successful draw attempt* rather than the last update() call. This keeps `_needs_update()`'s pixel-threshold check anchored to "pixels since last draw" once the gate (Task 5) starts skipping intermediate update() calls. - Skip `_update_variables` entirely on the common no-kwargs path: `variables_changed = self._update_variables(kwargs) if kwargs else False` - Add two TDD tests to tests/test_fastpath.py: · test_previous_value_tracks_last_redraw: previous_value ∈ drawn values · test_previous_value_not_set_on_skipped_update: after two rate-limited skips, previous_value stays pinned at the last-drawn value (3), not the penultimate call value (4).

Add value-threshold early-return in update() so iterations below _next_update bypass timeit.default_timer() entirely. Gate activates only after _gate_calibrated=True, which requires either (a) a real timing measurement in _recompute_gate (redraw path), or (b) the backed-off step growing to >=10 (fast tight-loop path without redraws). This ensures slow iterators where time advances between calls (sleep, freezegun.tick) are never incorrectly skipped before the gate is trustworthy. Also adds `_recompute_gate(self.value)` after the draw block so the threshold stays calibrated on the non-gated path, and covers the `_gate_enabled=False` branch via test_gate_disabled_skips_recompute.

…me equivalence test Remove the uncalibrated back-off activation in _recompute_gate: the gate is now calibrated only by a real timing measurement (the redraw branch), never by the no-redraw back-off path. Add _next_pixel_value() and clamp _next_update to the next pixel-bucket crossing so the time-sized step can never leap past a value-driven redraw the ungated bar would render. Honor the no_freezegun pytest marker in conftest's sleep_faster fixture, mark the perf test no_freezegun (it needs a real clock to calibrate), and add test_gated_matches_ungated_drawn_frames asserting byte-identical frames for a poll_interval=None slow loop.

Rewrite ProgressBar.__iter__ as a peek-first generator that applies the integer gate (value >= next_update) in the hot loop, eliminating the per-iteration next(self) call that invoked __next__ on every item. The shortcut progressbar() is now a plain function that returns iter(ProgressBar(...)(iterator)) — a single generator — instead of a generator wrapping a generator. Two related fixes needed for correctness: - _recompute_gate: only apply the pixel-boundary clamp when _gate_calibrated is True; clamping during the back-off phase with a frozen clock created an infinite loop at the pixel boundary. - __iter__ inner loop: prime next_update = self.value (not self._next_update) so the first post-yield iteration always calls update(), ensuring slow clocks (time advancing in the loop body before gate calibration) still redraw at the right pixel buckets. Added tests: test_shortcut_has_single_generator_layer, test_iterator_overhead_is_low (skipped under coverage tracing), test_next_direct_exhaustion_calls_finish.

… add iterator-path equivalence test

…; restore fast-path perf

Add PROGRESSBAR_DISABLE_FASTPATH env-var gate and auto-disable when min_poll_interval is zero, reverting to per-update behavior for callers that need every update considered. Test uses instance-level min_poll_interval mutation (not class attribute) to avoid invalidating CPython 3.13 adaptive specialiser caches, which caused a 120x perf regression in test_iterator_overhead_is_low.

Add tests/test_perf_budget.py measuring iterator overhead (wrapped minus baseline) and asserting < 120 ns/iter to catch regression to the ~250 ns pre-fast-path regime. Marked no_freezegun (real clock needed for gate calibration) and guarded against coverage-tracer inflation (sys.gettrace()). Update conftest small_interval fixture to also skip patching _MINIMUM_UPDATE_INTERVAL for no_freezegun tests so the gate can calibrate correctly. Add benchmarks/requirements.txt with pinned competitor versions. Add perf-budget CI job running pytest tests/test_perf_budget.py --no-cov.

…mark progressbar2 now ~24 ns/iter wrapping a loop (was ~254): 2nd-fastest of the common libraries, ~2.3x faster than tqdm, within a few ns of rich, and 10-79x faster than alive-progress/click. README Performance section + chart.

len_color is called for every widget on every redraw; no_color ran a regex substitution even on plain text (no ESC byte). Fast-path returns the value unchanged when there is no ESC byte (identical result, no regex). Forced per-update render: 28.5us -> 25.7us. Output byte-identical (full suite green at 100% coverage); deeper sub-tqdm render work deferred as a follow-up.

chatgpt-codex-connector · 2026-06-22T01:23:53Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

gemini-code-assist

Code Review

This pull request introduces a performance optimization ("fast-path" gate) to progressbar2 to reduce per-iteration overhead, along with benchmarks, reports, and tests to verify performance and correctness. The review feedback identifies two critical issues: first, _gate_step can grow exponentially before calibration on fast loops, causing severe performance degradation; second, changing the semantics of the public previous_value attribute breaks backward compatibility, which should be resolved by introducing a private _last_drawn_value attribute instead. Additionally, a test update is suggested to align with the calibration fix.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

- Exclude benchmarks/ from pytest collection (--doctest-modules was importing benchmarks/report.py which needs matplotlib, absent in CI) and from ruff (tooling scripts, not package code). - Fix ruff E501/RUF002/RUF003/N818 in new test + gate-comment lines (wrap long comments, en-dash -> hyphen, Boom -> BoomError).

Copilot

Pull request overview

This PR introduces a “next-update gate” fast path in ProgressBar to dramatically reduce per-iteration overhead by skipping expensive redraw checks until a computed threshold is reached, while aiming to preserve redraw cadence and public API behavior. It also adds performance regression tests/CI coverage and a reproducible benchmark suite plus documentation of the performance claim.

Changes:

Implement a calibrated integer gate in progressbar/bar.py to avoid per-iteration clock reads/redraw predicate work on the common path (iterator + manual update() paths).
Add characterization/equivalence tests and a CI “performance budget” test to prevent regressions in iterator-wrap overhead.
Add benchmarks tooling/artifacts and document the performance results in README.rst.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`progressbar/bar.py`	Adds gated fast-path logic in `__iter__` and `update()`, plus gate calibration/recompute machinery and env-based disabling.
`progressbar/shortcuts.py`	Collapses the shortcut wrapper generator layer by returning the bar’s iterator directly.
`progressbar/utils.py`	Adds a fast path in `no_color` to skip regex work when no ANSI escape is present.
`tests/test_fastpath.py`	Adds extensive behavioral parity and fast-path characterization tests (iterator + manual update paths).
`tests/test_perf_budget.py`	Adds a tight perf regression guard for iterator-wrap overhead (skips assertion under coverage tracing).
`tests/conftest.py`	Adds a `no_freezegun` marker escape hatch so timing-dependent perf tests run with real clocks/intervals.
`benchmarks/bench.py`	Adds a reproducible pty-based benchmark runner comparing progressbar2 vs alternatives.
`benchmarks/report.py`	Adds chart/report generation from benchmark results.
`benchmarks/requirements.txt`	Pins benchmark-only dependencies.
`benchmarks/results.json`	Adds a captured benchmark results snapshot.
`benchmarks/report.md`	Adds a generated benchmark report snapshot.
`README.rst`	Documents the performance claim and how to reproduce benchmarks.
`.github/workflows/main.yml`	Adds a dedicated CI job to enforce the iterator overhead performance budget.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Restore public previous_value semantics (value before the last update() call); the gate now uses a separate private _last_drawn_value for its pixel check (gemini: backward compatibility). - Only back off _gate_step once calibrated, preventing exponential growth of the step before calibration (gemini: performance). - Update gate-step tests for the calibrated back-off + add an uncalibrated no-growth test. - Avoid mixed import of progressbar in tests; comment bench.py teardown excepts (CodeQL).

- Compare gi_code to ProgressBar.__iter__.__code__ instead of __qualname__ for the single-generator-layer check (Copilot: more robust). - Use progressbar.bar instead of 'from progressbar import bar' so the test module no longer mixes import styles (CodeQL).

…k-compat) Set previous_value in the iterator's gated-out branch too, so reads of bar.previous_value mid-loop match the pre-gate every-iteration semantics exactly (not just at redraws). Closes the last residue of the backward- compatibility review concern. Costs ~7 ns/iter (now ~31 ns vs tqdm 56, still 2nd-fastest); README + benchmark updated to the honest figure. Adds a per-iteration previous_value assertion to the liveness test.

wolph · 2026-06-22T10:39:22Z

Review resolution summary

All inline review comments have been addressed and resolved. Recording the resolution here since the bot summary reviews above were written before the fixes and don't auto-update.

Finding	Reviewer	Resolution
`_gate_step` could grow exponentially before calibration	gemini (high)	The back-off doubling is now `elif self._gate_calibrated:` — it never runs before a real timing measurement, and the gate isn't even consulted pre-calibration. Regression test `test_recompute_gate_no_backoff_before_calibration` added.
Public `previous_value` semantics / backward compatibility	gemini (high)	`previous_value` keeps its original meaning (the value before the current `update()` call) and is now updated on every iteration of `for x in bar` too — verified byte-identical to the pre-gate implementation on the manual path, the iterator path, `increment()`/`+=`, and with the fast path disabled. The gate uses a separate private `_last_drawn_value` for its pixel check, so no public attribute changed meaning. A per-iteration `previous_value` assertion was added to the liveness test.
Back-off test should set `_gate_calibrated=True`	gemini (medium)	Done, plus a complementary no-growth-before-calibration test.
`gen.__qualname__` is brittle	Copilot	Now compares `gen.gi_code is ProgressBar.__iter__.__code__`.
Empty `except` without comment (×2)	CodeQL	Explanatory comments added to the pty-teardown `except` clauses.
Module imported with both `import` and `import-from`	CodeQL	Test module no longer mixes import styles (uses `progressbar.bar`).

Backward compatibility: byte-identical value/previous_value on every iteration; same widgets and same redraw cadence; an env var (PROGRESSBAR_DISABLE_FASTPATH) fully reverts to the original path.

Performance (updated after the back-compat hardening): ~31 ns/iter wrapping a loop (was ~254), ~1.8× faster than tqdm, second only to rich. Full suite green at 100% branch coverage; ruff and pyright clean; CI green.

🤖 Generated with Claude Code

The absolute 120 ns/iter ceiling failed on the py314 CI runner (measured 136 ns) because raw ns/iter varies wildly across runners and Python builds. Replace it with a ratio: iterator overhead must stay under 4x the cost of a single clock read measured in the same process. The pre-gate path read the clock every iteration (~9x); the gated path reads none (~1.4x locally), so the bound robustly catches a regression while tolerating slow/noisy CI.

Those two fields duplicated _last_drawn_value/_last_update_timer (value/time at the last redraw). Calibrate _gate_step inline from the prior redraw's value/time, snapshotted before the draw overwrites them. Gate state: 7 -> 5 fields. _recompute_gate is replaced by a small _gate_skips() predicate and a _draw_and_recalibrate() helper (keeps update() under the complexity limit). Behavior and ~31 ns/iter overhead unchanged; 470 tests pass at 100% coverage.

wolph added 16 commits June 21, 2026 16:35

test: characterize progressbar iteration behavior before fast path

8f69bf6

test: drop unused monkeypatch param, tighten gate-init assertions

6c8b055

fix: iterator gate skips only after calibration (no dropped buckets);…

8ab7fe1

… add iterator-path equivalence test

fix: remove pixel clamp that defeated the gate (857k/1M update calls)…

f7efd66

…; restore fast-path perf

docs: fix stale __iter__ comments referencing removed pixel clamp

f371737

Copilot AI review requested due to automatic review settings June 22, 2026 01:23

Copilot started reviewing on behalf of wolph June 22, 2026 01:24 View session

github-advanced-security AI found potential problems Jun 22, 2026

View reviewed changes

Comment thread benchmarks/bench.py Fixed

Comment thread benchmarks/bench.py Fixed

Comment thread tests/test_fastpath.py Fixed

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread progressbar/bar.py Outdated

Comment thread progressbar/bar.py Outdated

Comment thread tests/test_fastpath.py Outdated

Copilot AI reviewed Jun 22, 2026

View reviewed changes

Comment thread tests/test_fastpath.py Outdated

wolph force-pushed the fast-progressbar branch from 2f3aa7d to 7a0e2d2 Compare June 22, 2026 01:43

wolph added 2 commits June 22, 2026 03:48

wolph added 2 commits June 22, 2026 12:45

wolph force-pushed the fast-progressbar branch from 5f5704d to 9fd1a66 Compare June 23, 2026 10:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fast-path progressbar: ~24 ns/iter (was ~254), behavior-preserving#316

Fast-path progressbar: ~24 ns/iter (was ~254), behavior-preserving#316
wolph wants to merge 22 commits into
developfrom
fast-progressbar

wolph commented Jun 22, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

wolph commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

wolph commented Jun 22, 2026