Generated 2026-04-14T00:18:52+00:00
Tokens, cost, and duration are per-replicate means. Sort ascending on any of them to find the most efficient stacks at a given quality level.
| Maturity | Phase | Stack | n | code_quality | tokens (mean) | cost (mean) | duration (mean) | turns (mean) | $/quality |
|---|---|---|---|---|---|---|---|---|---|
| 0.750 | trial | language=go, model=opus, tooling=none | 1/1 | 1.000 | 1,098,817 | $1.3872 | 273.5s | 24.0 | $1.3872 |
| 0.750 | trial | language=go, model=opus, tooling=beads | 1/1 | 1.000 | 683,860 | $1.2267 | 268.7s | 28.0 | $1.2267 |
| 0.750 | trial | language=go, model=sonnet, tooling=none | 1/1 | 1.000 | 1,540,294 | $1.1782 | 425.7s | 30.0 | $1.1782 |
| 0.750 | trial | language=java, model=opus, tooling=none | 1/1 | 1.000 | 965,345 | $1.2560 | 218.0s | 24.0 | $1.2560 |
| 0.750 | trial | language=java, model=opus, tooling=beads | 1/1 | 1.000 | 1,669,932 | $1.7526 | 340.6s | 39.0 | $1.7526 |
| 0.750 | trial | language=java, model=sonnet, tooling=beads | 1/1 | 1.000 | 2,779,597 | $1.8355 | 674.0s | 57.0 | $1.8355 |
| 0.708 | trial | language=rust, model=opus, tooling=none | 1/1 | 0.833 | 593,895 | $0.8660 | 174.7s | 16.0 | $1.0392 |
| 0.708 | trial | language=clojure, model=opus, tooling=none | 1/1 | 0.833 | 630,222 | $0.8090 | 178.2s | 17.0 | $0.9709 |
| 0.708 | trial | language=rust, model=opus, tooling=beads | 1/1 | 0.833 | 1,108,771 | $1.5749 | 350.3s | 32.0 | $1.8899 |
| 0.708 | trial | language=rust, model=sonnet, tooling=none | 1/1 | 0.833 | 209,825 | $1.1439 | 471.0s | 9.0 | $1.3727 |
| 0.708 | trial | language=rust, model=sonnet, tooling=beads | 1/1 | 0.833 | 491,969 | $1.1087 | 532.5s | 17.0 | $1.3304 |
| 0.708 | trial | language=clojure, model=opus, tooling=beads | 1/1 | 0.833 | 1,391,321 | $1.3912 | 342.5s | 34.0 | $1.6694 |
| 0.708 | trial | language=clojure, model=sonnet, tooling=beads | 1/1 | 0.833 | 1,811,932 | $1.0288 | 410.3s | 49.0 | $1.2346 |
| 0.708 | trial | language=clojure, model=sonnet, tooling=none | 1/1 | 0.833 | 1,920,625 | $1.1249 | 436.6s | 45.0 | $1.3498 |
| 0.683 | trial | language=typescript, model=sonnet, tooling=beads | 1/1 | 0.733 | 1,556,688 | $0.9246 | 361.9s | 43.0 | $1.2609 |
| 0.683 | trial | language=typescript, model=opus, tooling=none | 1/1 | 0.733 | 919,663 | $1.0130 | 187.5s | 23.0 | $1.3814 |
| 0.683 | trial | language=typescript, model=opus, tooling=beads | 1/1 | 0.733 | 1,022,265 | $1.0675 | 204.8s | 27.0 | $1.4556 |
| 0.683 | trial | language=typescript, model=sonnet, tooling=none | 1/1 | 0.733 | 797,512 | $0.7092 | 274.5s | 24.0 | $0.9671 |
| 0.667 | trial | language=python, model=sonnet, tooling=none | 1/1 | 0.667 | 879,497 | $0.7158 | 328.9s | — | $1.0737 |
| 0.667 | trial | language=python, model=opus, tooling=none | 1/1 | 0.667 | 580,884 | $0.7256 | 149.3s | 16.0 | $1.0884 |
| 0.667 | trial | language=python, model=opus, tooling=beads | 1/1 | 0.667 | 1,625,376 | $1.7259 | 348.8s | 44.0 | $2.5888 |
| 0.667 | trial | language=python, model=sonnet, tooling=beads | 1/1 | 0.667 | 2,113,900 | $1.2469 | 482.8s | 49.0 | $1.8704 |
| 0.150 | candidate | language=java, model=sonnet, tooling=none | 0/1 | n/a | 4,013,637 | $2.3117 | 779.7s | 61.0 | — |
| 0.150 | candidate | language=go, model=sonnet, tooling=beads | 0/1 | n/a | 3,180,387 | $1.7240 | 506.8s | 61.0 | — |
retort · maturity = 0.30·agreement + 0.30·completion + 0.25·score + 0.15·coverage
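The maturity formula in the footer is a plain weighted sum, and the $/quality column appears to be mean cost divided by code_quality (for example, $0.8660 divided by 5/6, displayed as 0.833, gives about $1.0392). A minimal sketch of both follows. The function names, the 0 to 1 input scale, and the $/quality definition are assumptions inferred from the table, not taken from retort itself.

```python
# Weights copied from the report footer; they sum to 1.0.
MATURITY_WEIGHTS = {
    "agreement": 0.30,
    "completion": 0.30,
    "score": 0.25,
    "coverage": 0.15,
}


def maturity(agreement, completion, score, coverage):
    """Weighted sum from the footer. Assumes every input lies in [0, 1]."""
    parts = {
        "agreement": agreement,
        "completion": completion,
        "score": score,
        "coverage": coverage,
    }
    return sum(MATURITY_WEIGHTS[name] * value for name, value in parts.items())


def cost_per_quality(cost_usd, code_quality):
    """Assumed definition of the $/quality column: mean cost / code_quality.

    Returns None when quality is missing or zero, which the table shows as a dash.
    """
    if not code_quality:
        return None
    return cost_usd / code_quality


print(round(maturity(1.0, 1.0, 1.0, 1.0), 3))
print(round(cost_per_quality(0.8660, 5 / 6), 4))
```

Under this reading, trial rows that differ only in code_quality should differ in maturity by 0.25 times the quality gap, which matches the step from 0.750 to 0.708 between the 1.000 and 0.833 rows.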
ANOVA output from `retort analyze` run on the exported CSV. Factors significant at p < 0.1 are listed at the bottom of each response section.
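The flagging rule can be sketched in a few lines: a factor is listed when its PR(>F) value falls below 0.1. The helper name below is hypothetical, and the p-values are copied from the _tokens section of this report.

```python
ALPHA = 0.1  # threshold shown in the report's "Significant (p < 0.1)" lines


def significant_factors(p_values, alpha=ALPHA):
    """Return the factor names whose p-value is below alpha, in input order."""
    return [name for name, p in p_values.items() if p < alpha]


# PR(>F) values copied from the _tokens section.
tokens_p = {"language": 0.062773, "model": 0.294644, "tooling": 0.029479}
print(significant_factors(tokens_p))  # ['language', 'tooling']
```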
============================================================
Response: code_quality transform: log10(y)
R² = 1.0000 Adj R² = 1.0000
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 8.535518e-02 | 5.0 | 9.957970e+29 | 2.474325e-206 |
| C(model) | 2.831117e-32 | 1.0 | 1.651463e+00 | 2.196141e-01 |
| C(tooling) | 1.966053e-34 | 1.0 | 1.146849e-02 | 9.162363e-01 |
| Residual | 2.400032e-31 | 14.0 | NaN | NaN |

Significant (p < 0.1): language
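In the results table, every stack that shares a language also shares a code_quality value, so language alone reproduces the response exactly. That would explain the R² of 1.0000 and a residual sum of squares at floating-point noise level, in which case the very large F value comes from dividing by a near-zero residual and is not a meaningful effect size. A short check using the 22 trial rows (values copied from the table; the grouping code is only illustrative):

```python
from collections import defaultdict

# (language, code_quality) for the 22 trial rows of the results table.
rows = (
    [("go", 1.000)] * 3
    + [("java", 1.000)] * 3
    + [("rust", 0.833)] * 4
    + [("clojure", 0.833)] * 4
    + [("typescript", 0.733)] * 4
    + [("python", 0.667)] * 4
)

# Collect the distinct quality values observed for each language.
by_language = defaultdict(set)
for language, quality in rows:
    by_language[language].add(quality)

# One distinct value per language means zero within-language variance.
constant_within_language = all(len(values) == 1 for values in by_language.values())
print(constant_within_language)
```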
============================================================
Response: _tokens transform: log10(y)
R² = 0.6020 Adj R² = 0.4029
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.536723 | 5.0 | 2.739070 | 0.062773 |
| C(model) | 0.046457 | 1.0 | 1.185415 | 0.294644 |
| C(tooling) | 0.230278 | 1.0 | 5.875925 | 0.029479 |
| Residual | 0.548662 | 14.0 | NaN | NaN |

Significant (p < 0.1): language, tooling
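Each F value in these tables is the factor's mean square divided by the residual mean square. The sketch below recomputes the _tokens F column from the printed sum_sq and df values; small differences in the last digits are expected because the inputs are already rounded. The function name is illustrative.

```python
def f_statistic(sum_sq, df, resid_sum_sq, resid_df):
    """F = (factor sum_sq / factor df) / (residual sum_sq / residual df)."""
    return (sum_sq / df) / (resid_sum_sq / resid_df)


# Values copied from the _tokens section.
resid_ss, resid_df = 0.548662, 14
effects = {
    "language": (0.536723, 5),
    "model": (0.046457, 1),
    "tooling": (0.230278, 1),
}

for name, (ss, df) in effects.items():
    print(name, round(f_statistic(ss, df, resid_ss, resid_df), 3))
```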
============================================================
Response: _cost_usd transform: log10(y)
R² = 0.6531 Adj R² = 0.4796
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.097462 | 5.0 | 2.451931 | 0.085321 |
| C(model) | 0.004787 | 1.0 | 0.602110 | 0.450685 |
| C(tooling) | 0.089329 | 1.0 | 11.236655 | 0.004744 |
| Residual | 0.111298 | 14.0 | NaN | NaN |

Significant (p < 0.1): language, tooling
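The gap between R² and Adj R² follows from the degrees of freedom. Assuming the model has an intercept, the observation count is the sum of the df column plus one (5 + 1 + 1 + 14 + 1 = 22 here), and the adjusted value penalizes R² by (n - 1) / residual df. A minimal sketch with an illustrative function name:

```python
def adjusted_r2(r2, n_obs, resid_df):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / residual df."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / resid_df


# _cost_usd section: R² = 0.6531, residual df = 14, n inferred as 22.
n_obs = 5 + 1 + 1 + 14 + 1
print(round(adjusted_r2(0.6531, n_obs, 14), 4))
```

With only 22 observations and 7 fitted factor levels, the adjustment is large, which is why every section except code_quality shows a noticeably lower adjusted value.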
============================================================
Response: _duration_seconds transform: log10(y)
R² = 0.8547 Adj R² = 0.7820
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.087041 | 5.0 | 2.611892 | 0.071824 |
| C(model) | 0.357773 | 1.0 | 53.679806 | 0.000004 |
| C(tooling) | 0.123530 | 1.0 | 18.534238 | 0.000726 |
| Residual | 0.093309 | 14.0 | NaN | NaN |

Significant (p < 0.1): language, model, tooling
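Because the response is log10-transformed, a factor effect reads as a ratio rather than a difference. One rough way to see the model effect on duration is to compare geometric means per model across the 22 trial rows. This is an unadjusted comparison (it ignores language and tooling and the slightly unbalanced design), so it only approximates what the ANOVA estimates; the durations are copied from the results table.

```python
import math

# Mean durations in seconds for the trial rows, grouped by model.
opus = [273.5, 268.7, 218.0, 340.6, 174.7, 178.2,
        350.3, 342.5, 187.5, 204.8, 149.3, 348.8]
sonnet = [425.7, 674.0, 471.0, 532.5, 410.3,
          436.6, 361.9, 274.5, 328.9, 482.8]


def geometric_mean(values):
    """Back-transform the mean of log10(values), matching the log10(y) response."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


# Ratio above 1 means sonnet runs took longer than opus runs on this scale.
ratio = geometric_mean(sonnet) / geometric_mean(opus)
print(round(geometric_mean(opus), 1), round(geometric_mean(sonnet), 1), round(ratio, 2))
```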
============================================================
Response: _turns transform: log10(y)
R² = 0.7036 Adj R² = 0.5439
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.267959 | 5.0 | 2.807060 | 0.062101 |
| C(model) | 0.043085 | 1.0 | 2.256755 | 0.156929 |
| C(tooling) | 0.242839 | 1.0 | 12.719571 | 0.003446 |
| Residual | 0.248193 | 13.0 | NaN | NaN |

Significant (p < 0.1): language, tooling
============================================================
Response: test_coverage transform: log10(y+1)
R² = 0.6104 Adj R² = 0.4157
============================================================
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.220742 | 5.0 | 4.120960 | 0.016429 |
| C(model) | 0.011791 | 1.0 | 1.100608 | 0.311907 |
| C(tooling) | 0.002332 | 1.0 | 0.217711 | 0.647968 |
| Residual | 0.149984 | 14.0 | NaN | NaN |

Significant (p < 0.1): language
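The test_coverage section uses log10(y+1) where the other responses use log10(y). The offset presumably keeps the transform defined when a run reports zero coverage, since log10(0) is undefined; that motivation is an assumption, as the report does not state it. A small sketch of the transform and its inverse, with illustrative function names:

```python
import math


def log10_plus_one(y):
    """Transform named in the test_coverage section: log10(y + 1)."""
    return math.log10(y + 1)


def inverse_log10_plus_one(z):
    """Map a transformed value back to the original coverage scale."""
    return 10 ** z - 1


print(log10_plus_one(0.0))  # 0.0, whereas math.log10(0.0) raises ValueError
print(round(inverse_log10_plus_one(log10_plus_one(0.5)), 6))
```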