Generated 2026-04-13T21:50:10+00:00
Tokens, cost, and duration are per-replicate means. Sort ascending on any of them to find the most efficient stacks at a given quality level.
| Maturity | Phase | Stack | n | code_quality | tokens (mean) | cost (mean) | duration (mean) | turns (mean) | $/quality |
|---|---|---|---|---|---|---|---|---|---|
| 1.000 | production | language=go, model=sonnet, tooling=beads | 3/3 | 1.000 | 476,955 | $0.3110 | 146.6s | — | $0.3110 |
| 1.000 | production | language=java, model=opus, tooling=none | 3/3 | 1.000 | 217,163 | $0.4364 | 131.1s | — | $0.4364 |
| 1.000 | production | language=java, model=opus, tooling=beads | 3/3 | 1.000 | 325,112 | $0.5525 | 149.6s | — | $0.5525 |
| 1.000 | production | language=java, model=sonnet, tooling=none | 3/3 | 1.000 | 494,116 | $0.3259 | 151.6s | — | $0.3259 |
| 1.000 | production | language=java, model=sonnet, tooling=beads | 3/3 | 1.000 | 611,395 | $0.3646 | 181.6s | — | $0.3646 |
| 0.989 | production | language=go, model=sonnet, tooling=none | 3/3 | 0.956 | 435,374 | $0.3029 | 123.4s | — | $0.3169 |
| 0.970 | production | language=go, model=opus, tooling=beads | 3/3 | 0.985 | 346,216 | $0.4908 | 117.0s | — | $0.4981 |
| 0.958 | production | language=rust, model=opus, tooling=none | 3/3 | 0.833 | 150,702 | $0.3314 | 106.1s | — | $0.3977 |
| 0.958 | production | language=rust, model=opus, tooling=beads | 3/3 | 0.833 | 355,100 | $0.4808 | 142.6s | — | $0.5770 |
| 0.958 | production | language=rust, model=sonnet, tooling=none | 3/3 | 0.833 | 395,257 | $0.3551 | 194.1s | — | $0.4261 |
| 0.958 | production | language=clojure, model=opus, tooling=none | 3/3 | 0.833 | 409,366 | $0.5790 | 178.7s | — | $0.6948 |
| 0.933 | production | language=typescript, model=opus, tooling=none | 3/3 | 0.733 | 168,703 | $0.3187 | 181.2s | — | $0.4346 |
| 0.933 | production | language=typescript, model=opus, tooling=beads | 3/3 | 0.733 | 454,220 | $0.5119 | 219.1s | — | $0.6980 |
| 0.924 | production | language=go, model=opus, tooling=none | 3/3 | 0.963 | 230,498 | $0.3611 | 93.5s | — | $0.3750 |
| 0.869 | production | language=python, model=sonnet, tooling=none | 3/3 | 0.637 | 332,390 | $0.2257 | 73.8s | — | $0.3543 |
| 0.858 | production | language=typescript, model=sonnet, tooling=beads | 3/4 | 0.733 | 637,683 | $0.3812 | 167.7s | — | $0.5198 |
| 0.808 | trial | language=rust, model=sonnet, tooling=beads | 2/3 | 0.833 | 643,793 | $0.4141 | 207.6s | — | $0.4969 |
| 0.808 | trial | language=clojure, model=opus, tooling=beads | 2/3 | 0.833 | 723,724 | $0.7618 | 201.4s | — | $0.9142 |
| 0.808 | trial | language=clojure, model=sonnet, tooling=beads | 2/3 | 0.833 | 722,940 | $0.5204 | 259.1s | — | $0.6245 |
| 0.808 | trial | language=clojure, model=sonnet, tooling=none | 2/3 | 0.833 | 665,636 | $0.5747 | 310.1s | — | $0.6896 |
| 0.791 | trial | language=python, model=sonnet, tooling=beads | 3/3 | 0.696 | 436,754 | $0.2617 | 110.2s | — | $0.3758 |
| 0.789 | trial | language=python, model=opus, tooling=beads | 3/3 | 0.672 | 280,360 | $0.3734 | 79.1s | — | $0.5554 |
| 0.783 | trial | language=typescript, model=sonnet, tooling=none | 2/3 | 0.733 | 835,319 | $0.5314 | 281.1s | — | $0.7246 |
| 0.736 | trial | language=python, model=opus, tooling=none | 3/3 | 0.789 | 91,698 | $0.2034 | 44.0s | — | $0.2579 |
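
The "most efficient stack at a given quality level" reading of the table can be sketched in a few lines of Python. This is an illustration only: the rows are a hand-copied subset of the table, the stack labels are shortened, and `cheapest_at_quality` and `cost_per_quality` are hypothetical helpers, not part of retort. The ratio of mean cost to mean code_quality appears to match the $/quality column to within rounding, but the report does not state how that column is computed, so treat it as an assumption.

```python
# A few rows copied from the table: (stack, code_quality, mean cost in USD).
ROWS = [
    ("go/sonnet/beads", 1.000, 0.3110),
    ("java/opus/none", 1.000, 0.4364),
    ("java/sonnet/none", 1.000, 0.3259),
    ("go/sonnet/none", 0.956, 0.3029),
    ("rust/opus/none", 0.833, 0.3314),
    ("python/opus/none", 0.789, 0.2034),
]


def cheapest_at_quality(rows, min_quality):
    """Keep rows that meet the quality floor, cheapest first (hypothetical helper)."""
    eligible = [row for row in rows if row[1] >= min_quality]
    return sorted(eligible, key=lambda row: row[2])


def cost_per_quality(cost, quality):
    """Mean cost divided by mean code_quality.

    Assumption: this approximates the $/quality column; the report does not
    document the exact calculation.
    """
    return cost / quality


for stack, quality, cost in cheapest_at_quality(ROWS, 0.95):
    ratio = cost_per_quality(cost, quality)
    print(f"{stack}: quality={quality:.3f} cost=${cost:.4f} $/quality=${ratio:.4f}")
```

Raising or lowering the `0.95` floor shows the trade-off directly: a lower floor admits cheaper stacks such as python/opus/none, at the price of lower code_quality.
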
retort · maturity = 0.30·agreement + 0.30·completion + 0.25·score + 0.15·coverage
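
The maturity formula is a plain weighted sum, which a short Python sketch makes concrete. The weights come from the formula above; everything else is an assumption for illustration: the function name `maturity` is hypothetical, `score` is taken to be the code_quality column, and the other three components are set to 1.0. Under those assumptions the sketch reproduces the 0.958 maturity shown for language=rust, model=opus, tooling=none, but the report does not publish the individual components, so this is an inference rather than a documented breakdown.

```python
# Weights taken from the maturity formula in the report footer.
WEIGHTS = {"agreement": 0.30, "completion": 0.30, "score": 0.25, "coverage": 0.15}


def maturity(agreement, completion, score, coverage):
    """Weighted sum of the four components, each expected in [0, 1]."""
    parts = {
        "agreement": agreement,
        "completion": completion,
        "score": score,
        "coverage": coverage,
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


# Assumption: score = code_quality (0.833) and the other components are 1.0.
print(round(maturity(1.0, 1.0, 0.833, 1.0), 3))  # 0.958
```

Because the weights sum to 1.0, a stack with all four components at 1.0 scores exactly 1.000, and a drop in score alone can lower maturity by at most 0.25.
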
Output of retort analyze on the exported CSV: one ANOVA table per response, each fitted on log10(y). Factors significant at p < 0.1 are listed at the bottom of each response section.
============================================================
Response: code_quality transform: log10(y)
R² = 0.8454 Adj R² = 0.8270
============================================================
| Term | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 2.433865e-01 | 5.0 | 64.428644 | 1.260303e-22 |
| C(model) | 6.148193e-04 | 1.0 | 0.813767 | 3.706769e-01 |
| C(tooling) | 7.135823e-07 | 1.0 | 0.000944 | 9.755866e-01 |
| Residual | 4.457583e-02 | 59.0 | NaN | NaN |
Significant (p < 0.1): language
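
Each F value in these tables is the factor's mean square divided by the residual mean square. The sketch below recomputes the language row of the code_quality table from its sum_sq and df entries; `f_statistic` is a hypothetical helper written for this illustration, not a retort function.

```python
def f_statistic(sum_sq, df, resid_sum_sq, resid_df):
    """F = (factor sum of squares / factor df) / (residual sum of squares / residual df)."""
    return (sum_sq / df) / (resid_sum_sq / resid_df)


# Values copied from the code_quality table: C(language) and Residual rows.
f_language = f_statistic(2.433865e-01, 5.0, 4.457583e-02, 59.0)
print(round(f_language, 2))  # 64.43
```

The same arithmetic applied to the model row (sum_sq 6.148193e-04, df 1.0) gives roughly 0.81, which is why model is not flagged for code_quality: its mean square is about the same size as the residual noise.
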
============================================================
Response: _tokens transform: log10(y)
R² = 0.7425 Adj R² = 0.7050
============================================================
| Term | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.585702 | 5.0 | 7.218186 | 4.155562e-05 |
| C(model) | 1.099493 | 1.0 | 67.750717 | 9.877011e-11 |
| C(tooling) | 0.541440 | 1.0 | 33.363505 | 5.500229e-07 |
| Residual | 0.778968 | 48.0 | NaN | NaN |
Significant (p < 0.1): language, model, tooling
============================================================
Response: _cost_usd transform: log10(y)
R² = 0.7031 Adj R² = 0.6598
============================================================
| Term | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 0.512915 | 5.0 | 16.710178 | 1.550257e-09 |
| C(model) | 0.089848 | 1.0 | 14.635766 | 3.765549e-04 |
| C(tooling) | 0.098682 | 1.0 | 16.074650 | 2.116749e-04 |
| Residual | 0.294670 | 48.0 | NaN | NaN |
Significant (p < 0.1): language, model, tooling
============================================================
Response: _duration_seconds transform: log10(y)
R² = 0.7673 Adj R² = 0.7333
============================================================
| Term | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(language) | 1.208637 | 5.0 | 26.687311 | 8.496877e-13 |
| C(model) | 0.204070 | 1.0 | 22.529828 | 1.900173e-05 |
| C(tooling) | 0.036614 | 1.0 | 4.042288 | 5.000990e-02 |
| Residual | 0.434773 | 48.0 | NaN | NaN |
Significant (p < 0.1): language, model, tooling
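
The "Significant" lines apply a p < 0.1 cutoff, which is looser than the conventional 0.05. A minimal sketch of that flagging rule, using the PR(>F) values from the _duration_seconds table, shows where the choice of cutoff matters; the `significant` helper is hypothetical and only mirrors the rule as stated in the report.

```python
# PR(>F) values copied from the _duration_seconds table.
P_VALUES = {
    "language": 8.496877e-13,
    "model": 1.900173e-05,
    "tooling": 5.000990e-02,
}


def significant(p_values, alpha):
    """Return the factor names whose p-value is strictly below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant(P_VALUES, 0.1))   # ['language', 'model', 'tooling']
print(significant(P_VALUES, 0.05))  # ['language', 'model']
```

Tooling's effect on duration sits just above 0.05, so it is flagged only because the report uses the 0.1 threshold; the language and model effects hold at either cutoff.
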