Paper-Conference | Hao Yan

D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation

Thu, 01 Jan 2026 00:00:00 +0000

Overview

D-Convexity is a unified, threshold-free, fully differentiable convex-shape prior for data-driven image segmentation. Instead of constraining the binary mask at a fixed threshold, we require the entire network output $u:\Omega\to[0,1]$ to be quasi-concave — equivalently, every super-level set $S_\gamma=\{\mathbf{x}\in\Omega \mid u(\mathbf{x})\geq\gamma\}$ is convex. From this single principle we derive zero-, first-, and second-order characterizations that turn a hard global geometric constraint into local, differentiable inequalities, yielding a compact convolutional loss and a drop-in Convex Gradient Projection Module (CGPM).

Accepted at CVPR 2026 as a Highlight paper (top 3%).

Figure 1: Overall framework. A Swin-Transformer encoder–decoder backbone produces feature $o$; a sigmoid yields the raw mask $u=\mathcal{S}(o)$. The Convex Gradient Projection Module (CGPM) is an unrolled gradient-descent block ($v^0 \rightarrow v^1 \rightarrow \cdots \rightarrow v^T$) that projects $u$ onto the quasi-concave manifold by descending the convex loss $\nabla\mathcal{L}_{\mathrm{convex}}$. The network is trained with cross-entropy $\mathcal{L}_{\mathrm{CE}}$ on the raw mask and the quasi-concavity loss $\mathcal{L}_{\mathrm{convex}}$ on the projected mask.

Animated Demo: Zero/First/Second-Order Convexification

The animation below visualizes the midpoint (zero-order), first-order gradient, and second-order Hessian convexification dynamics applied to a non-convex initial mask. All three orders progressively regularize the shape into a convex region, but with increasing levels of spatial smoothness.


Convexification dynamics under the proposed zero-, first-, and second-order quasi-concavity priors. Starting from non-convex inputs, the mask function u is iteratively updated by (left) the local midpoint rule (Algorithm 1, zero-order), (middle) the first-order gradient-based supporting-hyperplane condition, and (right) the second-order quadratic-form penalty Q_2(x). Higher-order priors produce progressively smoother convex shapes.

Motivation

Convexity is a fundamental prior: many anatomical structures (optic disc/cup, blood vessels, organs) and man-made objects are convex or close-to-convex. Enforcing convexity suppresses holes, fragmented predictions, and irregular boundary artifacts, especially under noise, occlusion, and limited training data.

Existing approaches, however, have significant limitations:

Discrete formulations (e.g. 1–0–1 collinear-triplet penalties, graph-cuts with convexity constraints, ILP/multicut decompositions) rely on combinatorial solvers and are hard to differentiate through.
Level-set/curvature methods (non-negative curvature $\kappa\geq 0$, signed-distance Laplacian $\Delta\phi\geq 0$) certify convexity only at one chosen threshold (e.g. $\phi=0$) and are typically necessary but not sufficient.
Recent deep shape priors still lack explicit, principled control over convexity at every confidence level.

D-Convexity resolves all three issues with a single functional view: the mask function $u$ itself should be quasi-concave.

Theory: Quasi-Concavity as a Unified Convex Prior

We formalize convexity threshold-freely as quasi-concavity of $u$:

$$ u \text{ is quasi-concave} \;\Longleftrightarrow\; \forall \gamma,\; S_\gamma=\{\mathbf{x}\mid u(\mathbf{x})\geq\gamma\}\ \text{is convex}. $$

Figure 2: Concave vs. quasi-concave functions. A concave function (left) lies below every tangent plane — a strong property that most segmentation masks violate. A quasi-concave function (right) is the weaker, threshold-free notion D-Convexity uses: it only requires that every super-level set $S_\gamma$ be a convex region. At any boundary point $\mathbf{x}$, the supporting hyperplane is given by $\nabla u(\mathbf{x})^{\top}(\mathbf{y}-\mathbf{x})=0$ — this is the geometric content of our first-order condition.

By considering different smoothness assumptions on $u$, we derive three equivalent (or sufficient) characterizations:

Zero-order condition ($u\in C^0$)

$u$ is quasi-concave $\Longleftrightarrow$ for all $\mathbf{x},\mathbf{y}\in\Omega,\ \lambda\in[0,1]$:
$$u(\lambda\mathbf{x}+(1-\lambda)\mathbf{y}) \;\geq\; \min\{u(\mathbf{x}),u(\mathbf{y})\}.$$

A line segment joining two points above a level cannot dip below that level.
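The zero-order condition can be spot-checked numerically by sampling point pairs. The sketch below is a minimal illustration and not the paper's Algorithm 1: the two test functions, the sampling box $[-3,3]^2$, and the helper name `count_zero_order_violations` are assumptions made for this example.

```python
import math
import random


def gaussian_bump(p):
    """Single bump: every super-level set is a disk, so u is quasi-concave."""
    x, y = p
    return math.exp(-(x * x + y * y))


def two_bumps(p):
    """Two separated bumps: high level sets are two disjoint blobs (non-convex)."""
    x, y = p
    return (math.exp(-((x - 1.5) ** 2 + y * y))
            + math.exp(-((x + 1.5) ** 2 + y * y)))


def count_zero_order_violations(u, n_pairs=5000, seed=0, tol=1e-12):
    """Count sampled triples with u(lam*x + (1-lam)*y) < min(u(x), u(y))."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_pairs):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        y = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        lam = rng.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if u(z) < min(u(x), u(y)) - tol:
            violations += 1
    return violations
```

Sampling can only expose violations, not certify their absence, so a zero count is evidence of quasi-concavity rather than proof.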

First-order condition ($u\in C^1$)

$u$ is quasi-concave $\Longleftrightarrow$ if $u(\mathbf{x})\geq u(\mathbf{y})$, then $\nabla u(\mathbf{y})^{\top}(\mathbf{x}-\mathbf{y})\geq 0.$

The gradient at every point defines a supporting hyperplane of the local super-level set.
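A minimal sketch of how the first-order condition can be tested on sampled pairs, assuming exact gradients are available. The test functions and the helper name `first_order_violation` are hypothetical, and the sampling is global rather than restricted to a small neighborhood.

```python
import math
import random


def gaussian_bump(p):
    """Return u(p) = exp(-|p|^2) together with its exact gradient."""
    x, y = p
    u = math.exp(-(x * x + y * y))
    return u, (-2.0 * x * u, -2.0 * y * u)


def two_bumps(p):
    """Sum of two bumps centred at (+1.5, 0) and (-1.5, 0), with gradient."""
    x, y = p
    u1, g1 = gaussian_bump((x - 1.5, y))
    u2, g2 = gaussian_bump((x + 1.5, y))
    return u1 + u2, (g1[0] + g2[0], g1[1] + g2[1])


def first_order_violation(f, n_pairs=5000, seed=0):
    """Mean of ReLU(grad u(y)^T (y - x)) over sampled pairs with u(x) >= u(y)."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(n_pairs):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        y = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        ux, _ = f(x)
        uy, gy = f(y)
        if ux < uy:
            continue  # the condition only constrains pairs with u(x) >= u(y)
        slack = gy[0] * (y[0] - x[0]) + gy[1] * (y[1] - x[1])
        total += max(slack, 0.0)  # positive slack means x lies on the wrong side
        used += 1
    return total / max(used, 1)
```

For the single bump the penalty is zero up to rounding, while the two-bump function is penalized because points near one peak fall behind the supporting hyperplane of points near the saddle.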

Second-order condition ($u\in C^2$, sufficient)

If for all $\mathbf{x}\in\Omega$ with $\nabla u(\mathbf{x})\neq 0$ the Hessian $\nabla^2 u(\mathbf{x})$ is strictly negative definite on the tangent space $T_\mathbf{x}=\{\mathbf{d}\mid \nabla u(\mathbf{x})^{\top}\mathbf{d}=0\}$, i.e. $\mathbf{d}^{\top}\nabla^2 u(\mathbf{x})\,\mathbf{d}<0$ for every non-zero $\mathbf{d}\in T_\mathbf{x}$, then $u$ is quasi-concave.

For 2D images this has the compact convolutional form:

$$ Q_2(\mathbf{x}) \;=\; u_x^2\,u_{yy} \;-\; 2\,u_x u_y\,u_{xy} \;+\; u_y^2\,u_{xx} \;<\;0, $$

a quadratic form in the image gradient that can be evaluated densely as a tiny fixed-kernel convolution — no thresholding required.
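As a quick sign check, take the radially symmetric bump $u(x,y)=e^{-(x^2+y^2)}$, an example chosen here for illustration rather than taken from the paper. Its super-level sets are disks, so $Q_2$ should be negative away from the center. With

$$ u_x=-2xu,\quad u_y=-2yu,\quad u_{xx}=(4x^2-2)u,\quad u_{yy}=(4y^2-2)u,\quad u_{xy}=4xyu, $$

the three terms combine to

$$ Q_2 \;=\; u^3\big[\,4x^2(4y^2-2)\;-\;32x^2y^2\;+\;4y^2(4x^2-2)\,\big] \;=\; -8\,(x^2+y^2)\,u^3 \;<\;0 \quad\text{for } (x,y)\neq(0,0). $$

Writing $r^2=x^2+y^2$ and $\|\nabla u\|=2ru$, the ratio $-Q_2/\|\nabla u\|^3 = 8r^2u^3/(8r^3u^3) = 1/r$ is the curvature of a circle of radius $r$, as expected for the level sets of this bump.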

A unifying lens

Following Section 3.6 of the paper, D-Convexity recovers many existing convex priors as special cases, with each prior mapped to one of our zero-, first-, or second-order quasi-concavity conditions. The mapping below uses the exact references from the CVPR 2026 paper (arXiv:2605.19210v1):

Zero-order line-segment prior. Han, Kwon, Kim & Cho, Noise-Robust Pupil Center Detection with Shape-Prior Loss, IEEE Access 2020 require that for every $\mathbf{x},\mathbf{y}$ in the segmentation object, the line segment between them also lies inside it — this is exactly our zero-order condition (Theorem 1) applied over the image domain. Our formulation is more general because it applies to the continuous mask $u$ rather than a single thresholded region.
Half-disk / binary convexity characterization. The indicator-mask condition $(u-1)(b_r\ast(2u-1))\geq 0$ proposed in Liu, Tai & Luo, Convex Shape Prior for Deep Neural Convolution Network based Eye Fundus Images Segmentation, 2020, Luo, Tai & Wang, A New Binary Representation Method for Shape Convexity, Analysis & Applications 2022, and Luo, Chen, Xiao & Tai, A Binary Characterization Method for Shape Convexity, Applied Mathematical Modelling 2023 follows directly from our first-order supporting-hyperplane condition (Theorem 2): at a background pixel $\mathbf{y}$, Lemma 1 forces the foreground into the half-space $\nabla u(\mathbf{y})^{\top}(\mathbf{x}-\mathbf{y})\geq 0$, which intersected with a radius-$r$ disk gives $|B_r(\mathbf{y})\cap S|\leq \tfrac{1}{2}|B_r(\mathbf{y})|$.
Curvature priors $\kappa\geq 0$. Ukwatta et al., Efficient Convex Optimization-Based Curvature Dependent Contour Evolution, SPIE 2013 and Yang et al., A Level Set Method for Convexity Preserving Segmentation of Cardiac Left Ventricle, ICIP 2017 constrain non-negative curvature of level-set boundaries — corresponding to $Q_2(\mathbf{x})\leq 0$, the necessary but not sufficient weakening of our second-order condition $Q_2(\mathbf{x})<0$.
Signed-distance Laplacian priors $\|\nabla\phi\|=1$ with $\Delta\phi\geq 0$. Luo, Tai, Huo, Wang & Glowinski, Convex Shape Prior for Multi-Object Segmentation, ICCV 2019 and Yan, Tai, Liu & Huang, Convexity Shape Prior for Level Set-Based Image Segmentation, IEEE TIP 2020 impose non-negativity of the signed-distance Laplacian. With $\phi=-u$, the curvature identity $\kappa=-Q_2/\|\nabla u\|^3$ shows $\kappa\geq 0 \Leftrightarrow Q_2\leq 0$; D-Convexity’s strict $Q_2<0$ upgrades this into a sufficient convexity condition while remaining fully differentiable.

Related discrete convexity priors (discussed in Section 2 of the paper, and subsumed at the pixel-graph scale by our zero-order view) include 1–0–1 collinear-triple penalties (Gorelick, Veksler, Boykov & Nieuwenhuis, ECCV 2014 / TPAMI 2017), multicut / ILP convexity constraints (Royer, Richmond, Rother, Andres & Kainmüller, CVPR 2016), and relaxed star-type families (Veksler, ECCV 2008; Gulshan et al., CVPR 2010; Isack, Veksler, Sonka & Boykov, CVPR 2016).

So a single quasi-concavity principle subsumes discrete, half-disk, level-set, and curvature-based shape priors in one continuous, differentiable framework, with each prior corresponding to the smoothness order ($C^0$ / $C^1$ / $C^2$) at which it operates.

Loss Functions and CGPM

The first- and second-order conditions become local convolutional losses, evaluated densely over the image without any thresholding:

First-order loss ($\mathcal{L}_{\text{1st}}$): penalize the positive part of the asymmetric pair inequality $\mathrm{ReLU}\big(\nabla u(\mathbf{y})^{\top}(\mathbf{y}-\mathbf{x})\big)$ over a small $r$-radius neighborhood $\mathbf{x}\in N_{\mathbf{y}}$.
Second-order loss ($\mathcal{L}_{\text{2nd}}$): penalize the positive part of $Q_2(\mathbf{x})+\delta$ weighted by $\|\nabla u(\mathbf{x})\|$:

$$ \mathcal{L}_{\text{2nd}}(u) \;=\; \frac{1}{|\Omega|}\sum_{\mathbf{x}\in\Omega} \|\nabla u(\mathbf{x})\|\cdot \mathrm{ReLU}\big(Q_2(\mathbf{x})+\delta\big). $$

The first-order loss costs $\mathcal{O}(r^2|\Omega|)$ and the second-order loss $\mathcal{O}(|\Omega|)$; both are GPU-parallel and have explicit closed-form gradients (see Appendix E of the paper).
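The second-order loss can be sketched on a sampled grid with 3×3 central-difference stencils. This is an illustrative pure-Python version, not the code in loss.py: the stencil choice, the averaging over interior pixels only, and the names `make_grid` and `second_order_loss` are assumptions.

```python
import math


def make_grid(u, n=41, lo=-2.0, hi=2.0):
    """Sample u(x, y) on an n-by-n grid; returns (values[row][col], spacing)."""
    h = (hi - lo) / (n - 1)
    return [[u(lo + j * h, lo + i * h) for j in range(n)] for i in range(n)], h


def second_order_loss(grid, h, delta=0.0):
    """Mean over interior pixels of |grad u| * ReLU(Q_2 + delta)."""
    n_rows, n_cols = len(grid), len(grid[0])
    total, count = 0.0, 0
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            # Fixed 3x3 central-difference stencils for the five derivatives.
            ux = (grid[i][j + 1] - grid[i][j - 1]) / (2 * h)
            uy = (grid[i + 1][j] - grid[i - 1][j]) / (2 * h)
            uxx = (grid[i][j + 1] - 2 * grid[i][j] + grid[i][j - 1]) / (h * h)
            uyy = (grid[i + 1][j] - 2 * grid[i][j] + grid[i - 1][j]) / (h * h)
            uxy = (grid[i + 1][j + 1] - grid[i + 1][j - 1]
                   - grid[i - 1][j + 1] + grid[i - 1][j - 1]) / (4 * h * h)
            q2 = ux * ux * uyy - 2 * ux * uy * uxy + uy * uy * uxx
            total += math.hypot(ux, uy) * max(q2 + delta, 0.0)
            count += 1
    return total / count


def one_bump(x, y):
    return math.exp(-(x * x + y * y))


def two_bumps(x, y):
    return math.exp(-((x - 1) ** 2 + y * y)) + math.exp(-((x + 1) ** 2 + y * y))


bump_grid, h = make_grid(one_bump)
pair_grid, _ = make_grid(two_bumps)
loss_one = second_order_loss(bump_grid, h)   # convex level sets: no penalty
loss_two = second_order_loss(pair_grid, h)   # saddle between the bumps is penalized
```

With $\delta=0$ the single bump incurs no penalty, while the saddle region between the two bumps has $Q_2>0$ and contributes a positive loss.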

Convex Gradient Projection Module (CGPM)

At inference time, the loss alone may not strictly enforce convexity. The CGPM solves a small proximal optimization on the network logits:

$$ u_p \in \arg\min_{v\in[0,1]} \tfrac{1}{2}\|v-u\|^2 + \lambda\cdot \mathcal{L}_{\text{convex}}(v), $$

with $\mathcal{L}_{\text{convex}}\in\{\mathcal{L}_{\text{1st}},\mathcal{L}_{\text{2nd}}\}$. Implemented as an unrolled gradient-descent module on the logit space, CGPM is a drop-in projection layer compatible with any segmentation backbone (U-Net, nnU-Net, TransUNet, etc.):

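```python
from CGPM import SegModelWithCGPM

# UNet2D, device, ckpt, and images are placeholders supplied by the caller.
model = UNet2D().to(device)
model.load_state_dict(ckpt)
model.eval()

# Wrap the trained backbone; with backprop_to_backbone=False no gradients flow back into it.
SegCGPM = SegModelWithCGPM(model, backprop_to_backbone=False)
cgpm_output = SegCGPM(images)
```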

CGPM can be used in train mode (back-propagated into the backbone) or as a post-hoc projection (frozen backbone, projection only).
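The unrolled structure behind the proximal problem can be sketched as a fixed number of clipped gradient steps. This is a toy illustration under stated assumptions, not the code in CGPM.py: it works on a flat list in mask space rather than on logits, the step size and step count are arbitrary, and the stand-in penalty $\tfrac{1}{2}\|v-c\|^2$ is used only because its proximal point has a closed form to compare against.

```python
def unrolled_projection(u, grad_penalty, lam=1.0, step=0.25, n_steps=50):
    """Unrolled gradient descent on 0.5*||v - u||^2 + lam * L(v), clipped to [0, 1].

    grad_penalty(v) must return dL/dv as a list. In CGPM this role is played
    by the gradient of the quasi-concavity loss.
    """
    v = list(u)  # v^0 = u
    for _ in range(n_steps):
        g = grad_penalty(v)
        v = [vi - step * ((vi - ui) + lam * gi) for vi, ui, gi in zip(v, u, g)]
        v = [min(1.0, max(0.0, vi)) for vi in v]  # keep the mask in [0, 1]
    return v


# Stand-in penalty L(v) = 0.5 * ||v - c||^2 (NOT the convexity loss); its
# proximal point is v* = (u + lam * c) / (1 + lam).
c = [0.5, 0.5, 0.5]


def grad_toy(v):
    return [vi - ci for vi, ci in zip(v, c)]


u = [0.1, 0.9, 0.4]
v_final = unrolled_projection(u, grad_toy, lam=1.0)
closed_form = [(ui + 1.0 * ci) / 2.0 for ui, ci in zip(u, c)]
```

Because every step is a differentiable operation (apart from the clip boundaries), the same loop can be back-propagated through when it is written in an autodiff framework.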

Experimental Results

We evaluate D-Convexity on four segmentation benchmarks spanning cardiac MRI (ACDC), iris segmentation (CASIA), and retinal optic-disc/cup segmentation (REFUGE, RIM-ONE-r3). To assess out-of-distribution generalization, models trained on REFUGE are evaluated directly on RIM-ONE-r3 without fine-tuning. Reported metrics are Dice ↑, IoU ↑, and Hausdorff Distance HD ↓.

Qualitative comparison

Figure 3: Qualitative segmentation comparison. Rows: cardiac MRI (ACDC), iris (CASIA), and retinal optic-disc/cup (REFUGE & RIM-ONE-r3). Columns: (a) input, (b) ground truth, (c)–(h) six baselines, (i) Proposed (D-Convexity). Color code: white = true positive, black = true negative, red = false positive, green = false negative, blue = predicted boundary. Baselines tend to produce fragmented holes (green) and spurious lobes (red); D-Convexity yields clean, simply-connected, convex regions that tightly track the ground-truth boundary.

Quantitative results

**Table 1.** Performance of baseline and shape-aware methods on the ACDC, CASIA, REFUGE, and RIM-ONE-r3 datasets. Models trained on REFUGE are evaluated *directly* on RIM-ONE-r3 to assess cross-dataset generalization. Best values per column are in bold; our method is the *Proposed* row.

| Method | ACDC Dice ↑ | ACDC IoU ↑ | ACDC HD ↓ | CASIA Dice ↑ | CASIA IoU ↑ | CASIA HD ↓ | REFUGE Dice ↑ | REFUGE IoU ↑ | REFUGE HD ↓ | RIM-ONE-r3 Dice ↑ | RIM-ONE-r3 IoU ↑ | RIM-ONE-r3 HD ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U-Net [28] | 89.52 | 81.02 | 28.04 | 94.65 | 89.84 | 2.549 | 84.66 | 73.71 | 11.07 | 76.48 | 61.92 | 20.57 |
| Swin-Unet [3] | 95.42 | 91.23 | 4.965 | 94.76 | 90.05 | 2.399 | 84.00 | 72.42 | 7.863 | 81.00 | 68.07 | 15.32 |
| DCAN [4] | 93.38 | 87.59 | 6.946 | 94.90 | 90.29 | 2.413 | 80.66 | 67.59 | 9.379 | 76.23 | 61.59 | 16.53 |
| DMTN [31] | 92.60 | 86.22 | 8.500 | 94.92 | 90.34 | 2.337 | 82.36 | 70.01 | 9.337 | 78.39 | 64.46 | 16.80 |
| ConvMCD [25] | 93.44 | 87.68 | 15.53 | **95.03** | **90.54** | 2.323 | 78.38 | 64.45 | 12.51 | 76.71 | 62.22 | 18.18 |
| Active Boundary [35] | 90.93 | 81.38 | 24.71 | 94.49 | 89.55 | 2.656 | 84.82 | 73.63 | 10.59 | 75.37 | 60.48 | 20.64 |
| Proposed (D-Convexity) | **95.46** | **91.31** | **4.702** | 94.71 | 89.94 | **2.288** | **88.61** | **79.54** | **5.859** | **83.09** | **71.08** | **12.59** |

Takeaways.

Best overall on 3 of 4 datasets. D-Convexity is the top performer on ACDC, REFUGE, and RIM-ONE-r3 across all three metrics, and has the best Hausdorff Distance on CASIA. Dice/IoU on CASIA are essentially saturated: all methods fall within roughly 0.5 Dice points and 1 IoU point of each other.
Largest gains on hard, shape-driven tasks. On REFUGE, D-Convexity improves Dice from 84.82 → 88.61 (+3.79) over Active Boundary and reduces HD from 7.863 → 5.859 (−2.0) over Swin-Unet, the strongest baseline on each metric. Gains on the ACDC cardiac task are smaller (+0.04 Dice and −0.26 HD over Swin-Unet).
Strong out-of-distribution generalization. When the REFUGE-trained model is applied directly to RIM-ONE-r3 (different acquisition device and population), D-Convexity still wins by +2.1 Dice and −2.7 HD over Swin-Unet — evidence that the convex shape prior acts as a robust, task-agnostic regularizer rather than overfitting to a particular dataset.
Drop-in improvement. All gains are obtained with the same backbone segmentation network as the baselines, with CGPM as a plug-in module — no architectural changes are required.
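The deltas quoted in these takeaways can be recomputed from the Dice and HD columns of Table 1. The snippet below is only a bookkeeping aid; the dictionary layout and the helper name `gains` are introduced here for illustration.

```python
# (Dice, HD) per baseline, copied from Table 1.
baselines = {
    "ACDC": {
        "U-Net": (89.52, 28.04), "Swin-Unet": (95.42, 4.965),
        "DCAN": (93.38, 6.946), "DMTN": (92.60, 8.500),
        "ConvMCD": (93.44, 15.53), "Active Boundary": (90.93, 24.71),
    },
    "REFUGE": {
        "U-Net": (84.66, 11.07), "Swin-Unet": (84.00, 7.863),
        "DCAN": (80.66, 9.379), "DMTN": (82.36, 9.337),
        "ConvMCD": (78.38, 12.51), "Active Boundary": (84.82, 10.59),
    },
    "RIM-ONE-r3": {
        "U-Net": (76.48, 20.57), "Swin-Unet": (81.00, 15.32),
        "DCAN": (76.23, 16.53), "DMTN": (78.39, 16.80),
        "ConvMCD": (76.71, 18.18), "Active Boundary": (75.37, 20.64),
    },
}
# (Dice, HD) for the Proposed (D-Convexity) row.
proposed = {
    "ACDC": (95.46, 4.702),
    "REFUGE": (88.61, 5.859),
    "RIM-ONE-r3": (83.09, 12.59),
}


def gains(dataset):
    """Return (Dice gain, HD reduction) against the best baseline per metric."""
    best_dice = max(dice for dice, _ in baselines[dataset].values())
    best_hd = min(hd for _, hd in baselines[dataset].values())
    dice, hd = proposed[dataset]
    return round(dice - best_dice, 2), round(best_hd - hd, 2)
```

Note that the best baseline is chosen separately for each metric, so the Dice and HD references can be different methods on the same dataset.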

Key Contributions

Quasi-concavity as a unified convex prior. We formalize convexity of all super-level sets as quasi-concavity of the network output $u$, yielding a threshold-free, differentiable, image-domain constraint.
Multi-order characterizations. Zero-, first-, and second-order conditions for $u\in C^0,C^1,C^2$, corresponding to different mask smoothness regimes.
Compact convolutional losses. The first- and second-order conditions reduce to tiny fixed-kernel convolutions, allowing dense evaluation across the image at $\mathcal{O}(r^2|\Omega|)$ and $\mathcal{O}(|\Omega|)$ cost, respectively.
Convex Gradient Projection Module (CGPM). A plug-and-play unrolled-optimization module that strictly enforces convexity at inference time.
Theoretical unification. Discrete 1–0–1 priors, half-disk convolution priors, and curvature / signed-distance Laplacian priors are all recovered as special cases or necessary weakenings of our framework.
Empirical gains. Consistent convexity and shape-regularity improvements across the four evaluated benchmarks (retinal fundus, cardiac MRI, iris), outperforming task-specific networks and prior shape-aware methods.

Quick Start

The reference implementation is available on GitHub: ShengzheC/D-Convexity.

For intuition on the convexification algorithm and the zero-order dynamics, start with the notebook:

Convexification_Algorithm.ipynb

The CGPM segmentation framework lives in CGPM.py, and the first- and second-order losses in loss.py.

Resources

Paper (arXiv): arXiv:2605.19210
Code: github.com/ShengzheC/D-Convexity
CVPR 2026 virtual poster: cvpr.thecvf.com/virtual/2026/poster/39174
Venue: CVPR 2026 (Highlight, top 3%)

BibTeX

@inproceedings{chen2026dconvexity,
 title = {D-Convexity: A Unified Differentiable Convex Shape Prior via Quasi-Concavity for Data-driven Image Segmentation},
 author = {Chen, Shengzhe and Yan, Hao},
 booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
 year = {2026},
 note = {Accepted as Highlight (top 3\%)},
 eprint = {2605.19210},
 archivePrefix = {arXiv},
 primaryClass = {cs.CV},
 url = {https://arxiv.org/abs/2605.19210v1}
}

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

Thu, 01 Jan 2026 00:00:00 +0000

Overview

Path-Coupled Bellman Flows (PCBF) is a continuous-time distributional reinforcement learning method that learns return distributions with flow matching using source-consistent Bellman-coupled paths: the current path starts from the required base prior at $t{=}0$, reaches the Bellman target at $t{=}1$, and maintains a pathwise affine relation to the successor flow at intermediate times. PCBF couples current and successor return flows through shared base noise and uses a $\lambda$-parameterized control variate that trades controlled bias for variance reduction in critic training.

Accepted at ICML 2026 as a regular-track presentation.

Figure 1: Path-coupled Bellman geometry. Each panel shows a single current (blue) and successor (orange) return flow. (a) Uncoupled: independent source noise, so the flows are unrelated except in distribution. (b) Source-inconsistent: the successor starts from $R+\gamma X_0$, violating the base prior at $t{=}0$. (c) PCBF: shared noise drives both flows, preserving the base prior at $t{=}0$ and the Bellman endpoint at $t{=}1$.

Animated Demo

The animation below visualizes learned return transport on the Discrete Monte Carlo toy environment: particles flow from a Gaussian source at $t{=}0$ to the learned return distribution at $t{=}1$ along PCBF Bellman-coupled trajectories.

Learned PCBF return transport on the Discrete Monte Carlo environment. Individual particles (colored trajectories) are transported from the base noise distribution at $t{=}0$ to state-dependent return outcomes at $t{=}1$.

Motivation

Distributional reinforcement learning (DRL) models the full distribution of returns rather than only their expectation, enabling richer uncertainty representations and often better empirical performance. Most practical DRL algorithms, however, rely on finite-dimensional approximations — categorical projections or quantile assignments — that introduce bias when the Bellman update does not align with fixed support points.

Reframing DRL as continuous probability transport makes flow matching a natural framework: the distributional Bellman equation defines an affine transport relationship, and a neural velocity field can transport samples from a simple Gaussian prior to the return law without heuristic projections.

Directly enforcing an uncorrected pointwise Bellman map inside flow composition fails in two critical ways:

Source boundary mismatch. Flow matching requires generation to start from a fixed simple prior (e.g., $\mathcal{N}(0,1)$), but an uncorrected Bellman update $Z_t = R + \gamma Z'_t$ starts from $R + \gamma X_0 \neq X_0$.
High-variance bootstrapping. When current and successor noises are sampled independently, intermediate trajectories are not pathwise aligned; Bellman consistency can only be enforced at the endpoint, yielding unstable per-sample targets.

PCBF resolves both issues through source-consistent Bellman path correction and shared-noise path coupling, cleanly separating geometric flow requirements from Bellman bootstrapping variance.

Method: Path-Coupled Bellman Flows

Shared-noise Bellman paths

Given shared base noise $X_0 \sim \mathcal{N}(0,1)$ and a successor return sample $X' = \psi_{\theta^-}^{1}(X_0 \mid s', a')$ from the target flow map, PCBF defines time-synchronized linear interpolation paths:

$$ Z^{s'}_t = (1-t)X_0 + t X' \qquad\text{(successor path)}, $$
$$ Z^{s}_t = (1-t)X_0 + t\bigl(R + \gamma X'\bigr) \qquad\text{(current path)}. $$

An equivalent form that reveals the Bellman geometry is:

$$ Z^s_t = t R + \gamma Z^{s'}_t + (1-t)(1-\gamma)X_0. $$

The residual anchor $(1-t)(1-\gamma)X_0$ guarantees exact alignment at $t{=}0$ regardless of $\gamma$, while $Z^s_1 = R + \gamma X'$ satisfies the distributional Bellman boundary at $t{=}1$. Differentiating yields the unbiased BCFM target $\dot Z^s_t = R + \gamma X' - X_0$.
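The path definitions and their boundary behavior are easy to verify numerically on scalar samples. In the sketch below the successor sample is just a number passed in by the caller rather than the output of a target flow map, and all function names are illustrative.

```python
def successor_path(t, x0, x_succ):
    """Z^{s'}_t = (1 - t) X_0 + t X'."""
    return (1.0 - t) * x0 + t * x_succ


def current_path(t, x0, x_succ, reward, gamma):
    """Z^s_t = (1 - t) X_0 + t (R + gamma X')."""
    return (1.0 - t) * x0 + t * (reward + gamma * x_succ)


def current_path_bellman_form(t, x0, x_succ, reward, gamma):
    """Equivalent form: t R + gamma Z^{s'}_t + (1 - t)(1 - gamma) X_0."""
    return (t * reward
            + gamma * successor_path(t, x0, x_succ)
            + (1.0 - t) * (1.0 - gamma) * x0)


def current_velocity(x0, x_succ, reward, gamma):
    """Time derivative of the current path: R + gamma X' - X_0."""
    return reward + gamma * x_succ - x0


# Example values (arbitrary): shared noise, successor sample, reward, discount.
x0, x_succ, reward, gamma = -0.7, 2.3, 1.0, 0.99
start = current_path(0.0, x0, x_succ, reward, gamma)   # equals x0
end = current_path(1.0, x0, x_succ, reward, gamma)     # equals reward + gamma * x_succ
```

Since the current path is linear in $t$, a finite difference of `current_path` over any interval reproduces `current_velocity` up to rounding.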

Lambda-parameterized control variates

To reduce variance from the noisy successor sample $X'$, PCBF forms the training target $u_t^\lambda$ from two pieces:

Sample Bellman velocity (baseline): $Y = R + \gamma X' - X_0$. This is unbiased but can have high variance because it depends directly on the bootstrapped successor return $X'$.
Control-variate correction: $\lambda \cdot \bigl( v_{\theta^-}(t, Z^{s'}_t \mid s', a') - (X' - X_0) \bigr)$, where $v_{\theta^-}$ is the lagged target velocity field along the successor path $Z^{s'}_t$.

Putting them together,

$u_t^\lambda = Y + \lambda \bigl( v_{\theta^-}(t, Z^{s'}_t \mid s', a') - (X' - X_0) \bigr)$.

Setting $\lambda = 0$ recovers the unbiased sample Bellman target. Values $\lambda > 0$ introduce a variance-reducing correction using successor-flow velocity predictions. With shared-noise coupling, the induced bias stays small: in a linear–Gaussian model, shared noise ($\rho = 1$) gives bias on the order of $(1-\gamma)(1-t)$, which vanishes when $\gamma \approx 1$ and at the flow endpoints $t \in \{0, 1\}$.
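A minimal scalar sketch of the $\lambda$-target, assuming the lagged velocity prediction along the successor path is supplied as a plain number `v_succ`. The function names are hypothetical and no network is involved.

```python
def bellman_velocity(reward, gamma, x0, x_succ):
    """Sample Bellman velocity Y = R + gamma X' - X_0."""
    return reward + gamma * x_succ - x0


def lambda_target(reward, gamma, x0, x_succ, v_succ, lam):
    """u_t^lambda = Y + lam * (v_succ - (X' - X_0)).

    v_succ stands for the lagged target velocity evaluated at (t, Z^{s'}_t).
    """
    y = bellman_velocity(reward, gamma, x0, x_succ)
    correction = lam * (v_succ - (x_succ - x0))
    return y + correction


# Example values (arbitrary).
reward, gamma, x0, x_succ = 1.0, 0.99, -0.7, 2.3
y = bellman_velocity(reward, gamma, x0, x_succ)
no_correction = lambda_target(reward, gamma, x0, x_succ, v_succ=0.4, lam=0.0)
exact_velocity = lambda_target(reward, gamma, x0, x_succ, v_succ=x_succ - x0, lam=0.7)
```

Two limiting cases make the structure visible: with `lam=0` the target is exactly $Y$, and when the velocity prediction happens to equal the per-sample velocity $X'-X_0$ the correction vanishes for any $\lambda$.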

Policy extraction for offline RL

At deployment, a behavior-cloned proposal policy samples $K{=}16$ candidate actions; each is scored by the mean terminal return under the learned flow $\hat Q_\theta(s,a) = \frac{1}{M}\sum_m \psi_\theta^{1}(X_{0,m}\mid s,a)$, and the highest-scoring action is executed.
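The best-of-$K$ selection rule can be sketched with a generic velocity function and Euler integration. Everything concrete here is an assumption for illustration: the toy velocity field, the four candidate actions (the paper samples $K{=}16$ from a behavior-cloned policy), and the choice to reuse the same noise samples for every candidate.

```python
import random


def euler_flow(velocity, x0, state, action, n_steps=10):
    """Integrate dz/dt = velocity(t, z, state, action) from t=0 to t=1."""
    z, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        z = z + dt * velocity(k * dt, z, state, action)
    return z


def select_action(velocity, state, candidates, n_noise=32, seed=0):
    """Score each candidate by the mean terminal return and pick the best."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]  # reused for all candidates

    def q_hat(action):
        returns = [euler_flow(velocity, x0, state, action) for x0 in noise]
        return sum(returns) / len(returns)

    return max(candidates, key=q_hat)


def toy_velocity(t, z, state, action):
    """Hypothetical field: shifts every sample by -(action - 0.3)^2 over unit time."""
    return -(action - 0.3) ** 2


best = select_action(toy_velocity, state=None, candidates=[0.0, 0.25, 0.5, 1.0])
```

In this toy setup the estimated value is the noise mean minus $(a-0.3)^2$, so the candidate closest to 0.3 is selected.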

Toy Environments: Distributional Fidelity

We validate PCBF on three analytically tractable environments with known return laws: Solitaire Dice (heavy-tailed discrete returns), Bernoulli MRP (uniform return on $[0,2]$), and Discrete Monte Carlo Chain (multimodal finite-horizon returns).

Figure 2: Learned PCBF maps on toy environments. Solitaire (top left), Bernoulli (top right), Discrete MC (bottom). PCBF recovers heavy-tailed, uniform, and multimodal return structures and closely matches ground-truth histograms.

Figure 3: Distributional accuracy on toy environments. Learned return CDFs for PCBF and Value Flows (dcfm $\in \{0, 0.5, 1\}$) versus ground-truth references. PCBF consistently tracks the reference CDFs; Value Flows degrades as dcfm increases, systematically underestimating return variance.

Figure 4: Hyperparameter sensitivity (PCBF vs. Value Flows). On Solitaire and Discrete MC, increasing Value Flows’ dcfm coefficient degrades Wasserstein error, while PCBF’s $\lambda$-target remains robust across a wide range of values.

Figure 5: Variance reduction via $\lambda$-parameterized control variates. Larger $\lambda$ yields smoother Bellman velocity regression loss trajectories (lower within-run standard deviation), validating the control-variate mechanism.

Pathwise Bellman Residual and Discretization

PCBF enforces the Bellman endpoint at $t{=}1$ by construction, but training uses a finite-step Euler solver (10 NFE). Shared-noise coupling yields smaller corrected Bellman residuals $r_{\mathrm{corr}}(t,N)$ than independent-noise ablations across solver budgets $N \in \{4,8,16,32\}$:

Figure 6: Corrected Bellman residual $r_{\mathrm{corr}}(t,N)$ on Solitaire Dice. Shared-noise PCBF (blue) maintains lower residuals than independent-noise coupling (orange) across flow times and Euler budgets.

Offline RL Benchmarks

We evaluate PCBF on 38 offline RL tasks: 30 OGBench single-task variants (four state-based manipulation domains and two pixel-based domains) plus eight D4RL Adroit tasks. Baselines include distributional methods (IQN, CODAC, Value Flows), flow-based scalar critics (FloQ, FQL), and IQL.

Figure 7: OGBench tasks. State-based cube, scene, and puzzle manipulation environments and pixel-based visual variants used in our evaluation.

Aggregated results

**Table 1.** Offline RL results on OGBench and D4RL Adroit. Success rates (%) for OGBench domains (5 tasks each) and normalized scores for D4RL. Results averaged over 8 seeds (4 for pixel tasks). Bold values are within 95% of the best method on each domain; *PCBF (Ours)* is highlighted.

| Domain | IQN | CODAC | FloQ | FQL | IQL | Value Flows | PCBF (Ours) |
|---|---|---|---|---|---|---|---|
| cube-double-play (5 tasks) | 42 ± 8 | 61 ± 6 | 47 ± 14 | 29 ± 6 | 7 ± 1 | 69 ± 4 | 71 ± 5 |
| scene-play (5 tasks) | 40 ± 1 | 55 ± 1 | 58 ± 4 | 56 ± 2 | 28 ± 3 | 59 ± 4 | 54 ± 4 |
| puzzle-4×4-play (5 tasks) | 27 ± 4 | 20 ± 18 | 28 ± 6 | 17 ± 5 | 7 ± 2 | 27 ± 4 | 30 ± 4 |
| cube-triple-play (5 tasks) | 6 ± 0 | 2 ± 1 | 8 ± 3 | 4 ± 2 | 1 ± 1 | 14 ± 3 | 4 ± 1 |
| D4RL adroit (8 tasks) | 66 ± 5 | 69 ± 0 | 70 ± 5 | 71 ± 4 | 70 | 65 ± 2 | 69 ± 2 |
| visual-antmaze-teleport (5 tasks) | 4 ± 2 | n/a | n/a | 5 ± 2 | 6 ± 4 | 13 ± 4 | 14 ± 4 |
| visual-cube-double-play (5 tasks) | 1 ± 0 | n/a | n/a | 6 ± 1 | 11 ± 6 | 13 ± 2 | 3 ± 0 |

Takeaways.

Selective but strong gains. PCBF achieves best or near-best aggregate performance on cube-double-play, puzzle-4×4-play, D4RL Adroit, and visual-antmaze-teleport, where critic-side return-law fidelity and variance-controlled bootstrapping affect action ranking.
Best distributional fidelity on toys. On analytically tractable MRPs, PCBF closely tracks ground-truth CDFs and remains robust to $\lambda$, while Value Flows degrades as the DCFM consistency weight increases.
Honest limitations. On cube-triple-play and visual-cube-double-play, PCBF underperforms Value Flows — long-horizon sparse-reward and pixel-based settings remain challenging when policy extraction, visual encoders, or $\lambda$ selection become bottlenecks.
Similar cost to Value Flows. PCBF uses ~60 GB GPU memory and ~2.5× wall-clock versus scalar critics on OGBench (single A100, $10^6$ steps); training requires 10-step Euler integration of the velocity field.

Key Contributions

Source-consistent Bellman-interpolated paths that resolve the $t{=}0$ boundary mismatch of uncorrected pointwise Bellman paths while preserving the Bellman endpoint at $t{=}1$.
Shared-noise path coupling that aligns current and successor return flows pathwise, inducing a geometric Bellman relation between velocity fields.
$\lambda$-parameterized control-variate target with a distribution-free $L_2$ bias bound and a linear–Gaussian closed form explaining why shared-noise coupling shrinks intrinsic bias.
Population velocity identification, shared-noise Bellman contraction, and Euler integration sensitivity analysis supporting stable flow-based distributional critics.
Comprehensive evaluation on Solitaire Dice, Bernoulli, and Discrete MC toy MRPs plus 38 OGBench and D4RL offline RL tasks.

Quick Start

The reference implementation is available on GitHub: BoyangASU/path-coupled-bellman-flows.

PCBF is implemented in JAX, adapted from the FQL codebase. Key hyperparameters: 10 Euler integration steps, batch size 256, learning rate $3\times10^{-4}$, and domain-tuned $\lambda$ (per-domain values are listed in the paper's tables). State-based tasks train for 1M gradient steps; pixel-based tasks for 500K steps.

Resources

Paper (arXiv): arXiv:2605.08253
Code: github.com/BoyangASU/path-coupled-bellman-flows
Venue: ICML 2026 (regular track)

BibTeX

@inproceedings{xu2026pathcoupled,
 title = {Path-Coupled Bellman Flows for Distributional Reinforcement Learning},
 author = {Xu, Boyang and Zou, Qing and Yang, Siqin and Yan, Hao},
 booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
 year = {2026},
 note = {Regular track},
 eprint = {2605.08253},
 archivePrefix = {arXiv},
 primaryClass = {cs.LG},
 url = {https://arxiv.org/abs/2605.08253}
}

Multi-modal Generative Modeling of Event Sequences and Time Series for Solar PV Systems

Wed, 01 Jan 2025 00:00:00 +0000

Probabilistic Kolmogorov-Arnold Networks via sparsified deep Gaussian processes with additive kernels

Wed, 01 Jan 2025 00:00:00 +0000

Graph-aware Tensor Topic Models for Individualized Passenger Travel Pattern Clustering

Sun, 01 Jan 2023 00:00:00 +0000

Tensor dirichlet process multinomial mixture model with graphs for passenger trajectory clustering

Sun, 01 Jan 2023 00:00:00 +0000

Attention-based Representation Learning for Time Series with Principal and Residual Space Monitoring

Sat, 01 Jan 2022 00:00:00 +0000

Event Extraction for aviation accident reports through attention-based multi-label classification

Sat, 01 Jan 2022 00:00:00 +0000

Combining Anatomical Constraints and Deep Learning for 3-D CBCT Dental Image Multi-Label Segmentation

Mon, 19 Apr 2021 00:00:00 +0000

Edge Computing Accelerated Defect Classification Based on Deep Convolutional Neural Network With Application in Rolling Image Inspection

Fri, 01 Jan 2021 00:00:00 +0000

Hierarchical Tree-Based Sequential Event Prediction with Application in the Aviation Accident Report

Fri, 01 Jan 2021 00:00:00 +0000

Tensor Completion for Weakly-Dependent Data on Graph for Metro Passenger Flow Prediction

Tue, 01 Dec 2020 00:00:00 +0000

Simultaneous material microstructure classification and discovery via hidden Markov modeling of acoustic emission signals

Wed, 01 Jan 2020 00:00:00 +0000

Image-Based Process Monitoring via Adversarial Autoencoder with Applications to Rolling Defect Detection

Thu, 01 Aug 2019 00:00:00 +0000

Physics-Based Deep Spatio-Temporal Metamodeling for Cardiac Electrical Conduction Simulation

Thu, 01 Aug 2019 00:00:00 +0000

Rapid Detection of Hot-Spot by Tensor Decomposition with Application to Weekly Gonorrhea Data

Tue, 01 Jan 2019 00:00:00 +0000

Semi-supervised constrained hidden Markov model using multiple sensors for remaining useful life prediction and optimal predictive maintenance— For remaining useful life prediction and optimal predictive maintenance

Tue, 01 Jan 2019 00:00:00 +0000

Real-time production performance analysis using machine degradation signals— A two-machine case

Mon, 01 Jan 2018 00:00:00 +0000

Point Cloud Data Analysis for Process Modeling and Optimization

Sun, 01 Jan 2017 00:00:00 +0000

Frequency Domain Instantaneous Wavenumber Estimation for Damage Quantification in Layered Plate Structures

Wed, 01 Jan 2014 00:00:00 +0000