publications
2026
- COLTAlmost linear convergence under minimal score assumptions: Quantized transition diffusionXunpeng Huang, Yingyu Lin, Nikki Lijing Kuang, Hanze Dong, Difan Zou, Yian Ma, and Tong ZhangAnnual Conference on Learning Theory (COLT) 2026
Continuous diffusion models have demonstrated remarkable generative performance across diverse domains but are often constrained by the computational cost of simulating reverse OrnsteināUhlenbeck processes via SDE/ODE solvers. Existing theoretical results typically establish query complexities that scale polynomially with both the dimension d and the error tolerance ε(e.g., \tilde\mathcalO(d/ε)). This mirrors the limitations of unadjusted Langevin algorithm, where standard first-order score solvers lack access to zeroth-order density information, precluding natural error-correction mechanisms and thus preventing the fast \ln(1/ε) convergence attainable by Metropolis-adjusted methods. In this paper, we develop an improved generative modeling method by introducing \ourmethod (Quantized Transition Diffusion), a framework that reformulates continuous diffusion into a discrete generation problem through spatial quantization and the parameterization of zeroth-order information (e.g., density ratios). To sample from this discrete target, we propose a truncated uniformization algorithm that simulates the underlying continuous-time Markov chain of the discrete diffusion process without discretization error, while eliminating the restrictive bounded-score assumption required by prior uniformization-based approaches. Consequently, \ourmethod attains ε-accuracy in total variation distance with a query complexity of \mathcalO(d \ln^2(d/ε)), yielding a notable improvement in ε-dependence compared to existing continuous diffusion samplers. Crucially, our analysis capitalizes on a novel proof technique based on the infinitesimal chain rule of KL divergence, providing a fresh perspective on unifying continuous and discrete diffusion paradigms.
@article{huang2026almost, abbr = {COLT}, bibtex_show = {true}, title = {Almost linear convergence under minimal score assumptions: Quantized transition diffusion}, author = {Huang, Xunpeng and Lin, Yingyu and Kuang, Nikki Lijing and Dong, Hanze and Zou, Difan and Ma, Yian and Zhang, Tong}, journal = {Annual Conference on Learning Theory (COLT)}, year = {2026}, html = {https://arxiv.org/pdf/2505.21892}, selected = {true} }
2025
- arXivOn the Complexity Theory of Masked Discrete Diffusion: From $\backslashmathrm {poly}(1/\backslashepsilon) to Nearly \backslashepsilon $-FreeXunpeng Huang, Yingyu Lin, Nishant Jain, Kaibo Wang, Difan Zou, Yian Ma, and Tong ZhangarXiv preprint arXiv:2509.21835 2025
We study \emphabsorbing discrete diffusion, a flexible paradigm for high-dimensional data generation where elements are progressively corrupted to an absorbing mask state before being denoised. Despite its empirical success, the theoretical complexity of this approach in high dimensions remains insufficiently understood. Existing analyses largely focus on \emphuniform discrete diffusion, and more recent attempts addressing absorbing diffusion either (1) overlook widely used Euler samplers, (2) impose restrictive bounded-score assumptions, or (3) fail to showcase the advantages of absorbing discrete diffusion over its uniform counterpart. To bridge this gap, we provide the first rigorous analysis of typical Euler samplers for absorbing discrete diffusion, establishing an ε-accuracy in total variation (TV) distance with \tildeO(d^2ε^-3/2) discrete score evaluations. Furthermore, we introduce \emphMask-Aware Truncated Uniformization (\ourmethod), a technique that eliminates bounded-score assumptions while preserving unbiased score approximations. By exploiting the structural property that tokens are unmasked at most once, \ourmethod achieves an ε-free complexity of O(d \ln d). This result strictly improves upon existing uniformization-based inference for uniform diffusion by removing the \ln(1/ε) factor, offering a substantial convergence speedup. Our findings not only provide a solid theoretical foundation for absorbing discrete diffusion, confirming its advantage over uniform diffusion for high-dimensional generation, but also pave the way for future analysis of diffusion-based language models under the masking paradigm.
@article{huang2025complexity, abbr = {arXiv}, bibtex_show = {true}, title = {On the Complexity Theory of Masked Discrete Diffusion: From $$\backslash$mathrm $\{$poly$\}$(1/$\backslash$epsilon) $ to Nearly $$\backslash$epsilon $-Free}, author = {Huang, Xunpeng and Lin, Yingyu and Jain, Nishant and Wang, Kaibo and Zou, Difan and Ma, Yian and Zhang, Tong}, journal = {arXiv preprint arXiv:2509.21835}, year = {2025}, html = {https://arxiv.org/abs/2509.21835} } - ICMLWhat makes in-context learning effective for mathematical reasoningJiayu Liu, Zhenya Huang, Chaokun Wang, Xunpeng Huang, ChengXiang Zhai, and Enhong ChenIn Forty-second International Conference on Machine Learning 2025
Owing to the capability of in-context learning, large language models (LLMs) have shown impressive performance across diverse mathematical reasoning benchmarks. However, we find that few-shot demonstrations can sometimes bring negative performance and their effectiveness on LLMsā reasoning abilities remains unreliable. To this end, in this paper, we aim to theoretically analyze the impact of in-context demonstrations on LLMsā reasoning performance. We prove that the reasoning efficacy (measured by empirical prediction loss) can be bounded by an \emphLLM-oriented semantic similarity and an \emphinference stability of demonstrations, which is general for both one-shot and few-shot scenarios. Based on this finding, we propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3. It facilitates to select the most pertinent samples for different LLMs and includes a novel demonstration rejection mechanism to automatically filter out samples that are unsuitable for few-shot learning. Through experiments on three representative benchmarks, two LLM backbones, and multiple few-shot settings, we verify that our LMS3 has superiority and achieves consistent improvements on all datasets, which existing methods have been unable to accomplish. Our code is available at \urlhttps://github.com/Ljyustc/LMS3.
@inproceedings{liu2025makes, abbr = {ICML}, bibtex_show = {true}, title = {What makes in-context learning effective for mathematical reasoning}, author = {Liu, Jiayu and Huang, Zhenya and Wang, Chaokun and Huang, Xunpeng and Zhai, ChengXiang and Chen, Enhong}, booktitle = {Forty-second International Conference on Machine Learning}, year = {2025}, html = {https://openreview.net/forum?id=upSPbQoViI} }
2024
- NeurIPSReverse Transition Kernel: A Flexible Framework to Accelerate Diffusion InferenceXunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yian Ma, and Tong ZhangIn Annual Conference on Neural Information Processing Systems (NeurIPS) 2024, Best paper of ICML Workshop on Structured Probabilistic Inference & Generative Modeling 2024
To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in \tildeO(1) subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop the convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve εtarget error within \tildeO(d^1/2ε^-1) under mild conditions, and RTK-MALA enjoys a O(d^2\log(d/ε)) convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
@inproceedings{huang2024reverse, abbr = {NeurIPS}, bibtex_show = {true}, title = {Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference}, author = {Huang, Xunpeng and Zou, Difan and Dong, Hanze and Zhang, Yi and Ma, Yian and Zhang, Tong}, booktitle = {Annual Conference on Neural Information Processing Systems (NeurIPS) 2024, Best paper of ICML Workshop on Structured Probabilistic Inference & Generative Modeling}, year = {2024}, html = {https://arxiv.org/abs/2401.06325}, selected = {true} } - COLTFaster Sampling without Isoperimetry via Diffusion-based Monte CarloXunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, and Tong ZhangIn Annual Conference on Learning Theory (COLT) 2024
To sample from a general target distribution pāāeāfā beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation. However, the original DMC algorithm encountered high gradient complexity, resulting in an exponential dependency on the error tolerance ϵ of the obtained samples. In this paper, we demonstrate that the high complexity of DMC originates from its redundant design of score estimation, and proposed a more efficient algorithm, called RS-DMC, based on a novel recursive score estimation method. In particular, we first divide the entire diffusion process into multiple segments and then formulate the score estimation step (at any time step) as a series of interconnected mean estimation and sampling subproblems accordingly, which are correlated in a recursive manner. Importantly, we show that with a proper design of the segment decomposition, all sampling subproblems will only need to tackle a strongly log-concave distribution, which can be very efficient to solve using the Langevin-based samplers with a provably rapid convergence rate. As a result, we prove that the gradient complexity of RS-DMC only has a quasi-polynomial dependency on ϵ, which significantly improves exponential gradient complexity in Huang et al. (2023). Furthermore, under commonly used dissipative conditions, our algorithm is provably much faster than the popular Langevin-based algorithms. Our algorithm design and theoretical framework illuminate a novel direction for addressing sampling problems, which could be of broader applicability in the community.
@inproceedings{huang2024faster, abbr = {COLT}, bibtex_show = {true}, title = {Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo}, author = {Huang, Xunpeng and Zou, Difan and Dong, Hanze and Ma, Yian and Zhang, Tong}, booktitle = {Annual Conference on Learning Theory (COLT)}, year = {2024}, html = {https://arxiv.org/abs/2401.06325}, selected = {true} } - ICMLFaster Sampling via Stochastic Gradient Proximal SamplerXunpeng Huang, Difan Zou, Yian Ma, Hanze Dong, and Tong ZhangIn International Conference on Machine Learning (ICML) 2024
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting  \citelee2021structured, has yet to be explored in its stochastic variants. In this paper, we study the Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and establish the convergence theory accordingly. We show that the convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on Stochastic gradient Langevin dynamics (SGLD) and Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve ε-sampling error in total variation (TV) distance within \tilde\mathcalO(dε^-2) and \tilde\mathcalO(d^1/2ε^-2) gradient complexities, which outperform the best-known result by at least an \tilde\mathcalO(d^1/3) factor. This enhancement in performance is corroborated by our empirical studies on synthetic data with various dimensions, demonstrating the efficiency of our proposed algorithm.
@inproceedings{huang2024sps, abbr = {ICML}, bibtex_show = {true}, title = {Faster Sampling via Stochastic Gradient Proximal Sampler}, author = {Huang, Xunpeng and Zou, Difan and Ma, Yian and Dong, Hanze and Zhang, Tong}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2024}, html = {https://proceedings.mlr.press/v235/huang24aj.html}, selected = {true} } - ICLRReverse Diffusion Monte CarloXunpeng Huang, Hanze Dong, Yifan HAO, Yian Ma, and Tong ZhangIn International Conference on Learning Representations (ICLR) 2024
The efficacy of modern generative models is commonly contingent upon the precision of score estimation along the diffusion path, with a focus on diffusion models and their ability to generate high-quality data samples. This study delves into the application of reverse diffusion to Monte Carlo sampling. It is shown that score estimation can be transformed into a mean estimation problem via the decomposition of the transition kernel. By estimating the mean of the posterior distribution, we derive a novel Monte Carlo sampling algorithm from the reverse diffusion process, which is distinct from traditional Markov Chain Monte Carlo (MCMC) methods. We calculate the error requirements and sample size for the posterior distribution, and use the result to derive an algorithm that can approximate the target distribution to any desired accuracy. Additionally, by estimating the log-Sobolev constant of the posterior distribution, we show under suitable conditions the problem of sampling from the posterior can be easier than direct sampling from the target distribution using traditional MCMC techniques. For Gaussian mixture models, we demonstrate that the new algorithm achieves significant improvement over the traditional Langevin-style MCMC sampling methods both theoretically and practically. Our algorithm offers a new perspective and solution beyond classical MCMC algorithms for challenging complex distributions.
@inproceedings{huang2024reversf, abbr = {ICLR}, bibtex_show = {true}, title = {Reverse Diffusion Monte Carlo}, author = {Huang, Xunpeng and Dong, Hanze and HAO, Yifan and Ma, Yian and Zhang, Tong}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2024}, html = {https://openreview.net/forum?id=kIPEyMSdFV}, selected = {true} }
2021
- AAAIACMo: Angle-Calibrated Moment Methods for Stochastic OptimizationXunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, and Lei LiIn Proceedings of the AAAI Conference on Artificial Intelligence 2021
Stochastic gradient descent (SGD) is a widely used method for its outstanding generalization ability and simplicity. Adaptive gradient methods have been proposed to further accelerate the optimization process. In this paper, we revisit existing adaptive gradient optimization methods with a new interpretation. Such new perspective leads to a refreshed understanding of the roles of second moments in stochastic optimization. Based on this, we propose Angle-Calibration Moment method (ACMo), a novel stochastic optimization method. It enjoys the benefits of second moments with only first moment updates. Theoretical analysis shows that ACMo is able to achieve the same convergence rate as mainstream adaptive methods. Experiments on a variety of CV and NLP tasks demonstrate that ACMo has a comparable convergence to state-of-the-art Adam-type optimizers, and even a better generalization performance in most cases. The code is available at https://github.com/Xunpeng746/ACMo.
@inproceedings{huang2021acmo, abbr = {AAAI}, bibtex_show = {true}, title = {ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization}, author = {Huang, Xunpeng and Xu, Runxin and Zhou, Hao and Wang, Zhe and Liu, Zhengyang and Li, Lei}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {35}, number = {9}, pages = {7857--7864}, year = {2021}, html = {https://ojs.aaai.org/index.php/AAAI/article/view/16959} }
2020
- AAAISpan: A stochastic projected approximate newton methodXunpeng Huang, Xianfeng Liang, Zhengyang Liu, Lei Li, Yue Yu, and Yitan LiIn Proceedings of the AAAI Conference on Artificial Intelligence 2020
Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation for the Hessian and its inverse. In this paper, we propose SPAN, a novel approximate and fast Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and the per-iteration efficiency.
@inproceedings{huang2020span, abbr = {AAAI}, bibtex_show = {true}, title = {Span: A stochastic projected approximate newton method}, author = {Huang, Xunpeng and Liang, Xianfeng and Liu, Zhengyang and Li, Lei and Yu, Yue and Li, Yitan}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {34}, number = {02}, pages = {1520--1527}, year = {2020}, html = {https://ojs.aaai.org/index.php/AAAI/article/view/5511}, selected = {false} }
2018
- Inf. SciFinding potential lenders in P2P lending: A hybrid random walk approachHefu Zhang, Hongke Zhao, Qi Liu, Tong Xu, Enhong Chen, and Xunpeng HuangInformation Sciences 2018
@article{zhang2018finding, abbr = {Inf. Sci}, bibtex_show = {true}, title = {Finding potential lenders in P2P lending: A hybrid random walk approach}, author = {Zhang, Hefu and Zhao, Hongke and Liu, Qi and Xu, Tong and Chen, Enhong and Huang, Xunpeng}, journal = {Information Sciences}, volume = {432}, pages = {376--391}, year = {2018}, publisher = {Elsevier} } - AAAIConfidence-aware matrix factorization for recommender systemsChao Wang, Qi Liu, Runze Wu, Enhong Chen, Chuanren Liu, Xunpeng Huang, and Zhenya HuangIn Proceedings of the AAAI Conference on Artificial Intelligence 2018
@inproceedings{wang2018confidence, abbr = {AAAI}, bibtex_show = {true}, title = {Confidence-aware matrix factorization for recommender systems}, author = {Wang, Chao and Liu, Qi and Wu, Runze and Chen, Enhong and Liu, Chuanren and Huang, Xunpeng and Huang, Zhenya}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {32}, number = {1}, year = {2018} } - DASFAAEnhancing network embedding with auxiliary information: An explicit matrix factorization perspectiveJunliang Guo, Linli Xu, Xunpeng Huang, and Enhong ChenIn International Conference on Database Systems for Advanced Applications 2018
@inproceedings{guo2018enhancing, abbr = {DASFAA}, bibtex_show = {true}, title = {Enhancing network embedding with auxiliary information: An explicit matrix factorization perspective}, author = {Guo, Junliang and Xu, Linli and Huang, Xunpeng and Chen, Enhong}, booktitle = {International Conference on Database Systems for Advanced Applications}, pages = {3--19}, year = {2018}, organization = {Springer} }
2017
- IJCAIIncremental Matrix Factorization: A Linear Feature Transformation Perspective.Xunpeng Huang, Le Wu, Enhong Chen, Hengshu Zhu, Qi Liu, and Yijun WangIn IJCAI 2017
Matrix Factorization (MF) is among the most widely used techniques for collaborative filtering based recommendation. Along this line, a critical demand is to incrementally refine the MF models when new ratings come in an online scenario. However, most of existing incremental MF algorithms are limited by specific MF models or strict use restrictions. In this paper, we propose a general incremental MF framework by designing a linear transformation of user and item latent vectors over time. This framework shows a relatively high accuracy with a computation and space efficient training process in an online scenario. Meanwhile, we explain the framework with a low-rank approximation perspective, and give an upper bound on the training error when this framework is used for incremental learning in some special cases. Finally, extensive experimental results on two real-world datasets clearly validate the effectiveness, efficiency and storage performance of the proposed framework.
@inproceedings{huang2017incremental, abbr = {IJCAI}, bibtex_show = {true}, title = {Incremental Matrix Factorization: A Linear Feature Transformation Perspective.}, author = {Huang, Xunpeng and Wu, Le and Chen, Enhong and Zhu, Hengshu and Liu, Qi and Wang, Yijun}, booktitle = {IJCAI}, pages = {1901--1908}, year = {2017}, html = {https://www.ijcai.org/proceedings/2017/0264.pdf} }