I am a third-year Ph.D. student at HKUST, co-advised by Prof. Tong Zhang and Prof. Yang Xiang.
I am furtunate work closely with Prof. Yi-an Ma (University of California San Diego) and Prof. Difan Zou (The University of Hong Kong).
Before that, I got my M.sc. degree in School of Computer Science and Technology from University of Science and Technology of China supervised by Prof. Enhong Chen.
Besides, I used to work at Bytedance AI Lab under Prof. Lei Li’s supervision.
My current research interests are machine learning algorithms and theory including sampling algorithms, stochastic/nonconvex optimization and mean-field analysis.
news
Sep 1, 2021
I joined Statml group in HKUST.
selected publications
NeurIPS
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
In Annual Conference on Neural Information Processing Systems (NeurIPS) 2024, Best paper of ICML Workshop on Structured Probabilistic Inference & Generative Modeling
2024
To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in \tildeO(1) subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop the convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve εtarget error within \tildeO(d^1/2ε^-1) under mild conditions, and RTK-MALA enjoys a O(d^2\log(d/ε)) convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
COLT
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo
To sample from a general target distribution p∗∝e−f∗ beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation. However, the original DMC algorithm encountered high gradient complexity, resulting in an exponential dependency on the error tolerance ϵ of the obtained samples. In this paper, we demonstrate that the high complexity of DMC originates from its redundant design of score estimation, and proposed a more efficient algorithm, called RS-DMC, based on a novel recursive score estimation method. In particular, we first divide the entire diffusion process into multiple segments and then formulate the score estimation step (at any time step) as a series of interconnected mean estimation and sampling subproblems accordingly, which are correlated in a recursive manner. Importantly, we show that with a proper design of the segment decomposition, all sampling subproblems will only need to tackle a strongly log-concave distribution, which can be very efficient to solve using the Langevin-based samplers with a provably rapid convergence rate. As a result, we prove that the gradient complexity of RS-DMC only has a quasi-polynomial dependency on ϵ, which significantly improves exponential gradient complexity in Huang et al. (2023). Furthermore, under commonly used dissipative conditions, our algorithm is provably much faster than the popular Langevin-based algorithms. Our algorithm design and theoretical framework illuminate a novel direction for addressing sampling problems, which could be of broader applicability in the community.
ICML
Faster Sampling via Stochastic Gradient Proximal Sampler
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting \citelee2021structured, has yet to be explored in its stochastic variants. In this paper, we study the Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and establish the convergence theory accordingly. We show that the convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on Stochastic gradient Langevin dynamics (SGLD) and Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve ε-sampling error in total variation (TV) distance within \tilde\mathcalO(dε^-2) and \tilde\mathcalO(d^1/2ε^-2) gradient complexities, which outperform the best-known result by at least an \tilde\mathcalO(d^1/3) factor. This enhancement in performance is corroborated by our empirical studies on synthetic data with various dimensions, demonstrating the efficiency of our proposed algorithm.
The efficacy of modern generative models is commonly contingent upon the precision of score estimation along the diffusion path, with a focus on diffusion models and their ability to generate high-quality data samples. This study delves into the application of reverse diffusion to Monte Carlo sampling. It is shown that score estimation can be transformed into a mean estimation problem via the decomposition of the transition kernel. By estimating the mean of the posterior distribution, we derive a novel Monte Carlo sampling algorithm from the reverse diffusion process, which is distinct from traditional Markov Chain Monte Carlo (MCMC) methods. We calculate the error requirements and sample size for the posterior distribution, and use the result to derive an algorithm that can approximate the target distribution to any desired accuracy. Additionally, by estimating the log-Sobolev constant of the posterior distribution, we show under suitable conditions the problem of sampling from the posterior can be easier than direct sampling from the target distribution using traditional MCMC techniques. For Gaussian mixture models, we demonstrate that the new algorithm achieves significant improvement over the traditional Langevin-style MCMC sampling methods both theoretically and practically. Our algorithm offers a new perspective and solution beyond classical MCMC algorithms for challenging complex distributions.