Publications
Publications by category in reverse chronological order.
2025
- [Preprint] New Sphere Packings from the Antipode Construction. arXiv preprint, 2025.
We construct non-lattice sphere packings in dimensions 19, 20, 21, 23, 44, 45, and 47, demonstrating record densities that surpass all previously documented results in these dimensions. The construction applies the antipode method to suboptimal cross-sections of \(\Lambda_{24}\) and \(P_{48p}\).
2023
- [ICLR] Provable Sim-to-real Transfer in Continuous Domain with Partial Observations. In International Conference on Learning Representations, 2023.
We study sim-to-real transfer in continuous domains with partial observations, modeled by linear quadratic Gaussian (LQG) systems. We show that a popular robust adversarial training algorithm can learn a policy from simulation that is competitive to the optimal real-world policy, providing the first provable guarantee in this setting.
2022
- [ICLR] Understanding Domain Randomization for Sim-to-real Transfer. In International Conference on Learning Representations (Spotlight, top 6%), 2022.
We provide a theoretical framework for domain randomization, modeling the simulator as a set of MDPs with tunable parameters. We prove sharp bounds on the sim-to-real gap and show that successful transfer is achievable without any real-world training samples, highlighting the importance of history-dependent policies.
2021
- [ICML] Near-Optimal Representation Learning for Linear Bandits and Linear RL. In Proceedings of the 38th International Conference on Machine Learning, 2021.
We study multi-task representation learning for linear bandits and episodic RL with linear value function approximation. Our algorithm MTLR-OFUL achieves \(\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})\) regret, significantly improving over the \(\tilde{O}(Md\sqrt{T})\) baseline, yielding the first theoretical characterization of multi-task representation learning benefits in RL exploration.
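The two bounds above can be compared numerically. The sketch below (function names and parameter values are illustrative, not from the paper) evaluates both expressions with log factors dropped, showing the improvement when the shared representation dimension k is much smaller than the ambient dimension d:

```python
import math

def mtlr_bound(M, d, k, T):
    # Stated MTLR-OFUL regret: O(M*sqrt(d*k*T) + d*sqrt(k*M*T)), log factors dropped
    return M * math.sqrt(d * k * T) + d * math.sqrt(k * M * T)

def baseline_bound(M, d, T):
    # Stated baseline (each task learned independently): O(M*d*sqrt(T))
    return M * d * math.sqrt(T)

# Hypothetical regime: many tasks, low-dimensional shared structure (k << d).
M, d, k, T = 100, 50, 5, 10_000
print(mtlr_bound(M, d, k, T))      # roughly 2.7e5
print(baseline_bound(M, d, T))     # 5.0e5
```

With these values the multi-task bound is well below the baseline; the gap widens as M grows with k fixed.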
2020
- [ICLR] Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication. In International Conference on Learning Representations, 2020.
We design communication protocols for distributed bandit learning with M agents under central coordination. For multi-armed bandits, we achieve near-optimal regret with only \(O(M\log(MK))\) communication cost — independent of the time horizon T and matching the lower bound up to a log factor.
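The horizon-independence of the stated communication bound can be illustrated with a quick calculation (function names are hypothetical, not the paper's protocol): the O(M log(MK)) cost stays fixed as T grows, whereas a naive protocol that synchronizes every round sends on the order of M*T messages.

```python
import math

def comm_cost(M, K):
    # Stated bound: O(M * log(M*K)) total communication, independent of horizon T
    return M * math.log(M * K)

def naive_cost(M, T):
    # Naive sync-every-round protocol: one message per agent per round
    return M * T

M, K = 100, 10
print(comm_cost(M, K))            # fixed, no dependence on T
print(naive_cost(M, 10_000))      # grows linearly in T
```

For M = 100 agents and K = 10 arms, the bound is a few hundred messages regardless of whether the horizon is 10^4 or 10^8 rounds.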