Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 2,167 results for author: Zhao, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.04127  [pdf, other

    cs.CL cs.AI

    Are We Done with MMLU?

    Authors: Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini

    Abstract: Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive fr… ▽ More

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2406.03978  [pdf, other

    cs.MA cs.LG

    Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning

    Authors: Lin Liu, Jian Zhao, Cheng Hu, Zhengtao Cao, Youpeng Zhao, Zhenbin Ye, Meng Meng, Wenjun Wang, Zhaofeng He, Houqiang Li, Xia Lin, Lanxiao Huang

    Abstract: Games are widely used as research environments for multi-agent reinforcement learning (MARL), but they pose three significant challenges: limited customization, high computational demands, and oversimplification. To address these issues, we introduce the first publicly available map editor for the popular mobile game Honor of Kings and design a lightweight environment, Mini Honor of Kings (Mini Ho… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.03963  [pdf, other

    cs.CL

    A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

    Authors: Wei Tang, Yixin Cao, Jiahao Ying, Bo Wang, Yuyue Zhao, Yong Liao, Pengyuan Zhou

    Abstract: Retrieval-Augmented Generation (RAG) is an effective solution to supplement necessary knowledge to large language models (LLMs). Targeting its bottleneck of retriever performance, "generate-then-read" pipeline is proposed to replace the retrieval stage with generation from the LLM itself. Although promising, this research direction is underexplored and still cannot work in the scenario when source… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL'24 (Findings)

  4. arXiv:2406.03333  [pdf, other

    cs.CV

    A Flexible Recursive Network for Video Stereo Matching Based on Residual Estimation

    Authors: Youchen Zhao, Guorong Luo, Hua Zhong, Haixiong Li

    Abstract: Due to the high similarity of disparity between consecutive frames in video sequences, the area where disparity changes is defined as the residual map, which can be calculated. Based on this, we propose RecSM, a network based on residual estimation with a flexible recursive structure for video stereo matching. The RecSM network accelerates stereo matching using a Multi-scale Residual Estimation Mo… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2406.03088  [pdf, other

    cs.AR cs.LG

    HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

    Authors: Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

    Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and i… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted to FPL2024

  6. arXiv:2406.03032  [pdf, other

    cs.CV

    Instructing Prompt-to-Prompt Generation for Zero-Shot Learning

    Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

    Abstract: Zero-shot learning (ZSL) aims to explore the semantic-visual interactions to discover comprehensive knowledge transferred from seen categories to classify unseen categories. Recently, prompt engineering has emerged in ZSL, demonstrating impressive potential as it enables the zero-shot transfer of diverse visual concepts to downstream tasks. However, these methods are still not well generalized to… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  7. arXiv:2406.02696  [pdf, other

    cs.LG

    iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

    Authors: Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen

    Abstract: Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent rep… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 9 pages, 11 figures

  8. arXiv:2406.02074  [pdf, other

    cs.CV

    FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance

    Authors: Yinglong Li, Hongyu Wu, Xiaogang Wang, Qingzhao Qin, Yijiao Zhao, Yong wang, Aimin Hao

    Abstract: We propose FaceCom, a method for 3D facial shape completion, which delivers high-fidelity results for incomplete facial inputs of arbitrary forms. Unlike end-to-end shape completion methods based on point clouds or voxels, our approach relies on a mesh-based generative network that is easy to optimize, enabling it to handle shape completion for irregular facial scans. We first train a shape genera… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: accepted to CVPR2024

  9. arXiv:2406.02066  [pdf, other

    cs.LG q-bio.BM

    Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

    Authors: Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu

    Abstract: Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecul… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024(Oral)

  10. arXiv:2406.01987  [pdf, other

    cs.CV

    Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

    Authors: Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

    Abstract: Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities are available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often e… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  11. arXiv:2406.01983  [pdf, other

    cs.CL

    RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

    Authors: Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin

    Abstract: With the passage of the Right to Be Forgotten (RTBF) regulations and the scaling up of language model training datasets, research on model unlearning in large language models (LLMs) has become more crucial. Before the era of LLMs, machine unlearning research focused mainly on classification tasks in models with small parameters. In these tasks, the content to be forgotten or retained is clear and… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Work is in progress

  12. arXiv:2406.01414  [pdf, other

    cs.LG eess.SP

    CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

    Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

    Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-le… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.04131

  13. arXiv:2406.01125  [pdf, other

    cs.CV

    $Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

    Authors: Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

    Abstract: Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful results achieved by diffusion transformers (DiT), there is still a lack of exploration regarding the impact of DiT structure on generation, as well as the absence of… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 6 tables

  14. arXiv:2406.00770  [pdf, other

    cs.CL cs.AI

    Automatic Instruction Evolving for Large Language Models

    Authors: Weihao Zeng, Can Xu, Yingxiu Zhao, Jian-Guang Lou, Weizhu Chen

    Abstract: Fine-tuning large pre-trained language models with Evol-Instruct has achieved encouraging results across a wide range of tasks. However, designing effective evolving methods for instruction evolution requires substantial human expertise. This paper proposes Auto Evol-Instruct, an end-to-end framework that evolves instruction datasets using large language models without any human effort. The framew… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  15. arXiv:2406.00758  [pdf, other

    eess.IV cs.CV cs.MM

    Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

    Authors: Anqi Li, Yuxi Liu, Huihui Bai, Feng Li, Runmin Cong, Meng Wang, Yao Zhao

    Abstract: Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of… ▽ More

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  16. arXiv:2406.00376  [pdf, other

    cs.DS cs.DB

    Approaching 100% Confidence in Stream Summary through ReliableSketch

    Authors: Yuhan Wu, Hanbo Wu, Xilai Liu, Yikai Zhao, Tong Yang, Kaicheng Yang, Sha Wang, Lihua Miao, Gaogang Xie

    Abstract: To approximate sums of values in key-value data streams, sketches are widely used in databases and networking systems. They offer high-confidence approximations for any given key while ensuring low time and space overhead. While existing sketches are proficient in estimating individual keys, they struggle to maintain this high confidence across all keys collectively, an objective that is criticall… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  17. arXiv:2406.00341  [pdf, other

    eess.IV cs.CV

    DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation

    Authors: Qihang Xie, Mengguo Guo, Lei Mou, Dan Zhang, Da Chen, Caifeng Shan, Yitian Zhao, Ruisheng Su, Jiong Zhang

    Abstract: Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the golden standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main tru… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  18. arXiv:2406.00291  [pdf, other

    cs.LG cs.AI

    Multi-objective Neural Architecture Search by Learning Search Space Partitions

    Authors: Yiyang Zhao, Linnan Wang, Tian Guo

    Abstract: Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and #FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks in multiple criteria. However, applying multi-objective optimizations to neural architecture search (N… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

  19. arXiv:2405.20990  [pdf, other

    cs.CR cs.AI cs.LG

    Locking Machine Learning Models into Hardware

    Authors: Eleanor Clifford, Adhithya Saravanan, Harry Langford, Cheng Zhang, Yiren Zhao, Robert Mullins, Ilia Shumailov, Jamie Hayes

    Abstract: Modern Machine Learning models are expensive IP and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed -- for example it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as Multi-Party Computation or Homomorphic encryption r… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures of main text; 14 pages, 16 figures of appendices

  20. arXiv:2405.20234  [pdf, other

    cs.AI

    Context Injection Attacks on Large Language Models

    Authors: Cheng'an Wei, Kai Chen, Yue Zhao, Yujia Gong, Lu Xiang, Shenchen Zhu

    Abstract: Large Language Models (LLMs) such as ChatGPT and Llama-2 have become prevalent in real-world applications, exhibiting impressive text generation performance. LLMs are fundamentally developed from a scenario where the input data remains static and lacks a clear structure. To behave interactively over time, LLM-based chat systems must integrate additional contextual information (i.e., chat history)… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  21. arXiv:2405.19856  [pdf, other

    cs.CL cs.SE

    DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

    Authors: Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li

    Abstract: How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. To address the knowledge gap, we propose a new benchmark named DevEval, which has three advances. (1) DevEval aligns with real-world repositories in multi… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). arXiv admin note: substantial text overlap with arXiv:2404.00599, arXiv:2401.06401

  22. arXiv:2405.19684  [pdf, other

    cs.CV

    A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

    Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

    Abstract: Underwater image enhancement (UIE) is a challenging research task in the field of computer vision. Although hundreds of UIE algorithms have been proposed, a comprehensive and systematic review is still lacking. To promote future research, we summarize the UIE task from multiple perspectives. First, the physical models, data construction processes, evaluation metrics, and loss functions are introdu… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: A survey on the underwater image enhancement task

  23. arXiv:2405.19673  [pdf, other

    cs.LG cs.AI stat.ML

    Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

    Authors: Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökcen Eraslan, Avantika Lal, Sergey Levine, Tommaso Biancalani

    Abstract: AI-driven design problems, such as DNA/protein sequence design, are commonly tackled from two angles: generative modeling, which efficiently captures the feasible design space (e.g., natural images or biological sequences), and model-based optimization, which utilizes reward models for extrapolation. To combine the strengths of both approaches, we adopt a hybrid method that fine-tunes cutting-edge… ▽ More

    Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Under review

  24. arXiv:2405.19609  [pdf, other

    cs.CV cs.GR

    SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture Annotations

    Authors: Yujiao Jiang, Qingmin Liao, Zhaolong Wang, Xiangru Lin, Zongqing Lu, Yuxi Zhao, Hanqing Wei, Jingrui Ye, Yu Zhang, Zhijing Shao

    Abstract: Recovering photorealistic and drivable full-body avatars is crucial for numerous applications, including virtual reality, 3D games, and tele-presence. Most methods, whether reconstruction or generation, require large numbers of human motion sequences and corresponding textured meshes. To easily learn a drivable avatar, a reasonable parametric body model with unified topology is paramount. However,… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICME 2024;Project page: https://alex-jyj.github.io/SMPLX-Lite/

  25. arXiv:2405.18971  [pdf, other

    cs.IR

    Mitigate Position Bias with Coupled Ranking Bias on CTR Prediction

    Authors: Yao Zhao, Zhining Liu, Tianchi Cai, Haipeng Zhang, Chenyi Zhuang, Jinjie Gu

    Abstract: Position bias, i.e., users' preference of an item is affected by its placing position, is well studied in the recommender system literature. However, most existing methods ignore the widely coupled ranking bias, which is also related to the placing position of the item. Using both synthetic and industrial datasets, we first show how this widely coexisted ranking bias deteriorates the performance o… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures

  26. arXiv:2405.18933  [pdf, other

    cs.LG

    LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification

    Authors: Yufei Zhao, Shiduo Wang, Hua Duan

    Abstract: Existing heterogeneous graph neural network algorithms (HGNNs) mostly rely on meta-paths to capture the rich semantic information contained in heterogeneous graphs (also known as heterogeneous information networks (HINs)), but most of these HGNNs focus on different ways of feature aggre gation and ignore the properties of the meta-paths themselves. This paper studies meta-paths in three commonly u… ▽ More

    Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  27. arXiv:2405.18416  [pdf, other

    cs.CV

    3D StreetUnveiler with Semantic-Aware 2DGS

    Authors: Jingwei Xu, Yikai Wang, Yiqun Zhao, Yanwei Fu, Shenghua Gao

    Abstract: Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scene cases involve long trajectories that differ fr… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://streetunveiler.github.io

  28. arXiv:2405.18361  [pdf, other

    cs.CV

    Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

    Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

    Abstract: Rapid advancements in Autonomous Driving (AD) tasks turned a significant shift toward end-to-end fashion, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  29. arXiv:2405.18224  [pdf, other

    cs.CV

    SSLChange: A Self-supervised Change Detection Framework Based on Domain Adaptation

    Authors: Yitao Zhao, Turgay Celik, Nanqing Liu, Feng Gao, Heng-Chao Li

    Abstract: In conventional remote sensing change detection (RS CD) procedures, extensive manual labeling for bi-temporal images is first required to maintain the performance of subsequent fully supervised training. However, pixel-level labeling for CD tasks is very complex and time-consuming. In this paper, we explore a novel self-supervised contrastive framework applicable to the RS CD task, which promotes… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: This manuscript has been submitted to IEEE TGRS and is under review

  30. arXiv:2405.17602  [pdf, other

    cs.IR

    Augmenting Textual Generation via Topology Aware Retrieval

    Authors: Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr

    Abstract: Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-augmented Generation (RAG) is employed as an effective strategy to enhance the available knowledge base and anchor the responses in reality by pulling addit… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  31. arXiv:2405.17532  [pdf, other

    cs.CV

    ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

    Authors: Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Yunchao Wei

    Abstract: Recent text-to-image customization works have been proven successful in generating images of given concepts by fine-tuning the diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g. headphone is missing when generating a <sks> dog wearing a headphone'). Interestingly, we notice that the bas… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  32. arXiv:2405.17509  [pdf, other

    cs.LG

    Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

    Authors: Ze Cheng, Zhongkai Hao, Xiaoqiang Wang, Jianing Huang, Youjia Wu, Xudan Liu, Yiru Zhao, Songming Liu, Hang Su

    Abstract: For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the r… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  33. arXiv:2405.17419  [pdf, other

    cs.CV cs.AI cs.LG

    MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

    Authors: Hao Dong, Yue Zhao, Eleni Chatzi, Olga Fink

    Abstract: Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and MultiOOD benchmark: https://github.com/donghao51/MultiOOD

  34. arXiv:2405.17329  [pdf, other

    cs.IT eess.SP

    Joint MIMO Transceiver and Reflector Design for Reconfigurable Intelligent Surface-Assisted Communication

    Authors: Yaqiong Zhao, Jindan Xu, Wei Xu, Kezhi Wang, Xinquan Ye, Chau Yuen, Xiaohu You

    Abstract: In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output communication system with multiple antennas at both the base station (BS) and the user. We plan to maximize the achievable rate through jointly optimizing the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix under the constraints of the transmit power… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 14 pages, 12 figures

  35. arXiv:2405.17325  [pdf, other

    hep-ex cs.LG

    Novel Approaches for ML-Assisted Particle Track Reconstruction and Hit Clustering

    Authors: Uraz Odyurt, Nadezhda Dobreva, Zef Wolffs, Yue Zhao, Antonio Ferrer Sánchez, Roberto Ruiz de Austri Bazan, José D. Martín-Guerrero, Ana-Lucia Varbanescu, Sascha Caron

    Abstract: Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  36. arXiv:2405.17149  [pdf, other

    cs.CV

    LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

    Authors: Yaohua Zha, Naiqi Li, Yanzi Wang, Tao Dai, Hang Guo, Bin Chen, Zhi Wang, Zhihao Ouyang, Shu-Tao Xia

    Abstract: The pre-trained point cloud model based on Masked Point Modeling (MPM) has exhibited substantial improvements across various tasks. However, these models heavily rely on the Transformer, leading to quadratic complexity and limited decoder, hindering their practice application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM, emphasizing the i… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  37. arXiv:2405.17013  [pdf, other

    cs.CV

    MotionLLM: Multimodal Motion-Language Learning with Large Language Models

    Authors: Qi Wu, Yubo Zhao, Yifan Wang, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Recent advancements in Multimodal Large Language Models (MM-LLMs) have demonstrated promising potential in terms of generalization and robustness when applied to different modalities. While previous works have already achieved 3D human motion generation using various approaches including language modeling, they mostly % are mostly carefully designed use specialized architecture and are restricted… ▽ More

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://knoxzhao.github.io/MotionLLM

  38. arXiv:2405.16645  [pdf, other

    cs.CV

    Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

    Authors: Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

    Abstract: The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization spee… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Project page: https://vita-group.github.io/Diffusion4D

  39. arXiv:2405.16090  [pdf, other

    cs.HC eess.SP

    EEG-DBNet: A Dual-Branch Network for Temporal-Spectral Decoding in Motor-Imagery Brain-Computer Interfaces

    Authors: Xicheng Lou, Xinwei Li, Hongying Meng, Jun Hu, Meili Xu, Yue Zhao, Jiazhang Yang, Zhangyong Li

    Abstract: Motor imagery electroencephalogram (EEG)-based brain-computer interfaces (BCIs) offer significant advantages for individuals with restricted limb mobility. However, challenges such as low signal-to-noise ratio and limited spatial resolution impede accurate feature extraction from EEG signals, thereby affecting the classification accuracy of different actions. To address these challenges, this stud… ▽ More

    Submitted 30 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  40. arXiv:2405.16071  [pdf, other

    cs.CV

    DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

    Authors: Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

    Abstract: Region-level multi-modality methods can translate referred image regions to human preferred language descriptions. Unfortunately, most of existing methods using fixed visual inputs remain lacking the resolution adaptability to find out precise language descriptions. In this study, we propose a dynamic resolution approach, referred to as DynRefer, to pursue high-accuracy region-level referring thro… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/callsys/DynRefer

  41. arXiv:2405.16062  [pdf, other

    cs.IT eess.SP

    Movable Antenna Empowered Physical Layer Security Without Eve's CSI: Joint Optimization of Beamforming and Antenna Positions

    Authors: Zhiyong Feng, Yujia Zhao, Kan Yu, Dong Li

    Abstract: Physical layer security (PLS) technology based on the fixed-position antenna (FPA) has {attracted widespread attention}. Due to the fixed feature of the antennas, current FPA-based PLS schemes cannot fully utilize the spatial degree of freedom, and thus a weaken secure gain in the desired/undesired direction may exist. Different from the concept of FPA, mobile antenna (MA) is a novel technology th… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  42. arXiv:2405.16056  [pdf, other

    cs.LG

    FedSheafHN: Personalized Federated Learning on Graph-structured Data

    Authors: Wenfei Liang, Yanan Zhao, Rui She, Yiming Li, Wee Peng Tay

    Abstract: Personalized subgraph Federated Learning (FL) is a task that customizes Graph Neural Networks (GNNs) to individual client needs, accommodating diverse data distributions. However, applying hypernetworks in FL, while aiming to facilitate model personalization, often encounters challenges due to inadequate representation of client-specific characteristics. To overcome these limitations, we propose a… ▽ More

    Submitted 31 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: This paper was submitted to ICML 2024 in Feb 2024. You can find a record here:https://github.com/CarrieWFF/ICML-2024-submission-recording/blob/main/Screenshot%20of%20FedSheafHN%20submission%20to%20ICML%202024.png

  43. arXiv:2405.15630  [pdf, other

    cs.SE

    GPTZoo: A Large-scale Dataset of GPTs for the Research Community

    Authors: Xinyi Hou, Yanjie Zhao, Shenao Wang, Haoyu Wang

    Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized natural language processing, with GPTs, customized versions of ChatGPT available on the GPT Store, emerging as a prominent technology for specific domains and tasks. To support academic research on GPTs, we introduce GPTZoo, a large-scale dataset comprising 730,420 GPT instances. Each instance includes rich metadata with 21… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  44. Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning

    Authors: Yuyue Zhao, Jiancan Wu, Xiang Wang, Wei Tang, Dingxian Wang, Maarten de Rijke

    Abstract: Conventional recommender systems (RSs) face challenges in precisely capturing users' fine-grained preferences. Large language models (LLMs) have shown capabilities in commonsense reasoning and leveraging external tools that may help address these challenges. However, existing LLM-based RSs suffer from hallucinations, misalignment between the semantic space of items and the behavior space of users,… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  45. "This really lets us see the entire world:" Designing a conversational telepresence robot for homebound older adults

    Authors: Yaxin Hu, Laura Stegner, Yasmine Kotturi, Caroline Zhang, Yi-Hao Peng, Faria Huq, Yuhang Zhao, Jeffrey P. Bigham, Bilge Mutlu

    Abstract: In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults' experiential needs for robot-mediated remote experiences such as exploration, reminiscence and social participation. We then designed a prototype system to support these go… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: In proceedings of ACM Designing Interactive Systems (DIS) 2024

    MSC Class: 68-06

  46. arXiv:2405.14855  [pdf, other

    cs.CV cs.AI

    Synergistic Global-space Camera and Human Reconstruction from Videos

    Authors: Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang

    Abstract: Remarkable strides have been made in reconstructing static scenes or human bodies from monocular videos. Yet, the two problems have largely been approached independently, without much synergy. Most visual SLAM methods can only reconstruct camera trajectories and scene structures up to scale, while most HMR methods reconstruct human meshes in metric scale but fall short in reasoning with cameras an… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  47. arXiv:2405.14007  [pdf, other

    cs.LG

    A Practice in Enrollment Prediction with Markov Chain Models

    Authors: Yan Zhao, Amy Otteson

    Abstract: Enrollment projection is a critical aspect of university management, guiding decisions related to resource allocation and revenue forecasting. However, despite its importance, there remains a lack of transparency regarding the methodologies utilized by many institutions. This paper presents an innovative approach to enrollment projection using Markov Chain modeling, drawing upon a case study condu… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  48. arXiv:2405.13923  [pdf, other

    cs.CL

    Why Not Transform Chat Large Language Models to Non-English?

    Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

    Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo… ▽ More

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  49. arXiv:2405.13820  [pdf, other

    cs.CL

    Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching

    Authors: Weixiang Zhao, Yulin Hu, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Tat-Seng Chua

    Abstract: Safety alignment of large language models (LLMs) has been gaining increasing attention. However, current safety-aligned LLMs suffer from the fragile and imbalanced safety mechanisms, which can still be induced to generate unsafe responses, exhibit over-safety by rejecting safe user inputs, and fail to preserve general utility after safety alignment. To this end, we propose a novel post safety alig… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 24 pages, 8 figures and 12 tables

  50. arXiv:2405.13381  [pdf

    cs.LG

    Optimizing Search Advertising Strategies: Integrating Reinforcement Learning with Generalized Second-Price Auctions for Enhanced Ad Ranking and Bidding

    Authors: Chang Zhou, Yang Zhao, Jin Cao, Yi Shen, Xiaoling Cui, Chiyu Cheng

    Abstract: This paper explores the integration of strategic optimization methods in search advertising, focusing on ad ranking and bidding mechanisms within E-commerce platforms. By employing a combination of reinforcement learning and evolutionary strategies, we propose a dynamic model that adjusts to varying user interactions and optimizes the balance between advertiser cost, user relevance, and platform r… ▽ More

    Submitted 29 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic communication and Artificial Intelligence (ICECAI 2024)