Showing 1–50 of 3,691 results for author: Chen, X

Searching in archive cs.
  1. arXiv:2406.06305  [pdf, other]

    cs.CV

    NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks

    Authors: Yuqi Ma, Huamin Wang, Hangchi Shen, Xuemei Chen, Shukai Duan, Shiping Wen

    Abstract: Recently, brain-inspired spiking neural networks (SNNs) have attracted great research attention owing to their inherent bio-interpretability, event-triggered properties and powerful perception of spatiotemporal information, which is beneficial to handling event-based neuromorphic datasets. In contrast to conventional static image datasets, event-based neuromorphic datasets present heightened compl…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 4 figures, 4 tables

  2. arXiv:2406.05938  [pdf, other]

    cs.LG math.OC

    Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs

    Authors: Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, Wotao Yin

    Abstract: Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless the…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2406.05862  [pdf, other]

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures

  4. arXiv:2406.05839  [pdf, other]

    eess.AS cs.AI

    MaLa-ASR: Multimedia-Assisted LLM-Based ASR

    Authors: Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLM can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate…

    Submitted 9 June, 2024; originally announced June 2024.

  5. arXiv:2406.05361  [pdf, other]

    cs.CL

    Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

    Authors: Xiuying Chen, Shen Gao, Mingzhe Li, Qingqing Zhu, Xin Gao, Xiangliang Zhang

    Abstract: Nowadays, neural text generation has made tremendous progress in abstractive summarization tasks. However, most of the existing summarization models take in the whole document all at once, which sometimes cannot meet the needs in practice. Practically, social text streams such as news events and tweets keep growing from time to time, and can only be fed to the summarization system step by step. He…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, published in TASLP

  6. arXiv:2406.05360  [pdf, other]

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  7. arXiv:2406.05132  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs

    Authors: Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

    Abstract: The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is the absence of large-scale datase…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Project website: https://3d-grand.github.io

  8. arXiv:2406.05070  [pdf, other]

    cs.DB

    Targeted Mining Precise-positioning Episode Rules

    Authors: Jian Zhu, Xiaoye Chen, Wensheng Gan, Zefeng Chen, Philip S. Yu

    Abstract: The era characterized by an exponential increase in data has led to the widespread adoption of data intelligence as a crucial task. Within the field of data mining, frequent episode mining has emerged as an effective tool for extracting valuable and essential information from event sequences. Various algorithms have been developed to discover frequent episodes and subsequently derive episode rules…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: IEEE TETCI, 14 pages

  9. arXiv:2406.04744  [pdf, other]

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.04589  [pdf, other]

    cs.SD eess.AS

    MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement

    Authors: Zizhen Lin, Xiaoting Chen, Junyu Wang

    Abstract: Achieving a balance between lightweight design and high performance remains a challenging task for speech enhancement. In this paper, we introduce Multi-path Enhanced Taylor (MET) Transformer based U-net for Speech Enhancement (MUSE), a lightweight speech enhancement network built upon the Unet architecture. Our approach incorporates a novel Multi-path Enhanced Taylor (MET) Transformer block, whic…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.04582  [pdf, other]

    eess.AS cs.SD

    Neural Codec-based Adversarial Sample Detection for Speaker Verification

    Authors: Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

    Abstract: Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specificall…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  12. arXiv:2406.04520  [pdf, other]

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03694  [pdf, other]

    cs.CV cs.IT

    Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms

    Authors: Mengyu Zhao, Xi Chen, Xin Yuan, Shirin Jalali

    Abstract: Snapshot compressive imaging (SCI) recovers high-dimensional (3D) data cubes from a single 2D measurement, enabling diverse applications like video and hyperspectral imaging to go beyond standard techniques in terms of acquisition speed and efficiency. In this paper, we focus on SCI recovery algorithms that employ untrained neural networks (UNNs), such as deep image prior (DIP), to model source st…

    Submitted 5 June, 2024; originally announced June 2024.

  14. arXiv:2406.03184  [pdf, other]

    cs.CV

    Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

    Authors: Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng

    Abstract: Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which in…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: See our project page at https://costwen.github.io/Ouroboros3D/

  15. arXiv:2406.03002  [pdf, other]

    eess.IV cs.CV

    Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis

    Authors: Juanhua Zhang, Ruodan Yan, Alessandro Perelli, Xi Chen, Chao Li

    Abstract: Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and l…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  16. arXiv:2406.02974  [pdf]

    cs.CL

    Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

    Authors: Jingshen Zhang, Xinglu Chen, Xinying Qiu, Zhimin Wang, Wenhe Feng

    Abstract: Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Se…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to the 23rd China National Conference on Computational Linguistics (CCL 2024)

  17. arXiv:2406.02518  [pdf, other]

    cs.CV eess.IV

    DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

    Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu

    Abstract: Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena…

    Submitted 4 June, 2024; originally announced June 2024.

  18. arXiv:2406.02328  [pdf, other]

    cs.SD eess.AS

    SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng

    Abstract: In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignment information; (2) It directly takes plain text as input and generates speech through an NAR way; (3) It tries to model speech in a finite and compac…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  19. arXiv:2406.02039  [pdf, other]

    cs.AR

    LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer

    Authors: Jiapin Wang, Xiangping Zhang, Chenlei Tang, Xiang Chen, Tao Lu

    Abstract: PCIe devices, such as SSDs and GPUs, are pivotal in modern data centers, and their value is set to grow amidst the emergence of AI and large models. However, these devices face onboard DRAM shortage issue due to internal space limitation, preventing accommodation of sufficient DRAM modules alongside flash or GPU processing chips. Current solutions either curb device-internal memory usage or supple…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01794  [pdf, other]

    cs.CR cs.GT

    It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma

    Authors: Zishuo Zhao, Xi Chen, Yuan Zhou

    Abstract: The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costl…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, 1 figure

  21. arXiv:2406.01555  [pdf, other]

    cs.CV

    Towards Flexible Interactive Reflection Removal with Human Guidance

    Authors: Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang

    Abstract: Single image reflection removal is inherently ambiguous, as both the reflection and transmission components requiring separation may follow natural image statistics. Existing methods attempt to address the issue by using various types of low-level and physics-based cues as sources of reflection signals. However, these cues are not universally applicable, since they are only observable in specific…

    Submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2406.01210  [pdf, other]

    cs.CV

    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

    Authors: Ding Jia, Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Chang Xu, Xinghao Chen

    Abstract: Cross-modal transformers have demonstrated superiority in various vision tasks by effectively integrating different modalities. This paper first critiques prior token exchange methods which replace less informative tokens with inter-modal features, and demonstrate exchange based methods underperform cross-attention mechanisms, while the computational demand of the latter inevitably restricts its u…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024, code and models are available at https://github.com/JiaDingCN/GeminiFusion

  23. arXiv:2406.00965  [pdf, other]

    cs.RO cs.AI

    Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic

    Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Zhou Yang, Wen Shanghua, Wenjing Yang, Weixia Xu, Ji Wang

    Abstract: Behavior Tree (BT) planning is crucial for autonomous robot behavior control, yet its application in complex scenarios is hampered by long planning times. Pruning and heuristics are common techniques to accelerate planning, but it is difficult to design general pruning strategies and heuristic functions for BT planning problems. This paper proposes improving BT planning efficiency for everyday ser…

    Submitted 3 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  24. arXiv:2406.00725  [pdf, other]

    cs.IR

    Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation

    Authors: Xiaocong Chen, Siyu Wang, Lina Yao

    Abstract: Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline data…

    Submitted 2 June, 2024; originally announced June 2024.

  25. arXiv:2406.00663  [pdf, other]

    cs.CV cs.AI cs.LG

    SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

    Authors: Benjamin Towle, Xin Chen, Ke Zhou

    Abstract: The recently released Segment Anything Model (SAM) has shown powerful zero-shot segmentation capabilities through a semi-automatic annotation setup in which the user can provide a prompt in the form of clicks or bounding boxes. There is growing interest around applying this to medical imaging, where the cost of obtaining expert annotations is high, privacy restrictions may limit sharing of patient…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Published at ISBI 2024. Awarded Top 12 Oral Presentation

  26. arXiv:2406.00626  [pdf, other]

    cs.MM cs.SD eess.AS

    Intelligent Text-Conditioned Music Generation

    Authors: Zhouyao Xie, Nikhil Yadala, Xinyi Chen, Jing Xi Liu

    Abstract: CLIP (Contrastive Language-Image Pre-Training) is a multimodal neural network trained on (text, image) pairs to predict the most relevant text caption given an image. It has been used extensively in image generation by connecting its output with a generative model such as VQGAN, with the most notable example being OpenAI's DALLE-2. In this project, we apply a similar approach to bridge the gap bet…

    Submitted 2 June, 2024; originally announced June 2024.

  27. arXiv:2406.00615  [pdf, other]

    cs.IR cs.LG

    Making Recommender Systems More Knowledgeable: A Framework to Incorporate Side Information

    Authors: Yukun Jiang, Leo Guo, Xinyi Chen, Jing Xi Liu

    Abstract: Session-based recommender systems typically focus on using only the triplet (user_id, timestamp, item_id) to make predictions of users' next actions. In this paper, we aim to utilize side information to help recommender systems catch patterns and signals otherwise undetectable. Specifically, we propose a general framework for incorporating item-specific side information into the recommender system…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages, 8 figures

  28. arXiv:2406.00596  [pdf, other]

    cs.LG

    Multi-variable Adversarial Time-Series Forecast Model

    Authors: Xiaoqiao Chen

    Abstract: Short-term industrial enterprises power system forecasting is an important issue for both load control and machine protection. Scientists focus on load forecasting but ignore other valuable electric-meters which should provide guidance of power system protection. We propose a new framework, multi-variable adversarial time-series forecasting model, which regularizes Long Short-term Memory (LSTM) mo…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 14 pages. arXiv admin note: text overlap with arXiv:1701.00160 by other authors

  29. arXiv:2406.00545  [pdf, ps, other]

    cs.CV cs.AI

    Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

    Authors: Xinyue Chen, Miaojing Shi

    Abstract: The performance of supervised semantic segmentation methods highly relies on the availability of large-scale training data. To alleviate this dependence, few-shot semantic segmentation (FSS) is introduced to leverage the model trained on base classes with sufficient data into the segmentation of novel classes with few data. FSS methods face the challenge of model generalization on novel classes du…

    Submitted 9 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE International Conference on Multimedia and Expo (ICME) 2024 as an oral presentation

  30. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  31. arXiv:2406.00115  [pdf, other]

    cs.PL

    Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction

    Authors: Hanxian Huang, Zhenghan Lin, Zixuan Wang, Xin Chen, Ke Ding, Jishen Zhao

    Abstract: We explore the use of Large Language Models (LLMs) to generate high-quality Register-Transfer Level (RTL) code with minimal human interference. The traditional RTL design workflow requires human experts to manually write high-quality RTL code, which is time-consuming and error-prone. With the help of emerging LLMs, developers can describe their requirements to LLMs which then generate correspondin…

    Submitted 31 May, 2024; originally announced June 2024.

  32. arXiv:2406.00083  [pdf, other]

    cs.CR cs.AI cs.CL cs.IR cs.LG

    BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

    Authors: Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

    Abstract: Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to…

    Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  33. arXiv:2406.00023  [pdf, other]

    cs.CL

    LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

    Authors: Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversel…

    Submitted 23 May, 2024; originally announced June 2024.

  34. arXiv:2405.20978  [pdf, other]

    cs.AI

    Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

    Authors: Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

    Abstract: Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capac…

    Submitted 31 May, 2024; originally announced May 2024.

    Journal ref: ACL 2024, Main Conference

  35. arXiv:2405.20853  [pdf, other]

    cs.CV

    MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

    Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

    Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation…

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.20674  [pdf, other]

    cs.CV

    4Diffusion: Multi-view Video Diffusion Model for 4D Generation

    Authors: Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao

    Abstract: Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. However, these methods lack multi-view spatial-temporal modeling and encounter challenges in integrating diverse prior knowledge from multiple diffusion models, resulting in inconsistent temporal appearance and flickers. In this paper, we propose a novel 4D generation pipeline, nam…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://aejion.github.io/4diffusion/

  37. arXiv:2405.20641  [pdf, other]

    cs.CR

    Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense

    Authors: Shaofei Li, Ziqi Zhang, Haomin Jia, Ding Li, Yao Guo, Xiangqun Chen

    Abstract: Query-based black-box attacks have emerged as a significant threat to machine learning systems, where adversaries can manipulate the input queries to generate adversarial examples that can cause misclassification of the model. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs) for detecting adversarial query sequences and rejecting queries that are "similar" to the…

    Submitted 31 May, 2024; originally announced May 2024.

  38. arXiv:2405.20614  [pdf, other]

    cs.CV

    EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening

    Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He

    Abstract: In the preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex opera…

    Submitted 31 May, 2024; originally announced May 2024.

  39. arXiv:2405.20600  [pdf, other]

    cs.AI

    Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

    Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

    Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where humans experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty in solving MLCIL issue due to notorious cata…

    Submitted 30 May, 2024; originally announced May 2024.

  40. arXiv:2405.20596  [pdf, other]

    cs.CV cs.LG

    Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation

    Authors: Jiachen Liang, Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

    Abstract: Traditional semi-supervised learning (SSL) assumes that the feature distributions of labeled and unlabeled data are consistent which rarely holds in realistic scenarios. In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples. Under this setting, previous SSL methods tend to predict wr…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages; Accepted by NeurIPS 2023

  41. arXiv:2405.20064  [pdf, other]

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t…

    Submitted 30 May, 2024; originally announced May 2024.

  42. arXiv:2405.19565  [pdf, other]

    physics.soc-ph cs.GT q-bio.PE

    Unbending strategies shepherd cooperation and suppress extortion in spatial populations

    Authors: Zijie Chen, Yuxin Geng, Xingru Chen, Feng Fu

    Abstract: Evolutionary game dynamics on networks typically consider the competition among simple strategies such as cooperation and defection in the Prisoner's Dilemma and summarize the effect of population structure as network reciprocity. However, it remains largely unknown regarding the evolutionary dynamics involving multiple powerful strategies typically considered in repeated games, such as the zero-d…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 21 pages, 6 figures

  43. arXiv:2405.19534  [pdf, other]

    cs.LG cs.AI cs.CL

    Preference Learning Algorithms Do Not Learn Preference Rankings

    Authors: Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

    Abstract: Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via…

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.19463  [pdf, other]

    stat.ML cs.LG econ.EM math.OC

    Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

    Authors: Xuxing Chen, Abhishek Roy, Yifan Hu, Krishnakumar Balasubramanian

    Abstract: We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true mode…

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.19373  [pdf, other]

    eess.SP cs.LG

    Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

    Authors: Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

    Abstract: Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attem…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Neural Computing for Advanced Applications, 2024

  46. arXiv:2405.19325  [pdf, other]

    cs.CL

    Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

    Authors: Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin

    Abstract: Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, w…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  47. arXiv:2405.19247  [pdf, other]

    cs.LG

    Comparative Study of Neighbor-based Methods for Local Outlier Detection

    Authors: Zhuang Qi, Junlin Zhang, Xiaming Chen, Xin Qi

    Abstract: The neighbor-based method has become a powerful tool to handle the outlier detection problem, which aims to infer the abnormal degree of the sample based on the compactness of the sample and its neighbors. However, the existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types neighbors to outlier detection has not…

    Submitted 29 May, 2024; originally announced May 2024.

  48. arXiv:2405.18726  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

    Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

    Abstract: Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utili…

    Submitted 28 May, 2024; originally announced May 2024.

  49. arXiv:2405.18356  [pdf, other]

    eess.IV cs.CV

    Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

    Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Image Analysis

  50. arXiv:2405.18172  [pdf, other]

    cs.CV cs.AI cs.LG

    AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

    Authors: Yuhan Li, Hao Zhou, Wenxiang Shang, Ran Lin, Xuanhong Chen, Bingbing Ni

    Abstract: While image-based virtual try-on has made significant strides, emerging approaches still fall short of delivering high-fidelity and robust fitting images across various scenarios, as their models suffer from issues of ill-fitted garment styles and quality degrading during the training process, not to mention the lack of support for various combinations of attire. Therefore, we first propose a ligh…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project website: https://colorful-liyu.github.io/anyfit-page/