Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 1,715 results for author: Liu, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.06329  [pdf, other

    cs.CL eess.AS

    A Parameter-efficient Language Extension Framework for Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2406.05821  [pdf, other

    cs.CV

    F-LMM: Grounding Frozen Large Multimodal Models

    Authors: Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

    Abstract: Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs' understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation tokens and overfit grounding and segmentation datasets. Such a design would inevitably cause a catastrophic diminution in the… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/wusize/F-LMM

  3. arXiv:2406.05613  [pdf, other

    cs.RO

    Distributed Motion Control of Multiple Mobile Manipulator System with Disturbance and Communication Delay

    Authors: Wenhang Liu, Meng Ren, Kun Song, Michael Yu Wang, Zhenhua Xiong

    Abstract: In real-world object manipulation scenarios, multiple mobile manipulator systems may suffer from disturbances and asynchrony, leading to excessive interaction forces and causing object damage or emergency stops. This paper presents a novel distributed motion control approach aimed at reducing these unnecessary interaction forces. The control strategy only utilizes force information without the nee… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  4. arXiv:2406.05318  [pdf

    cs.CV cs.AI

    Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning

    Authors: Zijian Zhang, Wei Liu

    Abstract: In this paper, we present our solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024. Unlike traditional visual questions and answer tasks, this challenge evaluates abstraction, deduction and generalization ability of neural network in solving visuo-linguistic puzzles designed for specially children in the 6-8 age group. Our model is based on two pre-trained models, d… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2406.04356  [pdf, other

    cs.SE cs.AI

    BugBlitz-AI: An Intelligent QA Assistant

    Authors: Yi Yao, Jun Wang, Yabai Hu, Lifeng Wang, Yi Zhou, Jack Chen, Xuming Gai, Zhenming Wang, Wenjun Liu

    Abstract: The evolution of software testing from manual to automated methods has significantly influenced quality assurance (QA) practices. However, challenges persist in post-execution phases, particularly in result analysis and reporting. Traditional post-execution validation phases require manual intervention for result analysis and report generation, leading to inefficiencies and potential development c… ▽ More

    Submitted 17 May, 2024; originally announced June 2024.

  6. arXiv:2406.04344  [pdf, other

    cs.LG cs.CL cs.CV

    Verbalized Machine Learning: Revisiting Machine Learning with Language Models

    Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

    Abstract: Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Technical Report v1 (92 pages, 15 figures)

  7. arXiv:2406.04302  [pdf, other

    cs.LG

    Representational Alignment Supports Effective Machine Teaching

    Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

    Abstract: A good teacher should not only be knowledgeable; but should be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representatio… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint

  8. arXiv:2406.04129  [pdf, other

    cs.CV

    LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

    Authors: Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopt… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  9. arXiv:2406.03307  [pdf

    math.NA cs.CE

    Multi-Patch Isogeometric Convolution Hierarchical Deep-learning Neural Network

    Authors: Lei Zhang, Chanwook Park, T. J. R. Hughes, Wing Kam Liu

    Abstract: A seamless integration of neural networks with Isogeometric Analysis (IGA) was first introduced in [1] under the name of Hierarchical Deep-learning Neural Network (HiDeNN) and has systematically evolved into Isogeometric Convolution HiDeNN (in short, C-IGA) [2]. C-IGA achieves higher order approximations without increasing the degree of freedom. Due to the Kronecker delta property of C-IGA shape f… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 30 pages, 15 figures in main text, additional 10 pages for appendix

  10. arXiv:2406.03159  [pdf, other

    cs.NI cs.DC

    Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading

    Authors: Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao

    Abstract: Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. Each satellite collects TB-level data daily, including delay-sensitive data used for crucial tasks, such as military surveillance, natural disaster monitoring, and weather forecasting. According to NASA's sta… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  11. arXiv:2406.03035  [pdf, other

    cs.CV

    Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

    Authors: Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

    Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple characte… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  12. arXiv:2406.02962  [pdf, other

    cs.CL cs.AI cs.IR

    Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

    Authors: Qiang Sun, Yuanyi Luo, Wenxiao Zhang, Sirui Li, Jichunyang Li, Kai Niu, Xiangrui Kong, Wei Liu

    Abstract: Even for a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  13. arXiv:2406.01927  [pdf, other

    cs.CR

    Position-based Rogue Access Point Detection

    Authors: Wenjie Liu, Panos Papadimitratos

    Abstract: Rogue Wi-Fi access point (AP) attacks can lead to data breaches and unauthorized access. Existing rogue AP detection methods and tools often rely on channel state information (CSI) or received signal strength indicator (RSSI), but they require specific hardware or achieve low detection accuracy. On the other hand, AP positions are typically fixed, and Wi-Fi can support indoor positioning of user d… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  14. arXiv:2406.01916  [pdf, other

    cs.CV

    FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping

    Authors: Yuzhou Ji, He Zhu, Junshu Tang, Wuyi Liu, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan

    Abstract: The semantically interactive radiance field has always been an appealing task for its potential to facilitate user-friendly and automated real-world 3D scene understanding applications. However, it is a challenging task to achieve high quality, efficiency and zero-shot ability at the same time with semantics in radiance fields. In this work, we present FastLGS, an approach that supports real-time… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.01900  [pdf, other

    cs.CV

    Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

    Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equ… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://follow-your-emoji.github.io/

  16. arXiv:2406.00240  [pdf, other

    cs.LG cs.CL cs.CR

    Exploring Vulnerabilities and Protections in Large Language Models: A Survey

    Authors: Frank Weizhen Liu, Chenhui Hu

    Abstract: As Large Language Models (LLMs) increasingly become key components in various AI applications, understanding their security vulnerabilities and the effectiveness of defense mechanisms is crucial. This survey examines the security challenges of LLMs, focusing on two main areas: Prompt Hacking and Adversarial Attacks, each with specific types of threats. Under Prompt Hacking, we explore Prompt Injec… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

  17. arXiv:2405.20576  [pdf, other

    cs.CR

    Federated Graph Analytics with Differential Privacy

    Authors: Shang Liu, Yang Cao, Takao Murakami, Weiran Liu, Seng Pei Liew, Tsubasa Takahashi, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Collaborative graph analysis across multiple institutions is becoming increasingly popular. Realistic examples include social network analysis across various social platforms, financial transaction analysis across multiple banks, and analyzing the transmission of infectious diseases across multiple hospitals. We define the federated graph analytics, a new problem for collaborative graph analytics… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 13 pages

  18. arXiv:2405.20044  [pdf, other

    cs.CV

    A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation

    Authors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, Jintao Zhang, Deyu Meng

    Abstract: The lesion segmentation on endoscopic images is challenging due to its complex and ambiguous features. Fully-supervised deep learning segmentation methods can receive good performance based on entirely pixel-level labeled dataset but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease labeling burden, but heavily strengthen the learning difficulty. To… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures,

  19. arXiv:2405.19751  [pdf, other

    cs.CV cs.AI

    HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

    Authors: Wenxuan Liu, Sai Qian Zhang

    Abstract: Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However,the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobil… ▽ More

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  20. arXiv:2405.19338  [pdf, other

    eess.SP cs.AI cs.CV

    Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

    Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

    Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imag… ▽ More

    Submitted 1 April, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures and tables

  21. arXiv:2405.19133  [pdf, other

    cs.NI

    Preamble Design and Burst-Mode DSP for Upstream Reception of 200G Coherent TDM-PON

    Authors: Haide Wang, Ji Zhou, Jinyang Yang, Zhiyang Liu, Cheng Li, Weiping Liu, Changyuan Yu

    Abstract: Burst-mode DSP based on 10ns preamble is proposed for upstream reception of 200G coherent TDM-PON. The 128-symbol tone preamble is used for SOP, frequency offset, and sampling phase estimation, while the 192-symbol CAZAC preamble is used for frame synchronization and channel estimation.

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: This papaer has been submitted to the ECOC 2024

  22. arXiv:2405.18241  [pdf

    cs.CL cs.AI

    Active Use of Latent Constituency Representation in both Humans and Large Language Models

    Authors: Wei Liu, Ming Xiang, Nai Ding

    Abstract: Understanding how sentences are internally represented in the human brain, as well as in large language models (LLMs) such as ChatGPT, is a major challenge for cognitive science. Classic linguistic theories propose that the brain represents a sentence by parsing it into hierarchically organized constituents. In contrast, LLMs do not explicitly parse linguistic constituents and their latent represe… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 62 pages, 5 figures. Under review

  23. arXiv:2405.18240  [pdf, other

    cs.CV

    MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

    Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

    Abstract: Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, uniform input size conflicts with real-world scenarios where images naturally vary in resol… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  24. arXiv:2405.18004  [pdf, other

    cs.CV

    SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

    Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

    Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency imp… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  25. arXiv:2405.17016  [pdf, other

    cs.CV

    $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

    Authors: Weiquan Wang, Jun Xiao, Chunping Wang, Wei Liu, Zhao Wang, Long Chen

    Abstract: Continuous diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE). Despite their strengths, the need for large search spaces and the corresponding demand for substantial training data make these models prone to generating biomechanically unrealistic poses. This challenge is particularly noticeable… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  26. arXiv:2405.16546  [pdf, other

    cs.IR cs.CL

    Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration

    Authors: Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

    Abstract: The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for rese… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by Findings of ACL 2024; Datasets Link: https://huggingface.co/IR-Cocktail

  27. arXiv:2405.15662  [pdf, other

    cs.LG

    Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning

    Authors: Wenhan Chang, Tianqing Zhu, Heng Xu, Wenjian Liu, Wanlei Zhou

    Abstract: In current AI era, users may request AI companies to delete their data from the training dataset due to the privacy concerns. As a model owner, retraining a model will consume significant computational resources. Therefore, machine unlearning is a new emerged technology to allow model owner to delete requested training data or a class with little affecting on the model performance. However, for la… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  28. arXiv:2405.15622  [pdf, other

    cs.CV

    LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image

    Authors: Ruikai Cui, Xibin Song, Weixuan Sun, Senbo Wang, Weizhe Liu, Shenzhou Chen, Taizhang Shang, Yang Li, Nick Barnes, Hongdong Li, Pan Ji

    Abstract: Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Align… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 19 pages, 10 figures

  29. arXiv:2405.15008  [pdf, other

    cs.SE cs.DB

    An Empirical Study on the Characteristics of Database Access Bugs in Java Applications

    Authors: Wei Liu, Shouvick Mondal, Tse-Hsun Chen

    Abstract: Database-backed applications rely on the database access code to interact with the underlying database management systems (DBMSs). Although many prior studies aim at database access issues like SQL anti-patterns or SQL code smells, there is a lack of study of database access bugs during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access b… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by the ACM Transactions on Software Engineering and Methodology (TOSEM)

  30. arXiv:2405.14201  [pdf, other

    cs.CV

    FreeTuner: Any Subject in Any Style with Training-free Diffusion

    Authors: Youcan Xu, Zhen Wang, Jun Xiao, Wei Liu, Long Chen

    Abstract: With the advance of diffusion models, various personalized image generation methods have been proposed. However, almost all existing work only focuses on either subject-driven or style-driven personalization. Meanwhile, state-of-the-art methods face several challenges in realizing compositional personalization, i.e., composing different subject and style concepts, such as concept disentanglement,… ▽ More

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.13839  [pdf, other

    cs.GR

    Diffusing Winding Gradients (DWG): A Parallel and Scalable Method for 3D Reconstruction from Unoriented Point Clouds

    Authors: Weizhou Liu, Jiaze Li, Xuhui Chen, Fei Hou, Shiqing Xin, Xingce Wang, Zhongke Wu, Chen Qian, Ying He

    Abstract: This paper presents a method for reconstructing watertight 3D surfaces from unoriented point clouds. Starting with randomly initialized normals, the method iteratively refines each normal by diffusing the gradient of the generalized winding number (GWN) field. Upon convergence, the target surface is extracted using the standard Marching Cubes algorithm. Our method is conceptually simple, easy to i… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  32. arXiv:2405.13325  [pdf, other

    cs.CL cs.AI cs.IR

    DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction Model with Slot Querying

    Authors: Guanghui Wang, Dexi Liu, Qizhi Wan, Xiping Liu, Wanlong Liu

    Abstract: Recent advancements in event argument extraction (EAE) involve incorporating beneficial auxiliary information into models during training and inference, such as retrieved instances and event templates. Additionally, some studies introduce learnable prefix vectors to models. These methods face three challenges: (1) insufficient utilization of relevant event instances due to deficiencies in retrieva… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  33. arXiv:2405.13171  [pdf

    cs.HC q-bio.NC

    Launching Your VR Neuroscience Laboratory

    Authors: Ying Choon Wu, Christopher Maymon, Jonathon Paden, Weichen Liu

    Abstract: The proliferation and refinement of affordable virtual reality (VR) technologies and wearable sensors have opened new frontiers in cognitive and behavioral neuroscience. This chapter offers a broad overview of VR for anyone interested in leveraging it as a research tool. In the first section, it examines the fundamental functionalities of VR and outlines important considerations that inform the de… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Journal ref: In Virtual Reality in Behavioral Neuroscience: New Insights and Methods (pp. 25-46). Cham: Springer International Publishing (2023)

  34. arXiv:2405.12540  [pdf, other

    cs.CV cs.MM

    Context-Enhanced Video Moment Retrieval with Large Language Models

    Authors: Weijia Liu, Bo Miao, Jiuxin Cao, Xuelin Zhu, Bo Liu, Mehwish Nasim, Ajmal Mian

    Abstract: Current methods for Video Moment Retrieval (VMR) struggle to align complex situations involving specific environmental details, character descriptions, and action narratives. To tackle this issue, we propose a Large Language Model-guided Moment Retrieval (LMR) approach that employs the extensive knowledge of Large Language Models (LLMs) to improve video context representation as well as cross-moda… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  35. arXiv:2405.12476  [pdf, other

    cs.CV

    Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding

    Authors: Weizhen Liu, Jiayu Tan, Guangyu Lan, Ao Li, Dongye Li, Le Zhao, Xiaohui Yuan, Nanqing Dong

    Abstract: Accurate phenotypic analysis in aquaculture breeding necessitates the quantification of subtle morphological phenotypes. Existing datasets suffer from limitations such as small scale, limited species coverage, and inadequate annotation of keypoints for measuring refined and complex morphological phenotypes of fish body parts. To address this gap, we introduce FishPhenoKey, a comprehensive dataset… ▽ More

    Submitted 31 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/Fish-Phenotype-Detect

  36. arXiv:2405.11732  [pdf

    cs.CV physics.med-ph

    Quality assurance of organs-at-risk delineation in radiotherapy

    Authors: Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu

    Abstract: The delineation of tumor target and organs-at-risk is critical in the radiotherapy treatment planning. Automatic segmentation can be used to reduce the physician workload and improve the consistency. However, the quality assurance of the automatic segmentation is still an unmet need in clinical practice. The patient data used in our study was a standardized dataset from AAPM Thoracic Auto-Segmenta… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 14 pages,5 figures, 3 tables

    MSC Class: 68T07 ACM Class: I.4.9

  37. arXiv:2405.11726  [pdf, other

    cs.RO

    RHAML: Rendezvous-based Hierarchical Architecture for Mutual Localization

    Authors: Gaoming Chen, Kun Song, Xiang Xu, Wenhang Liu, Zhenhua Xiong

    Abstract: Mutual localization serves as the foundation for collaborative perception and task assignment in multi-robot systems. Effectively utilizing limited onboard sensors for mutual localization between marker-less robots is a worthwhile goal. However, due to inadequate consideration of large scale variations of the observed robot and localization refinement, previous work has shown limited accuracy when… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures, submitted to RA-L

  38. arXiv:2405.11335  [pdf, other

    cs.CR

    Detecting Complex Multi-step Attacks with Explainable Graph Neural Network

    Authors: Wei Liu, Peng Gao, Haotian Zhang, Ke Li, Weiyong Yang, Xingshen Wei, Shuji Wu

    Abstract: Complex multi-step attacks have caused significant damage to numerous critical infrastructures. To detect such attacks, graph neural network based methods have shown promising results by modeling the system's events as a graph. However, existing methods still face several challenges when deployed in practice. First, there is a lack of sufficient real attack data especially considering the large vo… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Corresponding author: Peng Gao (gao.itslab@gmail.com)

  39. arXiv:2405.10925  [pdf

    stat.ME cs.AI cs.LG

    High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

    Authors: Janick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. Desai

    Abstract: Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  40. arXiv:2405.10596  [pdf, other

    cs.IR

    CELA: Cost-Efficient Language Model Alignment for CTR Prediction

    Authors: Xingmei Wang, Weiwen Liu, Xiaolong Chen, Qi Liu, Xu Huang, Defu Lian, Xiangyang Li, Yasheng Wang, Zhenhua Dong, Ruiming Tang

    Abstract: Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. The prevailing ID-based paradigm underperforms in cold-start scenarios due to the skewed distribution of feature frequency. Additionally, the utilization of a single modality fails to exploit the knowledge contained within textual features. Recent efforts have sought to mitigate these challenges by integrating P… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

    MSC Class: 68T07

  41. arXiv:2405.10300  [pdf, other

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o… ▽ More

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  42. arXiv:2405.10041  [pdf, other

    cs.CV

    Revealing Hierarchical Structure of Leaf Venations in Plant Science via Label-Efficient Segmentation: Dataset and Method

    Authors: Weizhen Liu, Ao Li, Ze Wu, Yue Li, Baobin Ge, Guangyu Lan, Shilin Chen, Minghe Li, Yunfei Liu, Xiaohui Yuan, Nanqing Dong

    Abstract: Hierarchical leaf vein segmentation is a crucial but under-explored task in agricultural sciences, where analysis of the hierarchical structure of plant leaf venation can contribute to plant breeding. While current segmentation techniques rely on data-driven models, there is no publicly available dataset specifically designed for hierarchical leaf vein segmentation. To address this gap, we introdu… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/HALVS-Hierarchical-Vein-Segment.git

  43. arXiv:2405.09779  [pdf, other

    cs.RO

    Integrating Uncertainty-Aware Human Motion Prediction into Graph-Based Manipulator Motion Planning

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for mani… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  44. arXiv:2405.09113  [pdf, ps, other

    cs.LG

    Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

    Authors: Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

    Abstract: Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and prog… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  45. arXiv:2405.08816  [pdf, other

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  46. arXiv:2405.08748  [pdf, other

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu , et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  47. arXiv:2405.08458  [pdf, other

    cs.CV

    Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

    Authors: Jin Wang, Bingfeng Zhang, Jian Pang, Honglong Chen, Weifeng Liu

    Abstract: Few-shot segmentation remains challenging due to the limitations of its labeling information for unseen classes. Most previous approaches rely on extracting high-level feature maps from the frozen visual encoder to compute the pixel-wise similarity as a key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes si… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024; The camera-ready version

  48. arXiv:2405.08345  [pdf, other

    cs.RO

    Multi-Robot Rendezvous in Unknown Environment with Limited Communication

    Authors: Kun Song, Gaoming Chen, Wenhang Liu, Zhenhua Xiong

    Abstract: Rendezvous aims at gathering all robots at a specific location, which is an important collaborative behavior for multirobot systems. However, in an unknown environment, it is challenging to achieve rendezvous. Previous researches mainly focus on special scenarios where communication is not allowed and each robot executes a random searching strategy, which is highly time-consuming, especially in la… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Submit to RAL. 8 pages, 6 figures

  49. arXiv:2405.07962  [pdf, other

    cs.RO eess.SY

    KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different a… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  50. arXiv:2405.05876  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Composable Part-Based Manipulation

    Authors: Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu

    Abstract: In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of differen… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Presented at CoRL 2023. For videos and additional results, see our website: https://cpmcorl2023.github.io/