
Showing 1–50 of 106 results for author: Yin, S

Searching in archive cs. Search in all archives.
  1. arXiv:2405.18132  [pdf, other]

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.17221  [pdf, other]

    cs.AI cs.AR

    Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

    Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

    Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticat…

    Submitted 21 May, 2024; originally announced May 2024.

  3. arXiv:2405.15223  [pdf, other]

    cs.CV cs.LG cs.RO

    iVideoGPT: Interactive VideoGPTs are Scalable World Models

    Authors: Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long

    Abstract: World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity poses challenges in harnessing recent advancements in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer fram…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2405.07551  [pdf, other]

    cs.CL cs.AI

    MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

    Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

    Abstract: The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we firstly in…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

  5. arXiv:2405.06887  [pdf, other]

    cs.CV

    FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

    Authors: Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng

    Abstract: Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they harshly suffer from low credibility and interpretability, thus insufficient for stringent applications, such as Olympic diving events. We argue that a fine-grained understanding of actions requi…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024

  6. arXiv:2405.05722  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

    Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities, such as the electronic-structure Hamiltonian. Inspired by covariant theory in physics, we address this problem by exploring the mathemat…

    Submitted 9 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2405.02155  [pdf, other]

    cs.CV

    Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification

    Authors: Siqi Yin, Lifan Jiang

    Abstract: This paper introduces a novel framework for zero-shot learning (ZSL), i.e., to recognize new categories that are unseen during training, by using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to enhance the model's performance to handle ZSL: 1) Utilizing the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to…

    Submitted 3 May, 2024; originally announced May 2024.

  8. arXiv:2404.18612  [pdf]

    cs.RO

    Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

    Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

    Abstract: Environment awareness is crucial for enhancing the walking safety and stability of amputees wearing powered prostheses when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prostheses only provide terrain types and corresponding parameters, which fails to prevent potential collisions when crossing uneven terrains and may lead to falls and oth…

    Submitted 29 April, 2024; originally announced April 2024.

  9. Social Force Embedded Mixed Graph Convolutional Network for Multi-class Trajectory Prediction

    Authors: Quancheng Du, Xiao Wang, Shouguo Yin, Lingxi Li, Huansheng Ning

    Abstract: Accurate prediction of agent motion trajectories is crucial for autonomous driving, contributing to the reduction of collision risks in human-vehicle interactions and ensuring ample response time for other traffic participants. Current research predominantly focuses on traditional deep learning methods, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These meth…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 figures, published in IEEE Transactions on Intelligent Vehicles

  10. arXiv:2404.12104  [pdf, other]

    cs.CV cs.CL cs.LG

    Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

    Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

    Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 42 pages, 17 figures, 29 tables

  11. arXiv:2404.06762  [pdf, other]

    cs.CL cs.HC

    Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

    Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

    Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantl…

    Submitted 10 April, 2024; originally announced April 2024.

  12. arXiv:2404.06194  [pdf, other]

    cs.CV

    Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

    Authors: Ting Lei, Shaofeng Yin, Yang Liu

    Abstract: Open-vocabulary human-object interaction (HOI) detection, which is concerned with the problem of detecting novel HOIs guided by natural language, is crucial for understanding human-centric scenes. However, prior zero-shot HOI detectors often employ the same levels of feature maps to model HOIs with varying distances, leading to suboptimal performance in scenes containing human-object pairs with a…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  13. arXiv:2404.03429  [pdf, other]

    cs.CL

    Scaffolding Language Learning via Multi-modal Tutoring Systems with Pedagogical Instructions

    Authors: Zhengyuan Liu, Stella Xin Yin, Carolyn Lee, Nancy F. Chen

    Abstract: Intelligent tutoring systems (ITSs) that imitate human tutors and aim to provide immediate and customized instructions or feedback to learners have shown their effectiveness in education. With the emergence of generative artificial intelligence, large language models (LLMs) further entitle the systems to complex and coherent conversational interactions. These systems would be of great help in lang…

    Submitted 4 April, 2024; originally announced April 2024.

  14. arXiv:2403.04481  [pdf, other]

    cs.CL cs.AI

    Do Large Language Model Understand Multi-Intent Spoken Language ?

    Authors: Shangjian Yin, Peijie Huang, Yuhong Xu, Haojing Huang, Jiatian Chen

    Abstract: This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of…

    Submitted 15 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  15. arXiv:2403.03742  [pdf, other]

    cs.HC

    Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design

    Authors: Ruotong Zou, Shuyu Yin, Tianqi Song, Peinuan Qin, Yi-Chieh Lee

    Abstract: As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape r…

    Submitted 6 March, 2024; originally announced March 2024.

  16. arXiv:2403.00019  [pdf, other]

    cs.LG stat.ML

    Transformer-based Parameter Estimation in Statistics

    Authors: Xiaoxin Yin, David S. Yin

    Abstract: Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for Gaussian distribution), or by iterative numerical methods such as Newton-Raphson method when closed-form solution does not…

    Submitted 27 February, 2024; originally announced March 2024.

  17. arXiv:2402.16899  [pdf, other]

    cs.LG cs.AI

    A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning

    Authors: Shuyu Yin, Qixuan Zhou, Fei Wen, Tao Luo

    Abstract: Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, are unable to directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method that is a…

    Submitted 7 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  18. GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame

    Authors: Ke Li, Ruidong Zhang, Boao Chen, Siyuan Chen, Sicheng Yin, Saif Mahmud, Qikang Liang, François Guimbretière, Cheng Zhang

    Abstract: In this paper, we present GazeTrak, the first acoustic-based eye tracking system on glasses. Our system only needs one speaker and four microphones attached to each side of the glasses. These acoustic sensors capture the formations of the eyeballs and the surrounding areas by emitting encoded inaudible sound towards eyeballs and receiving the reflected signals. These reflected signals are further…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, 7 tables, The 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2024)

  19. arXiv:2402.10534  [pdf, other]

    cs.CV

    Using Left and Right Brains Together: Towards Vision and Language Planning

    Authors: Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

    Abstract: Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking the vision and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking pro…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 19 pages, 13 figures

  20. arXiv:2402.02140  [pdf, other]

    cs.CV eess.IV

    Generative Visual Compression: A Review

    Authors: Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye

    Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promi…

    Submitted 3 February, 2024; originally announced February 2024.

  21. arXiv:2402.01271  [pdf, other]

    eess.AS cs.SD

    An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

    Authors: Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

    Abstract: Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: INTERSPEECH 2023

  22. arXiv:2401.17840  [pdf, other]

    cs.SI

    Propagation Dynamics of Rumor vs. Non-rumor across Multiple Social Media Platforms Driven by User Characteristics

    Authors: Dongpeng Hou, Shu Yin, Chao Gao, Xianghua Li, Zhen Wang

    Abstract: Studying information propagation dynamics in social media can elucidate user behaviors and patterns. However, previous research often focuses on single platforms and fails to differentiate between the nuanced roles of source users and other participants in cascades. To address these limitations, we analyze propagation cascades on Twitter and Weibo combined with a crawled dataset of nearly one mill…

    Submitted 31 January, 2024; originally announced January 2024.

  23. arXiv:2401.17409  [pdf, other]

    cs.HC

    EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband

    Authors: Chi-Jung Lee, Ruidong Zhang, Devansh Agarwal, Tianhong Catherine Yu, Vipin Gunda, Oliver Lopez, James Kim, Sicheng Yin, Boao Dong, Ke Li, Mose Sakashita, Francois Guimbretiere, Cheng Zhang

    Abstract: Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible s…

    Submitted 29 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  24. arXiv:2401.17093  [pdf, other]

    cs.CV cs.CL

    StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

    Authors: Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

    Abstract: To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natura…

    Submitted 30 January, 2024; originally announced January 2024.

  25. arXiv:2401.00744  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci cs.LG

    Harmonizing SO(3)-Equivariance with Neural Expressiveness: a Hybrid Deep Learning Framework Oriented to the Prediction of Electronic Structure Hamiltonian

    Authors: Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He

    Abstract: Deep learning for predicting the electronic structure Hamiltonian of quantum systems necessitates satisfying the covariance laws, among which achieving SO(3)-equivariance without sacrificing the non-linear expressive capability of networks remains unsolved. To navigate the harmonization between equivariance and expressiveness, we propose a deep learning method, namely HarmoSE, synergizing two dist…

    Submitted 4 May, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  26. arXiv:2312.11820  [pdf, other]

    cs.AR

    SoC-Tuner: An Importance-guided Exploration Framework for DNN-targeting SoC Design

    Authors: Shixin Chen, Su Zheng, Chen Bai, Wenqian Zhao, Shuo Yin, Yang Bai, Bei Yu

    Abstract: Designing a system-on-chip (SoC) for deep neural network (DNN) acceleration requires balancing multiple metrics such as latency, power, and area. However, most existing methods ignore the interactions among different SoC components and rely on inaccurate and error-prone evaluation tools, leading to inferior SoC design. In this paper, we present SoC-Tuner, a DNN-targeting exploration framework to f…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: ASP-DAC 2024

  27. arXiv:2312.05739  [pdf, other]

    cs.SI cs.AI

    GAMC: An Unsupervised Method for Fake News Detection using Graph Autoencoder with Masking

    Authors: Shu Yin, Chao Gao, Zhen Wang

    Abstract: With the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Although deep learning methods like CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection, they primarily focus on content, overlooking social context during news propagation. Graph-based techniques have in…

    Submitted 9 December, 2023; originally announced December 2023.

    Journal ref: The Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

  28. arXiv:2311.15030  [pdf, other]

    cs.RO

    Tuning-free Quasi-stiffness Control Framework of a Powered Transfemoral Prosthesis for Task-adaptive Walking

    Authors: Teng Ma, Shucong Yin, Zhimin Hou, Binxin Huang, Haoyong Yu, Chenglong Fu

    Abstract: Impedance-based control represents a prevalent strategy in the development of powered transfemoral prostheses. However, creating a task-adaptive, tuning-free controller that effectively generalizes across diverse locomotion modes and terrain conditions continues to be a significant challenge. This letter proposes a tuning-free and task-adaptive quasi-stiffness control framework for powered prosthe…

    Submitted 26 March, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 8 pages, 10 figures. This work has been submitted to the IEEE-RAL for possible publication

  29. arXiv:2311.00241  [pdf, other]

    cs.CV

    1DFormer: a Transformer Architecture Learning 1D Landmark Representations for Facial Landmark Tracking

    Authors: Shi Yin, Shijie Huan, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, Baocai Yin, Cong Liu

    Abstract: Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks to track facial landmarks. To address this limitation, we propose a Transformer architec…

    Submitted 1 February, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

  30. arXiv:2310.16045  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Woodpecker: Hallucination Correction for Multimodal Large Language Models

    Authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

    Abstract: Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introd…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 16 pages, 7 figures. Code Website: https://github.com/BradyFU/Woodpecker

  31. arXiv:2310.11901  [pdf, other]

    cs.CR

    Malicious Agent Detection for Robust Multi-Agent Collaborative Perception

    Authors: Yangheng Zhao, Zhen Xiang, Sheng Yin, Xianghe Pang, Siheng Chen, Yanfeng Wang

    Abstract: Recently, multi-agent collaborative (MAC) perception has been proposed and outperformed the traditional single-agent perception in many applications, such as autonomous driving. However, MAC perception is more vulnerable to adversarial attacks than single-agent perception due to the information exchange. The attacker can easily degrade the performance of a victim agent by sending harmful informati…

    Submitted 18 October, 2023; originally announced October 2023.

  32. arXiv:2310.09568  [pdf, other]

    cs.AR

    Wafer-scale Computing: Advancements, Challenges, and Future Perspectives

    Authors: Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin

    Abstract: Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of…

    Submitted 14 October, 2023; originally announced October 2023.

    ACM Class: B.7.0; C.1

  33. arXiv:2309.17446  [pdf, other]

    cs.CL cs.LG cs.PL cs.SE

    L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

    Authors: Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

    Abstract: Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising results, there is a notable lack of a comprehensive evaluation of these models' language-to-code generation capabilities. Existing studies often focus on specific task…

    Submitted 2 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Project Website: https://l2c-eval.github.io/

  34. arXiv:2309.16251  [pdf, other]

    cs.HC

    The effect of 3D stereopsis and hand-tool alignment on learning effectiveness and skill transfer of a VR-based simulator for dental training

    Authors: Maximilian Kaluschke, Myat Su Yin, Peter Haddawy, Siriwan Suebnukarn, Gabriel Zachmann

    Abstract: Dental simulators gained prevalence in recent years. Important aspects distinguishing VR hardware configurations are 3D stereoscopic rendering and visual alignment of the user's hands with the virtual tools. New dental simulators are often evaluated without analysing the impact of these simulation aspects. In this paper, we seek to determine the impact of 3D stereoscopic rendering and of hand-tool…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 26 pages, 15 figures, Accepted at online journal PLoS ONE

    MSC Class: 62A86 (Primary) 62H30 (Secondary) ACM Class: J.3; G.3

  35. arXiv:2309.02855  [pdf, other]

    cs.CV eess.IV

    Bandwidth-efficient Inference for Neural Image Compression

    Authors: Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

    Abstract: With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable bandwidth efficient neural inference method with the activation compressed by neural data compression method.…

    Submitted 6 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures, submitted to ICASSP 2024

    MSC Class: 68U10(primary); 94A08 68T07(secondary) ACM Class: I.2.6; I.4.2

  36. arXiv:2309.01273  [pdf, other]

    cs.AR eess.SY

    WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow

    Authors: Haojia Hui, Jiangyuan Gu, Xunbo Hu, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin

    Abstract: With the cross-fertilization of applications and the ever-increasing scale of models, the efficiency and productivity of hardware computing architectures have become inadequate. This inadequacy further exacerbates issues in design flexibility, design complexity, development cycle, and development costs (4-d problems) in divergent scenarios. To address these challenges, this paper proposed a flexib…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: 7 pages, 10 figures

  37. arXiv:2308.13785  [pdf, other]

    cs.CV

    ORES: Open-vocabulary Responsible Visual Synthesis

    Authors: Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

    Abstract: Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avo…

    Submitted 26 August, 2023; originally announced August 2023.

  38. arXiv:2308.08089  [pdf, other]

    cs.CV

    DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

    Authors: Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

    Abstract: Controllable video generation has gained significant attention in recent years. However, two main limitations persist: Firstly, most existing works focus on either text, image, or trajectory-based control, leading to an inability to achieve fine-grained control in videos. Secondly, trajectory control research is still in its early stages, with most experiments being conducted on simple datasets li…

    Submitted 15 August, 2023; originally announced August 2023.

  39. Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane

    Authors: Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin

    Abstract: Spatial architecture is a high-performance architecture that uses control flow graphs and data flow graphs as the computational model and producer/consumer models as the execution models. However, existing spatial architectures suffer from control flow handling challenges. Upon categorizing their PE execution models, we find that they lack autonomous, peer-to-peer, and temporally loosely-coupled c…

    Submitted 19 September, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    ACM Class: C.1.3; F.1.2

  40. arXiv:2306.13549  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    A Survey on Multimodal Large Language Models

    Authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

    Abstract: Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial…

    Submitted 1 April, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

  41. arXiv:2305.17706  [pdf, other]

    cs.SD cs.AI eess.AS

    Spot keywords from very noisy and mixed speech

    Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin

    Abstract: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywo…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  42. arXiv:2305.10769  [pdf, other]

    cs.LG cs.CV

    Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling

    Authors: Shitong Shao, Xu Dai, Shouyi Yin, Lujun Li, Huanran Chen, Yang Hu

    Abstract: Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes the possibility of real-time sample synthesis. Traditional accelerated sampling algorithms via knowledge distillation rely on pre-trained model weights and discrete…

    Submitted 13 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  43. arXiv:2304.06883  [pdf, other]

    cs.IT eess.SP

    Intelligent Reflecting Surface Aided Wireless Communication Systems: Joint Location and Passive Beamforming Design

    Authors: Jintao Luo, Sixing Yin

    Abstract: Intelligent reflecting surface (IRS) has been widely studied in recent years, it has emerged as a new technology which can reflect the incident signal by intelligently configuring the reflection elements, thus changing the signal propagation environment, enhancing the signals users desire and suppressing the interference between users. In this paper, we study an IRS aided multi-users wireless comm…

    Submitted 5 May, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Following the publication of our work, we identified errors in our data analysis process. To uphold the standards of academic integrity and the accuracy of our findings, we feel it necessary to withdraw the current version of our paper. We plan to submit a revised version upon thorough review and correction of these errors

  44. arXiv:2303.12346  [pdf, other]

    cs.CV cs.AI

    NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

    Authors: Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

    Abstract: In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation. Most current work generates long videos segment by segment sequentially, which normally leads to the gap between training on short videos and inferring long videos, and the sequential generation is inefficient. Instead, our approach adopts a ``coarse-to-fine'' process, in which the…

    Submitted 22 March, 2023; originally announced March 2023.

  45. arXiv:2303.09114  [pdf, other]

    cs.CV

    AU-aware graph convolutional network for Macro- and Micro-expression spotting

    Authors: Shukang Yin, Shiwei Wu, Tong Xu, Shifeng Liu, Sirui Zhao, Enhong Chen

    Abstract: Automatic Micro-Expression (ME) spotting in long videos is a crucial step in ME analysis but also a challenging task due to the short duration and low intensity of MEs. When solving this problem, previous works generally lack in considering the structures of human faces and the correspondence between expressions and relevant facial muscles. To address this issue for better performance of ME spotti…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted by ICME-2023

  46. arXiv:2303.04671  [pdf, other]

    cs.CV

    Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    Authors: Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

    Abstract: ChatGPT is attracting a cross-field interest as it provides a language interface with remarkable conversational competency and reasoning capabilities across many domains. However, since ChatGPT is trained with languages, it is currently not capable of processing or generating images from the visual world. At the same time, Visual Foundation Models, such as Visual Transformers or Stable Diffusion,…

    Submitted 8 March, 2023; originally announced March 2023.

  47. arXiv:2303.04086  [pdf, other]

    cs.GR cs.CV cs.DC

    NEPHELE: A Neural Platform for Highly Realistic Cloud Radiance Rendering

    Authors: Haimin Luo, Siyuan Zhang, Fuqiang Zhao, Haotian Jing, Penghao Wang, Zhenxiao Yu, Dongxue Yan, Junran Ding, Boyuan Zhang, Qiang Hu, Shu Yin, Lan Xu, Jingyi Yu

    Abstract: We have recently seen tremendous progress in neural rendering (NR) advances, i.e., NeRF, for photo-real free-view synthesis. Yet, as a local technique based on a single computer/GPU, even the best-engineered Instant-NGP or i-NGP cannot reach real-time performance when rendering at a high resolution, and often requires huge local computing resources. In this paper, we resort to cloud rendering and…

    Submitted 7 March, 2023; originally announced March 2023.

  48. arXiv:2302.10781  [pdf, other]

    cs.CV

    Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

    Authors: Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

    Abstract: 3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: 10 pages, 7 figures

  49. High-Dimensional Yield Estimation using Shrinkage Deep Features and Maximization of Integral Entropy Reduction

    Authors: Shuo Yin, Guohao Dai, Wei W. Xing

    Abstract: Despite the fast advances in high-sigma yield analysis with the help of machine learning techniques in the past decade, one of the main challenges, the curse of dimensionality, which is inevitable when dealing with modern large-scale circuits, remains unsolved. To resolve this challenge, we propose an absolute shrinkage deep kernel learning, ASDK, which automatically identifies the dominant proces…

    Submitted 5 December, 2022; originally announced December 2022.

    MSC Class: 68U07; ACM Class: J.6

    Journal ref: ASPDAC 2023, January, Tokyo, Japan

  50. arXiv:2211.15235  [pdf, other]

    cs.CV

    Reducing Domain Gap in Frequency and Spatial domain for Cross-modality Domain Adaptation on Medical Image Segmentation

    Authors: Shaolei Liu, Siqi Yin, Linhao Qu, Manning Wang

    Abstract: Unsupervised domain adaptation (UDA) aims to learn a model trained on source domain and performs well on unlabeled target domain. In medical image segmentation field, most existing UDA methods depend on adversarial learning to address the domain gap between different image modalities, which is ineffective due to its complicated training process. In this paper, we propose a simple yet effective UDA…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)