Scale-Adaptive Convolutions for Scene Parsing.

multi-scale features and the spatial inhomogeneity of a scene. ... Recently, methods based on fully convolutional neural networks have achieved new records on scene parsing. ... aggregating these features for a better solution for scene parsing. ...

arXiv:1806.07049v1 fatcat:hrn7trkgizhp5mlwknybinn3ou

In this paper, we propose a Content-Adaptive Scale Interaction Network (CASINet) to exploit the multi-scale features for scene parsing. ... We achieve state-of-the-art performance on three scene parsing benchmarks Cityscapes, ADE20K and LIP. ... To address the above problems, we propose a Content-Adaptive Scale Interaction Network (CASINet) to adaptively exploit multi-scale features for scene parsing. ...

arXiv:1904.08170v3 fatcat:rxeulrn225gerdd5mf5mgamqum

Multiple Versions

Exploiting global contextual information has been shown useful for improving performance of scene parsing and hence is widely used. ... In this paper, unlike previous work that captures long-range dependencies with multi-scale feature fusion or attention mechanism, we address the scene parsing tasks by aggregating rich contextual information ... LIU AND YIN: DUAL GRAPH-BASED CONTEXT AGGREGATION FOR SCENE PARSING ...

dblp:conf/bmvc/LiuY21 fatcat:xbwdb4pkvvgijdwdh7bxsslcuy

Extensive experiments on ADE20K and Cityscapes datasets well demonstrate the effectiveness of the two refinement methods for refining scene parsing predictions. ... Most of existing scene parsing methods suffer from the serious problems of both inconsistent parsing results and object boundary shift. ... neural network within a uniform framework for scene parsing. ...

doi:10.24963/ijcai.2017/479 dblp:conf/ijcai/ZhangTLLY17 fatcat:2zclvahdt5f3thmvxuchozngui

Each position on the feature map is connected to all the other ones through a self-adaptively learned attention mask. Moreover, information propagation in bi-direction for scene parsing is enabled. ... complex scenes. ... We thank Sensetime Research for providing computing resources. ...

doi:10.1007/978-3-030-01240-3_17 fatcat:mqwwq7mr5nfwxigtdibwwgobey

Recent works attempt to improve scene parsing performance by exploring different levels of contexts, and typically train a well-designed convolutional network to exploit useful contexts across all pixels ... Furthermore, we import multiple such modules to build several adaptive context blocks in different levels of network to obtain a coarse-to-fine result. ... Most recent approaches for scene parsing are based on Fully Convolutional Networks (FCNs) [24] . However, there are two limitations in FCN framworks. ...

arXiv:1911.01664v1 fatcat:mdw3adrzkbfe7hfw2akvlo45oi

We propose the method that uses only computer graphics datasets to parse the real world 3D scenes. 3D scene parsing based on semantic segmentation is required to implement the categorical interaction in ... Our application performs accurate 3D scene parsing in real-time on an actual room. ... They all evaluate performance on the dataset of driving scene parsing. Our approach aims for indoor scene parsing which has domain shift between computer graphics data and real data. ...

arXiv:1812.03622v2 fatcat:f3p3x42u5vcfre4d7tooh2hkoq

Multiple Versions

To deal with these dilemmas, in this paper, we propose a multi-branch adaptive hard region mining network (MBANet) for urban scene parsing of HRRSIs. ... In our experiments, the three branches complemented each other in feature extraction and demonstrated state-of-the-art performance for urban scene parsing of HRRSIs. ... Conclusions This paper proposes a multi-branch adaptive hard region mining network for urban scene parsing of HRRSIs. ...

doi:10.3390/rs14215527 fatcat:zfpbtfi3vre3fpx6sehgwc7wte

DOAJ Szczepanski

Furthermore, considering the vast difference in instance sizes, we propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes owing ... We perform a detailed analysis of the dataset and comprehensively evaluate previous popular scene parsing methods, instance segmentation methods and unsupervised domain adaption methods. ... [57] first proposed a fully convolutional network (FCN) that generates dense predictions for scene parsing. ...

arXiv:2303.02835v2 fatcat:jyfdzowu4jhzfhhgby5vje4iue

Open Access Multiple Versions

steering parsing architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video ... scene parsing performance. ... Related Work Recent image scene parsing progress is mostly stimulated by various new CNN architectures, including the fully convolutional architecture (FCN) with multi-scale or larger receptive fields ...

arXiv:1612.00119v2 fatcat:penm6nsyubevhptaeuar6awx7y

Multiple Versions

Recently, Fully Convolutional Network (FCN) seems to be the go-to architecture for image segmentation, including semantic scene parsing. ... To address these limitations, in this paper we propose a novel Deep Multiphase Level Set (DMLS) method for semantic scene parsing, which efficiently incorporates multiphase level sets into deep neural ... In summary, our contributions are three folds: • We propose a novel MLS framework for large-scale multi-class scene parsing. ...

arXiv:1910.03166v2 fatcat:7viff6ta75hy7fgg5rnxcya2wa

Multiple Versions

Most of the current solutions employ generic image parsing models that treat all scales and locations in the images equally and do not consider the geometry property of car-captured urban scene images. ... Thus, they suffer from heterogeneous object scales caused by perspective projection of cameras on actual scenes and inevitably encounter parsing failures on distant objects as well as other boundary and ... FCN in FoveaNet FoveaNet is based on the fully convolutional network (FCN) [25] for parsing the images. As a deeper CNN model benefits more for the parsing performance, we here follow Chen et al. ...

arXiv:1708.02421v1 fatcat:r57jbwtnsrbb3oodjlgf7tmc44

Since events may occur in auditory and visual modalities, multimodal detailed perception is essential for complete scene comprehension. ... However, they do not consider semantic information at multiple scales, which makes the model difficult to localize events in different lengths. ... Our model captures and integrates multimodal pyramid features in distinct temporal scales for comprehensive scene understanding. ...

arXiv:2111.12374v2 fatcat:urygwtn3bjgednej3vz5gvm77i

Multiple Versions

Index Terms -scene parsing; object window; deep convolutional network; SIFT flow 1. ... In this paper, we focus on improving scene parsing accuracy in these two issues. ... This demonstrates the effectiveness of using deep convolutional features as visual scene descriptor. ...

doi:10.1109/icip.2015.7351134 dblp:conf/icip/MaHH15 fatcat:6wbatlo6onfvvdcto2zhmhuz3y

Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. ... Finally, we explore the future trends and challenges of semantic image parsing. ... Zhao et al. proposed a superior framework−−pyramid scene parsing network (PSPNet) [1] −−for scene parsing in complex scenes. ...

doi:10.1007/s11704-018-7195-8 fatcat:p5hvfwhl5rbork5vf4rpnx3h6u

Multiple Versions

MoE-SPNet: A Mixture-of-Experts Scene Parsing Network [article]

Preserved Fulltext

CaseNet: Content-Adaptive Scale Interaction Networks for Scene Parsing [article]

Preserved Fulltext

Other Versions

Dual Graph-Based Context Aggregation for Scene Parsing

Preserved Fulltext

Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions

Preserved Fulltext

PSANet: Point-wise Spatial Attention Network for Scene Parsing [chapter]

Preserved Fulltext

Adaptive Context Network for Scene Parsing [article]

Preserved Fulltext

3D Scene Parsing via Class-Wise Adaptation [article]

Preserved Fulltext

Other Versions

Multi-Branch Adaptive Hard Region Mining Network for Urban Scene Parsing of High-Resolution Remote-Sensing Images

Preserved Fulltext

Traffic Scene Parsing through the TSP6K Dataset [article]

Preserved Fulltext

Other Versions

Video Scene Parsing with Predictive Feature Learning [article]

Preserved Fulltext

Other Versions

Deep Multiphase Level Set for Scene Parsing [article]

Preserved Fulltext

Other Versions

FoveaNet: Perspective-aware Urban Scene Parsing [article]

Preserved Fulltext

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing [article]

Preserved Fulltext

Other Versions

Nonparametric scene parsing with deep convolutional features and dense alignment

Preserved Fulltext

Learning deep representations for semantic image parsing: a comprehensive overview

Preserved Fulltext