Downloads: 58 | Citations: 0 | Reads: 52
Abstract: Multi-view stereo (MVS) is an important task in computer vision that aims to recover the structure of a scene from images captured at multiple viewpoints. However, cost-volume aggregation suffers from severe local inconsistency, so directly aggregating geometrically adjacent costs can be badly misleading. Existing methods either seek an optimal selective aggregation in 2D space or add further aggregation mechanisms, but neither resolves the geometric inconsistency of the cost volume, which degrades the accuracy and robustness of depth estimation. To address this problem, we propose Collaborative Representation for Multi-View Stereo (CRMVS), which coordinates multiple modules to integrate geometric-consistency information and thereby improve the accuracy and robustness of depth estimation in multi-view stereo matching. First, an improved Feature Pyramid Network (FPN) strengthens the network's feature-extraction capability. Second, a Progressive Weight Network module (PWN) constructs the cost volume. Finally, a Geometric Cost aggregation and Refinement module (GCR) aggregates the cost volume precisely. Experimental results show that the method achieves state-of-the-art performance on the DTU and Tanks & Temples datasets.
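The abstract describes a standard learned-MVS pipeline: extract per-view features, build a cost volume over depth hypotheses, aggregate it, and regress a depth map. A minimal sketch of the cost-volume and depth-regression step, assuming source-view features have already been warped to the reference view (the function names, tensor shapes, and depth range below are illustrative assumptions, not the paper's actual implementation, which additionally uses the PWN and GCR modules):

```python
import numpy as np

def variance_cost_volume(warped_feats):
    # warped_feats: (V, D, C, H, W) — V source views' features, assumed
    # pre-warped to the reference view at D depth hypotheses (a real
    # pipeline would warp via differentiable plane-sweep homographies).
    mean = warped_feats.mean(axis=0)                 # (D, C, H, W)
    var = ((warped_feats - mean) ** 2).mean(axis=0)  # (D, C, H, W)
    return var.mean(axis=1)                          # (D, H, W) cost per depth

def soft_argmin_depth(cost, depth_values):
    # Turn costs into a probability volume, then regress expected depth.
    prob = np.exp(-cost)
    prob /= prob.sum(axis=0, keepdims=True)          # softmax over depth axis
    return (prob * depth_values[:, None, None]).sum(axis=0)  # (H, W)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 4, 16, 16))       # toy V=3, D=8, C=4
depths = np.linspace(425.0, 935.0, 8)                # DTU-style depth range
cost = variance_cost_volume(feats)
depth_map = soft_argmin_depth(cost, depths)
print(depth_map.shape)  # (16, 16)
```

Lower variance across views marks a better-matching depth hypothesis; the soft argmin makes the whole step differentiable so it can be trained end-to-end.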
Basic information:
DOI:
CLC number: TP391.41
Citation:
[1] Zhu Zhinian, Liu Yunting, Xiao Peiyu, et al. Research on a multi-stage collaborative multi-view stereo algorithm[J]. Communication and Information Technology, 2025, No.274(02): 28-32.
Funding: