Traffic Measurement Optimization for Cross-Domain Ad Hoc Networks Based on Meta-Learning and Reinforcement Learning
Abstract: A cross-domain Ad Hoc network self-organizes nodes that communicate over different media and adapts its topology as conditions change. In cross-domain communication networks, direct measurement yields accurate end-to-end network traffic information. However, some nodes in a cross-domain network have low computational power and limited storage, so the traffic measurement process cannot run on every node. To address this issue, a network traffic measurement optimization method based on meta-learning and proximal policy optimization (PPO) is proposed. The method uses the network operating environment of the previous time slot to determine the set of nodes that perform traffic measurement in the next time slot. The goal is to obtain as much network traffic information as possible while running the measurement process on as few nodes as possible. The method is validated by simulation on three network datasets. The experimental results show that the traffic measurement optimization algorithm for cross-domain Ad Hoc networks based on meta-learning and reinforcement learning effectively selects the nodes that carry large traffic volumes, converges quickly, and measures efficiently.
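The trade-off stated in the abstract (few measuring nodes, much observed traffic) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Flow` representation, the function names, the example flows, and in particular the linear penalty `cost_weight`. The abstract does not specify the reward actually used by the method.

```python
from typing import List, Set, Tuple

# Hypothetical OD flow: (path as a list of node ids, traffic volume in the previous slot).
Flow = Tuple[List[int], float]


def measured_fraction(flows: List[Flow], monitors: Set[int]) -> float:
    """Fraction of total traffic that crosses at least one measuring node."""
    total = sum(volume for _, volume in flows)
    if total == 0:
        return 0.0
    seen = sum(volume for path, volume in flows if monitors.intersection(path))
    return seen / total


def reward(flows: List[Flow], monitors: Set[int], num_nodes: int,
           cost_weight: float = 0.5) -> float:
    """Assumed reward: traffic coverage minus a penalty on the share of nodes used."""
    return measured_fraction(flows, monitors) - cost_weight * len(monitors) / num_nodes


# Illustrative data only: three flows over seven nodes.
flows = [([0, 1, 2], 6.0), ([3, 1, 4], 3.0), ([5, 6], 1.0)]
print(measured_fraction(flows, {1}))
print(reward(flows, {1}, num_nodes=7))
```

In this toy case, measuring only at node 1 observes two of the three flows, and under the assumed penalty it scores higher than measuring at all seven nodes.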
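Table 1. Parameters of NS3 simulated wireless Ad Hoc network

| Parameter | Value |
| --- | --- |
| Number of nodes | $7 \times 7$ |
| Network run time/s | 2700 |
| Mobility model | Gauss-Markov model |
| Network movement boundary | Confined to a $450 \times 450 \times 20$ cuboid |
| Single-node movement boundary | Confined to a $50 \times 50 \times 20$ cuboid |
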
Table 2. Structural parameters for measurement optimization based on meta-learning with PPO network
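
| Model | Layer | Parameter values |
| --- | --- | --- |
| Policy network | Input layer | Number of neurons: M (M is the total number of OD flows) |
| Policy network | FC-1, FC-2, FC-3: fully connected layers | Number of neurons: 100; activation function: ReLU |
| Policy network | Output layer | Number of neurons: N (N is the number of nodes); activation function: SoftMax |
| Critic network | Input layer | Number of neurons: M |
| Critic network | FC-1, FC-2, FC-3: fully connected layers | Number of neurons: 100; activation function: ReLU |
| Critic network | Output layer | Number of neurons: 1 (reward value) |
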
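As a rough illustration of the policy network shape listed in Table 2 (M inputs, three fully connected layers of 100 ReLU units, and an N-way SoftMax output), the forward pass could be sketched in plain Python as follows. The weight initialization, the omission of bias terms, the helper names, and the values chosen for `M` and `N` are assumptions for this sketch only. It is not the authors' implementation and involves no training.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights, one row per output unit (biases omitted for brevity)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: List[float]) -> List[float]:
    peak = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def build_policy(m_flows: int, n_nodes: int, hidden: int = 100, seed: int = 0) -> List[Matrix]:
    """Layer sizes M -> 100 -> 100 -> 100 -> N, following Table 2."""
    rng = random.Random(seed)
    sizes = [m_flows, hidden, hidden, hidden, n_nodes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def policy_forward(layers: List[Matrix], od_traffic: List[float]) -> List[float]:
    """Map an OD-flow traffic vector to a probability for each node."""
    h = od_traffic
    for weights in layers[:-1]:
        h = relu(linear(weights, h))
    return softmax(linear(layers[-1], h))


M = 20  # hypothetical number of OD flows
N = 49  # hypothetical number of nodes
layers = build_policy(M, N)
probs = policy_forward(layers, [1.0] * M)
print(len(probs), round(sum(probs), 6))
```

The critic network in Table 2 has the same hidden structure; under the same assumptions it would differ only in ending with a single linear output instead of the SoftMax layer.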
Table 3. Key parameters in the simulation process
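
| Parameter | Value |
| --- | --- |
| Actor network learning rate | 0.0001 |
| Critic network learning rate | 0.0001 |
| Reward discount factor $\gamma$ | 0.9 |
| PPO clipping range parameter | 0.2 |
| Scaling factor of the advantage function | 0.95 |
| Experience replay buffer size | 200 |
| Inner-loop training epochs | 10 |
| Outer-loop training epochs | 40 |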
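To show where the discount factor, the clipping range, and the advantage scaling factor from Table 3 enter a PPO update, the sketch below computes advantages and the standard clipped surrogate objective. It assumes that the advantage scaling factor of 0.95 plays the role of the lambda parameter in generalized advantage estimation. That reading is an interpretation, not something the table states, and the function names are hypothetical.

```python
import math
from typing import List

GAMMA = 0.9     # reward discount factor (Table 3)
CLIP_EPS = 0.2  # PPO clipping range parameter (Table 3)
LAMBDA = 0.95   # advantage scaling factor (Table 3), assumed here to be the GAE lambda


def gae(rewards: List[float], values: List[float], last_value: float = 0.0) -> List[float]:
    """Generalized advantage estimates for one trajectory, computed backwards in time."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + GAMMA * next_value - values[t]  # one-step TD error
        running = delta + GAMMA * LAMBDA * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp: List[float], old_logp: List[float],
                      advantages: List[float]) -> float:
    """Mean PPO clipped objective (to be maximized) over a batch of samples."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio between new and old policy
        clipped = min(max(ratio, 1.0 - CLIP_EPS), 1.0 + CLIP_EPS)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


advantages = gae(rewards=[1.0, 1.0], values=[0.0, 0.0])
print(advantages)
print(clipped_surrogate([0.1, -0.1], [0.0, 0.0], advantages))
```

The clipping keeps a single update from moving the policy far from the one that collected the data: once the probability ratio leaves the interval [0.8, 1.2] in the direction favored by the advantage, the objective stops rewarding further change.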