The continuous growth of advanced services that integrate processing, storage, real-time data exchange, and transmission capacity highlights the importance of Quality of Service (QoS) and Quality of Experience (QoE). This is particularly relevant for enterprises that rely on Wide Area Networks (WANs) for reliable and efficient communication between their headquarters and branches. As new applications with stringent requirements emerge, ensuring high-performing WANs becomes a critical priority for businesses that want to remain competitive and provide a seamless customer experience. This paper explores the use of Reinforcement Learning (RL) algorithms in Software-Defined Wide Area Networks (SD-WANs) to improve QoS and reduce costs. SD-WAN allows network devices to be reconfigured dynamically in real time, so that the network can adapt to measured conditions and service requirements. By leveraging self-learning techniques such as RL, which exploits feedback mechanisms, we can improve network availability by automatically routing traffic over existing network technologies. We also compare two SD-WAN topology scenarios: direct WAN connections between Customer Premises Equipment (CPEs) within enterprise premises, and CPEs used as peering points for traffic routing. Our approach shows promising results in terms of network performance and cost-effectiveness, which can benefit businesses looking to improve their network infrastructure.