Increasing computing power and mobile graphics capabilities have made possible the release of self-contained augmented reality (AR) headsets featuring efficient head-anchored tracking solutions. Ego-motion estimation based on well-established infrared marker tracking ensures sufficient accuracy and robustness. Unfortunately, wearable visible-light stereo cameras with a short baseline, operating under uncontrolled lighting conditions, suffer from tracking failures and ambiguities in pose estimation. To improve the accuracy of optical self-tracking and its resiliency to marker occlusions, degraded camera calibrations, and inconsistent lighting, in this work we propose a sensor fusion approach based on Kalman filtering that integrates optical tracking data with inertial tracking data, exploiting the correlation between the two motion estimates. To measure improvements in AR overlay accuracy, experiments are performed with a custom-made AR headset designed to support complex manual tasks performed under direct vision. Experimental results show that the proposed solution improves head-mounted display (HMD) tracking accuracy by one third. It also improves robustness, recovering the orientation of the target scene when some of the markers are occluded and when the optical tracking yields unstable and/or ambiguous results due to the limitations of head-anchored stereo tracking cameras operating under uncontrolled lighting conditions.
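To make the fusion scheme concrete, the following is a minimal sketch of Kalman-filter-based optical/inertial fusion. It assumes a constant-velocity state model, position-only optical measurements, and gravity-compensated IMU acceleration used as a control input; the class name, parameters, and noise values are illustrative placeholders and do not reproduce the filter design actually used in this work, which also handles orientation.

```python
# Hedged sketch: linear Kalman filter fusing optical position fixes with
# IMU-driven prediction. All names and tuning values are hypothetical.
import numpy as np

class OpticalInertialKF:
    def __init__(self, dt, accel_noise=0.5, optical_noise=0.002):
        self.dt = dt
        # State vector: [x, y, z, vx, vy, vz]
        self.x = np.zeros(6)
        self.P = np.eye(6) * 1e-2
        # Constant-velocity transition model
        self.F = np.eye(6)
        self.F[:3, 3:] = np.eye(3) * dt
        # IMU acceleration enters as a control input
        self.B = np.vstack([np.eye(3) * 0.5 * dt**2, np.eye(3) * dt])
        # Process noise driven by accelerometer uncertainty
        self.Q = self.B @ self.B.T * accel_noise**2
        # The optical tracker measures position only in this sketch
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.R = np.eye(3) * optical_noise**2

    def predict(self, accel_world):
        """Propagate the state at IMU rate with world-frame acceleration."""
        self.x = self.F @ self.x + self.B @ accel_world
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, optical_pos):
        """Correct with a marker-based optical position fix, when available."""
        y = optical_pos - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

# Usage: predict at IMU rate; update only when the optical tracker
# returns a valid pose (e.g., skip updates during marker occlusion).
kf = OpticalInertialKF(dt=1 / 200)
kf.predict(np.array([0.0, 0.0, 0.1]))
kf.update(np.array([0.01, 0.0, 0.0]))
```

In this arrangement the inertial prediction bridges optical dropouts, while the optical updates bound the drift accumulated from integrating noisy accelerations, which is the general rationale behind fusing the two modalities.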