Off-Policy Temporal Difference Learning For Robotics And Autonomous Systems