Learning-based optimal and robust control: A policy optimization perspective