\subsection{Learning To Act} Used if there is no state-transition model or cost/reward func. \bi{Markov Decision Process}