Let $K$ be the number of states and $L$ the number of actions. Then $|H_t| = K^{t+1} L^t$ and the computation time would be:
The above value is for a general history-dependent policy. If the policy is Markovian, then $|H_t| = K$ and we get a computation time of $K^2 (N-1)$.
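
To get a feel for how quickly the history-dependent case grows, the counts can be evaluated numerically. The following is only a rough sketch: the function names are hypothetical, and the per-stage cost model for the history-dependent case ($K$ operations per history, over $N-1$ stages indexed $t = 0, \dots, N-2$) is an assumption chosen so that it reduces to the quoted $K^2 (N-1)$ when $|H_t| = K$. It is not necessarily the exact expression intended above.

```python
def history_count(K, L, t):
    """Number of histories of length t: |H_t| = K^(t+1) * L^t."""
    return K ** (t + 1) * L ** t


def markov_cost(K, N):
    """Computation time quoted in the text for a Markov policy: K^2 * (N - 1)."""
    return K * K * (N - 1)


def history_dependent_cost(K, L, N):
    """ASSUMED cost model (not taken from the text): each of the N - 1 stages
    t = 0..N-2 costs K operations for every history in H_t."""
    return sum(K * history_count(K, L, t) for t in range(N - 1))


if __name__ == "__main__":
    K, L, N = 3, 2, 5  # illustrative values only
    print([history_count(K, L, t) for t in range(N)])  # [3, 18, 108, 648, 3888]
    print(markov_cost(K, N))                           # 36
    print(history_dependent_cost(K, L, N))             # 2331
```

Even for these small values the history-dependent count is almost two orders of magnitude larger than the Markov one, and the gap widens exponentially with $N$.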