Multi-processing is used to parallelize data generation and training (minimal sketches of each feature follow this list)
Different environments can be plugged in through a gym-like interface
Recurrent networks are supported; their hidden state is saved in the replay buffer and dynamically updated
Data can be generated according to multiple actor policies (greedy, deterministic, etc.), and these can be combined to formulate "training schedules"
Frame stacking is supported
Other modifications to standard Q-learning are possible
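
The sketches below illustrate the features above. All class, function, and parameter names are invented for illustration and are not the actual interfaces of this codebase. First, parallel data generation, assuming a simple actor/learner split where actor processes push transitions onto a shared queue:

```python
import multiprocessing as mp
import random


def actor(actor_id, queue, n_steps=100):
    """Roll out a toy random policy and ship transitions to the learner."""
    state = 0.0
    for _ in range(n_steps):
        action = random.choice([0, 1])
        next_state = state + (1.0 if action else -1.0)
        reward = -abs(next_state)
        queue.put((actor_id, state, action, reward, next_state))
        state = next_state
    queue.put(None)  # sentinel: this actor is finished


def learner(queue, n_actors):
    """Drain the queue until every actor has sent its sentinel."""
    finished, seen = 0, 0
    while finished < n_actors:
        item = queue.get()
        if item is None:
            finished += 1
        else:
            seen += 1  # a real learner would push this into a replay buffer
    print(f"learner consumed {seen} transitions")


if __name__ == "__main__":
    q = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, q)) for i in range(4)]
    for p in actors:
        p.start()
    learner(q, n_actors=4)
    for p in actors:
        p.join()
```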
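Next, the gym-like contract assumed above: any object exposing `reset()` and `step(action)` that returns `(obs, reward, done, info)` can plug in. `CountdownEnv` is a toy example, not an environment shipped with the repo.

```python
class CountdownEnv:
    """Toy gym-like environment: reward the agent for driving a counter to zero."""

    def __init__(self, start=10):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # action 1 increments the counter, anything else decrements it
        self.state += 1 if action == 1 else -1
        done = self.state == 0
        reward = 1.0 if done else -0.01
        return self.state, reward, done, {}


env = CountdownEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, info = env.step(0)  # always decrement
```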
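For recurrent networks, one way to save and dynamically update hidden state in a replay buffer (a sketch; `Transition`, `RecurrentReplayBuffer`, and `refresh_hidden_states` are hypothetical names): keep the hidden state observed when each transition was collected, and periodically recompute it with the current network so stale states get replaced.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Hidden = Tuple[float, ...]  # stand-in for an RNN hidden/cell state


@dataclass
class Transition:
    obs: float
    action: int
    reward: float
    next_obs: float
    hidden: Hidden  # recurrent state *before* this step


@dataclass
class RecurrentReplayBuffer:
    capacity: int = 10_000
    storage: List[Transition] = field(default_factory=list)

    def add(self, t):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # drop the oldest transition
        self.storage.append(t)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)

    def refresh_hidden_states(self, recompute):
        """Replace stored hidden states using the current network."""
        for t in self.storage:
            t.hidden = recompute(t.obs, t.hidden)


buf = RecurrentReplayBuffer()
buf.add(Transition(obs=0.0, action=1, reward=0.5, next_obs=1.0, hidden=(0.0,)))
buf.refresh_hidden_states(lambda obs, h: h)  # identity stand-in for the network
```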
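A "training schedule" can be sketched as a list of (policy, step-count) pairs; the schedule format and the policy helpers here are assumptions for illustration, not the library's API.

```python
import random


def greedy(q_values):
    """Pick the action with the highest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_greedy(q_values, eps=0.1):
    """Explore with probability eps, otherwise act greedily."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return greedy(q_values)


# (policy, number_of_steps) pairs: explore heavily first, then act greedily
schedule = [
    (lambda q: epsilon_greedy(q, eps=0.5), 1_000),
    (lambda q: epsilon_greedy(q, eps=0.1), 5_000),
    (greedy, 1_000),
]


def run_schedule(schedule, q_values):
    """Yield one action per step, following each policy in turn."""
    for policy, n_steps in schedule:
        for _ in range(n_steps):
            yield policy(q_values)
```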
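Frame stacking can be pictured as a thin wrapper around a gym-like environment, so the agent sees the last `k` observations; again a sketch, not the repo's implementation.

```python
from collections import deque


class FrameStack:
    """Wrap a gym-like env so observations are tuples of the last k frames."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.k):  # pad the stack with the first frame
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # the oldest frame drops off automatically
        return tuple(self.frames), reward, done, info
```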
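Finally, one well-known modification to standard Q-learning, shown purely as an example of the kind of change the last item alludes to (not necessarily one this codebase implements): a double-Q-learning target, where the online network selects the next action and the target network evaluates it.

```python
def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double-Q target: online net picks the action, target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


# example: bootstrap from the target network's estimate of the chosen action
y = double_q_target(1.0, False, 0.99, [0.2, 0.7], [0.4, 0.3])  # 1.0 + 0.99 * 0.3
```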