Situations have come up where the 5+ minute kill signal for
run_task_manager is emitted to the worker process running it, but
since the worker improperly inherited the AWXConsumerBase().stop()
handler a deadlock ultimately was triggered on the database
connection.
* Sleep before trying to reconnect
Most common reason for entering this reconnect loop is when Redis
service stops before the callback receiver when stopping tower services.
additionaly, optimize away several per-event host lookups and
changed/failed propagation lookups
we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow
this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`
see: https://github.com/ansible/awx/issues/5514
this commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers that are similar to celeryd workers in spirit.
Specifically, this includes:
- a new decorator, `awx.main.dispatch.task`, which can be used to
decorate functions or classes so that they can be designated as
"Tasks"
- support for fanout/broadcast tasks (at this point in time, only
`conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
`handle_work_success` and `handle_work_error`)
- support for auto scaling worker pool that scale processes up and down
on demand
- minimal support for RPC, such as status checks and pool recycle/reload