additionaly, optimize away several per-event host lookups and
changed/failed propagation lookups
we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow
this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`
see: https://github.com/ansible/awx/issues/5514
this _technically_ prevents a remote code exploit where a user who has
access to publish AMQP messages to the dispatch queue could craft
a special message that would import and run arbitrary Python functions;
that said, the types of user with this privilege level are generally
_already_ the awx user (so they can already do this by hand if they
want)
this is a simple sanity check, but it should help us avoid shooting
ourselves in the foot in complicated scenarios, such as:
1. A dispatcher worker is running a job, and it's killed with `kill -9`
2. The dispatcher attempts to reap jobs with a matching celery_task_id
3. The associated sync project update has the *same* celery_task_id
(an implementation detail of how we implemented that), and it ends
up getting reaped _even though_ it's already finished and has
status=successful
1. Install awx w/ a single node.
2. Start a long-running job.
3. Forcibly kill the `awx-manage run_dispatcher` process (e.g.,
SIGKILL) and do not start it again.
4. The job remains in running - without a second cluster to discover
the job, it is never reaped.
5. This PR allows you to cancel the job from the UI+API.
when `--reload` is sent to the dispatcher, it sends a special QUIT
message to each worker in the pool so that it will exit gracefully at
the next opportunity
when a worker process exits unexpectedly, the dispatcher attempts to
recover its queued messages and sends them to another worker in the
pool; in this scenario, we should _never_ re-enqueue these special
QUIT messages (because the process doesn't need to quit, it's already
gone)
To reproduce this race condition:
1. Launch an adhoc that does `sleep 60`
2. Run `awx-manage run_dispatcher --reload` to enqueue a `QUIT` message
into the worker's queue
3. Find the pid of the worker running the `sleep 60` and `SIGKILL` it.
4. Observe that dispatcher attempts to requeue the `QUIT` message and
logs a confusing error.
this commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers that are similar to celeryd workers in spirit.
Specifically, this includes:
- a new decorator, `awx.main.dispatch.task`, which can be used to
decorate functions or classes so that they can be designated as
"Tasks"
- support for fanout/broadcast tasks (at this point in time, only
`conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
`handle_work_success` and `handle_work_error`)
- support for auto scaling worker pool that scale processes up and down
on demand
- minimal support for RPC, such as status checks and pool recycle/reload