run this command on _any_ node in an awx cluster:
$ awx-manage profile_sql --threshold=2.0 --minutes=1
...and for 1 minute, the timing of _every_ SQL query in _every_ awx
Python process that uses the Django ORM will be measured.
Queries that run longer than (in this example) 2 seconds will be
written to a per-process sqlite database in /var/lib/awx/profile; each
file will contain an EXPLAIN VERBOSE for the query and the full
Python stack that led to that SQL query's execution (this includes not
just WSGI requests, but background processes like the runworker and
the dispatcher)
$ awx-manage profile_sql --threshold=0
...can be used to disable profiling again (if you don't want to wait for
the minute to expire)
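
for context, a minimal sketch of the general technique (not the actual
profile_sql implementation), assuming Django's connection.execute_wrapper;
the sqlite filename and table schema below are made up for illustration:

    import sqlite3
    import time

    from django.db import connection

    THRESHOLD = 2.0  # seconds, matching --threshold=2.0 above

    def record_slow_queries(execute, sql, params, many, context):
        start = time.time()
        try:
            return execute(sql, params, many, context)
        finally:
            elapsed = time.time() - start
            if elapsed >= THRESHOLD:
                db = sqlite3.connect('/var/lib/awx/profile/slow_queries.sqlite3')
                db.execute('CREATE TABLE IF NOT EXISTS queries (elapsed REAL, query TEXT)')
                db.execute('INSERT INTO queries VALUES (?, ?)', (elapsed, sql))
                db.commit()
                db.close()

    # every ORM query executed inside this block is timed
    with connection.execute_wrapper(record_slow_queries):
        ...  # run the code you want to profile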
* Removed the emailing of admins on request error. When enabled, that
handler includes all Django settings in the email, which is not
desirable from a security standpoint.
this commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers similar in spirit to celeryd workers.
Specifically, this includes:
- a new decorator, `awx.main.dispatch.task`, which can be used to
decorate functions or classes so that they can be designated as
"Tasks" (a usage sketch follows this list)
- support for fanout/broadcast tasks (at this point in time, only
`conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
`handle_work_success` and `handle_work_error`)
- support for an auto-scaling worker pool that scales processes up and
down on demand
- minimal support for RPC, such as status checks and pool recycle/reload
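
a sketch of how the new decorator might be used, based only on the
description above; the import path mirrors the name `awx.main.dispatch.task`,
and the `delay()` publish method is an assumption, not a verified API:

    from awx.main.dispatch import task

    @task()
    def add(a, b):
        # runs in one of the dispatcher's pool workers
        return a + b

    @task(queue='tower_broadcast_all')  # hypothetical fanout/broadcast queue
    class FlushSettingCache:
        def run(self):
            # delivered to every node subscribed to the broadcast exchange
            pass

    # publishing work to the pool (assumed interface)
    add.delay(1, 2)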
now that we have the CSRF middleware, we have a reliable token
available to us which we can use to verify individual ws_receive
payloads; this is _simpler_ than making sure you've properly configured
trusted origins, and it's also more secure than Origin header checks
see: https://github.com/ansible/tower/issues/2661
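
a rough sketch of the idea (not the actual consumer code; the message and
payload field names here are assumptions): compare the token carried in each
payload against the token issued by the CSRF middleware, in constant time:

    import hmac
    import json

    def ws_receive(message):
        payload = json.loads(message['text'])
        expected = message.get('cookies', {}).get('csrftoken', '')
        provided = payload.get('xsrftoken', '')
        if not (expected and hmac.compare_digest(expected, provided)):
            return  # drop the payload instead of trusting the Origin header
        ...  # handle the verified payload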
* Each time a route is needed (i.e., when a task is submitted to Celery),
the router will be queried. This is ideal. With the previous method we
had to consider how a change in the routes would propagate to all Celery
workers and nodes. (A router sketch follows this list.)
* fully describe the default awx queue
* Our dynamic queue registration would correct awx_private_queue.
However, we don't want Celery to even create an "invalid"/extra
queue-exchange-route. This change makes sure we don't create extraneous
things in RabbitMQ.
* reduce the cluster queue registration output. Only output when the
queue registration list changes.
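
a sketch of the per-submission routing idea, assuming Celery's support for
a callable in `task_routes`; the task prefix and queue name are illustrative:

    def route_task(name, args, kwargs, options, task=None, **kw):
        # consulted each time a task is submitted, so route changes take
        # effect immediately without propagating state to workers/nodes
        if name.startswith('awx.main.tasks.'):
            return {'queue': 'awx_private_queue'}
        return None  # fall back to the default queue

    # app.conf.task_routes = (route_task,)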
refactor existing handlers into the related
"real" handler classes, which are swapped
out dynamically by an external logger "proxy" handler class
(a sketch follows below)
real handler swap-out only done on setting change
remove restart_local_services method
get rid of uWSGI fifo file
change TCP/UDP return type contract so that it mirrors
the request futures object
add details to socket error messages
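
a minimal sketch of the proxy/real handler split described above (class and
method names are illustrative, not the actual AWX classes):

    import logging

    class ExternalLoggerProxyHandler(logging.Handler):
        # stays registered with the logging tree and delegates to a swappable
        # "real" handler (HTTPS/TCP/UDP), rebuilt only when settings change
        def __init__(self):
            super().__init__()
            self._real_handler = None

        def swap(self, new_handler):
            old, self._real_handler = self._real_handler, new_handler
            if old is not None:
                old.close()

        def emit(self, record):
            if self._real_handler is not None:
                self._real_handler.emit(record)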
* Kombu's default Broadcast() constructor requires only one
parameter, which defines the exchange name; the queue name
is then randomly generated per node.
* This caused problems if/when Celery entered an infinite restart loop,
because too many RabbitMQ queues got created and RabbitMQ OOMed
(gracefully).
* To remedy this we tell Broadcast the queue name to use, derived
from a constant + the node name so that the per-node queue
name is stable (see the sketch below).
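
a sketch of the stable queue naming, assuming kombu.common.Broadcast; the
exchange and queue names below are illustrative:

    from socket import gethostname

    from kombu.common import Broadcast

    broadcast = Broadcast(
        'tower_broadcast_all',                                  # exchange name
        queue='tower_broadcast_all_{}'.format(gethostname()),   # stable per-node queue
    )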
* Before, we had a special group, tower, that ran any async work that
tower needed done. This allowed users fine-grained control over which
nodes did background work. However, this granularity was too complicated
for users. So now, all tower system work goes to a special,
non-user-exposed Celery queue. Tower remains the fallback instance group
to execute jobs on. The tower group will be created upon install and
protected from deletion.