zware/awx

mirror of https://github.com/ZwareBear/awx.git synced 2026-04-08 00:51:48 -05:00

Author	SHA1	Message	Date
AlanCoding	e59cb07064	Add wording for control message log	2020-02-11 10:01:25 -05:00
Ryan Petrello	38a08d163c	get rid of celery/celerybeat: alternative to https://github.com/ansible/awx/pull/2530 which makes use of schedule (https://pypi.org/project/schedule/). this doesn't have support for any persistence (like how celery beat uses a shelve file), because all of our periodic jobs run at most every few minutes	2020-02-10 17:32:02 -05:00
Ryan Petrello	3c31e0ed16	some more minor callback cleanup and development tweaks	2020-01-27 17:18:09 -05:00
Ryan Petrello	78b00652bd	add the ability to enable profiling for the callback receiver workers	2020-01-27 12:03:53 -05:00
Ryan Petrello	8f33f1a6c2	remove another expensive logging lookup in the parent callback process	2020-01-24 16:46:32 -05:00
Bill Nottingham	4e46d5d7cd	Fix some lint	2020-01-20 17:15:27 -05:00
Ryan Petrello	8bd9233d2c	remove some unnecessary callback receiver debugging code	2020-01-14 14:21:53 -05:00
Ryan Petrello	306f504fb7	optimize the callback receiver to buffer writes on high throughput. additionally, optimize away several per-event host lookups and changed/failed propagation lookups: we've always performed these (fairly expensive) queries on every event save, and if you're processing tens of thousands of events in short bursts, this is way too slow. this commit also introduces a new command for profiling the insertion rate of events, `awx-manage callback_stats`. see: https://github.com/ansible/awx/issues/5514	2020-01-14 12:04:26 -05:00
AlanCoding	eec08fdcca	Log case of duplicate UUIDs	2020-01-09 07:31:32 -05:00
Ryan Petrello	83550eeba0	make the callback receiver more robust to duplicate UUIDs from ansible	2019-11-01 09:24:52 -04:00
Ryan Petrello	3094b67664	work around a bug in the k8s client that leaves trash in /tmp	2019-10-29 11:24:17 -04:00
Ryan Petrello	d01088d33e	Revert "add support for `awx-manage run_callback_receiver --status`"	2019-10-18 09:49:02 -04:00
Ryan Petrello	ffb1707e74	add support for `awx-manage run_callback_receiver --status`	2019-10-17 11:10:27 -04:00
Buymov Ivan	f2676064fd	Fix error with rejoining node to cluster after lost connection to postgres	2019-09-27 01:17:27 -04:00
Ryan Petrello	40b1e89b67	add the ability to disable RabbitMQ queue durability	2019-05-28 15:49:32 -04:00
Ryan Petrello	17a803f49c	remove the old callback plugin import paths and callback-specific tests	2019-04-12 16:11:23 -04:00
Ryan Petrello	32ee9838af	use the correct logger for the callback receiver: the callback receiver and dispatcher share several modules, so add logic to use the correct logger	2019-03-15 08:09:47 -04:00
Ryan Petrello	daeeaf413a	clean up unnecessary usage of the six library (awx only supports py3)	2019-01-25 00:19:48 -05:00
Ryan Petrello	4707dc2a05	clean up some unnecessary dispatcher reaping code	2019-01-24 11:11:05 -05:00
Ryan Petrello	b2442d42a3	detect dead DB connections in the dispatcher when reaping jobs	2019-01-22 08:40:26 -05:00
Ryan Petrello	f223df303f	convert py2 -> py3	2019-01-15 14:09:01 -05:00
Ryan Petrello	5950f26c69	only allow the task dispatch worker to import and run decorated tasks. this _technically_ prevents a remote code exploit where a user who has access to publish AMQP messages to the dispatch queue could craft a special message that would import and run arbitrary Python functions; that said, the types of user with this privilege level are generally _already_ the awx user (so they can already do this by hand if they want)	2018-12-12 17:46:41 -05:00
Ryan Petrello	0391dbc292	add additional DB retry logic to the callback receiver: initially, I implemented this for _only_ the task worker, but it's probably needed for callback event workers, too	2018-11-29 11:57:46 -05:00
Ryan Petrello	38bf174bda	don't reap jobs that aren't running. this is a simple sanity check, but it should help us avoid shooting ourselves in the foot in complicated scenarios, such as: (1) a dispatcher worker is running a job, and it's killed with `kill -9`; (2) the dispatcher attempts to reap jobs with a matching celery_task_id; (3) the associated sync project update has the same celery_task_id (an implementation detail of how we implemented that), and it ends up getting reaped _even though_ it's already finished and has status=successful	2018-11-28 18:11:12 -05:00
Matthew Jones	7330102961	Remove a warning message for dispatcher pool for tests	2018-11-19 11:19:57 -05:00
Ryan Petrello	37234ca66e	prevent the dispatcher from using a nonsensical max_workers value	2018-11-16 10:16:39 -05:00
AlanCoding	482395eb6a	reduce default verbosity of devel-specific callback logging	2018-10-26 10:03:46 -04:00
Ryan Petrello	3be9113d6b	fix a bug that breaks job cancel on single node jobs: 1. Install awx w/ a single node. 2. Start a long-running job. 3. Forcibly kill the `awx-manage run_dispatcher` process (e.g., SIGKILL) and do not start it again. 4. The job remains in running - without a second cluster to discover the job, it is never reaped. 5. This PR allows you to cancel the job from the UI+API.	2018-10-19 09:10:33 -04:00
Ryan Petrello	0d29bbfdc6	make the dispatcher more fault-tolerant to prolonged database outages	2018-10-18 20:00:07 -04:00
Ryan Petrello	53ae05094e	use the proper logger for the callback receiver	2018-10-17 10:56:29 -04:00
Ryan Petrello	720a634702	don't attempt to recover special QUIT messages in the worker pool. when `--reload` is sent to the dispatcher, it sends a special QUIT message to each worker in the pool so that it will exit gracefully at the next opportunity. when a worker process exits unexpectedly, the dispatcher attempts to recover its queued messages and sends them to another worker in the pool; in this scenario, we should _never_ re-enqueue these special QUIT messages (because the process doesn't need to quit, it's already gone). To reproduce this race condition: 1. Launch an adhoc that does `sleep 60`. 2. Run `awx-manage run_dispatcher --reload` to enqueue a `QUIT` message into the worker's queue. 3. Find the pid of the worker running the `sleep 60` and `SIGKILL` it. 4. Observe that dispatcher attempts to requeue the `QUIT` message and logs a confusing error.	2018-10-15 12:17:52 -04:00
Ryan Petrello	ff1e8cc356	replace celery task decorators with a kombu-based publisher. this commit implements the bulk of `awx-manage run_dispatcher`, a new command that binds to RabbitMQ via kombu and balances messages across a pool of workers that are similar to celeryd workers in spirit. Specifically, this includes: a new decorator, `awx.main.dispatch.task`, which can be used to decorate functions or classes so that they can be designated as "Tasks"; support for fanout/broadcast tasks (at this point in time, only `conf.Setting` memcached flushes use this functionality); support for job reaping; support for success/failure hooks for job runs (i.e., `handle_work_success` and `handle_work_error`); support for an auto scaling worker pool that scales processes up and down on demand; minimal support for RPC, such as status checks and pool recycle/reload	2018-10-11 10:53:30 -04:00
Ryan Petrello	da74f1d01f	refactor and test the callback receiver as a base for a task dispatcher	2018-10-11 10:53:26 -04:00

33 Commits
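
Commits ff1e8cc356 and 5950f26c69 above describe a task decorator and a worker that only imports and runs decorated tasks. The following is a minimal sketch of that general idea, not the AWX implementation: the names `task`, `resolve_task`, `run_message`, the `_is_dispatch_task` marker, and the message shape are all assumptions made for illustration.

```python
import importlib

# Hypothetical marker attribute; not necessarily what AWX uses.
_TASK_MARKER = "_is_dispatch_task"


def task(fn):
    """Mark a callable as something a worker is allowed to run."""
    setattr(fn, _TASK_MARKER, True)
    return fn


def resolve_task(dotted_path):
    """Import `module.name` and refuse anything that was not decorated."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"not a dotted path: {dotted_path!r}")
    target = getattr(importlib.import_module(module_name), attr)
    # The check happens before the callable is ever invoked.
    if not getattr(target, _TASK_MARKER, False):
        raise ValueError(f"{dotted_path} is not a decorated task")
    return target


def run_message(message):
    """Run one message shaped like {"task": "pkg.mod.fn", "args": [...], "kwargs": {...}}."""
    fn = resolve_task(message["task"])
    return fn(*message.get("args", []), **message.get("kwargs", {}))
```

With this shape, a message naming an undecorated function such as `os.system` is rejected with `ValueError` instead of being executed, which is the property the 5950f26c69 message describes.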