the bigint migration removed the foreign key constraints for:
- host_id
- job_id (and projectupdate_id, etc.)
because of this, we no longer need to check explicitly for a host_id
IntegrityError (it can no longer occur)
additionally, while it's now possible to insert an event with a
mismatched job_id (for example, you can start a long-running job and
delete the job record in the background using the ORM or psql), doing
so results in DoesNotExist errors in the code that handles
playbook_on_stats events
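a minimal sketch of what tolerating a deleted job could look like in
the stats handler (the handler and logging here are illustrative, not
the actual AWX code):

```python
import logging

from django.core.exceptions import ObjectDoesNotExist

logger = logging.getLogger(__name__)


def handle_playbook_on_stats(event):
    try:
        # with the FK constraint gone, the parent row may have been
        # deleted out from under a long-running job
        job = event.job
    except ObjectDoesNotExist:
        logger.warning('job %s is gone; dropping stats event', event.job_id)
        return
    # ... summarize per-host results against job ...
```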
instead, just have each worker connect directly to redis
this has a few benefits:
- it's simpler to explain and debug
- back pressure on the queue keeps messages around in redis (where
they're observable and survive restarts of the Python processes)
- it's likely notably more performant at high loads
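a rough sketch of the direct-to-redis worker loop using redis-py (the
queue name, payload shape, and process() are assumptions for
illustration):

```python
import json

import redis

conn = redis.Redis()  # every worker holds its own connection


def worker_loop(queue='callback_events'):
    while True:
        # BLPOP blocks until a message arrives; anything not yet
        # consumed stays in redis, which is what makes back pressure
        # observable and restart-safe
        _key, raw = conn.blpop(queue)
        process(json.loads(raw))  # process() is an illustrative stand-in
```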
make the --status flag work by fetching a periodically recorded snapshot
of internal process state; additionally, update the callback receiver to
*also* record these statistics so we can gain more insight into any
performance issues
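one possible shape for the snapshotting side (the path, interval, and
worker attributes are assumptions, not the real implementation):

```python
import json
import time

STATS_PATH = '/var/run/awx/callback_receiver_stats.json'  # hypothetical


def record_stats_forever(workers, interval=5):
    while True:
        snapshot = {
            'recorded_at': time.time(),
            'workers': [
                {'pid': w.pid, 'queue_depth': w.queue_depth}
                for w in workers
            ],
        }
        with open(STATS_PATH, 'w') as f:
            json.dump(snapshot, f)
        time.sleep(interval)
```

--status then only has to read and print the most recent snapshot
instead of interrogating live processes.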
Situations have come up where the 5+ minute kill signal for
run_task_manager is emitted to the worker process running it, but
because the worker improperly inherited the AWXConsumerBase().stop()
handler, a deadlock was ultimately triggered on the database
connection.
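the shape of the fix, sketched (illustrative, not the actual handler):
register a SIGTERM handler for that worker which does no ORM work at
all

```python
import signal
import sys


def stop_without_db_work(signum, frame):
    # deliberately avoid the database connection here; a stop handler
    # that grabs a possibly mid-query connection is how the deadlock
    # was triggered
    sys.exit(1)


signal.signal(signal.SIGTERM, stop_without_db_work)
```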
* Sleep before trying to reconnect
The most common reason for entering this reconnect loop is that the
Redis service stops before the callback receiver does when Tower
services are shut down.
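roughly, assuming a redis-py connection:

```python
import time

import redis


def consume(conn, queue='callback_events'):
    while True:
        try:
            conn.blpop(queue)
        except redis.exceptions.ConnectionError:
            # without this pause, a stopped redis turns the reconnect
            # loop into a busy spin
            time.sleep(5)
```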
success/failure notifications for *playbooks* include summary data
about the hosts involved, based on the contents of the
playbook_on_stats event
the current implementation suffers from a number of race conditions
that can sometimes cause that data to be missing or incomplete; this
change
makes it so that for *playbooks* we build (and send) the notification in
response to the playbook_on_stats event, not the EOF event
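in sketch form (summarize_hosts and the notification helper are
illustrative names, not the actual AWX functions):

```python
def process_event(event):
    if event['event'] == 'playbook_on_stats':
        # the stats payload is complete by definition at this point,
        # so there is no race against late-arriving host data
        summary = summarize_hosts(event['event_data'])
        send_playbook_notification(event['job_id'], summary)
```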
* postgres notify/listen channel names have size limitations as well as
character limitations. Respect those limitations while still
generating a unique channel name (see the first sketch after this
list).
* Under the new postgres-backed notify/listen message queue, this never
actually worked. Without using the database to store state, we cannot
provide an at-most-once delivery mechanism with multiple readers.
* With this change, work is done ONLY on the node that requested the
work (see the LISTEN sketch after this list). Under rabbitmq, the node
that was first to get the message off the queue would do the work;
presumably the least busy node.
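one way to satisfy the channel-name constraints (an illustrative
sketch; postgres identifiers are capped at 63 bytes):

```python
import hashlib


def pg_channel_name(hostname):
    # the digest guarantees uniqueness; the prefix keeps it readable
    digest = hashlib.sha1(hostname.encode('utf-8')).hexdigest()[:10]
    safe = ''.join(c if c.isalnum() else '_' for c in hostname.lower())[:32]
    return f'awx_{safe}_{digest}'  # comfortably under the 63-byte cap
```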
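and in psycopg2 terms, each node LISTENs only on its own channel, so
replies come back to the node that asked for the work (the DSN,
channel name, and handler are assumptions):

```python
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect('dbname=awx')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

with conn.cursor() as cur:
    cur.execute('LISTEN awx_node_1;')  # this node's private channel

while True:
    # wait up to 5s for a notification, then drain whatever arrived
    if select.select([conn], [], [], 5) != ([], [], []):
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            handle_reply(notify.payload)  # illustrative stand-in
```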
I have a hunch that our usage of a daemon thread is causing import lock
contention related to https://github.com/ansible/awx/issues/5617
We've encountered similar issues before with threads across dispatcher
processes at fork time, and cpython has had bugs like this in recent
history:
https://bugs.python.org/issue38884
My gut tells me this might be related.
The prior implementation - based on celerybeat - ran its code in
a process (not a thread), and the timing of that merge matches the
window in which we started noticing issues.
Currently testing it to see if it resolves some of the issues we're
seeing.
additionally, optimize away several per-event host lookups and
changed/failed propagation lookups
we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow
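the shape of the optimization, sketched with illustrative model and
field names: resolve each job's hosts once and reuse the mapping for
every subsequent event

```python
_host_cache = {}


def host_id_for(job, host_name):
    # one query per job instead of one per saved event
    if job.id not in _host_cache:
        _host_cache[job.id] = dict(
            job.inventory.hosts.values_list('name', 'id')
        )
    return _host_cache[job.id].get(host_name)
```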
this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`
see: https://github.com/ansible/awx/issues/5514