1230 Commits

Author SHA1 Message Date
Jim Ladd
f317fca9e4 add auto-discovered nodes to default IG
* add advisory_lock to avoid IG update race logic
* update IG by way of policy_instance_list
2021-08-26 11:15:14 -07:00
Jim Ladd
561fc289fb disable discovered instances by default 2021-08-26 11:15:14 -07:00
Alan Rominger
daf4310176 Clean up work_type processing and fix execution vs control capacity (#10930)
* Clean up added work_type processing for mesh_code branch

* track both execution and control capacity

* Remove unused execution_capacity property

* Count all forms of capacity to make test pass

* Force jobs to be on execution nodes, updates on control nodes

* Introduce capacity_type property to abstract some details out

* Update test to cover all job types at same time

* Register OpenShift nodes as control types

* Remove unqualified consumed_capacity from task manager and make unit tests work

* Update unit test to execution vs control TM logic changes

* Fix bug, else handling for work_type method
2021-08-26 07:24:14 -04:00
Shane McDonald
274e487a96 Attempt to surface streaming errors that were being eaten (#10918) 2021-08-24 10:33:00 -04:00
Alan Rominger
940c189c12 Corresponding AWX changes for runner --worker-info schema update (#10926) 2021-08-24 08:41:36 -04:00
Alan Rominger
928c35ede5 Model changes for instance last_seen field to replace modified (#10870)
* Model changes for instance last_seen field to replace modified

* Break up refresh_capacity into smaller units

* Rename execution node methods, fix last_seen clustering

* Use update_fields to make it clear save only affects capacity

* Restructuring to pass unit tests

* Fix bug where a PATCH did not update capacity value
2021-08-24 08:41:35 -04:00
Alan Rominger
3b1e40d227 Use the ansible-runner worker --worker-info to perform execution node capacity checks (#10825)
* Introduce utilities for --worker-info health check integration

* Handle case where ansible-runner is not installed

* Add ttl parameter for health check

* Reformulate return data structure and add lots of error cases

* Move up the cleanup tasks, close sockets

* Integrate new --worker-info into the execution node capacity check

* Undo the raw value override from the PoC

* Additional refinement to execution node check frequency

* Put in more complete network diagram

* Follow up on comment to remove modified from health check responsibilities
2021-08-24 08:41:35 -04:00
Alan Rominger
4e84c7c4c4 Use the existing get_receptor_ctl method (#10813) 2021-08-24 08:41:35 -04:00
Alan Rominger
f47eb126e2 Adopt the node_type field in receptor logic (#10802)
* Adopt the node_type field in receptor logic

* Refactor Instance.objects.register so we do not reset capacity to 0
2021-08-24 08:41:34 -04:00
Alan Rominger
9881bb72b8 Treat the awx_1 node as a hybrid node for now, use local work type (#10726) 2021-08-24 08:40:21 -04:00
Alan Rominger
f597205fa7 Run capacity checks with container isolation (#10688)
This requires swapping out the container images
  for the execution nodes from awx-ee to the awx image

For completeness, the hop node image is switched to the raw
  receptor image

A few outright bugs are fixed here
  memory calculation just was not right at all
  the execution_capacity calculation was reverse of intention

Drop in a few TODOs about error handling from debugging
2021-08-24 08:40:19 -04:00
Alan Rominger
e7be86867d Fix rebase bug specific to ad hoc commands 2021-08-24 08:40:19 -04:00
Alan Rominger
13300bdbd4 Update rebase to keep old control plane capacity check
Also do some basic work to separate control versus execution capacity
  this is to ensure that we don't send jobs to the control node
2021-08-24 08:40:19 -04:00
Alan Rominger
39e23db523 Make minor changes to add needed imports 2021-08-24 08:40:19 -04:00
Ryan Petrello
05cb876df5 implement an initial development environment for receptor-based clusters 2021-08-24 08:40:18 -04:00
softwarefactory-project-zuul[bot]
68e309ee32 Merge pull request #10607 from AlanCoding/unused_exception
Remove unused exception about custom venvs

random cleanup

Reviewed-by: Bianca Henderson <beeankha@gmail.com>
2021-07-09 15:43:00 +00:00
Alan Rominger
17f9b57028 Remove unused exception about custom venvs 2021-07-07 11:38:37 -04:00
Alan Rominger
e96080a512 No result_traceback is blank, not null 2021-07-07 11:37:30 -04:00
Alan Rominger
f126a6343b Fix bug setting execution_node to null (not blank) (#5169) 2021-06-28 10:51:06 -04:00
Shane McDonald
1ed170fff0 Don't overwrite result_traceback if it was already set. 2021-06-28 10:51:06 -04:00
Alan Rominger
390e1f9a0a Fix obvious logical bug with project folder pre-creation (#5155) 2021-06-28 10:51:04 -04:00
Shane McDonald
397908543d Disable activity stream for updates in status handler 2021-06-28 10:51:04 -04:00
Shane McDonald
2fa27000ab Prevent inventory updates started via projects from running on controlplane 2021-06-28 10:51:03 -04:00
Alan Rominger
d0b7d970c4 Create partition for 2 job types that bypass TM (#5138)
* Create partition for 2 job types that bypass TM

* Mock create_partition in unit tests
2021-06-28 10:51:03 -04:00
Seth Foster
75a27c38c2 Add a periodic task to reap unreleased receptor work units
- Add work_unit_id field to UnifiedJob
2021-06-22 10:49:31 -04:00
Jeff Bradberry
2d1a859719 Remove unused import 2021-06-09 13:48:23 -04:00
Shane McDonald
373cd9c20b Remove usage of AWXReceptorJob in metadata.py 2021-06-09 13:48:23 -04:00
Alan Rominger
579d49033a Remove debugging log message 2021-06-08 13:33:54 -04:00
Alan Rominger
210d5084f0 Move skip flag up from event_data and pop it off 2021-06-08 13:33:54 -04:00
Alan Rominger
53e8a9e709 Fix bug 2021-06-08 13:33:53 -04:00
Alan Rominger
15effd7ade Add some conditions for always-send and never-send event types
Always send websocket messages for
  high priority events like playbook_on_stats

Never send websocket messages for
  events with no output
  unless they are a high priority event type
2021-06-08 13:33:53 -04:00
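The always-send and never-send conditions described in 15effd7ade can be sketched roughly as follows. This is a minimal illustration, not the AWX implementation: the function name `should_send_websocket`, the `HIGH_PRIORITY_EVENTS` set, and the `rate_allows` flag (standing in for whatever rate check applies to ordinary events) are all assumptions. The commit message names only playbook_on_stats as an example of a high priority event.

```python
# Hypothetical set of high priority event types; the commit message
# only gives playbook_on_stats as an example.
HIGH_PRIORITY_EVENTS = {"playbook_on_stats"}


def should_send_websocket(event_type: str, output: str, rate_allows: bool) -> bool:
    """Decide whether an event should be emitted over the websocket.

    rate_allows is an assumed placeholder for the result of a separate
    rate-limiting check applied to ordinary events.
    """
    # Always send high priority events, even if they carry no output.
    if event_type in HIGH_PRIORITY_EVENTS:
        return True
    # Never send events with no output (high priority was handled above).
    if not output:
        return False
    # Everything else is left to the rate check.
    return rate_allows
```

Checking the high priority case first is what gives the "unless they are a high priority event type" exception its effect.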
Alan Rominger
4052603238 make sure log format does not error 2021-06-08 13:33:53 -04:00
Alan Rominger
4b6b8f2bdd Finish up the immediate or average rate method 2021-06-08 13:33:52 -04:00
Alan Rominger
70420dc3e4 THIS DOES NOT WORK pass events if they fit either timing criteria 2021-06-08 13:33:52 -04:00
Alan Rominger
50ca2d47ce Further log adjustments 2021-06-08 13:33:52 -04:00
Alan Rominger
faa0a6cf9a fix up log wording 2021-06-08 13:33:52 -04:00
Alan Rominger
01228cea02 Implement max event websocket rate as setting 2021-06-08 13:33:50 -04:00
Alan Rominger
cbb461ab71 Fix bug 2021-06-08 13:33:23 -04:00
Alan Rominger
b551608f16 Move websocket skip logic into event_handler 2021-06-08 13:33:22 -04:00
Alan Rominger
b26eaa3bd2 Remove uses of ansible_virtualenv_path 2021-06-07 21:14:35 -04:00
Jim Ladd
db6f565dca black formatting 2021-06-04 09:17:08 -07:00
Ryan Petrello
200901e53b upgrade to partitions without a costly bulk data migration
keep pre-upgrade events in an old table (instead of a partition)

- instead of creating a default partition, keep all events in special
"unpartitioned" tables
- track these tables via distinct proxy=true models
- when generating the queryset for a UnifiedJob's events, look at the
  creation date of the job; if it's before the date of the migration,
  query on the old unpartitioned table, otherwise use the more modern table
  that provides auto-partitioning
2021-06-04 09:17:08 -07:00
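The table selection described in 200901e53b (query the old unpartitioned table for jobs created before the migration, otherwise the partitioned one) might look something like the sketch below. The function name `events_table_for` and both table names are hypothetical and chosen only for illustration; the real code works through Django proxy models rather than raw table names.

```python
from datetime import datetime, timezone

# Hypothetical table names, used only for illustration.
PARTITIONED_TABLE = "job_events"
UNPARTITIONED_TABLE = "job_events_unpartitioned"


def events_table_for(job_created: datetime, migration_date: datetime) -> str:
    """Pick which table to query for a job's events."""
    # Jobs created before the migration keep their events in the old table.
    if job_created < migration_date:
        return UNPARTITIONED_TABLE
    # Jobs created at or after the migration use the auto-partitioned table.
    return PARTITIONED_TABLE
```

Because the choice depends only on the job's creation date, no bulk copy of pre-upgrade events is needed at upgrade time.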
Ryan Petrello
c7ab3ea86e move the partition data migration to be a post-upgrade async process
this copies the approach we took with the bigint migration
2021-06-04 09:17:07 -07:00
Jim Ladd
acfa1c4d1d Drop todo / question / conditional
* can safely assume job_created is set
* .. and if it isn't, we want to expose that bug
2021-06-04 09:17:07 -07:00
Jim Ladd
ea2afeec1f Drop todo / answered question 2021-06-04 09:17:07 -07:00
Jim Ladd
1af1a5e9da Convert job_created to string for serialization 2021-06-04 09:17:06 -07:00
Jim Ladd
c0d38e91f5 When saving JobEvents, include job_created
* this is the partition key
* .. used to determine which partition job event rows are sent to
2021-06-04 09:17:06 -07:00
Bill Nottingham
00e60d2698 Add additional controller directory for collections for inventory update 2021-06-03 13:26:23 -04:00
softwarefactory-project-zuul[bot]
7725c6f18f Merge pull request #10305 from jbradberry/resolve-workflow-ee
Include the EE set on a workflow template in the resolver hierarchy

SUMMARY
This step comes immediately after checking the actual job/template for
an explicitly set EE.
Note that now, because of how jobs are spawned off of workflow nodes,
the call to .resolve_execution_environment() no longer happens in
.create_unified_job().  The job instance within .create_unified_job()
doesn't yet have access to the node that it will be attached to,
making it impossible to use this information in the resolver if called
there.
related #9560
ISSUE TYPE
Feature Pull Request
Bugfix Pull Request

COMPONENT NAME
API

Reviewed-by: Shane McDonald <me@shanemcd.com>
Reviewed-by: Christian Adams <rooftopcellist@gmail.com>
2021-06-02 16:22:39 +00:00
Chris Meyers
067e6a5163 when sharing paths use little z
* AWX_ISOLATION_SHOW_PATHS will be shared between containers. Strange
file-not-found errors can crop up when multiple containers concurrently
access shared directories that are bind mounted with big Z. So make
sure we use little z.
2021-06-01 15:11:25 -04:00
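For context on 067e6a5163: in Docker and Podman volume options, a `:z` suffix requests a shared SELinux label that several containers can use, while `:Z` requests a private label for a single container. A minimal sketch of building such a volume argument is shown below. The helper name `bind_mount_arg`, the example path, and the choice to mount at the same path inside the container are assumptions for illustration, not the AWX code.

```python
def bind_mount_arg(path: str, shared: bool = True) -> str:
    """Build a container volume argument for a host path.

    shared=True uses little z (shared label, safe for paths that are
    mounted into multiple containers); shared=False uses big Z.
    """
    label = "z" if shared else "Z"
    # Assumption: the path is mounted at the same location in the container.
    return f"{path}:{path}:{label}"


# Hypothetical shared path, mounted with little z.
print(bind_mount_arg("/var/lib/awx/projects"))
```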