Commit Graph

2313 Commits

Author SHA1 Message Date
Shane McDonald
c5976e2584 Add setting for missed heartbeats before marking node offline 2022-08-17 11:39:30 -04:00
John Westcott IV
b84a192bad Altering events relationship to hosts to increase performance (#12447)
Removing cascade on delete at model level that could cause locking issues.
2022-08-16 12:03:05 -04:00
Alan Rominger
f7e6a32444 Optimize task manager with debug toolbar, adjust prefetch (#12588) 2022-08-10 10:05:13 -04:00
Seth Foster
e6f8852b05 Cache task_impact
task_impact is now a field on the database
It is calculated and set during create_unified_job

set task_impact on .save for adhoc commands
2022-08-05 14:33:47 -04:00
Seth Foster
957b2b7188 Cache preferred instance groups
When creating unified job, stash the list of pk values from the
instance groups returned from preferred_instance_groups so that the
task management system does not need to call out to this method
repeatedly.

.preferred_instance_groups_cache is the new field
2022-08-05 14:33:28 -04:00
Alan Rominger
b94b3a1e91 [task_manager_refactor] Move approval node expiration logic into queryset (#12502)
Instead of loading all pending Workflow Approvals in the task manager,
  run a query that will only return the expired apporovals
  directly expire all which are returned by that query

Cache expires time as a new field in order to simplify WorkflowApproval filter
2022-08-05 14:33:27 -04:00
Seth Foster
0a47d05d26 split schedule_task_manager into 3
each call to schedule_task_manager becomes one of

ScheduleTaskManager
ScheduleDependencyManager
ScheduleWorkflowManager
2022-08-05 14:33:25 -04:00
Seth Foster
431b9370df Split TaskManager into
- DependencyManager spawns dependencies if necessary
- WorkflowManager processes running workflows to see if a new job is
  ready to spawn
- TaskManager starts tasks if unblocked and has execution capacity
2022-08-05 14:29:02 -04:00
John Westcott IV
62fc3994fb Modifying SAML adapter to not auto-add default galaxy creds to orgs on login (#12504)
* Modifying SAML adapter to not auto-add default galaxy creds to orgs on login

* Adding test, fixing old tests and moving add_default_galaxy_credential to pipeline
2022-07-25 17:16:22 -03:00
John Westcott IV
389c4a3180 Adding fields to job_metadata for workflows and approval nodes (#12255) 2022-07-18 16:53:49 +02:00
John Westcott IV
37ff9913d3 Adding GOOGLE_APPLICATION_CREDENTIALS env var (#12389)
* Adding GOOGLE_APPLICATION_CREDENTIALS env var
* Updating tests
2022-07-12 08:51:02 -04:00
Seth Foster
d2013bd416 Merge pull request #12366 from fosterseth/remove_update_on_project_update
Remove deprecated field update_on_project_update
2022-06-28 13:15:57 -04:00
Seth Foster
0522233892 remove update_on_project_update from InventorySource 2022-06-24 15:27:08 -04:00
Alan Rominger
783b744bdb Pass combined artifacts from nested workflows into downstream nodes (#12223)
* Track combined artifacts on workflow jobs

* Avoid schema change for passing nested workflow artifacts

* Basic support for nested workflow artifacts, add test

* Forgot that only does not work with polymorphic

* Remove incorrect field

* Consolidate logic and prevent recursion with UJ artifacts method

* Stop trying to do precedence by status, filter for obvious ones

* Review comments about sets

* Fix up bug with convergence node paths and artifacts
2022-06-23 16:54:53 -03:00
John Westcott IV
fddf292d47 Additional changes from review 2022-06-10 10:26:24 -04:00
John Westcott IV
62dabcae63 Removing unneeded function 2022-06-10 10:26:23 -04:00
John Westcott IV
c836fafb61 modifying schedules API to return a list of links 2022-06-10 10:26:23 -04:00
Alan Rominger
78d3d6dc94 Merge pull request #12219 from AlanCoding/really_skip
Change Demo Project status to successful
2022-06-09 11:19:57 -04:00
Lila
5f783fd5ee Revised job_tags to handle more than 1024 characters. 2022-06-06 13:28:22 -04:00
Alan Rominger
bca6e00e37 Change Demo Project status to successful 2022-05-12 16:14:09 -04:00
Seth Foster
0ae9fe3624 if dependency fails, fail job in task manager 2022-05-12 14:00:13 -04:00
Seth Foster
1b662fcca5 SCM inv source trigger project update
- scm based inventory sources should launch project updates prior to
running inventory updates for that source.
- fixes scenario where a job is based on projectA, but the inventory
source is based on projectB. Running the job will likely trigger a
sync for projectA, but not projectB.

comments
2022-05-12 14:00:12 -04:00
Alan Rominger
29d60844a8 Fix notification timing issue by sending in the latter of 2 events (#12110)
* Track host_status_counts and use that to process notifications

* Remove now unused setting

* Back out changes to callback class not needed after all

* Skirt the need for duck typing by leaning on the cached field

* Delete tests for deleted task

* Revert "Back out changes to callback class not needed after all"

This reverts commit 3b8ae350d218991d42bffd65ce4baac6f41926b2.

* Directly hardcode stats_event_type for callback class

* Fire notifications if stats event was never sent

* Remove test content for deleted methods

* Add placeholder for when no hosts matched

* Make field default be None, denote events processed with empty dict

* Make UI process null value for host_status_counts

* Fix tracking of EOF dispatch for system jobs

* Reorganize EVENT_MAP into class properties

* Consolidate conditional I missed from EVENT_MAP refactor

* Give up on the null condition, also applies for empty hosts

* Remove cls position argument not being used

* Move wrapup method out of class, add tests
2022-04-29 13:54:31 -04:00
John Westcott IV
c67f50831b Modifying schedules API to allow for rrulesets #5733 (#12043)
* Added schedule_rruleset lookup plugin for awx.awx
* Added DB migration for rrule size
* Updated schedule docs
* The schedule API endpoint will now return an array of errors on rule validation to try and inform the user of all errors instead of just the first
2022-04-28 15:38:20 -04:00
Alan Rominger
cb63d92bbf Remove committed_capacity field, delete supporting code (#12086)
* Remove committed_capacity field, delete supporting code

* Track consumed capacity to solve the negatives problem

* Use more verbose name for IG queryset
2022-04-22 13:41:32 -04:00
Elijah DeLee
689a216726 move static methods used by task manager (#12050)
* move static methods used by task manager

These static methods were being used to act on Instance-like objects
that were SimpleNamespace objects with the necessary attributes.

This change introduces dedicated classes to replace the SimpleNamespace
objects and moves the formerlly staticmethods to a place where they are
more relevant instead of tacked onto models to which they were only
loosly related.

Accept in-memory data structure in init methods for tests

* initialize remaining capacity AFTER we built map of instances
2022-04-21 13:05:06 -04:00
Alan Rominger
75d7cb5bca Merge pull request #11989 from AlanCoding/deprecate_uopu
Mark inventory source field for deprecation
2022-04-18 11:59:05 -04:00
Elijah DeLee
2e9974133a calculate remaining capacity in static method
this is to avoid additional queries when we allready have all
the active jobs fetched in the task manager
2022-04-13 11:56:07 -04:00
Alan Rominger
5a304db840 Mark inventory source field for deprecation 2022-04-12 16:24:35 -04:00
Elijah DeLee
4328b4cb67 drop call that queries all running and waiting jobs
this is to fix one more place in the task manager where we end up
querying all running and waiting jobs.

Partial fix for https://github.com/ansible/awx/issues/11671
2022-04-12 10:31:47 -04:00
Alan Rominger
3d22c8ae91 Merge pull request #11968 from AlanCoding/cleanup_tweaks
Minor tweaks to ansible-runner cleanup task arguments
2022-03-29 15:00:33 -04:00
Alan Rominger
deac08ba8a Add regression test for overly agressive cleanup behavior 2022-03-28 22:23:33 -04:00
Jeff Bradberry
6c1adade25 Merge pull request #11947 from jbradberry/django-3.2-upgrade
Remove the out-of-band JSONField migration
2022-03-28 12:02:53 -04:00
Alan Rominger
85ec83c3fd Minor tweaks to ansible-runner cleanup task arguments 2022-03-28 10:52:09 -04:00
Lucas Dias
01ce3440eb added os.path and module import 2022-03-25 14:26:00 +01:00
Jeff Bradberry
d54838cd94 Remove the out-of-band migration
that was turning all old JSONFields into a jsonb type database column.
The use of JSONBlob makes this unnecessary.
2022-03-24 15:21:59 -04:00
Jeff Bradberry
e3f3ab224a Replace all previously text-based json fields with JSONBlob
This JSONBlob field type is a wrapper around Django's new generic
JSONField, but with the database column type forced to be text.  This
should behave close enough to our old wrapper around
django-jsonfield's JSONField and will avoid needing to do the
out-of-band database migration.
2022-03-24 15:21:54 -04:00
Lucas Dias
18b1440d7c fixed hardcode tmp ha.py 2022-03-24 17:59:43 +01:00
John Westcott IV
593eebf062 Adding awx_ as well as tower_ variable names for webhooks (#11925)
Adding utility to ease testing webhooks from command line
Modifying all variables to use a constants list of variable names
2022-03-24 11:58:15 -04:00
Jeff Bradberry
ac6a82eee4 Merge pull request #11654 from jbradberry/django-3.2-upgrade
Django 3.2 upgrade
2022-03-17 10:34:22 -04:00
Alan Rominger
99bbc347ec Fill in errors for hop nodes when Last Seen is out of date, and clear them when not (#11714)
* Process unresponsive and newly responsive hop nodes

* Use more natural way to zero hop node capacity, add test

* Use warning as opposed to warn for log messages
2022-03-09 13:21:32 -05:00
Jeff Bradberry
676b8f6d8f Implement an out-of-band migration to change the json fields 2022-03-07 18:11:36 -05:00
Jeff Bradberry
05142a779d Replace all usage of customized json fields with the Django builtin
The event_data field on event models, however, is getting an
overridden version that retains the underlying text data type for the
column, to avoid a heavy data migration on those tables.

Also, certain of the larger tables are getting these fields with the
NOT NULL constraint turned off, to avoid a long migration.

Remove the django.utils.six monkey patch we did at the beginning of
the upgrade.
2022-03-07 18:11:36 -05:00
Jeff Bradberry
b852baaa39 Fix up logger .warn() calls to use .warning() instead
This is a usage that was deprecated in Python 3.0.
2022-03-07 18:11:36 -05:00
Jeff Bradberry
a3a216f91f Fix up new Django 3.0 deprecations
Mostly text based: force/smart_text, ugettext_*
2022-03-07 18:11:36 -05:00
Elijah DeLee
799968460d Fixup conversion of memory and cpu settings to support k8s resource request format (#11725)
fix memory and cpu settings to suport k8s resource request format

* fix conversion of memory setting to bytes

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.

This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.

On platforms where services are all sharing memory, we deduct a
penalty from the memory available. On k8s we don't need to do this
because the web, redis, and task containers each have memory
allocated to them.

* Support CPU setting expressed in units used by k8s

This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the CPU resource request/limits in the
same way as we set it in the resource requests/limits.

This way we can start setting this setting in the
operator, and get a more accurate reflection of how much cpu is
available to the control pod in k8s.

Because cpu on k8s can be partial cores, migrate cpu field to decimal.

k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only
store up to 1 decimal place.

fix analytics to deal with decimal cpu

need to use DjangoJSONEncoder when Decimal fields in data passed to
json.dumps
2022-02-15 14:08:24 -05:00
Alex Corey
326d12382f Adds Inventory labels (#11558)
* Adds inventory labels end point

* Adds label field to inventory form
2022-02-14 15:14:08 -05:00
Elijah DeLee
604cbc1737 Consume control capacity (#11665)
* Select control node before start task

Consume capacity on control nodes for controlling tasks and consider
remainging capacity on control nodes before selecting them.

This depends on the requirement that control and hybrid nodes should all
be in the instance group named 'controlplane'. Many tests do not satisfy that
requirement. I'll update the tests in another commit.

* update tests to use controlplane

We don't start any tasks if we don't have a controlplane instance group

Due to updates to fixtures, update tests to set node type and capacity
explicitly so they get expected result.

* Fixes for accounting of control capacity consumed

Update method is used to account for currently consumed capacity for
instance groups in the in-memory capacity tracking data structure we initialize in
after_lock_init and then update via calculate_capacity_consumed (both in
task_manager.py)

Also update fit_task_to_instance to consider control impact on instances

Trust that these functions do the right thing looking for a
node with capacity, and cut out redundant check for the whole group's
capacity per Alan's reccomendation.

* Refactor now redundant code

Deal with control type tasks before we loop over the preferred instance
groups, which cuts out the need for some redundant logic.

Also, fix a bug where I was missing assigning the execution node in one case!

* set job explanation on tasks that need capacity

move the job explanation for jobs that need capacity to a function
so we can re-use it in the three places we need it.

* project updates always run on the controlplane

Instance group ordering makes no sense on project updates because they
always need to run on the control plane.

Also, since hybrid nodes should always run the control processes for the
jobs running on them as execution nodes, account for this when looking for a
execution node.

* fix misleading message

the variables and wording were both misleading, fix to be more accurate
description in the two different cases where this log may be emitted.

* use settings correctly

use settings.DEFAULT_CONTROL_PLANE_QUEUE_NAME instead of a hardcoded
name
cache the controlplane_ig object during the after lock init to avoid
an uneccesary query
eliminate mistakenly duplicated AWX_CONTROL_PLANE_TASK_IMPACT and use
only AWX_CONTROL_NODE_TASK_IMPACT

* add test for control capacity consumption

add test to verify that when there are 2 jobs and only capacity for one
that one will move into waiting and the other stays in pending

* add test for hybrid node capacity consumption

assert that the hybrid node is used for both control and execution and
capacity is deducted correctly

* add test for task.capacity_type = control

Test that control type tasks have the right capacity consumed and
get assigned to the right instance group

Also fix lint in the tests

* jobs_running not accurate for control nodes

We can either NOT use "idle instances" for control nodes, or we need
to update the jobs_running property on the Instance model to count
jobs where the node is the controller_node.

I didn't do that because it may be an expensive query, and it would be
hard to make it match with jobs_running on the InstanceGroup which
filters on tasks assigned to the instance group.

This change chooses to stop considering "idle" control nodes an option,
since we can't acurrately identify them.

The way things are without any change, is we are continuing to over consume capacity on control nodes
because this method sees all control nodes as "idle" at the beginning
of the task manager run, and then only counts jobs started in that run
in the in-memory tracking. So jobs which last over a number of task
manager runs build up consuming capacity, which is accurately reported
via Instance.consumed_capacity

* Reduce default task impact for control nodes

This is something we can experiment with as far as what users
want at install time, but start with just 1 for now.

* update capacity docs

Describe usage of the new setting and the concept of control impact.

Co-authored-by: Alan Rominger <arominge@redhat.com>
Co-authored-by: Rebeccah <rhunter@redhat.com>
2022-02-14 10:13:22 -05:00
Alan Rominger
285ff080d0 Prevent duplicate query in local health check 2022-01-27 15:27:07 -05:00
Shane McDonald
b678d61318 Merge pull request #11569 from zjzh/devel
Update ad_hoc_commands.py
2022-01-26 16:51:30 -05:00