51 Commits

Author SHA1 Message Date
Ryan Petrello 326ed22efe properly handle import errors in the isolated capacity healthcheck
if the awx_capacity module runs on an isolated node with missing
libraries (e.g., psutil) or bad permissions, then the runner status will
be "failed"

in this scenario, we *still* want to react by recording a capacity=0
2020-01-31 10:17:20 -05:00
Ryan Petrello 220168f5ee fix a bug in isolated check timeout handling 2019-12-06 12:44:50 -05:00
Christian Adams 4f8b624b96 Make spelling of canceled consistent 2019-11-26 00:31:15 -05:00
Shane McDonald db2316b791 Remove usage of idle_timeout when checking status of isolated / containerized jobs 2019-11-22 11:41:00 -05:00
Ryan Petrello ccaaee61f0 improve cleanup of anonymous kubeconfig files 2019-10-29 11:24:12 -04:00
Ryan Petrello 6dfc714c75 when isolated or container jobs fail to launch, set job status to error
a status of error makes more sense, because failed generally points to
an issue with the playbook itself, while error is more generally used
for reporting issues internal to Tower

see: https://github.com/ansible/awx/issues/4909
2019-10-29 11:24:10 -04:00
Shane McDonald bd5003ca98 Task manager / scheduler Kubernetes integration 2019-10-04 13:21:21 -04:00
Ryan Petrello 82be87566f improve host key checking configurability
see: https://github.com/ansible/tower/issues/3737
2019-09-30 14:13:07 -04:00
Ryan Petrello c6c14d4fb9 properly record Instance.cpu and Instance.memory for isolated nodes 2019-05-03 15:30:41 -04:00
Ryan Petrello f1d87bf392 fix a bug that breaks the isolated heartbeat 2019-04-16 16:24:40 -04:00
softwarefactory-project-zuul[bot] d222bed932 Merge pull request #3712 from jladdjr/iso_node_healthcheck_should_not_reset_capacity
Do not reset capacity of iso nodes when disabled

Reviewed-by: https://github.com/softwarefactory-project-zuul[bot]
2019-04-15 20:40:01 +00:00
Jim Ladd 6ef3b18803 Do not reset capacity of iso nodes when disabled 2019-04-15 12:36:15 -07:00
Ryan Petrello 387682ed8d if runner crashes, attempt to record why
this attempts to surface the underlying runner exception for tracebacks
like this one:

FileNotFoundError: [Errno 2] No such file or directory:
'/tmp/awx_41_93gtgv25/artifacts/41/status'
2019-04-15 13:17:45 -04:00
softwarefactory-project-zuul[bot] 58966d7368 Merge pull request #3625 from ryanpetrello/iso-forks
WIP: specify --forks on isolated health check calls

Reviewed-by: https://github.com/softwarefactory-project-zuul[bot]
2019-04-11 21:41:37 +00:00
softwarefactory-project-zuul[bot] e3dfc6c796 Merge pull request #3596 from jbradberry/capture-isolated-command
Updated IsolatedManager to take a callback that captures the remote command

Reviewed-by: https://github.com/softwarefactory-project-zuul[bot]
2019-04-05 17:15:11 +00:00
Ryan Petrello 81fe923577 don't write playbook stdout to sys.stdout (it's duplicated in log files)
this instructs runner to _not_ write to stdout when we invoke
runner.interface.run(); AWX consumes/ingests this strictly as events
2019-04-05 11:20:34 -04:00
Ryan Petrello 79d580d5b9 update periodic isolated cleanup to match the new paths post-runner 2019-04-05 09:43:27 -04:00
Ryan Petrello 5a4a812c73 specify --forks on isolated health check calls
this requires ansible-runner 1.3.2
2019-04-04 20:12:14 -04:00
Jeff Bradberry 3f6d3506c6 Change the artifact file convention for isolated nodes to 'command'
since that's what landed in the ansible-runner PR.
2019-04-04 14:25:50 -04:00
Jeff Bradberry 467700e4bb Bring the check_callback back into the loop
but try to process it only once.
2019-04-03 16:04:07 -04:00
Jeff Bradberry b4e508f72a Bring the check_callback call out of the loop
We shouldn't need to call it multiple times.
2019-04-03 15:12:29 -04:00
Jeff Bradberry 32286a9d49 Change the artifact to also capture the actual envvars data 2019-04-02 17:10:26 -04:00
Jeff Bradberry cac48e7cfb Updated IsolatedManager to take a callback that captures the remote command 2019-04-02 15:40:56 -04:00
chris meyers 71fcb1a82c process host facts for iso runs
* Move isolated clean to our final run hook
* ISO and non-iso code path now share the post-fact-processing code
2019-03-29 16:16:22 -04:00
Ryan Petrello 563a0cc2a4 move awx.main.expect to awx.main.isolated 2019-03-29 12:14:40 -04:00
AlanCoding e79ca131a6 initial commit to move folder isolated->expect 2017-08-15 11:32:44 -04:00
AlanCoding 42ccd870d9 Automatically cancel job if cancel callback fails and log 2017-08-11 16:43:08 -04:00
Ryan Petrello 7db9b48e9c add a configurable for disabling the auto-generated isolated RSA key
some users won't want to utilize the RSA key we auto-generate for
isolated node SSH access, but will instead want to manage SSH
authentication by hand outside of Tower

see: https://github.com/ansible/ansible-tower/issues/7380
2017-08-03 17:16:28 -04:00
AlanCoding 5d254d781a provide the job id in isolated management logs 2017-08-03 10:29:48 -04:00
AlanCoding 1112557c79 set capacity to 0 if instance has not checked in lately 2017-07-27 16:20:04 -04:00
Matthew Jones b3b4a515e2 Refactor some tower periodic tasks to label as awx 2017-07-26 13:35:30 -04:00
Matthew Jones d4b1a07495 Rename tower display plugins to awx display 2017-07-26 13:33:30 -04:00
Matthew Jones c7a85d9738 Mass rename from ansible_(awx|tower) -> (awx|tower) 2017-07-26 13:33:26 -04:00
Ryan Petrello e29492a259 more tower -> awx for task execution and isolated tooling 2017-07-25 10:36:06 -04:00
Ryan Petrello 8ce1421c6a fix tower-expect -> awx-expect for isolated tower builds 2017-07-24 16:03:58 -04:00
Ryan Petrello d42ea31f75 use a named pipe for isolated secret passthrough (not stdin)
it's not unusual for the secret data we pass into the `run_isolated.yml`
playbook to be quite long, namely because it can contain RSA key
data; by passing this value into the ansible-playbook process using
`vars_prompt`, we're limited by pexpect's tty line limit (which looks
like it caps out around 4k). Because of this, large payloads are
being truncated and causing job run failures.

this changes the implementation to use a named pipe instead, which
doesn't have the same limitation

see: #7183
2017-07-20 12:42:03 -04:00
Ryan Petrello 53259e4d24 properly capture job events for adhoc commands run on isolated instances
see: #7100
2017-07-17 14:51:24 -04:00
Ryan Petrello 0a5b9c458b standardize tasks.py temporary file paths under a single parameter
see: #3472
2017-07-05 13:50:43 -04:00
AlanCoding 70b1b9c81d isolated connection timeout and log file for playbook out 2017-07-05 08:48:01 -04:00
Ryan Petrello 413e8c3bc9 isolated nodes should report their awx version in their heartbeat
see: #6810
2017-06-29 16:55:11 -04:00
Ryan Petrello 405c01a847 more isolated production tinkering
see: #5903
see: #6507
2017-06-29 09:35:26 -04:00
AlanCoding 05bcd4b674 fix bug where isolated management jobs could not load JSON output 2017-06-28 11:41:30 -04:00
Ryan Petrello aaff005234 Merge pull request #6745 from ryanpetrello/fix-6659
RFC: install a randomized RSA key for controller -> isolated rampart auth
2017-06-27 11:52:36 -04:00
Ryan Petrello 3000f52a92 install a randomized RSA key for controller -> isolated rampart auth
see: #6507
2017-06-27 10:53:44 -04:00
Ryan Petrello 5adc1c603a properly update the heartbeat timestamp for isolated nodes 2017-06-26 11:03:56 -04:00
AlanCoding 40287d8e78 multi-host isolated heartbeat w tower-isolated check
* use tower-expect command to determine job status when running
  the isolated heartbeat playbook
* grok JSON output of playbook to obtain result information
* run playbook against multiple isolated hosts at the same time
  (addresses scalability concerns)
2017-06-20 14:36:18 -04:00
AlanCoding f371dd71b2 Run isolated heartbeat against all hosts at once
Previously we were running the playbook on a host-by-host
basis, but this changes it to pass in the list of all
isolated instances the machine is responsible for.
Using the `json` Ansible stdout module, we are able to
parse the output for information on each host.
2017-06-19 12:13:36 -04:00
AlanCoding dd1a261bc3 setup playbook and heartbeat for isolated deployments
* Allow isolated_group_ use in setup playbook
* Tweaks to host/queue registration commands complementing setup
* Create isolated heartbeat task and check capacity
* Add content about isolated instances to acceptance docs
2017-06-19 12:13:36 -04:00
Ryan Petrello 1ea03aa4c9 more isolated task execution tweaking
* set a more reasonable default `AWX_ISOLATED_CHECK_INTERVAL`
* make manual cancellation work for high values of
  `AWX_ISOLATED_CHECK_INTERVAL`
* remove the `/tmp/ansible_tower/jobs/` symlink directory

see: #6616
2017-06-16 15:37:07 -04:00
Ryan Petrello 44e0c8621a isolated ramparts: replace systemd unit with a tower-expect binary
instead of launching isolated tasks via `systemctl`, treat
`awx.main.isolated.run` as an executable that knows how to daemonize

additionally, add `setup.py isolated_build` for isolated Tower source
distribution
2017-06-16 09:59:21 -04:00