224 Commits

Author SHA1 Message Date
chris meyers
04767641af isolate cache 2018-05-17 12:58:11 -04:00
chris meyers
4761e17566 disabled instance stay subscribed to bcast queue
* A disabled node needs to stay subscribed to the broadcast queue
because the work to re-subscribe the node to queues when the node is
re-enabled is done over the broadcast queue.
2018-05-09 17:03:26 -04:00
chris meyers
9f745dd3b8 control celery routes using celery router
* Each time a route is needed (i.e. when a task is submitted to celery),
the router will be queried. This is ideal. With the previous method we
had to consider how a change in the routes would propagate to all celery
workers and nodes.

* fully describe the default awx queue
* Our dynamic queue registration would correct awx_private_queue.
However, we don't want celery to even create an "invalid"/extra
queue-exchange-route. This change makes sure we don't create extraneous
things in rabbitmq.

* reduce the cluster queue registration output. Only output when the
queue registration list changes.
2018-05-02 12:57:36 -04:00
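
The router approach described in 9f745dd3b8 can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes the Celery 3.x class-based router interface (a `route_for_task` method that Celery calls for each submitted task), and the task name, the class name, and the per-node queue convention are hypothetical. Only the `awx_private_queue` name comes from the commit message.

```python
# Hypothetical task names that must run on the local node (illustration only).
LOCAL_TASKS = {"example.tasks.cluster_node_heartbeat"}

# Queue name mentioned in the commit message above.
DEFAULT_QUEUE = "awx_private_queue"


class ExampleRouter(object):
    """Celery 3.x-style router: queried once per submitted task."""

    def __init__(self, local_node_name):
        self.local_node_name = local_node_name

    def route_for_task(self, task, args=None, kwargs=None):
        # Because the route is computed at submit time, a topology change
        # takes effect on the next task without pushing new route tables
        # out to every worker.
        if task in LOCAL_TASKS:
            queue = self.local_node_name
        else:
            queue = DEFAULT_QUEUE
        return {"queue": queue, "routing_key": queue}


router = ExampleRouter("node1")
local_route = router.route_for_task("example.tasks.cluster_node_heartbeat")
default_route = router.route_for_task("example.tasks.anything_else")
```

Computing the route on demand is what removes the need to propagate route changes across the cluster.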
AlanCoding
ac20aa954a Replace logging-related restart with dynamic handler
refactor existing handlers to be the related
  "real" handler classes, which are swapped
  out dynamically by external logger "proxy" handler class

real handler swapout only done on setting change

remove restart_local_services method
get rid of uWSGI fifo file

change TCP/UDP return type contract so that it mirrors
  the request futures object
add details to socket error messages
2018-05-02 09:47:22 -04:00
chris meyers
648d9165ff broadcast queues get a per-node stable queue name
* Using Kombu's default Broadcast() constructor requires only 1
parameter. That parameter defines the exchange name and the queue name
is randomly generated per-node.
* This caused problems if/when celery enters an infinite restart loop
because too many rabbit queues get created and rabbit OOM's
(gracefully).
* To remedy this we tell Broadcast the queue name to use, which is
derived from some constant + the node name so that the per-node queue
name is stable.
2018-05-01 13:09:10 -04:00
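
The naming scheme described in 648d9165ff amounts to deriving the queue name deterministically from the node name. A minimal sketch follows; the prefix and function name are hypothetical, and the commented-out Kombu call assumes `Broadcast` accepts a `queue` keyword argument, as the commit message implies.

```python
def stable_broadcast_queue_name(node_name, prefix="example_broadcast"):
    """Return the same queue name for a given node on every restart.

    The prefix is a hypothetical constant; the commit message only says
    the name is "some constant + the node name".
    """
    return "{}_{}".format(prefix, node_name)


first_boot = stable_broadcast_queue_name("node1")
after_restart = stable_broadcast_queue_name("node1")

# With Kombu, the name would then be handed to Broadcast, roughly:
#   Broadcast(name="example_broadcast", queue=first_boot)
# (assumed usage, not taken from the commit)
```

Because the name no longer changes between restarts, a restart loop reuses one queue per node instead of creating a new one each time.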
Ryan Petrello
1eb5e98743 Merge branch 'release_3.2.4' into release_3.3.0 2018-04-26 11:10:28 -04:00
Wayne Witzel III
404b476576 Allow real null to be searched in host_filter 2018-04-25 11:46:21 -04:00
chris meyers
a56771c8f0 send all tower work to a user-hidden queue
* Before, we had a special group, tower, that ran any async work that
tower needed done. This allowed users fine-grained control over which
nodes did background work. However, this granularity was too complicated
for users. So now, all tower system work goes to a special non-user
exposed celery queue. Tower remains the fallback instance group to
execute jobs on. The tower group will be created upon install and
protected from deletion.
2018-04-20 13:04:36 -04:00
Ryan Petrello
f8211b0588 add more edge case handling for yaml unsafe marking 2018-04-19 09:16:22 -04:00
AlanCoding
c397cacea5 add protection for job-compatible vars 2018-04-18 07:14:02 -04:00
Ryan Petrello
835f2eebc3 make extra var YAML serialization more robust to non-dict extra vars 2018-04-17 15:39:37 -04:00
Ryan Petrello
7074dcd677 don't allow usage of jinja templates in certain ansible CLI flags
see: https://github.com/ansible/tower/issues/1338
2018-04-17 09:20:05 -04:00
Ryan Petrello
88c243c92a mark all unsafe launch-time extra vars as !unsafe
see: https://github.com/ansible/tower/issues/1338
see: https://bugzilla.redhat.com/show_bug.cgi?id=1565865
2018-04-16 16:47:44 -04:00
adamscmRH
dcb6ca33a5 fix id for app in act_stream 2018-04-13 14:37:19 -04:00
chris meyers
bd7d9db1ce correctly cascade set null
* It's problematic to delete an instance that is referenced by a foreign
key where the referencing model is one that has a Polymorphic parent.
* Specifically, when Django goes to nullify the relationship it relies
on the related instances[0] class type to issue a query to decide what
to nullify. So if the foreignkey references multiple different types
(i.e. ProjectUpdate, Job) then only 1 of those class types will get
nullified. The end result is an IntegrityError when delete() is called.
* This changeset ensures that the parent Polymorphic class is queried so
that all the foreignkey entries are nullified
* Also remove old Django "hack" that doesn't work with Django 1.11
2018-04-06 11:10:16 -04:00
AlanCoding
98dc59765e exclude last_used from activity stream 2018-03-28 12:53:11 -04:00
AlanCoding
8c167e50c9 Continuously stream data from verbose jobs
In verbose unified job models (inventory updates, system jobs,
etc.), do not delay dispatch just because the encoded
event data is not part of the data written to the buffer.

This allows output from these commands to be submitted
to the callback queue as they are produced, instead
of waiting until the buffer is closed.
2018-03-28 11:05:49 -04:00
Chris Meyers
3446134501 Merge pull request #1646 from chrismeyersfsu/fix-kombu_unicode
use non-unicode queue names
2018-03-21 21:59:05 -04:00
chris meyers
e0803b9f08 use non-unicode queue names
* Use unicode InstanceGroup and queue names up until the point we
actually create the queue
* kombu add_consumers returns a dict with a value that contains the
passed in queue name. Trouble is, the returned dict value is a string
and not a unicode string and this results in an error.
2018-03-21 16:50:07 -04:00
Ryan Petrello
6a96e6a268 dynamically set worker autoscale max_concurrency based on system memory 2018-03-21 11:10:48 -04:00
Chris Meyers
2640ef8b1c Merge pull request #1536 from chrismeyersfsu/fix-protect_instance_groups
prevent instance group delete if running jobs
2018-03-15 14:57:45 -04:00
chris meyers
5d5d8152c5 prevent instance group delete if running jobs
* related to https://github.com/ansible/ansible-tower/issues/7936
2018-03-15 14:25:49 -04:00
Matthew Jones
b0cf4de072 Implement container-cluster aware capacity determination
* Added two settings values for declaring absolute cpu and memory
  capacity that will be picked up by the capacity utility methods
* installer inventory variables for controlling the amount of cpu and
  memory container requests/limits for the awx task containers
* Added fixed values for cpu and memory container requests for other
  containers
* configmap uses the declared inventory variables to define the
  capacity inputs that will be used by AWX to correspond to the same
  inputs for requests/limits on the deployment.
2018-03-14 14:35:45 -04:00
Matthew Jones
acde2520d0 Sort cloud regions in a stable way
* All comes first
* Then US regions
* Then all other regions alphabetically
2018-03-13 15:31:28 -04:00
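
The ordering listed in acde2520d0 can be expressed as a sort key. This is a sketch under assumptions: the region identifiers and the `us-` prefix test are illustrative (AWS-style names), and the actual commit may identify US regions differently.

```python
def region_sort_key(region):
    """Order: 'all' first, then US regions, then everything else A-Z."""
    name = region.lower()
    if name == "all":
        return (0, name)
    if name.startswith("us-"):  # assumption: US regions share this prefix
        return (1, name)
    return (2, name)


regions = ["eu-west-1", "us-west-2", "all", "ap-south-1", "us-east-1"]
sorted_regions = sorted(regions, key=region_sort_key)
```

Using a tuple key keeps the result independent of the input order, which is what makes the sort stable across requests.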
Alan Rominger
dcae4f65b5 Merge pull request #1330 from AlanCoding/capable_of_anything
New copy fields, clean up user_capabilities logic
2018-03-13 12:05:45 -04:00
chris meyers
e2ed1542e6 more celery rollback
* Setting reload code calls a celery 4.x method signature. This changes
it back to a 3.x safe call.
2018-03-09 09:27:09 -05:00
adamscmRH
701a5c9a36 hides client_secret from act stream 2018-03-02 14:47:49 -05:00
Chris Meyers
d551566b4d Merge pull request #1372 from chrismeyersfsu/old-celery3
celery 4.x to 3.x roll back
2018-02-27 15:26:46 -05:00
chris meyers
6606a29f57 celery 4.x -> 3.x change route config name 2018-02-27 14:13:05 -05:00
AlanCoding
ce9234df0f Revamp user_capabilities with new copy fields
Add copy fields corresponding to new server-side copying

Refactor the way user_capabilities are delivered
 - move the prefetch definition from views to serializer
 - store temporary mapping in serializer context
 - use serializer backlinks to denote polymorphic prefetch model exclusions
2018-02-26 12:13:41 -05:00
Matthew Jones
8505783350 Merge remote-tracking branch 'tower/release_3.2.3' into devel
* tower/release_3.2.3:
  fix unicode bugs with log statements
  use --export option for ansible-inventory
  add support for new "BECOME" prompt in Ansible 2.5+ for adhoc commands
  enforce strings for secret password inputs on Credentials
  fix a bug for "users should be able to change type of unused credential"
  fix xss vulnerabilities - on host recent jobs popover - on schedule name tooltip
  fix a bug when testing UDP-based logging configuration
  bump templates form credential_types page limit
  Wait for Slack RTM API websocket connection to be established
  don't process artifacts from custom `set_stat` calls asynchronously
  don't overwrite env['ANSIBLE_LIBRARY'] when fact caching is enabled
  only allow facts to cache in the proper file system location
  replace our memcached-based fact cache implementation with local files
  add support for new "BECOME" prompt in Ansible 2.5+
  fix a bug in inventory generation for isolated nodes
  properly handle unicode for isolated job buffers
2018-02-20 12:22:25 -05:00
Seth Jennings
42ff1cfd67 add import_playbook as top-level playbook indicator 2018-02-19 16:03:08 -06:00
Matthew Jones
0d2daecf49 Merge pull request #1243 from matburt/fix_clustering_isolated
Fix isolated instance clustering implementation
2018-02-14 08:32:24 -05:00
Matthew Jones
ffe5a92eb9 Update isolated instance capacity calculation 2018-02-13 21:51:50 -05:00
Matthew Jones
925d9efecf Fixing up isolated node execution after cluster changes
* Rework queue detection to include control groups and isolated instances
* Fix up development tooling around isolated nodes
* Update unit tests
2018-02-13 21:51:38 -05:00
Chris Church
67ec811e8d Merge pull request #1186 from cclauss/execfile-file-reduce-StandardError
Miscellaneous Python 3 changes: execfile(), file(), reduce(), StandardError
2018-02-13 15:11:24 -05:00
Ryan Petrello
194c2dcf0b improve a bwrap test 2018-02-12 10:14:37 -05:00
Ryan Petrello
83b5377387 Merge pull request #1187 from ryanpetrello/file-your-vars-away-for-a-rainy-day
pass extra vars via file rather than via commandline
2018-02-12 08:48:19 -05:00
cclauss
2e623ad80c Change unicode() --> six.text_type() for Python 3 2018-02-11 21:09:12 +01:00
Bill Nottingham
aa5bd9f5bf Pass extra vars via file rather than via commandline, including custom creds.
The extra vars file created lives in the playbook private runtime
directory, and will be reaped along with the rest of the directory.

Adjust assorted unit tests as necessary.
2018-02-10 09:27:24 -05:00
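
The approach in aa5bd9f5bf can be sketched as follows. This is an illustration, not the commit's code: the function names and file prefix are hypothetical. It relies on `ansible-playbook` accepting `-e @<file>` to read extra vars from a file, and on JSON being an accepted file format for that option.

```python
import json
import os
import tempfile


def write_extra_vars_file(private_dir, extra_vars):
    """Write extra vars into the job's private runtime directory.

    tempfile.mkstemp creates the file readable and writable only by the
    owner, and the file is removed whenever private_dir is cleaned up.
    """
    fd, path = tempfile.mkstemp(prefix="extravars_", dir=private_dir)
    with os.fdopen(fd, "w") as handle:
        json.dump(extra_vars, handle)
    return path


def extra_vars_args(path):
    """Build the ansible-playbook arguments that point at the file."""
    return ["-e", "@" + path]


private_dir = tempfile.mkdtemp()
vars_path = write_extra_vars_file(private_dir, {"greeting": "hello"})
args = extra_vars_args(vars_path)
```

Passing a path instead of the values keeps the variables out of the process command line.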
cclauss
260aec543e Misc Python 3 changes: execfile(), file(), reduce(), StandardError 2018-02-09 17:17:05 +01:00
cclauss
c371b869dc basestring to six.string_types for Python 3 2018-02-09 16:28:36 +01:00
Bill Nottingham
c1a0e2cd16 Have bubblewrap mount a new /proc in the wrapped environment.
Since we're running with a new pid namespace, we should have
a new /proc that is in that namespace. Otherwise things will
be weird.
2018-02-07 15:47:03 -05:00
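
A sketch of what the wrapped command line might look like after c1a0e2cd16. The helper name is hypothetical and the exact set of flags used by the project is an assumption; `--unshare-pid`, `--dev-bind`, and `--proc` are bubblewrap options.

```python
def wrap_with_bwrap(command):
    """Prefix a command with bubblewrap arguments.

    --unshare-pid  creates a new pid namespace
    --dev-bind / / exposes the host filesystem inside the sandbox
    --proc /proc   mounts a fresh procfs that matches the new namespace
    """
    prefix = ["bwrap", "--unshare-pid", "--dev-bind", "/", "/", "--proc", "/proc"]
    return prefix + list(command)


wrapped = wrap_with_bwrap(["ansible-playbook", "site.yml"])
```

Without the `--proc` mount, processes in the sandbox would see the host's `/proc`, whose pids do not correspond to the new namespace.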
Matthew Jones
70bf78e29f Apply capacity algorithm changes
* This also adds fields to the instance view for tracking cpu and
  memory usage as well as information on what the capacity ranges are
* Also adds a flag for enabling/disabling instances which removes them
  from all queues and has them stop processing new work
* The capacity is now based almost exclusively on some value relative
  to forks
* capacity_adjustment allows you to commit an instance to a certain
  amount of forks, cpu focused or memory focused
* Each job run adds a single fork overhead (that's the reasoning
  behind the +1)
2018-02-01 16:57:09 -05:00
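
The capacity rules in 70bf78e29f can be illustrated with a small sketch. The formula below is an assumption made for illustration: it treats `capacity_adjustment` as a value between 0 and 1 that interpolates linearly between a CPU-derived and a memory-derived fork count. The commit message does not state the exact formula, and the function names are hypothetical.

```python
def instance_capacity(cpu_forks, mem_forks, capacity_adjustment):
    """Pick a fork capacity between the CPU-based and memory-based limits.

    Assumed convention: 0.0 is fully CPU focused, 1.0 is fully memory
    focused, and values in between interpolate linearly.
    """
    if not 0.0 <= capacity_adjustment <= 1.0:
        raise ValueError("capacity_adjustment must be between 0 and 1")
    return int(cpu_forks + (mem_forks - cpu_forks) * capacity_adjustment)


def job_impact(forks):
    """Each job run costs its forks plus one fork of overhead (the +1)."""
    return forks + 1


cpu_focused = instance_capacity(16, 40, 0.0)
balanced = instance_capacity(16, 40, 0.5)
mem_focused = instance_capacity(16, 40, 1.0)
impact = job_impact(5)
```

With these inputs an instance would advertise between 16 and 40 forks, and a job configured with 5 forks would consume 6 of them.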
Matthew Jones
d9e774c4b6 Updates for automatic triggering of policies
* Switch policy router queue to not be "tower" so that we don't
  fall into a chicken/egg scenario
* Show fixed policy list in serializer so a user can determine if
  an instance is manually managed
* Change IG membership mixin to not directly handle applying topology
  changes. Instead it just makes sure the policy instance list is
  accurate
* Add create/delete hooks for instances and groups to trigger policy
  re-evaluation
* Update policy algorithm for fairer distribution
* Fix an issue where CELERY_ROUTES wasn't renamed after celery/django
  upgrade
* Update unit tests to be more explicit
* Update count calculations used by algorithm to only consider
  non-manual instances
* Adding unit tests and fixture
* Don't propagate logging messages from awx.main.tasks and
  awx.main.scheduler
* Use advisory lock to prevent policy eval conflicts
* Allow updating instance groups from view
2018-02-01 16:56:16 -05:00
Chris Meyers
c9ff3e99b8 celeryd attach to queues dynamically
* Based on the tower topology (Instance and InstanceGroup
relationships), have celery dynamically listen to queues on boot
* Add celery task capable of "refreshing" what queues each celeryd
worker listens to. This will be used to support changes in the topology.
* Cleaned up some celery task definitions.
* Converged wrongly targeted job launch/finish messages to 'tower'
queue, rather than a 1-off queue.
* Dynamically route celery tasks destined for the local node
* separate beat process

add support for separate beat process
2018-02-01 16:37:33 -05:00
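
The queue selection described in c9ff3e99b8 can be sketched as a pure function over the topology. This is an assumption-laden illustration: the data shape (a dict of group name to member hostnames), the function names, and the convention of one queue per node plus one per group are hypothetical, not taken from the commit.

```python
def queues_for_node(node_name, group_members):
    """Return the queues a worker on node_name should listen to.

    group_members maps an instance group name to its instance hostnames.
    """
    queues = {node_name}  # assumed per-node queue for locally routed tasks
    for group, members in group_members.items():
        if node_name in members:
            queues.add(group)
    return sorted(queues)


def queue_changes(current, desired):
    """Compute which queues to start and stop consuming on a refresh."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)


topology = {"tower": ["node1", "node2"], "east": ["node2"]}
node1_queues = queues_for_node("node1", topology)

# After a topology change that moves node1 from "tower" to "east":
to_add, to_cancel = queue_changes(node1_queues, ["east", "node1"])
```

A refresh task would run the second function after each topology change and subscribe or unsubscribe the worker accordingly.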
Ryan Petrello
982539f444 fix a bug when testing UDP-based logging configuration
see: https://github.com/ansible/ansible-tower/issues/7868
2018-01-29 12:05:51 -05:00
Wayne Witzel III
55a616cba6 Load Celery inspector manually when needed 2018-01-29 14:57:03 +00:00
Ryan Petrello
5387846cbb Merge pull request #992 from ryanpetrello/optimize-output-event-filter
optimize OutputEventFilter for large stdout streams
2018-01-17 14:24:15 -05:00
Chris Meyers
e33265e12c add job_id to fact cache log output 2018-01-17 10:19:27 -05:00