SKA PSS CI Systems Developer Guide
Table of Contents
[[TOC]]
Introduction
The PSS Cheetah pipeline will be run on dedicated servers, using a large number of nodes. The contents of this repository enable us to deploy our search software on different machines, both during development of the code and later on in production deployment.
The repository contains a typical Ansible file system structure, including playbooks, tasks, and
an inventory file. It also contains the iac_deployer tool, which wraps Ansible and provides
some configuration and testing suites. The iac_deployer is used to initiate testing and deployment of
configurations as defined in the ansible directory.
The tool should be run from your local host. To get started, check out this repository
with git directly to the machine you would like to use as local host, and follow the instructions below.
For testing, the tool will create docker containers on the local host (see Scenario 1), while for deployment it will
target the machines specified in the inventory file, called production (see Scenario 2).
Some things to note about how the tool works:
If you are running the tool on a local host (bare metal or Virtual Machine):
Test containers are created and deleted during testing; they do not persist.
There are component tests and machine tests:
A component test creates a container, pulls the relevant component (e.g. boost), installs it, checks that everything works as expected, rolls back the component, then removes the container.
A machine test creates a container following an Ansible playbook, installs everything in that playbook, checks that everything works as expected, rolls it back, then removes the container.
Tests that include our GitLab runner component (both machine tests and component tests) create an additional container that holds a full instance of GitLab, in order to test registering the GitLab runner.
If you are instead running in GitLab CI:
All of the above is run inside an additional container, which uses Ubuntu 18.04. If you want to change this OS, you may also need to update the Python requirements file.
In actual deployment, you will run the iac_deployer tool on a local machine, with the addresses or names of the target machines listed in an inventory file (production).
The tool will create a Python virtual environment and install Ansible inside it. This is beneficial because the environment is isolated from the host machine and can therefore use different versions of Python-related software (including Ansible) from those installed directly on the host.
Installing Python and virtualenv
Before running the iac_deployer script, please ensure you have Python and virtualenv by following the steps below (either for Debian or MacOS).
Debian
Installation via package manager
For Debian-based OSes, please ensure Python 3 (>= 3.6) is installed and the python3-venv apt package is installed, since the Python venv module is used for creating a Python virtual environment where all necessary pip packages will be installed. For example, installation on Ubuntu would look something like the following:
$ sudo apt install python3 python3-venv
NOTE: If you are using Ubuntu 20.04, the python3-venv package may not be available. In that case, you probably already have the virtualenv package.
Verify that you have the virtualenv package by running:
$ python3 -m venv -h
If you see the usage of the venv module, you have successfully installed the virtualenv package.
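The same checks can also be scripted. The sketch below is illustrative only and is not part of iac_deployer; the function name check_prerequisites is hypothetical. It reports whether the interpreter meets the Python >= 3.6 requirement and whether the modules needed to create a virtual environment can be found.

```python
import importlib.util
import sys


def check_prerequisites(version_info=None, has_venv=None):
    """Return a list of human-readable problems (empty if all is well)."""
    if version_info is None:
        version_info = sys.version_info
    if has_venv is None:
        # On Debian-based systems, ensurepip is usually provided by the
        # python3-venv package, so both modules are checked here.
        has_venv = all(
            importlib.util.find_spec(name) is not None
            for name in ("venv", "ensurepip")
        )
    problems = []
    if tuple(version_info[:2]) < (3, 6):
        problems.append("Python >= 3.6 is required")
    if not has_venv:
        problems.append("the venv module is not usable (install python3-venv)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

Passing version_info and has_venv explicitly makes the function easy to exercise without changing the host system.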
Installation via pip
Alternatively, you can install the virtualenv package with pip itself. Firstly, ensure you have Python 3. You can type
$ python3 --version
to check this. Next, check if you have pip installed by typing
$ python3 -m pip --version
If you get something like:
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
then you have pip. Otherwise, install pip either by installing the apt package python3-pip with
$ sudo apt install python3-pip
or by using the ensurepip module:
$ python3 -m ensurepip --upgrade
You should now have pip. Now, install the virtualenv package with:
$ python3 -m pip install virtualenv
Check that you now have the venv module by typing:
$ python3 -m venv -h
If you see the usage of the venv module, you have successfully installed the virtualenv package.
MacOS
Please download the latest Python release from https://www.python.org/downloads/mac-osx/. Then, open a terminal and verify that you have Python 3:
$ python3 --version
Next, install pip with:
$ python3 -m ensurepip --upgrade
You should now have pip. Now, install the virtualenv package with:
$ python3 -m pip install virtualenv
Check that you now have the venv module by typing:
$ python3 -m venv -h
If you see the usage of the venv module, you have successfully installed the virtualenv package.
Installing Docker on MacOS
If you are using the repository on a Mac, you will need to manually install Docker. (On Debian-based systems, this is automated via Ansible.) Please go to https://docs.docker.com/desktop/mac/install/ and download the correct version of the Docker Desktop installer for your architecture. Then, follow the instructions further down the page (https://docs.docker.com/desktop/mac/install/#install-and-run-docker-desktop-on-mac). Once you’ve finished, open a terminal and verify that you can run:
$ docker --version
$ docker ps
Getting started with iac_deployer
Once you know you have Python 3 and virtualenv (and Docker if you’re on a Mac) installed, you can run iac_deployer (from the repository’s root directory). The first time the script is run, Ansible and other required pip packages will be installed within a Python virtual environment, which will be located in the venv directory at the repository’s root.
Once the installation has completed (installing Ansible takes a little while), verify that you can view the script’s usage by typing:
$ ./iac_deployer -h
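For readers curious about what such a first-run setup involves, here is a minimal sketch using only the Python standard library. It is an assumption about the general approach, not the actual iac_deployer code; the function names ensure_venv and install_requirements are hypothetical, while the venv directory and python/requirements.txt are the paths named in this guide.

```python
import subprocess
import venv
from pathlib import Path


def ensure_venv(repo_root, with_pip=True):
    """Create <repo_root>/venv if it does not exist; return its python path."""
    venv_dir = Path(repo_root) / "venv"
    python = venv_dir / "bin" / "python"  # POSIX layout (Debian, macOS)
    if not python.exists():
        venv.EnvBuilder(with_pip=with_pip).create(venv_dir)
    return python


def install_requirements(python, requirements):
    """Install the pinned pip packages into the virtual environment."""
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


if __name__ == "__main__":
    root = Path(".")
    interpreter = ensure_venv(root)
    install_requirements(interpreter, root / "python" / "requirements.txt")
```

Because the environment is only created when venv/bin/python is missing, later runs reuse the existing environment, which matches the behaviour described above.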
Commands quick-start
List available component tests with:
$ ./iac_deployer component_tests list
You can then choose one from the list and run it with:
$ ./iac_deployer component_tests run <component_name>
You can do the same for the machine configuration tests, i.e.:
$ ./iac_deployer machine_tests list
and
$ ./iac_deployer machine_tests run <machine_name>
Check the status of a machine with:
$ ./iac_deployer machines status <machine_name>
For more detailed help, see the example usage patterns below.
Running the unit tests
You can run the unit tests for the Python code (not for the Ansible configurations) by following the steps below. Note that this is different from running the machine and component tests via iac_deployer - this tests the Python code behind the iac_deployer script.
After making sure you’ve run iac_deployer at least once and the installation was successful, activate the created virtual environment (from the repository’s root) with:
$ source venv/bin/activate
Change into the python directory:
$ cd python/
Run the tests:
$ tox -e tests
You can also run the linter:
$ tox -e lint
When you’re finished with the virtual environment, simply type:
$ deactivate
to exit the virtual environment.
GitLab branching strategy
The Gitflow workflow will be used (full details are documented here: https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow). This is implemented for this repository in the following manner:
In general, deployments to the production machines should only be made from the main branch. This (currently) isn’t automated via the GitLab CI/CD pipeline; instead, an authorised user (who has SSH keys set up for accessing the production machines) should clone a local copy of the repository and use the script to make the deployment.
If the user wants to roll back a machine to a previous Git commit, they should clone the repository, then use git to check out the commit, and then use the deployment script from there.
The integration branch is called dev and is the branch from which new feature branches should be created. Once any feature branches are completed, they should be merged back into dev via merge requests. When the CI/CD pipeline is executed for the dev branch, tests that emulate the production machines’ configurations, as well as component-specific tests, are run. The pipeline is also run automatically whenever a feature branch is pushed to, which helps to catch errors early, before the branch is merged into dev via a merge request.
When a deployment needs to be made, dev should be merged into main, and then an authorised user should make the deployment from there. If any “hotfixes” need to be applied to address deployment issues, these should be made on dev or a feature branch before being merged into main via a merge request.
The differences between our workflow and the linked Gitflow workflow are:
Release branches are not used, since we are not using version numbers on main.
We should not have any hotfix branches - any hotfixes should be applied to dev or a feature branch.
Git workflow
Please ensure you follow the Git workflow for this repository closely. See the following sections for common use cases:
Editing the repository’s contents
If you wish to make updates to any of the files within this repository (for example if you want to add a new component, edit an existing one, add a new machine configuration etc.), please follow these steps:
Clone the repository to your local machine. If you have GitLab SSH keys set up:
$ git clone git@gitlab.com:ska-telescope/pss/ska-pss-ci-systems.git
Otherwise, use HTTPS:
$ git clone https://gitlab.com/ska-telescope/pss/ska-pss-ci-systems.git
Next, switch to the dev (integration) branch:
$ git checkout dev
Then, create a feature branch from the dev branch:
$ git checkout -b <feature_branch_name>
Now implement your changes. View the below sections for how to add a new component, how to add a machine configuration, etc.
Once you’ve finished implementing your changes, you can optionally run the machine and component tests locally on your development machine before pushing to the remote (which will run all tests again via the CI pipeline).
Please note: at the time of writing, there are some component and machine tests that involve the creation of a local GitLab CE Docker container to run tests against. If you wish to run these tests, your development machine / VM needs to have a minimum of 4GB of RAM, due to the GitLab container requiring that much. For more details, please visit: https://docs.gitlab.com/ee/install/requirements.html#memory
Also note that the component tests can take between 1.5 hours and 5 hours to run, depending on the speed of your machine. The machine tests can take between 20 mins and 2 hours.
To run all component tests, type:
$ ./iac_deployer component_tests run
Similarly, to run all machine tests, type:
$ ./iac_deployer machine_tests run
You can also add the --help argument after run in either command to view the usage of the run sub-command.
If you want to run a specific component / machine test, just specify the name of the component / machine after the run command. For example:
$ ./iac_deployer component_tests run 'gitlab-runner'
To push your feature branch to the remote and initiate CI testing:
$ git push --set-upstream origin <feature_branch_name>
This should then automatically trigger the CI/CD pipeline for your branch. Visit https://gitlab.com/ska-telescope/pss/ska-pss-ci-systems/-/pipelines to view all the repository’s pipelines.
Ensure that the pipeline passes. If it does not, fix the problems (either via the GitLab Web IDE or your working copy of the repository) and try again. Repeat this until the pipeline passes.
Next, create a merge request (MR) for merging your feature branch into dev and wait for it to be reviewed.
Once your MR has been approved, merge it into dev. Note: it’s okay if dev has failing pipelines, but main should (ideally) never have failing pipelines.
That’s it: your edits are now on dev, the integration branch. If GitLab hasn’t done so already, you can now delete your feature branch. To do this with git:
$ git branch -d <feature_branch_name> # Delete local branch
$ git push -d origin <feature_branch_name> # Delete remote branch
If the remote branch is still listed when you type git branch -a, run this command to update the list of remote branches:
$ git remote update origin --prune
Making a deployment to one or more machines
If you wish to deploy the defined Ansible configurations to one or more machines, please first ensure you are authorised to do this (i.e. you can ssh to each machine you want to deploy to, either via password or key-based authentication).
Now, decide whether you need to make a new commit on the main branch or not. If you want the latest configurations from the dev branch, you will likely need to do this. To make a new commit: first ensure that the latest commit on the dev branch has passed the CI/CD pipeline. If it has not, then either make a “hotfix” directly on the dev branch, or, if the required changes are more substantial, create a feature branch and perform the edits as required, using the steps in the above section. You can then create a merge request (MR) for merging dev into main. You can set the MR to automatically complete once the pipeline for dev has passed.
Once you have a commit on main that has passed the pipeline, you can make the deployment. Please follow these steps:
Clone the repository to your local machine. If you have GitLab SSH keys set up:
$ git clone git@gitlab.com:ska-telescope/pss/ska-pss-ci-systems.git
Otherwise, use HTTPS:
$ git clone https://gitlab.com/ska-telescope/pss/ska-pss-ci-systems.git
Next, switch to the main (master) branch:
$ git checkout main
Next, check out the particular commit SHA that you want to use for the deployment. (This can be helpful if you want to roll back a machine to a previous deployment.) If you just want to use the latest commit on main, you can skip this step.
$ git checkout <commit_hash>
If you need to, you can run all component / machine tests (note that this will take some time) with:
$ ./iac_deployer component_tests run
$ ./iac_deployer machine_tests run
You can see what changes would be made by your deployment first by using the machines status function:
$ ./iac_deployer machines status <machine_name>
To make the deployment to a particular machine, type:
$ ./iac_deployer machines deploy <machine_name>
If any errors occur during deployment, you will be asked to fix them by making a new commit (or, alternatively, you can try again from the same commit). You should implement any fixes on the integration (dev) or a feature branch (not main), and repeat the above process until the deployment succeeds.
Prerequisites
Python 3
Used pip packages (listed in python/requirements.txt):
ansible == 4.2.0
docker == 5.0.0
PyYAML == 5.4.1
pytest == 6.2.4
pytest-cov == 2.12.1
tox == 3.24.1
Additionally, these packages are used for linting the Python code (listed in python/tox.ini):
flake8
flake8_formatter_junit_xml
Please do not add any new dependencies without consent from the repository’s authors!
Setting up SSH keys
You can enable Ansible to SSH to the host machines that you want to configure without any passwords by setting up SSH keys for them. The easiest way to do this is to use the following commands:
ssh-keygen to generate your SSH key (if you’ve already got a key, i.e. if the file ~/.ssh/id_rsa.pub already exists, you can skip this step)
ssh-copy-id to copy your public SSH key to each host that Ansible needs to be able to connect to.
Then you should test that you can successfully SSH to each host without any passwords with ssh <user>@<hostname>.
Changing the contents of requirements.txt
Please note that if you change the contents of python/requirements.txt, you will need to delete the venv directory and run iac_deployer again, so that the changes are picked up.
Deleting the venv directory
If requirements.txt changes or the iac_deployer script changes, you may wish to delete the virtual environment and create it again.
Simply run (from the repository’s root directory):
$ rm -rf venv/
to remove the virtual environment.
The next time you run iac_deployer, it will be re-created and the pip packages listed in python/requirements.txt will be installed within it.
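If you would rather detect a stale environment than remember to delete it, one option is to store a checksum of requirements.txt alongside the environment. The sketch below is a hypothetical helper, not something iac_deployer is known to do; the stamp file name .requirements.sha256 and all function names are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path

STAMP_NAME = ".requirements.sha256"  # hypothetical stamp file inside venv/


def requirements_digest(requirements):
    """Return the SHA-256 hex digest of the requirements file."""
    return hashlib.sha256(Path(requirements).read_bytes()).hexdigest()


def venv_is_stale(requirements, venv_dir):
    """True if venv_dir has no stamp or the stamp differs from the file."""
    stamp = Path(venv_dir) / STAMP_NAME
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != requirements_digest(requirements)


def record_requirements(requirements, venv_dir):
    """Store the current digest so that later runs can detect changes."""
    (Path(venv_dir) / STAMP_NAME).write_text(requirements_digest(requirements))


def remove_stale_venv(requirements, venv_dir):
    """Delete venv_dir (like `rm -rf venv/`) when it is stale."""
    if Path(venv_dir).exists() and venv_is_stale(requirements, venv_dir):
        shutil.rmtree(venv_dir)
        return True
    return False
```

A wrapper would call record_requirements once after a successful install, and remove_stale_venv before each subsequent run.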
Python script example usage
Testing a component (role)
Initialise the test playbook for the gitlab-runner component for each of the defined operating systems:
$ ./iac_deployer component_tests run 'gitlab-runner'
You can use the -v option to provide verbose output. You can use the --os option to specify which operating system to test the component on e.g. ubuntu-20.04. Available OSes are listed in the ansible/roles/os-containerised/files directory. At the time of writing, these are centos-7, ubuntu-18.04, and ubuntu-20.04.
Testing a machine’s configuration
Initialise the test playbook for the kelvin machine:
$ ./iac_deployer machine_tests run 'kelvin'
This will emulate the kelvin machine’s configuration locally within a development Docker container to highlight any immediate errors.
Checking the status of a machine’s configuration
Check the difference between the contents of the repository for the kelvin machine now and the repository’s contents when it was last deployed to:
$ ./iac_deployer machines status 'kelvin'
If the repository’s contents are the same, then this will simply say that kelvin is up-to-date.
Deploying the machine configurations
To attempt to deploy the configurations for all the machines that are in the inventory file (ansible/production):
$ ./iac_deployer machines deploy
To only deploy to the kelvin and tengu machines:
$ ./iac_deployer machines deploy 'kelvin' 'tengu'
You can use the --rollback-only option to ONLY roll back the existing configuration (if any) of the machine(s).
What are deployments and rollbacks?
A deployment is the usual execution of an Ansible configuration upon a target
machine. For example, in deployment mode, a certain APT or YUM package may get
installed on the target machine. Whereas, in rollback mode, this package would
get uninstalled. The point is, when a deployment is made, there should be
corresponding rollback steps that “undo” everything that the deployment did in
the first place. You can see examples of deployment and rollback steps in the
existing Ansible roles. Each role should have a deploy and rollback
subdirectory under tasks that define deployment and rollback steps,
respectively (although this is not strictly enforced - and it may not make sense
for every role to have this exact structure).
Whenever a new machine configuration is added to this repository, that machine’s
status will begin as “not deployed” - i.e. it is waiting to be deployed to. Once
iac_deployer has been used to deploy the machine’s configuration from the
repository (and there were no errors during this process), the machine’s status
will change to “deployed”. Going back to the earlier example, the certain APT or
YUM package (which we’ll call package A) will now be installed on the target
machine.
If this machine’s configuration needs to be altered (for example, if package A needs to be replaced with a different package - which we’ll call package B), then, firstly, the existing deployment on the machine must be rolled back so that package A is uninstalled. The machine’s status would then change back to “not deployed”. Then, a new deployment can be made from the newer git commit of the repository that installs package B instead of package A. After this deployment is done, the machine’s status would change to “deployed” again. Note that the newer git commit doesn’t need to reference package A anywhere - since the older git commit will be used for rolling back package A.
If, after this process, you decide that package A is needed again instead of package B, then you can checkout the older git commit and initialise the deployment from there. This would first rollback the newer deployment (uninstalling package B), and, if that succeeded, would then proceed with installing package A from the older commit again.
A summary of the steps needed for the above example is shown below. Please note that this is an example used to illustrate how deployments and rollbacks work; it shouldn’t be carried out on real production machines, but it can be done on a local testing branch of this repository, which should be deleted afterwards.
Ensure your currently checked out git commit of the repository contains the install and uninstall steps for package A. If it doesn’t, find the commit SHA and check it out:
$ git checkout <commit_sha>
Deploy package A to the machine with the name <example_machine>:
$ ./iac_deployer machines deploy <example_machine>
If no errors occurred during the deployment, then package A will now be installed on the machine. You can check the machine’s status with:
$ ./iac_deployer machines status <example_machine>
This should report that the machine is now deployed to.
Now make a new git commit that replaces the Ansible installation and uninstallation steps of package A with package B. For example, by using git commit:
$ git commit -m "Replace package A with B"
If you now run the status command again, it should report that the machine is deployed to, but differs in file contents between the commit when it was deployed to and the current git HEAD i.e. it should show the package A steps being replaced by the package B steps.
The existing deployment can be rolled back by using the following command:
$ ./iac_deployer machines deploy <example_machine> --rollback-only
This can be run from any git commit. It will uninstall package A by making use of the older commit automatically.
The new deployment can then be made (that installs package B). This command should be run from the latest commit (i.e. the commit that includes package B’s installation and uninstallation steps):
$ ./iac_deployer machines deploy <example_machine>
Note that this step can be skipped if, on the previous step, you omitted the --rollback-only argument. In that case you must have run the command from the git commit that includes package B’s steps. That would have automatically deployed the newer git commit following the rollback of the older one.
Package B should now be installed. You can use the same status command as earlier to check that the machine’s status is now “deployed”.
To now remove package B and install package A again, you need to first checkout the older git commit:
$ git checkout <commit_sha that includes package A's steps>
From there, you can use iac_deployer again to roll back the existing configuration (removing package B), and then to install package A again, all in one command:
$ ./iac_deployer machines deploy <example_machine>
Following this, package B should be removed, and package A should be installed again.
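The walkthrough above can be summarised with a small toy model. This is a sketch for building intuition only; it does not reflect how iac_deployer stores state, and the Machine class, the deploy function, and the commit names are all hypothetical. The key idea it illustrates is that a rollback always uses the steps from the commit that was last deployed, not the commit that is currently checked out.

```python
class Machine:
    """Toy model of a machine's deployment status (not the real tool)."""

    def __init__(self, name):
        self.name = name
        self.deployed_commit = None  # None means "not deployed"
        self.packages = set()

    @property
    def status(self):
        return "deployed" if self.deployed_commit else "not deployed"


def deploy(machine, commit, commits, rollback_only=False):
    """Roll back any existing deployment, then deploy `commit`.

    `commits` maps a commit id to the set of packages that it installs.
    """
    if machine.deployed_commit is not None:
        # Undo the previous deployment using the commit it was made from.
        machine.packages -= commits[machine.deployed_commit]
        machine.deployed_commit = None
    if not rollback_only:
        machine.packages |= commits[commit]
        machine.deployed_commit = commit


# Hypothetical commits: "old" installs package A, "new" installs package B.
commits = {"old": {"package-a"}, "new": {"package-b"}}
m = Machine("example_machine")
deploy(m, "old", commits)  # installs package A
deploy(m, "new", commits)  # removes A using "old", then installs B
deploy(m, "old", commits)  # removes B using "new", then installs A again
print(m.status, sorted(m.packages))
```

Calling deploy with rollback_only=True corresponds to the --rollback-only option: the existing deployment is undone and the machine returns to “not deployed”.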
Managing the available machine operating systems to run tests on
Machine configuration playbooks (listed under ansible/hosts/) must define a
variable, called os, that specifies what operating system the machine is
expected to be running. For example, ubuntu-18.04. If, when attempting a
deployment, there is a mismatch between the actual detected operating system and
the one defined by the os variable, then an error will be raised. The reason
this is done is so that the machine tests can accurately emulate the operating
systems that the real machines are running via Docker containers. The machine
tests do this by looking up the value of the os variable, which is then used
to find the corresponding directory that contains the appropriate Dockerfile to
create the test container.
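The check and lookup described above might look roughly like the following. This is a sketch under stated assumptions: how iac_deployer actually detects the operating system is not documented here, so the use of /etc/os-release content is an assumption, and the function names are hypothetical. Only the ansible/roles/os-containerised/files path and the OS naming scheme (e.g. ubuntu-20.04) come from this guide.

```python
from pathlib import Path

OS_FILES = Path("ansible/roles/os-containerised/files")


def detect_os(os_release_text):
    """Build a name such as 'ubuntu-20.04' from /etc/os-release content."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return "{}-{}".format(fields["ID"], fields["VERSION_ID"])


def check_os(expected, detected):
    """Raise an error when the playbook's os value does not match."""
    if expected != detected:
        raise RuntimeError(
            "OS mismatch: expected {!r}, detected {!r}".format(expected, detected)
        )


def dockerfile_for(os_name):
    """Path of the Dockerfile used to emulate the given OS in tests."""
    return OS_FILES / os_name / "Dockerfile"


sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="20.04"\n'
check_os("ubuntu-20.04", detect_os(sample))  # passes silently
print(dockerfile_for(detect_os(sample)))
```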
The available operating systems to be emulated can be found under
ansible/roles/os-containerised/files/. Each OS directory must contain a
Dockerfile that will extend from the relevant OS version (e.g. ubuntu:20.04).
Please see the existing OS directories and their Dockerfiles to get an idea of
what is required. As a minimum, each Dockerfile must contain:
A FROM line - the first line in the Dockerfile. This must specify the Docker image that this Dockerfile should extend from. For example, if the operating system is Ubuntu 20.04, the line would read: FROM ubuntu:20.04. You can browse available Docker images that can be extended from via https://hub.docker.com/
One or more RUN lines that run the following necessary commands:
A command that installs Python 3. For example, to update the apt package index and install Python: RUN apt-get -y update && apt-get -y install python3.
A command that upgrades the packages that are already installed. For example, for apt: RUN apt-get -y upgrade
A command that installs any other necessary packages that are needed by any Ansible modules that will be in use during machine and component tests. For example, the ansible.builtin.apt module requires the python3-apt package to be installed. Therefore, a line such as RUN apt-get -y install python3-apt would be needed in the Dockerfile.
A command that installs sudo, a command that adds a new user that is added to the sudoers group, and a command that ensures that the user can run sudo commands without needing to type in their password. Please see the existing Dockerfiles for examples of this.
A USER line that sets the user to be used in the container to the created user (mentioned in the above point) rather than the default of ‘root’. This is done to ensure that privilege escalation matches the typical workflow of using sudo on the production machines (rather than logging in directly as the root user).
A CMD line that specifies the sleep infinity command as the default command for the container. This is so that the container stays alive while the Ansible playbooks are being run against it.
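As a rough aid, the minimum requirements listed above can be checked mechanically. The sketch below is not part of the repository: the function name missing_requirements is hypothetical, and the example Dockerfile (including the user name tester) is an illustration that may differ from the existing Dockerfiles, which remain the authoritative examples.

```python
# Illustrative Dockerfile only; the user name "tester" is an assumption.
EXAMPLE_DOCKERFILE = """\
FROM ubuntu:20.04
RUN apt-get -y update && apt-get -y install python3
RUN apt-get -y upgrade
RUN apt-get -y install python3-apt sudo
RUN useradd -m -G sudo tester
RUN echo 'tester ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/tester
USER tester
CMD ["sleep", "infinity"]
"""


def missing_requirements(dockerfile_text):
    """Return the minimum requirements that the Dockerfile text lacks."""
    # Keep non-empty, non-comment lines; line continuations are not handled.
    lines = [line.strip() for line in dockerfile_text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    parsed = [(line.split()[0].upper(), line) for line in lines]

    missing = []
    if not parsed or parsed[0][0] != "FROM":
        missing.append("FROM must be the first instruction")
    if not any(keyword == "RUN" for keyword, _ in parsed):
        missing.append("at least one RUN instruction")
    users = [line.split()[-1] for keyword, line in parsed if keyword == "USER"]
    if not users or users[-1] in ("USER", "root"):
        missing.append("a USER instruction naming a non-root user")
    cmds = [line for keyword, line in parsed if keyword == "CMD"]
    if not cmds or not ("sleep" in cmds[-1] and "infinity" in cmds[-1]):
        missing.append("CMD must run sleep infinity")
    return missing


print(missing_requirements(EXAMPLE_DOCKERFILE))
```

The check only looks at instruction keywords, so it cannot confirm that Python 3, sudo, or the passwordless sudo rule are actually set up; those still need to be reviewed by eye.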
Generating ignore lists of files for a machine operating system
If you have just created a new operating system to emulate via the steps above,
then when testing components (via the ./iac_deployer component_tests
sub-command), you will find that a test may report that there were file changes
that weren’t handled by the component being tested. This will probably mean you
will need to create an “ignore list” of files for your new operating system.
Please see the section entitled “How to create a test for a role (component)”
for more details about this.
Note: you can specify what OS to test the component on when using the
component_tests sub-command. For example:
$ ./iac_deployer component_tests run <component_name> --os "ubuntu-20.04"
Therefore, by replacing “ubuntu-20.04” with the name of your OS, you can run the component test on your new OS only (rather than all OSes) to speed up testing.
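To make the role of an ignore list concrete, here is a minimal sketch of filtering a list of changed files against ignore patterns. It assumes shell-style glob patterns matched with Python's fnmatch; the pattern syntax that iac_deployer actually uses may differ, and the function name and example data are hypothetical.

```python
from fnmatch import fnmatch


def unhandled_changes(changed_paths, ignore_patterns):
    """Return changed paths not covered by any ignore pattern."""
    return [
        path
        for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


# Hypothetical example data: paths changed during a component test, and
# patterns gathered from the common and OS-specific ignore lists.
changed = ["/test_file.txt", "/var/log/apt/history.log", "/opt/leftover"]
ignore = ["/test_file.txt", "/var/log/*"]
print(unhandled_changes(changed, ignore))
```

Any path that survives the filter is one that the component's rollback steps should have cleaned up, or that needs a new entry in the ignore list.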
How to create a role (component)
Each Ansible role in this repository represents an individual component that can
be deployed to a machine or set of machines. Roles should be placed under the
ansible/roles directory. There are already some sample roles that you can look
at to get an idea of what is required. Also, there is a full example template
for a role under ansible/examples/role_template/. You can copy this directory
to ansible/roles/ and rename it and edit its files to suit your needs. Full
details on how to create a new role without this template are below.
Create a new directory under ansible/roles. Its name will be the name of the component.
Within this new directory, create another directory called tasks. Within tasks, create a file named main.yml. This file will serve as the main entry point to the role.
The tasks that the role executes should be based on the value of the deploy_mode variable. This variable tells the role whether the role should be deployed (i.e. installed), or rolled back (i.e. uninstalled). deploy_mode will equal 'deploy' for deploy mode, and will equal 'rollback' for rollback mode. You should create deploy and rollback directories under tasks that each contain their own main.yml files. The main.yml file under tasks should then include either deploy/main.yml for deploy mode or rollback/main.yml for rollback mode. The idea is that rollback mode should undo everything that deploy mode did. See the docker and gitlab-runner roles for an example of all of this. Or you can look at the ansible/examples/role_template/tasks/ directory.
If you wish, you can create other .yml files within the deploy or rollback directories that can be included from other task files. For example, you could have debian.yml and redhat.yml files that contain OS-specific steps. See one of the sample roles e.g. docker (or role_template) for an example of this.
If, during the rollback steps, your role needs to remove a package e.g. via APT or YUM, then you must use the package_manager role to do this. The reason for this is that package_manager automatically checks for reverse dependencies of the packages you’re trying to remove, and if it finds any, the package removal will be skipped. Reverse dependencies are other packages that have dependencies on the package(s) you’re trying to remove. If the package(s) you have listed were removed, then the reverse dependencies would be automatically removed too, which is undesirable, since you may end up removing a package that you didn’t intend to remove. If it’s okay that these reverse dependencies get removed, then they should be added to the list of packages provided to package_manager, so that it knows that it’s okay for those packages to be removed too. Please see the role_template, gitlab-runner, or docker roles (looking at files under the rollback directory) for an example of this.
If your role depends on another role, then create a directory called meta within the role’s root directory and create a main.yml file within that. Then use the dependencies key, followed by a list of YAML values for each role that your role depends on. See the gitlab-runner role or the role_template for an example. You can read more about role dependencies here: https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_roles.html#using-role-dependencies
NB: Please be careful if you create multiple roles where one role depends on another and both have other overlapping dependencies. For example, if role A conditionally depends on role B, and both role A and B have a non-conditional dependency on role C, then, when running role A, Ansible will apply the condition for role B to the role C dependency for both cases - rather than just for the role B case. This can potentially cause role C’s steps to be skipped for apparently unknown reasons. A workaround for this is to use the Ansible include_role or import_role modules for role C from within role A’s tasks. The role B condition will then not be applied to the included role C.
It can also be helpful to have a set of default values for the variables that your role makes use of, that the user can override (e.g. via the command line) if they wish. You can set the default values via another main.yml file under a defaults directory i.e. the path to main.yml would be <repo root>/ansible/roles/<name of role>/defaults/main.yml.
If your role needs to deploy one or more files to the machine(s) being deployed to, you can store these under a files subdirectory (files should be at the same level as defaults and tasks).
You can use Ansible handlers to perform tasks (e.g. restarting a service) only if certain other task(s) have been done (e.g. updating the configuration file for that service). The handlers should live in a main.yml file under a handlers subdirectory. You would use the notify setting within a task definition to tell Ansible which handlers should be run when the task is run. For more information about handlers, please visit: https://docs.ansible.com/ansible/latest/user_guide/playbooks_handlers.html
More information about Ansible roles can be found here: https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_roles.html
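The reverse-dependency rule that applies when removing packages during rollback can be illustrated with a few lines of Python. This is only a model of the decision, not the package_manager role's implementation (which is an Ansible role); the function name plan_removal and the dependency data are hypothetical.

```python
def plan_removal(to_remove, reverse_deps):
    """Decide whether the listed packages can be removed safely.

    `reverse_deps` maps a package to the installed packages that depend
    on it. Returns (ok, blockers): ok is False when removing the listed
    packages would also remove a package that is not in the list.
    """
    listed = set(to_remove)
    blockers = set()
    for package in listed:
        blockers |= set(reverse_deps.get(package, ())) - listed
    return (not blockers, sorted(blockers))


# Hypothetical dependency data for illustration only.
reverse_deps = {"libfoo": ["foo-tools"], "foo-tools": []}
print(plan_removal(["libfoo"], reverse_deps))               # removal skipped
print(plan_removal(["libfoo", "foo-tools"], reverse_deps))  # removal allowed
```

In the first call, foo-tools depends on libfoo but is not listed, so the removal would be skipped; listing both packages signals that removing foo-tools is intended.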
How to create a test for a role (component)
It is good practice to test each Ansible role (component) you have created to
ensure that they work as you intend before deploying them to production
machines. To do this, for each component that you want to test, you should
create a corresponding directory under ansible/tests/components/ that is named
the same as the Ansible role (e.g. gitlab-runner). See
ansible/examples/component_test_template/ for a template for creating a
component test. The component test directory should contain the following files
/ directories:
deploy.yml and rollback.yml playbooks, which contain the steps for deploying and rolling back the component, respectively. During the test, each playbook will be run twice in succession to ensure that no errors occur the second time. These playbooks are mandatory.
Optionally, you can define additional Ansible playbook file(s) that can be used to test the role more thoroughly. You can define multiple playbooks if you want to test different aspects of the component. Each playbook file name must start with the test_ prefix and have a .yml extension. For example, test_docker-runner.yml. iac_deployer will search for files that meet these criteria and run each of them in turn after the deploy.yml playbook has been run the second time.
A file called ignore_list.yml can be created in the component test directory. This file should contain a YAML list of file / directory patterns. Any files / directories that match one of these patterns will be ignored by the file checking step of the component test. For example, if you know that your component will create a file called /test_file.txt, and it doesn’t matter if that file isn’t deleted by the component’s rollback steps, then you should add the file to the ignore list.
In addition, you can define OS-specific ignore lists by creating a directory named after the OS (e.g. ubuntu-20.04) in the component test directory and creating another ignore_list.yml file within that directory. This file will then only be used when the component is being tested on that OS.
Additionally, if the component (i.e. role) defines any role dependencies in its meta/main.yml file, then the ignore lists for those components (plus the ignore lists for any of the dependencies’ dependencies) will be included too.
Optionally, playbooks called setup.yml and teardown.yml can also be created in the component test directory. These playbooks should contain any steps that need to be carried out before the first run of the deploy.yml playbook and after the last run of the rollback.yml playbook, respectively. setup.yml will be executed once before all other playbooks begin, and teardown.yml will be executed once after all other playbooks have completed. This can be helpful if your tests need some sort of fixture to run against. For example, the gitlab-runner component test creates a local GitLab instance in a container for testing against and removes it at the end.
Also, if the test needs any variables to be available to all of the playbooks, then these can be defined in a vars.yml file.
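As a rough illustration of how an ignore list could be applied during the file checking step, the Python sketch below filters a list of leftover paths against a set of patterns. It assumes the patterns are shell-style globs matched with fnmatch, which may differ from the matching rules iac_deployer actually uses. The function name and the sample paths are hypothetical.

```python
from fnmatch import fnmatch


def unexpected_leftovers(leftover_paths, ignore_patterns):
    """Return the leftover paths that match none of the ignore patterns.

    Assumption: patterns are shell-style globs as understood by fnmatch.
    """
    return [
        path
        for path in leftover_paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


# Hypothetical files still present after the rollback playbook has run.
leftovers = [
    "/test_file.txt",
    "/etc/gitlab-runner/config.toml",
    "/var/log/apt/history.log",
]
# Patterns as they might be loaded from ignore_list.yml.
patterns = ["/test_file.txt", "/var/log/*"]

remaining = unexpected_leftovers(leftovers, patterns)
print(remaining)
```

Anything left in the result would be a file that the rollback steps failed to clean up and that no ignore list covers.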
See the gitlab-runner directory for an example test for that component. It is
good practice to write your component tests such that once they have completed,
everything that the test set up is torn down so the development machine is
left in a clean state (i.e. via the use of the setup.yml and teardown.yml
playbooks).
Once you are happy with the tests you have created, you should run them by using
the wrapper script’s function component_tests. You should pass the name of the
component to the function (e.g. gitlab-runner).
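The order in which the playbooks in a component test directory are run can be sketched in Python as follows. This is only an illustration of the sequence described in this section, not the iac_deployer implementation. The function name is hypothetical, and two details are assumptions: that the extra test_ playbooks run in alphabetical order, and that the rollback runs follow them.

```python
from pathlib import Path
import tempfile


def playbook_run_order(test_dir):
    """Return playbook file names in the order a component test would run them."""
    test_dir = Path(test_dir)
    # Extra playbooks must start with "test_" and have a ".yml" extension.
    extra_tests = sorted(p.name for p in test_dir.glob("test_*.yml"))

    order = []
    if (test_dir / "setup.yml").exists():
        order.append("setup.yml")           # once, before everything else
    order += ["deploy.yml", "deploy.yml"]   # run twice to catch second-run errors
    order += extra_tests                    # after the second deploy
    order += ["rollback.yml", "rollback.yml"]
    if (test_dir / "teardown.yml").exists():
        order.append("teardown.yml")        # once, after everything else
    return order


# Build a throwaway component test directory to demonstrate the ordering.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["deploy.yml", "rollback.yml", "setup.yml", "teardown.yml",
                 "test_docker-runner.yml", "vars.yml"]:
        (Path(tmp) / name).touch()
    demo_order = playbook_run_order(tmp)

print(demo_order)
```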
How to create a configuration for a machine or group
The roles (components) that can be deployed to machine(s) or group(s) of
machines should be available under ansible/roles. Separate Ansible playbooks
can then be used to decide which roles should be deployed on which machines.
Firstly, if this is a new machine or group of machines, you will need to add it to the production inventory file (located at <repo root>/ansible/production). See the file for an example configuration. All hosts should be listed at the top of the file, with any groups defined below that. Group names are defined by surrounding them with square brackets []. See this page for more details about building the inventory file: https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
Next, create a file named after the machine or group within the ansible/hosts/ directory (or ansible/groups/ for a group), with a .yml extension, i.e. <host / group name>.yml.
There is a sample file at ansible/examples/template.yml that you can adapt to suit your needs. In particular, you should replace all occurrences of <machine / group name> in the file with the name of the machine or group that you want to configure.
Next, specify the list of roles that should be deployed on the particular machine or group. This is simply a YAML list underneath the roles: heading.
For each machine you are configuring, in the host playbook, you need to set the os variable to the operating system that the production machine is running. When the configuration is deployed, if there is a mismatch between this value and the actual detected operating system, an error will be raised. The os variable is also used by the machine_tests function so that it knows what operating system to use as a base. To see an example of this, view one of the existing machine playbooks in ansible/hosts.
You should now test your machine or group configuration. You can do this by using the machine_tests function from the wrapper script. This function will create a Docker container on your development machine with the same OS as the production machine. It will then deploy the roles within the container, which should help to identify any errors before deploying in production.
Once you are happy that the configuration is tested, you can use the machines deploy function to deploy it in production (following the appropriate git workflow - see the above section).
How to test a configuration for a machine
You can use the machine_tests function of the wrapper script to test your
configuration for a production machine locally in a development Docker container
to help ensure that the configuration deploys successfully before deploying in
production. The steps that the wrapper script takes are as follows:
The user runs machine_tests run [machine_name]. There must be a playbook for the machine under ansible/hosts/ and the operating system for the machine must be defined as a variable called os within this playbook (under the vars section).
The script looks up the value of the os variable for the machine and remembers it.
The script runs the ansible/tests/docker-create.yml playbook, passing the os and container_name variables. The container_name variable will be equal to the machine name.
This will create a Docker container named after the production machine, using the machine’s OS.
The script generates a site.yml playbook by concatenating all the ansible/hosts/*.yml and ansible/groups/*.yml playbooks together.
The script copies the production inventory and replaces the line for the machine under [all] with a line that is the same except it sets ansible_connection to docker and removes any real IP address of the machine.
The script then runs site.yml, passing the edited inventory, but limits the hosts to just the Docker container named after the machine.
This will then deploy the production machine’s config inside the Docker container, and any immediate problems will become obvious.
Afterwards, the script runs ansible/tests/docker-destroy.yml with container_name set to the machine name to tear down the Docker container.
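The inventory editing step above can be illustrated with a short Python sketch. It is not taken from iac_deployer. It assumes an INI-style inventory in which each host line under [all] starts with the host name and gives its address as ansible_host=...; the function name, host names and addresses are hypothetical.

```python
def dockerise_inventory(inventory_text, machine_name):
    """Return a copy of the inventory with one host redirected to Docker."""
    edited = []
    in_all = False
    for line in inventory_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [all] section.
            in_all = stripped == "[all]"
        fields = stripped.split()
        if in_all and fields and fields[0] == machine_name:
            # Drop the real address and any existing connection setting,
            # keep every other host variable unchanged.
            kept = [
                f for f in fields[1:]
                if not f.startswith(("ansible_host=", "ansible_connection="))
            ]
            line = " ".join([machine_name, "ansible_connection=docker", *kept])
        edited.append(line)
    return "\n".join(edited) + "\n"


# Hypothetical production inventory.
production = """\
[all]
pss-node1 ansible_host=192.0.2.10 ansible_user=deploy
pss-node2 ansible_host=192.0.2.11

[workers]
pss-node1
pss-node2
"""

test_inventory = dockerise_inventory(production, "pss-node1")
print(test_inventory)
```

The edited inventory could then be passed to ansible-playbook, for example using its -i option together with --limit to restrict the run to the container named after the machine.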
How the rollback of components works
When a deployment is initialised via the machines deploy function, the following steps will be carried out, for each machine (host) that is getting deployed to:
The status of the machine will be looked up from the status repository and, if it already has a deployed configuration, the configuration will be rolled back. Otherwise, the rollback steps are skipped.
The git archive command will be used to generate a tar archive of the repository at the commit SHA of the last deployment (as obtained from the status repository), and the iac_deployer script from the extracted archive’s contents will be run to perform the rollback.
This will cause the host playbook in the extracted archive (found in ansible/hosts) to be run, with the deploy_mode variable set to “rollback”.
The roles for the host will then be run with this value of the variable, which will run the uninstall steps for each role, removing the config that was deployed previously.
Once that has finished, the extracted archive will be deleted, and then the original repository’s playbook will be run, but this time with the deploy_mode variable set to “deploy”. This will initiate the latest deployment with the updated roles.
Once the deployment has completed, the status.json file from the status repository will be updated to point to the current commit for this host.
These steps will then be repeated for any other hosts that are being deployed to.
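The decision logic in the steps above can be summarised in a small Python sketch. It only builds a description of the steps for one host and does not run git or Ansible. It assumes that the contents of status.json can be treated as a mapping from host name to the commit SHA of the last deployment, which may not match the real file layout. The function name, host name and SHAs are hypothetical.

```python
def deployment_steps(host, status, current_sha):
    """List the steps for deploying to one host as (action, detail) tuples."""
    steps = []
    # Assumed layout: {host name: SHA of last deployment}; missing if never deployed.
    last_sha = status.get(host)
    if last_sha is not None:
        steps += [
            ("archive", f"git archive --format=tar {last_sha}"),
            ("run", f"archived ansible/hosts/{host}.yml with deploy_mode=rollback"),
            ("cleanup", "delete the extracted archive"),
        ]
    steps += [
        ("run", f"current ansible/hosts/{host}.yml with deploy_mode=deploy"),
        ("record", f"status.json: {host} -> {current_sha}"),
    ]
    return steps


status = {"pss-node1": "1a2b3c4"}

# pss-node1 has a previous deployment, so it is rolled back first.
for action, detail in deployment_steps("pss-node1", status, "9f8e7d6"):
    print(f"{action:8} {detail}")

# pss-node2 has never been deployed to, so the rollback steps are skipped.
fresh = deployment_steps("pss-node2", status, "9f8e7d6")
```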
How host status checking works
When the user calls the machines status function, a diff report is generated
for each machine (host) specified. The report shows the differences between
the last deployment to the host and the current repository’s configuration,
and is built as follows:
The report will show a list of roles that have been added, removed, or modified. If the machine’s playbook has changed, this will also be shown.
If there are no role changes, the report will simply state that the host is up-to-date with the repository’s current configuration.
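A report of this kind could be produced along the lines of the Python sketch below. It is an illustration only. It assumes each configuration can be reduced to a mapping from role name to some fingerprint of the role’s contents (for example a hash), and the function names and report wording other than the up-to-date message are hypothetical.

```python
def role_diff(deployed_roles, current_roles):
    """Compare two {role name: content fingerprint} mappings."""
    deployed, current = set(deployed_roles), set(current_roles)
    return {
        "added": sorted(current - deployed),
        "removed": sorted(deployed - current),
        "modified": sorted(
            name for name in deployed & current
            if deployed_roles[name] != current_roles[name]
        ),
    }


def format_report(host, diff, playbook_changed=False):
    """Render the diff as text; an empty diff means the host is up-to-date."""
    lines = [f"{kind}: {', '.join(names)}" for kind, names in diff.items() if names]
    if playbook_changed:
        # Assumption: a playbook change is reported even with no role changes.
        lines.append("playbook: changed")
    if not lines:
        return f"{host} is up-to-date with the repository's current configuration."
    return "\n".join([f"{host}:"] + lines)


last_deployment = {"boost": "a1", "cmake": "b2", "docker": "c3"}
repository = {"boost": "a1", "cmake": "b9", "gitlab-runner": "d4"}

diff = role_diff(last_deployment, repository)
print(format_report("pss-node1", diff))
```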
Full script usage details
You can simply type
$ ./iac_deployer --help
to view the script’s usage. The output is shown below.
usage: iac_deployer [-h] {component_tests,machine_tests,machines} ...

Interface to initiate testing and deployment of the defined Ansible
configurations from the repository.

positional arguments:
  {component_tests,machine_tests,machines}
                        Sub-command help
    component_tests     List or run component tests
    machine_tests       List or run tests for machine configurations
    machines            Perform or check machine deployments, including
                        rollbacks

optional arguments:
  -h, --help            show this help message and exit
The command specific help outputs are shown below.
component_tests
$ ./iac_deployer component_tests --help
usage: iac_deployer component_tests [-h] {list,run} ...

positional arguments:
  {list,run}  sub-command help
    list      List available component tests
    run       Run a component test

optional arguments:
  -h, --help  show this help message and exit
machine_tests
$ ./iac_deployer machine_tests --help
usage: iac_deployer machine_tests [-h] {list,run} ...

positional arguments:
  {list,run}  sub-command help
    list      List available machine configurations that can be tested
    run       Run a test for a machine configuration

optional arguments:
  -h, --help  show this help message and exit
machines
$ ./iac_deployer machines --help
usage: iac_deployer machines [-h] {deploy,status} ...

positional arguments:
  {deploy,status}  sub-command help
    deploy         Command to initiate rollbacks (if necessary), followed by
                   deployments of machine configurations
    status         Command to check the status of machines

optional arguments:
  -h, --help       show this help message and exit