10. 11. 2023
5 min read
How to create a CI archive for your project
Jovan Blažek
Automating the archiving of CI runs and performance metrics from pull requests is crucial for several reasons.
Firstly, it ensures the preservation of valuable historical data. By automatically archiving these metrics, developers can maintain a comprehensive record of the performance and behavior of their codebase over time. This historical data can be precious for debugging and troubleshooting, allowing developers to identify patterns, trends, and regressions.
Secondly, having an accessible archive of CI runs and performance metrics facilitates collaboration and knowledge sharing within the development team. With this readily available information, team members can easily reference past results, learn from previous experiences, and make informed decisions about code changes.
Overall, automating the archiving of CI runs and performance metrics enhances efficiency, promotes informed decision-making, and ensures the long-term integrity of development processes.
That is why we decided to create a CI archive for our project, and we would like to take you through our journey of creating it.
In search of a solution
When deciding on the solution, we had to take into account the following requirements:
the solution should be easy to implement
the solution should be free or cheap
the data should be accessible only by team members
Our first thought was to store the data in a Git repository, in a separate branch. This would be easy to implement and cheap to serve. Our project repository is already private; however, the pages it serves are public. To work around this, we decided to use PageCrypt, which encrypts the HTML files and keeps our test results private. For serving these files we decided to use Render, which offers free static site hosting and has less strict size and bandwidth limits than GitHub Pages.
Implementation of Git-based CI archive
With initial research done, we split the implementation into the following steps:
Create a clean branch for archiving CI results and other metrics.
After each CI run, commit results to this branch with GitHub Actions.
Configure automatic deployment on Render
The archive will have the following structure:
/
├─ pull-requests/
│  ├─ [PR number]/
│  │  ├─ playwright-report/
│  │  │  ├─ ...
│  │  │
│  │  ├─ jest-report/
│  │  │  ├─ ...
│  │  │
│  │  ├─ index.html - Generated list of directories
│  ├─ index.html - Generated list of directories
├─ index.html - Generated list of directories
GitHub Actions workflow
We started with a simple goal in mind: archive the results of our Playwright E2E tests. To do that, we wrote the following GitHub Actions job, which runs after the Playwright tests complete.
In short, this job:
creates or updates the needed folder structure in the archive branch
downloads the test results and encrypts them using PageCrypt
regenerates the navigation files using a Bash script
pushes the changes to the archive branch
Let's take a closer look at this workflow step by step.
Job initialization
deploy-results:
  name: Deploy results to Render
  needs: [playwright-tests]
  permissions:
    contents: write
    pull-requests: write
  env:
    PW_REPORT_FOLDER: ${{github.workspace}}/archive/pull-requests/${{github.event.pull_request.number}}/playwright-report
    HTML_GENERATOR: ${{github.workspace}}/archive/generateHtmlListOfFolders.sh
  runs-on: ubuntu-latest
  steps:
    - name: Checkout ci-archive branch
      uses: actions/checkout@v3
      with:
        ref: ci-archive
        path: archive

    - uses: actions/setup-node@v3
      with:
        node-version: '16.12.0'
First, we clone the archive branch into the archive directory and set up Node.
Creating a folder structure and removing old reports
With the environment ready, we download the test results and move them into place, replacing the old report in the process. When done, we run PageCrypt to encrypt the index.html file. The Playwright report consists of a single HTML file, so we can encrypt it directly.
    - name: Create a folder structure
      run: mkdir -p ${{env.PW_REPORT_FOLDER}}

    - name: Remove old report
      run: rm -rf ${{env.PW_REPORT_FOLDER}}/*

    - name: Download report
      uses: actions/download-artifact@v3
      with:
        name: playwright-report
        path: ${{env.PW_REPORT_FOLDER}}

    - name: Move report files
      run: |
        cd ${{env.PW_REPORT_FOLDER}}
        mv html-report/* .
        rm -rf html-report

    - name: Encrypt report
      run: npx --yes pagecrypt ${{env.PW_REPORT_FOLDER}}/index.html ${{env.PW_REPORT_FOLDER}}/index.html "${{secrets.PAGECRYPT_PASSWORD}}"
Updating archive navigation
With everything ready and in the right place, we can regenerate the archive navigation by running a Bash script. The script builds a list of links to every directory inside the current one, so we can regenerate the index.html files after any change to the archive. We run the script three times, once for each level of the archive (root, pull-requests, PR number).
#!/bin/bash

TITLE=$1
FILENAME="index.html"

# Check if file exists and remove if it does
if [ -f $FILENAME ]; then
  rm $FILENAME
fi

touch $FILENAME
cat > $FILENAME << EOF
<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex" />
    <title>$TITLE</title>
    <link rel="stylesheet" href="/style.css" />
  </head>
  <body>
    <h1>$TITLE</h1>
    <ul>
EOF

# Loop over all subfolders in the current directory
for DIR in ./*; do
  if [ -d "$DIR" ]; then
    # If directory, add link to the unordered list in the HTML file
    DIR_NAME=${DIR:2}
    echo "      <li><a href=\"$DIR/\">$DIR_NAME</a></li>" >> $FILENAME
  fi
done

cat >> $FILENAME << EOF
    </ul>
  </body>
</html>
EOF
    - name: Update archive folder lists
      run: |
        cd ${{github.workspace}}/archive
        ${{env.HTML_GENERATOR}} "CI Archive"
        cd ${{github.workspace}}/archive/pull-requests
        ${{env.HTML_GENERATOR}} "Pull Requests"
        cd ${{github.workspace}}/archive/pull-requests/${{github.event.pull_request.number}}
        ${{env.HTML_GENERATOR}} "Pull Request #${{github.event.pull_request.number}}"
      shell: bash
Committing and pushing changes
Finally, we commit and push the changes to the archive branch using git-auto-commit-action. The commit message references the PR and the commit the tests ran on, so we can easily find the test results for a given commit. To make the archive easier to reach, we add a sticky comment with a link to it to the PR, using the sticky-pull-request-comment action.
    - name: Commit and push changes to the ci-archive branch
      uses: stefanzweifel/git-auto-commit-action@v4
      with:
        commit_message: 'ci: Update test results #${{github.event.pull_request.number}} ${{github.event.pull_request.head.sha}}'
        branch: ci-archive
        repository: archive
    - name: Add sticky comment with link
      uses: marocchino/sticky-pull-request-comment@v2
      with:
        header: 'deploy-results'
        message: 'CI results for this PR are available at https://example.com/pull-requests/${{github.event.pull_request.number}}/'
How the turntables
After implementing the Git-based archive, everything was working great, results were being archived, data-driven decisions were being made, and everything was fine. Until it wasn't.
After a few weeks, we noticed the performance of our CI pipeline degrading. Completion times grew longer and longer, and the archive was to blame. The way we were archiving the results was not efficient: the results of our E2E tests also include screenshots, which took up a lot of space in our Git history. This caused the archive branch to grow to an enormous size, to the point where the pipeline would simply time out or run out of space while downloading the archive branch.
We quickly realized that we needed to change the way we archive our results. The incremental history did not matter to us; we only cared about the final state of each PR. So we decided to move away from Git and use a different solution.
Archiving CI results using CDN
To combat the growing size of our repository, we turned to a storage method better suited to larger files: a CDN. Luckily, we were already using BunnyCDN on our project, so most of the work was already done.
We migrated the existing archive to the CDN and changed the workflow to upload the results to the CDN instead of Git.
Updates to the GitHub Actions workflow
The logic for checking out the branch and pushing to it was removed. In its place, we added a step that uploads the test results to the CDN. The script uses the CDN's API endpoints to replace the old files with the new ones. It also regenerates the index.html files used for navigation. (So long, Bash...)
The final version of the workflow looks like this:
deploy-results:
  name: Deploy results to CDN
  needs: [playwright-tests]
  env:
    PW_REPORT_FOLDER: ${{github.workspace}}/downloadedArtifacts/pull-requests/${{github.event.pull_request.number}}/playwright-report
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3

    - uses: actions/setup-node@v3
      with:
        node-version: '16.12.0'

    - name: Install dependencies
      run: npm ci

    - name: Create a folder structure
      run: mkdir -p ${{env.PW_REPORT_FOLDER}}

    - name: Download report
      uses: actions/download-artifact@v3
      with:
        name: playwright-report
        path: ${{env.PW_REPORT_FOLDER}}

    - name: Move report files
      run: |
        cd ${{env.PW_REPORT_FOLDER}}
        mv html-report/* .
        rm -rf html-report

    - name: Encrypt report
      run: npx --yes pagecrypt ${{env.PW_REPORT_FOLDER}}/index.html ${{env.PW_REPORT_FOLDER}}/index.html "${{secrets.PAGECRYPT_PASSWORD}}"

    - name: Upload report to CDN
      env:
        BUNNY_CDN_CI_ARCHIVE_API_KEY: ${{secrets.BUNNY_CDN_CI_ARCHIVE_API_KEY}}
        BUNNY_CDN_STORAGE_ZONE_NAME: archive
      run: |
        node ${{github.workspace}}/.github/workflows/uploadCiResultsToCdn ${{github.event.pull_request.number}}
    - name: Add sticky comment with link
      uses: marocchino/sticky-pull-request-comment@v2
      with:
        header: 'deploy-results'
        message: 'CI results for this PR are available at https://archive.b-cdn.net/pull-requests/${{github.event.pull_request.number}}/'
Fight the YAGNI (you aren’t gonna need it) principle
Archiving CI results and other project health metrics is a great way to preserve valuable historical data and move your team toward more data-driven decisions. We have learned the hard way that building something quickly and cheaply can come back to bite you later, so sometimes you have to fight the YAGNI (you aren't gonna need it) principle and build something that will scale better in the future. We hope that this article will help you in your journey of building your own archive.