Pipeline Developer Forum

CI / CD
Monday, May 27, 2019

Thank you to everyone who made it out to the first official forum, and second actual meeting! It was amazing to see the wide range of backgrounds, especially in a smaller group. All in all, our attendance wasn’t really big enough to consider this any kind of accurate survey, but we had some very cool chats and I think it brought out some interesting thoughts and valuable insights. For those of you who couldn’t make it, please fill out the email survey for June, and I hope to see you at the next one!

Continuous Integration / Continuous Deployment

I’ll start by leaving a couple of Wikipedia quotes here for those not entirely familiar with this month’s topic:

Continuous Integration (CI) is the practice of merging all developers’ working copies to a shared mainline several times a day.

Continuous deployment (CD) is a software engineering approach in which software functionalities are delivered frequently through automated deployments.

In more relevant terms, the discussions in this forum focused mainly on how pipeline software and tools make it from the developer to the artist and/or production.

In this discussion, it seems easy to separate our code into three main groups: Python code and tools, compiled DCC plugins, and API/web servers. In this particular forum, we focused mainly on the first two, touching on servers only briefly.

General Summary

At the end of the day, it seems there are certainly some tools and software which are more prevalent or well known - such as Jenkins and rez. Beyond that, there may be consistencies between any two studios, but nothing appears to be universal throughout. There is, however, a lot of agreement about what we’d like to see and what practices and tooling we believe would be better for both developers and production.

What is ‘building’?

Something that came up during our chatter was a question about the word ‘build’: What do we really mean when we talk about building our code? The obvious initial answer is that building really applies only to compiled languages, and that this process of producing a shared library or executable is what we mean by ‘building’. Then again, what about the process of generating .py files from .ui files in Qt? Or if we pre-compile Python byte code (.pyc) files before release? Or pull together something for an internal packaging system? We quickly realized that we have many processes which could be labelled a ‘build’ process.

It seems that if we want to generalize, it makes sense to simply say that a ‘build’ process is any process which generates an ‘artifact’ to be deployed, distributed or otherwise used in another process. This is by no means a new concept in the software world, but it is a helpful thought for directing conversation in this area. For a pure-Python tool, such a ‘build’ might look something like the sketch below.
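This is purely illustrative - the source and build paths are hypothetical, and it assumes PySide2’s pyside2-uic command is on the PATH (PyQt5’s pyuic5 is the equivalent) - but it shows the kind of step that produces a deployable ‘artifact’ even when nothing is being compiled in the traditional sense:

```python
# A hedged sketch of a pure-Python 'build' step: generate .py modules from Qt
# Designer .ui files and pre-compile bytecode, producing an artifact directory
# that can then be packaged and deployed. Paths and tool names are assumptions.
import compileall
import pathlib
import shutil
import subprocess

SRC = pathlib.Path("src/mytool")      # hypothetical source tree
BUILD = pathlib.Path("build/mytool")  # the deployable 'artifact'

shutil.rmtree(BUILD, ignore_errors=True)
shutil.copytree(SRC, BUILD)

# 'Build' step 1: .ui -> .py (swap pyside2-uic for pyuic5 if you use PyQt5)
for ui_file in BUILD.rglob("*.ui"):
    py_file = ui_file.with_suffix(".py")
    subprocess.run(["pyside2-uic", str(ui_file), "-o", str(py_file)], check=True)
    ui_file.unlink()

# 'Build' step 2: pre-compile bytecode so the first launch doesn't have to
compileall.compile_dir(str(BUILD), quiet=1)
```

The resulting build directory is the artifact that a packaging system such as rez could then version and deploy.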

How does your code get to an artist?

We went around the room answering this question in turn, and it elicited an incredible range of answers. It seems safe to say that you’ll find many different answers to this question even within most studios - depending on who you ask. Here’s a sample of some of the workflows that we heard and talked about:

What about containers?!

There does appear to be some talk and application of containers, but we are definitely not seeing the kinds of container-based development workflows seen in the web sector. This may be because of the challenges in using DCC apps in containers; the desktop (i.e. not client-server) nature of the software that we build; or any of a multitude of other differences between our industry and others.

Python and PYTHONPATH

Every studio produces a lot of Python code, without a doubt. As an interpreted language, Python brings with it its own deployment challenges. Solutions like rez can help to manage all the different packages and versions that are used, and some studios opt for more integrated toolchains like PyPI and pip, each with their own set of challenges. One interesting thing to be wary of is the PYTHONPATH, as many packaging systems like rez add a search path entry for each package in the environment. In many scenarios this is fine, but we have to remember that this search process is always going to be O(n) in the number of path entries, and with a large package set this can add a significant amount of time, as the sketch below illustrates. It seems best overall to build a deployment system which allows for either a small set of entries containing many packages, or an explicitly ordered search path - each of which introduces its own set of problems to be solved.
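To make the O(n) point concrete, here is a small, self-contained sketch - the package count and directory names are made up, not taken from any studio’s setup - showing that Python’s import machinery walks every sys.path entry in order before finding a module that lives in the last of many per-package entries:

```python
# A minimal, illustrative sketch (values are arbitrary, not from any studio):
# Python's import system checks each sys.path entry in order, so a module
# living in the last of many per-package entries costs O(n) directory probes.
import importlib.util
import os
import sys
import tempfile
import time

N = 500  # pretend we have one sys.path entry per deployed package
root = tempfile.mkdtemp(prefix="pypath_demo_")
entries = [os.path.join(root, f"pkg_{i:03d}") for i in range(N)]
for entry in entries:
    os.makedirs(entry)

# Put the package we actually want in the *last* entry.
target = os.path.join(entries[-1], "mypkg")
os.makedirs(target)
open(os.path.join(target, "__init__.py"), "w").close()

sys.path = entries + sys.path

start = time.perf_counter()
spec = importlib.util.find_spec("mypkg")   # walks all N entries before the hit
elapsed = time.perf_counter() - start
print(f"found {spec.name} after scanning {N} path entries in {elapsed:.4f}s")
```

With hundreds of rez-style per-package entries, every cold import of a late-listed (or missing) module pays this scan, which is why consolidating entries or controlling their order matters.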

What are some of the biggest issues in our current or past deployment pipelines?

  1. The existence and accuracy of tests

    Automated testing is great, and a good test suite can give a development team great confidence whenever change is being introduced. The problem, however, is when the tests that exist are not sufficient, or do not exist at all for some areas of the code. We tend to put a lot of trust in our automated testing - and maybe this is not unreasonable - but it can also cause us to be less diligent when releasing our software.

  2. The fallacy of artist testing

    One common practice and recommendation seems to be rolling out software to a smaller group of users first in order to have them ‘test’ the code before it goes out to everyone. By no means did we identify this as a bad practice, but we recognize that interactions with the pipeline are complex, and that a few days or a week of regular use often does not represent the full gamut of use cases. It seems important for us to remember that giving it to a few users for a week is not enough to prove ‘stability’.

  3. Too many steps or walls

    Often tools and features are developed in close collaboration with an artist, or a critical fix is needed for an issue that is blocking one or more people. It seems like no matter how perfect the CD system is, there are always problems when it slows down these iterations between the developer and the end user.

What makes up the dream deployment pipeline?

Aside from eliminating the above, this is a laundry list related not only to CI/CD but also to developer tooling in general. The conversation on this question easily opened up into questions about the entire software life-cycle.

  1. Fast and accessible rollback button/procedure.

    Ideally this is not something that requires a lot of sign-off or high authority to initiate, nor is it a long list of manual steps and processes that can introduce more problems. (A minimal sketch of one common approach appears after this list.)

  2. Awareness of changeset and features

    This is something like release notes, but much more. Information about changes feels most helpful when it is presented immediately after the change and in language that is clear to the user. This allows users to be more aware of new functionality and prepared for possible issues.

  3. Quick to fix, hard to break

    There is a bit of a scale here: changes that solve issues in the code and remove blockers should have a clear and short path to affected users, while code that introduces significant change or has a larger potential to introduce new problems must be exposed to more scrutiny and inspection.

  4. Full code life-cycle, deprecation and removal with metrics

    It should be easy to identify how much - if at all - any piece of code is used. Ideally, there is also a process in place to regularly discuss, evaluate, and take action on code and tools that are rarely or never used. Similarly, code that is written for one production or use case should be trackable as such, and discussed or deprecated appropriately.

  5. Automatic error collection and reporting

    We recognize that artists and other users are not the best source of issue reports. There are many reasons for this - notably tight schedules and numbness to errors - and we should not rely on them. Problems that arise in our code should be automatically collected and reported to us (see the sketch after this list). Furthermore, when users do report errors, we should be able to collect and attach relevant technical information automatically.

  6. Easily run code in a production context, through artist workflows

    We do not use our own tools every day, but that is often the best way to know how well they are working or what about them does not work well. We would love to be able to see our code being used, or be able to understand and run through workflows ourselves.

  7. Pre-emptive testing against upstream releases of DCC applications

    This would let us be aware of, and plan for, changes or issues that affect our code and tools. Even if we do not address them right away, we can at least understand the work that may be needed before the new version can be rolled out.
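Touching on item 1 above, here is a minimal sketch of one common rollback approach - versioned release directories with an atomic ‘current’ symlink flip. The paths and layout are hypothetical, not something described verbatim at the forum:

```python
# A hedged sketch, assuming releases are deployed as versioned directories
# (e.g. /pipeline/releases/mytool/1.2.0) with a 'current' symlink pointing
# at the live one. Rolling back is then a single atomic symlink flip.
import os
import sys

RELEASES = "/pipeline/releases/mytool"   # hypothetical deployment layout
CURRENT = os.path.join(RELEASES, "current")

def rollback():
    # (a real tool would sort these as versions, not plain strings)
    versions = sorted(d for d in os.listdir(RELEASES) if d != "current")
    live = os.path.basename(os.path.realpath(CURRENT))
    idx = versions.index(live)
    if idx == 0:
        sys.exit("already at the oldest release; nothing to roll back to")
    previous = versions[idx - 1]
    tmp = CURRENT + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(os.path.join(RELEASES, previous), tmp)
    os.replace(tmp, CURRENT)   # atomic: users never see a half-updated state
    print(f"rolled back {live} -> {previous}")

if __name__ == "__main__":
    rollback()
```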
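And for item 5, a minimal sketch of automatic error collection, assuming a hypothetical internal endpoint (tracker.example.com) that accepts JSON error reports. The idea is simply to install an exception hook so unhandled errors are shipped with useful context instead of waiting for an artist to file a report:

```python
# A hedged sketch of automatic error reporting. The endpoint and payload
# fields are assumptions; the technique is just a sys.excepthook override.
import getpass
import json
import platform
import sys
import traceback
import urllib.request

REPORT_URL = "https://tracker.example.com/api/errors"  # hypothetical endpoint

def report_unhandled(exc_type, exc_value, exc_tb):
    payload = {
        "user": getpass.getuser(),
        "host": platform.node(),
        "python": platform.python_version(),
        "traceback": "".join(traceback.format_exception(exc_type, exc_value, exc_tb)),
    }
    try:
        req = urllib.request.Request(
            REPORT_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=2)
    except Exception:
        pass  # never let error reporting break the tool itself
    # still surface the original traceback to the user/log
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = report_unhandled
```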

Conclusion

This is a large topic of discussion, which seems to be viewed and understood by everyone a little bit differently. In the end, though, it appears that we all agree on ways to improve our deployment processes, but that it often requires significant investment or effort to effect change or even to properly evaluate current procedures. It will certainly be interesting to see if and how these things change in the future. Thanks again to everyone who participated, and if you have any thoughts or comments on the above, we’d love to hear them at our next forum!
