Below are the notes collected from the SIGGRAPH 2023 Birds of a Feather (BoF) event on pipelines. The BoF space was organized into topic groups, and attendees could move between topics and discuss as they wished. As the organizer, I asked folks in each group to try to keep notes as much as possible. Below is the result of that process.
These notes ask as many questions as they answer, follow different styles, and are often terse. However, they also provide interesting insight into the topics covered and into what each group was discussing.
Notes by Topic
Observable Pipelines
Event bus pipelines
- E.g., render complete event
- Downstream tasks subscribed to events.
- What happens when the system goes down?
Tool use metrics?
- Problems at scale?
- How granularly to track?
Artist comprehension of
“Unit tests” for assets
Node-based event graph for tasks “Temerity” (Jim Callahan)
How to show/hide parts of the whole that the artists should(n’t) see?
Event bus
- Can “rerun” events to redo work (see the sketch after this list)
- E.g., LUTs changed -> regenerate all reviewables
Vs. pre-calc tasks
- No spooky action at a distance
- How to re-run a part of a graph?
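To make the replay idea concrete, here is a minimal sketch (the class, topic names, and journal format are all illustrative, not from any system discussed) of an event bus that journals events so downstream work can be re-run later:

```python
import json
from collections import defaultdict

class EventBus:
    """Toy event bus: journals every event so a topic can be replayed later."""

    def __init__(self, journal_path="events.log"):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.journal_path = journal_path

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Journal first so the event survives a crash, then fan out.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"topic": topic, "payload": payload}) + "\n")
        for callback in self.subscribers[topic]:
            callback(payload)

    def replay(self, topic):
        # Re-run journaled events for one topic, e.g. after LUTs change,
        # to regenerate every reviewable.
        with open(self.journal_path) as f:
            for line in f:
                event = json.loads(line)
                if event["topic"] == topic:
                    for callback in self.subscribers[topic]:
                        callback(event["payload"])

bus = EventBus()
bus.subscribe("render_complete", lambda p: print("make reviewable:", p["shot"]))
bus.publish("render_complete", {"shot": "sq010_sh020"})
bus.replay("render_complete")  # redo the downstream work
```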
File Logs vs Logging service?
Issue reporting tool - report error w/context
- Local deque of user action events for “what were you doing?”
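A sketch of that deque idea, assuming a Python tool (the function names are hypothetical): keep a bounded collections.deque of recent user actions and attach it to any error report.

```python
import time
import traceback
from collections import deque

# Ring buffer of the last 50 user actions; old entries fall off automatically.
RECENT_ACTIONS = deque(maxlen=50)

def record_action(action: str) -> None:
    """Call this from UI handlers, e.g. record_action('opened shot sq010')."""
    RECENT_ACTIONS.append((time.time(), action))

def build_error_report(exc: Exception) -> dict:
    """Bundle the exception with the 'what were you doing?' context."""
    return {
        "error": repr(exc),
        "traceback": traceback.format_exc(),
        "recent_actions": list(RECENT_ACTIONS),
    }
```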
OpenTelemetry?
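For anyone who hasn't tried it, a minimal OpenTelemetry tracing sketch (the operation and attribute names are made up; `do_publish` is a stand-in). It also shows tagging the active span when an exception happens, which is one answer to the session-tagging question below. With no SDK configured, the API calls are no-ops, so this is safe to drop into a tool.

```python
from opentelemetry import trace

tracer = trace.get_tracer("pipeline.publish")

def do_publish(asset_id: str) -> None:
    pass  # stand-in for the real publish step

def publish_asset(asset_id: str) -> None:
    # One span per user-facing operation; attributes make sessions searchable.
    with tracer.start_as_current_span("publish_asset") as span:
        span.set_attribute("asset.id", asset_id)
        try:
            do_publish(asset_id)
        except Exception as exc:
            span.record_exception(exc)  # tag the trace when exceptions happen
            raise
```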
Poll: What do you like to use for tracing / metrics / log viewing?
Datadog: 1
Grafana: 4
Jaeger: 1
ElasticSearch: 1
Can we track a session of events and tag it when exceptions happen?
- Tags are good
- Track all the events?
- How do you avoid the serial overhead of tracking everything?
- The production-management side of observability
- How do you indicate that a step is complete from an artist and pipeline perspective?
- Notification of upstream / downstream changes
- Web-based tools and latency compared to Qt
Multi-Site Data Pipelines
Notes by Tom Cowland and Jason Scott
Topics
- External data management
- Multi-vendor with multiple pipelines
- Validation
- Ingestion
- Manage the trust boundary
- Studios with multiple brands and multiple locations
- Can “universal” pipelines work?
- More likely to succeed with multiple vendors if there are contracts at the boundary.
- “Skynet” (singular pipeline from a studio to all other vendors) precludes local innovation and iteration.
- Use USD with URIs (vs. paths) across multiple pipelines (see the sketch after this list)
- Desired direction
- DDoS can be managed
- Sidecar/pre-cache farm files can facilitate sharing
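As a sketch of the URI idea (everything here is hypothetical: the `asset://` scheme, the mount table, and a real implementation would plug into USD's asset-resolution layer rather than a bare function):

```python
from urllib.parse import urlparse

# Per-site mount table: the same URI resolves differently at each site.
SITE_MOUNTS = {
    "london": "/mnt/ldn/projects",
    "montreal": "/mnt/mtl/projects",
}

def resolve(uri: str, site: str) -> str:
    """Map e.g. 'asset://showA/char/bob/model/v003.usd' to a site-local path."""
    parsed = urlparse(uri)
    if parsed.scheme != "asset":
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return f"{SITE_MOUNTS[site]}/{parsed.netloc}{parsed.path}"

print(resolve("asset://showA/char/bob/model/v003.usd", "london"))
# -> /mnt/ldn/projects/showA/char/bob/model/v003.usd
```

Because no absolute paths are baked into the files, each pipeline can map the same references onto its own storage.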
Challenges
- Data syncing is hard
- Knowing the full closure of everything that needs to be synced (see the sketch after this list)
- Understanding the impact of transfer latency on the production process
- Database syncing / replication vs. a single database
- Many use a single 3rd-party database (ShotGrid, ftrack), and latency for these doesn’t seem to be one of the larger issues studios are dealing with
- Multi-site databases require conflict management (versions, etc.)
- Visibility of location and disk space for transfer costs
- Filesystem replication (many people are interested in the different solutions out there, but no one in the BoF group is using one in production yet)
- Relying only on on-demand NFS reads can stall production too much
- Still need to pre-warm some transfers
- A multi-vendor shared file system is just as hard as a “universal” pipeline
- Control is important when considering commercial offerings for sync (vs. cost overheads)
- Latency of file streaming
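The “full closure” problem above is at heart a graph traversal; a minimal sketch with a made-up dependency table:

```python
def transfer_closure(root: str, dependencies: dict[str, list[str]]) -> set[str]:
    """Walk the dependency graph to find everything a sync must include."""
    needed, stack = set(), [root]
    while stack:
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(dependencies.get(item, []))
    return needed

deps = {
    "shot.usd": ["char_bob.usd", "set_kitchen.usd"],
    "char_bob.usd": ["bob_textures.tx"],
}
print(transfer_closure("shot.usd", deps))
# -> {'shot.usd', 'char_bob.usd', 'set_kitchen.usd', 'bob_textures.tx'}
```

The hard part in production isn't the traversal; it's getting an accurate dependency table out of every file format in the first place.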
Questions to consider
- Thick vs. thin clients
- Single global-centralized data center (even cloud) with thin clients doesn’t work in all locations due to latency
- Many studios use on-site thin clients for consistency of workflow (with those not on-site) and flexibility
- For some studios, do individuals act as “sites” in the same way that full data centers do?
- Cloud workstations vs. data centers
- Types of transfer methods
- rsync (with a tarball wrapper)
- ZFS
- rclone
- Transfer priority systems
- Most use the existing farm queue manager as a transfer priority manager
- Visibility for production
- Don’t forget non-technical solutions (workflow design or social engineering that might help manage expectations or usages of data)
- Give outsource companies a metadata path specification (a JSON file mapping path fields to their intended usage)
- Many companies supply a document explaining their path specification, but a machine-readable JSON representation could be helpful
- To avoid version-number conflicts, use non-numerical versions so each site can generate IDs locally (see the sketch below)
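A sketch of that last point (the ID format is illustrative): if a version ID embeds a timestamp, the site, and a random suffix, any site can mint one with no central counter and no cross-site clashes.

```python
import time
import uuid

def new_version_id(site: str) -> str:
    """Mint a version ID locally; the timestamp prefix keeps IDs roughly sortable."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    return f"{stamp}-{site}-{uuid.uuid4().hex[:8]}"

print(new_version_id("mtl"))  # e.g. 20230808T141503-mtl-9f2c1a7b
```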
Recommendations
Transfer systems:
Databases:
Infrastructure Management
Notes by Joe D’Amato
Remote Problems
All extra tech support
- Dealing with onsite tech setup
- Virtual vs 1:1
- Negatives: how do we flip to a GPU farm? Strict machine profiles; licensing
- Positives: fine-tuned configs
Render Farm
- 1 job / machine vs. slot packing (see the sketch after this list)
- USD: how to move data, and I/O patterns
- Prioritization schemas are broken
- Production: short shows have no downtime; limits of Qube
- Always use test jobs
- Mandating that jobs don’t get picked up until the test job passes
- T-shirt sizes for job types (cores / mem)
- Silo resources for security
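To illustrate slot packing vs. 1 job / machine (the T-shirt sizes and machine specs below are made up), a first-fit sketch:

```python
# T-shirt sizes for job types: (cores, GiB RAM) -- values illustrative only.
SIZES = {"S": (2, 8), "M": (8, 32), "L": (16, 64)}

def pack(jobs: list[str], machine_cores: int = 32, machine_ram: int = 128):
    """First-fit packing of jobs onto machines instead of one job per machine."""
    machines = []  # each entry: [free_cores, free_ram, [jobs]]
    for job in jobs:
        cores, ram = SIZES[job]
        for m in machines:
            if m[0] >= cores and m[1] >= ram:
                m[0] -= cores
                m[1] -= ram
                m[2].append(job)
                break
        else:
            machines.append([machine_cores - cores, machine_ram - ram, [job]])
    return [m[2] for m in machines]

print(pack(["L", "M", "S", "S", "M", "L"]))
# -> [['L', 'M', 'S', 'S'], ['M', 'L']]  (2 machines instead of 6)
```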
Liquid cooling
- Closed loop
- External cooling
- Using HVAC
- Immersion-cooled racks
Quantum compute
- Estimated 5–7 years away
- Promise of near-instantaneous computation
GPU
- 50% faster each year
- How do you buy more now, knowing that?
What Is / Is Not Pipeline?
Pipeline = transport, conformed metadata
Workflow(s) = artist-specific, able to shape the data to enter and move down the pipe
What is pipeline:
- Management of data between defined steps
Efficiency vs sustainability of workflows
- Social level (workers)
- Environmental (power, resources)
- Equipment (hardware & software, licenses)
- Are they exclusive?
Pipeline as reactive to artists/workflows vs. keeping a fixed overall structure
What is Pipeline?
- The defined process of pushing content through the steps of production
- Workflows get the content organized to get processed by the pipeline
- I think…
Pipelines are
- Flexible
- Support the goals of the production
- As invisible / transparent as possible
What is NOT a pipeline?
- How do you perform the art?
- Production tracking?
Pipeline can be the resulting workflow when you successfully get from start to finish
The order of “jobs” that need to be done to get from point A to point B
Pipeline is the glue that connects workflows and gives output files
Define the rules - how data has to be saved and used
Python at Scale
Python typing is essential for any project at scale
mypyc (bundled with mypy) can compile fully typed Python code to C extensions for higher performance
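For example, fully typed code like this passes mypy and is the kind of thing mypyc can compile (the functions are illustrative):

```python
def frame_range(start: int, end: int, step: int = 1) -> list[int]:
    """Inclusive frame range: frame_range(1001, 1003) -> [1001, 1002, 1003]."""
    return list(range(start, end + 1, step))

def chunk(frames: list[int], size: int) -> list[list[int]]:
    """Split frames into farm-task chunks of at most `size` frames."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]
```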
Deploy with Kubernetes
asyncio looks promising but increases complexity (see the sketch after these bullets)
- The GIL (global interpreter lock in CPython) and PEP 703, which would make it optional
- Concern that the transition could be worse than the py2 -> py3 conversion
- Shims won’t fix everything (e.g., thread safety)
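A small asyncio sketch for context (the publish step is a stand-in): concurrency helps the I/O-bound parts of a pipeline today, while CPU-bound work still serializes on the GIL, which is why PEP 703 came up.

```python
import asyncio

async def publish(asset: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for I/O: DB calls, file copies, REST
    return f"{asset} published"

async def main() -> None:
    # Run the I/O-bound publishes concurrently.
    results = await asyncio.gather(*(publish(a) for a in ("bob", "kitchen", "props")))
    print(results)

asyncio.run(main())
```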
Spack (used by supercomputing communities) and rez for system library package management
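For anyone who hasn't seen rez: a package is described by a package.py, roughly like this (the package name, versions, and layout here are invented):

```python
# package.py -- a minimal rez package definition
name = "shotpub"
version = "1.4.0"
description = "Shot publishing tools"

requires = [
    "python-3.9+",
    "usd-23",
]

def commands():
    # rez calls this (with `env` injected) when the package joins a
    # resolved environment.
    env.PYTHONPATH.append("{root}/python")
    env.PATH.append("{root}/bin")
```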
To Package (Software) or Not to Package
devpi & Artifactory
HDAs (Houdini Digital Assets), Nuke gizmos
How much on base image, how much packaged?
UI standards
Packaging for remote artists
Packaging for remote studios
Rez
How to package software
What happens
How does config management fit into package management?
Puppet
pyc files
Overlay FS
CI/CD
Packaging Nuke gizmos, HDAs
Rebuilding regularly to ensure that the recipe works
Do you need to pull to get new builds, or can the system rebuild from another recipe even if the version is not the same as the file?
Target server and hope
pdiff to ensure consistency
Asset vs package
- If an artist may want to tinker, an asset would make more sense
lipo universal binaries
PDQ Deploy
Chocolatey
Homebrew
conda recipes
Conan -> CMake
Pipeline TD: Career or Stepping Stone
Better IC (individual contributor) growth paths
Collaboration with core engineering groups
Making sure the TDs keep a close partnership with R&D so there is mutual respect
Making sure there is a clear career path for ICs to continue to grow without needing to go into management
Final Thoughts
I hope that these notes at least serve as a small record of the BoF session. Many great discussions were had and new connections were made, even if we didn’t do a great job of capturing it all.
Thanks for reading!