Handling video files produced for a MOOC on Windows with git and git-annex

This post documents parts of the workflow I’ve set up to manage videos produced for a MOOC, where colleagues at different sites collaborate remotely on a set of video sequences.

We are a team from several schools working on the same course. Our process is incremental, so many remote authors need to collaborate on the same set of video sequences over a fairly long period.

We will probably review some of the videos and make changes, so we need to track those changes and send versions to colleagues at remote sites, who can comment on them and receive later edits. We may have more than one site doing video production. We therefore need to share videos throughout the production, editing and revision of the course contents, in a way that is manageable for power users (we’re all computer scientists, used to SVN or Git).

I’ve decided to experiment with Git and Git-Annex to manage the videos the way we already manage the LaTeX sources of our slides. The obvious issue is that videos are big files, demanding in storage space and in bandwidth for transfers.

We want to keep track of everything done during the production of the videos, so that we can redo some of the editing later, for instance if we change the graphic design elements (logos, subtitles, frame dimensions, additional effects, etc.) when improving the course from one season to the next. On the other hand, not all colleagues want to download a full copy of all the rushes to their laptop just to review one particular sequence of the course. They only need the final edit MP4, although they should still be able to fetch all the rushes if they want to try to improve the videos.

Git-Annex brings us the ability to decouple the presence of files in directories, managed by regular Git commands, from the presence of the file contents (the big stuff), which is managed by Git-Annex.

Here’s a quick description of our setup:

  • we do screen capture and video editing with Camtasia on a Windows 7 system. Camtasia (although proprietary) is quite manageable for someone who is not a video editing expert, and suits our needs quite well in terms of screen capture, green background shooting and later face insertion over slide captures, additional “motion design”-like enhancements, etc.
  • the rushes captured (audio, video) are kept on that machine
  • the MP4 rendering of the edits is performed on that same machine
  • all these files are stored locally on that computer, and we back them up on demand to a remote system with rsync over SSH. We have installed Git for Windows, so we use the bash, rsync and ssh that come with it. SSH uses a public key without a passphrase, to connect easily to the Linux remote, but that isn’t mandatory.
  • the mirrored files appear on a Linux filesystem on another host (running Debian), where the target is actually managed with git and git-annex.
  • there we handle all the files added, removed or modified with git-annex.
  • we have 2 more git-annex remote repos, accessed through SSH (again using a passphrase-less public key) and served by Gitolite, to which git-annex copies all the file contents using rsync. These repos are on different machines, so we keep backups in case of crashes. git-annex is set up to require at least 2 copies of each file (numcopies).
  • colleagues in turn clone from either of these repos and use git-annex get to download the video contents, only for the files they are interested in (for instance final edits, but not rushes), which they can then play locally with their preferred OS and video player.
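
The last two points can be sketched as a couple of shell functions. This is only an illustration, not our exact scripts: the repository URL, the clone directory and the "final" sub-directory are hypothetical names, and the DRY_RUN switch is just a convenience to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: URL, directory names and DRY_RUN are assumptions for illustration.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same as run, but from inside the given directory.
run_in() {
    dir=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "(cd $dir) $*"
    else
        ( cd "$dir" && "$@" )
    fi
}

# In the annex: ask git-annex to keep at least 2 copies of each file.
require_two_copies() {
    run git annex numcopies 2
}

# On a reviewer's machine: clone, then download the content of one directory only.
fetch_final_edits() {
    repo_url=$1   # SSH URL of one of the two servers (hypothetical)
    target=$2     # local directory to clone into
    subdir=$3     # e.g. "final" (hypothetical layout)
    run git clone "$repo_url" "$target" || return 1
    run_in "$target" git annex get "$subdir"
}
```

A reviewer would call something like `fetch_final_edits <url> mooc final`. Files outside that directory still show up in the clone, but their contents are not downloaded.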

Why didn’t we use git-annex directly on the Windows host, which is the source of the files?

We tried, but it didn’t work out. The Git-Annex assistant somehow crashed on us and left the Git history in a strange state, which became unmanageable. More importantly, we need robust backups, so we can’t rely on something we don’t fully trust: reshooting a video is really costly (setting up the set again, with lighting and cameras, and a professor who has to deliver the talk again!).

The rsync (with --delete on the destination) from Windows to Linux is robust. Git-Annex on Linux seems robust so far. That’s enough for now 🙂
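
For illustration, such a mirror could be launched from the Git for Windows bash roughly as follows. The source path and the remote target are hypothetical, and the exact options we use may differ. Since --delete removes files on the destination, the sketch also includes a preview variant based on rsync's --dry-run option.

```shell
#!/bin/sh
# Sketch only: paths, host name and DRY_RUN are assumptions for illustration.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mirror src to dest over SSH, deleting files on dest that no longer exist in src.
mirror_to_linux() {
    src=$1    # local directory, e.g. /c/videos (hypothetical)
    dest=$2   # remote target, e.g. user@linuxhost:/srv/mooc (hypothetical)
    # Trailing slashes: mirror the contents of src, not the directory itself.
    run rsync -a -v --delete -e ssh "${src%/}/" "${dest%/}/"
}

# Same command with --dry-run: rsync lists what it would do without changing anything.
preview_mirror() {
    src=$1
    dest=$2
    run rsync -a -v --delete --dry-run -e ssh "${src%/}/" "${dest%/}/"
}
```

Running `preview_mirror` first shows which files would be transferred or deleted before the real `mirror_to_linux` call.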

The drawback is that we need manual intervention to start the rsync, and that we must make sure the rsync target is ready to receive a backup.

The target of the rsync on Linux is a git-annex clone using the default “indirect” mode, in which files are symlinks to the actual contents that git-annex keeps inside the .git/ directory. Such symlinks can’t be compared with the source of the rsync mirror, which consists of plain files on the Windows computer.

We must therefore run “git-annex edit” on the whole target of the rsync mirror before the rsync, so that the files are present as regular video files. This is costly in terms of storage and copying time (our repo contains around 50 GB, and the Linux host is a rather small laptop).

After the rsync, every file has to be compared with the SHA256 known to git-annex, so that only modified files are included in the commit. We run “git-annex add” on the whole tree (which also picks up new files that appeared during the rsync), and then “git-annex sync”. That takes a lot of time, since computing SHA256 checksums is slow for such a set of big files (the video rushes and edited videos are in HD).
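
To see why this step is slow, here is a small sketch of the kind of comparison involved. It is not git-annex’s own code, only an illustration: the function name is made up, and it assumes the sha256sum tool found on a typical Linux system. Deciding whether a file has changed means reading every byte of it to recompute its hash.

```shell
#!/bin/sh
# Sketch only: illustrates a hash comparison, not how git-annex is implemented.

# Succeed (exit status 0) when the file's SHA256 differs from the known hash.
content_changed() {
    file=$1
    known_hash=$2
    # sha256sum reads the whole file and prints "<hash>  -"; keep only the hash.
    current_hash=$(sha256sum < "$file" | cut -d ' ' -f 1)
    [ "$current_hash" != "$known_hash" ]
}
```

With tens of gigabytes of HD video, each full pass over the tree re-reads all of that data, which is where the time goes.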

So the process needs to be the following, on the target Linux host:

  1. git annex add .
  2. git annex sync
  3. git annex copy . --to server1
  4. git annex copy . --to server2
  5. git annex edit .
  6. only then: rsync

Iterate ad lib 😉
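
The cycle above could be wrapped in a small function, to be run from the top of the annex on the Linux host. This is a sketch rather than our actual script: server1 and server2 are the remote names used in the list, while the function name and the DRY_RUN switch are made up for the example. Each step only runs if the previous one succeeded, so the files are not unlocked for the next rsync unless both copies went through.

```shell
#!/bin/sh
# Sketch only: function name and DRY_RUN are assumptions for illustration.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1 to 5 of the cycle; stops at the first failing command.
annex_cycle() {
    run git annex add . &&
    run git annex sync &&
    run git annex copy . --to server1 &&
    run git annex copy . --to server2 &&
    run git annex edit . &&
    echo "Ready: the rsync can now be started from the Windows host."
}
```

Calling `annex_cycle` with `DRY_RUN=1` set prints the five commands in order, which is a cheap way to check the sequence before letting it loose on 50 GB of video.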

I would have preferred a working git-annex on Windows, but this process is much more manageable for me for now, and until we have more videos in our repo than our Linux laptop can hold, we’re quite safe.

Next steps will probably involve gardening the contents of the repo on the Linux host so we’re only keeping copies of current files, and older copies are only kept on the 2 servers, in case of later need.

I hope this can be useful to others, and I’d welcome suggestions on how to improve our process.
