Git diff images and pdfs
Git is a fantastic tool to version control code. The advantages of version controlling my work are so evident that I want to version control everything else I do besides code. Most of my work consists of editing text files, and I have even forced my workflow into text files just to be able to version control more of it. Thus, I keep track of my source code, org-mode notes and some LaTeX files.
However, while working on a report, I realized that I generate figures and other data plots that also need to be version controlled. I do send those images, binary blobs, into git. But I don’t really have a good way to track their changes. GitHub provides a great tool for checking the diff of image files. But I want to do that locally on my machine. So I decided to solve this and found a solution on these websites 1, 2, 3. From now on I can diff image files thanks to ImageMagick and git difftool.
The configuration
First, create a .gitattributes file. It can be specific to a project or global to the user. For the global case, save it as ~/.config/git/attributes or point git’s core.attributesFile setting at its location, since git does not read a .gitattributes file in your home directory on its own. This file tells git how to treat files during version control. In this case I’ll define svg and png image files as binaries, so that they never show a text diff representation in git. pdf files, on the other hand, will be treated with a special filter.
*.svg binary
*.png binary
*.pdf diff=pdf
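To make the effect of those three lines concrete, here is a small Python sketch of how such rules map file names to attributes. It is a deliberate simplification: it only handles basename glob patterns like the ones above, and the helper names (`parse_attributes`, `attributes_for`) are made up for this illustration. Git's real matching rules are richer, and `git check-attr` is the way to ask git itself.

```python
from fnmatch import fnmatch

GITATTRIBUTES = """\
*.svg binary
*.png binary
*.pdf diff=pdf
"""


def parse_attributes(text):
    """Return a list of (pattern, {attribute: value}) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        parsed = {}
        for attr in attrs:
            name, _, value = attr.partition("=")
            # A bare attribute such as "binary" is simply "set" (True here).
            parsed[name] = value if value else True
        rules.append((pattern, parsed))
    return rules


def attributes_for(filename, rules):
    """Merge the attributes of every rule matching the basename; later rules win."""
    result = {}
    basename = filename.rsplit("/", 1)[-1]
    for pattern, attrs in rules:
        if fnmatch(basename, pattern):
            result.update(attrs)
    return result


rules = parse_attributes(GITATTRIBUTES)
print(attributes_for("figures/sin.png", rules))  # "binary" is set
print(attributes_for("report.pdf", rules))       # uses the diff driver named "pdf"
print(attributes_for("notes.org", rules))        # no special treatment
```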
Next, in ~/.gitconfig, my global configuration, I set up how to treat pdf files. Their text representation is the information pdfinfo can give me about them. I only need to add these two lines to the .gitconfig file.
[diff "pdf"]
    textconv = pdfinfo
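Conceptually, a textconv driver converts both versions of a file to text and then diffs that text. The Python sketch below mimics the idea with the standard library's difflib. The converter is a stand-in that returns invented metadata (the `fake_pdfinfo` name and its fields are assumptions for the example); in the real setup git runs pdfinfo on each version.

```python
import difflib


def textconv_diff(old_blob, new_blob, textconv, name="report.pdf"):
    """Diff two binary blobs by first converting each one to text."""
    old_text = textconv(old_blob).splitlines(keepends=True)
    new_text = textconv(new_blob).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_text, new_text, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
    )


def fake_pdfinfo(blob):
    """Stand-in converter: pretend the blob's length is the page count."""
    return f"Producer: pdfTeX\nPages: {len(blob)}\n"


# Two "versions" of a file that differ only in their pretend page count.
print(textconv_diff(b"xx", b"xxx", fake_pdfinfo))
```

The binary content never appears in the output; only the lines of the text representation that changed do.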
More exciting now is to add the difftool configuration. I call it image_diff and then declare the command cmd that will perform the diff action.
[difftool "image_diff"]
    cmd = compare $REMOTE $LOCAL png:- | montage -geometry 400x -font Liberation-Sans -label "reference" $LOCAL -label "diff" - -label "current--%f" $REMOTE x:
$LOCAL and $REMOTE are variables intrinsic to git and correspond to the old/staged file and the current/unstaged file. compare takes the two files, which can be in any format that ImageMagick handles, and creates a comparison png stream (defined by png:-). The output is piped to montage to create a more informative three-column image with the reference file on the left, the diff in the middle and the current file on the right. -geometry 400x sets the width of each tile, feel free to scale it. -font Liberation-Sans is the font of the labels; I set it because montage seems to default to Helvetica, which I don’t care to install on my system.
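To see what git ends up running, it can help to spell out the pipeline with the two variables filled in. The following Python sketch assembles the same compare/montage pipeline as a shell string for two given paths, quoting them so that file names with spaces survive. The function name and its keyword parameters are hypothetical; only the ImageMagick options are taken from the configuration above.

```python
import shlex


def image_diff_command(local, remote, width=400, font="Liberation-Sans"):
    """Build the compare | montage pipeline for an old (local) and new (remote) file."""
    local_q, remote_q = shlex.quote(local), shlex.quote(remote)
    # compare writes the highlighted difference as a png stream on stdout.
    compare = f"compare {remote_q} {local_q} png:-"
    montage = (
        f"montage -geometry {width}x -font {shlex.quote(font)} "
        f"-label reference {local_q} "          # left column: old version
        "-label diff - "                        # middle column: diff read from stdin
        f"-label 'current--%f' {remote_q} x:"   # right column: new version
    )
    return f"{compare} | {montage}"


print(image_diff_command("/tmp/old_sin.png", "sin.png"))
```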
The workflow
When I’m working on my code or any text file I can review/stage and commit all my changes from the shell or with any other tool I use to communicate with git.
Let’s take for example this simple Python code.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 4, 122)
plt.plot(x, np.sin(x))
plt.xlabel('$x$')
plt.ylabel(r'$\sin(x)$')
plt.savefig('sin.png')
plt.close()

Figure 1: Image generated by the previous code block
Now I can continue editing my source code. I plot a second function, add the necessary labels and finally save the plot under the same name.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 4, 122)
plt.plot(x, np.sin(x), 'C0', label=r'$f(x)=\sin(x)$')
plt.plot(x, np.sin(3*x), 'C1', label=r'$f(x)=\sin(3x)$')
plt.xlabel('$x$')
plt.ylabel(r'$f(x)$')
plt.legend(loc=0)
plt.savefig('sin.png')
plt.close()

Figure 2: New plot generated by the updated code
I review and stage all my changes for text files in the usual way. But for image files I can now review the changes using the git difftool that I just defined.
git difftool -t image_diff
This will ask me if I want to launch image_diff to evaluate the diffs of every unstaged file. When it comes to the image file I accept, and it immediately displays the diff image.

Figure 3: Image diff
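Since the prompt appears for every unstaged file, one possible refinement is to hand git only the image files. The Python sketch below is one way to do that, under a few assumptions: it is run from the repository root, the tool is named image_diff as above, and the helper names and the set of suffixes are my own choices for the example.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: the image types that were marked as binary in .gitattributes.
IMAGE_SUFFIXES = {".png", ".svg"}


def only_images(paths):
    """Keep the paths whose extension marks them as an image."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in IMAGE_SUFFIXES]


def review_changed_images(tool="image_diff"):
    """Open the difftool, without prompting, for every unstaged modified image."""
    changed = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    images = only_images(changed)
    if images:
        # -y skips the per-file confirmation; "--" separates options from paths.
        subprocess.run(["git", "difftool", "-y", "-t", tool, "--", *images], check=True)
    return images


# The filter on its own, applied to a made-up list of changed files.
print(only_images(["plot.py", "sin.png", "notes.org", "figs/logo.SVG"]))
```

Calling `review_changed_images()` inside a repository would then show the three-column montage for each modified image in turn.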
After reviewing the changes and understanding why they happened, I stage the modified image and make a new commit.
Dr. Óscar Nájera