Monday, 18 March 2019

Python Scripting [4B]: Exif Based Image Renaming 2

Today I am going to have a look at Pillow and its interface for reading EXIF data from images. As I mentioned previously, Pillow is a fork of PIL (the Python Imaging Library) and contains capabilities to read and write EXIF data from images.

Although there is official documentation for Pillow on Read the Docs, it does not describe the objects and methods very well, so in this post I am attempting to pull together a description of the way things work. I used sample code from https://developer.here.com/blog/getting-started-with-geocoding-exif-image-metadata-in-python3, which gives a reasonably comprehensive listing of how to access the data.

from PIL import Image

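def get_exif(filename):
    # Open the file and check that it is an intact image.
    image = Image.open(filename)
    image.verify()
    # _getexif() is an internal method of Pillow's JPEG image class; it
    # returns a dictionary of EXIF tags, or None if the file has none.
    return image._getexif()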

exif = get_exif('image.jpg')

Essentially you are using Image.open() to open and read an image, and then calling the _getexif() method of the resulting object to retrieve the EXIF data. What is returned from that call is a dictionary with all the EXIF tags in it. A dictionary is a Python data type that holds a collection of key:value pairs. Here the key in each pair is the numerical ID of the tag, so you can look up a specific tag by passing its number. The page listed above goes on to document other ways you can look up the data. Because I can get the list of tag numbers off another page, and in fact already know the ones I passed in to IrfanView, I am just going to use the following in my code:

exif[tag_number]

which will give me the values I need.
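One thing to watch with a plain exif[tag_number] lookup is that it raises a KeyError when the camera did not write that tag. The following is a minimal sketch of a more forgiving lookup. The constant names and the read_tags helper are my own, made up for illustration, and the sample dictionary is invented data standing in for what get_exif() returns.

```python
# Decimal EXIF tag numbers used in this post.
TAG_DATETIME = 36867   # date and time tag
TAG_SUBSEC = 37520     # SubSecTime
TAG_MODEL = 272        # Model


def read_tags(exif):
    """Return (datetime, subsec, model); missing tags come back as None."""
    # Treat "no EXIF at all" (None) the same as an empty dictionary.
    exif = exif or {}
    # dict.get() returns None where exif[tag] would raise KeyError.
    return (exif.get(TAG_DATETIME), exif.get(TAG_SUBSEC), exif.get(TAG_MODEL))


# Invented sample standing in for the dictionary returned by get_exif().
sample = {36867: "2019:03:18 14:25:30", 272: "ExampleCam"}
print(read_tags(sample))
```

In this sample the SubSecTime tag is absent, so the middle value comes back as None instead of stopping the script.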

Listing what I used from the previous post, the decimal tag codes used were as follows:
  • 36867 - DateTimeOriginal (DateTimeDigitized is the neighbouring tag, 36868). This appears to be returned as a string in the format YYYY:MM:DD HH:MM:SS
  • 37520 - SubSecTime. EXIF defines this as an ASCII value, so it should come back as a string of digits rather than a number.
  • 272 - Model. Appears to be a string.
These values need a bit of manipulation for my purposes. The date/time data needs to be turned into two strings, the date followed by the time, with no extra characters and a space between the two. This is followed by the SubSecTime as 3 characters, then a space, then the model name, and then the original extension.

In other words the file name should be changed to the format: YYYYMMDD HHMMSSmmm <camera name> where mmm is the subsec time in milliseconds.
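As a rough sketch of that formatting step, something like the following should work. The build_name function and its parameter names are hypothetical, and it rests on two assumptions of mine: that SubSecTime holds the digits after the decimal point (so "5" means 0.5 s and becomes "500"), and that a missing SubSecTime is written as "000".

```python
from datetime import datetime


def build_name(date_time, subsec, model, extension):
    """Build 'YYYYMMDD HHMMSSmmm <camera name><extension>' from EXIF values."""
    # EXIF stores the timestamp as 'YYYY:MM:DD HH:MM:SS'.
    stamp = datetime.strptime(date_time, "%Y:%m:%d %H:%M:%S")
    # Assumption: SubSecTime is a fraction of a second, so pad on the right
    # with zeros and cut to exactly three characters.
    digits = str(subsec).strip() if subsec is not None else ""
    millis = digits.ljust(3, "0")[:3]
    return stamp.strftime("%Y%m%d %H%M%S") + millis + " " + model.strip() + extension


# Invented values for illustration only.
print(build_name("2019:03:18 14:25:30", "12", "ExampleCam", ".jpg"))
```

Going through strptime() rather than just deleting the colons has the side effect of rejecting a malformed date with a ValueError, which the script can then treat like a file without usable EXIF data.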

The other considerations are as follows:
  • The script needs to consider whether it just scans for JPEG files or all files in a directory.
  • It will be run in directories where files have already been renamed so it needs to be able to skip the ones that have already been renamed, or avoid renaming where the code produces a filename that the file already has.
  • If it looks for all files, it needs to be able to handle files that don't have EXIF data in them. I experimentally changed the script to scan a movie file that was in the same directory as my sample image. The result of attempting to read that file was an IOError raised by Python. Another possible outcome is an empty dictionary object, or None, which _getexif() returns for a JPEG that carries no EXIF data. Therefore I need to determine how the script will handle these instances.
  • If a file doesn't have exif data we need to decide whether to rename it to something. For example a movie file could possibly be renamed based on the file date/time data rather than exif date/time.
  • One of the things I would like this script to be able to do is to handle filename collisions. The generic way to handle a filename collision is to add something such as a sequence number to the end of the new file name. The SubSecTime value already deals with most cases: with the camera's drive mode set to Continuous it can take a number of images within milliseconds of each other, and SubSecTime tells them apart. Collision detection will mainly be needed where SubSecTime is not set, as is the case on older cameras that I have owned.
I expect all of these considerations will be easy to implement but will require quite a bit of code of course.
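To give an idea of the shape this could take, here is a minimal sketch of three of the considerations above using only the standard library: skipping files that already have the generated name, adding a sequence number on a collision, and falling back to the file's modification time when there is no EXIF data. All of the function names and the " (1)" suffix style are my own choices for illustration, not a finished design.

```python
from datetime import datetime
from pathlib import Path


def fallback_stem(path):
    """Name stem from the file's modification time, for files without EXIF."""
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return modified.strftime("%Y%m%d %H%M%S")


def unique_target(path, new_stem):
    """Return the path to rename to, or None if no rename is needed."""
    target = path.with_name(new_stem + path.suffix)
    if target == path:
        return None  # the file already has the generated name
    counter = 1
    # Collision: another file owns the name, so try a sequence number.
    while target.exists():
        target = path.with_name(f"{new_stem} ({counter}){path.suffix}")
        if target == path:
            return None  # already renamed with this sequence number
        counter += 1
    return target


def rename_file(path, new_stem):
    """Rename path to new_stem plus its extension; return the new path or None."""
    target = unique_target(path, new_stem)
    if target is not None:
        path.rename(target)
    return target
```

Here path is expected to be a pathlib.Path, for example one of the entries from Path(".").iterdir(). Running the script a second time over the same directory should then leave the already renamed files alone, because unique_target() returns None for them.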

That is the end of this part; next time I will do some serious coding to bring this together.