Thursday, 1 August 2019

Python Scripting [4D]: Exif Based Image Renaming 4

After testing the Exif renaming script for the past couple of weeks, I have added a piece of code that checks for collisions and renames any file whose name collides with an existing filename. With Exif data, the most likely cause of a filename collision is a camera set to shoot continuously, taking several pictures a second. If the camera doesn't set the sub-second time field, the Exif string for the time the photo was taken will be identical for several photos, and filenames based on that string will be identical too. Another possibility is that you own two cameras of the same type and let someone else use the second one while you are using the first (for example, when two of you are covering a big event), so duplicates can arise that way as well.
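To illustrate the burst-shooting case, here is a minimal sketch. The helper name exif_name and the filename pattern are made up for this example and are not necessarily what my script uses; the only thing taken as given is that Exif stores the capture time as a string like "2019:08:01 10:15:30", with the sub-second part in a separate optional field.

```python
from datetime import datetime

def exif_name(date_time_original, subsec=None, ext=".jpg"):
    # Hypothetical helper: turn an Exif timestamp into a filename.
    stamp = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    name = stamp.strftime("%Y%m%d_%H%M%S")
    if subsec:
        # Sub-second digits, only present if the camera records them
        name += "-" + subsec
    return name + ext

# Two frames shot within the same second, no sub-second field recorded
burst = ["2019:08:01 10:15:30", "2019:08:01 10:15:30"]
names = [exif_name(t) for t in burst]
print(names[0] == names[1])  # True: both frames get the same filename

# With sub-second values the two names differ
a = exif_name("2019:08:01 10:15:30", "12")
b = exif_name("2019:08:01 10:15:30", "47")
print(a != b)  # True
```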

When I used to use IrfanView, it had some sort of collision handling built in, based on the one in Windows Explorer, which, as some will be aware, adds an extra number in brackets to the end of a filename when a copied file collides with one already at the destination. Apart from the fact that on Linux we obviously don't have Windows Explorer, this feature couldn't be used for Exif string renaming, so I had to rename colliding filenames manually. A few older directories in my photo archive do contain duplicates. These came about when files were copied somewhere else and then copied back to the original folder: the first copy operation hit a collision, so a second copy operation was run using IrfanView's ability to add a sequence number, which is why the duplicate has an extra string of digits on the end of its filename.

So for this script I planned to add filename collision handling to deal with these possibilities. This has consisted of writing a function and calling it from the main script, because it is needed in two places. Functions are fairly easy to write in Python and, as in any other programming language, they are useful for more than just reusing a block of code. There is in fact a strong case for writing all of your code in functions, so that the main execution block consists of just a series of function calls and is very neat and tidy. I haven't done this yet with any of my scripts, but I will start with the next project.

The function is fairly straightforward and looks like this:

def fixCollision(fn): # handle filename collisions. fn is full filename with path
    p = os.path.splitext(fn)
    f = p[0]
    e = p[1]
    x = 1
    ff = fn
    while os.path.exists(ff):
        ff = f + "-" + str(x) + e
        x += 1
    return ff  

The first line is the function definition and an explanatory comment. The next lines split the filename into its base and extension and set the initial value of the collision counter x. We then set ff, the complete filename string tested in the while loop, to the filename that was passed into the function, and enter the loop, where the while clause checks whether that filename exists. If it does, the loop body makes a new filename by inserting a dash and the value of x between the base and extension, increments x for the next time around, and goes back to the top of the loop. As soon as a filename is found that doesn't collide (which may be the filename that was passed in), the loop exits and the function returns that filename to the calling code.
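To see the behaviour described above in action, here is a small self-contained test that repeats the function and calls it three times against a scratch directory. The filename photo.jpg and the temporary directory are just assumptions for the demonstration, not part of the real script.

```python
import os
import tempfile

def fixCollision(fn):  # same logic as the function shown above
    p = os.path.splitext(fn)
    f = p[0]
    e = p[1]
    x = 1
    ff = fn
    while os.path.exists(ff):
        ff = f + "-" + str(x) + e
        x += 1
    return ff

tmp = tempfile.mkdtemp()                 # scratch directory for the demo
target = os.path.join(tmp, "photo.jpg")  # hypothetical filename

results = []
for _ in range(3):
    free = fixCollision(target)
    open(free, "w").close()              # create an empty file so the name is taken
    results.append(os.path.basename(free))

print(results)  # ['photo.jpg', 'photo-1.jpg', 'photo-2.jpg']
```

The first call returns the name unchanged because nothing exists yet; each later call walks the counter up until it finds a free name.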

The function is called in two places within the script (the code blocks that handle images and non-images) as follows:

                    destFile = os.path.normpath(destPath + "/" + destName)
                    if not os.path.exists(destPath):
                        print("Create destination path " + destPath)
                        os.mkdir(destPath)
                    destFile = fixCollision(destFile)

This code block comes just before the move operation, which moves the source file to its destination and renames it at the same time. The five lines in the block build the destination file path from the new directory and the new Exif-based filename (or a name produced by a different process for non-Exif files), check for and if necessary create the destination directory, and call the collision resolution function. The result is that the destination filename is unique within the destination folder.
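Putting the pieces together, the sequence of building the path, creating the directory, resolving collisions and then moving the file could be sketched roughly as below. The wrapper moveWithRename and the use of shutil.move are assumptions made for this illustration; the real script does the move inline and may do it differently.

```python
import os
import shutil
import tempfile

def fixCollision(fn):  # same logic as the function shown earlier
    p = os.path.splitext(fn)
    f = p[0]
    e = p[1]
    x = 1
    ff = fn
    while os.path.exists(ff):
        ff = f + "-" + str(x) + e
        x += 1
    return ff

def moveWithRename(srcFile, destPath, destName):
    # Hypothetical wrapper around the steps described above.
    destFile = os.path.normpath(destPath + "/" + destName)
    if not os.path.exists(destPath):
        print("Create destination path " + destPath)
        os.mkdir(destPath)
    destFile = fixCollision(destFile)   # pick a name that is not yet taken
    shutil.move(srcFile, destFile)      # move and rename in one step
    return destFile

# Demo: two source files that would both get the same Exif-based name
tmp = tempfile.mkdtemp()
destPath = os.path.join(tmp, "sorted")  # hypothetical destination folder
moved = []
for srcName in ["IMG_0001.jpg", "IMG_0002.jpg"]:
    src = os.path.join(tmp, srcName)
    open(src, "w").close()
    moved.append(os.path.basename(
        moveWithRename(src, destPath, "20190801_101530.jpg")))

print(moved)  # ['20190801_101530.jpg', '20190801_101530-1.jpg']
```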

The only other issue that has come up with the script concerns the file permissions on the originals off the camera. As far as the computer and I are concerned, I only have read permission on the photos, and the script leaves that unchanged. I am currently considering whether the script should change the permissions on each file as it is renamed. However, there is an advantage in preventing files in the Photos folder from being changed, since they should be copied before any alterations are made, so I may just leave things as they are.
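If I did decide to change the permissions, one way it might be done is sketched below. This is only an illustration: the helper addOwnerWrite is hypothetical, and it assumes that the files are owned by me and that adding owner write permission is all that is wanted.

```python
import os
import stat
import tempfile

def addOwnerWrite(path):
    # Hypothetical helper: keep the existing permission bits, add owner write.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)

# Demo on a scratch file made read-only for its owner
fd, path = tempfile.mkstemp(suffix=".jpg")
os.close(fd)
os.chmod(path, stat.S_IRUSR)

addOwnerWrite(path)
writable = bool(os.stat(path).st_mode & stat.S_IWUSR)
print(writable)  # True

os.remove(path)  # clean up the scratch file
```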