Sunday 24 March 2019

NZ Rail Maps: Two different ways to cover a large area in Gimp [1]

In my last post I compared a couple of large Gimp projects I worked on. Both of these cover a significant distance and they use different ways of doing it. It is illustrative to look at those two different ways and consider if one is better than the other.

The first way, used for the Dunedin-Mosgiel project with more than 130 layers, is to stack every section over the same canvas. The canvas is 7x7 tiles (4800Wx7200H), the line is divided into sections based around railway stations, and each section fills that same 7x7 canvas, so the sections are all layered over each other. This is reasonably efficient in space usage, because laying the tiles out linearly would need a very large canvas with many wasted tiles: the rail corridor is a narrow strip, but it twists and turns around. Currently the project uses 22.4 GiB on disk and around 36 GiB of virtual memory when loaded.

The second way, one large continuous canvas, is best illustrated by a Gimp project I drew of Lake Dunstan from Cromwell up to Queensberry at the head of the lake. A continuous canvas is logical here because the lake runs more or less in a straight line, so not much excess width is needed. The distance covered is 17 km, practically the same as the Dunedin-Mosgiel project. This project has 75 layers on an 8x9 canvas; it currently occupies 11.5 GiB of disk space and uses 26.7 GiB of virtual memory when loaded. All of the canvas is covered.

A number of factors affect project resource usage, so I won't go into much detail, but I expected the multi-layered type of project to become more efficient as the corridor section gets longer. Somewhat to my surprise, there is no clear advantage either way at this point. It would be interesting to try a long corridor section, and I am planning to redo some of the Lyttelton-Rolleston route as a continuous-section project just to see how that works.

The key disadvantage of the multi-layer approach is that overlaps between sections are harder to handle, simply because the aerial photos do not follow the same boundaries as the map tiles, so the overlap has to be staggered between two sections by duplicating the border layers. I had the same issue when I used to do a single station per file, as when I started off the Christchurch suburban mosaics. It is actually quite straightforward to duplicate border sections in Gimp and therefore make the sections overlap properly.

Saturday 23 March 2019

Webcam with Linux [1]

We need to do some livestreaming on Facebook Live for a project I am working on. Using a phone is quick and convenient, but pretty limited unless it is a selfie. So the next option is a webcam connected to a laptop by a 5 metre cable, which lets me put the camera just where I want. It also gives me the option of bringing in sound from another source and mixing it into the livestream.

I will be streaming to a secret Facebook group in order to test the ability of the webcam and computer combination. The computer is running Debian Buster and LXQt and will use the Google Chrome browser to run the livestream session.

So watch this space as I will report back. Currently the webcam is a Microsoft Lifecam which I happen to have lying around, but I am planning to get a Logitech C270 and do some testing with it in a couple of weeks. Right now I am hoping to test the Lifecam at the site tomorrow if I can get my act together and finish installing the laptop as well as making a pole and clamp to be able to raise the Lifecam up high for a good view. The third thing is to see if I can come up with some sort of microphone to pick up the sound as this webcam doesn't have a mic of its own.

I hope to start writing part 2 later tonight when I have finished testing with my computer, and blog after I have tested it at site in a couple of days from now.

NZ Rail Maps: Gimp optimisation with SSD works well

A few posts back under the NZ Rail Maps label, I commented on how to configure Gimp to use the computer's swap partition effectively. Using a computer with only 8 GB of RAM, I have since been able to show that it can use the SSD to load much bigger images without much of a performance hit. This validates my idea of having SSD-based swap to make Gimp perform well with very large projects that contain a lot of aerial photography data, because this is how the mosaics for NZ Rail Maps get made.

The reason a large SSD based swap partition is becoming so important is that it speeds up the process of creating map mosaics from historical aerial photography. It also works well with Qgis when I am loading these large numbers of aerial photo layers into the GIS to draw the maps. However it is mainly Gimp that benefits, especially as I am taking advantage of the optimisation of resources to be able to create projects of increasing size.

Generally these large projects will cover a relatively large area. For example my largest ever project, currently around 24 GB, has 128 layers at present and this covers around 17 km of physical track distance (Dunedin to Mosgiel) with aerial layers present for the entire distance as well as other areas like the Fernhill and Walton Park branches and the first section (to Mosgiel Township) of the Outram Branch. There are still more historical aerial images to add to this map and so far the computer is coping quite well with working on it with about half the SSD in use for swap. So I expect this project will be able to grow quite a bit more yet.

Compare that with the previous largest project on this computer, around 12 GB, edited without optimising Gimp (although it's possible the computer had less RAM then). The issue is simply that if you leave the tile cache setting at its default, the Linux swap partition will not get used at all. Instead, Gimp uses its own swap space, which on my system is in the home volume. This means a lot of noisy and slow hard disk swapping.
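For reference, the relevant setting is in Gimp's Preferences under System Resources (Tile cache size), or it can be set directly in the gimprc file. The path and value below are assumptions for a Gimp 2.10 setup and should be adjusted to suit the machine's RAM:

```
# in ~/.config/GIMP/2.10/gimprc (assumed path for Gimp 2.10)
(tile-cache-size 12g)
```

Setting the tile cache close to physical RAM keeps Gimp out of its own disk-based swap file and lets the kernel's swap partition take the overflow instead.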

This essentially builds a case for buying a larger SSD specifically for swap use. At the moment there is one SSD in the system, mainly used for the root partition and some other OS stuff, with most of it given over to a swap partition of about 100 GB. A 240 GB SSD would provide a much larger amount of swap space. The other option is to buy a 480 GB SSD for the NUC and move its existing disk into this computer to be used as a 240 GB swap volume. Either way the cost of an SSD is around $80-120, so it is on the list of priorities to hopefully achieve soon.

Buster challenges

There have been a few issues I have found using Debian Buster so far. Among them
  • Command shells have a much more limited set of commands that will work all the time. For example, blkid is the latest command I have found that will not work in a Bash shell window under KDE (Konsole). dpkg and others also have problems. For this reason I am spending more time switching to a virtual terminal session (Ctrl-Alt-F1 through F6 open terminal sessions; Ctrl-Alt-F7 switches back to KDE) to carry out tasks that will not work in bash, even as root.
  • btrfs seems to be having problems. I was using it on my backup disks so that I could compress the data. It looked like it was going to work until I tried running a backup, at which point the read-write backup disk that was mounted suddenly changed into a read-only filesystem. At that point I wiped the disk and set it up as an ext4 partition, and now rdiff-backup is working fine.
  • Installing openssh-server on a backup target was problematic but after rebooting the machine it started working properly so backups using mediapc to run the backup process off a remote target seem to be OK now. I am doing my first backups in several months.
But most things seem to work OK, and there are always limits on what you can do in a shell window anyway. This is the reason I went to a lot of trouble to set up root login in KDE on some computers. I haven't done that on any of the computers upgraded to Buster and the latest version of KDE, because I can switch to a terminal session to do anything that can't be done in a shell window.
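One possible cause worth checking (an assumption on my part, not something I have confirmed on these machines) is that commands like blkid live in /sbin or /usr/sbin, which may not be on a desktop user's PATH; in that case extending the PATH is enough, without switching to a terminal session:

```shell
# blkid normally lives in /sbin; a desktop user's shell may not have that on PATH
export PATH="$PATH:/sbin:/usr/sbin"
# now see whether the command can be found
command -v blkid || echo "blkid really is missing"
```

If that works, the export line can go in ~/.bashrc to make it permanent.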

Monday 18 March 2019

Python Scripting [4B]: Exif Based Image Renaming 2

Today I am going to have a look at Pillow and its interface for reading EXIF data from images. As I mentioned previously, Pillow is a fork of PIL (the Python Imaging Library) and contains capabilities to read and write EXIF data from images.

Although there are official documents for Pillow at Readthedocs, it is not very good at describing the objects and methods. I am attempting to pull together a description of the way things work in this post. I used sample code from https://developer.here.com/blog/getting-started-with-geocoding-exif-image-metadata-in-python3 in this post, which is possibly a reasonably comprehensive listing of how to access the data.

from PIL import Image

def get_exif(filename):
    image = Image.open(filename)
    image.verify()           # check the file is not corrupted
    return image._getexif()  # dictionary keyed by numeric EXIF tag

exif = get_exif('image.jpg')

Essentially you invoke the Image object to open and read an image, then call its _getexif() method to retrieve the EXIF data. What is returned is a dictionary containing all the EXIF tags. A dictionary is a specific data type in Python that holds key:value pairs; here the key in each case is the numerical index of the tag, so you can look up a specific tag by passing its numerical index. The page listed above goes on to document other ways to access the data. Because I can get the list of tag numbers from another page, and in fact already know the ones I passed to IrfanView, I am just going to use the following in my code:

exif[tag_number]

which will give me the values I need.

Listing what I used from the previous post, the decimal tag codes used were as follows:
  • 36867 - DateTimeOriginal (the date-taken tag; the EXIF standard's DateTimeDigitized is actually 36868). This appears to be returned as a string in the format YYYY:MM:DD HH:MM:SS
  • 37520 - SubSecTime. This appears to be returned as a numerical value.
  • 272 - Model. Appears to be a string.
There is a bit of manipulation of these values needed for my purposes. The date-time data needs to be turned into two strings, the date followed by the time, with no extra characters and a space between the two; then the SubSecTime as 3 characters, then a space, then the model name, and then the original extension.

In other words the file name should be changed to the format YYYYMMDD HHMMSSmmm <camera name>, followed by the original extension, where mmm is the SubSecTime padded to three digits (milliseconds).

The other considerations are as follows:
  • The script needs to consider whether it just scans for JPEG files or all files in a directory.
  • It will be run in directories where files have already been renamed so it needs to be able to skip the ones that have already been renamed, or avoid renaming where the code produces a filename that the file already has.
  • If it looks at all files, it needs to handle files that don't have EXIF data in them. I experimentally pointed the script at a movie file in the same directory as my sample image; attempting to read it raised an IOError in Python. Another possible outcome is an empty dictionary object. Therefore I need to determine how the script will handle these cases.
  • If a file doesn't have exif data we need to decide whether to rename it to something. For example a movie file could possibly be renamed based on the file date/time data rather than exif date/time.
  • One of the things I would like this script to be able to do is to handle filename collisions. The generic way to handle a filename collision is to generate something to add on to the end of a new file name such as a sequence number. This is why I use the SubSecTime value to deal with shooting multiple images with the camera's drive mode setting on Continuous, where it can take a number of images within milliseconds of each other. Mainly a file collision detection will be needed where the SubSecTime is not set, as is the case on older cameras that I have owned.
I expect all of these considerations will be easy to implement but will require quite a bit of code of course.
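As a sketch of how the renaming format described above might be assembled (the function name, padding choice and sample tag values here are hypothetical, based on the tag formats listed earlier):

```python
import os

def build_name(exif, original_name):
    """Build 'YYYYMMDD HHMMSSmmm <model><ext>' from a dict keyed by
    numeric EXIF tag, as returned by Pillow's _getexif()."""
    date_part, time_part = exif[36867].split(" ")         # 'YYYY:MM:DD HH:MM:SS'
    date_str = date_part.replace(":", "")                 # 'YYYYMMDD'
    time_str = time_part.replace(":", "")                 # 'HHMMSS'
    subsec = str(exif.get(37520, "0")).ljust(3, "0")[:3]  # pad to 3 characters
    model = str(exif.get(272, "unknown")).strip()
    ext = os.path.splitext(original_name)[1]              # keep original extension
    return "%s %s%s %s%s" % (date_str, time_str, subsec, model, ext)

# hypothetical tag values in the shapes the tags are returned in
sample = {36867: "2019:03:08 11:23:29", 37520: 87, 272: "Canon EOS M100"}
print(build_name(sample, "IMG_2864.JPG"))
```

The collision handling and the no-EXIF fallback would wrap around a function like this rather than live inside it.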

So that is the end of this part, so next time I will do some serious coding to bring this together.




Sunday 17 March 2019

Python Scripting [4A]: Exif Based Image Renaming

This is now a new scripting project I am starting for Python. I need this script to rename all my photos off the camera, replacing the use of IrfanView which I used on my Windows 10 computer. Whilst I still have that computer and the software, I am looking to do something with Python scripting under Linux to achieve the same outcome automatically.

First thing is to look at the Exif string we use under IrfanView. 
This is the current string:
  • $E36867(%Y%m%d %H%M%S)$E37520 $E272$O
That is obviously specific to IrfanView, in that it incorporates parameters defined in their software design. All parameters start with a $. The breakdown is:
  • $E36867 - DateTimeOriginal (the date-taken tag)
  • (%Y%m%d %H%M%S) when following a parameter means to extract the year, month, day, hour, minute and second out of the parameter
  • $E37520 - SubSecTime
  • $E272 - Model
  • $O - original extension of the filename including the period.
When put together, that renames a file to a name based on the date, time and the name of the camera. SubSecTime is something not all cameras support; it is basically the millisecond component of the time. My current camera provides it, but not all of my cameras have been able to.

Then we have to translate these into the corresponding tags in the EXIF standard. After some investigation, I found a list of tags on the Internet, listing tag codes that correspond to the above without the $E prefix. The codes are in decimal, so every one of them can be looked up in the list, and it turns out they are all part of the standard rather than manufacturer specific.

The second thing is to look at Exif read support for Python. This is generally implemented using a third party library. To install these libraries, a tool called pip is available. Once I installed that, I was able to install a library called exifread. To edit my script I am using KDevelop, which is the KDE supplied tool for development support, and it recognises Python out of the box.

Looking at exifread, I can start with a sample script that they provide:

import exifread

# open the sample image in binary mode and dump all of its EXIF tags
f = open("/home/xxx/Media/Pictures/Photos/2019/142_0803/IMG_2864.JPG", 'rb')
tags = exifread.process_file(f, details=False)
for tag in tags.keys():
    print("Key: %s, value %s" % (tag, tags[tag]))

This is a pretty simple example, hardcoded with the path to a specific image file I am using as an example. It just gets the tags for that file and outputs them. The exifread.process_file call is passed details=False, which omits bulky tags such as the thumbnail binary blob and the manufacturer-specific MakerNotes. The output looks like this:

Key: EXIF ApertureValue, value 29/8
Key: Image ExifOffset, value 388
Key: EXIF ComponentsConfiguration, value YCbCr
Key: EXIF CustomRendered, value Normal
Key: EXIF FlashPixVersion, value 0100
Key: EXIF RecommendedExposureIndex, value 250
Key: Image DateTime, value 2019:03:08 11:23:29
Key: EXIF ShutterSpeedValue, value 53/16
Key: EXIF ColorSpace, value sRGB
Key: EXIF MeteringMode, value Pattern
Key: EXIF ExifVersion, value 0230
Key: EXIF LensSpecification, value [15, 45, 0, 0]
Key: EXIF ISOSpeedRatings, value 250
Key: Thumbnail YResolution, value 180
Key: EXIF SubSecTime, value 87
Key: Interoperability InteroperabilityVersion, value [48, 49, 48, 48]
Key: Image Model, value <deleted>
Key: Image Orientation, value Horizontal (normal)
Key: EXIF DateTimeOriginal, value 2019:03:08 11:23:29
Key: Image YCbCrPositioning, value Co-sited
Key: EXIF InteroperabilityOffset, value 15836
Key: Thumbnail JPEGInterchangeFormat, value 20468
Key: Interoperability RelatedImageWidth, value 6000
Key: EXIF FNumber, value 7/2
Key: EXIF FileSource, value Digital Camera
Key: EXIF ExifImageLength, value 4000
Key: Image ResolutionUnit, value Pixels/Inch
Key: GPS GPSVersionID, value [2, 3, 0, 0]
Key: EXIF CompressedBitsPerPixel, value 3
Key: Thumbnail XResolution, value 180
Key: EXIF LensSerialNumber, value 000006a8dd
Key: EXIF ExposureProgram, value Program Normal
Key: Image GPSInfo, value 16078
Key: EXIF BodySerialNumber, value 495050000006
Key: Image Copyright, value
Key: Thumbnail JPEGInterchangeFormatLength, value 5322
Key: EXIF Flash, value Flash did not fire, compulsory flash mode
Key: Thumbnail Compression, value JPEG (old-style)
Key: EXIF ExposureMode, value Auto Exposure
Key: EXIF FocalPlaneYResolution, value 2000000/293
Key: EXIF FocalPlaneXResolution, value 2000000/293
Key: EXIF ExifImageWidth, value 6000
Key: Image Artist, value
Key: EXIF SceneCaptureType, value Standard
Key: EXIF SensitivityType, value Recommended Exposure Index
Key: Interoperability RelatedImageLength, value 4000
Key: Image ImageDescription, value                               
Key: EXIF DigitalZoomRatio, value 1
Key: EXIF SubSecTimeOriginal, value 87
Key: EXIF LensModel, value EF-M15-45mm f/3.5-6.3 IS STM
Key: EXIF DateTimeDigitized, value 2019:03:08 11:23:29
Key: EXIF FocalLength, value 15
Key: EXIF ExposureTime, value 1/10
Key: Image XResolution, value 180
Key: Image Make, value Canon
Key: EXIF WhiteBalance, value Manual
Key: Thumbnail ResolutionUnit, value Pixels/Inch
Key: Image YResolution, value 180
Key: EXIF FocalPlaneResolutionUnit, value 2
Key: Interoperability InteroperabilityIndex, value R98
Key: EXIF ExposureBiasValue, value 0
Key: EXIF SensingMethod, value One-chip color area
Key: EXIF SubSecTimeDigitized, value 87

It looks like most of what I am interested in is in those tags. The exact tag name has to be matched in a search over the keys to get a value, and then the value transcribed from a string into the data that gets fed into a rename algorithm. In practice this will be a call to a file move function, because a rename is the same operation as a move in Linux.
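The transcription step might look something like this (a sketch; the plain dict here stands in for the tags object exifread returns, whose values stringify the same way, and the function name is my own):

```python
from datetime import datetime

def extract_rename_fields(tags):
    """Pull the date/time, sub-second and model values out of a mapping
    keyed by 'IFD TagName' strings, as exifread.process_file() produces."""
    dt = datetime.strptime(str(tags["EXIF DateTimeOriginal"]), "%Y:%m:%d %H:%M:%S")
    subsec = str(tags.get("EXIF SubSecTimeOriginal", "0"))
    model = str(tags.get("Image Model", "unknown"))
    return dt, subsec, model

# stand-in values copied from the tag dump above
sample = {"EXIF DateTimeOriginal": "2019:03:08 11:23:29",
          "EXIF SubSecTimeOriginal": "87",
          "Image Model": "Canon"}
dt, subsec, model = extract_rename_fields(sample)
print(dt.strftime("%Y%m%d %H%M%S"), subsec, model)
```

Once parsed into a datetime, formatting the new filename is just a strftime call plus string concatenation.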

There are other libraries that do exif stuff. Some examples I found are:
  • piexif
  • exif
  • Python Imaging Library (PIL)
  • pyexiv2
Out of these examples I chose to evaluate PIL (via its Pillow fork) as well. Next time around I will decide which of the two libraries (Pillow or exifread) will be more useful for my project.






Saturday 16 March 2019

Gimp resource usage challenges

Whilst Gimp has been great for the NZ Rail Maps mosaics, one of its challenges has been the layer group feature. This feature unfortunately does a lot more than just organise layers in the layer list; it apparently adds extra data you may not need.

In my experience layer groups add significant resource usage: in one particular 16 GB project they added about 3 GB of file size for no apparent reason, and when doing a layer crop and export they massively increased the time taken to crop, export and uncrop. File save operations have also been very slow.

I now use layer groups only for very basic operations, mainly the grids used for layer segmenting, because I can duplicate the grid layers with one click: you can't select multiple layers for duplicating, but you can select a layer group and duplicate that.

It is a great pity, as some of these mosaic projects have dozens of layers and I just have to put up with scrolling through a massive layer list to get to the ones I want. Whatever layer groups were designed to achieve, it seems it was not simply reorganising a project with a large number of layers.

Tuesday 5 March 2019

Autologin with LXQt

Autologin isn't something I normally use, but on my computers that run LXQt, given what they are used for, it gets the machine up faster, so it is useful. If you are using LXQt with SDDM, the default display manager, it is pretty easy to set up.

Simply create or edit /etc/sddm.conf.d/autologin.conf and enter the following text:
[Autologin]
User=xxx
Session=lxqt.desktop

The next time you reboot the system will automatically log in.

I previously had the desktop now running LXQt set up for single-user login (password only, no user name) using a custom greeter with lightdm, when it had the XFCE desktop environment. That custom greeter was disabled when I upgraded to Buster, so the LXQt autologin capability, which is simpler to set up than lightdm's, is the next best thing. LXQt actually makes some of these automation things easier to set up than some DEs do. I may end up doing a startup display customisation on this computer to work around a small issue with the monitor setup; if so, that will be very easy to do with LXQt. I am already using autostart on the laptop to run x11vnc, and having autologin as well will eliminate the need to log in manually before being able to use it.

Monday 4 March 2019

Life with Buster

Well, I rushed to get Buster onto most of my computers yesterday, and this was mostly straightforward. The exception, of course, was mainpc, which ran out of disk space while running the apt upgrade command, due to the number of new packages downloaded and installed. Depending on what is installed, an upgrade can need literally thousands of packages; in this case about 1500, so there was not enough space on a 12 GB system partition. There was no issue with the /tmp partition being full; the space needed was on /. That would have been compounded by running apt dist-upgrade, which installs about 800 more packages.

Since a half updated system would end up with potentially a lot of problems, I chose at that point to reinstall mainpc from scratch with the Buster alpha5 netinst installer on a pen drive. Balena Etcher has been a bit flaky on KDE, having issues with the privilege escalation dialog, but the latest version worked just fine on serverpc, and I soon had the installer running. I chose at that point to repartition the SSD and give mainpc more space for the root partition (back to 24 GB) while also maintaining the 8 GB /tmp partition, the 100 MB /mnt partition, and the rest (88 GB) for a smaller swap partition. Of course I hope sooner rather than later to have a second SSD in the system to make a really big swap partition.  Right now I am writing this on mainpc after having done the usual stuff to bring it up, with as per usual /home on the RAID array being unaffected by the reinstall and therefore instantly restorable to get everything back to where it was before, with the added advantage that Firefox Developer and Thunderbird don't need to be reinstalled. Naturally I do have to reinstall a lot of other stuff but that will just happen over the coming week or so. Qgis is getting reinstalled at version 3.6 which requires Buster.

Some other good or useful things about Buster so far: it makes hibernation work on mediapc, which is a huge advantage in being able to come back to where I previously was on these computers. The key issue for hibernation is that you can't easily hibernate with a very large amount of swap in use, because it will take a long time and there may not be room on the swap partition for the hibernation image. So hibernating serverpc with Gimp open on a large file isn't really a goer. I think hibernation will be more useful on whichever of serverpc or mediapc has things open like Google Earth, script windows, and images or diagrams I am using as source. With maps you can end up with a lot of source documents open, and until now mainpc has been displaying all of these, being the only computer that supported hibernation; now I can use other PCs to display them.

Another thing is that LXQt running on pc4 has all these HDMI sound options available, and while I don't use HDMI for sound on that computer, it does bring forth the possibility of being able to use HDMI sound on the NUC and therefore eliminate the audio switch box and the TV will automatically switch the sound input - I am still thinking through this one, but it looks like it's worth pursuing.

KDE is updated to the latest version (Plasma 5.14) on Buster, which is supposedly optimised for better resource efficiency. Keeping pc4 on LXQt is more to do with things like MTP working better without KDE (KDE has its own MTP implementation, which was buggy on Stretch). That may be resolved in KDE on Buster, but I am not going to look into it just yet. Since pc4 has plenty of RAM, running KDE on it wouldn't really be an issue. LXQt is offered as an install option for Buster (alongside LXDE and the other usual options), so it can be installed when setting up a system from scratch. So far my experience of LXQt on Buster is very good (as I noted yesterday, it is improved from the version on Stretch) and I expect this computer will be easy to use. There has been one issue with not being able to reassign the second display as primary, which I resolved by swapping the cables between displays.

Right now I have held back serverpc from upgrading because of an issue that has shown up with Qgis; I will need to keep using Qgis on Stretch for a while until the Qgis people fix the problem with their software. I still want serverpc on Buster as soon as possible, because being able to hibernate serverpc would be a great blessing to the maps project (see above). In the meantime I will use mediapc, with its ability to hibernate, for as much of the work that benefits from being saved across sessions; rearranging the desk so I can reach mediapc's mouse more easily is underway over the next few days.

So that's it for now. As I noted yesterday, the release of Buster is still a few months away and officially it is "alpha5" at the moment. Some of the earlier alphas had problems, but with hard freeze just a few days away, and good experiences to date, this alpha is clearly stable enough. We can be fairly confident in bringing up Buster for use as soon as it is released because debian is famous for the stability of their releases.

Sunday 3 March 2019

Debian Buster Release due soon

Debian Buster (Debian 10) has been in development for the last few years, and it is expected to be released about the middle of this year.

As was described in some earlier posts, I have had Buster running on some of my systems in the past, but not in the last year or so. However I am now slowly upgrading my systems one by one to it. First up is pc4, and from there, mediapc or the bedroom pc will probably be next. This depends on the result for pc4. Since Qgis releases above 3.4 now require Buster, there is an impetus for me to want to move to Buster as soon as it is available, which is the reason I have started evaluating the impending release. The release cycle hard freeze is currently about a week away, so it is getting nearer to the release being ready.

mainpc and serverpc won't be upgraded until the release is out, and probably serverpc before mainpc is the order in which the upgrade will happen. Upgrade in all cases is by using the inbuilt upgrade system, which is fairly simple to use.
  • Switch to a command window (ctrl-alt-F1) and log in as root. 
  • Change /etc/apt/sources.list package sources from stretch to buster and save. 
  • Then type in apt update and wait for the packages list to update.
  • Type in apt upgrade to install the list of new packages (over 1000 for the first computer I upgraded). 
  • Once it has finished that process then type apt dist-upgrade to update from stretch to buster.
  • There will be a list of packages that are no longer needed so type in apt autoremove to discard them.
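The sources.list edit in the second step can also be done non-interactively with sed, demonstrated here on a scratch copy rather than the live file (the sample repository line is illustrative only):

```shell
# make a scratch copy standing in for /etc/apt/sources.list
printf 'deb http://deb.debian.org/debian stretch main\n' > /tmp/sources.list.demo
# swap every occurrence of stretch for buster
sed -i 's/stretch/buster/g' /tmp/sources.list.demo
cat /tmp/sources.list.demo
# on the real system, as root, the remaining steps would then be:
#   apt update && apt upgrade && apt dist-upgrade && apt autoremove
```

Check the file afterwards for any third-party entries that should not be renamed.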
I am currently expecting to have all my computers running Buster within a couple of days, as all the upgrades have gone smoothly provided the above steps are followed exactly. Also, pc4 has been upgraded from XFCE to LXQt, meaning none of my computers run XFCE now. LXQt on Buster looks better than the previous versions I evaluated on Stretch, probably because it is now version 0.14, up from 0.11.

Saturday 2 March 2019

Python Scripting [3F]: Layer Fractional Segments for NZ Rail Maps 6

Today's little bit of fun and games has been to change the script so it can read a list file and process a list of source layers, producing the world and auxiliary files for each. Last night I discovered some additional aerial photos were available for the OtiraNorth area, so I opened the existing xcf project in Gimp covering four areas with a total of 10 base tiles, added two more 0.4m base tiles rescaled to 0.1m, added the historical aerial photo, and as a result had six segments to export from the project. The result was a 17.3 GB Gimp file, which the computer handled OK with the extra swap space it now has. If I can get a bigger swap disk for it, it should be able to handle very large files in future.

Since there were three segments from each of the two base tiles, I could put the parameters for the two base tiles into a file, save it, and pass it to the script with a few modifications. These turned out to be rather more complex than I expected, due to the different data types involved.

Basically, what you get from sys.argv is not a string containing arguments; it is a list of arguments, and parse_args expects to be passed a list. But if you read from a file with readlines, you get a list of strings, and when you loop through them you have a single string to pass in, which isn't what parse_args expects. So you have to turn that string into a list, which you'd typically do by calling its split method.

So I have had to do some extra work to make sure parse_args is getting the list it expects to receive, because  otherwise it doesn't work as expected.
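The difference can be shown in a few lines (the tile names here are hypothetical; the option letters match the script below):

```python
# arguments from the command line arrive as a ready-made list, like sys.argv[1:]
args_from_cli = ["-b", "tile01", "-r", "tile02"]

# a line read from a file with readlines() is one string, newline included,
# so it must be stripped and split before parse_args can accept it
line = "-b tile01 -r tile02\n"
args_from_file = line.strip("\n").split(" ")

print(args_from_file == args_from_cli)
```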

Here is the first part of the script up to the point where it reads the world files, showing the extra code needed to handle the extra and different parsing:

# declarations
import argparse
import os
import shutil
import sys

rootPath = "/home/patrick/Sources/Segments/"

# set up command line argument parser
parser = argparse.ArgumentParser(prog='segments')
parser.add_argument('-l', '--listfile')
parser.add_argument('-b', '--base')
parser.add_argument('-r', '--right')
parser.add_argument('-d', '--down')
parser.add_argument('-c', '--counter', type=int, default=4)
parser.add_argument('-p', '--pixelsize', type=float, default=0.1)

# check first for list file and handle if found otherwise assume single line input
argList = sys.argv[1:]                              # drop the script name parameter
args = parser.parse_args(argList)
if args.listfile == None:                           # single line input from command line
    listData = [" ".join(argList)]
else:                                               # multi line input from file
    listDataFileName = rootPath + args.listfile
    listDataFile = open(listDataFileName, "r")
    listData = listDataFile.readlines()
    listDataFile.close()

for listLine in listData:

    # parse arguments
    listLine = listLine.strip("\n")
    listItems = listLine.split(" ")
    args = parser.parse_args(listItems)
    #save the parameters
    baseName = args.base
    rightName = args.right
    downName = args.down
    counter = args.counter
    pixelSize = args.pixelsize

The major differences in this script therefore are:
  • import sys is needed in order to use sys.argv, which holds the parameters typed on the command line.
  • The parser.add_argument calls are different: -b, -d and -r are no longer mandatory, and a new option, -l for the list file, has been added.
  • The next block parses the arguments to look for the list file parameter (-l). This time, instead of letting parse_args read sys.argv itself, I have saved the arguments into a list. We use sys.argv[1:] in order to drop the script name, which is actually passed in as the first element; parse_args itself uses the same slice to achieve the same thing by default.
  • If a list file name was passed in then we read the list file into a list of strings. Each string contains one set of parameters. Otherwise we create a list containing one set of parameters from the command line. This means we have to first turn the command line parameter list into a string, with spaces between each parameter, and then turn that string into a list, which in this case will only have one string in it.
  • We then enter a loop which runs through our list of parameter lines; each item gives us one string of parameters.
    • The first thing is to strip any newline character (readlines() leaves a newline at the end of each line it reads from the file).
    • Next is to split the string into a list whose items are the parameters themselves, using the split function with a space as input.
    • Then finally we can call parse_args with this list as input.
From there the rest of the script is the same as before.

So we have about 20 more lines of code to handle the differences: discovering whether a list file was specified, reading it if so, and handling the conversions needed between the different data formats.

I have tested both types of input and both worked as expected.

Friday 1 March 2019

Python Scripting [3E]: Layer Fractional Segments for NZ Rail Maps 5

I made a lot of progress on this on Wednesday, mainly due to dropping nearly everything else and pushing on to finish it. After testing it, which so far has worked well, I decided to add an extra step to the workflow. The source layer (base layer) is a jpg file and we are making a jgw file for each of its segments. It also has a .xml file and a .jpg.aux.xml file with it, which we want to copy automatically.

So the extra code for this is:
auxFileNameSource = rootPath + baseNameBase + ".jpg.aux.xml"
auxFileNameDest = rootPath + rootNameBase + ".jpg.aux.xml"
if os.path.isfile(auxFileNameSource):
    shutil.copyfile(auxFileNameSource,auxFileNameDest)
auxFileNameSource = rootPath + baseNameBase + ".xml"
auxFileNameDest = rootPath + rootNameBase + ".xml"
if os.path.isfile(auxFileNameSource):
    shutil.copyfile(auxFileNameSource,auxFileNameDest)

Again, testing this has worked out. The script has now been tested successfully with all 10 segments that I had produced. One small change near the top of the script reduces the amount of typing: the script now automatically adds the .jgw extension to the filenames typed on the command line, so we are now specifying only layer names, not layer file names. This meant changing another line further down in the script (now 82 lines) to eliminate splitting the extension off the input parameter. A few more lines now print out status messages, and comments and whitespace have been added. One thing I have done differently from my previous effort is to start camelCasing variable names. I don't know what convention the Python people recommend, but that's my personal preference, compared to underscores and the other styles people sometimes use.
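As a minimal sketch of that extension change (the tile name here is a made-up LINZ-style example): previously the full world-file name was typed and the script split the extension off; now only the layer name is typed and the script appends the extension itself.

```python
import os

rootPath = "/home/patrick/Sources/Segments/"   # path used in the script

# Old style: the full file name was typed and the extension split off.
typedOld = "BX24_500_092086.jgw"               # hypothetical tile name
layerName = os.path.splitext(typedOld)[0]

# New style: only the layer name is typed; the script appends ".jgw".
typedNew = "BX24_500_092086"
baseFileName = rootPath + typedNew + ".jgw"
print(layerName)
print(baseFileName)
```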

If I wanted to make the script even more useful I could have it work out the right and down layers automagically. That is probably possible, but I would have to code up LINZ's rules for naming sequences, which could be a lot more work, so for the moment I will leave the script here and just pass in those parameters, even though it takes a bit of effort to remember to type -b, -d and -r before the layer names.

The other idea I am having is to be able to feed in a parameter list from a file and have the script process all the source tiles in one run, for cases where more than one or two source tiles are ready to be processed together. If I modify my original script I can add a different command line parameter, -l or --layerlist, that reads a file whose lines are basically the command line parameters, and the script then runs through the whole file and processes the entire list. It turns out that argparse can process a list of arguments that you pass to the parse_args function call, instead of the command line, which it processes by default.
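A minimal illustration of that argparse behaviour (the argument values here are made up):

```python
import argparse

parser = argparse.ArgumentParser(prog='segments')
parser.add_argument('-b', '--base')
parser.add_argument('-c', '--counter', type=int, default=4)

# With no argument, parse_args() reads sys.argv[1:] itself; given an
# explicit list it parses that instead, so each line of a list file
# can be fed through the same parser.
args = parser.parse_args(['-b', 'BX24', '-c', '8'])
print(args.base, args.counter)
```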

Here is the full script as it is for now. I expect that I will pursue the idea of adding an option to feed in an input file, and will look at it next time I need to use the script, so there may be an additional part added to this series then.

import argparse
import os
import shutil

#parse command line parameters
parser = argparse.ArgumentParser(prog='segments')
parser.add_argument('-b', '--base', required=True)
parser.add_argument('-r', '--right', required=True)
parser.add_argument('-d', '--down', required=True)
parser.add_argument('-c', '--counter', type=int, default=4)
parser.add_argument('-p', '--pixelsize', type=float, default=0.1)
args = parser.parse_args()

#save the parameters
baseName = args.base
rightName = args.right
downName = args.down
counter = args.counter
pixelSize = args.pixelsize

#Input file names
rootPath = "/home/patrick/Sources/Segments/"
baseFileName = rootPath + baseName + ".jgw"
rightFileName = rootPath + rightName + ".jgw"
downFileName = rootPath + downName + ".jgw"

#read input files
baseFile = open(baseFileName, "r")
baseData = baseFile.readlines()
baseFile.close()
rightFile = open(rightFileName, "r")
rightData = rightFile.readlines()
rightFile.close()
downFile = open(downFileName, "r")
downData = downFile.readlines()
downFile.close()

#save input file data
baseX = float(baseData[4])
baseY = float(baseData[5])
baseSkewX = float(baseData[1])
baseSkewY = float(baseData[2])
rightX = float(rightData[4])
rightY = float(rightData[5])
downX = float(downData[4])
downY = float(downData[5])

#Main calculation and processing section
#Initialisation
for colNum in range(counter):
    for rowNum in range(counter):
        segmentX = (((rightX-baseX)/counter)*colNum)+baseX
        segmentY = (((downY-baseY)/counter)*rowNum)+baseY
        gridX = "x" + str(counter) + "." + str(colNum + 1)
        gridY = "x" + str(counter) + "." + str(rowNum + 1)
       
        #Generate segment tile filename
        baseNameBase = baseName
        baseNameSplit = baseNameBase.split("-")
        baseColDescriptor = baseNameSplit[0]
        baseRowDescriptor = baseNameSplit[1]
        gridRowDescriptor = baseRowDescriptor + gridY
        gridColDescriptor = baseColDescriptor + gridX
        gridFileName = gridColDescriptor + "-" + gridRowDescriptor + ".jpg"
        # Next line for debugging output only
        #print(gridX + " " + gridY + " : " + str(segmentX) + " , " + str(segmentY))
       
        #Look for segment tiles that match current grid position
        rootFilesList = os.listdir(rootPath)
        for rootFile in rootFilesList:
            if rootFile.endswith(gridFileName):
               
                #Generate world file name
                rootNameBase = os.path.splitext(rootFile)[0]
                segmentName = rootNameBase + ".jgw"
                segmentFileName = rootPath + segmentName
                print(rootFile + " -> " + segmentName)
               
                # Write world file
                segmentFile = open(segmentFileName, "w+")
                segmentFile.write(str(pixelSize) + "\n")
                segmentFile.write(str(baseSkewX) + "\n")
                segmentFile.write(str(baseSkewY) + "\n")
                segmentFile.write("-" + str(pixelSize) + "\n")
                segmentFile.write(str(segmentX) + "\n")
                segmentFile.write(str(segmentY) + "\n")
                segmentFile.close()
               
                # find xml files for base layer and copy to segment
                auxNameSource = baseNameBase + ".jpg.aux.xml"
                auxFileNameSource = rootPath + auxNameSource
                auxNameDest = rootNameBase + ".jpg.aux.xml"
                auxFileNameDest = rootPath + auxNameDest
                if os.path.isfile(auxFileNameSource):
                    print("   " + auxNameSource + " -> " + auxNameDest)
                    shutil.copyfile(auxFileNameSource,auxFileNameDest)
                auxNameSource = baseNameBase + ".xml"
                auxFileNameSource = rootPath + auxNameSource
                auxNameDest = rootNameBase + ".xml"
                auxFileNameDest = rootPath + auxNameDest
                if os.path.isfile(auxFileNameSource):
                    print("   " + auxNameSource + " -> " + auxNameDest)
                    shutil.copyfile(auxFileNameSource,auxFileNameDest)
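For reference, the six values the script writes follow the usual world-file (.jgw) layout; the coordinates below are made-up examples:

```
0.1
0.0
0.0
-0.1
1361600.0
4917600.0
```

Line 1 is the pixel size in the x direction (pixelSize), lines 2 and 3 are the skew terms (normally zero for north-up imagery), line 4 is the pixel size in the y direction (negative because y runs down the image), and lines 5 and 6 are the x and y coordinates of the upper-left pixel (segmentX and segmentY). This is the same layout the script reads indices 1, 2, 4 and 5 from near the top.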