I haven't thought about the filesize/hash to reduce collision, but I chose to stick with the last-modified time in the article, because it can takes hours computing hashes for a big directory tree.
Tools like rsync relies on last-modified time by default, and since I want to use this to track my own files, I won't fake it, so I think it's not a big deal?
It's not just that it could be faked, it could be an accident that you modify a file but the file modification date is not changed. For example, say you edit a photo, but later you run a script that sets the file's modification date to the EXIF data in the photo.
So I guess the point is that also including the file size will be one more (fast) data point to help ensure 'accurate' change tracking, without adding the overhead of computing content hashes.
I haven't thought about the filesize/hash to reduce collision, but I chose to stick with the last-modified time in the article, because it can takes hours computing hashes for a big directory tree.
Tools like rsync relies on last-modified time by default, and since I want to use this to track my own files, I won't fake it, so I think it's not a big deal?