A shell script that computes md5sum for the files in a folder (2022.10.9)

Summary
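I wrote a shell script that uses the md5sum command to compute the md5sum of every file in a folder automatically and stores the results in a _checksum folder at each level of the hierarchy. The _checksum folder is created automatically if it does not exist. No md5sum is generated for folders that contain only folders, or for hidden files. Files inside a _checksum folder are ignored.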

Installing the md5sum command on Mac

The Mac used here (macOS 12.1) does not include the md5sum command, so md5sha1sum has to be installed with brew.

$ brew update
$ brew upgrade
$ brew install md5sha1sum

Usage

% md5sum --version
Microbrew md5sum/sha1sum/ripemd160sum 0.9.5 (Wed Dec  6 12:48:56 EST 2006)
  Compiled Oct 23 2021 at 02:12:08
Written by Bulent Yilmaz

Copyright (C) 2004,2006 Microbrew Software
%
% md5sum --help   
Usage: md5sum [<option>] <file> [<file> [...] ]
       md5sum [<option>] --check <file>

Note:  These options are mostly compatible with GNU md5sum
       -s, -h, and -V are not available in GNU md5sum

 -b, --binary         Read files in binary mode
 -c, --check <file>   Check MD5 sums from <file>
 -t, --text           Read files in ASCII mode

 -s, --status         Silent mode: Use exit code to determine verification

 -h, --help           Display this help message and exit
 -V, --version        Display program version and exit

On Mac, the md5 command is apparently what is normally used instead. Its usage can be checked with "man md5".

% man md5

MD5(1)                       General Commands Manual                      MD5(1)

NAME
     md5 – calculate a message-digest fingerprint (checksum) for a file

SYNOPSIS
     md5 [-pqrtx] [-s string] [file ...]

DESCRIPTION
     The md5 utility takes as input a message of arbitrary length and produces
     as output a “fingerprint” or “message digest” of the input.  It is
     conjectured that it is computationally infeasible to produce two messages
     having the same message digest, or to produce any message having a given
     prespecified target message digest.  The MD5 algorithm is intended for
     digital signature applications, where a large file must be “compressed” in
     a secure manner before being encrypted with a private (secret) key under a
     public-key cryptosystem such as RSA.

     MD5's designer Ron Rivest has stated "md5 and sha1 are both clearly broken
     (in terms of collision-resistance)".  So MD5 should be avoided when
     creating new protocols, or implementing protocols with better options.
     SHA256 and SHA512 are better options as they have been more resilient to
     attacks (as of 2009).

     The following options may be used in any combination and must precede any
     files named on the command line.  The hexadecimal checksum of each file
     listed on the command line is printed after the options are processed.

     -s string
             Print a checksum of the given string.

     -p      Echo stdin to stdout and append the checksum to stdout.

     -q      Quiet mode - only the checksum is printed out.  Overrides the -r
             option.

     -r      Reverses the format of the output.  This helps with visual diffs.
             Does nothing when combined with the -ptx options.

     -t      Run a built-in time trial.

     -x      Run a built-in test script.

EXIT STATUS
     The md5 utility exits 0 on success, and 1 if at least one of the input
     files could not be read.

SEE ALSO
     cksum(1), CC_SHA256_Init(3), md5(3), ripemd(3), sha(3)

     R. Rivest, The MD5 Message-Digest Algorithm, RFC1321.

     Vlastimil Klima, Finding MD5 Collisions - a Toy For a Notebook, Cryptology
     ePrint Archive: Report 2005/075.

ACKNOWLEDGMENTS
     This program is placed in the public domain for free general use by RSA
     Data Security.

macOS 12.1                        June 6, 2004                        macOS 12.1
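
Going by the -r option described in the man page above, md5 can print the checksum before the file name, which is close to what md5sum prints. The following is a minimal sketch of a wrapper that uses md5sum when it is installed and falls back to md5 -r otherwise. The function name md5_line is hypothetical and not part of generate_md5sum.sh, and the spacing between checksum and file name may differ between the two tools.

```shell
#!/bin/bash
# Hypothetical helper (not part of generate_md5sum.sh):
# print the checksum followed by the file name, using whichever tool exists.
md5_line() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1"
    elif command -v md5 >/dev/null 2>&1; then
        # -r reverses the output so the checksum comes first
        md5 -r "$1"
    else
        echo "neither md5sum nor md5 was found" >&2
        return 1
    fi
}
```

Usage would be something like md5_line ./target_dir/sample.txt, where the path is only an example.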

Reference:

Applying md5sum to each file in a folder hierarchy

Usage

Make generate_md5sum.sh executable with chmod, then run the script with the folder that contains the files to checksum as its argument. If the md5sum command is not available, install it beforehand.

% chmod a+x ./generate_md5sum.sh
% ./generate_md5sum.sh ./target_dir
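
After a run, each processed file should have a companion _checksum/<name>.md5sum in the same folder. A minimal sketch of checking those files back with the --check option from the help output above is shown here. The function name verify_md5sum_dir is hypothetical, and the sketch assumes each .md5sum file records the file name relative to its folder, in the format that md5sum --check accepts.

```shell
#!/bin/bash
# Hypothetical companion helper: verify the checksums stored in <dir>/_checksum.
# Assumes each .md5sum file holds a file name relative to <dir>.
verify_md5sum_dir() {
    local dir="$1" sum_file status=0
    for sum_file in "$dir"/_checksum/*.md5sum; do
        # the glob stays literal when no .md5sum file exists
        [ -e "$sum_file" ] || continue
        # run the check from inside <dir> so the relative names resolve
        ( cd "$dir" && md5sum --check "_checksum/${sum_file##*/}" ) || status=1
    done
    return "$status"
}
```

Calling verify_md5sum_dir ./target_dir checks one folder only; subfolders would need one call each.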

Source code (generate_md5sum.sh)

Design points and open issues

#!/bin/bash
# ============================================
# Calculate md5sum in the target folder
#
# generate_md5sum.sh
# Coded by Noboru Harada (noboru@ieee.org)
#
# Changes:
# 2022/10/09 The first version
#
# Usage:
# > generate_md5sum.sh ./target_dir
#
# tested on Mac
# ============================================

if [ $# -lt 1 ]; then
    echo "USAGE: generate_md5sum.sh dir_path"
    exit 1
fi

dir_path="$1"
#echo "$dir_path"
# collect the target dir and its subfolders (up to 5 levels deep)
dirs=`find "$dir_path" -maxdepth 5 -type d`

if [ -z "$dirs" ]; then
    dirs="$1"
fi
#echo "$dirs"

# change IFS for filenames with white spaces
IFS_BACK="$IFS"
IFS=$'\n'

# dig the target dir
for dir in $dirs;
do
    echo "Processing DIR: $dir"
    dir_checksum=`echo "$dir/_checksum" | sed -e "s#//#/#g"`

    # list regular files directly under $dir whose names contain a dot
    files=`find "$dir" -maxdepth 1 -type f -name "*.*"`
    files_strip=`echo "$files" | sed -e "s#//#/#g"`

    if [ -d "$dir_checksum" ]; then
        echo "  $dir_checksum exists."
        echo "  Skip (case1): $dir_checksum"
    else
        # ignore _checksum folder for searching target
        dir_checksum_strip=`echo "$dir_checksum" | sed -e 's#.*_checksum/_checksum$##'`
        if [ -z "$dir_checksum_strip" ]; then
            echo "  Skip (case2): $dir_checksum"
            files=""
            files_strip=""
        else
            if [ -z "$files_strip" ]; then
                echo "  Only folder exists in $dir"
                echo "  Skip (case3): $dir_checksum"
            else
                echo "$dir_checksum does not exist."
                echo "mkdir $dir_checksum"
                mkdir "$dir_checksum"
            fi
        fi
    fi

    for file in $files_strip;
    do
        # strip dir names (remove the very last '/' and previous characters) 
        file_strip1=`echo "$file" | sed -e "s#^.*/##g"`
        # ignore invisible files (starting with .) and *.md5sum files
        file_strip2=`echo "$file_strip1" | sed -e 's#^\..*##' -e 's!.*md5sum$!!'`
        echo "$file_strip2"

        if [ -z "$file_strip2" ]; then
            echo "  Skip (case4): Don't process DIR, HIDDEN file or .md5sum: $file"
        else
            echo " Processing: $file"
            if [ -e "$file" ]; then
                sum_file=`echo "$dir/_checksum/$file_strip1.md5sum" | sed -e "s#//#/#g"`
                echo "  md5sum \"$file\" > \"$sum_file\""
                md5sum "$file" | sed -e "s#$dir/##g" > "$sum_file"
            fi
        fi
    done
done

IFS="$IFS_BACK"

## end
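
The script above switches IFS to a newline so that file names containing spaces survive word splitting. As an alternative sketch, not the approach the script takes, find can emit NUL-delimited names that bash reads back with read -d ''. The function name checksum_dir is hypothetical, it handles a single folder only, and the caller is assumed to skip folders that are themselves named _checksum.

```shell
#!/bin/bash
# Sketch of an alternative inner loop (not the original script's approach):
# write <dir>/_checksum/<name>.md5sum for each visible file in <dir>.
checksum_dir() {
    local dir="$1" file name
    while IFS= read -r -d '' file; do
        name="${file##*/}"                 # strip the directory part
        mkdir -p "$dir/_checksum"
        # run md5sum inside <dir> so only the bare file name is recorded
        ( cd "$dir" && md5sum "$name" ) > "$dir/_checksum/$name.md5sum"
    done < <(find "$dir" -maxdepth 1 -type f -name '*.*' \
                 ! -name '.*' ! -name '*.md5sum' -print0)
}
```

Because md5sum runs from inside the folder, no sed step is needed to remove the directory prefix from the recorded name.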

Sites I referred to

I had forgotten quite a lot of this, so it took a bit of effort.
