

(NEWS) Recovering Space, Part 3: jdupes.sh
Wrapping jdupes to aid in finding duplicate files.
by @admin, September 12, 2024, 11:05am UTC

As a way to find duplicate files on your system, jdupes is pretty good. However, jdupes lacks (at least) the following:

  • the ability to sort results by file size (which is odd, given the task),
  • the ability to set a lower limit on the file sizes to match (without one, piles of tiny matched files clutter the unsorted list),
  • the ability to show file sizes in human-readable form.

The following jdupes.sh is a small bash script that attempts to help with those points.

Figure: jdupes.sh bash script


Let's go over a few lines of this script:
  • Line 3: The first if statement displays usage information if no parameters are supplied.
  • Line 11: We are not passing jdupes the -S parameter to show file sizes. Instead, we get the size from a call to du later in the script (line 15), so that the size appears on the same line as the filename in the results. This allows us to sort the results by size. The stock jdupes command prints the size on a line above the filenames, so its output can't be sorted this way.
  • Line 12: IFS= retains leading and trailing spaces in each line piped ("|") in from jdupes. The -r after read treats a backslash "\" (as might be found in Windows paths) as a normal character.
  • Lines 15-17: The size of the file is retrieved with du. set is then used to break the variable sz into separate $1, $2 ... $n variables, with $1 holding the actual size.
  • Line 18: Checks whether the size is over a limit, so that we can avoid hundreds (thousands?) of matches on small files. The current limit is about 100MB (100000, since du on Linux counts in 1K blocks by default). Set this as low as you wish.
  • Line 19: Finally, we echo the size followed by the filename, to be sorted by size on line 24.
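
To see the two shell idioms from lines 12 and 16 in isolation, here is a minimal sketch. The function names keep_line and first_field, and the sample paths, are made up for illustration and are not part of jdupes.sh.

```shell
# IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
keep_line() {
    IFS= read -r line
    printf '[%s]\n' "$line"
}

# An unquoted expansion after "set --" is split on whitespace into
# $1, $2, ... (it is also subject to filename globbing).
first_field() {
    set -- $1
    printf '%s\n' "$1"
}

# Hypothetical Windows-style path with surrounding blanks.
printf '%s\n' '  C:\temp\file.txt  ' | keep_line
# prints: [  C:\temp\file.txt  ]

# Hypothetical du-style line: size, a tab, then a path containing a space.
first_field "$(printf '701384\t/media/user/My Videos/clip.mp4')"
# prints: 701384
```

Because the filename contains a space, set splits it across $2 and $3, which is why the script only relies on $1.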

#!/bin/bash

if [[ -z "$@" ]]; then
  echo "Nothing to do!"
  echo "Usage: jdupes.sh [device label or folder 1] [2] ... [n]"
  echo "Enclose names including blanks within double quotes."
  echo "e.g. $ jdupes.sh /media/user/MX500-2TB . \"/m/f/Seagate Backup\""
  echo " will process 2 of User's drives and the current folder (\".\")."
  exit 1
fi
jdupes -r "$@" | {
  while IFS= read -r file; do
    if [[ ! -z "$file" ]]; then
      if [[ ! "$file" == "bytes each" ]]; then
        sz=$(du "$file")
        set -- $sz
        # $1 now holds the size reported by du
        if [[ $1 -gt 100000 ]]; then
          echo "$sz"
        fi
      fi
    fi
  done
} | sort -n > /home/user/myjdups_sorted.txt

echo
echo "RESULTS"
tail -n 250 /home/user/myjdups_sorted.txt
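
The script above hard-codes the size limit and prints raw du block counts. As a hedged sketch of how the limit could become a parameter and how a human-readable column might be added, the helpers below work on "size<TAB>path" lines such as du -k produces. The names human_kib, over_limit and annotate are hypothetical and not part of jdupes.sh or jdupes.

```shell
# Convert a count of 1K blocks (as printed by du -k) to a short size string.
human_kib() {
    LC_ALL=C awk -v kib="$1" 'BEGIN {
        split("K M G T", unit, " ")
        i = 1
        while (kib >= 1024 && i < 4) { kib /= 1024; i++ }
        printf "%.1f%s\n", kib, unit[i]
    }'
}

# Keep only "size<TAB>path" lines whose size exceeds the given limit.
over_limit() {
    awk -F '\t' -v min="$1" '$1 + 0 > min + 0'
}

# Insert a human-readable column after the numeric one, so the output
# can still be sorted with sort -n on the first column.
annotate() {
    tab=$(printf '\t')
    while IFS="$tab" read -r kib path; do
        printf '%s\t%s\t%s\n' "$kib" "$(human_kib "$kib")" "$path"
    done
}

human_kib 701384    # prints: 684.9M
```

One way to use these would be to have the loop print du -k "$file" for every match and then pipe the whole block through over_limit 100000 | sort -n | annotate.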



Save this under a filename of your choice.
Mark the resulting file as executable with chmod +x <your choice of filename> (add sudo only if you do not own the file).


Although jdupes uses file size as one check to weed out non-duplicates, it is not thrown off by the slightly different sizes that du reports for the same file on different file systems (du counts allocated disk space, which varies with the file system, rather than the file's exact length in bytes). For example, jdupes correctly matched these two files as being the same:

filesystem                        size    filename
NTFS (Windows formatted drive)    701384  General Class - Session 1 (2023-11-02).mp4
EXT (Linux formatted drive)       701388  General Class 1 (2023-11-02).mp4
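
To check a pair like this by hand, the sketch below compares exact byte length and then content, using only wc and cmp. It is a stand-in for the idea, not jdupes's actual implementation, and the names byte_size and same_content are made up for this example.

```shell
# Exact length of a file in bytes (unlike du, which reports disk usage).
byte_size() {
    wc -c < "$1" | tr -d ' '
}

# Succeeds only if both files have the same length and identical bytes.
same_content() {
    [ "$(byte_size "$1")" = "$(byte_size "$2")" ] && cmp -s "$1" "$2"
}
```

Running du -k and byte_size on the same file on two different drives is a quick way to see disk usage differ while the byte length stays the same.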





See also,
Recovering Space
Recovering Space, Part 2
Recovering Space, Part 3: jdupes.sh (this file)
tags: All users, News