Sorting human readable file sizes

How can I sort a list by human-readable file size, that is, a numerical sort that takes the size suffix (G, M, K) into account? For example, can I sort the output of "du -sh"?

Asked by: Guest | Views: 167
Total answers/comments: 5
Guest [Entry]

"Afaik, there's no standard command to do this.

There are various workarounds, which were discussed when the same question was asked over at Stack Overflow: How can I sort du -h output by size"
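
For readers on GNU coreutils 7.5 or later, sort has a -h (--human-numeric-sort) option that understands the suffixes du -h prints, so a workaround may not be needed there. A minimal sketch, assuming GNU sort and du are installed; the function name du_sorted is made up for illustration:

```shell
# Sketch only: assumes GNU sort, whose -h option compares values such as 4.0K, 300M, 1.2G.
# du_sorted is a hypothetical helper name, not a standard command.
du_sorted() {
  # -s: one summary line per argument, -h: human-readable sizes
  du -sh -- "$@" | sort -h
}

# Usage: du_sorted *
```

On systems without GNU sort (older BSD or busybox builds, for example) the -h option may be missing, and the workarounds in the other answers still apply.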
Guest [Entry]

"If you are just worried about files larger than 1MB, as it seems you are, you can use this command to sort them and use awk to convert the size to MB:

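du -sk * | sort -n | awk '{print int($1 / 1024)"M\t"$2}'

Note that int() truncates, so sizes are rounded down to whole MB (anything under 1 MB shows as 0M), and $2 keeps only the first word of a name that contains spaces. You can modify it to convert to the unit of your choice."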
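
One way to "convert to the unit of your choice" is to let awk pick the unit per line after the numeric sort has already run. A rough sketch, assuming input lines of the form size in KiB, a tab, then the name (which is what du -sk prints); the function name humanize_kb is invented here:

```shell
# Sketch: turn "KiB<TAB>name" lines (as printed by du -sk) into human-readable sizes.
# humanize_kb is a hypothetical helper; sort numerically BEFORE calling it.
humanize_kb() {
  awk -F'\t' '{
    size = $1; unit = "K"
    if (size >= 1024) { size /= 1024; unit = "M" }
    if (size >= 1024) { size /= 1024; unit = "G" }
    # Everything after the first tab is the name, so spaces in names survive.
    name = substr($0, index($0, "\t") + 1)
    printf "%.1f%s\t%s\n", size, unit, name
  }'
}

# Usage: du -sk -- * | sort -n | humanize_kb
```

Because the sort happens on the raw KiB numbers, the pretty-printing step cannot disturb the order.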
Guest [Entry]

du -sk * | sort -n | cut -f2- | while IFS= read -r f ; do du -sh -- "$f" ; done
Guest [Entry]

"This command will sort by size in MB

du --block-size=MiB --max-depth=1 path | sort -n"
Guest [Entry]

"I ended up here since I was trying to sort something else that combined MB and GB in the same output and I couldn't control it.

$NF is used since the #GB or #MB pattern was the last column in the output:

somecommand | \
gawk '{
  if ($NF ~ /[0-9\.]+GB/)
    { a=gensub(/([0-9\.]+)(GB)/,"\\1","g",$NF);
      printf "%sMB\n", a*1024}
  else {print $NF}
}' | \
sort -n

Explanation of the awk command:

if ($NF ~ /[0-9\.]+GB/)

if the last column matches the regex, i.e. one or more digits or dots followed by GB

{ a=gensub(/([0-9\.]+)(GB)/,"\\1","g",$NF);

then set variable a to the numeric part of the last column ($NF), which is the first capture group of the same regex

printf "%sMB\n", a*1024}

after setting a, use printf to format the output as ${a*1024}MB

else {print $NF}

otherwise just print the last column

sort -n

use numeric sort on the output

Example:

printf '4MB\n5GB\n420MB\n420GB\n1024MB\n1GB\n' | \
gawk '{
  if ($NF ~ /[0-9\.]+GB/)
    { a=gensub(/([0-9\.]+)(GB)/,"\\1","g",$NF);
      printf "%sMB\n", a*1024}
  else {print $NF}
}' | \
sort -n

I'm sure there's a way to reuse the regex pattern so I'm only performing the match once and replacing in place, but I don't know how to do that yet :)"
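
On the closing question about matching only once: awk's sub() returns the number of substitutions it made, so a single call can both test for the GB suffix and strip it. A sketch under the same assumptions as the answer above (the size is the last column and the only units are MB and GB); gb_to_mb is a name made up here, and the code should run in any POSIX awk, not only gawk:

```shell
# Sketch: convert a trailing "<n>GB" in the last column to MB with one regex call.
# gb_to_mb is a hypothetical helper name.
gb_to_mb() {
  awk '{
    v = $NF
    # sub() returns 1 if it removed a trailing "GB" from v, 0 otherwise
    if (sub(/GB$/, "", v)) {
      printf "%sMB\n", v * 1024
    } else {
      print v
    }
  }'
}

printf '4MB\n5GB\n420MB\n420GB\n1024MB\n1GB\n' | gb_to_mb | sort -n
```

Working on a copy (v) rather than on $NF itself avoids awk rebuilding the whole record, which keeps the rest of the line untouched if it is needed later.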