How can I make wget rename downloaded files to not include the query string?

I'm downloading a site with wget and a lot of the links have queries attached to them, so when I do this:

Asked by: Guest | Views: 394
Total answers/comments: 4
bert [Entry]

"I think, in order to get wget to save as a filename different than the URL specifies, you need to use the -O filename argument. That only does what you want when you give it a single URL -- with multiple URLs, all downloaded content ends up in filename.

But that's really the answer. Instead of trying to do it all in one wget command, use multiple commands. Now your workflow becomes:

1. Run wget to get the base HTML file(s) containing your links.
2. Parse for URLs.
3. For each URL ending in mp3:
   a. Process the URL to get a filename (e.g. turn http://foo/bar/baz.mp3?gargle=blaster into baz.mp3).
   b. (Optional) Check that the filename doesn't already exist.
   c. Run wget <URL> -O <filename>.
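
The per-URL steps above might be sketched in shell roughly as follows. This is only an illustration, not a tested recipe for any particular site: the function names url_to_filename and fetch_all are made up for this example, and it assumes the URLs have already been extracted into a text file, one per line.

```shell
#!/bin/sh
# Hypothetical helper: derive a local filename from a URL by dropping
# the query string and then the directory part.
url_to_filename() {
    url_no_query=${1%%\?*}               # strip everything from the first '?'
    printf '%s\n' "${url_no_query##*/}"  # keep the text after the last '/'
}

# Hypothetical helper: download every URL listed in the file given as $1
# (assumed to hold one URL per line), saving each under its cleaned name.
fetch_all() {
    while IFS= read -r url; do
        name=$(url_to_filename "$url")
        # Skip URLs that end in '/' (no filename) and files already present.
        if [ -z "$name" ] || [ -e "$name" ]; then
            continue
        fi
        wget "$url" -O "$name"
    done < "$1"
}
```

With the URLs saved in a file such as urls.txt (an assumed name), running fetch_all urls.txt would issue one wget per URL, so each -O applies to a single download.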

That solves your problem, but now you need to figure out how to grab the base files to find your mp3 URLs.

Do you have a particular site/base URL in mind? Steps 1 and 3 will be easier to handle with a concrete example."
bert [Entry]

"My approach is similar to @Gregory Wolf's, but his code always produced error messages like this:

mv: './file' and './file' are the same file

Thus I first check if there is a query string in the filename before moving the file:

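# Rename every file under directory $1 whose path contains a query string.
for f in $(find "$1" -type f); do
    # Skip files that have no '?' in their path.
    if [ "$f" = "${f%%\?*}" ]; then continue; fi
    mv "$f" "${f%%\?*}"
done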

This recursively checks every file and strips the query string from its filename if one is present."
bert [Entry]

"To rename the files properly, you have to account for spaces in file names, which are possible and will break the for loop.
Here is an improved version:
find . -type f -name "*\?*" -print0 |
while IFS= read -r -d '' file; do
    # Strip everything from the first '?' onward.
    mv -f "$file" "${file%%\?*}"
done

This ensures that files with spaces are handled properly by the loop (using \0 as the delimiter) and by the mv command (double quotes).
There were only a couple of complex cases where it did not work, but otherwise this is the best option."
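
For shells without bash's read -d, a similar rename could be sketched with find -exec instead. This is a variant, not the answer's own code: the function name strip_query_names is invented for this example, and unlike mv -f above it skips a file when the cleaned name already exists, which is an assumption about what is wanted.

```shell
#!/bin/sh
# Hypothetical helper: strip query strings from file names under $1
# (default: current directory). find passes the names straight to the
# inner shell as arguments, so spaces in names are preserved.
strip_query_names() {
    find "${1:-.}" -type f -name '*\?*' -exec sh -c '
        for f do
            # Cut at the first "?" in the path. Limitation: a "?" in a
            # parent directory name would be cut there instead.
            target=${f%%\?*}
            if [ -e "$target" ]; then
                printf "skipping %s: %s already exists\n" "$f" "$target" >&2
            else
                mv -- "$f" "$target"
            fi
        done
    ' sh {} +
}
```

Calling strip_query_names on the download directory would rename, for example, "a b.mp3?x=1" to "a b.mp3" and leave files without a query string untouched.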
bert [Entry]

"Even easier is this: unix.stackexchange.com/questions/196253/how-do-you-rename-files-specifically-in-a-list-that-wget-will-use

This suggests a method that essentially uses wget's rename function (can be altered to include directory) for multiple files. See the second version proposed."