How can I create a zip / tgz in Linux such that Windows has proper filenames?

Currently, tar -zcf arch.tgz files/* encodes filenames in UTF-8, so Windows users see garbled characters in every non-English filename and cannot do anything with those files.

Asked by: Guest
Answered by: Guest

> Currently, tar encodes filenames in UTF-8

Actually, tar doesn't encode or decode filenames at all; it simply copies them out of the filesystem as-is. If your locale is UTF-8-based (as in many modern Linux distros), they will be UTF-8. Unfortunately, the system codepage of a Windows box is not UTF-8 by default, so the names will be mangled everywhere except in tools such as WinRAR that let you change the charset used.
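
To see why the names come out spoiled, here is a minimal sketch in Python (chosen only for illustration; the answer itself uses no code). It takes the UTF-8 bytes tar would copy from a Linux filesystem and decodes them the way a Windows tool without UTF-8 support might. The file name café.txt and the two codepages are assumed examples: cp1252 is the Western European ANSI codepage, and cp437 is the ZIP format's historical default for names that carry no UTF-8 flag.

```python
# A filename as stored on a UTF-8 Linux filesystem; tar copies these bytes verbatim.
name = "café.txt"
raw = name.encode("utf-8")

# Decoding the same bytes with legacy codepages (assumed examples) yields mojibake.
as_ansi = raw.decode("cp1252")  # Western European ANSI codepage
as_oem = raw.decode("cp437")    # IBM PC codepage, the ZIP format's historical default

print(raw)      # b'caf\xc3\xa9.txt'
print(as_ansi)  # cafÃ©.txt
print(as_oem)   # caf├⌐.txt
```

Which codepage a given Windows program actually applies depends on the program and on the system locale, which is why the garbage differs from country to country.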

Because each localized release of Windows interprets those raw bytes in its own codepage, it is impossible to create a ZIP file with non-ASCII filenames that works across different countries' releases of Windows and their built-in compressed-folder support.

It is a shortcoming of the tar and zip formats that they neither fix an encoding nor record which one was used, so non-ASCII characters will always be non-portable. If you need an archive format that handles non-ASCII names, you'll have to use one of the newer formats, such as recent 7z or rar. Unfortunately, these are still wonky: in 7-Zip you need the -mcu switch, and rar still won't use UTF-8 unless it detects characters that are not in the codepage.

Basically, it's a horrible mess, and if you can avoid distributing archives containing filenames with non-ASCII characters, you'll be much better off.
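
If you follow that advice, the first step is finding the offending names. A minimal Python sketch, assuming the directory you are about to archive is the files directory from the question; find_non_ascii_names is a hypothetical helper, not part of tar or any other tool:

```python
import os


def find_non_ascii_names(root):
    """Return paths (relative to root) of files and directories whose own name is not pure ASCII."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            if not entry.isascii():
                offenders.append(os.path.relpath(os.path.join(dirpath, entry), root))
    return sorted(offenders)


if __name__ == "__main__":
    # "files" is the directory from the question; adjust to your own path.
    for path in find_non_ascii_names("files"):
        print("non-ASCII name:", path)
```

It reports only the path component that is itself non-ASCII (a directory, not every file inside it), since renaming that component is enough. Run it before tar -zcf arch.tgz files/*.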
Answered by: Guest

With the default tar on Linux (GNU tar), the problem is solved by adding the --format=posix parameter when creating the archive.

For example:
tar --format=posix -zcf arch.tgz files/*

On Windows, I use bsdtar to extract the files.

As written at lists.gnu.org/archive/html/bug-tar/2005-02/msg00018.html (back in 2005!):

> I read something in the ChangeLog about UTF-8 being supported. What does
> this mean?
> I found no way to create an archive that would be interchangeable
> between different locales.

When creating archives in POSIX.1-2001 format (tar --format=posix or
--format=pax), tar converts file names from the current locales to UTF-8
and then stores them in archive. When extracting, the reverse operation
is performed.

P.S. Instead of typing --format=posix, you can type -H pax, which is shorter.
Answered by: Guest

POSIX.1-2001 specifies how tar archives store file names as UTF-8 (the pax interchange format).

As of 2007, version 6.3.0 of the PKZIP APPNOTE.TXT (http://www.pkware.com/documents/casestudies/APPNOTE.TXT) specifies how ZIP uses UTF-8: general-purpose flag bit 11 marks a file name as UTF-8.
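
Python's zipfile module follows this part of the APPNOTE: when a member name is not pure ASCII, it stores the name as UTF-8 and sets general-purpose bit 11 (0x800). A minimal sketch, using Python for illustration and made-up file names, that builds a ZIP in memory and reads the flag back:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11, the "language encoding flag"


def build_zip(names):
    """Create an in-memory ZIP with one small text member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "hello\n")
    return buf.getvalue()


def utf8_flags(data):
    """Map each member name to whether bit 11 is set in its header."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


data = build_zip(["readme.txt", "résumé.txt"])
print(utf8_flags(data))  # {'readme.txt': False, 'résumé.txt': True}
```

Pure-ASCII names are left unflagged, which is harmless because ASCII reads the same in every codepage.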

The only open question is which tools actually support these standards properly.
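
One practical way to answer that for a particular tool is to inspect what it actually wrote. A minimal Python sketch (name_encoding_report is a hypothetical helper) that reports, for each member of a .zip or tar archive, whether the name's encoding is declared: bit 11 for ZIP, a pax path record for tar. It only looks at bit 11, so a ZIP that stores UTF-8 names in the separate Info-ZIP Unicode Path extra field (0x7075) would show up as undeclared.

```python
import sys
import tarfile
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11 ("language encoding flag")


def name_encoding_report(path):
    """Return a list of (member name, how the name's encoding is declared)."""
    report = []
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.flag_bits & UTF8_FLAG:
                    how = "UTF-8 (bit 11 set)"
                elif info.filename.isascii():
                    how = "plain ASCII"
                else:
                    # No flag: zipfile decodes such names as cp437 by default.
                    how = "undeclared, read as cp437"
                report.append((info.filename, how))
    else:
        # tarfile.open detects plain and compressed (gzip, bz2, xz) tar files.
        with tarfile.open(path) as tar:
            for member in tar.getmembers():
                if "path" in member.pax_headers:
                    how = "UTF-8 (pax path record)"
                elif member.name.isascii():
                    how = "plain ASCII"
                else:
                    how = "undeclared raw bytes"
                report.append((member.name, how))
    return report


if __name__ == "__main__":
    for archive in sys.argv[1:]:
        for name, how in name_encoding_report(archive):
            print(f"{archive}: {name!r}: {how}")
```

For example, creating arch.tgz with and without --format=posix and comparing the reports for the non-ASCII names should show whether your tar writes pax path records.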