Altering wget Source to Allow the Tilde (~) Character in Saved Directories

John Kozubik - john@kozubik.com - http://www.kozubik.com
November 23, 2002

Overview

By default, wget treats the tilde character (~) as an "unsafe" character. This means that when you use wget to download a web site that has a tilde character in the URL, such as:

    http://www.example.com/~jkozubik/

the directories saved on your local filesystem will contain a %7E in place of every tilde character:

    # ls www.example.com
    %7Ejkozubik  %7Edgerlach  %7Emjohnson

Not only is this unsightly, but if you use a web browser to traverse the saved filesystem, you will not be able to view those directories correctly. Since the tilde is a legal filename character in most Unix-based operating systems, it would be nice to simply save the directories under the same tilde-prefixed names they have in the URL.

Solution

Please note that this solution has only been tested on wget 1.8.2. I suspect that any version that contains the table discussed below can be altered in the same manner.

After unpacking your wget source tarball, change to the "src" directory inside the unpacked source tree and edit the file "url.c". Inside url.c you will find this table (the array is declared with 256 entries; only the rows for the ASCII characters are shown here):

    const static unsigned char urlchr_table[256] =
    {
       U,  U,  U,  U,  U,  U,  U,  U,   /* NUL SOH STX ETX EOT ENQ ACK BEL */
       U,  U,  U,  U,  U,  U,  U,  U,   /* BS  HT  LF  VT  FF  CR  SO  SI  */
       U,  U,  U,  U,  U,  U,  U,  U,   /* DLE DC1 DC2 DC3 DC4 NAK SYN ETB */
       U,  U,  U,  U,  U,  U,  U,  U,   /* CAN EM  SUB ESC FS  GS  RS  US  */
       U,  0,  U, RU,  0,  U,  R,  0,   /* SP  !   "   #   $   %   &   '   */
       0,  0,  0,  R,  0,  0,  0,  R,   /* (   )   *   +   ,   -   .   /   */
       0,  0,  0,  0,  0,  0,  0,  0,   /* 0   1   2   3   4   5   6   7   */
       0,  0, RU,  R,  U,  R,  U,  R,   /* 8   9   :   ;   <   =   >   ?   */
      RU,  0,  0,  0,  0,  0,  0,  0,   /* @   A   B   C   D   E   F   G   */
       0,  0,  0,  0,  0,  0,  0,  0,   /* H   I   J   K   L   M   N   O   */
       0,  0,  0,  0,  0,  0,  0,  0,   /* P   Q   R   S   T   U   V   W   */
       0,  0,  0,  U,  U,  U,  U,  0,   /* X   Y   Z   [   \   ]   ^   _   */
       U,  0,  0,  0,  0,  0,  0,  0,   /* `   a   b   c   d   e   f   g   */
       0,  0,  0,  0,  0,  0,  0,  0,   /* h   i   j   k   l   m   n   o   */
       0,  0,  0,  0,  0,  0,  0,  0,   /* p   q   r   s   t   u   v   w   */
       0,  0,  0,  U,  U,  U,  U,  U,   /* x   y   z   {   |   }   ~   DEL */

The comments on the right list the ASCII characters in order, and the entries on the left contain a U for every "unsafe" character and a 0 (zero) for every "safe" character. Each entry on the left lines up with the character in the same position on the right. Note that the tilde character (second from the right on the bottom row) corresponds to a U on the left-hand side, marking it as "unsafe".

Change the U that matches the tilde to a 0 (zero), then save the file. That is all you need to do. You can now compile the source as you normally would.

Solution for wget Installed From the FreeBSD Ports Collection

If you are installing wget from the FreeBSD ports collection, you can still apply this fix. However, because the usual ports target, `make install`, extracts, builds, and installs the unmodified source in a single step, you need to split the process up:

1. In the base wget port directory (/usr/ports/ftp/wget), run `make extract`.
2. cd to "work/wget-1.8.2/src" and edit url.c as described above.
3. Change back to the base wget port directory (/usr/ports/ftp/wget) and this time run `make install`.