Have you spent years editing pages straight from the web through Netscape Composer, Mozilla Composer, Nvu, or SeaMonkey Composer? Are you wondering how many unused images and unlinked pages are sitting out there, and how you can clean them up? Are you just starting out, or have you decided to start from scratch on your entire site?
If you answered yes to any of these questions, odds are that you don't know where to start, whether that means building a new site or getting your sanity back after years of editing pages straight from the web. I can help you get this task under control by showing you a few tips and tricks that will help you understand a bit about websites in general and your own website in particular, along with a few tools that you can use.
First, let us take a look at a Uniform Resource Locator, or URL for short. On the web, a URL consists of http:// or https://, the hostname (e.g. ges.wcs.k12.va.us), and possibly a few other things as well. Typically, what follows the hostname is a / for the main page of a website, a directory after this slash, or even more items separated by slashes or other characters. Some of you may remember that a few years ago, before each school had its own URL in the form of http://jsbhs.wcs.k12.va.us, you had a URL that looked like this: http://www.wcs.k12.va.us/schools/high/jsbhs. In the case of this URL, the John S. Battle website sat in a folder on our main website. We had a folder for the schools, a folder inside it for high schools, and inside that, a folder each for ahs, hhs, jsbhs, and phhs. As you drill down through the folders, each folder name is added to the end of the URL. You can use this technique to organize your own website just as you would organize folders on your desktop or in your "My Documents" folder.
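To make the parts of a URL concrete, here is a minimal sketch in Python that splits the old John S. Battle URL into its pieces. The function name describe_url is my own, chosen only for illustration; urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its scheme, hostname, and the folder names in its path."""
    parts = urlsplit(url)
    # Each non-empty piece between slashes is one folder level on the server
    folders = [segment for segment in parts.path.split("/") if segment]
    return {"scheme": parts.scheme, "hostname": parts.hostname, "folders": folders}


if __name__ == "__main__":
    print(describe_url("http://www.wcs.k12.va.us/schools/high/jsbhs"))
```

The folders list comes out in the same order you would click through the folders on the server: schools, then high, then jsbhs.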
One important difference between naming files and folders on your computer and naming them on the web is that spaces and other special characters do not translate well to a URL. A space becomes %20, and other special characters become similar sequences of the form %<number>. For this reason, I recommend not using spaces or special characters (other than _ or -) in file and folder names. Use an underscore or a dash in place of a space instead, which keeps the URL readable. Another important point is the use of uppercase characters. Uppercase characters in file and folder names carry over into your URLs and make them "ugly". Be consistent in your naming. To avoid confusing people with different cases in URLs, I strongly recommend that you keep all file and folder names lowercase. It will save you headaches down the road.
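The naming advice above can be sketched in a few lines of Python. The quote function from the standard library shows what a browser does to a name with spaces, and web_safe_name is a hypothetical helper of my own that applies the lowercase-and-dashes rule. The file names are made up for the example.

```python
import re
from urllib.parse import quote


def web_safe_name(name):
    """Lowercase a file or folder name and replace unsafe characters with a dash."""
    name = name.strip().lower()
    # Keep letters, digits, dot, underscore, and dash; collapse anything else into one dash
    name = re.sub(r"[^a-z0-9._-]+", "-", name)
    return name.strip("-")


if __name__ == "__main__":
    original = "Prom Photos 2008.html"
    print(quote(original))          # how the name looks inside a URL
    print(web_safe_name(original))  # a friendlier name to use instead
```

This sketch treats accented letters as unsafe too, which is stricter than necessary but keeps the rule simple.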
I recommend using folders to help you organize your site. Below is a diagram showing an example school site. I signify the top of the website with /. This represents the site folder on my computer, and the files are stored the same way on the server. Each entry marked with |-- is a file or a folder. A | running down from a folder leads to the files and folders inside it.
/
|--images
|
|--sports
|  |--football
|  |--crosscountry
|
|--clubs
|  |--french
|  |--nhs
|
|--events
|  |--homecoming
|  |--prom
|     |--images
|
|--index.html
|--contact.html
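The key is knowing how the URL works. In the above example, we have the following URLs:
http://school.wcs.k12.va.us/
http://school.wcs.k12.va.us/contact.html
http://school.wcs.k12.va.us/sports
http://school.wcs.k12.va.us/sports/football
http://school.wcs.k12.va.us/sports/crosscountry
http://school.wcs.k12.va.us/clubs
http://school.wcs.k12.va.us/clubs/french
http://school.wcs.k12.va.us/clubs/nhs
http://school.wcs.k12.va.us/events
http://school.wcs.k12.va.us/events/homecoming
http://school.wcs.k12.va.us/events/prom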
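The mapping from folders to URLs is mechanical, so it can be sketched in code. The Python below is only an illustration: it describes the example site as nested dictionaries (my own representation, not something the server uses) and stacks each folder name onto the URL of the folder above it.

```python
SITE = "http://school.wcs.k12.va.us"  # the example hostname used in this article

# The example site: a folder maps to its contents, a file maps to None
TREE = {
    "images": {},
    "sports": {"football": {}, "crosscountry": {}},
    "clubs": {"french": {}, "nhs": {}},
    "events": {"homecoming": {}, "prom": {"images": {}}},
    "index.html": None,
    "contact.html": None,
}


def urls_for(tree, base=SITE):
    """Return the URL of every file and folder, one path segment per folder level."""
    urls = []
    for name, contents in tree.items():
        url = f"{base}/{name}"
        urls.append(url)
        if contents:  # a non-empty folder: descend, stacking its name onto the URL
            urls.extend(urls_for(contents, url))
    return urls


if __name__ == "__main__":
    for url in urls_for(TREE):
        print(url)
```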
As an internet user, I should be able to figure out pretty quickly what each of these URLs will show me. As a web designer, the structure helps me keep all of my clubs in one folder, events in another folder, and sports separate as well. I use the images folder at the top of the site to store images that are used across many pages, such as a logo or navigation graphics. This structure also helps me when it comes time to remove content from the web. If I keep all the photos from the prom in the images folder inside the prom folder, I will know exactly where to find them when it comes time to remove them, and I will not accidentally remove other important photos.
You may have noted that I did not list http://school.wcs.k12.va.us/index.html above. See The Index Page for the reasons why it is not necessary.
The next important concept in URL design and publishing to the web is the index page. Each folder above, and therefore each URL in the above examples, needs an index page. If you were to create these folders, upload them, and go to http://school.wcs.k12.va.us/sports, you would be presented with a page that said Index of /sports. It would then list any files or folders that you had stored there. In the example above, it would show Parent Directory (which goes up to the next directory up the chain, in this case the top of the site), football, and crosscountry. The latter two would have little folder icons beside them. If, however, I put a file called index.html in the sports folder and publish it, when I next go to http://school.wcs.k12.va.us/sports, I will see the file index.html. If I put an index.html file in the football folder, I will see that page when I go to http://school.wcs.k12.va.us/sports/football.
Some people have started to understand the value in folders for their organizational benefits and created folders like the sports/football above, but have called the main page football.html. The URL then becomes http://school.wcs.k12.va.us/sports/football/football.html. This URL has redundant information. Had the file been named index.html instead of football.html, they could have used the URL http://school.wcs.k12.va.us/sports/football because the server automatically knows to send the index.html file. A good rule of thumb is that when you create a folder, put an index.html file in it to serve as the main page for that folder.
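Here is a rough Python sketch of the decision the server makes when a URL points at a folder. It is a simplification under my own assumptions: the index name is fixed to index.html, the set of folders is passed in by hand, and file_served is a made-up name. A real server is configured by its administrator and may behave differently.

```python
import posixpath

INDEX_NAME = "index.html"  # assumed default document; servers can be configured differently


def file_served(url_path, folders):
    """Return the file a server would send for url_path, given the set of folder paths."""
    path = "/" + url_path.strip("/")
    if path == "/" or path in folders:
        # The URL names a folder, so the server looks for the index page inside it
        return posixpath.join(path, INDEX_NAME)
    return path  # the URL already names a file


if __name__ == "__main__":
    folders = {"/sports", "/sports/football"}
    print(file_served("/sports/football", folders))
    print(file_served("/sports/football/football.html", folders))
```

Both the short URL and the long one reach a page, but only the short one stays the same if you later rename or rebuild the page behind it.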
Please note that for historical reasons, some of your sites may be set up so that you publish your main page as default.htm or default.html. You may use either of these names for the index page in your folders, but do not mix them, because you may not get the results you expect. I recommend that you use index.html.
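If you keep a copy of your site in a folder on your computer, a short script can point out folders that break these rules. The sketch below is one way to do it in Python; audit_index_pages is a hypothetical name, and the three index names are the ones mentioned above. It assumes the file names on disk match exactly, including case.

```python
import os

INDEX_NAMES = ("index.html", "default.htm", "default.html")


def audit_index_pages(site_root):
    """Return (missing, mixed): folders with no index page, and folders with more than one."""
    missing, mixed = [], []
    for folder, _subfolders, files in os.walk(site_root):
        found = [name for name in INDEX_NAMES if name in files]
        relative = os.path.relpath(folder, site_root)  # "." means the top of the site
        if not found:
            missing.append(relative)
        elif len(found) > 1:
            mixed.append(relative)
    return sorted(missing), sorted(mixed)
```

Calling audit_index_pages on your site folder returns two lists. Images folders will show up in the first list as well, so read it with that in mind.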
Linking can be a frustrating topic, and I am going to try to make it as simple as I can. You have two types of links: absolute links and relative links. An absolute link looks just like a URL. Use this type of link for pages outside your own website. Relative links, on the other hand, are trickier, but just as important. Let's assume that you have a structure for your website as I listed in The URL and Folders. Let's also assume that you have a page in /sports/football called index.html and an images folder in /sports/football, and that you need to link to images in the top-level /images folder and in the images folder inside sports. I've also added an images folder to crosscountry.
/
|--images
|  |--site-header.jpg
|
|--sports
|  |--football
|  |  |--index.html
|  |  |--images
|  |
|  |--crosscountry
|  |  |--index.html
|  |  |--images
|  |
|  |--images
|  |  |--sports-header.jpg
|
|--index.html
To link to an image in the images folder inside sports from index.html in the football folder, you should use a relative link. Here is how you would link it (the same relative link could be used in crosscountry/index.html as well):
<img src="../images/sports-header.jpg" alt="Sports">
The ../ tells the web browser to look in the folder just above this one (sports), then in the images folder inside it, and then for a file called sports-header.jpg. You could extend the chain of ..'s in the relative link to reference an image in the /images folder as well:
<img src="../../images/site-header.jpg" alt="School">
This time, the browser is looking two levels up and into the images folder for a file called site-header.jpg.
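You can check a relative link without publishing anything. The Python sketch below uses urljoin from the standard library to resolve a link the way a browser would, and posixpath.relpath to work out the relative link between two locations on the site. The helper names resolve and relative_link are my own, and the page URL assumes the example hostname used in this article.

```python
import posixpath
from urllib.parse import urljoin

PAGE = "http://school.wcs.k12.va.us/sports/football/index.html"


def resolve(link, page_url=PAGE):
    """Resolve a relative link against the page that contains it."""
    return urljoin(page_url, link)


def relative_link(from_page, to_file):
    """Compute the relative link from a page to a file, both given as site paths."""
    return posixpath.relpath(to_file, start=posixpath.dirname(from_page))


if __name__ == "__main__":
    print(resolve("../images/sports-header.jpg"))
    print(resolve("../../images/site-header.jpg"))
    print(relative_link("/sports/football/index.html", "/images/site-header.jpg"))
```

If the resolved URL is not where the image really lives, the link has too many or too few ../ steps.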
After seeing how the URL works in relation to a folder structure, you begin to see some advantages to structuring your site. But suppose your once small website has grown beyond one or two pages, and you have been updating it regularly, adding and removing images from many pages without ever deleting the old files that are no longer used. How can you clean up the clutter? One option is a little utility called wget. Wget lets you mirror your site in a folder on your computer. It goes out to your website and downloads all of the resources currently in use on it, essentially taking a snapshot of your website.
So where do you get it and how can you use it? You can download it from our FTP site at ftp://ftp.wcs.k12.va.us/pub/web/wget.exe. Save the program to c:\windows or your desktop, whichever you prefer.
After wget has finished mirroring your site, you will have a folder on your desktop that is named <school>.wcs.k12.va.us. Inside will be every resource on your website that is linked or embedded, such as images. Beware, however, that wget does not interpret JavaScript, so it will not find images that are loaded by scripts, such as the rotating images produced by DynamicDrive's scripts, which seem to be popular among the school sites.
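Once you have the mirror, one way to spot clutter is to compare it with a complete copy of everything on the server. The Python sketch below assumes you have such a complete copy in a second folder (for example, one downloaded over FTP); that second folder, and the function names site_files and unused_files, are my own assumptions and not something wget gives you.

```python
import os


def site_files(root):
    """Return the set of file paths under root, relative to root, with / separators."""
    found = set()
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            relative = os.path.relpath(os.path.join(folder, name), root)
            found.add(relative.replace(os.sep, "/"))
    return found


def unused_files(full_copy, mirror):
    """Files present in the full copy of the site but absent from the wget mirror."""
    return sorted(site_files(full_copy) - site_files(mirror))
```

Treat the result as a list of candidates rather than a list of files to delete. Anything loaded only through JavaScript will appear unused even though it is not.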
The other option is to start from scratch and build the new site behind the scenes until it is mostly ready to publish. This requires communication with your principal and staff, because it will take time to restructure. Once the new site is underway, it will be faster to update because you will be organized, and it will be easier to remove old resources because you will know where they are. For those who are looking for a new site design and possibly new site design tools, I strongly recommend Adobe's Dreamweaver software, because it lets you design site templates for your pages and is designed from the ground up to support style sheets, which make it easy to apply formatting uniformly across your site.