The nGzip compressor is a PHP script that produces a compressed version of a file directly on the server. What for? To save time, to make better use of available resources, and to improve the user experience. But read on…
When a browser visits a page on a website, content negotiation takes place between the browser and the server (Apache). If a compressed version of the page is available, the server sends it. Delivered in less time, the page is displayed sooner by the browser. And users always like responsive websites, don't they?
In the dialog taking place between the browser and the server, the interesting parts lie in the headers of both HTTP requests and responses.
Usually, when a browser requests a file from a server, it announces in the request headers what kinds of documents it can process. The server takes them into account to determine which file to serve. It sends back a file together with response headers, which can be seen as processing instructions for the browser.
An example follows:
HTTP headers sent by browser
Accept-Charset: UTF-8,*
Host: example.org
Connection: keep-alive
Keep-Alive: 300
Accept-Language: fr,en;q=0.5
Accept: text/xml,...,*/*;q=0.5
User-Agent: Mozilla/5.0 ...
Accept-Encoding: gzip, deflate
Note the Accept-Encoding header, which indicates that the browser accepts the gzip and deflate compression formats.
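To make the role of the Accept-Encoding header above more concrete, here is a minimal Python sketch of how a server-side program might read it. The function names `parse_accept_encoding` and `accepts` are hypothetical, and the parsing is deliberately simplified; it is not what Apache itself does internally.

```python
def parse_accept_encoding(value):
    """Return a dict mapping each listed coding to its q-value (default 1.0)."""
    codings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat an unreadable q-value as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def accepts(value, coding):
    """True if the header lists `coding` (or the * wildcard) with a non-zero q-value."""
    codings = parse_accept_encoding(value)
    if coding in codings:
        return codings[coding] > 0
    return codings.get("*", 0.0) > 0


# The header from the request example above
print(accepts("gzip, deflate", "gzip"))
```

With the header from the example, both `gzip` and `deflate` are reported as acceptable; a header such as `gzip;q=0` would explicitly refuse gzip.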
HTTP header sent by server
Date: Tue, 24 Jul 2007 15:32:58 GMT
Server: Apache
Last-Modified: Mon, 23 Jul 2007 12:27:33 GMT
Etag: "40f98da-636-46a49eb5;468273e9"
Accept-Ranges: bytes
Content-Length: 1590
Content-Type: text/html
Content-Language: fr
Vary: negotiate, accept-language, accept-encoding
TCN: choice
Content-Location: fichier.fr.html.gz
Content-Encoding: gzip
The Vary and TCN fields indicate that a successful content negotiation occurred; the Content-Location field gives the name of the served file; the Content-Encoding field indicates that it is a gzip-compressed file (.gz suffix).
The browser reads these fields in order to process the content that follows appropriately, i.e. decompress the file and then interpret it normally.
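The client side of this exchange can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular browser is implemented; `parse_headers` and `decode_body` are hypothetical helper names, and the page body is made up for the example.

```python
import gzip
import zlib


def parse_headers(raw):
    """Turn 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def decode_body(headers, body):
    """Undo the content coding announced by Content-Encoding, if any."""
    encoding = headers.get("content-encoding", "").lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    return body  # no coding announced: use the body as received


# A few of the response headers shown above
raw = """Content-Type: text/html
Content-Location: fichier.fr.html.gz
Content-Encoding: gzip"""

headers = parse_headers(raw)
page = b"<html><body>Bonjour</body></html>"   # made-up page content
received = gzip.compress(page)                # what would travel over the network
print(decode_body(headers, received) == page)
```

Without a Content-Encoding header, the body is passed through unchanged, which is what happens when the server serves the uncompressed variant.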
The negotiation mechanism is based on a complex algorithm, which is illustrated here by some examples.
| Case | Requested | Available | Served | Comments |
|---|---|---|---|---|
| (1) | name.html | name.html, name.html.gz | name.html | no negotiation occurs |
| (2) | name.html | name.html.html, name.html.gz | name.html.gz | negotiation is forced by over-suffixing the name |
| (3) | name.html | name.html.gz, name.html.en, name.html.fr | name.html.fr | language is preemptive over encoding |
| (4) | name.html | name.html.la, name.html.en, name.html.en.gz | name.html.en.gz | choice is on second language adding encoding |
| (5) | name.fr | name.html.fr, name.en.html, name.html.en | no match on basename | |
| (6) | name | name.fr.html, name.html.en | name.fr.html | negotiation is likely to always succeed |
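The six cases can be reproduced with a toy model, sketched below in Python. It is emphatically not Apache's negotiation algorithm: the function `pick_variant`, the suffix sets, the ranking (language quality first, then a preference for a gzip variant the browser accepts) and the arbitrary low score of 0.001 for variants without a language are all assumptions chosen only so that the model agrees with the table.

```python
LANGUAGES = {"fr", "en", "la"}   # suffixes treated as languages (assumption)
ENCODINGS = {"gz"}               # suffixes treated as encodings (assumption)


def pick_variant(requested, available, accept_language, accept_gzip=True):
    """Toy model: return the file served for `requested`, or None if nothing matches.

    accept_language maps a language to its q-value, e.g. {"fr": 1.0, "en": 0.5}.
    """
    if requested in available:
        return requested  # exact match: no negotiation occurs
    best, best_key = None, None
    for name in available:
        if not name.startswith(requested + "."):
            continue  # the requested name must match the start of the file name
        suffixes = name[len(requested) + 1:].split(".")
        langs = [s for s in suffixes if s in LANGUAGES]
        gz = any(s in ENCODINGS for s in suffixes)
        if gz and not accept_gzip:
            continue
        if langs:
            q = accept_language.get(langs[0], 0.0)
            if q == 0:
                continue  # language not accepted by the browser
        else:
            q = 0.001  # no language: acceptable, but ranked below any accepted one
        key = (q, gz)  # language first, then prefer the compressed variant
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best


prefs = {"fr": 1.0, "en": 0.5}   # from "Accept-Language: fr,en;q=0.5"
print(pick_variant("name.html", ["name.html.gz", "name.html.en", "name.html.fr"], prefs))
```

The call shown corresponds to case (3): the French variant wins over the compressed one because language is ranked before encoding in this model.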
What to retain from this result table?
We'll take the case of a website in a shared hosting environment with Apache 1.x. In general, the only option left to users for changing server behavior is to use an htaccess file. If that is not allowed, you may want to look for another hosting provider…
This is the minimum set of directives for negotiation to happen:
Options +MultiViews
AddEncoding x-gzip .gz
AddType text/html .gz
If your website is multilingual, you'll add language declarations to these. For example, for a French and English website:
AddLanguage fr .fr
AddLanguage en .en
LanguagePriority fr en
If there should be a negotiation with CSS stylesheets and/or JavaScript scripts, put an additional htaccess file in each respective directory. File names have to be over-suffixed, for example, style.css.css or script.js.js, as there would be no negotiation otherwise (see rule number 1).
In the stylesheet directory:
ForceType text/css
In the script directory:
ForceType text/javascript
As a safe practice, first check in separate test directories that your htaccess files are working.
In the request example above, the Accept-Encoding field had two values: gzip and deflate. The first value corresponds to a well-known compression format, recognized by 90% of current browsers. The second value corresponds to a closely related format, which is not supported as consistently.
Compression is the operation by which an equivalent but lighter version of a file is made. This smaller version travels faster through networks, so it is available sooner to the browser. Decompression is the operation by which a compressed file is transformed back into the original file. Done internally by the browser, this operation is automatic and nearly instantaneous.
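The round trip can be tried out with Python's standard gzip module. This is a small illustration only; the sample text is made up, and the actual gain depends on the content (repetitive markup compresses particularly well).

```python
import gzip


def compression_report(data):
    """Compress `data`, check the round trip, and return (original size, compressed size)."""
    compressed = gzip.compress(data)
    restored = gzip.decompress(compressed)
    if restored != data:
        raise ValueError("round trip failed")
    return len(data), len(compressed)


# Made-up, highly repetitive sample page
sample = ("<p>" + "Compression works best on repetitive text. " * 200 + "</p>").encode("utf-8")
original_size, compressed_size = compression_report(sample)
print(original_size, "bytes ->", compressed_size, "bytes")
```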
A server can compress files on demand. For gzip compression, this implies the use of an additional module (mod_gzip), which is rarely installed in shared hosting environments. In the case of deflate, Apache server version 2 includes the module mod_deflate as a standard feature: compression is then automatic if activated. But this Apache version is still new.
Whatever the format, negotiation procedures remain the same.
To make up for the absence of these modules, files may be pre-compressed. The server can then free the resources that it would otherwise have dedicated to compression. These resources are valuable when the server is under heavy load, as users are waiting at the other end.
When should files be compressed? It depends… Of course, all situations are different. However, there are some general guidelines:
There is a good reason for covering content negotiation and compression together: both technologies are applied here. Any pages which can be compressed, whether HTML pages, stylesheets or scripts, are available in both normal and compressed forms. There are many text documents, some of them very large (over 600 KB). All these characteristics make a good case for compression.
But this solution requires a lot of work. You have to rename and compress files, upload them to the server, and do it all again whenever they are updated. Should we give up such a promising solution because of these drawbacks? Of course not. Everything can turn out right with a little automation.
And this is the purpose of nGzip: the script lets you manage online the compression and appropriate renaming of any HTML, JavaScript and CSS files. Other file types can easily be added to this list by editing the script (see code comments).
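To give an idea of the kind of automation involved, here is a minimal Python sketch. It is not the nGzip code (nGzip is a PHP script) and makes no claim about how nGzip works internally. The function name `precompress`, the use of a separate output directory and the list of suffixes are assumptions for illustration. For each source file it writes an over-suffixed copy and a gzip copy, which is the file layout of case (2) in the table above.

```python
import gzip
import shutil
from pathlib import Path

SUFFIXES = {".html", ".css", ".js"}   # file types to process (assumption)


def precompress(src_dir, out_dir):
    """Write name.ext.ext and name.ext.gz into out_dir for each matching file.

    The original files in src_dir are only read, never modified or deleted.
    Returns the names of the files created.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for source in sorted(Path(src_dir).iterdir()):
        if not source.is_file() or source.suffix not in SUFFIXES:
            continue
        plain = out / (source.name + source.suffix)   # e.g. name.html.html
        packed = out / (source.name + ".gz")          # e.g. name.html.gz
        shutil.copyfile(source, plain)
        with open(source, "rb") as src, gzip.open(packed, "wb") as dst:
            shutil.copyfileobj(src, dst)
        created += [plain.name, packed.name]
    return created
```

Calling `precompress("site", "site-negotiated")` on a directory containing `index.html` would produce `index.html.html` and `index.html.gz` in the output directory; both directory names are placeholders.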
The script requires Apache (version 1 or 2), PHP (version 4 or 5) with Zlib extension (almost always installed), ability to use htaccess files, and a browser. Some User Interface features are only available if JavaScript is enabled on the browser.
Choose your preferred archive format: nGzip (Zip) or nGzip (Tgz).
Decompress the archive, then upload the file ngzip.php to your website. The script can be installed anywhere. However, for security reasons, it should be placed in a restricted-access directory, as it could reveal some details about the site structure. Still, it cannot be used directly to modify or delete original files.
Note: Although file processing is quite fast, the script might run out of time when files are numerous and/or large. If so, process the directory content in several passes. File date and size computation may also take a few milliseconds.
[Screenshots: Starting window; All files are displayed; Processing is finished (size display)]
If you are using Firefox, there is an extension, Web Developer, with an option for showing response headers. Some free services let you check whether pages are compressed, for example Leknor, GidNetwork, and others.
Maybe in a future version:
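The same check can also be scripted. The following Python sketch uses the standard urllib module; the helper names `build_request`, `describe` and `check` are hypothetical, and the URL in the usage note is just the placeholder host from the request example. urllib does not decompress the body itself, so the Content-Encoding header it reports is the one the server actually sent.

```python
import urllib.request


def build_request(url):
    """Prepare a GET request that announces gzip and deflate support."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def describe(headers):
    """Summarise the negotiation-related response headers."""
    encoding = headers.get("Content-Encoding")
    return {
        "compressed": encoding is not None and encoding.lower() != "identity",
        "encoding": encoding,
        "vary": headers.get("Vary"),
        "location": headers.get("Content-Location"),
    }


def check(url):
    """Fetch `url` and report whether the server answered with a compressed body."""
    with urllib.request.urlopen(build_request(url)) as response:
        return describe(response.headers)
```

For example, `check("http://example.org/")` returns a small dictionary; if `compressed` is true and `location` ends in `.gz`, a pre-compressed variant was negotiated.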
The script is available under the GPLv3 License.