The nGzip compressor is a PHP script which produces a compressed version of a file directly on the site. What for? To save time, to make better use of available resources, and to enhance the user experience. But read on …
When a browser visits a page on a website, content negotiation takes place between the browser and the server (Apache). If a compressed version of the page exists, the server will send it. Delivered in less time, the page is displayed sooner by the browser. And users always like responsive websites, don't they?
In the dialog taking place between the browser and the server, the interesting parts lie in the headers of both HTTP requests and responses.
Usually, when a browser requests a file from a server, it announces in the request headers what kinds of documents it can process. The server takes them into account to determine which file to serve. It sends back a file together with response headers, which can be seen as processing instructions for the browser.
An example follows:
Accept-Charset: UTF-8,*
Host: example.org
Connection: keep-alive
Keep-Alive: 300
Accept-Language: fr,en;q=0.5
Accept: text/xml,...,*/*;q=0.5
User-Agent: Mozilla/5.0 ...
Accept-Encoding: gzip, deflate
Note the Accept-Encoding header, which indicates that the browser accepts compressed formats.
Date: Tue, 24 Jul 2007 15:32:58 GMT
Server: Apache
Last-Modified: Mon, 23 Jul 2007 12:27:33 GMT
Etag: "40f98da-636-46a49eb5;468273e9"
Accept-Ranges: bytes
Content-Length: 1590
Content-Type: text/html
Content-Language: fr
Vary: negotiate, accept-language, accept-encoding
TCN: choice
Content-Location: fichier.fr.html.gz
Content-Encoding: gzip
The TCN field indicates that a successful content negotiation occurred;
the Content-Location field gives the name of the served file;
the Content-Encoding field indicates that it is a gzip-compressed file (.gz suffix).
The browser reads these fields in order to process the content that follows appropriately, i.e. to decompress the file and then interpret it as usual.
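To make the request side of this exchange concrete, here is a minimal Python sketch of how a server-side program might decide, from the Accept-Encoding header alone, whether a gzip-compressed file may be sent. The function names `parse_accept_encoding` and `accepts_gzip` are hypothetical, and this is not how Apache or the nGzip script implements negotiation; it only illustrates the principle of reading the codings and their optional q-values.

```python
def parse_accept_encoding(header):
    """Return a dict mapping each listed coding to its q-value (default 1.0)."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def accepts_gzip(header):
    """True if the header allows gzip (or x-gzip), directly or through '*'."""
    codings = parse_accept_encoding(header)
    listed = [codings[n] for n in ("gzip", "x-gzip") if n in codings]
    if listed:
        return max(listed) > 0
    return codings.get("*", 0.0) > 0


# The header from the request example above allows gzip.
print(accepts_gzip("gzip, deflate"))
```

A header such as `gzip;q=0` explicitly refuses gzip, which is why the q-value is checked instead of merely looking for the word.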
The negotiation mechanism is based on a complex algorithm, which is illustrated here by some examples.
|File name|Negotiation behavior|
|name.html|no negotiation occurs|
|name.html.gz|negotiation is forced by over-suffixing the name|
|name.html.fr|language is preemptive over encoding|
|name.html.en.gz|choice is on second language adding encoding|
||no match on basename|
|name.fr.html|negotiation is likely to always succeed|
We'll take the case of a website in a shared hosting environment with Apache 1.x. In general, the only option left to users for changing server behavior is an .htaccess file. If that is not allowed, you had better look for another hosting provider…
This is the minimum set of directives for negotiation to happen:
Options +MultiViews
AddEncoding x-gzip .gz
AddType text/html .gz
If your website is multilingual, you'll add language declarations to these. For example, for a French and English website:
AddLanguage fr .fr
AddLanguage en .en
LanguagePriority fr en
In the stylesheet directory:
In the script directory:
As a safe practice, first check in separate test directories that .htaccess files are working.
In the request example above, the Accept-Encoding field had two values: gzip and deflate. The first value corresponds to a well-known compression format, recognized by 90% of current browsers. The second value corresponds to a newer format, which is not as ubiquitous yet.
Compression is the operation by which an equivalent but lighter version of a file is made. This shrunken version travels faster through networks, so it is available sooner to the browser. Decompression is the operation by which a compressed file is transformed back into the original file. Done internally by the browser, this operation is automatic and nearly instantaneous.
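The round trip can be tried out with a few lines of Python and its standard gzip module. This is only an illustrative sketch: the sample markup is made up, and the size reduction on real pages depends on their content.

```python
import gzip

# Made-up, highly repetitive sample markup (an assumption for illustration).
original = ("<p>Hello, compressed world!</p>\n" * 200).encode("utf-8")

compressed = gzip.compress(original)    # what the server would send
restored = gzip.decompress(compressed)  # what the browser does on arrival

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip intact:", restored == original)
```

Text formats such as HTML, stylesheets and scripts tend to contain a lot of repetition, which is what makes them good candidates.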
A server can compress files on demand. For gzip compression, this implies the use of an additional module (mod_gzip), which is rarely installed in shared hosting environments. In the case of deflate, Apache server version 2 includes the mod_deflate module as a standard feature: compression is then automatic if activated. But this Apache version is still new.
Whatever the format, negotiation procedures remain the same.
To make up for the absence of these modules, files may be pre-compressed. The server can then free the resources that it would otherwise have dedicated to compression. These resources are valuable when the server is under heavy load, as users are waiting at the other end.
When should files be compressed? It depends… Of course, all situations are different. However, there are some general guidelines:
There is a good reason behind this content negotiation/compression topic: both technologies are applied here. All pages which can be compressed, whether HTML pages, stylesheets or scripts, are available in both normal and compressed forms. There are many text documents, some of them very large (over 600 KB). All these characteristics make a good case for compression.
But this solution requires a lot of work. You have to rename and compress files, upload them to the server, and do it all again whenever they are updated. Should we forget such a promising solution because of these drawbacks? Of course not. Everything can turn out right with a little automation.
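As an illustration of what such automation can look like, here is a minimal Python sketch that writes a .gz copy next to each text file in a directory, and only when the copy is missing or older than its source. It is not the nGzip script itself (which is written in PHP); the function names and the list of suffixes are assumptions chosen for the example.

```python
import gzip
import os
import shutil

TEXT_SUFFIXES = (".html", ".css", ".js")  # assumed list of compressible types


def needs_update(source, target):
    """True if the .gz copy is missing or older than its source file."""
    return (not os.path.exists(target)
            or os.path.getmtime(target) < os.path.getmtime(source))


def precompress(directory):
    """Write name.ext.gz next to each matching file; return the paths written."""
    written = []
    for entry in sorted(os.listdir(directory)):
        source = os.path.join(directory, entry)
        if not os.path.isfile(source) or not entry.endswith(TEXT_SUFFIXES):
            continue
        target = source + ".gz"
        if needs_update(source, target):
            # The original file is only read, never modified or deleted.
            with open(source, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling `precompress("some/directory")` a second time without touching the sources writes nothing, so the pass can be repeated safely after each update.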
Decompress the archive, then upload the file ngzip.php to your website. The script can be installed anywhere. However, for security reasons, it should be placed in a restricted-access directory, as it could reveal some details about the site structure. Still, it cannot be used directly to modify or delete original files.
Although file processing is quite fast, the script might run out of time when files are numerous and/or large. If such is the case, process the directory content in several passes. File date and size computation may also consume some milliseconds.
If you are using Firefox, there is an extension, Web Developer, with an option for showing response headers. Some free services let you check whether pages are compressed, for example Leknor, GidNetwork, and others.
Maybe in a future version:
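The same check can also be scripted. The Python sketch below requests a page while advertising gzip support and then looks at the Content-Encoding response header. The function names are hypothetical, and `fetch_headers` needs network access to whatever URL you pass in.

```python
import urllib.request


def fetch_headers(url):
    """Request url while advertising gzip support; return the response headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


def is_gzip_response(headers):
    """True if Content-Encoding names gzip or x-gzip (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            codings = [c.strip().lower() for c in value.split(",")]
            return "gzip" in codings or "x-gzip" in codings
    return False
```

For instance, `is_gzip_response(fetch_headers("http://example.org/"))` reports whether that page came back compressed; replace the address with one of your own pages.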
The script is available under the GPLv3 License.