Advanced mod_rewrite: FastCGI Ruby on Rails /w HTTPS
Friday, February 7. 2014
mod_rewrite comes handy on number of occasions, but when the rewrite deviates from the most trivial things, understanding how exactly the rules are processed is very very difficult. The documentation is adequate, but the information is spread around number of configuration directives, and it is a challenging task to put it all together.
RewriteRule Order of processing
Apache has following levels of configuration from top to bottom:
- Server level
- Virtual host level
- Directory / Location level
- Filesystem level (.htaccess)
Typically configuration directives have effect from bottom to top. A lower level directive overrides any upper level directive. This also the case with mod_rewrite. A RewriteRule in a .htaccess file is processed first and any rules on upper layers in reverse order of the level. See documentation of RewriteOptions Directive, it clearly says: "Rules inherited from the parent scope are applied after rules specified in the child scope". The rules on the same level are executed from top to bottom in a file. You can think all the Include-directives to be combining a large configuration file, so the order can be determined quite easily.
However, this order of processing rather surprisingly contradicts the most effective order of execution. The technical details documentation of mod_rewrite says:
Unbelievably mod_rewrite provides URL manipulations in per-directory context, i.e., within
.htaccess
files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way because.htaccess
files live in the filesystem, so processing has already reached this stage. In other words: According to the API phases at this time it is too late for any URL manipulations.
This results in a looping approach for any .htaccess rewrite rules. The documentation of RewriteRule Directive PT|passthrough says:
The use of the [PT] flag causes the result of the RewriteRule to be passed back through URL mapping, so that location-based mappings, such as Alias, Redirect, or ScriptAlias, for example, might have a chance to take effect.
and
The PT flag implies the L flag: rewriting will be stopped in order to pass the request to the next phase of processing.
Note that the PT flag is implied in per-directory contexts such as <Directory> sections or in .htaccess files.
What that means:
- L-flag does not stop anything, it especially does not stop RewriteRule processing in .htaccess file.
- All RewriteRules, yes all of them, are being matched over and over again in a .htaccess file. That will result in a forever loop if they keep matching. RewriteCond should be used to stop that.
- RewriteRule with R-flag pointing to the same directory will just make another loop. R-flag can be used to exit looping by redirecting to some other directory.
- When not in .htaccess-context, L-flag and looping does not happen.
So, the morale of all this is that doing any rewriting on .htaccess-level performs really bad and will cause unexpected results in the form of looping.
Case study: Ruby on rails -application
There are following requirements:
- The application is using Ruby on Rails
- Interface for Ruby is mod_fcgid to implement FastCGI
- All non-HTTPS requests should be redirected to HTTPS for security reasons
- There is one exception for that rule, a legacy entry point for status updates must not be forced to HTTPS
- The legacy entry point is using Basic HTTP authentication. It does not work with FastCGI very well.
That does not sound too much, but in practice it is.
Implementation 1 - failure
To get Ruby on Rails application running via FastCGI, there are plenty of examples and other information. Something like this in .htaccess will do the trick:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /dispatch.fcgi/$1 [QSA]
The dispatch.fcgi comes with the RoR-application and mod_rewrite is only needed to make the Front Controller pattern required by the application framework to function properly.
To get the FastCGI (via mod_fcgid) working a simple AddHandler fastcgi-script .fcgi will do the trick.
With these, the application does work. Then there is the HTTPS-part. Hosting-setup allows to edit parts of the virtual host -template, so I added own section of configuration, rest of the file cannot be changed:
<VirtualHost _default_:80 >
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.my.service$ [NC]
RewriteRule ^(.*)$ http://my.service$1 [L,R=301]
</IfModule>RewriteCond %{HTTPS} !=on
RewriteCond %{REQUEST_URI} !^/status/update
RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R=301,QSA,L]</VirtualHost>
The .htaccess file was taken from RoR-application:
# Rule 1:
# Empty request
RewriteRule ^$ index.html [QSA]# Rule 2:
# Append .html to the end.
RewriteRule ^([^.]+)$ $1.html [QSA]# Rule 3:
# All non-files are processed by Ruby-on-Rails
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /dispatch.fcgi/$1 [QSA]
It failed. Mainly because HTTPS rewriting is done too late. There were lot of repetition in the replaced URLs and HTTPS-redirect was the last thing done after /dispatch.fcgi/, so the result looked rather funny and not even close what I was hoping for.
Implementation 2 - success
After the failure I started really studying how the rewrite-mechanism works.
The first thing I did was dropped the HTTPS out of virtual host configuration to the not-so-well-performing .htaccess -level. The next thing I did was got rid of the loop-added dispatch.fcgi/dispatch.fcgi/dispatch.fcgi -addition. During testing I noticed, that I didn't account the Basic authentication in any way.
The resulting .htaccess file is here:
# Rule 0:
# All requests should be HTTPS-encrypted,
# except: message reading and internal RoR-processing
RewriteCond %{HTTPS} !=on
RewriteCond %{REQUEST_URI} !^/status/update
RewriteCond %{REQUEST_URI} !^/dispatch.fcgi
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,QSA,skip=4]# Rule 0.1
# Make sure that any HTTP Basic authorization is transferred to FastCGI env
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]# Rule 1:
# Empty request
RewriteRule ^$ index.html [QSA,skip=1]# Rule 2:
# Append .html to the end, but don't allow this thing to loop multiple times.
RewriteCond %{REQUEST_URI} !\.html$
RewriteRule ^([^.]+)$ $1.html [QSA]# Rule 3:
# All non-files are processed by Ruby-on-Rails
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/dispatch.fcgi
RewriteRule ^(.*)$ /dispatch.fcgi/$1 [QSA]
Now it fills all my requirements and it works!
Testing the thing
Just to develop the thing and make sure all the RewriteRules work as expected and didn't interfere with each other in a bad way I had to take a test-driven approach to it. I created a set of "unit" tests in the form of manually executed wget-requests. There was no automation on it, just simple eyeballing the results. My tests were:
- Test the index-page, must redirect to HTTPS:
- wget http://my.service/
- Test the index-page, no redirects, must display the page:
- wget https://my.service/
- Test the legacy entry point, must not redirect to HTTPS:
- curl --user test:password http://my.service/status/update
- Test an inner page, must redirect to HTTPS of the same page:
- Test an inner page, no redirects, must return the page:
Those tests cover all the functionality defined in the above .htaccess-file.
Logging rewrites
The process of getting all this together would have been impossible without rewrite logging. The caveat is that logging must be defined in virtual host -level. This is what I did:
RewriteLogLevel 3
RewriteLog logs/rewrite.log
Reading the logfile of level 3 is very tedious. The rows are extremely long and all the good parts are at the end. Here is a single line of log split into something humans can read:
1.2.3.219 - -
[06/Feb/2014:14:56:28 +0200]
[my.service/sid#7f433cdb9210]
[rid#7f433d588c28/initial] (3)
[perdir /var/www/my.service/public/]
add path info postfix:
/var/www/my.service/public/status/update.html ->
/var/www/my.service/public/status/update.html/update
It simply reads that in a .htaccess file was being processed and it contained a rule which was applied. The log file clearly shows the order of rules being executed. However, most of the regular expressions are '^(.*)$' , so it is impossible to distinguish the rules from each other simply by reading the log file.
Final words
This is an advanced topic. Most sysadmins and developers don't have to meet the complexity of this magnitude. If do, I'm hoping this helps. It took me quite a while to put all those rules together.