Hacker's ramblings - Entries from March 2014

EPIC4 maildir patch

Sunday, March 30. 2014

I'm on IRC 24/7. For "idling" on my favorite channel I have used EPIC4 for a very long time. A couple of decades, in fact. The project is in bad shape. Anything IRC-related is. For the record: I'll be the last dinosaur to punch the clock and turn off the lights when I notice that I'm idling there alone. That won't happen for another couple of decades, though.

Based on the epicsol.org website, there is nobody to contact about EPIC4 bugs: no mailing list anymore (the last one died in 2009), no contact e-mail and no form. So, there literally is nobody I could notify about anything. Writing about it on my own blog is pretty much all I can do for the project.

Back to business... My Linux-box is a mail-host, and whenever something new arrives, it is really nice to get notified about it while doing absolutely nothing on the channel. However, when I stopped using mbox for storing the mail on my box, my favorite IRC-client stopped notifying me. It did not have the code for the more effective Maildir format. It does now.

My stuff is at http://opensource.hqcodeshop.com/EPIC/4/
It contains a 64-bit RPM for Fedora 20 and the .src.rpm if you want to do the build yourself. Note that my version is the latest EPIC4 2.10.4, not the Fedora-boxed 2.10.2.

To start using the Maildir-mode, say:
set mail_type maildir
in your .ircrc-file. The thing relies on the $MAIL environment variable to know where your mail is stored.

Update 31st March 2014:
I actually got hold of Mr. Jeremy Nelson, the author of EPIC4 and EPIC5. He took my patch and said that it will be released in 2.10.5. We had a brief conversation on the #epic channel, and he also said that he is about to publish the EPIC5 project on GitHub.

My patch (epic4-2.10.1-maildir.patch) is as follows:

diff -aur epic4-2.10.1/include/config.h epic4-2.10.1.JT/include/config.h
--- epic4-2.10.1/include/config.h    2006-06-18 20:33:51.000000000 +0300
+++ epic4-2.10.1.JT/include/config.h    2012-08-30 13:22:20.319515332 +0300
@@ -412,7 +412,7 @@
 #define DEFAULT_LOGFILE "irc.log"
 #define DEFAULT_MAIL 2
 #define DEFAULT_MAIL_INTERVAL 60
-/* #define DEFAULT_MAIL_TYPE "mbox" */
+#define DEFAULT_MAIL_TYPE "mbox"
 #define DEFAULT_MAX_RECONNECTS 4
 #define DEFAULT_METRIC_TIME 0
 #define DEFAULT_MODE_STRIPPER 0
diff -aur epic4-2.10.1/include/vars.h epic4-2.10.1.JT/include/vars.h
--- epic4-2.10.1/include/vars.h    2006-06-18 20:33:51.000000000 +0300
+++ epic4-2.10.1.JT/include/vars.h    2012-08-30 13:24:19.719723226 +0300
@@ -93,7 +93,7 @@
     LOG_REWRITE_VAR,
     MAIL_VAR,
     MAIL_INTERVAL_VAR,
-    /* MAIL_TYPE_VAR, */
+    MAIL_TYPE_VAR,
     MANGLE_INBOUND_VAR,
     MANGLE_LOGFILES_VAR,
     MANGLE_OUTBOUND_VAR,
diff -aur epic4-2.10.1/source/mail.c epic4-2.10.1.JT/source/mail.c
--- epic4-2.10.1/source/mail.c    2006-06-18 20:33:51.000000000 +0300
+++ epic4-2.10.1.JT/source/mail.c    2012-08-30 15:25:05.568641118 +0300
@@ -353,7 +353,7 @@
         return 0;
     }

-    maildir_path = malloc_strdup(tmp_maildir_path);
+    maildir_path = malloc_strdup(maildir);
     maildir_last_changed = -1;
     return 1;
}
@@ -375,13 +375,29 @@
 {
     int    count = 0;
     DIR *    dir;
+    Filename     tmp_maildir_path;
+    struct dirent*    dir_data;

-    if ((dir = opendir(maildir_path)))
+    strlcpy(tmp_maildir_path, maildir_path, sizeof(Filename));
+    strlcat(tmp_maildir_path, "/new", sizeof(Filename));
+    if ((dir = opendir(tmp_maildir_path)))
     {
-        while (readdir(dir) != NULL)
-            count++;
+        while ((dir_data = readdir(dir)) != NULL) {
+            if (dir_data->d_name[0] != '.')
+                count++;
+        }
+        closedir(dir);
+    }
+
+    strlcpy(tmp_maildir_path, maildir_path, sizeof(Filename));
+    strlcat(tmp_maildir_path, "/cur", sizeof(Filename));
+    if ((dir = opendir(tmp_maildir_path)))
+    {
+        while ((dir_data = readdir(dir)) != NULL) {
+            if (dir_data->d_name[0] != '.')
+                count++;
+        }
         closedir(dir);
-        count -= 2;    /* Don't count . or .. */
     }

     return count;
@@ -398,6 +414,7 @@
 {
     Stat    sb;
     Stat *    stat_buf;
+    Filename     tmp_maildir_path;

     if (ptr)
         stat_buf = (Stat *)ptr;
@@ -408,8 +425,11 @@
         if (!init_maildir_checking())
             return 0;        /* Can't find maildir */

+    strlcpy(tmp_maildir_path, maildir_path, sizeof(Filename));
+    strlcat(tmp_maildir_path, "/new", sizeof(Filename));
+
     /* If there is no mailbox, there is no mail! */
-    if (stat(maildir_path, stat_buf) == -1)
+    if (stat(tmp_maildir_path, stat_buf) == -1)
         return 0;

     /*
@@ -547,6 +567,10 @@
     update_mail_level2_maildir();
     if (status == 2)
     {
+        Filename     tmp_maildir_path;
+        strlcpy(tmp_maildir_path, maildir_path, sizeof(Filename));
+        strlcat(tmp_maildir_path, "/new", sizeof(Filename));
+
         /* XXX Ew. Evil. Gross. */
         ts.actime = stat_buf.st_atime;
         ts.modtime = stat_buf.st_mtime;
@@ -642,6 +666,27 @@

 void    set_mail_type (const void *stuff)
 {
-    /* EPIC4 cannot switch mailbox types (yet) */
+    const char *    value;
+    struct mail_checker *new_checker;
+    char    old_mailval[16];
+
+    value = (const char *)stuff;
+
+    if (value == NULL)
+        new_checker = NULL;
+    else if (!my_stricmp(value, "MBOX"))
+        new_checker = &mail_types[0];
+    else if (!my_stricmp(value, "MAILDIR"))
+        new_checker = &mail_types[1];
+    else
+    {
+        say("/SET MAIL_TYPE must be MBOX or MAILDIR.");
+        return;
+    }
+
+    snprintf(old_mailval, sizeof(old_mailval), "%d", get_int_var(MAIL_VAR));
+    set_var_value(MAIL_VAR, zero);
+    checkmail = new_checker;
+    set_var_value(MAIL_VAR, old_mailval);
}

diff -aur epic4-2.10.1/source/vars.c epic4-2.10.1.JT/source/vars.c
--- epic4-2.10.1/source/vars.c    2008-03-17 05:42:46.000000000 +0200
+++ epic4-2.10.1.JT/source/vars.c    2012-08-30 13:14:54.801014647 +0300
@@ -194,7 +194,7 @@
     { "LOG_REWRITE",        STR_TYPE_VAR,    0, 0, NULL, NULL, 0, 0 },
     { "MAIL",            INT_TYPE_VAR,    DEFAULT_MAIL, 0, NULL, set_mail, 0, 0 },
     { "MAIL_INTERVAL",        INT_TYPE_VAR,    DEFAULT_MAIL_INTERVAL, 0, NULL, set_mail_interval, 0, 0 },
-    /* { "MAIL_TYPE",            STR_TYPE_VAR,    0, 0, NULL, set_mail_type, 0, 0 }, */
+    { "MAIL_TYPE",            STR_TYPE_VAR,    0, 0, NULL, set_mail_type, 0, 0 },
     { "MANGLE_INBOUND",        STR_TYPE_VAR,    0, 0, NULL, set_mangle_inbound, 0, 0 },
     { "MANGLE_LOGFILES",        STR_TYPE_VAR,    0, 0, NULL, set_mangle_logfiles, 0, 0 },
     { "MANGLE_OUTBOUND",        STR_TYPE_VAR,    0, 0, NULL, set_mangle_outbound, 0, 0 },
@@ -350,7 +350,7 @@
     set_string_var(HIGHLIGHT_CHAR_VAR, DEFAULT_HIGHLIGHT_CHAR);
     set_string_var(LASTLOG_LEVEL_VAR, DEFAULT_LASTLOG_LEVEL);
     set_string_var(LOG_REWRITE_VAR, NULL);
-    /* set_string_var(MAIL_TYPE_VAR, DEFAULT_MAIL_TYPE); */
+    set_string_var(MAIL_TYPE_VAR, DEFAULT_MAIL_TYPE);
     set_string_var(MANGLE_INBOUND_VAR, NULL);
     set_string_var(MANGLE_LOGFILES_VAR, NULL);
     set_string_var(MANGLE_OUTBOUND_VAR, NULL);
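
The counting rule in the mail.c part of the patch above (count every entry in new/ and cur/ whose name does not start with a dot) can be restated as a short Python sketch. This is only an illustration of the rule, not code from EPIC4; the names count_maildir_messages and make_demo_maildir are made up for the example.

```python
import os
import tempfile


def count_maildir_messages(maildir_path):
    """Count entries in new/ and cur/, skipping names that start with a dot."""
    count = 0
    for subdir in ("new", "cur"):
        try:
            entries = os.listdir(os.path.join(maildir_path, subdir))
        except OSError:
            # Like the patch, silently skip a subdirectory that cannot be opened.
            continue
        count += sum(1 for name in entries if not name.startswith("."))
    return count


def make_demo_maildir():
    """Build a throwaway Maildir with three visible messages and one dot-file."""
    root = tempfile.mkdtemp()
    for subdir in ("new", "cur", "tmp"):
        os.mkdir(os.path.join(root, subdir))
    for rel in ("new/msg1", "new/msg2", "cur/msg3", "cur/.hidden"):
        open(os.path.join(root, rel), "w").close()
    return root


if __name__ == "__main__":
    print(count_maildir_messages(make_demo_maildir()))
```

Skipping dot-entries by name is a bit more robust than subtracting 2 from the total, because it does not assume that . and .. are the only hidden entries in the directory.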

by Jari Turkia in Software at 15:34

Fixing inaccurate Windows 7 NTP-client

Saturday, March 29. 2014

I don't have a Windows-domain at home, so the Internet time client (NTP) is on relaxed settings. Your typical Microsoft documentation about NTP will have phrases like: "The default value for domain members is 10. The default value for stand-alone clients and servers is 15" in it. So, it really makes a difference if the computer is in a domain or not.

It is a well-established fact that the hardware clock on your computer is quite inaccurate. On a modern computer, there is no point in using expensive hardware to make the clock run smoothly, as you can always set the time from a reliable clock source on the Internet. That's why NTP was made decades ago: to make sure that everybody has the same time on their boxes.

The real question here is: Why does my Windows 7 clock skew so much? I have set up the internet time, but it still is inaccurate.

As a Linux-guy I love doing my stuff on the command-line. To query the clock skew I run:

w32tm /monitor /computers:-the-NTP-server-

... and it will respond with something like NTP: -0.7900288s offset from local clock. So the local clock is almost 0.8 seconds off from the accurate time source.
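
If you want to track the skew over time, the offset can be pulled out of that output line with a regular expression. A minimal Python sketch, assuming the line looks exactly like the sample quoted above; the name parse_offset is made up for this example.

```python
import re

# Matches e.g. "NTP: -0.7900288s offset from local clock"
OFFSET_RE = re.compile(r"NTP:\s*([+-]?\d+(?:\.\d+)?)s offset from local clock")


def parse_offset(line):
    """Return the offset in seconds as a float, or None if the line does not match."""
    match = OFFSET_RE.search(line)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "NTP: -0.7900288s offset from local clock"
    print(parse_offset(sample))
```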

The initial fix is easy: force it to get the accurate time from the configured time server:

w32tm /resync

But I cannot be doing that all the time. Why can't the computer maintain a well-disciplined clock like I configured it to do? There must be something fishy about that.

A command like:

w32tm /query /status

will say Poll Interval: 10 (1024s), but I cannot confirm that it sends a request every 1024 seconds (or less). It simply does not do that. There is a TechNet article titled Windows Time Service Tools and Settings describing a registry setting called MaxPollInterval, located in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config, but that has no real relevance here. The update mechanism does not obey that setting.

However, Microsoft's knowledge base article 884776, titled How to configure the Windows Time service against a large time offset, gives more insight into the update interval. It describes a registry value called SpecialPollInterval, located in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient, for manual peers. I'm guessing I have a manual peer, whatever that means. I don't have a domain and I did set the server manually. The original value seems to be 604800 seconds, that is 7 days or a week. Whoa! Way too much for me.

While sniffing the network traffic with Wireshark, I can indeed confirm that putting a small value there makes my Windows 7 poll at that interval. I put 10 seconds there, and it seems to work. For any real-life scenario, updating the time every 10 seconds is ridiculous. For a computer on a domain, the value is 3600 seconds, which makes it update every hour. I chose to use that.
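
The numbers above are easy to mix up, because the poll interval values are exponents of two while SpecialPollInterval is plain seconds. A small Python sketch of the arithmetic; the function names are made up for this example.

```python
def poll_exponent_to_seconds(exponent):
    """Poll interval values such as 10 or 15 are log2 of the interval in seconds."""
    return 2 ** exponent


def seconds_to_days(seconds):
    """Convert a plain-seconds value such as SpecialPollInterval to days."""
    return seconds / 86400


if __name__ == "__main__":
    print(poll_exponent_to_seconds(10))   # the "Poll Interval: 10" reported by w32tm
    print(poll_exponent_to_seconds(15))   # the quoted default for stand-alone machines
    print(seconds_to_days(604800))        # the original SpecialPollInterval
    print(604800 // 3600)                 # how many hourly polls fit into that week
```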

Please note that changing the registry value requires a restart of the Windows time client. From the command line:

net stop w32time
net start w32time

will do the trick and start using the newly set registry value. You can also restart the Windows Time service from the GUI.

Now my computer's time seems to stick with reasonable accuracy. I'm still considering purchasing a GPS-time box of my own. They seem to be quite expensive, though.

by Jari Turkia in Windows at 06:51

3 Italy firmware for u-12

Friday, March 28. 2014

I got a comment from Mr. nos_com71 about 3 Italy's firmware for u-12.

The download link is to Mediafire, which definitely is not my favorite place to pick up something I'd like to run on any of my computers. But if you think you can handle it, go for https://www.mediafire.com/?jut00ju7uov988z to get it.

A little bit of FMK:ing revealed that 3 Italy is using a classic V100R001C12SP104 (see the article about 3 Denmark's firmware with exactly the same version). The important thing is that he pointed out that the SSH passwords are stored unencrypted in /var/sshusers.cfg. This is exactly what my version does.

So, those people who are able to use the exploit are able to find out what the SSH-passwords are. A command like
B593cmd.pl "cat /var/sshusers.cfg" will give you an immediate answer, and a regular
ssh admin@-the-IP-here- /bin/sh will do the rest. As I previously stated, the thing is that you need to know the admin password to the web-console and have old enough firmware with the exploit in it to be able to do any of that.

by Jari Turkia in Huawei B593 at 15:41

Wrangling permissions on an enforcing SElinux setup

Saturday, March 22. 2014

Most people don't care much about their Linux-boxes' security. You install it, you run it, you use it and occasionally run some system updates on it. Not me. When I have a box running against the wild wild Net, I absolutely positively plan to make the life of anybody cracking into one of my boxes as difficult as possible (with some usability left for myself). See Mr. Tan's article about Security-Functionality-Usability Trade-Off.

So, my choice is on the Functionality - Security axis with less Ease-of-use. The rationale is that a web application needs to run as safely as possible and can still have ease-of-use in it. The system administrator is a trained professional; he doesn't need the easy-part so much. However, there is a point when things are set up too tight:

Image courtesy of Dilbert by Scott Adams

So, I voluntarily run software designed and implemented by the NSA, SElinux. I even run it in Enforcing-mode, which any even remotely normal system administrator considers totally insane! Any slip-up from the set security policy, even a tiny one, will render things completely useless. Mordac steps in and stuff simply does not work anymore.

On my Fedora-box there was a bug in BIND, the name server, and an update was released to fix it. After running the update, the DNS was gone. As in, it didn't function, it didn't respond to any requests and the service didn't start. All it said was:

# systemctl status named-chroot.service --full
named-chroot.service - Berkeley Internet Name Domain (DNS)
Loaded: loaded (/usr/lib/systemd/system/named-chroot.service; enabled)
Active: failed (Result: timeout)

Any attempt to start the service resulted in a 60 second wait and a failure. Neither the dmesg-log nor BIND's own log had anything about the issue. So I started suspecting an SElinux-permission issue. My standard SElinux debugging always starts with:

cat /var/log/audit/audit.log | audit2allow -m local

... to see if SElinux's audit logger is logging any permission-related denials. Indeed it was:

require {
        type named_conf_t;
        type named_t;
        class dir write;
}

#============= named_t ==============
allow named_t named_conf_t:dir write;

That reads:
A process running in the named_t security context is trying to gain write access to a directory with the named_conf_t security context, but is denied while doing so.
It is obvious that the process in question must be the BIND name server. No other process has the named_t security context. When starting up, the BIND name server was about to write into its own configuration directory, which is a big no-no! When you write, you write only to designated directories, nowhere else (remember: running in enforcing-mode is insanity).

That is definitely a reason for a daemon not to start or to time out while starting. Further investigation showed that Fedora's SElinux policy had also been updated a week ago: selinux-policy-3.12.1-74.19.fc19.

At this point I had all the pieces of the puzzle; it was simply a matter of putting it all together. The recently released SElinux policy had a bug in it, and nobody else was there to fix it for me.

The exact audit-log line is:

type=AVC msg=audit(1395481575.712:15239): avc:
denied { write } for
pid=4046 comm="named" name="named" dev="tmpfs" ino=14899
scontext=system_u:system_r:named_t:s0
tcontext=system_u:object_r:named_conf_t:s0 tclass=dir
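
To see how such a record maps to the rule printed by audit2allow, the record can be picked apart by hand. A minimal Python sketch, assuming the record has the key=value layout shown above; parse_avc and the dictionary keys are names made up for this illustration, and real audit tooling handles far more cases.

```python
import re


def parse_avc(record):
    """Extract permission, source/target types and object class from an AVC denial."""
    text = " ".join(record.split())  # undo the line wrapping
    perms = re.search(r"denied\s+\{\s*([^}]*?)\s*\}", text)
    fields = {key: value.strip('"')
              for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', text)}
    return {
        "permissions": perms.group(1).split() if perms else [],
        # A context reads user:role:type:level; the type is the third part.
        "source_type": fields["scontext"].split(":")[2],
        "target_type": fields["tcontext"].split(":")[2],
        "tclass": fields["tclass"],
        "comm": fields.get("comm"),
        "dev": fields.get("dev"),
    }


def as_allow_rule(avc):
    """Format the parsed denial the way an allow rule is written in a policy module."""
    return "allow {} {}:{} {};".format(
        avc["source_type"], avc["target_type"], avc["tclass"],
        " ".join(avc["permissions"]))


SAMPLE = """type=AVC msg=audit(1395481575.712:15239): avc:
denied { write } for
pid=4046 comm="named" name="named" dev="tmpfs" ino=14899
scontext=system_u:system_r:named_t:s0
tcontext=system_u:object_r:named_conf_t:s0 tclass=dir"""

if __name__ == "__main__":
    print(as_allow_rule(parse_avc(SAMPLE)))
```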

So, my chrooted BIND-daemon was trying to write into a tmpfs. There aren't that many of those in a system. I've even touched the tmpfs-subject earlier when I wrote a systemd-configuration for my own daemon. To find the tmpfs-usage, I ran:

# mount | fgrep tmpfs
tmpfs on /var/named/chroot/run/named type tmpfs

BIND's chroot-environment has one. That is very likely the culprit. That can be confirmed:

# ls -Z /var/named/chroot/run/
drwxrwx---. named named system_u:object_r:named_conf_t:s0 named

Yep! That's it. The directory has an incorrect security context. To compare with the system's non-chrooted one:

# ls -Zd /run/
drwxr-xr-x. root root system_u:object_r:var_run_t:s0 /run/

There is a difference between named_conf_t and var_run_t. You can write temporary files into the latter, but not into the former. The fix is very simple (assuming that you speak fluent SElinux):

semanage fcontext -a -t var_run_t "/var/named/chroot/run(/.*)?"
restorecon -R -v /var/named/chroot/run

What the two commands do:
First, re-declare a better security context for the directory in question, and then start using the new definition. Now my BIND started and was fully operational! Nice.

My investigation went further. I needed to report this to the Fedora-people. I looked into the policy-file /etc/selinux/targeted/contexts/files/file_contexts and found the faulty line in it:

/var/named/chroot/var/run/named.* system_u:object_r:named_var_run_t:s0

That line almost works. The directory in question has only two files in it. One of them even has a matching name. The problem, obviously, is that the other one does not:

# ls -l /var/named/chroot/run/named/
total 8
-rw-r--r--. 1 named named 5 Mar 22 12:02 named.pid
-rw-------. 1 named named 102 Mar 22 12:02 session.key

See Bug 1079636 at Red Hat Bugzilla for further developments with this issue.

by Jari Turkia in Linux at 15:09

Disabling non-disableable Internet Explorer add ons

Friday, March 21. 2014

One day my laptop shut itself down while I was getting a cup of coffee. No big deal, I thought. I'll just plug it into the charger and things will be OK again. It took me by surprise to see that the battery was 80% charged and the laptop had done a "crash landing". Apparently it chose to turn itself off, I'm guessing to avoid overheating.

A couple of weeks later I realized that the machine, while not doing anything, chews about 25% CPU constantly. The natural guess would be a virus scanner, but it turned out to be a process called IEWebSiteLogon.exe:

I had never heard of such an application. Google didn't reveal anything useful, but the process properties revealed that the file was located at C:\Program Files\Lenovo Fingerprint Reader\x86\, so the conclusion is that my fingerprint reader's software is running a piece of software that eats up a lot of CPU-resources to do exactly nothing.

The file name gave me a hint, that it has something to do with Internet Explorer. I was running IE 11:

I opened the add ons manager:

and there it was. My initial idea of disabling the stupid thing didn't pan out: the Disable-button is grayed out. Searching The Net revealed two interesting pieces of information. The first, How to Remove Unneeded Plug-Ins in Internet Explorer by Andy Rathbone from Windows 8 For Dummies, proved to be useless, as it instructs you to disable the add on. The second yielded results: Can't remove Internet Explorer Add-On. It described a way to track down the component by its class ID. Nice, but not nice enough. Somewhere there is a piece of code that attempts to load the missing component. Why not remove the requirement?

The details of the add on are:

Now I had the class ID of {8590886E-EC8C-43C1-A32C-E4C2B0B6395B}. According to SystemLookup.com it is a valid piece of software; they say: "This entry is classified as legitimate". That class ID can be found in my Windows system's registry in the following locations:

HKEY_CLASSES_ROOT\CLSID\
HKEY_CLASSES_ROOT\Wow6432Node\CLSID\
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Wow6432Node\CLSID\
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\
HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Approved Extensions
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Ext\Settings\

The interesting ones are the system setting of Browser Helper Objects and the user setting of Approved Extensions. Removing the helper object will surely disable the add on completely. It is also a good idea to make it a not-approved extension and to un-register the component. All that should give the stupid add on a decisive blow and stop it from wasting my precious CPU-cycles.

The following PowerShell-commands, run with administrator permissions, will do the trick:

Remove-Item -Path "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\{8590886E-EC8C-43C1-A32C-E4C2B0B6395B}"
Remove-Item -Path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Ext\Settings\{8590886E-EC8C-43C1-A32C-E4C2B0B6395B}"
Remove-ItemProperty -Path "HKCU:\Software\Microsoft\Internet Explorer\Approved Extensions" -Name "{8590886E-EC8C-43C1-A32C-E4C2B0B6395B}"

If you don't have admin-permissions, the commands will fail. Also please note that every time Internet Explorer is started, it makes sure that the permissions of the registry key HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Approved Extensions are set so that the user is denied any modification access.

I tried to remove the deny ACL with PowerShell, but it seems to be impossible. The API is not mature enough.

After removing the deny ACL, running the PowerShell-commands and finally restarting Internet Explorer, the add on was gone. I managed to "disable" it completely.

by Jari Turkia in Windows at 06:10

PHP large file uploads

Thursday, March 20. 2014

Here I bumped into a really popular subject. My ownCloud had a really small upload limit of 32 MiB and I was aiming for the 1+ GiB range. The "cloud" is in a tiny box running a 32-bit Linux, so 2 GiB is the absolute maximum for a file that can pass through Apache and PHP. The limits are documented in the ownCloud Administrators Manual - Dealing with Big File Uploads.

Raising the file size limits is something I could do myself. Here is a reference for you: How to Upload Large Files in PHP. It's simply about finding the parameters for the limits and setting them to a bigger value.

I created sample files of different sizes and tested with them. I found out that there is a point after which Apache started the upload, uploaded for a while and exited with an HTTP/500. In my case a 600 MiB file passed OK, but an 800 MiB file did not. I later found out that it wasn't about the file size itself, but the max input time. I had missed that one in my setup.

The max input time is a classic. For example, a conversation with the topic "PHP file upload affected or not by max_input_time?" discusses the issue in detail. The conclusion is that the actual upload speed (or network bandwidth available) has nothing to do with the input processing time, or the maximum value of it. There is a PHP manual page at http://php.net/manual/en/features.file-upload.common-pitfalls.php and it clearly says:

max_input_time sets the maximum time, in seconds, the script is allowed to receive input;
this includes file uploads. For large or multiple files, or users on slower connections,
the default of 60 seconds may be exceeded.

But that simply is not true! In another section of the PHP manual the integer directive max_input_time is defined as:

This sets the maximum time in seconds a script is allowed to parse input data, like POST and GET. Timing begins at the moment PHP is invoked at the server and ends when execution begins.

When is PHP invoked? Let's say you're running Apache. You're actually uploading the file to Apache, which after receiving the file passes control to a handler, PHP in this case. Surely the input processing does not start at the point where the upload starts.

Test setup

The upload is affected by the following PHP configuration directives:

file_uploads: The master switch. This one is rarely disabled, as it makes any file upload processing impossible in PHP.

Changeable: PHP_INI_SYSTEM

upload_max_filesize: Max size of a single file.

Changeable: PHP_INI_PERDIR

post_max_size: Max size of the entire upload batch. An HTTP POST can contain any number of files. In my test only one file is used.

Changeable: PHP_INI_PERDIR

max_input_time: As discussed above, the time to parse the uploaded data and files. This would include populating the $_FILES superglobal.

Changeable: PHP_INI_PERDIR

max_execution_time: The time a script is allowed to run after its input has been parsed. This would include any processing of the file itself.

Changeable: PHP_INI_ALL

memory_limit: The amount of memory a script is allowed to use during its execution. Has absolutely nothing to do with the size of the file uploaded, unless the script loads and processes the file.

Changeable: PHP_INI_ALL

upload_tmp_dir: This is something I threw in based on testing. None of the articles ever mention this one. This defines the exact location where the uploaded file initially goes. If the PHP-script does not move the uploaded file out of this temporary location, the file will be deleted when the script stops executing. Make sure you have enough space in this directory for large files!

Changeable: PHP_INI_SYSTEM

A PHP script cannot change all of the introduced configuration values. The changeable limits are defined as:

PHP_INI_USER: Entry can be set in user scripts (like with ini_set())
PHP_INI_PERDIR: Entry can be set in php.ini, .htaccess, httpd.conf
PHP_INI_SYSTEM: Entry can be set in php.ini or httpd.conf
PHP_INI_ALL: Entry can be set anywhere
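
The modes are bit flags, and PHP_INI_ALL is simply all three bits together. A short Python sketch of that idea (Python used only for illustration; the numeric values 1, 2, 4 and 7 are the ones PHP defines for these constants, and the directive table repeats the list above). A script can use ini_set() on a directive only if its mode includes the PHP_INI_USER bit.

```python
PHP_INI_USER = 1
PHP_INI_PERDIR = 2
PHP_INI_SYSTEM = 4
PHP_INI_ALL = PHP_INI_USER | PHP_INI_PERDIR | PHP_INI_SYSTEM  # 7

# Changeability of the directives discussed in this article.
DIRECTIVES = {
    "file_uploads": PHP_INI_SYSTEM,
    "upload_max_filesize": PHP_INI_PERDIR,
    "post_max_size": PHP_INI_PERDIR,
    "max_input_time": PHP_INI_PERDIR,
    "max_execution_time": PHP_INI_ALL,
    "memory_limit": PHP_INI_ALL,
    "upload_tmp_dir": PHP_INI_SYSTEM,
}


def settable_with_ini_set(name):
    """True if a running script may change the directive with ini_set()."""
    return bool(DIRECTIVES[name] & PHP_INI_USER)


if __name__ == "__main__":
    print(sorted(name for name in DIRECTIVES if settable_with_ini_set(name)))
```

Only max_execution_time and memory_limit pass the check, which is why a test script can set just those two at run time and has to rely on the server configuration for the rest.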

For testing purposes I chose the POST and upload max sizes to be 1 GiB (or 1024 MiB). To test the timeout values, I chose relatively small values of 2 seconds both for input parsing and script execution. Also to prove that memory limit does not limit the file upload, I chose the available memory for the script to be 1 MiB. The memory limit is not an issue, as my script does not touch the file, does not try to load or process it.

My test script carefully enforces the above limits just to make sure that there are no configuration mistakes.

Sample files were generated out of randomness with a command:

dd if=/dev/urandom of=900M bs=1024 count=921600

A number of files of different sizes were used, but since the POST-limit was set to 1 GiB or 1073741824 bytes, it is impossible to upload a file of that same size. There is always some overhead in an HTTP POST-request. So, the maximum file size I successfully used with these parameters was 900 MiB. Interestingly it was the 2 second input processing time which caused problems.

The sample code:

<?php
// Adapted by JaTu 2014 from code published in
// http://stackoverflow.com/questions/11387113/php-file-upload-affected-or-not-by-max-input-time

$iniValues = array(
    'file_uploads' => '1',              // PHP_INI_SYSTEM
    'upload_max_filesize' => '1024M',   // PHP_INI_PERDIR
    'post_max_size' => '1024M',         // PHP_INI_PERDIR
    'max_input_time' => '2',            // PHP_INI_PERDIR
    'max_execution_time' => '2',    // PHP_INI_ALL
    'memory_limit' => '1M',         // PHP_INI_ALL
);
$iniValuesToSet = array('max_execution_time', 'memory_limit');
$upload_max_filesize_inBytes = 1073741824; // 1 GiB

foreach ($iniValues as $variable => $value) {
    $cur = ini_get($variable);
    if ($cur !== $value) {
        if (in_array($variable, $iniValuesToSet)) {
            $prev = ini_set($variable, $value);
            if ($prev === false) {
                // Assume the previous value was not FALSE, but the set failed.
                // None of those variables can reasonably have a boolean value of FALSE anyway.
                die('Failed to ini_set() a value into variable ' . $variable);
            }
        } else {
            // This directive cannot be changed with ini_set(); the server configuration must already be right.
            die('Variable ' . $variable . ' is ' . var_export($cur, true) . ', expected ' . $value .
                '; it cannot be changed with ini_set().');
        }
    }
}

if (!empty($_FILES) && isset($_FILES['userfile'])) {
    switch ($_FILES['userfile']["error"]) {
    case UPLOAD_ERR_OK:
        $status = 'There is no error, the file uploaded with success.';
        break;
    case UPLOAD_ERR_INI_SIZE:
        $status = 'The uploaded file exceeds the upload_max_filesize directive in php.ini.';
        break;
    case UPLOAD_ERR_FORM_SIZE:
        $status = 'The uploaded file exceeds the MAX_FILE_SIZE directive that was specified in the HTML form.' .
            ' Value is set to: ' . $_POST['MAX_FILE_SIZE'];
        break;
    case UPLOAD_ERR_PARTIAL:
        $status = 'The uploaded file was only partially uploaded.';
        break;
    case UPLOAD_ERR_NO_FILE:
        $status = 'No file was uploaded.';
        break;
    case UPLOAD_ERR_NO_TMP_DIR:
        $status = 'Missing a temporary folder.';
        break;
    case UPLOAD_ERR_CANT_WRITE:
        $status = 'Failed to write file to disk.';
        break;
    case UPLOAD_ERR_EXTENSION:
        $status = 'A PHP extension stopped the file upload. PHP does not provide a way to ascertain which extension caused the file upload to stop; examining the list of loaded extensions with phpinfo() may help.';
        break;
    default:
        $status = 'No idea. Huh?';
    }

    print "Status: {$status}<br/>\n";
    print '<pre>';
    var_dump($_FILES);
    print '</pre>';
}
?>
<form enctype="multipart/form-data" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="<?php print $upload_max_filesize_inBytes ?>" />
File: <input name="userfile" type="file" />
<input type="submit" value="Upload" />
</form>

Test 1: PHP 5.5.10 / Apache 2.4.7

This is a basic Fedora 19 box with standard packages installed. PHP reports Server API as Apache 2.0 Handler.

To get the required setup done I had a .htaccess-file with the following contents:

php_value upload_max_filesize "1024M"
php_value post_max_size "1024M"
php_value max_input_time 2

I used time-command from bash-shell combined with a cURL-request like this:

curl --include --form userfile=@800M http://the.box/php/upload.php

Timing results would be:

real    0m7.595s
user    0m1.044s
sys     0m3.259s

That is 7.6 seconds of wall-clock time to upload an 800 MiB file. The time includes any transfer over my LAN and processing done on the other side. No failures were recorded for the 2 second time limits or memory limits.

Errors would include:

PHP Warning: POST Content-Length of 1073742140 bytes exceeds the limit of 1073741824 bytes in Unknown on line 0

When POST-limit was exceeded

PHP Fatal error: Maximum execution time of 2 seconds exceeded in Unknown on line 0

When input processing took too long
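
The sizes involved are easy to check with a few lines of arithmetic. A minimal Python sketch; it assumes that the upload behind the POST Content-Length warning above was a file of exactly 1 GiB, which the warning itself does not state, and the function names are made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def dd_size(block_size, count):
    """Size in bytes of a file written with dd bs=<block_size> count=<count>."""
    return block_size * count


def multipart_overhead(content_length, file_size):
    """Bytes in the POST body that are not file data (boundaries, part headers)."""
    return content_length - file_size


if __name__ == "__main__":
    print(dd_size(1024, 921600) // MIB)           # size of the "900M" sample in MiB
    print(1024 * MIB)                             # post_max_size of 1024M in bytes
    print(multipart_overhead(1073742140, GIB))    # assumes a file of exactly 1 GiB
```

Under that assumption the multipart wrapping costs only a few hundred bytes, which is still enough to push a file of exactly the limit's size over post_max_size.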

If you are about to go over the 2 GiB limit, please see the LimitRequestBody configuration directive for Apache. It is 0 by default, which the docs describe as unlimited. The exact wording from the docs is: "This directive specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2GB) that are allowed in a request body". However, according to a number of sources, "unlimited" in practice means 2 GiB. To go over that, you may need to compile Apache yourself.

Warning!
Apache paired with PHP was especially difficult in situations where an HTTP/500 would occur for any reason. The temporary file would NOT be cleaned up immediately after the PHP-script died. The cleaning would occur at the point where the Apache worker process was recycled. Sometimes my temp-drive ran out of disc space and I had to manually trigger an Apache service restart to free up the space. So if you're in the server exploiting business and manage to find a server that allows large file uploads, it is possible to exhaust its disc space by simply uploading very large files repeatedly. When an upload fails, the space is not immediately freed.

Test 2: PHP 5.4.26 / Nginx 1.4.6

To confirm that this is not an Apache-thing or limited to latest version of PHP, I did a second run with a different setup. I took my trustworthy Nginx equipped with PHP-FPM running on a virtualized CentOS. This time I didn't use standard components and used only packages compiled and tailored for my own web server. PHP reports Server API as FPM/FastCGI.

My /etc/php-fpm.d/www.conf had:

php_admin_value[upload_max_filesize] = "1024M"
php_admin_value[post_max_size] = "1024M"
php_admin_value[max_input_time] = "2"
php_admin_value[max_execution_time] = 2
php_admin_value[memory_limit] = 1M

PHP's own ini_set()-function was unable to set any of the values, including those it was allowed to change. I didn't investigate the reason for that and chose to declare all of the required settings in the worker definition.

To get large POSTs into Nginx, my /etc/nginx/nginx.conf had:

location ~ \.php$ {
client_max_body_size 1024M;
}

Timing results would be:

real    0m16.170s
user    0m1.060s
sys     0m2.854s

That is about 16.2 seconds of wallclock time to upload an 800 MiB file. The time includes the transfer over my LAN and the processing done on the other side. No failures were recorded for the 2 second time limits or the memory limit.

Errors would include:

413 Request Entity Too Large

On the browser end

*22 client intended to send too large body: 838861118 bytes

On the Nginx error log

If max POST size was hit.
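The figures above can be turned into a rough throughput number with a little arithmetic. The following Python sketch uses only numbers quoted in this post. It assumes that the request rejected in the Nginx error log carried the same 800 MiB test file, so whatever remains of the body size would be framing overhead of the POST. That attribution is a guess, not something that was measured.

```python
# Figures quoted in the post; nothing here is newly measured.
FILE_MIB = 800            # size of the uploaded test file in MiB
WALL_SECONDS = 16.170     # the "real" figure from the timing results
BODY_BYTES = 838861118    # request body size from the Nginx error log

MIB = 1024 * 1024

file_bytes = FILE_MIB * MIB
throughput_mib_s = FILE_MIB / WALL_SECONDS
throughput_mbit_s = file_bytes * 8 / WALL_SECONDS / 1_000_000

# Assumption: the rejected request carried the same 800 MiB file,
# so the remainder is overhead added around the file in the POST body.
overhead_bytes = BODY_BYTES - file_bytes

print(f"throughput: {throughput_mib_s:.1f} MiB/s ({throughput_mbit_s:.0f} Mbit/s)")
print(f"request body overhead: {overhead_bytes} bytes")
```

That works out to roughly 49.5 MiB/s, or about 415 Mbit/s, and the body is only 318 bytes larger than the file itself.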

Conclusions

As others on the net have found, max_input_time and max_execution_time have nothing to do with the network transfer. Both of those limits affect only the server's processing after the bytes have been transferred.

by Jari Turkia in Programming at 06:39

Trivial mod_rewrite: Redirect to another file in the same directory

Wednesday, March 19. 2014

I found a funny quote at Htaccess Rewrites - Rewrite Tricks and Tips, it says:

``Despite the tons of examples and docs, mod_rewrite is voodoo. Damned cool voodoo, but still voodoo. ''

-- Brian Moore
bem@news.cmc.net

The quote is originally from http://httpd.apache.org/docs/2.0/rewrite/, the now obsolete documentation for an old Apache version.

I'll have to second Brian's opinion. I've touched the subject earlier at Advanced mod_rewrite: FastCGI Ruby on Rails /w HTTPS.

My YUM-repo definition RPM had a bug in it (see: CentOS 6 PHP 5.4 and 5.5 yum repository) and I had to release a new version of it. A couple of links to the file already exist. Why didn't I think of a situation where an update is released? Darn! So, let's keep the URL alive, even if a new version of the file with a different name is released. That way everybody stays happy.

Attempt 1: Failure

An over-enthusiastic "hey, that should be simple!" -type of naive solution. Create a .htaccess-file in the appropriate directory with the content:

RedirectPermanent oldname.rpm newname.rpm

Well ... no. The result is an HTTP/500 and in the error log there was:

/home/the/entire/path/here/.htaccess: Redirect to non-URL

Ok. It didn't work.

Attempt 2: Failure

Let's ramp this up. Forget the simple tools, hit it with mod_rewrite! Make .htaccess contain:

RewriteEngine on
RewriteRule ^oldname\.rpm$ newname.rpm [R=301]

Well ... no. The result is an HTTP/404, because the redirect goes really wrong. The redirect target will be http://my.server.name/home/the/entire/path/here/newname.rpm, which is pretty far from being correct. It is a funny mix of the URL and the actual filesystem storage path.

The reason can be found from the Apache docs at RewriteRule PT-flag:

"The target (or substitution string) in a RewriteRule is assumed to be a file path, by default. The use of the [PT] flag causes it to be treated as a URI instead."
and
"Note that the PT flag is implied in per-directory contexts such as <Directory> sections or in .htaccess files."

That phrase can be translated as:

Internally RewriteRule works with filesystem paths
When using RewriteRule from a .htaccess-file it does not use filesystem paths, but URLs
A .htaccess-file really messes things up

Something more elegant is obviously needed.

Attempt 3: Failure

I studied the Apache docs and found a perfect solution! What if there was a way to discard the filesystem path entirely? Nice! Let's go that way, make .htaccess contain:

RewriteEngine on
RewriteRule ^oldname\.rpm$ newname.rpm [R=301,DPI]

Well ... no. I have the DiscardPathInfo-flag there, but it changes absolutely nothing. It is the same with or without the flag. The docs clearly say that "The DPI flag causes the PATH_INFO portion of the rewritten URI to be discarded". Apparently the flag is used for a completely different thing (which I'm having a hard time comprehending), and the point is that I cannot use it to fix my redirect.

Attempt 4: Success!

After browsing the Apache-docs even more I struck gold. The docs for RewriteBase-directive say:

"This directive is required when you use a relative path in a substitution in per-directory (htaccess) context"
and
"This misconfiguration would normally cause the server to look for an "opt" directory under the document root."

That's exactly what I'm doing here. I have a relative path. I'm using a substitution in a .htaccess-file. It even mis-behaves precisely like in the example from the docs.

The solution is to make .htaccess contain:

RewriteEngine on
RewriteBase /path/here/
RewriteRule ^oldname\.rpm$ newname.rpm [R=301]

Now it works exactly as I want it to do! Nice!
When a request is made for the old filename, Apache will do an external redirect and notify the browser about the new version. By default wget saves the file with the old name instead of the new one, but for example Firefox uses the new name correctly.
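The difference between attempts 2 and 4 can be illustrated with a toy model in Python. It is a simplification, not Apache's actual algorithm: the function name and its parameters are made up for illustration, and it only captures the one behaviour seen above, namely that a relative substitution in a .htaccess-file gets the directory's filesystem path put in front of it unless RewriteBase supplies a URL-path instead.

```python
import re

def per_directory_redirect(filename, pattern, substitution,
                           directory_path, rewrite_base=None):
    """Toy model: where does a relative substitution end up?"""
    if re.search(pattern, filename) is None:
        return None                      # rule does not match, no redirect
    if substitution.startswith("/"):
        return substitution              # already an absolute URL-path
    # Relative substitution: a prefix is put back in front of it.
    # Without RewriteBase the prefix is the filesystem directory.
    prefix = rewrite_base if rewrite_base is not None else directory_path
    return prefix.rstrip("/") + "/" + substitution

# Attempt 2: no RewriteBase, the filesystem path leaks into the URL.
without_base = per_directory_redirect(
    "oldname.rpm", r"^oldname\.rpm$", "newname.rpm",
    directory_path="/home/the/entire/path/here/")

# Attempt 4: RewriteBase /path/here/ supplies the URL-path.
with_base = per_directory_redirect(
    "oldname.rpm", r"^oldname\.rpm$", "newname.rpm",
    directory_path="/home/the/entire/path/here/",
    rewrite_base="/path/here/")

print(without_base)
print(with_base)
```

In this model the first call gives the broken /home/the/entire/path/here/newname.rpm from attempt 2 and the second gives /path/here/newname.rpm, which is what the browser should be redirected to.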

Conclusion

Darn that voodoo is hard.

The mod_rewrite's complexity simply laughs at any system administrator. I consider myself to be one of the experienced ones, but still ... I find myself struggling with the subject.

by Jari Turkia in Software at 17:38

Cisco ASA protected SSH-connection hangs - [Fixed]

Thursday, March 13. 2014

A couple of my users were complaining that their SSH-connection dies when idling for a while. The client does not notice that the server is gone. It cannot regain communications and dies only after a transmission is attempted, failed and timed out.

My initial reaction was that a firewall disconnects any "non-used" TCP-connections. The non-used may or may not be true, but the firewall thinks so, and since it can make the decision, it disconnects the socket. There is one catch: if the TCP-socket is truly disconnected, both the server and the client should notice that and properly end the SSH-session. In this case they don't. For those readers not familiar with the details of TCP/IP, see the state transition diagram and think of a half-closed connection as being ESTABLISHED, but unable to move into FIN_WAIT_1 because the firewall is blocking all communications.

Googling got me to read a discussion thread @ Cisco's support forums titled SSH connections through asa hanging. There Mr. Julio Carvaja asks the original poster a question: "Can you check the Timeout configuration on your firewall and also the MPF setup. What's the Idle time you have configured for a TCP session?" So I did the same. I went to the box and on IOS CLI ran the classic show running-config, which contained the timeout values:

timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02

From that I deduce that any TCP-connection is dropped after one hour of idling. It is moved into half-closed state after 10 minutes of idling. The 10 minutes is in the time range of my user complaints. One hour is not. So essentially the Cisco ASA renders the TCP-connection unusable and unable to continue transmitting data.

In the discussion forum there is a suggestion to either prolong the timeout or enable SSH keepalive. I found the way of defining a policy for SSH in the ASA. There is an article titled ASA 8.3 and Later: Set SSH/Telnet/HTTP Connection Timeout using MPF Configuration Example, which describes the procedure in detail.

However, I chose not to do that, but to employ keepalive-packets on my OpenSSH sshd instead. I studied my default configuration at /etc/ssh/sshd_config and deduced that keepalive is not in use. In the man-page of sshd_config(5) I can find 3 essentially required configuration directives:

TCPKeepAlive: Controls whether the operating system sends TCP-level keepalive messages on the connection.

This is on by default, but it works at the TCP level and is separate from the SSH-level client alive messages below, so this alone does not dictate if those will be sent or not

ClientAliveInterval: The idle time [in seconds] after which sshd sends a message through the encrypted channel to request a response from the client

As default, this is 0 seconds, meaning that no such messages will be sent.

ClientAliveCountMax: The number of client alive messages that may go unanswered before sshd declares the connection dead

As default this is 3. Still, with the interval at 0 no messages are ever sent, thus a client is never declared M.I.A. based on this criterion.

So to fix the failing SSH-session problem, the only thing I changed was to set a client alive interval. Since after 10 minutes of idling (600 seconds), the Cisco ASA will mess up the connection, I chose half of that, 300 seconds.
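The reasoning behind the 300 seconds can be written down as a small Python sketch. The function names are made up for illustration, and the "half of the firewall timeout" rule is simply the choice made above, not a recommendation from any documentation. The last figure relies on sshd dropping a client once ClientAliveCountMax consecutive messages have gone unanswered.

```python
def hms_to_seconds(text):
    """Convert an ASA-style h:mm:ss timeout string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def keepalive_plan(firewall_timeout, count_max=3):
    """Pick a client alive interval at half of the firewall timeout."""
    limit = hms_to_seconds(firewall_timeout)
    interval = limit // 2
    return {
        "ClientAliveInterval": interval,
        "ClientAliveCountMax": count_max,
        # An unresponsive client is dropped after roughly this long.
        "seconds_until_dead_client_dropped": interval * count_max,
    }

# The 10 minute value from the ASA timeout line quoted earlier.
plan = keepalive_plan("0:10:00")
print(plan)
```

With the values from this post the interval comes out as 300 seconds, and a client that has truly gone away would be disconnected after about 900 seconds.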

After restarting the sshd, opening a connection and idling for 5 minutes while snooping the transmission with Wireshark, I found out that my SSH server and client exchanged data after every 300 seconds. The best thing about the fix is that it works! It solves the problem and the SSH-connection stays functional after a long period of idling.

by Jari Turkia in Hardware at 20:05

Using own SSL certificate in Cisco ASA

Tuesday, March 11. 2014

Yesterday I was mighty pissed about Oracle's Java breaking my stuff. Then it occurred to me: I shouldn't be using self-signed certificates in the first place! See my post about Certificate Authority setup: Doing it right with OpenSSL, where I wrote "My personal recommendation is to never use self-signed certificates for anything". And there I was! Darn.

I figured that there must be a way to stop the stupidity and install my own certificate on the network appliance. Then I bumped into ASA 8.x: Renew and Install the SSL Certificate with ASDM, a PDF-document from Cisco instructing how to achieve that. Nice! Exactly what I needed.

This is how to do it. Log into ASDM and go to Configuration -> Device Management -> Certificate Management -> Identity Certificates. It looks like this:

There you can find Add-button:

You should add a new identity certificate. I used the Default-RSA-Key, but Cisco's own documentation says to generate a new one. In case the SSH-keys need to be regenerated, the SSL-certificate won't work anymore. In my case I can simply recreate the certificate as well, so it is not an issue for me. After you click Add Certificate:

You will get a dialog to save the Certificate Signing Request (CSR) into a local drive to be sent to a Certification Authority (CA) to be processed.

After your CSR has gone through and you have your certificate in PEM-format, go back to ASDM and select Install this time. You will get a dialog:

Upload or copy/paste the PEM certificate there and click Install Certificate. After that you'll have an identity:

Now the next thing to do is to start using the newly created identity. In Configuration -> Device Management -> Advanced -> SSL Settings there is an option to choose an identity to use when ASDM is being accessed via HTTPS or ASDM-IDM.

To get better results from the Qualys SSL Labs server test I made the following changes:

SSL-version is set to TLS V1 Only, that is TLS 1.0 only. Neither 1.1 nor 1.2 is available.
For encryption I'm only using:

112-bit 3DES-SHA1
128-bit RC4-SHA1
128-bit RC4-MD5

The AES-encryptions for 128-bit or 256-bit failed on my box for some reason. If you have them, please use those! The chosen 3 crypto algos provide reasonable security, but the AES-ones are better.

After an Apply the new certificate is in use. You can verify that via a web browser from HTTPS-interface or go to Control Panel's Java security settings and remove the self-signed certificate from secure site certificates -list. The ASDM-login will work again.

by Jari Turkia in Hardware at 13:13

Java 1.7 update 51 breaking Cisco ASDM login

Monday, March 10. 2014

One day I needed to drill a hole in a Cisco firewall. I went to Adaptive Security Device Manager and could not log in. Whaat?!

It did work before, but apparently something changed. A sneak peek with Wireshark revealed that the SSL handshake failed. The Java console had something like this in it:

java.lang.SecurityException: Missing required Permissions manifest attribute in main jar: https://dm-launcher.jar
    at com.sun.deploy.security.DeployManifestChecker.verifyMainJar(Unknown Source)
    at com.sun.deploy.security.DeployManifestChecker.verifyMainJar(Unknown Source)
    at com.sun.javaws.Launcher.doLaunchApp(Unknown Source)
    at com.sun.javaws.Launcher.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

and:

javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Java couldn't trust Server
    at sun.security.ssl.Alerts.getSSLException(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.fatal(Unknown Source)
    at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
    at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
    at sun.security.ssl.ClientHandshaker.serverCertificate(Unknown Source)
    at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source)

A little bit of googling revealed Issues Accessing ASDM at Cisco's learning network and Cisco ASDM blocked by Java? at spiceworks.com. So I wasn't alone with the problem. Oracle's release notes for update 51 revealed a number of changes compared to earlier versions. Java is still a piece of shit, but they're trying to fix it. Too little, too late. It is very unfortunate that I have to have the Java Runtime installed and use it for a number of important applications. Now Oracle is making radical changes to the JRE to improve its flaky security, and customer companies like Cisco cannot keep up with the changes.

Anyway, enough rant, here is how to fix it. The idea is to take the self-signed certificate from the Cisco firewall and import it for Java. This is yet another nice feature of a Windows-computer: there needs to be a separate certificate store for the operating system, the browser and Java.

First go to the web-interface of the Cisco appliance. Internet Explorer cannot export a certificate from a web site, so use Firefox or Chrome or pretty much any other browser. Save the certificate to a file. Like this:

When you have the file, go to Control Panel on Windows:

Select Java and Security-tab:

From there you can find Manage Certificates. Import the certificate-file from there:

It is very, very important that you first select Certificate Type: Secure Site. Any other certificate type won't fix the problem.

On the security-tab there is an exception list for certificates. Adding an exception won't fix this, since the problem is with the fact that the certificate is self-signed.

Now login works again.
When I first encountered this issue, I asked a couple of guys who are very familiar with Cisco IOS (not Apple iOS) for help. The initial response was "What is ASDM?" Apparently the GUI is not the expert's way to go.

by Jari Turkia in Software at 21:17

Firefox per dir save

Sunday, March 9. 2014

Once upon a time there was a Firefox version which remembered where something was saved from a website. I think the last download destination directory was stored per host. I clearly remember that when I downloaded something from SourceForge, it would remember the directory, but since there are a number of projects I download from there, it wasn't always the correct one. But for websites I use for only one piece of software, it was always correct.

Then something changed. My Firefox wouldn't remember my destinations anymore.

Now for some nostalgic reason I wanted the functionality back. My loyal/hated aide Google found me a solution for that. A Mozilla support forum discussion named Changing FF25 download location in Win7 - Browse will not get past /User folder. It clearly states that such a setting exists, but is now off by default. When I went to my about:config, it looked like this:

Like the support forum promised, the configuration directive is hidden. They say hidden, but actually it is not created at all. Luckily you can add it by yourself by right clicking on the Firefox configuration:

Just add a new boolean variable with the name browser.download.lastDir.savePerSite and you're good to go. Remember to set the value to true. It is enabled immediately, no restarting or anything needed.

The last thing for me to do is keep wondering why it was enabled in a couple of versions when it was introduced and then turned off.

by Jari Turkia in Software at 17:21

PHP bashing: fractal of bad design?

Wednesday, March 5. 2014

I was inspecting the already fallen Bitcoin exchange Mt. Gox's source code at PasteBin and found a nice discussion thread about it at Hacker News with the title of Alleged Mt.Gox code leaked on IRC node by Russian Hacker. To my great surprise, the Bitcoin exchange was written with PHP by Mark Karpeles (aka. MagicalTux), the CEO/owner of Mt. Gox. According to a number of sources, he thinks he is some kind of PHP-deity. Based on his (alleged) work posted to PasteBin, I'd say no.

Anyway, I got lost reading Mr. Alex Munroe's (aka. veekun) rant about PHP. It is such good stuff that I'll have to copy/paste his entire very long blog post here as-is, just to make sure it is not lost in the digital world. To repeat: I did not write this. This does not reflect my opinions. It is just a comprehensive list of all the weird things that bug Mr. Munroe. All rights for this are his, not mine. And a lot of thanks to him for gathering this huge list.

Here goes:

Preface

I’m cranky. I complain about a lot of things. There’s a lot in the world of technology I don’t like, and that’s really to be expected—programming is a hilariously young discipline, and none of us have the slightest clue what we’re doing. Combine with Sturgeon’s Law, and I have a lifetime’s worth of stuff to gripe about.

This is not the same. PHP is not merely awkward to use, or ill-suited for what I want, or suboptimal, or against my religion. I can tell you all manner of good things about languages I avoid, and all manner of bad things about languages I enjoy. Go on, ask! It makes for interesting conversation.

PHP is the lone exception. Virtually every feature in PHP is broken somehow. The language, the framework, the ecosystem, are all just bad. And I can’t even point out any single damning thing, because the damage is so systemic. Every time I try to compile a list of PHP gripes, I get stuck in this depth-first search discovering more and more appalling trivia. (Hence, fractal.)

PHP is an embarrassment, a blight upon my craft. It’s so broken, but so lauded by every empowered amateur who’s yet to learn anything else, as to be maddening. It has paltry few redeeming qualities and I would prefer to forget it exists at all.

But I’ve got to get this out of my system. So here goes, one last try.

An analogy

I just blurted this out to Mel to explain my frustration and she insisted that I reproduce it here.

I can’t even say what’s wrong with PHP, because— okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it’s one of those weird tri-headed things. Okay, well, that’s not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don’t have those serrated surfaces; it’s flat and smooth. That’s less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there’s no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what’s the problem with these tools? They’re all I’ve ever used and they work fine!” And the carpenters show you the houses they’ve built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That’s what’s wrong with PHP.

Stance

I assert that the following qualities are important for making a language productive and useful, and PHP violates them with wild abandon. If you can’t agree that these are crucial, well, I can’t imagine how we’ll ever agree on much.

A language must be predictable. It’s a medium for expressing human ideas and having a computer execute them, so it’s critical that a human’s understanding of a program actually be correct.
A language must be consistent. Similar things should look similar, different things different. Knowing part of the language should aid in learning and understanding the rest.
A language must be concise. New languages exist to reduce the boilerplate inherent in old languages. (We could all write machine code.) A language must thus strive to avoid introducing new boilerplate of its own.
A language must be reliable. Languages are tools for solving problems; they should minimize any new problems they introduce. Any “gotchas” are massive distractions.
A language must be debuggable. When something goes wrong, the programmer has to fix it, and we need all the help we can get.

My position is thus:

PHP is full of surprises: mysql_real_escape_string, E_ALL
PHP is inconsistent: strpos, str_rot13
PHP requires boilerplate: error-checking around C API calls, ===
PHP is flaky: ==, foreach ($foo as &$bar)
PHP is opaque: no stack traces by default or for fatals, complex error reporting

I can’t provide a paragraph of commentary for every single issue explaining why it falls into these categories, or this would be endless. I trust the reader to, like, think.

Don’t comment with these things

I’ve been in PHP arguments a lot. I hear a lot of very generic counter-arguments that are really only designed to halt the conversation immediately. Don’t pull these on me, please.

Do not tell me that “good developers can write good code in any language”, or bad developers blah blah. That doesn’t mean anything. A good carpenter can drive in a nail with either a rock or a hammer, but how many carpenters do you see bashing stuff with rocks? Part of what makes a good developer is the ability to choose the tools that work best.
Do not tell me that it’s the developer’s responsibility to memorize a thousand strange exceptions and surprising behaviors. Yes, this is necessary in any system, because computers suck. That doesn’t mean there’s no upper limit for how much zaniness is acceptable in a system. PHP is nothing but exceptions, and it is not okay when wrestling the language takes more effort than actually writing your program. My tools should not create net positive work for me to do.
Do not tell me “that’s how the C API works”. What on Earth is the point of using a high-level language if all it provides are some string helpers and a ton of verbatim C wrappers? Just write C! Here, there’s even a CGI library for it.
Do not tell me “that’s what you get for doing weird things”. If two features exist, someday, someone will find a reason to use them together. And again, this isn’t C; there’s no spec, there’s no need for “undefined behavior”.
Do not tell me that Facebook and Wikipedia are built in PHP. I’m aware! They could also be written in Brainfuck, but as long as there are smart enough people wrangling the things, they can overcome problems with the platform. For all we know, development time could be halved or doubled if these products were written in some other language; this data point alone means nothing.
Ideally, don’t tell me anything! This is my one big shot; if this list doesn’t hurt your opinion of PHP, nothing ever will, so stop arguing with some dude on the Internet and go make a cool website in record time to prove me wrong.

Side observation: I loooove Python. I will also happily talk your ear off complaining about it, if you really want me to. I don’t claim it’s perfect; I’ve just weighed its benefits against its problems and concluded it’s the best fit for things I want to do.

And I have never met a PHP developer who can do the same with PHP. But I’ve bumped into plenty who are quick to apologize for anything and everything PHP does. That mindset is terrifying.

PHP

Core language

CPAN has been called the “standard library of Perl”. That doesn’t say much about Perl’s standard library, but it makes the point that a solid core can build great things.

Philosophy

PHP was originally designed explicitly for non-programmers (and, reading between the lines, non-programs); it has not well escaped its roots. A choice quote from the PHP 2.0 documentation, regarding + and friends doing type conversion:
Once you start having separate operators for each type you start making the language much more complex. ie. you can’t use ‘==’ for stings [sic], you now would use ‘eq’. I don’t see the point, especially for something like PHP where most of the scripts will be rather simple and in most cases written by non-programmers who want a language with a basic logical syntax that doesn’t have too high a learning curve.
PHP is built to keep chugging along at all costs. When faced with either doing something nonsensical or aborting with an error, it will do something nonsensical. Anything is better than nothing.
There’s no clear design philosophy. Early PHP was inspired by Perl; the huge stdlib with “out” params is from C; the OO parts are designed like C++ and Java.
PHP takes vast amounts of inspiration from other languages, yet still manages to be incomprehensible to anyone who knows those languages. (int) looks like C, but int doesn’t exist. Namespaces use \. The new array syntax results in [key => value], unique among every language with hash literals.
Weak typing (i.e., silent automatic conversion between strings/numbers/et al) is so complex that whatever minor programmer effort is saved is by no means worth it.
Little new functionality is implemented as new syntax; most of it is done with functions or things that look like functions. Except for class support, which deserved a slew of new operators and keywords.
Some of the problems listed on this page do have first-party solutions—if you’re willing to pay Zend for fixes to their open-source programming language.
There is a whole lot of action at a distance. Consider this code, taken from the PHP docs somewhere.
```
  @fopen('http://example.com/not-existing-file', 'r');
```
What will it do?
- If PHP was compiled with --disable-url-fopen-wrapper, it won’t work. (Docs don’t say what “won’t work” means; returns null, throws exception?) Note that this flag was removed in PHP 5.2.5.
- If allow_url_fopen is disabled in php.ini, this still won’t work. (How? No idea.)
- Because of the @, the warning about the non-existent file won’t be printed.
- But it will be printed if scream.enabled is set in php.ini.
- Or if scream.enabled is set manually with ini_set.
- But not if the right error_reporting level isn’t set.
- If it is printed, exactly where it goes depends on display_errors, again in php.ini. Or ini_set.
I can’t tell how this innocuous function call will behave without consulting compile-time flags, server-wide configuration, and configuration done in my program. And this is all built in behavior.
The language is full of global and implicit state. mbstring uses a global character set. func_get_arg and friends look like regular functions, but operate on the currently-executing function. Error/exception handling have global defaults. register_tick_function sets a global function to run every tick—what?!
There is no threading support whatsoever. (Not surprising, given the above.) Combined with the lack of built-in fork (mentioned below), this makes parallel programming extremely difficult.
Parts of PHP are practically designed to produce buggy code.
- json_decode returns null for invalid input, even though null is also a perfectly valid object for JSON to decode to—this function is completely unreliable unless you also call json_last_error every time you use it.
- array_search, strpos, and similar functions return 0 if they find the needle at position zero, but false if they don’t find it at all.
Let me expand on that last part a bit.

In C, functions like strpos return -1 if the item isn’t found. If you don’t check for that case and try to use that as an index, you’ll hit junk memory and your program will blow up. (Probably. It’s C. Who the fuck knows. I’m sure there are tools for this, at least.)

In, say, Python, the equivalent .index methods will raise an exception if the item isn’t found. If you don’t check for that case, your program will blow up.

In PHP, these functions return false. If you use FALSE as an index, or do much of anything with it except compare with ===, PHP will silently convert it to 0 for you. Your program will not blow up; it will, instead, do the wrong thing with no warning, unless you remember to include the right boilerplate around every place you use strpos and certain other functions.

This is bad! Programming languages are tools; they’re supposed to work with me. Here, PHP has actively created a subtle trap for me to fall into, and I have to be vigilant even with such mundane things as string operations and equality comparison. PHP is a minefield.
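The Python side of this comparison is easy to check. Below is a minimal sketch; the helper name position_or_none is made up for illustration. Forgetting the "not found" case makes str.index blow up loudly with a ValueError, and handling it has to be spelled out.

```python
def position_or_none(haystack, needle):
    """Return the index of needle in haystack, or None if it is absent."""
    try:
        return haystack.index(needle)
    except ValueError:
        # str.index raises instead of returning a falsy marker,
        # so "found at position 0" and "not found" cannot be mixed up.
        return None

found_at_start = position_or_none("hello", "h")
not_found = position_or_none("hello", "z")

print(found_at_start)
print(not_found)
```

A match at position zero comes back as the integer 0 and a miss as None, so the two cases stay distinct as long as the caller tests with `is None` and not plain truthiness.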

I have heard a great many stories about the PHP interpreter and its developers from a great many places. These are from people who have worked on the PHP core, debugged PHP core, interacted with core developers. Not a single tale has been a compliment.

So I have to fit this in here, because it bears repeating: PHP is a community of amateurs. Very few people designing it, working on it, or writing code in it seem to know what they’re doing. (Oh, dear reader, you are of course a rare exception!) Those who do grow a clue tend to drift away to other platforms, reducing the average competence of the whole. This, right here, is the biggest problem with PHP: it is absolutely the blind leading the blind.

Okay, back to facts.

Operators

== is useless.
- It’s not transitive. "foo" == TRUE, and "foo" == 0… but, of course, TRUE != 0.
- == converts to numbers when possible (123 == "123foo"… although "123" != "123foo"), which means it converts to floats when possible. So large hex strings (like, say, password hashes) may occasionally compare true when they’re not. Even JavaScript doesn’t do this.
- For the same reason, "6" == " 6", "4.2" == "4.20", and "133" == "0133". But note that 133 != 0133, because 0133 is octal. But "0x10" == "16" and "1e3" == "1000"!
- === compares values and type… except with objects, where === is only true if both operands are actually the same object! For objects, == compares both value (of every attribute) and type, which is what === does for every other type. What.
Comparison isn’t much better.
- It’s not even consistent: NULL < -1, and NULL == 0. Sorting is thus nondeterministic; it depends on the order in which the sort algorithm happens to compare elements.
- The comparison operators try to sort arrays, two different ways: first by length, then by elements. If they have the same number of elements but different sets of keys, though, they are uncomparable.
- Objects compare as greater than anything else… except other objects, which they are neither less than nor greater than.
- For a more type-safe ==, we have ===. For a more type-safe <, we have… nothing. "123" < "0124", always, no matter what you do. Casting doesn’t help, either.
Despite the craziness above, and the explicit rejection of Perl’s pairs of string and numeric operators, PHP does not overload +. + is always addition, and . is always concatenation.
The [] indexing operator can also be spelled {}.
[] can be used on any variable, not just strings and arrays. It returns null and issues no warning.
[] cannot slice; it only retrieves individual elements.
foo()[0] is a syntax error. (Fixed in PHP 5.4.)

Unlike (literally!) every other language with a similar operator, ?: is left associative. So this:

  $arg = 'T';
  $vehicle = ( ( $arg == 'B' ) ? 'bus' :
               ( $arg == 'A' ) ? 'airplane' :
               ( $arg == 'T' ) ? 'train' :
               ( $arg == 'C' ) ? 'car' :
               ( $arg == 'H' ) ? 'horse' :
               'feet' );
  echo $vehicle;

prints horse.
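To see where horse comes from, the chain can be unrolled by hand. This is only a sketch: the $step variable is introduced here for illustration, and the first half spells out the left-to-right grouping one stage at a time, while the second half adds the parentheses that give the grouping most people expect.

```php
<?php
$arg = 'T';

// Left-associative grouping: each ternary's result becomes the next condition.
$step = ($arg == 'B') ? 'bus' : ($arg == 'A');   // false
$step = $step ? 'airplane' : ($arg == 'T');      // true
$step = $step ? 'train' : ($arg == 'C');         // 'train'
$step = $step ? 'car' : ($arg == 'H');           // 'car' (any non-empty string is truthy)
$leftGrouped = $step ? 'horse' : 'feet';         // 'horse'

// Explicit right-nesting gives the intended lookup.
$intended = ($arg == 'B') ? 'bus' :
           (($arg == 'A') ? 'airplane' :
           (($arg == 'T') ? 'train' :
           (($arg == 'C') ? 'car' :
           (($arg == 'H') ? 'horse' : 'feet'))));

echo $leftGrouped, "\n", $intended, "\n";
```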

Variables

There is no way to declare a variable. Variables that don’t exist are created with a null value when first used.
Global variables need a global declaration before they can be used. This is a natural consequence of the above, so it would be perfectly reasonable, except that globals can’t even be read without an explicit declaration—PHP will quietly create a local with the same name, instead. I’m not aware of another language with similar scoping issues.
There are no references. What PHP calls references are really aliases; there’s nothing that’s a step back, like Perl’s references, and there’s no pass-by-object identity like in Python.
“Referenceness” infects a variable unlike anything else in the language. PHP is dynamically-typed, so variables generally have no type… except references, which adorn function definitions, variable syntax, and assignment. Once a variable is made a reference (which can happen anywhere), it’s stuck as a reference. There’s no obvious way to detect this and un-referencing requires nuking the variable entirely.
Okay, I lied. There are “SPL types” which also infect variables: $x = new SplBool(true); $x = "foo"; will fail. This is like static typing, you see.
A reference can be taken to a key that doesn’t exist within an undefined variable (which becomes an array). Using a non-existent array normally issues a notice, but this does not.
Constants are defined by a function call taking a string; before that, they don’t exist. (This may actually be a copy of Perl’s use constant behavior.)
Variable names are case-sensitive. Function and class names are not. This includes method names, which makes camelCase a strange choice for naming.
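The global-scoping rule above is easier to see in a small sketch. The variable and function names are hypothetical, and the snippet assumes it runs at the top level of a script so that $counter really is a global.

```php
<?php
$counter = 10; // global scope

function read_without_global() {
    // $counter here names a fresh, unset local; the global is invisible.
    return isset($counter);
}

function read_with_global() {
    global $counter; // explicitly alias the global into this scope
    return $counter;
}

function read_via_superglobal() {
    return $GLOBALS['counter']; // the other way in
}

var_dump(read_without_global(), read_with_global(), read_via_superglobal());
```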

Constructs

array() and a few dozen similar constructs are not functions. array on its own means nothing, $func = "array"; $func(); doesn’t work.
Array unpacking can be done with the list($a, $b) = ... operation. list() is function-like syntax just like array. I don’t know why this wasn’t given real dedicated syntax, or why the name is so obviously confusing.
(int) is obviously designed to look like C, but it’s a single token; there’s nothing called int in the language. Try it: not only does var_dump(int) not work, it throws a parse error because the argument looks like the cast operator.
(integer) is a synonym for (int). There’s also (bool)/(boolean) and (float)/(double)/(real).
There’s an (array) operator for casting to array and an (object) for casting to object. That sounds nuts, but there’s almost a use: you can use (array) to have a function argument that’s either a single item or a list, and treat it identically. Except you can’t do that reliably, because if someone passes a single object, casting it to an array will actually produce an array containing that object’s attributes. (Casting to object performs the reverse operation.)
include() and friends are basically C’s #include: they dump another source file into yours. There is no module system, even for PHP code.
There’s no such thing as a nested or locally-scoped function or class. They’re only global. Including a file dumps its variables into the current function’s scope (and gives the file access to your variables), but dumps functions and classes into global scope.
Appending to an array is done with $foo[] = $bar.
echo is a statement-y kind of thing, not a function.
empty($var) is so extremely not-a-function that anything but a variable, e.g. empty($var || $var2), is a parse error. Why on Earth does the parser need to know about empty? (Fixed in 5.5.)
There’s redundant syntax for blocks: if (...): ... endif;, etc.
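The (array) pitfall mentioned above can be sketched as follows. Point and as_list are hypothetical names made up for this example; the helper is one plausible workaround, not an established idiom.

```php
<?php
class Point {
    public $x = 1;
    public $y = 2;
}

// Hypothetical helper: wrap anything that is not already an array.
function as_list($value) {
    return is_array($value) ? $value : array($value);
}

$fromString = (array) 'one';         // array('one')
$fromObject = (array) new Point();   // array('x' => 1, 'y' => 2), not a one-item list
$wrapped    = as_list(new Point());  // array holding the Point itself

var_dump($fromString, $fromObject, count($wrapped));
```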

Error handling

PHP’s one unique operator is @ (actually borrowed from DOS), which silences errors.
PHP errors don’t provide stack traces. You have to install a handler to generate them. (But you can’t for fatal errors—see below.)
PHP parse errors generally just spew the parse state and nothing more, making a forgotten quote terrible to debug.
PHP’s parser refers to e.g. :: internally as T_PAAMAYIM_NEKUDOTAYIM, and the << operator as T_SL. I say “internally”, but as above, this is what’s shown to the programmer when :: or << appears in the wrong place.
Most error handling is in the form of printing a line to a server log nobody reads and carrying on.
E_STRICT is a thing, but it doesn’t seem to actually prevent much and there’s no documentation on what it actually does.
E_ALL includes all error categories—except E_STRICT. (Fixed in 5.4.)
Weirdly inconsistent about what’s allowed and what isn’t. I don’t know how E_STRICT applies here, but these things are okay:
- Trying to access a non-existent object property, i.e., $foo->x. (warning)
- Using a variable as a function name, or variable name, or class name. (silent)
- Trying to use an undefined constant. (notice)
- Trying to access a property of something that isn’t an object. (notice)
- Trying to use a variable name that doesn’t exist. (notice)
- 2 < "foo" (silent)
- foreach (2 as $foo); (warning)
And these things are not:
- Trying to access a non-existent class constant, i.e., $foo::x. (fatal error)
- Using a constant string as a function name, or variable name, or class name. (parse error)
- Trying to call an undefined function. (fatal error)
- Leaving off a semicolon on the last statement in a block or file. (parse error)
- Using list and various other quasi-builtins as method names. (parse error)
- Subscripting the return value of a function, i.e., foo()[0]. (parse error; okay in 5.4, see above)
There are a good few examples of other weird parse errors elsewhere in this list.
The __toString method can’t throw exceptions. If you try, PHP will… er, throw an exception. (Actually a fatal error, which would be passable, except…)
PHP errors and PHP exceptions are completely different beasts. They don’t seem to interact at all.
- PHP errors (internal ones, and calls to trigger_error) cannot be caught with try/catch.
- Likewise, exceptions do not trigger error handlers installed by set_error_handler.
- Instead, there’s a separate set_exception_handler which handles uncaught exceptions, because wrapping your program’s entry point in a try block is impossible in the mod_php model.
- Fatal errors (e.g., new ClassDoesntExist()) can’t be caught by anything. A lot of fairly innocuous things throw fatal errors, forcibly ending your program for questionable reasons. Shutdown functions still run, but they can’t get a stack trace (they run at top-level), and they can’t easily tell if the program exited due to an error or running to completion.
- Trying to throw an object that isn’t an Exception results in… a fatal error, not an exception.
There is no finally construct, making wrapper code (set handler, run code, unset handler; monkeypatch, run a test, unmonkeypatch) tedious and difficult to write. Despite that OO and exceptions were largely copied from Java, this is deliberate, because finally “doesn’t make much sense in the context of PHP”. Huh? (Fixed in 5.5.)
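One common way to bridge the error/exception split described above is to install a handler that rethrows ordinary errors as ErrorException. This is a minimal sketch assuming PHP 5.3 or later (for the anonymous function); it does nothing for fatal errors, which still bypass the handler.

```php
<?php
// Convert ordinary PHP errors into exceptions so try/catch can see them.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$caught = null;
try {
    trigger_error('something went sideways', E_USER_WARNING);
} catch (ErrorException $e) {
    $caught = $e->getMessage();
}

// Put the previous handler back once the wrapped code has run.
restore_error_handler();

echo $caught, "\n";
```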

Functions

Function calls are apparently rather expensive.
Some built-in functions interact with reference-returning functions in, er, a strange way.
As mentioned elsewhere, a lot of things that look like functions or look like they should be functions are actually language constructs, so nothing that works with functions will work with them.
Function arguments can have “type hints”, which are basically just static typing. But you can’t require that an argument be an int or string or object or other “core” type, even though every builtin function uses this kind of typing, probably because int is not a thing in PHP. (See above about (int).) You also can’t use the special pseudo-type decorations used heavily by builtin functions: mixed, number, or callback. (callable is allowed as of PHP 5.4.)
- As a result, this:
```
  function foo(string $s) {}
  foo("hello world");
```
  produces the error:
```
  PHP Catchable fatal error:  Argument 1 passed to foo()
  must be an instance of string, string given,
  called in...
```
- You may notice that the “type hint” given doesn’t actually have to exist; there is no string class in this program. If you try to use ReflectionParameter::getClass() to examine the type hint dynamically, then it will balk that the class doesn’t exist, making it impossible to actually retrieve the class name.
- A function’s return value can’t be hinted.
Passing the current function’s arguments to another function (dispatch, not uncommon) is done by call_user_func_array('other_function', func_get_args()). But func_get_args throws a fatal error at runtime, complaining that it can’t be a function parameter. How and why is this even a type of error? (Fixed in PHP 5.3.)
Closures require explicitly naming every variable to be closed-over. Why can’t the interpreter figure this out? Kind of hamstrings the whole feature. (Okay, it’s because using a variable ever, at all, creates it unless explicitly told otherwise.)
Closed-over variables are “passed” by the same semantics as other function arguments. That is, arrays and strings etc. will be “passed” to the closure by value. Unless you use &.
Because closed-over variables are effectively automatically-passed arguments and there are no nested scopes, a closure can’t refer to private methods, even if it’s defined inside a class. (Possibly fixed in 5.4? Unclear.)
No named arguments to functions. Actually explicitly rejected by the devs because it “makes for messier code”.
Function arguments with defaults can appear before function arguments without, even though the documentation points out that this is both weird and useless. (So why allow it?)
Extra arguments to a function are ignored (except with builtin functions, which raise an error). Missing arguments are assumed null.
“Variadic” functions require faffing about with func_num_args, func_get_arg, and func_get_args. There’s no syntax for such a thing.
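For reference, the faffing about looks roughly like this. Both function names are hypothetical, and the dispatcher stores func_get_args() in a variable first, which also avoids the pre-5.3 error mentioned above.

```php
<?php
// Hypothetical variadic function: sums however many arguments it receives.
function sum_all() {
    $total = 0;
    foreach (func_get_args() as $n) {
        $total += $n;
    }
    return $total;
}

// Hypothetical dispatcher: forwards its own arguments to sum_all().
function forward_to_sum() {
    $args = func_get_args();
    return call_user_func_array('sum_all', $args);
}

$direct    = sum_all(1, 2, 3);
$forwarded = forward_to_sum(4, 5, 6);

echo $direct, "\n", $forwarded, "\n";
```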

OO

The procedural parts of PHP are designed like C, but the objectional (ho ho) parts are designed like Java. I cannot overemphasize how jarring this is. The class system is designed around the lower-level Java language which is naturally and deliberately more limited than PHP’s contemporaries, and I am baffled.
- I’ve yet to find a global function that even has a capital letter in its name, yet important built-in classes use camelCase method names and have getFoo Java-style accessors.
- Perl, Python, and Ruby all have some concept of “property” access via code; PHP has only the clunky __get and friends. (The documentation inexplicably refers to such special methods as “overloading”.)
- Classes have something like variable declaration (var and const) for class attributes, whereas the procedural part of the language does not.
- Despite the heavy influence from C++/Java, where objects are fairly opaque, PHP often treats objects like fancy hashes—for example, the default behavior of foreach ($obj as $key => $value) is to iterate over every accessible attribute of the object.
Classes are not objects. Any metaprogramming has to refer to them by string name, just like functions.
Built-in types are not objects and (unlike Perl) can in no way be made to look like objects.
instanceof is an operator, despite that classes were a late addition and most of the language is built on functions and function-ish syntax. Java influence? Classes not first-class? (I don’t know if they are.)
- But there is an is_a function. With an optional argument specifying whether to allow the object to actually be a string naming a class.
- get_class is a function; there’s no typeof operator. Likewise is_subclass_of.
- This doesn’t work on builtin types, though (again, int is not a thing). For that, you need is_int etc.
- Also the right-hand side has to be a variable or literal string; it can’t be an expression. That causes… a parse error.
clone is an operator?!
Object attributes are $obj->foo, but class attributes are Class::$foo. ($obj::$foo will try to stringify $obj and use it as a class name.) Class attributes can’t be accessed via objects; the namespaces are completely separate, making class attributes completely useless for polymorphism. Class methods, of course, are exempt from this rule and can be called like any other method. (I am told C++ also does this. C++ is not a good example of fine OO.)
Also, an instance method can still be called statically (Class::method()). If done so from another method, this is treated like a regular method call on the current $this. I think.
new, private, public, protected, static, etc. Trying to win over Java developers? I’m aware this is more personal taste, but I don’t know why this stuff is necessary in a dynamic language—in C++ most of it’s about compilation and compile-time name resolution.
PHP has first-class support for “abstract classes”, which are classes that cannot be instantiated. Code in similar languages achieves this by throwing an exception in the constructor.
Subclasses cannot override private methods. Subclass overrides of public methods can’t even see, let alone call, the superclass’s private methods. Problematic for, say, test mocks.
Methods cannot be named e.g. “list”, because list() is special syntax (not a function) and the parser gets confused. There’s no reason this should be ambiguous, and monkeypatching the class works fine. ($foo->list() is not a syntax error.)
If an exception is thrown while evaluating a constructor’s arguments (e.g., new Foo(bar()) and bar() throws), the constructor won’t be called, but the destructor will be. (This is fixed in PHP 5.3.)
Exceptions in __autoload and destructors cause fatal errors. (Fixed in PHP 5.3.6. So now a destructor might throw an exception literally anywhere, since it’s called the moment the refcount drops to zero. Hmm.)
There are no constructors or destructors. __construct is an initializer, like Python’s __init__. There is no method you can call on a class to allocate memory and create an object.
There is no default initializer. Calling parent::__construct() if the superclass doesn’t define its own __construct is a fatal error.
OO brings with it an iterator interface that parts of the language (e.g., foreach...as) respect, but nothing built-in (like arrays) actually implements the interface. If you want an array iterator, you have to wrap it in an ArrayIterator. There are no built-in ways to chain or slice or otherwise work with iterators as first-class objects.
Interfaces like Iterator reserve a good few unprefixed method names. If you want your class to be iterable (without the default behavior of iterating all of its attributes), but want to use a common method name like key or next or current, well, too bad.
Classes can overload how they convert to strings and how they act when called, but not how they convert to numbers or any other builtin type.
Strings, numbers, and arrays all have a string conversion; the language relies heavily on this. Functions and classes are strings. Yet trying to convert a built-in or user-defined object (even a Closure) to a string causes an error if it doesn’t define __toString. Even echo becomes potentially error-prone.
There is no overloading for equality or ordering.
Static variables inside instance methods are global; they share the same value across all instances of the class.
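A short sketch of that last point, using a hypothetical Counter class:

```php
<?php
class Counter {
    public function hit() {
        static $hits = 0; // one slot for the method, not one per object
        return ++$hits;
    }
}

$a = new Counter();
$b = new Counter();

$first  = $a->hit(); // 1
$second = $b->hit(); // 2, because $b continues the count $a started

echo $first, "\n", $second, "\n";
```

Per-instance state needs an ordinary property instead.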

Standard library

Perl is “some assembly required”. Python is “batteries included”. PHP is “kitchen sink, but it’s from Canada and both faucets are labeled C”.

General

There is no module system. You can compile PHP extensions, but which ones are loaded is specified by php.ini, and your options are for an extension to exist (and inject its contents into your global namespace) or not.
As namespaces are a recent feature, the standard library isn’t broken up at all. There are thousands of functions in the global namespace.
Chunks of the library are wildly inconsistent from one another.
- Underscore versus not: strpos/str_rot13, php_uname/phpversion, base64_encode/urlencode, gettype/get_class
- “to” versus 2: ascii2ebcdic, bin2hex, deg2rad, strtolower, strtotime
- Object+verb versus verb+object: base64_decode, str_shuffle, var_dump versus create_function, recode_string
- Argument order: array_filter($input, $callback) versus array_map($callback, $input), strpos($haystack, $needle) versus array_search($needle, $haystack)
- Prefix confusion: usleep versus microtime
- Case insensitive functions vary on where the i goes in the name.
- About half the array functions actually start with array_. The others do not.
- htmlentities and html_entity_decode are inverses of each other, with completely different naming conventions.
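The argument-order mismatch is easiest to appreciate side by side. A small sketch, assuming PHP 5.3 or later for the anonymous functions; the variable names are made up:

```php
<?php
$numbers = array(1, 2, 3, 4);
$isEven  = function ($n) { return $n % 2 == 0; };
$double  = function ($n) { return $n * 2; };

// Array first, callback second. Original keys are preserved.
$evens = array_filter($numbers, $isEven);   // array(1 => 2, 3 => 4)

// Callback first, array second.
$doubled = array_map($double, $numbers);    // array(2, 4, 6, 8)

// Haystack first for strings...
$strPos = strpos('haystack', 'st');         // 3

// ...needle first for arrays.
$arrPos = array_search(3, $numbers);        // 2

var_dump($evens, $doubled, $strPos, $arrPos);
```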
Kitchen sink. The library includes:
- Bindings to ImageMagick, bindings to GraphicsMagick (which is a fork of ImageMagick), and a handful of functions for inspecting EXIF data (which ImageMagick can already do).
- Functions for parsing bbcode, a very specific kind of markup used by a handful of particular forum packages.
- Way too many XML packages. DOM (OO), DOM XML (not), libxml, SimpleXML, “XML Parser”, XMLReader/XMLWriter, and half a dozen more acronyms I can’t identify. There’s surely some kind of difference between these things and you are free to go figure out what that is.
- Bindings for two particular credit card processors, SPPLUS and MCVE. What?
- Three ways to access a MySQL database: mysql, mysqli, and the PDO abstraction thing.

C influence

This deserves its own bullet point, because it’s so absurd yet permeates the language. PHP is a high-level, dynamically-typed programming language. Yet a massive portion of the standard library is still very thin wrappers around C APIs, with the following results:

“Out” parameters, even though PHP can return ad-hoc hashes or multiple arguments with little effort.
At least a dozen functions for getting the last error from a particular subsystem (see below), even though PHP has had exceptions for eight years.
Warts like mysql_real_escape_string, even though it has the same arguments as the broken mysql_escape_string, just because it’s part of the MySQL C API.
Global behavior for non-global functionality (like MySQL). Using multiple MySQL connections apparently requires passing a connection handle on every function call.
The wrappers are really, really, really thin. For example, calling dba_nextkey without calling dba_firstkey will segfault.
The wrappers are often platform-specific: fopen(directory, "r") works on Linux but returns false and generates a warning on Windows.
There’s a set of ctype_* functions (e.g. ctype_alnum) that map to the C character-class detection functions of similar names, rather than, say, isupper.

Genericism

There is none. If a function might need to do two slightly different things, PHP just has two functions.

How do you sort backwards? In Perl, you might do sort { $b <=> $a }. In Python, you might do .sort(reverse=True). In PHP, there’s a separate function called rsort().

Functions that look up a C error: curl_error, json_last_error, openssl_error_string, imap_errors, mysql_error, xml_get_error_code, bzerror, date_get_last_errors, others?
Functions that sort: array_multisort, arsort, asort, ksort, krsort, natsort, natcasesort, sort, rsort, uasort, uksort, usort
Functions that find text: ereg, eregi, mb_ereg, mb_eregi, preg_match, strstr, strchr, stristr, strrchr, strpos, stripos, strrpos, strripos, mb_strpos, mb_strrpos, plus the variations that do replacements
There are a lot of aliases as well, which certainly doesn’t help matters: strstr/strchr, is_int/is_integer/is_long, is_float/is_double, pos/current, sizeof/count, chop/rtrim, implode/join, die/exit, trigger_error/user_error, diskfreespace/disk_free_space…
scandir returns a list of files within a given directory. Rather than (potentially usefully) return them in directory order, the function returns the files already sorted. And there’s an optional argument to get them in reverse alphabetical order. There were not, apparently, enough sort functions. (PHP 5.4 adds a third value for the sort-direction argument that will disable sorting.)
str_split breaks a string into chunks of equal length. chunk_split breaks a string into chunks of equal length, then joins them together with a delimiter.
Reading archives requires a separate set of functions depending on the format. There are six separate groups of such functions, all with different APIs, for bzip2, LZF, phar, rar, zip, and gzip/zlib.
Because calling a function with an array as its arguments is so awkward (call_user_func_array), there are some pairings like printf/vprintf and sprintf/vsprintf. These do the same things, but one function takes arguments and the other takes an array of arguments.

Text

preg_replace with the /e (eval) flag will do a string replace of the matches into the replacement string, then eval it.
strtok is apparently designed after the equivalent C function, which is already a bad idea for various reasons. Never mind that PHP can easily return an array (whereas this is awkward in C), or that the very hack strtok(3) uses (modifying the string in-place) isn’t used here.
parse_str parses a query string, with no indication of this in the name. Also it acts just like register_globals and dumps the query into your local scope as variables, unless you pass it an array to populate. (It returns nothing, of course.)
explode refuses to split with an empty/missing delimiter. Every other string split implementation anywhere does some useful default in this case; PHP instead has a totally separate function, confusingly called str_split and described as “converting a string to an array”.
For formatting dates, there’s strftime, which acts like the C API and respects locale. There’s also date, which has a completely different syntax and only works with English.
“gzgetss — Get line from gz-file pointer and strip HTML tags.” I’m dying to know the series of circumstances that led to this function’s conception.
mbstring
- It’s all about “multi-byte”, when the problem is character sets.
- Still operates on regular strings. Has a single global “default” character set. Some functions allow specifying charset, but then it applies to all arguments and the return value.
- Provides ereg_* functions, but those are deprecated. preg_* are out of luck, though they can understand UTF-8 by feeding them some PCRE-specific flag.
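Two of the functions above, parse_str and the explode/str_split pair, are shown here in a minimal sketch. The query string is invented for the example, and str_split works on bytes, so this is only safe for ASCII input.

```php
<?php
// Passing the second argument keeps the parsed values out of local scope.
parse_str('name=epic&mode=maildir', $query);
// $query is now array('name' => 'epic', 'mode' => 'maildir')

// explode needs a non-empty delimiter...
$words = explode(',', 'a,b,c');   // array('a', 'b', 'c')

// ...so splitting into single bytes goes through str_split instead.
$chars = str_split('abc');        // array('a', 'b', 'c')

var_dump($query, $words, $chars);
```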

System and reflection

There are, in general, a whole lot of functions that blur the line between text and variables. compact and extract are just the tip of the iceberg.
There are several ways to actually be dynamic in PHP, and at a glance there are no obvious differences or relative benefits. classkit can modify user-defined classes; runkit supersedes it and can modify user-defined anything; the Reflection* classes can reflect on most parts of the language; there are a great many individual functions for reporting properties of functions and classes. Are these subsystems independent, related, redundant?
get_class($obj) returns the object’s class name. get_class() returns the name of the class the function is being called in. Setting aside that this one function does two radically different things: get_class(null)… acts like the latter. So you can’t trust it on an arbitrary value. Surprise!
The stream_* classes allow for implementing custom stream objects for use with fopen and other fileish builtins. “tell” cannot be implemented for internal reasons. (Also there are A LOT of functions involved with this system.)
register_tick_function will accept a closure object. unregister_tick_function will not; instead it throws an error complaining that the closure couldn’t be converted to a string.
php_uname tells you about the current OS. Unless PHP can’t tell what it’s running on; then it tells you about the OS it was built on. It doesn’t tell you if this has happened.
fork and exec are not built in. They come with the pcntl extension, but that isn’t included by default. popen doesn’t provide a pid.
stat’s return value is cached.
session_decode is for reading an arbitrary PHP session string, but it only works if there’s an active session already. And it dumps the result into $_SESSION, rather than returning it.

Miscellany

curl_multi_exec doesn’t change curl_errno on error, but it does change curl_error.
mktime’s arguments are, in order: hour, minute, second, month, day, year.
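A quick sketch of that argument order in use. The timezone is pinned to UTC purely as an assumption to keep the result predictable.

```php
<?php
date_default_timezone_set('UTC'); // assumed, so the round trip is deterministic

// Order: hour, minute, second, month, day, year.
$ts = mktime(13, 30, 0, 3, 30, 2014);

$formatted = date('Y-m-d H:i:s', $ts);

echo $formatted, "\n"; // 2014-03-30 13:30:00
```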

Data manipulation

Programs are nothing more than big machines that chew up data and spit out more data. A great many languages are designed around the kinds of data they manipulate, from awk to Prolog to C. If a language can’t handle data, it can’t do anything.

Numbers

Integers are signed and 32-bit on 32-bit platforms. Unlike all of PHP’s contemporaries, there is no automatic bigint promotion. So you can end up with surprises like negative file sizes, and your math might work differently based on CPU architecture. Your only option for larger integers is to use the GMP or BC wrapper functions. (The developers have proposed adding a new, separate, 64-bit type. This is crazy.)
PHP supports octal syntax with a leading 0, so e.g. 012 will be the number ten. However, 08 becomes the number zero. The 8 (or 9) and any following digits disappear. 01c is a syntax error.
0x0+2 produces 4. The parser considers the 2 as both part of the hex literal and a separate decimal literal, treating this as 0x002 + 2. 0x0+0x2 displays the same problem. Strangely, 0x0 +2 is still 4, but 0x0+ 2 is correctly 2. (This is fixed in PHP 5.4. But it’s also re-broken in PHP 5.4, with the new 0b literal prefix: 0b0+1 produces 2.)
pi is a function. Or there’s a constant, M_PI.
There is no exponentiation operator, only the pow function.
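A few of the numeric behaviors above, as a small sketch. It sticks to cases that do not depend on whether the platform is 32-bit or 64-bit.

```php
<?php
// Past PHP_INT_MAX there is no bigint; the result quietly becomes a float.
$overflowed = PHP_INT_MAX + 1;
$isFloat    = is_float($overflowed);

// A leading zero means octal.
$octal = 012;

// Exponentiation goes through a function call.
$power = pow(2, 10);

var_dump($isFloat, $octal, $power);
```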

Text

No Unicode support. Only ASCII will work reliably, really. There’s the mbstring extension, mentioned above, but it kinda blows.
Which means that using the builtin string functions on UTF-8 text risks corrupting it.
Similarly, there’s no concept of e.g. case comparisons outside of ASCII. Despite the proliferation of case-insensitive versions of functions, not one of them will consider é equal to É.
You can’t quote keys in variable interpolation, i.e., "$foo['key']" is a syntax error. You can unquote it (which would generate a warning anywhere else!), or use ${...}/{$...}.
"${foo[0]}" is okay. "${foo[0][0]}" is a syntax error. Putting the $ on the inside is fine with both. Bad copy of similar Perl syntax (with radically different semantics)?
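The interpolation forms that do work look like this. A minimal sketch with a made-up array; the braces-outside form is the one that handles nesting.

```php
<?php
$foo = array('key' => 'value', 'rows' => array(array('deep')));

$unquoted = "$foo[key]";             // simple syntax: the key must be unquoted
$braced   = "{$foo['key']}";         // complex syntax: an ordinary quoted key
$nested   = "{$foo['rows'][0][0]}";  // nesting works with the $ inside the braces

echo $unquoted, "\n", $braced, "\n", $nested, "\n";
```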

Arrays

Oh, man.

This one datatype acts as a list, ordered hash, ordered set, sparse list, and occasionally some strange combination of those. How does it perform? What kind of memory use will there be? Who knows? Not like I have other options, anyway.
=> isn’t an operator. It’s a special construct that only exists inside array(...) and the foreach construct.
Negative indexing doesn’t work, since -1 is just as valid a key as 0.
Despite that this is the language’s only data structure, there is no shortcut syntax for it; array(...) is shortcut syntax. (PHP 5.4 is bringing “literals”, [...].)
Similarly baffling, arrays stringify to Array with an E_NOTICE.
The => construct is based on Perl, which allows foo => 1 without quoting. (That is, in fact, why it exists in Perl; otherwise it’s just a comma.) In PHP, you can’t do this without getting a warning; it’s the only language in its niche that has no vetted way to create a hash without quoting string keys.
Array functions often have confusing or inconsistent behavior because they have to operate on lists, hashes, or maybe a combination of the two. Consider array_diff, which “computes the difference of arrays”.
```
  $first  = array("foo" => 123, "bar" => 456);
  $second = array("foo" => 456, "bar" => 123);
  var_dump(array_diff($first, $second));
```
What will this code do? If array_diff treats its arguments as hashes, then obviously these are different; the same keys have different values. If it treats them as lists, then they’re still different; the values are in the wrong order.

In fact array_diff considers these equal, because it treats them like sets: it compares only values, and ignores order.
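For comparison, here is the same pair of arrays run through array_diff and through array_diff_assoc, which checks keys as well as values. A minimal sketch:

```php
<?php
$first  = array("foo" => 123, "bar" => 456);
$second = array("foo" => 456, "bar" => 123);

// Values only: every value in $first appears somewhere in $second.
$byValue = array_diff($first, $second);

// Keys and values together: neither pair has a match.
$byPair = array_diff_assoc($first, $second);

var_dump($byValue, $byPair);
```

So the hash-style answer exists, but under a different function name.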
In a similar vein, array_rand has the strange behavior of selecting random keys, which is not that helpful for the most common case of needing to pick from a list of choices.
Despite how heavily PHP code relies on preserving key order:
```
  array("foo", "bar") != array("bar", "foo")
  array("foo" => 1, "bar" => 2) == array("bar" => 2, "foo" => 1)
```
I leave it to the reader to figure out what happens if the arrays are mixed. (I don’t know.)
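For what it's worth, the rule seems to be that == compares key/value pairs regardless of order, while === also requires the same order and types. A sketch that checks both the cases above and a mixed one (the variable names are illustrative):

```php
<?php
$listA = array("foo", "bar");            // implicit keys 0 and 1
$listB = array("bar", "foo");
$hashA = array("foo" => 1, "bar" => 2);
$hashB = array("bar" => 2, "foo" => 1);

$listsEqual  = ($listA == $listB);   // false: key 0 maps to different values
$hashesEqual = ($hashA == $hashB);   // true: same key/value pairs
$hashesSame  = ($hashA === $hashB);  // false: === also checks order

// A "list" written with explicit keys out of order still == the plain list.
$reordered  = array(1 => "bar", 0 => "foo");
$mixedEqual = ($listA == $reordered); // true

var_dump($listsEqual, $hashesEqual, $hashesSame, $mixedEqual);
```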
array_fill cannot create zero-length arrays; instead it will issue a warning and return false.
All of the (many…) sort functions operate in-place and return nothing. There is no way to create a new sorted copy; you have to copy the array yourself, then sort it, then use the array.
But array_reverse returns a new array.
A list of ordered things and some mapping of keys to values sounds kind of like a great way to handle function arguments, but no.
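Two of the gaps noted above, getting a sorted copy and picking a random value rather than a random key, can be papered over with small helpers. Both function names are hypothetical; this is a sketch, not something the standard library provides.

```php
<?php
// Hypothetical helper: return a sorted copy, leaving the caller's array alone.
function sorted_copy(array $items) {
    sort($items);   // $items is already a by-value copy of the argument
    return $items;
}

// Hypothetical helper: pick a random value instead of a random key.
function random_element(array $items) {
    return $items[array_rand($items)];
}

$original = array(3, 1, 2);
$sorted   = sorted_copy($original);
$pick     = random_element($original);

var_dump($original, $sorted);
```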

Not arrays

The standard library includes “Quickhash”, an OO implementation of “specific strongly-typed classes” for implementing hashes. And, indeed, there are four classes, each dealing with a different combination of key and value types. It’s unclear why the builtin array implementation can’t optimize for these extremely common cases, or what the relative performance is.
There’s an ArrayObject class (which implements five different interfaces) that can wrap an array and have it act like an object. User classes can implement the same interfaces. But it only has a handful of methods, half of which don’t resemble built-in array functions, and built-in array functions don’t know how to operate on an ArrayObject or other array-like class.

Functions

Functions are not data. Closures are actually objects, but regular functions are not. You can’t even refer to them with their bare names; var_dump(strstr) issues a warning and assumes you mean the literal string, "strstr". There is no way to discern between an arbitrary string and a function “reference”.
create_function is basically a wrapper around eval. It creates a function with a regular name and installs it globally (so it will never be garbage collected—don’t use in a loop!). It doesn’t actually know anything about the current scope, so it’s not a closure. The name contains a NUL byte so it can never conflict with a regular function (because PHP’s parser fails if there’s a NUL in a file anywhere).
Declaring a function named __lambda_func will break create_function—the actual implementation is to eval-create the function named __lambda_func, then internally rename it to the broken name. If __lambda_func already exists, the first part will throw a fatal error.
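The string-versus-function blur can be sketched like this, assuming PHP 5.3 or later for the closure:

```php
<?php
$name = 'strtoupper';   // just a string; nothing marks it as a function

$exists  = is_callable($name);             // true
$viaVar  = $name('epic');                  // "EPIC"
$viaCall = call_user_func($name, 'irc');   // "IRC"

$closure         = function ($s) { return strrev($s); };
$closureIsObject = is_object($closure);    // true: closures are Closure objects
$nameIsObject    = is_object($name);       // false: still only a string

var_dump($exists, $viaVar, $viaCall, $closureIsObject, $nameIsObject);
```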

Other

Incrementing (++) a NULL produces 1. Decrementing (--) a NULL produces NULL. Decrementing a string likewise leaves it unchanged.
There are no generators. (Fixed in 5.5. Wow. They basically cloned the entire Python generator API, too. Impressive. Somehow, though, $foo = yield $bar; is a syntax error; it has to be $foo = (yield $bar). Sigh.)

Web framework

Execution

A single shared file, php.ini, controls massive parts of PHP’s functionality and introduces complex rules regarding what overrides what and when. PHP software that expects to be deployed on arbitrary machines has to override settings anyway to normalize its environment, which largely defeats the use of a mechanism like php.ini anyway.
- PHP looks for php.ini in a variety of places, so it may (or may not…) be possible to override your host’s. Only one such file will ever be parsed, though, so you can’t just override a couple settings and call it a day.
PHP basically runs as CGI. Every time a page is hit, PHP recompiles the whole thing before executing it. Even dev servers for Python toy frameworks don’t act like this. This has led to a whole market of “PHP accelerators” that just compile once, accelerating PHP all the way to any other language. Zend, the company behind PHP, has made this part of their business model.
For quite a long time, PHP errors went to the client by default—I guess to help during development. I don’t think this is true any more, but I still see the occasional mysql error spew at the top of a page.
PHP is full of strange “easter eggs” like producing the PHP logo with the right query argument. Not only is this completely irrelevant to building your application, but it allows detecting whether you’re using PHP (and perhaps roughly guessing what version), regardless of how much mod_rewrite, FastCGI, reverse proxying, or Server: configuration you’re doing.
Blank lines before or after the <?php ... ?> tags, even in libraries, count as literal text and are interpolated into the response (or cause “headers already sent” errors). Your options are to either strictly avoid extra blank lines at the end of every file (the one after the ?> doesn’t count) or to just leave off the ?> closing token.

Deployment

Deployment is often cited as the biggest advantage of PHP: drop some files and you’re done. Indeed, that’s much easier than running a whole process as you may have to do with Python or Ruby or Perl. But PHP leaves plenty to be desired.

Across the board, I’m in favor of running Web applications as app servers and reverse-proxying to them. It takes minimal effort to set this up, and the benefits are plenty: you can manage your web server and app separately, you can run as many or few app processes on as many machines as you want without needing more web servers, you can run the app as a different user with zero effort, you can switch web servers, you can take down the app without touching the web server, you can do seamless deployment by just switching where a fifo points, etc. Welding your application to your web server is absurd and there’s no good reason to do it any more.

PHP is naturally tied to Apache. Running it separately, or with any other webserver, requires just as much mucking around (possibly more) as deploying any other language.
php.ini applies to every PHP application run anywhere. There is only one php.ini file, and it applies globally; if you’re on a shared server and need to change it, or if you run two applications that need different settings, you’re out of luck; you have to apply the union of all necessary settings and pare them down from inside the apps themselves using ini_set or in Apache’s configuration file or in .htaccess. If you can. Also wow that is a lot of places you need to check to figure out how a setting is getting its value.
Similarly, there is no easy way to “insulate” a PHP application and its dependencies from the rest of a system. Running two applications that require different versions of a library, or even PHP itself? Start by building a second copy of Apache.
The “bunch of files” approach, besides making routing a huge pain in the ass, also means you have to carefully whitelist or blacklist what stuff is actually available, because your URL hierarchy is also your entire code tree. Configuration files and other “partials” need C-like guards to prevent them from being loaded directly. Version control noise (e.g., .svn) needs protecting. With mod_php, everything on your filesystem is a potential entry point; with an app server, there’s only one entry point, and only the URL controls whether it’s invoked.
You can’t seamlessly upgrade a bunch of files that run CGI-style, unless you want crashes and undefined behavior as users hit your site halfway through the upgrade.
Despite how “simple” it is to configure Apache to run PHP, there are some subtle traps even there. While the PHP docs suggest using SetHandler to make .php files run as PHP, AddHandler appears to work just as well, and in fact Google gives me twice as many results for it. Here’s the problem.

When you use AddHandler, you are telling Apache that “execute this as php” is one possible way to handle .php files. But! Apache doesn’t have the same idea of file extensions that every human being on the planet does. It’s designed to support, say, index.html.en being recognized as both English and HTML. To Apache, a file can have any number of file extensions simultaneously.

Imagine you have a file upload form that dumps files into some public directory. To make sure nobody uploads PHP files, you just check that they don’t have a .php extension. All an attacker has to do is upload a file named foo.php.txt; your uploader won’t see a problem, but Apache will recognize it as PHP, and it will happily execute.

The problem here isn’t “using the original filename” or “not validating better”; the problem is that your web server is configured to run any old code it runs across—precisely the same property that makes PHP “easy to deploy”. CGI required +x, which was something, but PHP doesn’t even do that. And this is no theoretical problem; I’ve found multiple live sites with this issue.
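To make the foo.php.txt trap concrete, here is a rough Python sketch of the mismatch. It is a simplified model, not Apache's actual code: the function names are made up for illustration, and the "server" side only mimics the idea that every dot-separated piece of a filename counts as an extension.

```python
def naive_upload_check(filename):
    """Hypothetical uploader check: accept anything that doesn't END in .php."""
    return not filename.lower().endswith(".php")


def extensions_seen_by_server(filename):
    """Simplified model of the multi-extension view described above:
    every dot-separated component after the base name is an extension."""
    return filename.lower().split(".")[1:]


def would_run_as_php(filename):
    """Under this model, a file is handled as PHP if ANY extension is php."""
    return "php" in extensions_seen_by_server(filename)


upload = "foo.php.txt"
accepted = naive_upload_check(upload)   # the uploader sees only ".txt"
executed = would_run_as_php(upload)     # the server model also sees "php"
```

The uploader and the server disagree about what "the extension" is, and the attacker only needs them to disagree once.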

Missing features

I consider all of these to be varying levels of critical for building a Web application. It seems reasonable that PHP, with its major selling point being that it’s a “Web language”, ought to have some of them.

No template system. There’s PHP itself, but nothing that acts as a big interpolator rather than a program.
No XSS filter. No, “remember to use htmlspecialchars” is not an XSS filter. This is.
No CSRF protection. You get to do it yourself.
No generic standard database API. Stuff like PDO has to wrap every individual database’s API to abstract the differences away.
No routing. Your website looks exactly like your filesystem. Many developers have been tricked into thinking mod_rewrite (and .htaccess in general) is an acceptable substitute.
No authentication or authorization.
No dev server. (“Fixed” in 5.4. Led to the Content-Length vuln below. Also, you have to port all your rewrite rules to a PHP wrapper thing, because there’s no routing.)
No interactive debugging.
No coherent deployment mechanism; only “copy all these files to the server”.
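For a sense of what "do it yourself" means for CSRF, here is a minimal Python sketch of one common approach: derive a token from the session ID with an HMAC and compare it in constant time when the form comes back. The names (SECRET_KEY, issue_csrf_token, is_valid_csrf_token) are hypothetical, and a real framework would also handle token expiry and rotation.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; a real app would load this from config.
SECRET_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id, key=SECRET_KEY):
    """Derive a token bound to this session; embed it in each form."""
    return hmac.new(key, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id, token, key=SECRET_KEY):
    """Recompute the expected token and compare in constant time."""
    expected = issue_csrf_token(session_id, key)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

A form rendered for one session carries a token that another session (or another site) can't reproduce without the secret.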

Security

Language boundaries

PHP’s poor security reputation is largely because it will take arbitrary data from one language and dump it into another. This is a bad idea. "<script>" may not mean anything in SQL, but it sure does in HTML.

Making this worse is the common cry for “sanitizing your inputs”. That’s completely wrong; you can’t wave a magic wand to make a chunk of data inherently “clean”. What you need to do is speak the language: use placeholders with SQL, use argument lists when spawning processes, etc.

PHP outright encourages “sanitizing”: there’s an entire data filtering extension for doing it.
All the addslashes, stripslashes, and other slashes-related nonsense are red herrings that don’t help anything.
There is, as far as I can tell, no way to safely spawn a process. You can ONLY execute a string via the shell. Your options are to escape like crazy and hope the default shell uses the right escaping, or pcntl_fork and pcntl_exec manually.
Both escapeshellcmd and escapeshellarg exist with roughly similar descriptions. Note that on Windows, escapeshellarg does not work (because it assumes Bourne shell semantics), and escapeshellcmd just replaces a bunch of punctuation with spaces because nobody can figure out Windows cmd escaping (which may silently wreck whatever you’re trying to do).
The original built-in MySQL bindings, still widely-used, have no way to create prepared statements.
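By contrast, this is roughly what "use argument lists when spawning processes" looks like in Python. It is only a sketch: the child process is another Python interpreter that echoes its first argument, picked so the example runs anywhere, and echo_argument is a made-up helper name.

```python
import subprocess
import sys


def echo_argument(untrusted):
    """Pass untrusted text as ONE argv entry; no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Shell metacharacters arrive in the child as plain characters.
hostile = "hello; echo pwned $(id)"
echoed = echo_argument(hostile)
```

There is nothing to escape because no string is ever handed to a shell; the hostile text comes back byte for byte.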

To this day, the PHP documentation on SQL injection recommends batty practices like type-checking, using sprintf and is_numeric, manually using mysql_real_escape_string everywhere, or manually using addslashes everywhere (which “may be useful”!). There is no mention of PDO or parameterization, except in the user comments. I complained about this very specifically to a PHP dev at least two years ago; he was alarmed, and the page has never changed.
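For comparison, here is a small Python sketch of parameterization, using the standard sqlite3 module and a throwaway in-memory table. The table, the column names, and find_user are all invented for the example; the point is only the difference between a placeholder and string formatting.

```python
import sqlite3


def find_user(conn, name):
    """The ? placeholder sends the value as data, never as SQL text."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "' OR '1'='1"

# Placeholder: the hostile string is just a name that nobody has.
rows_safe = find_user(conn, hostile)

# String formatting: the same input rewrites the WHERE clause. Don't do this.
rows_unsafe = conn.execute(
    "SELECT id, name FROM users WHERE name = '%s'" % hostile
).fetchall()
```

With the placeholder the hostile input matches nothing; with string formatting it matches every row in the table.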

Insecure-by-default

register_globals. It’s been off by default for a while by now, and it’s gone in 5.4. I don’t care. This is an embarrassment.
include accepting HTTP URLs. Likewise.
Magic quotes. So close to secure-by-default, and yet so far from understanding the concept at all. And, likewise.
You can, say, probe a network using PHP’s XML support, by abusing its ubiquitous support for filenames-as-URLs. Only libxml_disable_entity_loader() can fix this, and the problem is only mentioned in the manual comments.

(5.5 brings a just-do-it password hashing function, password_hash, which should hopefully cut down on hand-rolled crypto code.)

Core

The PHP interpreter itself has had some fascinating security problems.

In 2007 the interpreter had an integer overflow vulnerability. The fix started with if (size > INT_MAX) return NULL; and went downhill from there. (For those not down with the C: INT_MAX is the biggest integer that will fit in a variable, ever. I hope you can figure out the rest from there.)
More recently, PHP 5.3.7 managed to include a crypt() function that would, in effect, let anyone log in with any password.
PHP 5.4’s dev server is vulnerable to a denial of service, because it takes the Content-Length header (which anyone can set to anything) and tries to allocate that much memory. This is a bad idea.
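The Content-Length problem is the easiest of these to sketch. Here is a hedged Python illustration of the missing check: treat the header as an untrusted claim and refuse anything above a cap before allocating. MAX_BODY_BYTES and body_bytes_to_read are hypothetical names, and the 1 MiB limit is arbitrary.

```python
MAX_BODY_BYTES = 1024 * 1024  # hypothetical cap: 1 MiB


def body_bytes_to_read(headers, limit=MAX_BODY_BYTES):
    """Validate the client's declared length before allocating anything."""
    raw = headers.get("Content-Length", "0").strip()
    # Only plain ASCII digits are acceptable; this also rejects negatives.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("malformed Content-Length")
    declared = int(raw)
    if declared > limit:
        raise ValueError("declared body larger than the configured limit")
    return declared
```

The client can still lie about the length, but the lie can no longer cost more than the cap.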

I could dig up more but the point isn’t that there are X many exploits—software has bugs, it happens, whatever. The nature of these is horrifying. And I didn’t seek these out; they just happened to land on my doorstep in the last few months.

Conclusion

Some commentary has rightfully pointed out that I don’t have a conclusion. And, well, I don’t have a conclusion. If you got all the way down here, I assumed you agreed with me before you started.

If you only know PHP and you’re curious to learn something else, give the Python tutorial a whirl and try Flask for the web stuff. (I’m not a huge fan of its template language, but it does the job.) It breaks apart the pieces of your app, but they’re still the same pieces and should look familiar enough. I might write a real post about this later; a whirlwind introduction to an entire language and web stack doesn’t belong down here.

Later or for bigger projects you may want Pyramid, which is medium-level, or Django, which is a complex monstrosity that works well for building sites like Django’s.

If you’re not a developer at all but still read this for some reason, I will not be happy until everyone on the planet has gone through Learn Python The Hard Way so go do that.

There’s also Ruby with Rails and some competitors I’ve never used, and Perl is still alive and kicking with Catalyst. Read things, learn things, build things, go nuts.

Credits

Thanks to the following for inspiration:

PHP turtles
PHP sadness
PHP WTF
YourLanguageSucks
PHP in contrast to Perl
Pi’s dense, angry, inspirational rant
PHP is not an acceptable COBOL
the PHP documentation
a ton of PHP fanatics and PHP counter-fanatics
and, of course, Rasmus Lerdorf for his wild misunderstanding of most of Perl

Let me know if you have any additions, or if I’m (factually!) wrong about something.

by Jari Turkia in Programming at 18:21
