
Cleanup PHP Sessions like a PRO


PHP Session files

PHP sessions allow you to preserve certain user data across multiple requests. By default, PHP stores session data in files. Alternative session storage backends include Redis and Memcached, and a custom database implementation is also possible.

In this post, we concentrate on file-based PHP sessions and how to deal with cleaning stale session files in a way that is most performance friendly.

The default session garbage collector

By default, PHP uses its own session garbage collector (SGC) to clean stale sessions. But what is a stale session?

Each time PHP works with a session file (read or write), it updates the file's modification time. Consequently, a session file is considered stale once its modification time is more than session.gc_maxlifetime seconds in the past.
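The staleness rule can be checked by hand for a single file. The following is a minimal sketch, not PHP's own implementation: the function name is_stale is made up for illustration, and it assumes GNU stat, where -c %Y prints the modification time as epoch seconds.

```shell
#!/bin/sh
# Sketch: apply the "older than gc_maxlifetime" rule to one file.
# is_stale is a hypothetical helper name; GNU stat is assumed.
is_stale() {
    file=$1
    maxlifetime=$2                         # seconds, like session.gc_maxlifetime
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2  # mtime as epoch seconds
    # Succeeds (exit status 0) only when the file is older than the limit
    [ $((now - mtime)) -gt "$maxlifetime" ]
}
```

Calling is_stale with a session file path and 1440 (PHP's default session.gc_maxlifetime) succeeds only when the file has not been modified for more than 24 minutes.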

On each request that starts a session, the default garbage collector runs with a probability of session.gc_probability / session.gc_divisor (e.g. 1 in a thousand requests). It scans all files in the session.save_path directory and deletes those that are more than session.gc_maxlifetime seconds old. This approach isn’t good for performance, because there will always be that unlucky visitor (1 in a thousand) who triggers the garbage collection and experiences an unnecessary delay.
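To get a feel for the numbers, the two settings can be turned into an average interval between collections. This is a small sketch with a made-up helper name (gc_every); it uses integer division, so the result is rounded down.

```shell
#!/bin/sh
# Sketch: average number of session-starting requests between two GC runs.
# gc_every is a hypothetical helper, not part of PHP.
gc_every() {
    probability=$1
    divisor=$2
    if [ "$probability" -le 0 ]; then
        echo "never"                     # probability 0 disables the built-in GC
    elif [ "$probability" -ge "$divisor" ]; then
        echo 1                           # 100% chance: GC on every request
    else
        echo $((divisor / probability))  # rounded down
    fi
}

gc_every 1 1000   # the "1 in a thousand" case; prints 1000
gc_every 0 1000   # prints never
```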

Alternative garbage collectors

Some distributions (Debian, and thus Ubuntu) opt to use their own session garbage collector for the above-mentioned (or other) reasons. Implementing an alternative session garbage collector implies disabling the default SGC by setting session.gc_probability to 0.

How Debian did it

The Debian maintainers implemented session garbage collection as a scheduled job that runs every 30 minutes.

/etc/systemd/system/timers.target.wants/phpsessionclean.timer

A systemd timer (a symlink to /lib/systemd/system/phpsessionclean.timer) launches the session cleanup “service”:

[Unit]
Description=Clean PHP session files every 30 mins

[Timer]
OnCalendar=*-*-* *:09,39:00
Persistent=true

[Install]
WantedBy=timers.target

All websites share the same directory for the session.save_path setting: /var/lib/php/sessions. That directory has a special mode of 1733, which lets each site’s user delete only their own session files (thanks to the sticky bit).
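The mode can be reproduced on a scratch directory to see what it looks like. A minimal sketch, assuming GNU stat; make_session_dir is a made-up helper name and the directory lives under a temporary path, not the real /var/lib/php/sessions.

```shell
#!/bin/sh
# Sketch: create a directory with the same mode Debian uses for sessions.
# make_session_dir is a hypothetical helper name.
make_session_dir() {
    dir=$1
    mkdir -p "$dir" && chmod 1733 "$dir"
}

demo=$(mktemp -d)
make_session_dir "$demo/sessions"
stat -c '%a' "$demo/sessions"   # GNU stat; prints 1733
```

The leading 1 is the sticky bit, 7 gives the owner full access, and 3 (write and execute, no read) lets group and others create and open files by name without being able to list the directory.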

/lib/systemd/system/phpsessionclean.service
[Unit]
Description=Clean php session files

[Service]
Type=oneshot
ExecStart=/usr/lib/php/sessionclean
ProtectHome=true
ProtectSystem=true
PrivateTmp=true

Where things get interesting is the script for sessions cleanup:

SAPIS="apache2:apache2 apache2filter:apache2 cgi:php@VERSION@ fpm:php-fpm@VERSION@ cli:php@VERSION@"

# Iterate through all web SAPIs
(
proc_names=""
for version in $(/usr/sbin/phpquery -V); do
    for sapi in ${SAPIS}; do
        conf_dir=${sapi%%:*}
        proc_name=${sapi##*:}
        if [ -e /etc/php/${version}/${conf_dir}/php.ini ]; then
            # Get all session variables once so we don't need to start PHP to get each config option
            session_config=$(PHP_INI_SCAN_DIR=/etc/php/${version}/${conf_dir}/conf.d/ php${version} -c /etc/php/${version}/${conf_dir}/php.ini -d "error_reporting='~E_ALL'" -r 'foreach(ini_get_all("session") as $k => $v) echo "$k=".$v["local_value"]."\n";')
            save_handler=$(echo "$session_config" | sed -ne 's/^session\.save_handler=\(.*\)$/\1/p')
            save_path=$(echo "$session_config" | sed -ne 's/^session\.save_path=\(.*;\)\?\(.*\)$/\2/p')
            gc_maxlifetime=$(($(echo "$session_config" | sed -ne 's/^session\.gc_maxlifetime=\(.*\)$/\1/p')/60))

            if [ "$save_handler" = "files" -a -d "$save_path" ]; then
                proc_names="$proc_names $(echo "$proc_name" | sed -e "s,@VERSION@,$version,")";
                printf "%s:%s\n" "$save_path" "$gc_maxlifetime"
            fi
        fi
    done
done
# first find all open session files and touch them (hope it's not massive amount of files)
for pid in $(pidof $proc_names); do
    find "/proc/$pid/fd" -ignore_readdir_race -lname "$save_path/sess_*" -exec touch -c {} \; 2>/dev/null
done ) | \
    sort -rn -t: -k2,2 | \
    sort -u -t: -k 1,1 | \
    while IFS=: read -r save_path gc_maxlifetime; do
        # find all files older than maxlifetime and delete them
        find -O3 "$save_path/" -ignore_readdir_race -depth -mindepth 1 -name 'sess_*' -type f -cmin "+$gc_maxlifetime" -delete
    done

exit 0

The Debian way is more performance friendly, since no visitor is unlucky enough to trigger the garbage collection and wait for it.

But do you see where this approach went ugly and slow? The script is unnecessarily complicated because it tries to work around a reported bug.

Particularly, the bug report mentions that there are circumstances under which seemingly stale session files are still valid and thus should not be deleted.

Circumstances in which this might occur are:
* A script has been running longer than the configured session maxlifetime, and still has a session open.
* A script which has resumed an existing session, but the end of the session maxlifetime falls within the window of that script’s execution.

To work around those edge cases, the script checks all PHP processes and which files they currently have open. Heavy!

It’s worth noting that the bug is not really an issue IMHO. If you have session.gc_maxlifetime set to, e.g., one day, the chance of the cleanup script trying to delete a stale session that PHP still has open is close to zero.

Keeping the logic for dealing with the bug results in a performance issue of its own: imagine PHP having thousands of session files open.

So the script is somewhat flawed, because it causes performance issues while trying to address a performance issue. Recursion like that is never a good one 🙂 So ignoring “the bug” seems like the way to go.
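For illustration, here is what the deletion step looks like on its own, without the scan of open file descriptors. This is a simplified sketch and not Debian's script: clean_sessions is a made-up name, it compares modification time (-mmin) to match the staleness rule described earlier, whereas the Debian script uses -cmin, and it relies on the -mmin and -delete extensions found in GNU and BSD find.

```shell
#!/bin/sh
# Sketch: delete session files not modified for more than N minutes.
# clean_sessions is a hypothetical helper; no open-file check is done.
clean_sessions() {
    save_path=$1
    max_minutes=$2                       # gc_maxlifetime converted to minutes
    [ -d "$save_path" ] || return 1      # refuse to run on a missing directory
    find "$save_path/" -mindepth 1 -maxdepth 1 -name 'sess_*' -type f \
        -mmin "+$max_minutes" -delete
}

# Example call (24 minutes equals PHP's default gc_maxlifetime of 1440 s):
# clean_sessions /var/lib/php/sessions 24
```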

PHP developers seem to think the same way. The default PHP garbage collector does not address the aforementioned issue in any way, in any release. The default SGC checks only the modification time and doesn’t bother checking whether the file is currently open by PHP itself.

Now that we know that checking PHP processes for open session files is overkill rather than a solution, let’s proceed to a better implementation.

The session_gc() function

PHP 7.1 introduced the session_gc() function, so you can just call it via cron, in a script like the following:

#!/usr/bin/php
<?php
// Note: This script should be executed by the same user of web server process.

// Need active session to initialize session data storage access.
session_start();

// Executes GC immediately
session_gc();

// Clean up the session created by session_start() above
session_destroy();

session_gc() is used to perform session data GC (garbage collection). PHP does probability-based session GC by default.

Probability-based GC works somewhat, but it has a few problems: 1) Low-traffic sites’ session data may not be deleted within the preferred duration. 2) High-traffic sites may run GC too frequently. 3) GC is performed on the user’s request, and the user will experience a GC delay.

Therefore, it is recommended to execute GC periodically for production systems using, e.g., “cron” for UNIX-like systems. Make sure to disable probability-based GC by setting session.gc_probability to 0.

How can we do it better

But what about prior PHP versions? What session_gc() actually does is run the garbage collector unconditionally, as if the probability were 100%. Thus, a somewhat more portable approach (across different PHP versions, particularly 5.x) is to achieve what session_gc() does using command line switches.

Simply setting session.gc_probability=1 and session.gc_divisor=1 ensures that the garbage collector runs right away. For example:

/usr/bin/php -d session.save_path=/path/to/sessions \
  -d session.gc_probability=1 \
  -d session.gc_divisor=1 \
  -r "session_start(); session_destroy();"

Then, running this as a regular cron task ensures that your sessions are garbage collected.

Notice that we also passed the session.save_path setting. This is because you would usually want to run each PHP-FPM pool under its own user, with its own sessions directory.

Consider the following layout:

/srv/www/foo.example.com/
  -- public
  -- sessions
/srv/www/bar.example.com/
  -- public
  -- sessions

The two websites may have different session lifetime settings. So if a pool sets a custom session.gc_maxlifetime that overrides the default from php.ini, you have to pass it in as well. For example, your [foo] pool definition may have:

php_admin_value[session.gc_maxlifetime] = 86400

… whereas [bar] PHP-FPM pool has:

php_admin_value[session.gc_maxlifetime] = 3600

Then you’d set up 2 cron tasks. Each command should use the settings that match its pool:

/usr/bin/php -d session.save_path=/srv/www/foo.example.com/sessions \
  -d session.gc_probability=1 \
  -d session.gc_divisor=1 \
  -d session.gc_maxlifetime=86400 \
  -r "session_start(); session_destroy();"

/usr/bin/php -d session.save_path=/srv/www/bar.example.com/sessions \
  -d session.gc_probability=1 \
  -d session.gc_divisor=1 \
  -d session.gc_maxlifetime=3600 \
  -r "session_start(); session_destroy();"
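Since the two commands differ only in the path and the lifetime, they can be folded into one small shell function. A minimal sketch: run_session_gc and the PHP_BIN override are names introduced here for illustration, and the function still has to be run as the user that owns the pool's session files.

```shell
#!/bin/sh
# Sketch: run PHP's garbage collector once for a given pool.
# run_session_gc and PHP_BIN are hypothetical names for this example.
run_session_gc() {
    save_path=$1
    maxlifetime=$2                       # seconds, must match the pool setting
    if [ ! -d "$save_path" ]; then
        echo "no such directory: $save_path" >&2
        return 1
    fi
    # gc_probability=1 with gc_divisor=1 makes session_start() collect now
    "${PHP_BIN:-/usr/bin/php}" \
        -d session.save_path="$save_path" \
        -d session.gc_probability=1 \
        -d session.gc_divisor=1 \
        -d session.gc_maxlifetime="$maxlifetime" \
        -r 'session_start(); session_destroy();'
}

# Example calls for the layout above:
# run_session_gc /srv/www/foo.example.com/sessions 86400
# run_session_gc /srv/www/bar.example.com/sessions 3600
```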

If you use configuration tools like Ansible, you can easily ensure that PHP-FPM pool configuration settings session.save_path and session.gc_maxlifetime will match with those in the GC cron task.

Finally, you can also spread the load for your SGC tasks by running them in different time slots to reduce I/O:

9,39 * * * * /usr/bin/php -d session.save_path=/srv/www/foo.example.com/sessions -d session.gc_probability=1 -d session.gc_divisor=1 -d session.gc_maxlifetime=86400 -r "session_start(); session_destroy();"
19,49 * * * * /usr/bin/php -d session.save_path=/srv/www/bar.example.com/sessions -d session.gc_probability=1 -d session.gc_divisor=1 -d session.gc_maxlifetime=3600 -r "session_start(); session_destroy();"
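With more than a couple of pools, the minute offsets can be derived instead of picked by hand. The sketch below prints crontab lines from a list of save_path:maxlifetime pairs; stagger_cron and its 10-minute default step are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: print one crontab line per pool, each shifted by a few minutes.
# Input on stdin: one "save_path:maxlifetime" pair per line.
# stagger_cron and the default step are hypothetical.
stagger_cron() {
    step=${1:-10}                        # minutes between consecutive pools
    i=0
    while IFS=: read -r path lifetime; do
        [ -n "$path" ] || continue       # skip empty lines
        m1=$(( (9 + i * step) % 30 ))    # first run within the half hour
        m2=$(( m1 + 30 ))                # second run, 30 minutes later
        printf '%d,%d * * * * /usr/bin/php -d session.save_path=%s -d session.gc_probability=1 -d session.gc_divisor=1 -d session.gc_maxlifetime=%s -r "session_start(); session_destroy();"\n' \
            "$m1" "$m2" "$path" "$lifetime"
        i=$((i + 1))
    done
}

stagger_cron <<'EOF'
/srv/www/foo.example.com/sessions:86400
/srv/www/bar.example.com/sessions:3600
EOF
```

For the two pools above this prints the same 9,39 and 19,49 schedules as the hand-written lines. With a 10-minute step, the offsets repeat from the fourth pool on, so a smaller step is needed for longer lists.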

There you have it. No crazy scripts. Clean, efficient and performance friendly session files cleanup.

Some notes about alternative sessions storage backends

Of course, you might want to use Redis for session storage. It is important to know that in this case, the value of session.gc_maxlifetime becomes the TTL of the session data in Redis, so no actual garbage collector is required. Nice?
