Category: Linux

How Distance can Affect Throughput

Offsite backups are important, but round-trip time should be taken into consideration for particular transfer protocols. TCP connections can be severely hampered under certain conditions by what is known as the “bandwidth-delay product”, which Wikipedia describes as follows:

In data communications, bandwidth-delay product refers to the product of a data link’s capacity (in bits per second) and its round-trip delay time (in seconds). The result, an amount of data measured in bits (or bytes), is equivalent to the maximum amount of data on the network circuit at any given time, i.e., data that has been transmitted but not yet acknowledged.

To use an extreme example, a tiny 64 KiB TCP buffer size would constrain a 100 Mbit/s interface to roughly 2.6 Mbit/s under a consistent 200 ms latency, whereas a 50 ms latency would allow about 10.5 Mbit/s.
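The ceiling imposed by a fixed buffer is simply the buffer size divided by the round-trip time, so the figures above can be reproduced with a quick calculation:

```shell
# Throughput ceiling = TCP buffer size / RTT.
# 64 KiB buffer over a 200 ms round trip:
awk 'BEGIN { printf "%.1f Mbit/s\n", (64 * 1024 * 8) / 0.200 / 1000000 }'
# -> 2.6 Mbit/s

# The same buffer over a 50 ms round trip:
awk 'BEGIN { printf "%.1f Mbit/s\n", (64 * 1024 * 8) / 0.050 / 1000000 }'
# -> 10.5 Mbit/s
```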

Fortunately, modern systems no longer default to a small TCP buffer size, but manual tweaks were required prior to kernel 2.6. The following system parameters can be explored:

  • net.core.wmem_max (maximum send buffer size, in bytes)
  • net.core.rmem_max (maximum receive buffer size, in bytes)
  • net.ipv4.tcp_wmem (min/default/max memory for the TCP send buffer)
  • net.ipv4.tcp_rmem (min/default/max memory for the TCP receive buffer)
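As a sketch, the current values can be inspected and the ceilings raised with sysctl. The 16 MiB figures below are illustrative for a high-BDP path rather than a universal recommendation, and the writes require root:

```shell
# Inspect the current limits (tcp_rmem/tcp_wmem hold "min default max", in bytes).
sysctl net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

# Raise the ceilings to 16 MiB (illustrative values; changes made this way do
# not persist across reboots, so use /etc/sysctl.conf for that).
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```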

Always be cautious when increasing buffers on production servers, because doing so can potentially cause instability. Improvements can be seen immediately for FTP (and more importantly FTPS) over long round-trip paths, but there will still be circumstances where changes show no effect even with the best configuration in place, such as when an application sets SO_SNDBUF and SO_RCVBUF to fixed values via setsockopt(), which disables autotuning for that socket.

Sender-side autotuning has been present for quite some time, and receiver-side autotuning is now usually included as well; this can be verified by reading net.ipv4.tcp_window_scaling and net.ipv4.tcp_moderate_rcvbuf:

sysctl net.ipv4.tcp_window_scaling net.ipv4.tcp_moderate_rcvbuf
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_moderate_rcvbuf = 1

Interestingly, SSH2 multiplexes sessions over a single TCP connection using its own internal flow-control windows, which means patches to OpenSSH such as HPN-SSH (High Performance SSH/SCP) are necessary to attain greater transfer rates for the likes of SCP.

I would like to take this opportunity to wish everyone a wonderful festive season and upcoming New Year!

prague-winter

Benefits of an Efficient Memory Allocator

A developer friend of mine doing work for a mobile operator in West Africa needed advice because their database setup experienced severe congestion over Valentine’s Day. I could recall us discussing the same problem back when we studied together, and my advice then was to replace MySQL with Percona Server. Eventually MariaDB was the chosen replacement, which is also a great choice.

Talks with government officials are underway to migrate their databases to a specialist PaaS (Platform as a Service), which should ultimately ensure ongoing scalability, but in the meantime each developer will be provisioned with a virtual system running MariaDB 5.5. Workloads and tweaks can then be isolated, but resource allocation is scarce because of limited hardware.

My personal memory allocator of choice has always been jemalloc, simply because it operates with leaner memory utilisation. Memory consumption can be difficult to measure, because readings can differ greatly from one short interval to the next when an optimised memory allocator is used. My recommendation was jemalloc because the developer servers won’t be single purpose, but he wished to know my opinion of TCMalloc.

TCMalloc is a seriously fast memory allocator that is part of gperftools, but there are common inaccuracies surrounding its memory handling. Many claim that TCMalloc never releases memory back to the system, but this is not accurate, and the behaviour can even be controlled by adjusting the TCMALLOC_RELEASE_RATE and TCMALLOC_HEAP_LIMIT_MB environment variables, as outlined in page_heap.cc. The default TCMALLOC_RELEASE_RATE value is 1.0, which means memory is released, albeit slowly.

There is no denying that TCMalloc can outperform jemalloc overall, though often only by a small margin. It has been some time since I last checked, so I decided to conduct some benchmarks on very limited virtual servers to test the benefits compared with glibc’s allocator.

Whether an alternative memory allocator is in use can be determined in a variety of ways, such as checking the MariaDB log file after startup, or manually with pmap, lsof or /proc, as in the examples below:

[root@server ~]# pmap $(pidof mysqld) | grep malloc
00007f7e1404f000 196K r-x-- libjemalloc.so.1
00007f7e14080000 2044K ----- libjemalloc.so.1
00007f7e1427f000 8K r---- libjemalloc.so.1
00007f7e14281000 4K rw--- libjemalloc.so.1

[root@server ~]# grep malloc /proc/$(pidof mysqld)/maps
7f7e1404f000-7f7e14080000 r-xp 00000000 fd:01 27453423 /usr/lib64/libjemalloc.so.1
7f7e14080000-7f7e1427f000 ---p 00031000 fd:01 27453423 /usr/lib64/libjemalloc.so.1
7f7e1427f000-7f7e14281000 r--p 00030000 fd:01 27453423 /usr/lib64/libjemalloc.so.1
7f7e14281000-7f7e14282000 rw-p 00032000 fd:01 27453423 /usr/lib64/libjemalloc.so.1
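To actually run MariaDB under an alternative allocator without rebuilding it, mysqld_safe can preload a library at startup via its --malloc-lib option; the library paths below are illustrative and distribution-dependent:

```shell
# Start the server with jemalloc preloaded (path varies by distribution):
mysqld_safe --malloc-lib=/usr/lib64/libjemalloc.so.1 &

# The generic equivalent for any program, here with TCMalloc from gperftools:
LD_PRELOAD=/usr/lib64/libtcmalloc.so.4 mysqld_safe &
```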

Each benchmark is based on a default MariaDB installation with sysbench 0.4.12 and is executed four times. The standard complex OLTP (On-Line Transaction Processing) benchmark is used instead of the newer customisable Lua workloads. The following arguments are specified for each test, with each run preceded by a table drop / recreation and a reboot:

--oltp-table-size=2000000
--max-time=300
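For reference, a full sysbench 0.4 invocation using those arguments might look like the following; the database credentials are placeholders, and this is a sketch rather than the exact command used:

```shell
# Prepare the test table (2,000,000 rows):
sysbench --test=oltp --oltp-table-size=2000000 \
         --mysql-user=sbtest --mysql-password=secret prepare

# Run the complex OLTP workload for 300 seconds:
sysbench --test=oltp --oltp-table-size=2000000 --oltp-test-mode=complex \
         --max-time=300 --max-requests=0 --num-threads=8 \
         --mysql-user=sbtest --mysql-password=secret run

# Drop the table between runs:
sysbench --test=oltp --mysql-user=sbtest --mysql-password=secret cleanup
```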

Two separate CentOS 7 virtual servers with matching E5645 CPU clock speeds were used, but with differing memory and core-count assignments:

  • 1 core at 2.4 GHz with 2 GiB of memory
  • 4 cores at 2.4 GHz with 1 GiB of memory

The results show a benefit for InnoDB with either TCMalloc or jemalloc over glibc, even on low-end specifications. The performance gap between the memory-constrained server and the processor-limited server narrows as thread counts increase once glibc is replaced.

MariaDB 10 is now built with jemalloc by default.

The Brilliance of Node.js with AngularJS

UI (User Interface) as well as UX (User Experience) formed the foundation of my first real world development experience and will always remain a passion of mine.

An earlier Music for a Good Cause post detailed my interest in promoting publicity for local musicians, with the added benefit of aiding animals who need help.

sealion
I dedicate effort to this noble cause when able, which has led to a number of notably positive experiences with AngularJS.

Node.js has always fascinated me and I find it increasingly impressive the more I work with it. Thanks to amazing technology like this, we can help work towards a project so that animals like this little guy can be all smiles again.

Consider an example which involves a music trackbar. Previously, a separate trackbar was displayed for each track, which proved not only resource inefficient but also displeasing in appearance:

Separate Trackbars:

separate-trackbars

Unified Trackbar:

unified-trackbar


Changes to the play button styling aside, the only necessary adjustment was to create an additional card which contains the trackbar. I didn’t even need a second cup of coffee, since only two lines of code needed to be inserted.

Have a safe and wonderful Festive Season ahead with a great New Year everyone! See you next year.

Harden procfs Security using hidepid

The objective of hidepid is to ensure privacy concerning process information for standard users, and its presence can prove beneficial in a multi-tenant environment.

Wikipedia’s article related to procfs describes it as follows:

procfs (or the proc filesystem) is a special filesystem in Unix-like operating systems that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory.

The hidepid mount option for procfs was introduced in Linux kernel 3.3, but its usage isn’t always implemented; CentOS 6.3 and above have since offered full support. In the past, the kernel source fs/proc/base.c needed to be manually patched with the line below to achieve the same capability:

inode->i_mode = S_IFDIR|S_IRUSR|S_IXUSR;

The following options can be defined for hidepid:

hidepid=0: Disables hidepid; any user may read any /proc/[pid] directory (the historical default).
hidepid=1: Users other than root can only access information about their own processes, but can still inspect /proc to gather references such as Process IDs.
hidepid=2: The ideal setting, which additionally hides other users’ /proc/[pid] directories entirely from standard users.

One factor to keep in mind is that hidepid does not protect the processes themselves; it serves only to make process information more private.

Aside from modifying the system’s fstab, hidepid can be activated at runtime by remounting /proc as follows:

mount -o remount,hidepid=2 /proc
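To make the setting persistent, an equivalent entry can be added to /etc/fstab. The gid option shown is optional and exempts one group from the restriction, which is useful for monitoring agents; the numeric GID below is a placeholder for a hypothetical monitoring group:

```shell
# /etc/fstab entry; gid=1001 is the numeric GID of a hypothetical group whose
# members retain full visibility of /proc despite hidepid=2.
proc    /proc    proc    defaults,hidepid=2,gid=1001    0 0
```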

Detection of Spam: The Better Way


A topic which is often discussed is the prevention of incoming spam, but what about outgoing spam from shared web servers? The standard approach is a reactive one: the system administrator responds to an abnormally high mail queue upon detecting it on a particular server and hopefully halts the outbreak of spam. There is also always the risk that the primary cause was not properly identified and the spam activity continues, repeating the cycle.

The biggest weakness of this approach is that the affected server could already have been blacklisted by one or more Real-time Blackhole Lists, which can severely hamper legitimate e-mail deliverability for other users on the server who share the same IP address. Once listed, it can take hours or in some cases even days to become delisted.

The alternative method of stopping the spam is to employ a spam filter such as SpamAssassin, but for outgoing messages. If you would like such a setup on your cPanel-based server, it is very easy to implement as per the following reference.

All outgoing e-mail messages will be scanned for spam-like characteristics and discarded if they reach the configured threshold value. Any existing monitoring check will also need to be changed: the e-mail queue size is no longer the important metric, but rather the total number of messages flagged as outgoing spam in the server’s mail logs.

There will be an increase in resource usage and processing, but the tradeoff in terms of maintaining a good e-mail reputation should be well worth the penalty.

The Start of a New Year and a Word on strace.

newyear

I want to take the opportunity in this blog post to wish everyone both a good holiday season, as well as New Year up ahead. This will likely be the last entry that I will be making before the start of 2014 and I hope that everyone can get some time to spend with their friends and loved ones.

Today I want to talk about strace. Many system administrators are aware of its usage, but for those who aren’t familiar with it, its Wikipedia page describes it concisely:

strace is a debugging utility for Linux and some other Unix-like systems to monitor the system calls used by a program and all the signals it receives, similar to “truss” utility in other Unix systems. This is made possible by a kernel feature known as ptrace.

I personally find it useful when debugging hanging processes (commonly Apache, for example), but it can also be used to identify questionable performance. One thing to keep in mind, however, is the overhead its usage incurs. This can be demonstrated as follows:

Without strace:

time dd if=/dev/zero of=/dev/null bs=512k count=1024k
1048576+0 records in
1048576+0 records out
549755813888 bytes (550 GB) copied, 23.6136 s, 23.3 GB/s

real 0m23.615s
user 0m0.096s
sys 0m23.496s

With strace:

time strace -c dd if=/dev/zero of=/dev/null bs=512k count=1024k
1048576+0 records in
1048576+0 records out
549755813888 bytes (550 GB) copied, 82.8322 s, 6.6 GB/s

real 1m22.837s
user 0m5.757s
sys 1m24.015s

As the results show, there is a significant performance penalty, and this should always be taken into account when strace is used to troubleshoot and examine system bottlenecks.
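As for the hanging-process debugging mentioned earlier, the usual workflow is to attach to the already running process rather than launch it under strace; the PID and process name below are illustrative:

```shell
# Print each syscall with microsecond timestamps; a hung process will usually
# be stuck in a single revealing call (e.g. a blocking read or futex wait).
strace -tt -p 1234

# Alternatively, summarise syscall counts and time spent (Ctrl-C to stop):
strace -c -p 1234

# Follow the oldest Apache process and any children it forks, logging to a file:
strace -f -o /tmp/apache.trace -p "$(pgrep -o httpd)"
```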

See you all next year!

Avoid False Positives with Pingdom

pingdom_thumb

It’s been over half a year since my last post and I still remember my promise about adding a technical entry. What I want to write about today is how to avoid (or at least limit) false positives generated by the awesome uptime checker Pingdom.

For this example, I will be using arguably the most popular firewall used for standard cPanel web servers: ConfigServer Security and Firewall (available at http://configserver.com/cp/csf.html), commonly referred to as CSF.

In summary, here is the problem:

  • Pingdom’s IP addresses must be added to your firewall rules to prevent them from being blocked; otherwise the blocks lead to false positives.
  • Pingdom occasionally adds new IP addresses / check locations, and these must be integrated into the firewall rules as quickly as possible.
  • A hand-maintained firewall allow list is far too labour intensive when multiple server configurations must be updated.

The solution? A search online reveals that Pingdom provides a real-time list of checker IP addresses as an RSS feed: https://my.pingdom.com/probes/feed. So far so good.

Further searching revealed a method involving CSF’s “GLOBAL_ALLOW” setting used in conjunction with this RSS feed. CSF can add the IP addresses to its allow list, but this is not ideal in my opinion, because a file would need to be hosted either in the local server’s htdocs or on a remote web server acting as a source for all servers to gather the IP addresses from. The latter implementation poses security concerns, because that central server would have influence over the firewall rules of every server which collects the Pingdom IP address list from it.

The simple one-liner below, used as a cronjob instead, presents a far more elegant solution:

/usr/bin/wget --quiet -O- https://my.pingdom.com/probes/feed | grep "pingdom:ip" | sed -e 's|</.*||' -e 's|.*>||' | xargs -n 1 /usr/sbin/csf -a
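The grep/sed extraction stage can be sanity-checked offline by feeding it a fabricated feed line before pointing it at the live RSS; the IP address below is a documentation address used purely for illustration:

```shell
# Each feed entry carries a <pingdom:ip> element; the first sed expression
# strips the closing tag onward, the second strips everything up to the last
# '>' of the opening tag, leaving just the address.
echo '<pingdom:ip>203.0.113.10</pingdom:ip>' \
  | grep "pingdom:ip" | sed -e 's|</.*||' -e 's|.*>||'
# prints 203.0.113.10
```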

It would also be a good idea to list pingdom.com in /var/cpanel/commondomains in order to prevent a user from creating it as a domain name and then using it as a mechanism for manipulating the server’s firewall rules.

Test Disk I/O Performance

It can be difficult to choose a good Virtual Private Server host with so many different web hosts available. Apart from an earlier post which demonstrated the use of a modified version of UnixBench (http://www.rskeens.com/?p=17), there is another quick and easy way to test a core component of a system: the disk I/O (Input / Output) throughput.

A quick test that I saw on Web Hosting Talk today used:

dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync

Granted, this is only a rudimentary test of your disk performance, but it can still be worthwhile. My results on an OpenVZ-based VPS node with no free slots are below:

16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 3.4941 seconds, 307 MB/s

That’s fast, considering my desktop with a single-disk configuration only gets:

16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 26.4561 s, 40.6 MB/s

Go on, bench your VPS (or desktop!) and see what results you get.

PeerLevel 2009 – 2010 Uptime Statistics

October marks the end of our uptime-statistics year, and I am proud of our 2009 – 2010 figures. The uptime statistics below are based on one-minute checks (they would look even better with the usual 5 or even 15 minute check intervals that other web hosts use).

Shared Servers
Average uptime: 99.99%
Total average downtime: 32m 12s

We use an internal monitoring system for our VPS nodes and unfortunately do not have public reports for these systems; however, I will still include the statistics below.

Virtual Private Servers (Hardware Nodes)
Average uptime: 99.99%
Total average downtime: 9m 37s

All outages are recorded – both planned and unplanned. While it does not reflect every single system’s uptime, it does indicate the overall stability average. It’s what you can expect when you see uptime such as 00:11:56 up 155 days, 23:57 on many systems, coupled with people who know what they are doing.

If you need our public report links, then please let me know. I would like to thank each and every client for their support and I look forward to next year’s statistics.

Linux CLI Entertainment

If you are reading this as a Linux Command Line Interface (CLI) user, you may be interested in occasionally beautifying your CLI output with various (often comical) pieces of ASCII artwork using the programs cowthink or cowsay, as below:

$ ping -c 5 google.co.za | cowthink -W 60
 _____________________________________________________________
( PING google.co.za (155.232.240.19) 56(84) bytes of data. 64 )
( bytes from gc-cpt-bree-g21-23-19.uni.net.za                 )
( (155.232.240.19): icmp_seq=1 ttl=59 time=30.6 ms 64 bytes   )
( from gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):     )
( icmp_seq=2 ttl=59 time=30.1 ms 64 bytes from                )
( gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          )
( icmp_seq=3 ttl=59 time=31.0 ms 64 bytes from                )
( gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          )
( icmp_seq=4 ttl=59 time=30.3 ms 64 bytes from                )
( gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          )
( icmp_seq=5 ttl=59 time=30.9 ms                              )
(                                                             )
( --- google.co.za ping statistics --- 5 packets transmitted, )
( 5 received, 0% packet loss, time 4005ms rtt                 )
( min/avg/max/mdev = 30.167/30.631/31.071/0.401 ms            )
 -------------------------------------------------------------
        o   ^__^
         o  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


$ ping -c 5 google.co.za | cowsay -W 60
 _____________________________________________________________
/ PING google.co.za (155.232.240.19) 56(84) bytes of data. 64 \
| bytes from gc-cpt-bree-g21-23-19.uni.net.za                 |
| (155.232.240.19): icmp_seq=1 ttl=59 time=30.1 ms 64 bytes   |
| from gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):     |
| icmp_seq=2 ttl=59 time=30.4 ms 64 bytes from                |
| gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          |
| icmp_seq=3 ttl=59 time=31.3 ms 64 bytes from                |
| gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          |
| icmp_seq=4 ttl=59 time=31.1 ms 64 bytes from                |
| gc-cpt-bree-g21-23-19.uni.net.za (155.232.240.19):          |
| icmp_seq=5 ttl=59 time=31.1 ms                              |
|                                                             |
| --- google.co.za ping statistics --- 5 packets transmitted, |
| 5 received, 0% packet loss, time 4004ms rtt                 |
\ min/avg/max/mdev = 30.170/30.839/31.348/0.458 ms            /
 -------------------------------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Or one of my favourites:

$ uptime | cowthink -f tux
 _________________________________________
(  06:10:29 up 1 day, 1:43, 3 users, load )
( average: 0.60, 0.41, 0.23               )
 -----------------------------------------
   o
    o
        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

Want to know what sort of cows or ASCII artwork is available on your install? You need only use the command:

$ ls /usr/share/cowsay/cows