Efficient Incident Responsiveness

Monitoring service availability is perhaps the most fundamental responsibility of any web hosting provider. Many technologies can fulfil this role, notably Nagios, but for this example I have used one of my personal favourites, Icinga.

An organisation can have a great service monitoring implementation, but incidents must still be responded to quickly, efficiently and in a manner which respects company procedures. The easiest way to ensure this is to develop a frontend which uses the monitoring system as a backend. Integration can leverage JSON or even SQL, and I have pieced together a basic example which covers the principal requirements that would commonly be needed.

Icinga offers JSON support out of the box (Nagios can be configured in a similar manner), and the screenshot below depicts this functionality with reports for two Google checks which have been collecting data for quite some time:

[Screenshot: Icinga JSON output]
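As a rough illustration of how a frontend might consume such a feed, here is a minimal Python sketch that filters a status payload down to the services needing attention. The payload shape, the field names (`service_status`, `host_name`, `status_information` and so on) and the host names are assumptions made for this example; the JSON a given Icinga version actually emits should be checked before relying on any of them.

```python
import json

# Hypothetical payload: the structure and field names are assumptions for
# illustration, not a guaranteed match for any particular Icinga version.
SAMPLE = """
{
  "status": {
    "service_status": [
      {"host_name": "google-a", "service_description": "HTTP",
       "status": "OK", "status_information": "HTTP OK"},
      {"host_name": "google-b", "service_description": "PING",
       "status": "CRITICAL", "status_information": "Packet loss = 100%"}
    ]
  }
}
"""


def problem_services(raw):
    """Return (host, service, state, detail) tuples for every non-OK service."""
    data = json.loads(raw)
    services = data.get("status", {}).get("service_status", [])
    return [
        (s["host_name"], s["service_description"], s["status"],
         s.get("status_information", ""))
        for s in services
        if s.get("status") != "OK"
    ]


if __name__ == "__main__":
    for host, service, state, detail in problem_services(SAMPLE):
        print(f"{host}/{service}: {state} ({detail})")
```

In a real deployment the raw string would come from the monitoring system rather than a constant, but the filtering step would look much the same.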

This data can then be presented within the context of an organisation’s procedures. The example includes a handful of them:

  • The ability to “lock” a host so that other staff members are immediately aware that the problem is being worked on, and by whom. The lock can also serve as a metric for tracking productivity and follow-up times.
  • Creation of a comment. This is important because useful observations or attempted resolutions can be recorded.
  • Acknowledgement of the issue to be resolved.
  • Refreshing the particular entry.
  • Escalation to senior staff.
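To make these actions a little more concrete, here is a minimal Python sketch of how a frontend might record them against a host. The `Incident` class, its fields and its method names are hypothetical and are not taken from the example implementation; a production version would persist this state in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical in-memory record of the actions taken on one host."""
    host: str
    locked_by: Optional[str] = None
    locked_at: Optional[datetime] = None
    acknowledged: bool = False
    escalated: bool = False
    comments: List[str] = field(default_factory=list)

    def lock(self, user: str) -> None:
        """Claim the host; refuse if a different user already holds the lock."""
        if self.locked_by is not None and self.locked_by != user:
            raise PermissionError(
                f"{self.host} is already locked by {self.locked_by}")
        if self.locked_by is None:
            self.locked_by = user
            # The timestamp is what follow-up time metrics would be based on.
            self.locked_at = datetime.now(timezone.utc)

    def comment(self, user: str, text: str) -> None:
        """Record an observation or attempted resolution."""
        self.comments.append(f"{user}: {text}")

    def acknowledge(self, user: str) -> None:
        self.acknowledged = True
        self.comment(user, "acknowledged")

    def escalate(self, user: str) -> None:
        self.escalated = True
        self.comment(user, "escalated to senior staff")


if __name__ == "__main__":
    incident = Incident(host="google-a")
    incident.lock("alice")
    incident.comment("alice", "investigating packet loss")
    incident.escalate("alice")
    print(incident.locked_by, incident.comments)
```

Keeping the lock holder and the lock time on the same record is what allows the lock to double as a productivity metric.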

Since the frontend is entirely customised by the organisation, buttons and areas can be made accessible according to user level. Customer information can also be displayed if the frontend integrates with a billing system’s database. The following screenshots portray an example with the important aspects pieced together:

[Screenshots: service-monitor-a, service-monitor-b]
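The user-level restrictions mentioned earlier could be as simple as a lookup table. The sketch below is in Python, and the level names and the mapping of actions to levels are purely hypothetical; every organisation would define its own.

```python
# Hypothetical user levels and the actions each level is allowed to see.
ACTIONS_BY_LEVEL = {
    "junior": {"lock", "comment", "refresh", "escalate"},
    "senior": {"lock", "comment", "refresh", "escalate", "acknowledge"},
}


def visible_actions(level):
    """Return the sorted list of buttons to render for a given user level."""
    # Unknown levels get no actions at all rather than a default set.
    return sorted(ACTIONS_BY_LEVEL.get(level, set()))


if __name__ == "__main__":
    print(visible_actions("junior"))
    print(visible_actions("senior"))
```

Defaulting unknown levels to an empty set means a misconfigured account sees nothing rather than everything.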

 

 

This is merely an example and by no means a finished product. The code also needs complete restructuring, with Node.js in mind, so that action updates, table content and pagination can be dynamic without the need for a full page load.

A third-party service monitor can also be used so that customers can have peace of mind regarding the integrity of the statistics.

I would be very interested to learn whether your organisation has a similar setup in production and whether any immediate benefits were derived from it.
