service_is_stale everywhere!

Support requests, bug reports, questions etc.
cbrunet_
starter
Posts:4
Joined:Sat Mar 14, 2009 19:49

Post by cbrunet_ » Mon Mar 16, 2009 15:10

Hi!

I just tried the generated config files, but Default_collector/services.cfg has service_is_stale as the check_command everywhere! Did I do something wrong?

Charles.

agargiulo
NConf developer
Posts:725
Joined:Fri Mar 06, 2009 17:50
Location:Zurich, Switzerland

Post by agargiulo » Mon Mar 16, 2009 16:58

Hi.

Have you changed any settings in the predefined data? Did you configure a "Collector" server, a "Monitor" server, or both?

The intended behaviour is this:
"service_is_stale" is written to the config if you have defined your Nagios server as a "Monitor" server. Monitor servers are meant to run only passive checks and not execute any check commands, so we write service_is_stale as the check command there.

If NConf is not behaving as explained, please tell us more details about your configuration.

Post by cbrunet_ » Mon Mar 16, 2009 21:13

I found the bug...

I have no monitors defined.

In bin/generate_config.pl, around line 717 (not sure, because I modified the file for testing), the check for whether a service belongs to a monitor always returns true if $monitor_path is empty...

Charles.
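
The behaviour described above can be reproduced outside NConf. The following is a minimal Perl sketch, not the actual code from generate_config.pl: the variable names follow the ones used in this thread, and the directory value is made up for illustration. When $monitor_path is empty, the pattern interpolates to an empty regex, which places no constraint on the path (Perl treats an empty pattern specially: it reuses the last successfully matched pattern, or matches any string if there was none).

```perl
use strict;
use warnings;

# Assumed values for illustration only; the real ones come from NConf.
my $path         = "/tmp/output/Default_collector";  # hypothetical output directory
my $monitor_path = "";                                # no Monitor server defined

# An empty pattern places no constraint on $path, so in this script the
# test succeeds even though $path has nothing to do with a monitor.
my $looks_like_monitor = ($path =~ /$monitor_path/) ? 1 : 0;

print "treated as monitor path: $looks_like_monitor\n";
```

In this standalone script the test succeeds, which is consistent with every service on the collector ending up with service_is_stale.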

Post by Melanie__ » Wed Mar 25, 2009 10:03

agargiulo wrote:Hi.

Have you changed any settings in the predefined data? Did you configure a "Collector" server, a "Monitor" server, or both?

The intended behaviour is this:
"service_is_stale" is written to the config if you have defined your Nagios server as a "Monitor" server. Monitor servers are meant to run only passive checks and not execute any check commands, so we write service_is_stale as the check command there.

If NConf is not behaving as explained, please tell us more details about your configuration.
Good approach, but if it is done solely like this, you miss a nice feature of Nagios in distributed setups. Let me define some things before I give real-world examples:

Monitor (master server): the main Nagios server, which holds the logic and knows all services/hosts; it also does reachability checks.

Collector (slave server): knows its part of the network and runs checks, but nothing else.

Satellite server: used in secure networks (sensitive data, or things that shouldn't be reachable).

In the case of a normal collector, I would want the master to check the service directly, because if something is down from what the collector sees, that doesn't mean it is really down from the master's point of view (reachability logic). So it is actually useful to give the master the possibility of checking the service and saying it is okay, rather than saying by default that it is stale. Otherwise you get counterproductive alerts: do you want to be called out at three in the morning because all your servers went down, when in reality it was just the collector's Nagios process being killed?

A satellite server is different in that no other Nagios is able to access the network where it is located (a high-security network in a company, or somewhere remote). In this case it is desired that, if nothing comes in, the master raises an alert and calls people out whenever it happens.

So the main thing I want to say is this: for some servers we want services to go stale with check_freshness, and for some we want the master/monitor to do a kind of failover. In my environment at work we have 10,000 services. With your standard behaviour, every admin would immediately face a worst-case scenario when one Nagios needs to go down, or goes down due to some other circumstances. In my case this would mean putting around 15 admins into a stress situation, checking everything, which is not necessary. The result will be that they either stop reading the alerts or just write a script to delete them as they come in.


Otherwise, very good work.

Post by agargiulo » Wed Mar 25, 2009 18:09

Hi Melanie

Thank you very much for your comments and your contribution.

Our real server example is:

Monitoring server: Knows all servers and services. Handles display and alerting for the events received from the collectors.
Collector servers: Check only their assigned servers and services and forward the events to the monitoring server.

Why don't you just enable the "check_freshness" option for a specific service of the collector? Or use a different "freshness_threshold" value for collectors and "normal" servers? For us this works fine...

Post by Melanie__ » Thu Mar 26, 2009 09:47

Stop Nagios on the collector, wait a bit, look at what the monitor is telling you, and then check whether it is really like that.

I have check_freshness enabled; the freshness threshold is what the policy asks for.

As for using different values: why should I change something that is already in production and works very well? We have used Nagios since it was still called NetSaint, so there is more than enough know-how.

As I mentioned in my last post, we have a huge installation: a clustered master, four slaves, and three satellite systems.

If you want to go deeper into what I mean, send me a mail; you can find my profile on http://www.nagios-portal.org
under the username Melanie__. I think we should talk about this in more detail than is possible here.

Post by agargiulo » Thu Mar 26, 2009 16:27

Hi Melanie

I'm not trying to make you change your configuration. I'm just trying to understand.
Anyway, you're right, we should discuss Nagios issues somewhere else.

Let me understand exactly what this means for NConf though. If I understood correctly, you are suggesting that we allow users to configure whether they want services to be "stale" on the monitor, or whether they would like to execute the checks there too. Is that correct?
One solution would be to check the "active_checks_enabled" attribute on monitor servers and write the proper command name to the config if active checks are enabled.

If this is something you need, I'm sure there's other people out there who have a similar setup and who would also profit from the feature.
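
The idea of checking "active_checks_enabled" could look roughly like the Perl sketch below. This is only an illustration of the proposal, not NConf code: the function name monitor_check_command and its arguments are hypothetical, and how NConf stores the attribute internally is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NConf: decide which check_command to
# write for a service on a Monitor server.
#   $command               - the check command configured for the service
#   $active_checks_enabled - value of the service's active_checks_enabled attribute
sub monitor_check_command {
    my ($command, $active_checks_enabled) = @_;

    # Active checks enabled: keep the real command so the monitor can run it.
    return $command if $active_checks_enabled;

    # Otherwise keep the existing behaviour for passive-only monitors.
    return "service_is_stale";
}

print monitor_check_command("check_ping", 1), "\n";  # check_ping
print monitor_check_command("check_ping", 0), "\n";  # service_is_stale
```

With this shape, existing passive-only setups would keep getting service_is_stale, and only services that explicitly enable active checks would get the real command on the monitor.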

Post by Melanie__ » Thu Mar 26, 2009 17:36

Yes, that's exactly what I mean.

It is an intended feature of Nagios that a lot of people use, and it can buy you a lot of time in case of an error on the system.

It works like this: turn Nagios off, and the other machine takes over via check_freshness --------> do your updates, work on the system, network reconfiguration ..... -------> turn Nagios on again, and the checks are back where they should be.

Yes, it puts more load on the main server, but that server was bought to handle all that, and it is something that can buy you time. And as a sysadmin you will know that you usually don't have time.

So for the configuration this would mean:

real check on both servers
monitor not obsessing over service
collector obsessing over service
check_freshness enabled on monitor, so if a check result comes in too late the monitor can run the check itself

In some cases this also means mapping a check to a different command:

check_local_disk on collector -----> check_nrpe_disk1 on monitor
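
As a rough illustration of the setup listed above, the two service definitions might look like the following Nagios object fragments. This is a sketch under assumptions: the host name and the freshness threshold are made up, command arguments and other required directives (normally inherited from a template) are omitted, and disabling regular active checks on the monitor is one possible reading of the list. Nagios forces an active check with the configured check_command when a passive result becomes stale, which is what makes the failover work.

```
# Monitor (master): accepts passive results from the collector and keeps a
# real command so that a failed freshness check triggers an active check.
define service {
    host_name               collector01         ; hypothetical host
    service_description     Disk
    check_command           check_nrpe_disk1    ; command name from the mapping above
    active_checks_enabled   0                   ; assumption: no regular active checks
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     900                 ; assumed value, in seconds
    obsess_over_service     0
}

# Collector (slave): runs the real check and submits the result to the monitor.
define service {
    host_name               collector01         ; hypothetical host
    service_description     Disk
    check_command           check_local_disk
    active_checks_enabled   1
    obsess_over_service     1
}
```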


There are a lot of details, but I think you understood. If you need examples, I can mail you excerpts from our configuration files so you can see how it works and what it should look like.

Post by agargiulo » Fri Mar 27, 2009 11:59

Hi Melanie.

We will discuss this subject further within the development team and will most likely make some changes in the next version to allow users to override "service_is_stale" on Monitor servers. The most difficult part of making a tool like NConf is making it fit everyone's requirements. We are learning every day...

Cheers,
Angelo

ecarlseen
starter
Posts:3
Joined:Sun Apr 05, 2009 10:41

Post by ecarlseen » Sun Apr 05, 2009 10:43

So is there a way to use NConf with a single-server (small) environment, or is that not supported at this point?

Post by agargiulo » Mon Apr 06, 2009 09:46

For a single server environment, simply configure one "collector" server, and no "monitor" server.

Post by ecarlseen » Tue Apr 07, 2009 02:26

agargiulo wrote: For a single server environment, simply configure one "collector" server, and no "monitor" server.

If I do that, I get the "service_is_stale" check_command on each service.

Post by agargiulo » Tue Apr 07, 2009 11:11

Hi. This seems to be a bug then. Please try the following:

In bin/generate_config.pl replace line 717 with the following:

Code:

if($attr->[0] eq "check_command" && $path =~ /$monitor_path/ && $monitor_path){$attr->[1] = "service_is_stale"}
Let me know if this solves your problem.
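
To see the effect of the extra "&& $monitor_path" condition in isolation, here is a small standalone Perl sketch. The wrapper function apply_stale_rule and the directory names are hypothetical; the real script applies the test inline. The sketch checks $monitor_path before the regex rather than after it, which gives the same result as the posted line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line fix. Each attribute is a
# [name, value] pair, as $attr->[0] and $attr->[1] in the fix suggest.
sub apply_stale_rule {
    my ($attr, $path, $monitor_path) = @_;

    # Only rewrite check_command, and only when a monitor path is actually
    # set and the current output path matches it.
    if ($attr->[0] eq "check_command" && $monitor_path && $path =~ /$monitor_path/) {
        $attr->[1] = "service_is_stale";
    }
    return $attr;
}

# Assumed directory names, for illustration only.
my $single = apply_stale_rule(["check_command", "check_ping"],
                              "/out/Default_collector", "");
my $multi  = apply_stale_rule(["check_command", "check_ping"],
                              "/out/Default_monitor", "/out/Default_monitor");

print "$single->[1]\n";  # check_ping
print "$multi->[1]\n";   # service_is_stale
```

With no monitor defined the original command is kept, while a service written to a monitor's path still gets service_is_stale.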

Post by ecarlseen » Wed Apr 08, 2009 16:28

Seems to be working... I'll do some more extensive testing...

Thanks for the prompt response!

Post by agargiulo » Wed Apr 08, 2009 16:36

This fix will be included in NConf 1.2.5.
