[Reconnoiter-devel] Memory leak in noitd

Michal Taborsky michal at taborsky.cz
Wed May 2 13:30:39 EDT 2012


Well, I'm lost. I suppressed the OpenSSL errors, and after installing the
latest version, valgrind shows only the possibly lost bytes that were
also in the previous outputs, and I agree they are harmless. However, the
production noitd still leaks when there are snmp checks that time out.
My test noitd, which does not have stratcond running against it but runs
the same checks, does not leak. Either the problem is somewhere in the
noit/stratcon communication, or something really fishy is going on.

Here is a transcript of a session where I run production noitd for a bit,
observing the memory increase, then I disable the checks that time out and
observe it again, seeing constant memory consumption.
https://gist.github.com/2578413
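
For anyone who wants to put a number on the growth instead of eyeballing
ps output, here is a minimal Python sketch. It is not part of Reconnoiter,
and the function names are made up for illustration. It assumes a Linux
/proc filesystem, reads VmRSS for a given pid and reports the growth per
minute since the first sample. Feeding it the RSS figures from my first
report quoted below (40108 KB at 13:18:21 and 51560 KB at 13:42:30, which
is 1449 seconds apart) gives roughly 474 KB per minute, in line with the
"about 500k per minute" estimate.

```python
import re
import sys
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


def growth_kb_per_min(rss_start_kb, rss_end_kb, elapsed_seconds):
    """Average RSS growth in kB per minute between two samples."""
    return (rss_end_kb - rss_start_kb) * 60.0 / elapsed_seconds


def watch(pid, interval_seconds=60, samples=10):
    """Print RSS and average growth since the first sample."""
    start_time = time.monotonic()
    start_rss = read_rss_kb(pid)
    for _ in range(samples):
        time.sleep(interval_seconds)
        now_rss = read_rss_kb(pid)
        elapsed = time.monotonic() - start_time
        rate = growth_kb_per_min(start_rss, now_rss, elapsed)
        print(f"rss={now_rss} kB  growth={rate:.1f} kB/min")


if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        # Hypothetical usage: python rss_watch.py <pid of the noitd child>
        watch(int(sys.argv[1]))
    else:
        # Figures taken from the ps output in the first report.
        rate = growth_kb_per_min(40108, 51560, 1449)
        print(f"reported growth: {rate:.1f} kB/min")
```

Running it once with the failing snmp checks enabled and once with them
disabled should give two rates that can be compared directly.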

--
Michal Taborsky
http://www.taborsky.cz



2012/5/2 Theo Schlossnagle <jesus at omniti.com>

> Yeah... OpenSSL punches valgrind in the nose.
>
> I believe I've fixed all of the leaks present in all of the valgrind
> outputs you've sent -- it should be committed and available.
>
> Take the newer bits for a spin and see if it fixes it.  From your
> description of your problem, it will not.  I'd like to remove the
> obvious noise before trying to locate the issue.
>
> On Wed, May 2, 2012 at 9:16 AM, Michal Taborsky <michal at taborsky.cz>
> wrote:
> > No, the console is used only manually when we're changing configuration.
> > Nothing that would periodically login and do stuff.
> >
> > I tried to enable one of the failing checks in our production and re-ran
> > for about 5 minutes with valgrind. Stratcon was running against this noit
> > and the leaks file is a mess.
> > https://gist.github.com/2576462
> >
> > --
> > Michal Taborsky
> > http://www.taborsky.cz
> >
> >
> >
> > 2012/5/2 Theo Schlossnagle <jesus at omniti.com>
> >>
> >> You don't happen to automate anything through the telnet interface, do
> >> you?  Like regular health checks or configuration stuff?
> >>
> >> On Wed, May 2, 2012 at 8:35 AM, Michal Taborsky <michal at taborsky.cz>
> >> wrote:
> >> > It leaks, but it's behaving inconsistently.
> >> >
> >> > To be 100% sure I did a fresh install on a new CentOS 6 virtual machine,
> >> > configured a minimal noit.conf to reproduce the issue and re-ran valgrind.
> >> > The results together with the config are here:
> >> > https://gist.github.com/2576023
> >> >
> >> > On one try, when noitd ran untouched, the leak did not occur. Then I tried
> >> > again and used the telnet console a bit. There are some leaks visible there,
> >> > but I do not think that's it. With valgrind running I cannot observe the
> >> > memory increase, because as far as I know valgrind will eat memory as part
> >> > of the observation. Then I ran noitd again without valgrind and could see
> >> > the memory increasing.
> >> >
> >> > I know... big help this does.
> >> >
> >> > --
> >> > Michal Taborsky
> >> > http://www.taborsky.cz
> >> >
> >> >
> >> >
> >> > 2012/5/2 Theo Schlossnagle <jesus at omniti.com>
> >> >>
> >> >> If valgrind doesn't show anything significant, something very subtle
> >> >> is going on.  Usually valgrind will reveal these sorts of programming
> >> >> errors quite obviously.  Are you sure it is leaking?
> >> >>
> >> >> On Tue, May 1, 2012 at 11:07 AM, Michal Taborsky <michal at taborsky.cz>
> >> >> wrote:
> >> >> > Hello Theo,
> >> >> >
> >> >> > I am not sure we finished this. I ran the fixed code, and valgrind
> >> >> > doesn't show anything significant now. But as I wrote earlier, the
> >> >> > problem is somewhere in the snmp check timeout handling. I can avoid it
> >> >> > for the moment by disabling the check that times out now. It should be
> >> >> > pretty easy to simulate.
> >> >> >
> >> >> > --
> >> >> > Michal Taborsky
> >> >> > http://www.taborsky.cz
> >> >> >
> >> >> >
> >> >> >
> >> >> > 2012/4/22 Theo Schlossnagle <jesus at omniti.com>
> >> >> >>
> >> >> >> A whole bunch of fixes for mem-related stuff today... the two places that
> >> >> >> looked bad in your leak diagnostics were the noit.gunzip lua wrapper and
> >> >> >> snmp in the event of timeouts.  However, both were quite small and would
> >> >> >> take a long time to accumulate to anything noticeable.  In other words,
> >> >> >> there's likely another more nefarious leak in there.
> >> >> >>
> >> >> >> Can you update everything to latest and run under valgrind again -- feel
> >> >> >> free to run it for an hour or so (as long as it performs well enough to
> >> >> >> stay current).
> >> >> >>
> >> >> >>
> >> >> >> On Sun, Apr 22, 2012 at 11:29 AM, Michal Taborsky <michal at taborsky.cz>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> I am not familiar with valgrind, but the output is
> >> >> >>> here https://gist.github.com/2464624
> >> >> >>>
> >> >> >>> MT.
> >> >> >>>
> >> >> >>> 2012/4/22 Theo Schlossnagle <jesus at omniti.com>
> >> >> >>>>
> >> >> >>>> Are you familiar with valgrind?  If you could, compile with debugging
> >> >> >>>> symbols (-g) and run:
> >> >> >>>>
> >> >> >>>> valgrind --log-file=noitd.leaks --leak-check=full ./noitd -D (other args
> >> >> >>>> like -M or -c if you use those)
> >> >> >>>>
> >> >> >>>> Let it run for about two minutes, telnet into noitd and run shutdown
> >> >> >>>>
> >> >> >>>> valgrind should spit a ton of junk out in noitd.leaks -- that should
> >> >> >>>> pinpoint the problem.
> >> >> >>>>
> >> >> >>>> 2012/4/22 Michal Taborsky <michal at taborsky.cz>
> >> >> >>>>>
> >> >> >>>>> After upgrading to the latest code from master, I am experiencing a
> >> >> >>>>> memory leak in noitd. This noit runs about 200 checks per minute with
> >> >> >>>>> various modules and loses about 500k per minute. It's CentOS 6.2 64bit.
> >> >> >>>>>
> >> >> >>>>> [root at server etc]# date;  ps uax | grep noitd
> >> >> >>>>> Sun Apr 22 13:18:21 CEST 2012
> >> >> >>>>> root     22405  0.0  0.1 146844  1088 ?        S    12:59   0:00 /usr/local/sbin/noitd -c /usr/local/etc/noit.conf
> >> >> >>>>> root     22406  1.1  3.9 940652 40108 ?        Sl   12:59   0:13 /usr/local/sbin/noitd -c /usr/local/etc/noit.conf
> >> >> >>>>> [root at server etc]# date;  ps uax | grep noitd
> >> >> >>>>> Sun Apr 22 13:42:30 CEST 2012
> >> >> >>>>> root     22405  0.0  0.1 146844  1088 ?        S    12:59   0:00 /usr/local/sbin/noitd -c /usr/local/etc/noit.conf
> >> >> >>>>> root     22406  1.1  5.0 951612 51560 ?        Sl   12:59   0:28 /usr/local/sbin/noitd -c /usr/local/etc/noit.conf
> >> >> >>>>>
> >> >> >>>>> I know this does not help much and I am ready to provide more info if
> >> >> >>>>> requested. For now, I did not have the time to try to disable the checks
> >> >> >>>>> one by one to see which one causes it. I'll do it later, if necessary.
> >> >> >>>>> Right now I am looking for a possible quick fix or advice on how to
> >> >> >>>>> find it, if there is any.
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Michal Taborsky
> >> >> >>>>> http://www.taborsky.cz
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> _______________________________________________
> >> >> >>>>> Reconnoiter-devel mailing list
> >> >> >>>>> Reconnoiter-devel at lists.omniti.com
> >> >> >>>>> http://lists.omniti.com/mailman/listinfo/reconnoiter-devel
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>>
> >> >> >>>> Theo Schlossnagle
> >> >> >>>>
> >> >> >>>> http://omniti.com/is/theo-schlossnagle
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> Theo Schlossnagle
> >> >> >>
> >> >> >> http://omniti.com/is/theo-schlossnagle
> >> >> >>
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Theo Schlossnagle
> >> >>
> >> >> http://omniti.com/is/theo-schlossnagle
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Theo Schlossnagle
> >>
> >> http://omniti.com/is/theo-schlossnagle
> >
> >
>
>
>
> --
> Theo Schlossnagle
>
> http://omniti.com/is/theo-schlossnagle
>