[mod_python] Re: child process XXXX still did not exit, sending a SIGTERM

Martin Blais blais at furius.ca
Wed Jan 4 11:19:45 EST 2006


On 1/4/06, Graham Dumpleton <grahamd at dscpl.com.au> wrote:
>
> The stack trace is a bit bogus from what I can tell. In the various MPMs
> I looked at, the ap_graceful_stop_signalled() function simply sets a
> variable and returns. It doesn't go calling apr_pool_destroy().

I'm still not sure how the apache parent normally tells its child to
terminate. I thought it was supposed to happen via the scoreboard, but
the stack trace seems to indicate a signal.  When I attach gdb to the
running child before stopping the app and then stop apache, the child
gets SIGTERM right away, not after a timeout.  I guess I should dig
into apache and libc to find out how normal termination is supposed to
occur, but that won't happen for another 2 weeks; I have some important
work to move on to for a deadline.

Note that if I attach gdb before I shut down apache I can't reproduce
this stack trace.  I need to attach after I terminate apache to get
it.




> Anyway, seeing the stack trace I can see where the problem lies and can
> simulate the situation with a test case.
>
> What it all comes down to is the signal handler for a SIGTERM in the
> child process is registered as:
>
>    apr_signal(SIGTERM, just_die);
>
> Thus when the SIGTERM is received it calls just_die(). The just_die()
> function calls clean_child_exit(), which if there is found to be a
> memory pool in existence for the child process calls apr_pool_destroy()
> on that memory pool.
>
> The problem then is that mod_python registers a cleanup handler
> associated with that memory pool, namely python_finalize(). Ie., it
> calls:
>
>    apr_pool_cleanup_register(p, NULL, python_finalize,
>                              apr_pool_cleanup_null);
>
> This means that when that memory pool is destroyed, the
> python_finalize() function is being called, which is wrong in that
> situation for a couple of reasons.

Maybe we should change the way python_finalize() is being triggered.
Any ideas?


>
> The first reason is that complex things should not be done from inside
> of signal handlers unless the code which is called is heavily protected
> against being called by signal handlers when in critical sections. There
> is no way that general Python API functions are going to fall into that
> category.

Indeed.


> The second reason is that at the time that the signal occurs, the main
> program thread is already deep within Python code and probably has
> various locks acquired. When the signal handler calls into
> Py_Finalize() it is

I don't know about that; the trace does not indicate that we're
processing a request at all.  But it could happen, I suppose.

What we could/should do on that signal is simply set a variable so
that the wait-loop exits later.  That loop must be somewhere within the
apache libs.  This way we could terminate properly without being in a
signal handler.
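
A minimal sketch of that idea, written against plain C signals rather
than Apache or APR. All names here (request_shutdown,
finalize_interpreter, child_main_loop) are hypothetical, and
finalize_interpreter() is only a stand-in for whatever python_finalize()
would really do. The handler does nothing except set a flag; the loop
notices the flag and the cleanup then runs in normal program context.

```c
#include <signal.h>

/* Hypothetical names; this is not Apache or mod_python code. */
static volatile sig_atomic_t shutdown_requested = 0;
static int cleanup_runs = 0;

/* The handler only records the request, which is async-signal-safe. */
static void request_shutdown(int signo)
{
    (void)signo;
    shutdown_requested = 1;
}

/* Stand-in for expensive finalization such as Py_Finalize(). */
static void finalize_interpreter(void)
{
    cleanup_runs++;
}

/*
 * Simulated child wait-loop. A SIGTERM is raised on iteration
 * `signal_at` to play the role of the parent. Returns the number of
 * iterations completed before the loop noticed the flag.
 */
static int child_main_loop(int signal_at, int max_iterations)
{
    int i = 0;

    shutdown_requested = 0;
    signal(SIGTERM, request_shutdown);

    while (!shutdown_requested && i < max_iterations) {
        if (i == signal_at)
            raise(SIGTERM);      /* handler runs and sets the flag */
        i++;                     /* the current iteration still completes */
    }

    /* We are outside the signal handler here, so cleanup is safe. */
    finalize_interpreter();
    return i;
}
```

Calling child_main_loop(3, 10) lets the iteration that received the
signal finish, leaves the loop at the next check, and runs the cleanup
exactly once. Whether the Apache child loop can be made to check such a
flag is the open question.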


> most likely reaching a point where it wants to acquire the same lock
> as the main program thread has and it effectively deadlocks as the
> signal handler can't proceed until it gets the lock, but the main
> program thread can't give it up while the signal handler is running.
>
> At least this is the case on UNIX systems, where signal handlers
> interrupt the execution of the main program thread, unlike Win32 where
> signal handlers are a distinct thread in their own right.
>
> My immediate question is why does Py_Finalize() even need to be called
> within the context of the child process if it is simply being killed off
> anyway. I know that for the Apache main process if doing a restart that
> Py_Finalize() needs to be called as the same process is kept around,
> but for a child process I don't see the point except maybe to flush
> out stderr/stdout which aren't typically used in mod_python anyway.
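
The deadlock described in the quoted text can be illustrated with a
small sketch in plain C. It assumes a single-threaded process and uses
a simple flag as a stand-in for a real lock; the names lock_held,
finalize_from_handler and deliver_sigterm are hypothetical. The handler
only checks whether the lock is held instead of waiting for it, because
a handler that really waited would never return.

```c
#include <signal.h>

/* Hypothetical stand-in for a lock owned by the main program thread. */
static volatile sig_atomic_t lock_held = 0;
static volatile sig_atomic_t handler_would_block = 0;

/*
 * Stand-in for a handler that ends up in finalization code needing the
 * lock. A real handler would wait for the lock here. On UNIX the
 * interrupted code cannot resume until the handler returns, so the
 * lock would never be released.
 */
static void finalize_from_handler(int signo)
{
    (void)signo;
    if (lock_held)
        handler_would_block = 1;
}

/*
 * Delivers SIGTERM to this process, optionally while the "lock" is
 * held. Returns 1 if the handler would have blocked forever.
 */
static int deliver_sigterm(int hold_lock)
{
    handler_would_block = 0;
    signal(SIGTERM, finalize_from_handler);

    lock_held = hold_lock ? 1 : 0;
    raise(SIGTERM);              /* handler interrupts this function */
    lock_held = 0;               /* only reached once the handler returns */

    return handler_would_block;
}
```

deliver_sigterm(1) reports that the handler found the lock taken, while
deliver_sigterm(0) reports that it did not.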

I'm still not sure whether it is being killed off or asked to go down
gracefully.


> Time now to work out why python_finalize() needs to be called. Maybe
> it can't simply not do anything when called in the context of the child
> process.


> Anyway, one could put:
>
>      if (child_init_pool)
>          return APR_SUCCESS;
>
> at the start of python_finalize() and that would at least avoid any
> problems with the signal handler trying to do complicated stuff like
> call into Python and cause a deadlock.

> Can you at least try the above little addition to python_finalize() and
> see if it makes any difference in your specific case?

Oh yes.  Problem completely gone... but then again python_finalize is
not being called for any of the children (I checked with some logging
traces), whilst before some of the children managed to terminate
gracefully.

Hmm, I think we need either to find a way to terminate outside of a
signal handler, or to forgo calling Py_Finalize entirely (I don't
like the latter "solution").


