oss-sec mailing list archives
Re: backtrace_symbols() misuse by Ceph and its supposedly-safe use
From: Jacob Bachmeyer <jcb62281 () gmail com>
Date: Fri, 12 Jul 2024 18:15:07 -0500
Alexander Patrakov wrote:
> [...] What would be a good solution (as in: something that does not convert crashes into deadlocks) here? I understand that, after memory corruption, we are already in the UB territory, but is there anything better possible than what is implemented?
I would suggest a monitor daemon that runs GDB to get the backtrace. The simplest way to do this would require Ceph to have its own supervisor and to provide each daemon with a pipe back to that supervisor. (This is not unique; PostgreSQL has long had a "postmaster" process that manages the worker "postgres" backend processes.) The fatal error handler need only write(2) to the pipe from a static string and/or fixed buffer (to report a signal number) and then enter an infinite loop. The supervisor then kills the crashed process, possibly after attaching GDB and collecting a backtrace.
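The handler side of that scheme might look roughly like the following C sketch. It is only an illustration, not anything taken from Ceph: the names `report_fd`, `report_msg`, `report_signal`, `fatal_handler` and `install_fatal_handler` are hypothetical, the message format is made up, and the choice of signals is an assumption. The "infinite loop" is written here as a loop around pause(2), which is one possible choice among several. The point is that the handler touches nothing except a preallocated buffer and write(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: write end of the pipe to the supervisor, set once at startup. */
static int report_fd = -1;

/* Fixed buffer prepared ahead of time; the two '0' digits are patched in. */
static char report_msg[] = "fatal signal 00\n";

/* Report the signal number using only write(2), which is async-signal-safe. */
static void report_signal(int signo)
{
    report_msg[13] = (char)('0' + (signo / 10) % 10);
    report_msg[14] = (char)('0' + signo % 10);
    ssize_t ignored = write(report_fd, report_msg, sizeof report_msg - 1);
    (void)ignored;   /* nothing useful can be done if the write fails */
}

static void fatal_handler(int signo)
{
    report_signal(signo);
    for (;;)
        pause();     /* do nothing more; the supervisor kills this process */
}

/* Install the handler for an assumed set of crash signals. */
static int install_fatal_handler(int fd)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    struct sigaction sa;

    assert(fd >= 0);                 /* the pipe must exist before any crash */
    report_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

A daemon would call `install_fatal_handler()` once at startup with the pipe descriptor inherited from the supervisor. Everything else (reading the pipe, attaching GDB, killing the process) happens in the supervisor, which is still in a well-defined state.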
Alternatively, simply run the Ceph daemons with a nonzero core file size limit (`ulimit -c`) and collect the core files. The core files can be analyzed using GDB after the fact. No dedicated supervisor is needed here, only kernel facilities.
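Setting the limit from the shell or service manager that launches the daemons is the simplest route. If a daemon had to raise the limit itself at startup, the same effect could be sketched in C with setrlimit(2) as below. The function name `enable_core_dumps` is hypothetical, and this only raises the soft limit as far as the hard limit allows; whether and where a core file is actually written still depends on system configuration (on Linux, for example, the `kernel.core_pattern` setting).

```c
#include <sys/resource.h>

/* Raise the soft core-size limit to the hard limit; returns 0 on success. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may go this far */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

This must run during normal startup, not in the crash path, so that nothing at all needs to execute inside the process once it has gone wrong.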
The central problem here, as I understand it, is trying to do too much in a process that has gone into undefined behavior. Attaching GDB or dumping a core file both sidestep that problem.
-- Jacob
