[Info-vax] Apache + mod_php performance

Dan Cross cross at spitfire.i.gajendra.net
Fri Sep 27 15:39:21 EDT 2024


In article <vd6n70$q3fm$1 at dont-email.me>,
Arne Vajhøj  <arne at vajhoej.dk> wrote:
>On 9/27/2024 10:16 AM, Dan Cross wrote:
>> In article <vd6dh4$nrif$1 at dont-email.me>,
>> Arne Vajhøj  <arne at vajhoej.dk> wrote:
>>> On 9/27/2024 9:18 AM, Craig A. Berry wrote:
>>>>          The only thing I can think of that hasn't already been mentioned
>>>> is that Tomcat code is JIT-compiled, which is likely to be pretty good,
>>>> optimized code, whereas Apache is probably either cross-compiled or
>>>> native-compiled with an early enough field test compiler that there are
>>>> no optimizations.
>>>
>>> That is a possible explanation.
>>>
>>> But the difference in numbers are crazy big.
>>>
>>> Apache getting a static text file with 2 bytes: 22 req/sec
>>>
>>> Tomcat with Quercus and PHP getting data out of a MySQL database on
>>> Windows and outputting HTML: over 200 req/sec
>>>
>>> Tomcat using JSP (which get triple compiled) getting data out of a MySQL
>>> database on Windows (with db connection pool) and outputting HTML: over
>>> 600 req/sec.
>>>
>>> My gut feeling is that cross-compilation may contribute to but not
>>> fully explain the difference.
>> 
>> Almost certainly not; this is an IO bound application, not CPU
>> bound.
>
>With static content yes.

Correct.  That's all you ought to be looking at until you
understand why that's slow.

>With dynamic content and the volume Apache+mod_php delivers yes.

Maybe, but without a profile you really don't know.  Beyond
that, it is currently irrelevant.  You see approximately the
same numbers with static and dynamic content; this heavily
implies that the dynamic content case is not related to the
present slow-down.  Including it now is premature and likely
just masks what's _actually_ wrong.

>With dynamic content and high volume then CPU can matter. Tomcat
>and Quercus can do over 200 req/sec, but CPU utilization fluctuate
>between 150% and 250% - 4 VCPU used so not CPU bound, but could
>have been if it had been just 2 VCPU.

See above.  You know that there's a problem with Apache and
static content, but you don't know _what_ that problem is.  Why
would you jump ahead of yourself worrying about things like that
until you actually understand what's going on?

In this case, concentrating on static content, CPU time burned
inside Apache itself because of poor compiler optimization
seems like a low-probability root cause of the performance
problems you are seeing, as static file service like this is
IO bound, not compute bound.  Keep your eye on the ball.

>> My strong suspicion is that what you're seeing is the result of
>> a serious impedance mismatch between the multi-process model
>> Apache was written to use, and its realization using the event
>> signalling infrastructure on VMS.
>
>Yes.

Maybe.  You really haven't done enough investigation to know, at
least going by what you've reported here.

>Or actually slightly worse.
>
>Prefork MPM is the multi-process model used in Apache 1.x - it is still
>around in Apache 2.x, but Apache 2.x on Linux use event or worker
>MPM (that are a mix of processes and threads) and Apache 2.x on Windows
>use winnt MPM (that is threads only).

Ok, sure.  But as you posted earlier, Apache on VMS, as you're
using it, is running the prefork (multi-process) MPM, no?

>> Again, I would try to establish a baseline.  Cut out the MPM
>> stuff as much as you can;
>
>MPM is the core of the server.

No, you misunderstand.  Try to cut down on contention due to
coordination between multiple entities; you do this by
_lowering_ the number of things at play (processes, threads,
whatever).  The architecture of the server is irrelevant in
this case; what _is_ relevant is minimizing concurrency in its
_configuration_.  Does that make sense?
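
To make that concrete: with the prefork MPM you can pin the
server to a single worker purely in configuration.  The
directives below are the stock Apache 2.4 prefork names; I'm
assuming the VMS port accepts them (older 2.2-era builds say
MaxClients instead of MaxRequestWorkers), so treat this as a
sketch rather than a drop-in config:

    <IfModule mpm_prefork_module>
        StartServers            1
        MinSpareServers         1
        MaxSpareServers         1
        ServerLimit             1
        MaxRequestWorkers       1
        MaxConnectionsPerChild  0
    </IfModule>

With exactly one process answering requests, whatever rate you
measure is free of inter-process coordination; anything you
lose again when you scale the process count back up is, by
construction, coordination overhead.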

>>                       ideally, see what kind of numbers you
>> can get fetching your text file from a single Apache process.
>> Simply adding more threads or worker processes is unlikely to
>> significantly increase performance, and indeed the numbers you
>> posted are typical of performance collapse one usually sees due
>> to some kind of contention bottleneck.
>
>It increases but not enough.
>
>1 -> 0.1 req/sec
>150 -> 11 req/sec
>300 -> 22 req/sec
>
>> Some things to consider: are you creating a new network
>> connection for each incoming request?
>
>Yes. Having the load test program keep connections alive
>would be misleading as real world clients would be on different
>systems.

Again, you're getting ahead of yourself.  Try simulating a
single client making multiple, repeated requests to a single
server, ideally reusing a single HTTP connection.  This will
tell you whether the issue is with query processing _inside_
the server, or if it has something to do with handling new
connections for each request.  If you use HTTP keep-alives
and the number of QPS jumps up, you've narrowed down your
search space.  If it doesn't, you've eliminated one more
variable, and again, you've cut down on your search space.

Does that make sense?
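
If it helps, here's a rough sketch of that experiment in
Python; the host, path, and request count are placeholders,
not anything taken from your setup.  The first loop reuses one
HTTP/1.1 connection, the second opens a fresh connection per
request; the ratio between the two rates tells you how much of
the cost is connection handling rather than query processing.

    import http.client
    import time

    HOST = "vms-host.example"   # placeholder: the VMS box under test
    PATH = "/test.txt"          # placeholder: the small static file
    N = 500

    # One persistent HTTP/1.1 connection, reused for every request.
    conn = http.client.HTTPConnection(HOST, 80, timeout=10)
    t0 = time.time()
    for _ in range(N):
        conn.request("GET", PATH)
        conn.getresponse().read()   # drain the body so the socket is reusable
    conn.close()
    print("keep-alive:  %.1f req/sec" % (N / (time.time() - t0)))

    # A brand new connection for every request, like your current test.
    t0 = time.time()
    for _ in range(N):
        c = http.client.HTTPConnection(HOST, 80, timeout=10)
        c.request("GET", PATH)
        c.getresponse().read()
        c.close()
    print("per-request: %.1f req/sec" % (N / (time.time() - t0)))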

>>                                      It's possible that that's
>> hitting a single listener, which is then trying to dispatch the
>> connection to an available worker,
>
>That is the typical web server model.

No, it is _a_ common model, but not _the_ "typical" model.  For
instance, many high-performance web solutions are built on an
asynchronous model, which effectively implements state machines
where state transitions yield callbacks that are distributed
across a collection of executor threads.  There's no single
"worker" or dedicated handoff.

Moreover, there are many different _ways_ to implement the
"listener hands connection to worker" model, and it _may_ be
that the way that Apache on VMS is trying to do it is
inherently slow.  We don't know, do we?  But that's what we're
trying to figure out, and that's why I'm encouraging you to
start simply and build on what you can actually know from
observation, as opposed to faffing about making guesses.

>>                                     using some mechanism that is
>> slow on VMS.
>
>It is a good question how Apache on VMS is actually doing that.
>
>All thread based solutions (OSU, Tomcat etc.) just pass a
>pointer/reference in memory to the thread. Easy.
>
>Fork create a process copy with the open socket. I am not quite
>sure about the details of how it works, but it works.
>
>If the model on VMS is:
>
>---(HTTP)---parent---(IPC)---child
>
>then it could explain being so slow.
>
>I may have to read some of those bloody 3900 lines of code (in a
>single file!).

Precisely.  And maybe run some more experiments.
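
For reference, the classic Unix version of what you describe
(parent accepts, forks, and the child simply inherits the open
socket, with no IPC relay in the data path) looks roughly like
this.  A sketch in Python, not anything lifted from the Apache
sources:

    import os
    import signal
    import socket

    RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # kernel reaps the children

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", 8080))
    srv.listen(128)

    while True:
        conn, _addr = srv.accept()      # parent accepts the connection
        if os.fork() == 0:              # child inherits the accepted socket
            srv.close()                 # child doesn't need the listener
            conn.recv(4096)             # read (and ignore) the request
            conn.sendall(RESPONSE)
            conn.close()
            os._exit(0)
        conn.close()                    # parent drops its copy and loops

(Prefork actually does it the other way around: fork first,
then each child accepts on the inherited listener.  The key
property is the same, though: the socket is inherited, nothing
is relayed.)  If the VMS port instead keeps the parent in the
data path and shuttles bytes to the child over IPC, as you
sketch above, then every request pays for that round trip,
which would go some way toward explaining your numbers.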

>>               Is there a profiler available?  If you can narrow
>> down where it's spending its time, that'd provide a huge clue.
>
>Or I take another path.

This is a useful exercise either way; getting to the root cause
of a problem like this may teach you something you could apply
to other, similar, problems in the future.

	- Dan C.