[Ur] TechEmpower Benchmarks
Adam Chlipala
adamc at csail.mit.edu
Wed Dec 11 19:16:29 EST 2013
On 12/11/2013 04:06 PM, escalier at riseup.net wrote:
> I suppose I should have said "obvious changes that may turn out to be
> improvements".
>
>
>> I'm curious how involving Nginx could improve performance
>>
> In my rather brief and inexpert tests, Ur/Web + lighttpd had about 70% of
> pure Ur/Web performance.
>
Interesting; so throwing at least one popular proxy in front doesn't
bring magic performance improvements. That's at least comforting from
the perspective of not challenging my mental model of how efficient the
Ur/Web HTTP binaries are.
I should check: is that 70% better or worse than the baseline? :)
>> I've pushed a changeset to avoid all database operations for page
>> handlers that don't need the database.
>>
> Wow. Dramatic improvement here too.
>
> If I make a dubious extrapolation, (new local)/(old local) = (hypothetical
> new i7)/(old i7), the hypothetical i7 performance I get on JSON
> serialization, for example, is ~47,000. That would be around 75% of Yesod
> performance (which I have supposed to be the framework most analogous to
> Ur/Web).
>
Yesod is probably most similar in terms of the programming experience,
but at run-time, I would expect the closest frameworks to be those based
directly on C or C++!
You can take a look at the generated code for, e.g., '/json' URIs by
running 'urweb' with the '-debug' flag. The C source will then be in
/tmp/webapp.c. (I just pushed a changeset that adds an optimization
that makes this code even more direct, though it doesn't seem to have
any serious performance effect for any of the benchmarks.)
The '/json' handler is a function named like '__uwn_wrap_json_XXX'. It
does almost nothing (there is a rough sketch of its shape just after
this list):
- Send some hints about region-based memory management with
uw_[begin|end]_region(). These should be little more than single
pointer bumps.
- Call the function to add the required HTTP headers. Probably trivial
running time here, though we most likely take on system call overhead to
get the current time. Writing the headers themselves is just copying
into a mutable string buffer.
- Clear the mutable string buffer holding the page to return.
- Make a number of uw_write() calls to append content to that buffer.
- Twice call the string escaping function, which writes its output
directly to the page buffer in an efficient manner.
- Call the runtime system function to return the current page buffer
content with a particular MIME type.
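For concreteness, here is a rough, hand-written sketch of the shape of
that handler. The uw_begin_region(), uw_end_region(), and uw_write()
calls are the ones described above; the *_placeholder names are purely
illustrative stand-ins for the generated identifiers, so read the real
code off /tmp/webapp.c rather than trusting this sketch.

    /* Illustrative sketch only; not the actual compiler output.
       uw_context, uw_begin_region, uw_end_region, and uw_write come
       from the Ur/Web C runtime (urweb.h). */
    static void __uwn_wrap_json_sketch(uw_context ctx) {
      uw_begin_region(ctx);      /* region hint: about a pointer bump */

      /* Placeholder: append the standard HTTP headers; likely one
         system call to read the current time, then plain copies into
         a mutable string buffer. */
      write_headers_placeholder(ctx);

      /* Placeholder: clear the buffer that accumulates the page. */
      clear_page_placeholder(ctx);

      /* Append the literal JSON pieces and the two escaped strings;
         the escaping placeholder writes straight into the page
         buffer. */
      uw_write(ctx, "{");
      escape_to_page_placeholder(ctx, "message");
      uw_write(ctx, ":");
      escape_to_page_placeholder(ctx, "Hello, World!");
      uw_write(ctx, "}");

      uw_end_region(ctx);        /* releasing the region: another
                                    pointer bump */

      /* Placeholder: hand back the page buffer with its MIME type. */
      return_page_placeholder(ctx, "application/json");
    }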
So, I hope you agree that it's not obvious how any of this could be much
faster, working directly with any reasonable C library for HTTP
serving. In contrast, understanding the run-time behavior of any
Haskell program is probably much more complex.
Again, I'd be very interested in any help anyone is willing to offer on
comparing the execution of the latest Ur/Web benchmark against one of
the current winners for the 'plaintext' benchmark. There may be
runtime-system inefficiencies that could be teased out this way. So
far my personal motivation level hasn't reached the point where I'm
willing to replicate the official benchmark setup on EC2, but I'd be
very happy to provide help (and maybe even money) to someone else who
would take charge of it!
P.S.: I also just learned that OpenSSL's random number generation is not
thread-safe, which probably led to some segfaults during execution of
the benchmarks that call Ur/Web's [rand]. This issue is now fixed by
adding a lock in the Ur/Web library.
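In case anyone is curious about the shape of that fix, the idea is
just to serialize calls into OpenSSL's RNG behind a mutex. Here is a
minimal sketch of the pattern, assuming pthreads and RAND_bytes(); the
function name and exact call site are not taken from the actual patch.

    #include <pthread.h>
    #include <openssl/rand.h>

    /* One process-wide lock guarding OpenSSL's RNG state, which is
       not safe to touch from multiple threads at once. */
    static pthread_mutex_t rand_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Fill buf with len random bytes while holding the lock around
       RAND_bytes().  Returns 1 on success, 0 on failure, matching
       RAND_bytes()'s own convention. */
    static int uw_rand_sketch(unsigned char *buf, int len) {
      int ret;
      pthread_mutex_lock(&rand_mutex);
      ret = RAND_bytes(buf, len);
      pthread_mutex_unlock(&rand_mutex);
      return ret;
    }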