Saturday, June 21, 2014

Intel, Xeon, FPGA

I've been waiting years for an on-die FPGA!

http://www.extremetech.com/extreme/184828-intel-unveils-new-xeon-chip-with-integrated-fpga-touts-20x-performance-boost

Time to brush up on the Verilog.  Combined with C++ and mcorelab, we'll finally be able to really cook!  High-throughput message routing based on topic paths or tag/value pairs will now be possible.  The line between network switch and application is finally starting to blur, and high-throughput applications will be easier to build with commodity servers.
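For a flavor of what routing on topic paths means, here is a minimal software sketch in Python.  Nothing in it comes from mcorelab or Intel: the TopicRouter class, the slash-separated path format and the single-segment * wildcard are all assumptions made up for illustration.  An FPGA would do the matching in hardware; this only shows the logic.

```python
from collections import defaultdict


class TopicRouter:
    """Hypothetical router: maps slash-separated topic paths to subscribers."""

    def __init__(self):
        # pattern (tuple of path segments) -> list of subscriber names
        self._subs = defaultdict(list)

    def subscribe(self, pattern, subscriber):
        self._subs[tuple(pattern.split("/"))].append(subscriber)

    @staticmethod
    def _matches(pattern, topic):
        # '*' matches exactly one segment; any other segment must match literally.
        return len(pattern) == len(topic) and all(
            p == "*" or p == t for p, t in zip(pattern, topic)
        )

    def route(self, topic):
        """Return the sorted list of subscribers whose pattern matches the topic."""
        parts = tuple(topic.split("/"))
        matched = []
        for pattern, subscribers in self._subs.items():
            if self._matches(pattern, parts):
                matched.extend(subscribers)
        return sorted(matched)


router = TopicRouter()
router.subscribe("quotes/nyse/*", "alice")
router.subscribe("quotes/*/ibm", "bob")
router.subscribe("news/tech", "carol")
print(router.route("quotes/nyse/ibm"))  # ['alice', 'bob']
```

The linear scan over patterns is the simplest thing that works; a real router would more likely use a trie or a hardware lookup table.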

Thursday, June 12, 2014

Mcorelab - Mcoreweb bench numbers

Quick correction: the numbers below are for Haswell with AVX2.

These are unofficial numbers; mileage may vary depending on your hardware.  Still pretty amazing.  With a low enough update rate, I'd reckon one could get millions of websocket users on a single server!
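To put rough numbers on that reckoning, here is a back-of-the-envelope sketch.  It assumes the 8-core websocket figure reported further down (5.45 Million/second) holds as the connection count grows well past the 100,000 that were actually tested, and it ignores per-connection memory and kernel limits.  The max_users helper is hypothetical, just the arithmetic spelled out.

```python
def max_users(messages_per_sec, seconds_between_updates):
    """Users servable if each one gets a single message per update interval."""
    return int(messages_per_sec * seconds_between_updates)


# Assumption: 8-core websocket throughput from this post's Linux results.
websocket_rate = 5.45e6

for interval in (1, 5, 10):
    users = max_users(websocket_rate, interval)
    print(f"one update every {interval:>2} s -> about {users:,} users")
```

So even at one update per second the message rate alone would cover a few million users, on paper at least.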

http://www.mcorelab.com

Test setup and some performance numbers for the mcoreweb server (numbers reported are single-server performance):
- server: dual E5-2690 with Intel 10GbE (X520).
- client: multiple client nodes generating load, for a total of 100,000 concurrent client sockets/connections.
- the server answers each request message with a response message.
- message payload size = 64 bytes, so the test measures raw message/request handling capacity rather than being limited by the PHY (our software can easily saturate 10 Gbit with just a moderate message size).
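A quick sanity check on why a 64-byte payload keeps the test away from the PHY limit.  This is a rough sketch: it counts payload bytes only (no TCP/IP or Ethernet framing overhead) and assumes each reported "Million/second" is one 64-byte message in each direction.  The function names are made up for illustration.

```python
PAYLOAD_BYTES = 64
LINK_GBIT_PER_SEC = 10.0  # 10GbE line rate


def payload_gbit_per_sec(messages_per_sec, payload_bytes=PAYLOAD_BYTES):
    """Payload bandwidth in Gbit/s for one direction of traffic."""
    return messages_per_sec * payload_bytes * 8 / 1e9


def payload_bytes_to_fill_link(messages_per_sec, link_gbit=LINK_GBIT_PER_SEC):
    """Payload size at which the given message rate would fill the link."""
    return link_gbit * 1e9 / 8 / messages_per_sec


rate = 6.1e6  # highest plain-TCP figure reported below, in messages/second
print(f"{payload_gbit_per_sec(rate):.2f} Gbit/s of payload per direction")  # 3.12
print(f"{payload_bytes_to_fill_link(rate):.0f} bytes per message would fill the link")
```

Even the best number here moves only about a third of the line rate in payload, so the cores are the bottleneck, not the wire.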

Windows 2012/2008-R2:
1). sample server implemented on the mcoreweb/TCPserver layer:
using 2 cores on the server: 1.5 Million/second
using 4 cores on the server: 3.3 Million/second
using 8 cores on the server: 5.9 Million/second

SSL (AES encrypted):
using 4 cores on the server: 1.35 Million/second
using 8 cores on the server: 2 Million/second
(With Intel's newer-generation Haswell CPUs, SSL performance is much better.  Here the result is limited by the Sandy Bridge CPU, which lacks the new AVX2 instructions.)

2). sample server implemented on the mcoreweb/websocket layer:
2 cores: 1.48 Million/second
4 cores: 2.56 Million/second
8 cores: 5.55 Million/second

Secure Websocket (AES encrypted):
4 cores: 0.95 Million/second
8 cores: 1.99 Million/second

Redhat Linux 6.4:
1). sample server implemented on the mcoreweb/TCPserver layer:
using 2 cores on the server: 1.8 Million/second
using 4 cores on the server: 3.4 Million/second
using 8 cores on the server: 6.1 Million/second

SSL (AES encrypted):
using 4 cores on the server: 1.13 Million/second
using 8 cores on the server: 2.1 Million/second

2). sample server implemented on the mcoreweb/websocket layer:
2 cores: 1.43 Million/second
4 cores: 3 Million/second
8 cores: 5.45 Million/second

Secure Websocket (AES encrypted):
4 cores: 0.99 Million/second
8 cores: 1.75 Million/second
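To see how well these numbers scale with core count, here is a small sketch that compares per-core throughput against the 2-core run for the unencrypted results above.  The scaling_efficiency helper and the choice of the 2-core run as the baseline are my own framing, not part of the benchmark.

```python
# Reported throughput in millions of messages/second, keyed by core count.
results = {
    "Windows TCP": {2: 1.5, 4: 3.3, 8: 5.9},
    "Windows websocket": {2: 1.48, 4: 2.56, 8: 5.55},
    "Linux TCP": {2: 1.8, 4: 3.4, 8: 6.1},
    "Linux websocket": {2: 1.43, 4: 3.0, 8: 5.45},
}


def scaling_efficiency(series, base_cores=2):
    """Per-core throughput at each core count, relative to the baseline run."""
    base_per_core = series[base_cores] / base_cores
    return {cores: (rate / cores) / base_per_core for cores, rate in series.items()}


for name, series in results.items():
    eff = scaling_efficiency(series)
    summary = ", ".join(f"{cores} cores: {e:.0%}" for cores, e in eff.items())
    print(f"{name:18s} {summary}")
```

A value of 100% means perfectly linear scaling from the 2-core run.  Values slightly above 100% are most likely run-to-run noise in unofficial numbers.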

Tuesday, June 10, 2014

The granddaddy of c100k

This does a great job of explaining the scaling issues in server-side programming.  Lots of technical concepts, but laymen should also be able to get the big picture.

The original c10k article

Multiplexing sockets with a regular OS

Sure, anyone can pin up 50k sockets on a single TCP listener.  But how many times can you read/write all those sockets concurrently?  Verbiage may vary; I call these iops (input/output operations per second).  For our purposes, we will say each operation carries a 64-byte payload.

Has anyone out there been able to do a send/recv operation (i.e. a ping-pong) every second on 50k connected TCP sockets (i.e. 100k iops, server side) using Linux or Windows?  If so, I'd love to make your acquaintance.  In fact, through my journey, I've yet to meet anyone who can exceed the above numbers using a regular OS.
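To make the iops counting concrete, here is a toy sketch using only the Python standard library.  It is not how mcoreweb or any high-performance server does it, and Python will land nowhere near the rates discussed here.  It multiplexes a handful of local socket pairs through one selector, does a single 64-byte ping-pong on each, and counts one recv plus one send per connection on the server side.  The ping_pong_round function and the connection count are assumptions for illustration.

```python
import selectors
import socket
import time

PAYLOAD = b"x" * 64  # 64-byte payload, as in the text


def ping_pong_round(n_conns=50):
    """One ping-pong per connection; returns (server-side ops, elapsed seconds)."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_conns)]
    for _, server in pairs:
        server.setblocking(False)
        # Each connection gets its own receive buffer, stored as selector data.
        sel.register(server, selectors.EVENT_READ, data=bytearray())

    start = time.perf_counter()
    for client, _ in pairs:
        client.sendall(PAYLOAD)  # ping: client -> server

    ops = 0
    done = 0
    while done < n_conns:
        for key, _ in sel.select(timeout=1.0):
            buf = key.data
            buf += key.fileobj.recv(len(PAYLOAD) - len(buf))  # input operation
            if len(buf) == len(PAYLOAD):
                key.fileobj.send(bytes(buf))  # output operation (pong)
                sel.unregister(key.fileobj)
                ops += 2
                done += 1

    for client, _ in pairs:
        client.recv(len(PAYLOAD))  # drain the pong on the client side
    elapsed = time.perf_counter() - start

    sel.close()
    for client, server in pairs:
        client.close()
        server.close()
    return ops, elapsed


ops, elapsed = ping_pong_round(50)
print(f"{ops} server-side ops in {elapsed:.4f} s")
```

With 50 connections that is 100 server-side operations per round, the same 2:1 ratio as 50k sockets giving 100k iops.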

Stay tuned: after a bit more reflection, I'll get into a pretty disruptive concept which can multiplex 100k sockets with millions of iops.  Things are starting to get interesting.  Think of a webserver streaming to >100k concurrent users via websockets!  Extreme scalability.

More to come ...