A year ago today I wrote this post and decided to sit on it; now I'm unleashing it on my readers.
The question of aggregated logging rages on. The stark reality is that there is no single correct answer, only trade-offs of time and money relative to the current economics of the system. Google, for instance, only shuttles actionable events to its MQ and operations tooling, meaning that if you need to debug a problem you have to log into the system that generated the error. The other option is to forward every message to an aggregation server for immediate evaluation.
And so the proof was presented to me this way:
At my previous company we had 600 servers producing 1B messages a day, which we processed on just 15 servers with varying functions.
At first this seemed like a rational description, but I was still skeptical, and it only took some simple math to see why.
If each server produces 1,000 messages per second:

1,000 TPS * 60s = 60K messages per minute per server
60K/min * 60min = 3.6M messages per hour per server
3.6M * 600 servers = 2.16B messages per hour across the fleet
2.16B * 10hrs = 21.6B messages over a 10-hour day

If the average message size is 500 bytes, that works out to roughly 10.8TB per day, and when you are collecting data at this rate you quickly accumulate petabytes, not to mention backups, with or without compression.
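The arithmetic is easy to check with a few lines of Python. The per-server rate of 1,000 messages per second, the 500-byte average message size, and the 10-hour window are the assumptions used above:

```python
# Back-of-envelope check of the log volume.
# Assumptions (from the discussion above): 1,000 msgs/sec per server,
# 600 servers, 500-byte average message, a 10-hour window.
TPS_PER_SERVER = 1_000
SERVERS = 600
AVG_MSG_BYTES = 500
HOURS = 10

per_minute = TPS_PER_SERVER * 60              # msgs/minute per server
per_hour = per_minute * 60                    # msgs/hour per server
fleet_per_hour = per_hour * SERVERS           # msgs/hour across the fleet
total_msgs = fleet_per_hour * HOURS           # total messages in the window
total_tb = total_msgs * AVG_MSG_BYTES / 1e12  # terabytes of raw log data

print(f"{total_msgs:,} messages, about {total_tb:.1f} TB")
# prints: 21,600,000,000 messages, about 10.8 TB
```

Swap in your own rates and message sizes; the point is that per-server rates compound quickly across a fleet, which is exactly what made me skeptical of the original claim.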