I wrote this post a year ago, thought I'd sit on it for a while, and now I'm unleashing it on my readers.

The question of aggregated logging rages on. The stark reality is that there is no single correct answer, only trade-offs of time and money relative to the current economics of the system. Google, for instance, only shuttles actionable events to its MQ and operations pipeline, meaning that if you need to debug a problem you have to log into the system that generated the error. At the other extreme, every message is forwarded to the aggregation servers for immediate evaluation.

The case for full aggregation was presented to me this way: at my previous company we had 600 servers producing 1B messages a day, which we processed on just 15 servers with varying functions.

At first this seemed like a rational description, but I was still skeptical, and it only took some simple math to see why:

- 1B messages divided among 600 machines is about 1.6M messages per machine per day.
- Concentrated into a 10-hour business day, that's about 160K messages per machine per hour.
- Guessing that a transaction generates 5K messages, the system was processing about 32 transactions per hour (TPH) per machine. Even if we agree on a more modest 1,000 messages per transaction, that's still only 160 TPH, or under 3 transactions per minute.

Now let's do the math the other way, starting from a more realistic rate of 1,000 transactions per minute per server:

- 1,000 transactions per minute × 60 = 60K transactions per hour
- 60K transactions × 1,000 messages each = 60M messages per hour per server
- 60M × 600 servers = 36B messages per hour
- Over a 10-hour day, that's 360B messages

If the average message size is 500 bytes, then you are talking about something like 180TB per day. Collect data at that rate for any length of time and you are probably sitting on many multiples of petabytes. Not to mention backups, with or without compression.
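The two estimates can be checked with a quick back-of-envelope script. This is just a sketch of the arithmetic above; note that the forward calculation only reproduces the 360B-message and 180TB figures if the 1,000-transaction rate is read as per minute per server, and the 10-hour day, 1,000 messages per transaction, and 500-byte average are the post's own assumptions.

```python
# Back-of-envelope check of both logging-volume estimates.

SERVERS = 600
HOURS = 10  # assumed 10-hour business day

# Direction 1: start from the claimed 1B messages/day and derive throughput.
daily_messages = 1_000_000_000
per_server_per_day = daily_messages / SERVERS        # ~1.67M msgs/server/day
per_server_per_hour = per_server_per_day / HOURS     # ~167K msgs/server/hour
tph = per_server_per_hour / 1_000                    # ~167 TPH at 1,000 msgs/txn
tpm = tph / 60                                       # under 3 transactions/minute

# Direction 2: start from 1,000 transactions/minute/server and derive volume.
tpm_per_server = 1_000
msgs_per_txn = 1_000
msg_bytes = 500
msgs_per_day = tpm_per_server * 60 * HOURS * msgs_per_txn * SERVERS  # 360B
bytes_per_day = msgs_per_day * msg_bytes                             # 180 TB

print(f"{per_server_per_hour:,.0f} msgs/server/hour, ~{tpm:.1f} TPM")
print(f"{msgs_per_day:,} messages/day = {bytes_per_day / 1e12:.0f} TB/day")
```

Running this confirms the asymmetry the post is pointing at: the claimed billion-message fleet works out to a trickle of transactions per machine, while a genuinely busy fleet produces hundreds of billions of messages a day.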