Thursday, May 2, 2024

Why is “/sched/latencies:seconds” a largely delayed metric for goroutine queuing delay – Getting Help


tl;dr: When running a gRPC server and clients, why does the server-side goroutine queuing delay (/sched/latencies:seconds) from the runtime/metrics package lag so far behind the actual latency measured on the client side?

Description:
When running microservices under heavy load, the average or maximum queuing delay is often used as a signal of congestion (Dagor 2018, Breakwater 2020). I’ve been trying to collect this data for overload control on a go-grpc server. Since goroutines are scheduled by the Go runtime, I tried to use the ready-time data for goroutines. However, the /sched/latencies:seconds metric from runtime/metrics appears to be considerably delayed. I calculate the 90th percentile and max latency, shown as the green and red solid lines in the figure, but the metric always reports latency peaks 0.5 seconds after the peaks occurred (the dotted line, e2e latency measured from the client side).

The plot shows one instance of GreetingServiceServer, with 10ms of computation involved (but deducted in the figure). /sched/latencies:seconds is sampled at a 20ms interval. I increased the load after the 4th second, where you can see a peak in e2e latency.
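For reference, here is a minimal sketch of what sampling this metric at a 20ms interval and deriving a p90 and max might look like. It is not the exact code behind the figure. The helpers `deltaCounts` and `quantile` are hypothetical names, and the loop is capped at five ticks so the example terminates. The bucket counts of this histogram increase monotonically over the life of the process, so the sketch subtracts the previous sample from the current one before computing per-window percentiles. Whether that has any bearing on the lag described above is not established here.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
	"time"
)

// deltaCounts returns cur minus prev, bucket by bucket, so that the result
// covers only the events recorded between the two reads.
func deltaCounts(prev, cur []uint64) []uint64 {
	d := make([]uint64, len(cur))
	for i := range cur {
		d[i] = cur[i]
		if i < len(prev) {
			d[i] -= prev[i]
		}
	}
	return d
}

// quantile returns the upper bound of the bucket holding the q-th quantile.
// buckets has len(counts)+1 boundaries, as in metrics.Float64Histogram.
// If that upper bound is +Inf, the bucket's lower bound is returned instead.
func quantile(counts []uint64, buckets []float64, q float64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := uint64(math.Ceil(q * float64(total)))
	if target == 0 {
		target = 1
	}
	var seen uint64
	for i, c := range counts {
		seen += c
		if seen >= target {
			if math.IsInf(buckets[i+1], 1) {
				return buckets[i]
			}
			return buckets[i+1]
		}
	}
	return buckets[len(buckets)-1]
}

func main() {
	samples := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not supported by this Go version")
		return
	}
	// Copy the counts: Read may reuse the histogram's memory on the next call.
	prev := append([]uint64(nil), samples[0].Value.Float64Histogram().Counts...)

	ticker := time.NewTicker(20 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 5; i++ { // bounded for the example; a server would loop until shutdown
		<-ticker.C
		metrics.Read(samples)
		h := samples[0].Value.Float64Histogram()
		d := deltaCounts(prev, h.Counts)
		fmt.Printf("p90=%.6fs max=%.6fs\n",
			quantile(d, h.Buckets, 0.90), quantile(d, h.Buckets, 1.0))
		prev = append(prev[:0], h.Counts...)
	}
}
```

The values are bucket boundaries, not exact latencies, so the resolution is limited by the bucket layout the runtime chooses.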

My questions are:

  1. Where does the delay occur in the case of an overloaded unary gRPC server? Is /sched/latencies:seconds a correct measure of this queuing delay?
  2. As shown in the figure above, there is a large horizontal gap (/sched/latencies:seconds is 0.5 seconds behind the e2e measurements). Why is that the case?
  3. Is there any way to fix it? Is any other measurement available in Go to track the queuing delay?

Any help would be appreciated!

Hi @jolly, welcome to the forum.

I have not worked with runtime/metrics yet, so allow me a dumb question.

I don’t see any wall-clock timestamps in the definition of Sample or Float64Histogram.

How do you align the x-axis of the output of /sched/latencies:seconds with the e2e latency measured on the client side?
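Since the sample itself carries no timestamp, one way such alignment could be done is to record the wall-clock time at each read. A minimal sketch, assuming a hypothetical `stampedSample` type and `readStamped` helper (neither is part of runtime/metrics), and assuming the client and server clocks are comparable:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// stampedSample pairs a histogram read with the wall-clock time taken
// immediately after the read. Hypothetical type, for illustration only.
type stampedSample struct {
	At    time.Time
	Total uint64 // sum of all bucket counts at the time of the read
}

// readStamped reads the named histogram metric and stamps it with time.Now().
// It reports false if the metric is unknown or is not a Float64Histogram.
func readStamped(name string) (stampedSample, bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	at := time.Now()
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		return stampedSample{At: at}, false
	}
	var total uint64
	for _, c := range s[0].Value.Float64Histogram().Counts {
		total += c
	}
	return stampedSample{At: at, Total: total}, true
}

func main() {
	if s, ok := readStamped("/sched/latencies:seconds"); ok {
		fmt.Printf("%s total=%d\n", s.At.Format(time.RFC3339Nano), s.Total)
	}
}
```

The timestamp marks when the histogram was read, not when each scheduling delay happened, so any plot built this way is accurate only to within one sampling interval.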

Edited to add: Values that /sched/latencies:seconds provides
