Notes on MMc queues

Recently had a project related to queues at work, so wanted to put some of my notes in a blog post. For a bit of up-front, the notation MMc refers to a queuing system with multiple servers (c), and the inputs are Poisson distributed (the first M), and have exponential service rates M (these Ms can be different though). That is a mouthful, but basically saying events that arrive in independently and have a left skewed distribution of times it takes to resolve those events. (That may seem like a lot of assumptions, they are often reasonable though for many systems, and if not deviations may not be that big of deal to the estimates in practice.)

Main reason for blog post is that the vast majority of stuff online is about MM1 queue systems, so systems that only have 1 server. I basically never deal with this situation. The formulas for multiple servers are much more complicated, so took me a bit to gather code examples and verify correctness. These are notes based on that work.

So for up-front, the group I was dealing with at work had a fundamental problem, their throughput was waaay too small. In this notation, we have:

  • Number of arrivals per time period, N
  • Mean time it takes to exit the queue, S
  • Number of servers, c

So first, you need to have N*S < c! This is simple accounting, so say we are talking about police calls for service, you have on average 5 calls per hour, and they take on average 0.5 hours (30 minutes) to handle. You then need more than 5*0.5 = 2.5 officers to handle this, so a minimum of 3 officers. If you don’t have 3 officers, the queue will grow, you won’t be able to handle all of the calls.

At work I was advising a situation where they were chronically too low of staff serving for a particular project, and it has ballooned over an extended period of time to create an unacceptable backlog. So think S is really tiny and N is very large – at first the too small of servers could cycle through the tickets, but the backlog just slowly grew, and then after months, they had unacceptable wait times. This is a total mess – there is no accounting trick to solve this, you need c > N*S. It makes no sense to talk about anything else like average wait time in the queue unless that condition is met.

OK, so we know you need c > N*S, a common rule of thumb is that capacity should not be over 80%, so that is c > (N*S)/0.8. (This is not for policing, but more common for call centers, see also posts on Erlang-C formulas.) The idea behind 80% it is at the point where wait times (being held in the queue) start to grow.

If you want to get more into the nitty gritty though, such as calculating the actual probability in the queue, average wait time, etc., then you will want to dig into the MMc queue lit. Here I have posted some python notes (that is itself derivative work others have posted). Hoping just posting and giving my thumbs up makes it easier for others.

So first here is an example of using those functions to estimate our queue example above. Note you need to give the inverse of the mean service time for this function.

# queuing functions in python
from queue import MMc, nc

N = 6    # 6 calls per hour
S = 0.5  # calls take 30 minutes to resolve
c = 7    # officers taking calls

# This function expects inverse service average
qS = MMc(N,1/S,c)

# Now can get stats of interest

# This is the probability that when a call comes
# in, it needs to wait for an officer
qS.getQueueProb()

And this prints out 0.0376.... So when a call comes in, we have a 3% probability of having to wait in the queue for an officer to respond. How about how long on average a call will wait in the queue?

# This is how long a call on average needs
# to wait in the queue in minutes
qS.getAvgQueueTime()*60

And this gives 0.28.... The multiplication by 60 goes from hours to minutes, and we are waiting less than 1 minute on average. This seems good, but somewhat counter-intuitively, this is an average of a bunch of calls answered immediately, plus the 3.8% of calls that are held for some time. We can calculate the estimate of if a call is held, how long will it be held on average:

# If a call is queued however, how long to wait?
qS.getAvgQueueTime_Given()*60

And this is a less rosy 17.5 minutes! Queues are tricky. Unless you have a lot of extra capacity, there are going to be wait times. We can also calculate how often all officers will be idle in this setup.

# Idle time, no one taking any calls
qS.getIdleProb()

And this gives rounded 0.05, so we have only 5% idle time in this set up. This is not that helpful though for police planning, you want individual officers to have capacity to do proactive work, that is more you want officers to only spend 40-60% on responding to calls for service, so that suggests c > (N*S)/0.5 is where you want to be. Which is where we are at in this scenario with 7 officers. This is the probability all 7 officers will be idle at once, which does not really matter.

Now you can technically just run this through multiple values of c to get this, but Rosetti (2021) has listed an approximate square root staffing formula that given an input probability wait in the queue, how many servers do you need. So here is that function:

# If you want probability of holding in the queue to only be 3%
est_serv = nc(N,S,0.03)
print(est_serv)

Which prints out 6.387..., so since you need to take the ceiling of this, you will need to 7 officers to keep to that probability (agreeing with the MMc object above).

In terms of values, the nc function will work with very large/small N and S inputs just fine. The MMc function also looks fine, minus one submethod uses a factorial, .getPk (so cannot have very large inputs to that method), but the rest is OK. So if you wanted to do nc(very_big,very_small,0.1) that is fine and should be no floating point issues.

The nc function relies on scipy, but the MMc class is all base python (just the math library). So the MMc functions can really be embedded in any particular python application you want with no real problem.

Rough Estimates for Spatial Police Planning

I have prior work on spatial allocation of patrol units with workload equality constraints (Wheeler, 2018). But generally, you need to first estimate how many units you will have, and after that you can worry about optimally distributing them. The reason for this is that the number of units is much more important, too few and you will have more queuing, in which case the spatial arrangement does not matter at all. Larson & Stevenson (1972) estimate optimal spatial allocation only beats random allocation by 25%.

So for police response times you can think about time waiting in queue, time spent driving to the event, and time spent resolving the event (time to dispatch tends to be quite trivial, but is sometimes included in the wait in the queue part, Verlaan & Ruiter, 2023).

There is somewhat of a relationship between the above “service” time, fewer people have to drive farther, and so service time goes up. But there happens to some simple rules of thumb, if you have N patrol units, you can calculate (2/3)*sqrt(Square Miles)/sqrt(N) = average distance traveled in miles for your jurisdiction (Stenzel, 1993, see page 135 in the PDF). Then you can translate that miles driven to time, by say taking an average of 45 miles per hour. Given a fixed N, you can then just add this into the service time estimate for your given jurisdiction to get a rough estimate of more officers will reduce response times by X amount.

It ends up being though this tends to be trivial relation to the waiting in the queue time (or the typical it takes 30 minutes to resolve police incidents on average). So it is often more important to get rough estimates for that if you want to reduce wait times for calls for service. And this does not even take into account priority levels in calls, but to start simpler folks should figure out a minimum to handle the call stack (whether in policing or in other areas) and then go onto more complicated scenarios.

References

Leave a comment

Leave a comment