119 lines
5.0 KiB
Diff
119 lines
5.0 KiB
Diff
From 2d8daa84e1cb9681d49a29cf829eb5678850210e Mon Sep 17 00:00:00 2001
|
|
From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
|
|
Date: Wed, 13 Aug 2025 12:25:58 +0200
|
|
Subject: cpuidle: governors: menu: Avoid selecting states with too much
|
|
latency
|
|
|
|
Occasionally, the exit latency of the idle state selected by the menu
|
|
governor may exceed the PM QoS CPU wakeup latency limit. Namely, if the
|
|
scheduler tick has been stopped already and predicted_ns is greater than
|
|
the tick period length, the governor may return an idle state whose exit
|
|
latency exceeds latency_req because that decision is made before
|
|
checking the current idle state's exit latency.
|
|
|
|
For instance, say that there are 3 idle states, 0, 1, and 2. For idle
|
|
states 0 and 1, the exit latency is equal to the target residency and
|
|
the values are 0 and 5 us, respectively. State 2 is deeper and has the
|
|
exit latency and target residency of 200 us and 2 ms (which is greater
|
|
than the tick period length), respectively.
|
|
|
|
Say that predicted_ns is equal to TICK_NSEC and the PM QoS latency
|
|
limit is 20 us. After the first two iterations of the main loop in
|
|
menu_select(), idx becomes 1 and in the third iteration of it the target
|
|
residency of the current state (state 2) is greater than predicted_ns.
|
|
State 2 is not a polling one and predicted_ns is not less than TICK_NSEC,
|
|
so the check on whether or not the tick has been stopped is done. Say
|
|
that the tick has been stopped already and there are no imminent timers
|
|
(that is, delta_tick is greater than the target residency of state 2).
|
|
In that case, idx becomes 2 and it is returned immediately, but the exit
|
|
latency of state 2 exceeds the latency limit.
|
|
|
|
Address this issue by modifying the code to compare the exit latency of
|
|
the current idle state (idle state i) with the latency limit before
|
|
comparing its target residency with predicted_ns, which allows one
|
|
more exit_latency_ns check that becomes redundant to be dropped.
|
|
|
|
However, after the above change, latency_req cannot take the predicted_ns
|
|
value any more, which takes place after commit 38f83090f515 ("cpuidle:
|
|
menu: Remove iowait influence"), because it may cause a polling state
|
|
to be returned prematurely.
|
|
|
|
In the context of the previous example say that predicted_ns is 3000 and
|
|
the PM QoS latency limit is still 20 us. Additionally, say that idle
|
|
state 0 is a polling one. Moving the exit_latency_ns check before the
|
|
target_residency_ns one causes the loop to terminate in the second
|
|
iteration, before the target_residency_ns check, so idle state 0 will be
|
|
returned even though previously state 1 would be returned if there were
|
|
no imminent timers.
|
|
|
|
For this reason, remove the assignment of the predicted_ns value to
|
|
latency_req from the code.
|
|
|
|
Fixes: 5ef499cd571c ("cpuidle: menu: Handle stopped tick more aggressively")
|
|
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
|
|
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
|
|
Link: https://patch.msgid.link/5043159.31r3eYUQgx@rafael.j.wysocki
|
|
---
|
|
drivers/cpuidle/governors/menu.c | 29 ++++++++++++-----------------
|
|
1 file changed, 12 insertions(+), 17 deletions(-)
|
|
|
|
--- a/drivers/cpuidle/governors/menu.c
|
|
+++ b/drivers/cpuidle/governors/menu.c
|
|
@@ -287,20 +287,15 @@ static int menu_select(struct cpuidle_dr
|
|
return 0;
|
|
}
|
|
|
|
- if (tick_nohz_tick_stopped()) {
|
|
- /*
|
|
- * If the tick is already stopped, the cost of possible short
|
|
- * idle duration misprediction is much higher, because the CPU
|
|
- * may be stuck in a shallow idle state for a long time as a
|
|
- * result of it. In that case say we might mispredict and use
|
|
- * the known time till the closest timer event for the idle
|
|
- * state selection.
|
|
- */
|
|
- if (predicted_ns < TICK_NSEC)
|
|
- predicted_ns = data->next_timer_ns;
|
|
- } else if (latency_req > predicted_ns) {
|
|
- latency_req = predicted_ns;
|
|
- }
|
|
+ /*
|
|
+ * If the tick is already stopped, the cost of possible short idle
|
|
+ * duration misprediction is much higher, because the CPU may be stuck
|
|
+ * in a shallow idle state for a long time as a result of it. In that
|
|
+ * case, say we might mispredict and use the known time till the closest
|
|
+ * timer event for the idle state selection.
|
|
+ */
|
|
+ if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
|
|
+ predicted_ns = data->next_timer_ns;
|
|
|
|
/*
|
|
* Find the idle state with the lowest power while satisfying
|
|
@@ -316,13 +311,15 @@ static int menu_select(struct cpuidle_dr
|
|
if (idx == -1)
|
|
idx = i; /* first enabled state */
|
|
|
|
+ if (s->exit_latency_ns > latency_req)
|
|
+ break;
|
|
+
|
|
if (s->target_residency_ns > predicted_ns) {
|
|
/*
|
|
* Use a physical idle state, not busy polling, unless
|
|
* a timer is going to trigger soon enough.
|
|
*/
|
|
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
|
|
- s->exit_latency_ns <= latency_req &&
|
|
s->target_residency_ns <= data->next_timer_ns) {
|
|
predicted_ns = s->target_residency_ns;
|
|
idx = i;
|
|
@@ -354,8 +351,6 @@ static int menu_select(struct cpuidle_dr
|
|
|
|
return idx;
|
|
}
|
|
- if (s->exit_latency_ns > latency_req)
|
|
- break;
|
|
|
|
idx = i;
|
|
}
|