-
Firstly, thanks for the great project! We are trying to move from Redis (KeyDB, to be more precise) to Dragonfly. We use Redis for 4 different purposes, one of which is Sidekiq. We moved to Dragonfly for all of them, and everything looked great; the improvement in resource usage was significant (12 CPUs on KeyDB vs 3 CPUs on Dragonfly). But then we ran into problems with Sidekiq: every once in a while Dragonfly started to drain memory (first showing OOM errors in Lua scripts, then stopping delivery of jobs to Sidekiq, and finally rejecting connections altogether), and only a restart could fix it. The dashboard during such an incident looks like this: INFO output in this situation, at 17:43:
I should note that we are using [...]. I started to investigate, and it looks like the root cause is the number of queues we have. We have about 20 queues with very uneven load (about 5 queues carry more than 80% of the traffic), plus more than 190 sidekiq-alive queues (one per Sidekiq instance), each of which only processes a single job from time to time. When the problem begins, I see continuous growth in scheduled and enqueued jobs, most of which are SidekiqAlive jobs, as our Sidekiq dashboard shows: Then Dragonfly starts writing these warnings to the log:
As I understand it, at this point the Sidekiq workers stop receiving new jobs, so the queues and memory consumption keep growing, leading to OOM errors: first in scripts
and then on other client commands and connections. My best guess right now is that something goes wrong when the number of queues grows (sidekiq-alive queues can appear and disappear as we scale), and then everything falls apart. Our Dragonfly configuration follows the recommendations in your docs (we tried increasing maxmemory and the number of threads, but that affects this case unpredictably and ultimately ends the same way):
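For context, a minimal sketch of the kind of startup flags involved; the values below are illustrative assumptions, not our exact production settings:

```sh
# Illustrative Dragonfly startup flags (values are assumptions, not our real config):
# --proactor_threads controls the number of worker threads,
# --maxmemory is the limit after which OOM errors are returned to clients.
dragonfly --proactor_threads=8 --maxmemory=12gb --port=6379
```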
For now we had to roll back to KeyDB for Sidekiq (we didn't have these problems there), and I'm sharing our situation here to ask for advice on how to handle our number of queues and the uneven load on them. Thanks in advance!
-
Kudos on providing all the info, this is probably the first time I've seen almost all the data needed to help with the issue :) A few comments:
I will provide more hints once you are able to fix these issues.
-
Ok, the latencies are really high. You can further increase `--interpreter_per_thread` until the `lua_blocked_total` statistic stops increasing heavily, but besides that I can not help much. This should improve latency, IMHO. It seems that your Lua scripts touch multiple hashtags that are spread across multiple threads, but I do not know why, or whether this can be fixed.
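For example, one way to keep an eye on that counter while tuning is to poll the server's INFO output; this is a rough sketch, assuming the `lua_blocked_total` statistic is exposed there in your Dragonfly version and that the server listens on the default port:

```sh
# Poll Dragonfly's INFO output and watch the Lua-related counters
# (assumes lua_blocked_total shows up in INFO on your Dragonfly version).
while true; do
  redis-cli -h 127.0.0.1 -p 6379 INFO | grep -i lua
  sleep 5
done
```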
The line `tx_with_freq:4887835546,13646581,1077753,277821242,23,57,52,2114309,0,4,4,0` shows that you have lots of multi-threaded transactions. These can have high latency, and if they are contended on the same queue, the latency starts to accumulate. As I said, that's the maximum (community) support we can give at this point.
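As an aside, here is roughly where such multi-shard transactions come from in a Sidekiq workload: the fetcher issues a single blocking pop across all the queue keys a worker listens to, and in Dragonfly each key can hash to a different shard/thread. A sketch with made-up queue names:

```sh
# Roughly the shape of a Sidekiq fetch: one blocking pop over many queue keys.
# Each key can hash to a different Dragonfly shard, so this single command
# becomes a multi-shard (multi-threaded) transaction. Queue names are illustrative.
redis-cli BRPOP queue:default queue:critical queue:mailers queue:sidekiq-alive-abc123 2
```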