When you use Sidekiq to handle asynchronous jobs, sometimes there are exceptions and failing jobs. And I say sometimes because your environment is probably perfect: there is no lag, every service your jobs depend on is always up and responsive, and you probably write better code than most other developers 😛 otherwise it probably happens quite often…
But Sidekiq will retry that job for you, and of course this is configurable:
# Retry 5 times before giving up
sidekiq_options retry: 5

# Don't retry at all
sidekiq_options retry: false
The default number of retries is 25, spread over roughly three weeks of exponential backoff, and with that many attempts most transient problems will probably fix themselves.
But sometimes not even being that insistent is enough to get past the problem, and in that case Sidekiq will send your job to the DeadSet, where all the job ghosts live.
For most applications, it is good to know when a job ends up in that DeadSet, meaning Sidekiq will not retry it anymore.
In this situation you can log the error, send an email so a human can fix the issue by hand, or handle it in whatever way works best for your business.
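As a side note, even after a job dies you can still reach it through Sidekiq's Ruby API and retry it by hand, for example from a Rails console. Here is a minimal sketch (HardWorker is just a placeholder class name):

require 'sidekiq/api'

dead_set = Sidekiq::DeadSet.new
puts "#{dead_set.size} jobs in the DeadSet"

# Retry every dead job that belonged to a specific worker class
dead_set.each do |job|
  job.retry if job.klass == 'HardWorker'
end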
To intercept this, Sidekiq provides the sidekiq_retries_exhausted hook, which you can configure per worker class as below:
class ImportantWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5

  sidekiq_retries_exhausted do |msg, exception|
    # example using Rails' logger
    Rails.logger.warn("Failed #{msg['class']} with #{msg['args']}: #{msg['error_message']} (#{exception.class})")
  end

  def perform(important_arguments)
    # do some work
  end
end
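Logging is the simplest option, but if you want the "send an email for a human" approach mentioned above, the same hook can do it. In the sketch below, AdminMailer and its dead_job_alert action are hypothetical; use whatever mailer or notifier your application already has:

sidekiq_retries_exhausted do |msg, exception|
  # AdminMailer and dead_job_alert are hypothetical; any notification works here
  AdminMailer.dead_job_alert(
    job_class: msg['class'],
    args: msg['args'],
    error: msg['error_message']
  ).deliver_later
end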
Or you can configure a global handler for the entire application by adding a death handler to Sidekiq, like below:
Sidekiq.configure_server do |config|
  # other config stuff...
  config.death_handlers << ->(job, ex) do
    Rails.logger.error "Surprise, an error! #{job['class']} #{job['jid']} just died with error #{ex.message}."
  end
end
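A detail worth knowing when trying this out: retry: false discards the job on failure without ever touching the DeadSet, while retry: 0 skips the retries but still sends the job to the DeadSet and fires the death handlers. So a quick way to exercise the handler locally is a throwaway worker like this (DoomedWorker is just an example name):

class DoomedWorker
  include Sidekiq::Worker
  # no retries, but the job still goes to the DeadSet on failure,
  # so the death handler above gets called
  sidekiq_options retry: 0

  def perform
    raise 'boom'
  end
end

DoomedWorker.perform_async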
I usually add this code to the config/initializers/sidekiq.rb file, where all the Sidekiq-related configuration lives together.
Of course, just logging like this will not solve your problem, but intercepting the death of a job will allow you to write more robust asynchronous job processing applications.
Please add a comment below if you have any other issues with asynchronous jobs that I could help with.