All Ceph cluster administrators have probably already faced some level of disruption caused by deep scrubbing.

Deep scrubs are ruining the performance for my clients! Should I disable them?

Hold on a minute, let's make sure we understand what a scrub and a deep scrub are, per the manual:

In addition to making multiple copies of objects, Ceph ensures data integrity by scrubbing placement groups. Ceph scrubbing is analogous to fsck on the object storage layer. For each placement group, Ceph generates a catalog of all objects and compares each primary object and its replicas to ensure that no objects are missing or mismatched. Light scrubbing (daily) checks the object size and attributes. Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity.

This can raise the following question:

Ceph is already doing scrubbing, so why is deep scrubbing so important?

"Regular" scrubbing catches bugs and filesystem errors. Deep scrubbing compares data in objects bit-by-bit - this helps find bad sectors on a drive that weren't apparent during "light" scrubbing:

Data Scrubbing: As part of maintaining data consistency and cleanliness, Ceph OSD Daemons can scrub objects within placement groups. That is, Ceph OSD Daemons can compare object metadata in one placement group with its replicas in placement groups stored on other OSDs. Scrubbing (usually performed daily) catches bugs or filesystem errors. Ceph OSD Daemons also perform deeper scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (usually performed weekly) finds bad sectors on a drive that weren’t apparent in a light scrub. See Data Scrubbing for details on configuring scrubbing.

Options for dealing with deep scrubs

If your cluster is fast (SSD) or very fast (NVMe), you can probably live with automated scrubbing. If you're seeing client performance issues during deep scrubs even on such clusters, or you're using spinning disks, here are some tips:

Option 1: Tweaking variables

Ceph has a range of variables that can help minimize scrub and deep scrub impact (see the OSD configuration reference in the Ceph documentation). These alone can make quite a noticeable impact on cluster performance. In particular, take a look at osd scrub chunk max, osd scrub sleep and osd deep scrub interval:

osd deep scrub interval
Long story short, this variable controls how often a deep scrub is performed against a placement group, counted from the time that placement group was created. If you've increased the PG count by significant amounts, deep scrubs will be queued for many PGs at once, because those PGs share the same creation time and therefore hit the interval together.

osd scrub sleep
Time to sleep before scrubbing next group of chunks. This can give some breathing room to client ops during scrub.

osd scrub chunk max
The maximum number of object store chunks to scrub during a single operation. For RBD devices, the "chunk size" is controlled by --stripe-unit during creation.
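If you settle on values, you can also make them persistent in ceph.conf rather than injecting them at runtime. A sketch of the relevant section; the numbers below are purely illustrative, not recommendations:

```ini
[osd]
# Sleep 100 ms between scrubbed chunks to give client I/O breathing room
osd scrub sleep = 0.1
# Scrub at most one chunk per operation
osd scrub chunk max = 1
# Deep scrub each PG every 14 days (value in seconds) instead of the weekly default
osd deep scrub interval = 1209600
```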

Changing the variables across all OSDs without restarting them
Run on an admin node: ceph tell osd.* injectargs -- --osd_scrub_chunk_max=1

Option 2: Taking control of deep scrubs yourself (probably in addition to some of the variables above)

The approach I've chosen is to schedule deep scrubs on my own with a simple bash script. The gist of it: find the oldest PG (scrub-wise) and queue a deep scrub for it:

root@host:~# ceph pg ls-by-pool rbd active | head -n 3
0.0 2276 0 0 0 0 9523095552 1582 1582 active+clean 2018-03-31 00:23:43.874267 19276'9889784 19276:12161333 [30,11,0] 30 [30,11,0] 30 19276'9862308 2018-03-31 00:23:43.874228 15359'9620854 2018-03-16 03:10:10.327517 
0.1 2403 0 0 0 0 10028146176 1550 1550 active+clean 2018-03-31 07:05:09.231847 19276'12868733 19276:15069331 [8,49,5] 8 [8,49,5] 8 19276'12839922 2018-03-31 07:05:09.231795 19276'12839922 2018-03-31 07:05:09.231795

You will notice that column #20 contains the last deep scrub date. Sorting PGs by this column can potentially get us the PG we need to scrub:

As Mike pointed out, some of the fields contain spaces, so when counting awk fields we need to aim our awk at column #23:

root@host:~# ceph pg ls-by-pool rbd active | awk '{ print $1, $23 }' | sort -k2 | head -n 10
0.14e 2018-03-30 
0.153 2018-03-30 
0.156 2018-03-30 
0.15a 2018-03-30 
0.15d 2018-03-30 
0.16 2018-03-30 
0.162 2018-03-30 
0.166 2018-03-30 
0.169 2018-03-30 
0.16b 2018-03-30

Here awk grabs only the columns I need, PG and LAST_DEEP_SCRUB, and sort orders the output by the second column.

Queueing the deep scrub: ceph pg deep-scrub 0.14e >/dev/null 2>&1
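Putting the pieces together, here's a minimal sketch of the selection step as a reusable function. It assumes the plain-text `ceph pg ls-by-pool` output shown above, where column 1 is the PG id and awk field 23 holds the deep-scrub date; field numbers may vary between Ceph releases, so check your own output first:

```shell
#!/bin/bash
# Sketch: pick the PG with the oldest deep scrub from `ceph pg ls-by-pool`
# plain-text output (field numbers assume the format shown above).

oldest_pg() {
    # Read `ceph pg ls-by-pool <pool> active` output on stdin and
    # print the PG id whose deep-scrub date (field 23) is oldest.
    awk '{ print $1, $23 }' | sort -k2 | head -n 1 | cut -d' ' -f1
}

# On an admin node you would then run something like:
#   pg=$(ceph pg ls-by-pool rbd active | oldest_pg)
#   ceph pg deep-scrub "$pg" >/dev/null 2>&1
```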

Tip: you'll probably want to set the nodeep-scrub flag so that automatic deep scrubs are not queued alongside yours: ceph osd set nodeep-scrub

Bonus: Automating manual deep scrubs

Seeing as running the above code every time you need a deep scrub is tedious at best, it'd be great if this were automated. Have a look at ceph-deepscrubber, a simple tool I've written to take care of this for clusters I administer. Instructions for installation and use are in the repo.

Let me know if I've missed something. I'd love to hear from you!