diskq: fix invalid read after disk_buf_size #3726

alltilla · 2021-07-07T14:09:21Z

We do not check beforehand whether a new message fits in the disk-buffer; we decide to wrap the write_head only after we have written past the disk-buf-size limit.

It is possible for the write_head to be just before disk-buf-size, write a huge message, and then wrap to the beginning. This grows self->file_size to a large value.

Suppose we set truncate-size-ratio(1), so we never truncate and file_size never shrinks from that large value.

Imagine that we handled every message, then wrote a lot of messages to the disk-buffer again, so we are just before the disk-buf-size limit.
We write 2 small messages: the first ends just after disk-buf-size but still before the huge self->file_size; the second starts at the beginning and ends a bit after that.
This is because we decide to wrap the write_head based on its position relative to disk-buf-size (see _is_qdisk_overwritten()).
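To make the write side concrete, here is a simplified model of "write first, decide about wrapping afterwards". It is a sketch under assumptions, not the qdisk.c code: DiskBufModel and model_write() are hypothetical names, the data area is assumed to start at offset 0 (the real file reserves room for a header), and space checks against the read side are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the disk-buffer write path.
 * DiskBufModel and model_write() are illustrative names, not syslog-ng API.
 * The data area is assumed to start at offset 0. */
typedef struct
{
  int64_t disk_buf_size; /* configured disk-buf-size() limit */
  int64_t write_head;    /* offset of the next record to write */
  int64_t file_size;     /* physical size; only truncation would shrink it */
} DiskBufModel;

/* Appends a record of `len` bytes at write_head without checking beforehand
 * whether it fits, then decides about wrapping. Returns the record's offset. */
static int64_t
model_write(DiskBufModel *m, int64_t len)
{
  assert(len > 0);

  int64_t offset = m->write_head;
  m->write_head += len;

  /* The file grows to cover the record, even far past disk_buf_size. */
  if (m->write_head > m->file_size)
    m->file_size = m->write_head;

  /* Wrap only after the write, based on the position relative to
   * disk_buf_size (the role the PR attributes to _is_qdisk_overwritten()). */
  if (m->write_head >= m->disk_buf_size)
    m->write_head = 0;

  return offset;
}
```

Starting at write_head 90 with a limit of 100, a single 500-byte record leaves file_size at 590; with truncation disabled nothing brings it back down, which is the precondition for the invalid read.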

When reading from the disk-buffer, we currently decide whether to wrap the read_head based on whether it is past self->file_size.
After reading the small message that ends just after disk-buf-size, we decide not to wrap and try to read another message from there, which is the middle of an old message.

This PR matches the wrap logic between the write and read head, so they both measure their position relative to disk-buf-size.
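The mismatch boils down to which limit the read side compares against. The following is a minimal sketch, not the actual qdisk.c logic: advance_read_head() and WrapPolicy are hypothetical names, the data area is assumed to start at 0, and both heads are assumed to use the same `>=` comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: which limit the read side compares against. */
typedef enum
{
  WRAP_BY_FILE_SIZE,     /* old behaviour described above */
  WRAP_BY_DISK_BUF_SIZE  /* matches the write side's wrap decision */
} WrapPolicy;

/* Moves read_head past a record of record_len bytes and wraps it to the
 * start of the (assumed 0-based) data area if it reached the chosen limit. */
static int64_t
advance_read_head(int64_t read_head, int64_t record_len,
                  int64_t file_size, int64_t disk_buf_size, WrapPolicy policy)
{
  assert(record_len > 0);

  int64_t next = read_head + record_len;
  int64_t limit = (policy == WRAP_BY_FILE_SIZE) ? file_size : disk_buf_size;

  return next >= limit ? 0 : next;
}
```

With file_size = 590 and disk_buf_size = 100, a 10-byte record at offset 95 ends at 105. The file_size policy keeps reading at 105, inside an old 500-byte record, while the disk_buf_size policy wraps to 0, which is where the writer actually put the next record.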

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

kira-syslogng · 2021-07-07T14:31:45Z

Build FAILURE

kira-syslogng · 2021-07-07T16:35:16Z

Build FAILURE

kira-syslogng · 2021-07-07T20:31:35Z

Build FAILURE

alltilla · 2021-07-07T20:33:08Z

@kira-syslogng do stresstest

MrAnno · 2021-07-07T20:40:34Z

Can you explain the exact difference between useful_file_size and file_size and when they have to be updated?
Maybe we could find a more "intention-revealing" variable name that way.

alltilla · 2021-07-07T21:01:10Z

file_size is the physical disk space used by the disk-queue file. useful_file_size is the physical disk usage of the mmapped region and the log messages that are waiting to be processed (read or acked). Sorry if I am being cryptic.

Edit: This is not completely accurate, because we can have already handled messages at the start of the disk buffer... The useful_file_size variable is the theoretical minimal file size if we truncated every time. So it does not take into account the already handled messages at the end of the file, which could have been removed by truncation but weren't because of truncate-size-ratio.
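To illustrate the "theoretical minimal file size" definition, here is a hedged sketch. It assumes unprocessed data lies between a hypothetical backlog_head (oldest record not yet acked) and write_head, and that wrap_end marks where the writer last jumped back to the start. It ignores the header and the mmapped region, so it shows the idea rather than how useful_file_size is computed in qdisk.c; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal file size if we truncated every time (header and mmapped
 * region ignored). All names are hypothetical. */
static int64_t
minimal_file_size(int64_t backlog_head, int64_t write_head, int64_t wrap_end)
{
  assert(backlog_head >= 0 && write_head >= 0);

  /* Not wrapped: live data is [backlog_head, write_head), so anything past
   * write_head is already handled and could be cut off. */
  if (backlog_head <= write_head)
    return write_head;

  /* Wrapped: live data is [backlog_head, wrap_end) plus [0, write_head),
   * so the file must still reach wrap_end. */
  return wrap_end;
}

/* Bytes a truncation could give back: handled data at the end of the file.
 * Handled data at the start of the file is not reclaimable this way. */
static int64_t
reclaimable_bytes(int64_t file_size, int64_t backlog_head,
                  int64_t write_head, int64_t wrap_end)
{
  return file_size - minimal_file_size(backlog_head, write_head, wrap_end);
}
```

In the wrapped case with file_size 590, backlog_head 80, write_head 20 and wrap_end 105, the minimal size is 105 and 485 bytes of handled data at the end could be truncated; with truncate-size-ratio preventing that, file_size and the minimal size diverge.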

kira-syslogng · 2021-07-07T21:02:24Z

Kira-stress-test: Build SUCCESS

MrAnno · 2021-07-09T09:26:36Z

It seems that useful_file_size is not recovered after restarting syslog-ng, so the fix may be incomplete.

I think the best would be if we could fix this issue without adding more "state".
If we could calculate useful_file_size at restart, it would be an unnecessary field; otherwise, we should persist this field as well.

MrAnno · 2021-07-09T11:22:38Z

@kira-syslogng do stresstest

kira-syslogng · 2021-07-09T11:56:40Z

Kira-stress-test: Build SUCCESS

MrAnno · 2021-07-09T21:20:03Z

It turned out that the additional "state" is required, and it also has to be persisted (has to be added to the disk buffer header).

This is because the truncation logic was used not only to reduce the file size, but also for 2 special operations:

- to remove persisted memory-buffers (qout, overflow, backlog) from the disk buffer after a restart (non-reliable qdisk)
- to remove an overflowing disk buffer section when we write past disk-buf-size() (we allow a single message at the end of the disk-buffer to exceed the disk-buf-size limit)

~~We must not allow read_head to reach those parts of the diskq file, hence the need for the new field.~~
~~Since we need to add a new field to the header, the diskq version needs to be bumped too.~~

Alternatively, we could truncate in the above 2 rare(?) cases unconditionally as we did before. That would save us from complicating things further.

kira-syslogng · 2021-07-12T09:45:31Z

Kira-stress-test: Build SUCCESS

kira-syslogng · 2021-07-12T09:52:31Z

Build SUCCESS

gaborznagy

I've just glanced at the PR to see what's changing.

This is a strange unit test; it was created to help investigate a bug and validate its fix. I added it first, so one can check out this commit and see that it fails with the current code. The details and the root cause of the bug are in the next commit's message. Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

With the truncate-size-ratio() option, we can no longer base our read_head wrapping logic on self->file_size (see next commit). We can base it on disk_buf_size instead, if we guarantee that the write_head does not overrun disk_buf_size by more than 1 message. Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
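The guarantee can be phrased as an invariant over the written records: no record starts at or beyond disk_buf_size, and a record that reaches the limit is followed by one at the start of the data area. Under that invariant a read_head at or past disk_buf_size never points at a valid record. A hypothetical checker over (offset, length) pairs, assuming a 0-based data area (RecordSpan and the function name are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one record as written to the data area. */
typedef struct
{
  int64_t offset; /* start of the record; data area assumed to begin at 0 */
  int64_t len;    /* length of the serialized record in bytes */
} RecordSpan;

/* Checks the records in write order: none may start at or past
 * disk_buf_size, and the record following one that reached the limit must
 * start at 0. Then only one record per lap can stick out past the limit. */
static bool
records_respect_wrap_invariant(const RecordSpan *records, size_t n,
                               int64_t disk_buf_size)
{
  assert(records != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (records[i].offset >= disk_buf_size)
        return false;

      int64_t end = records[i].offset + records[i].len;
      if (end >= disk_buf_size && i + 1 < n && records[i + 1].offset != 0)
        return false;
    }
  return true;
}
```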

The in-memory buffers of the non-reliable diskq are persisted when syslog-ng is restarted, which increases the size of the actual disk-buffer file. This test case checks the correctness of the non-reliable diskq when truncation is completely disabled. Signed-off-by: László Várady <laszlo.varady@protonmail.com>

alltilla · 2021-07-14T18:50:29Z

@kira-syslogng do stresstest

kira-syslogng · 2021-07-14T19:10:39Z

Build SUCCESS

kira-syslogng · 2021-07-14T19:16:51Z

Kira-stress-test: Build SUCCESS

modules/diskq/tests/test_diskq_truncate.c

szemere

Thank You!

alltilla force-pushed the fix_no_truncate_wrap branch from 8e79f3a to 65decf5 Compare July 7, 2021 14:11

szemere self-requested a review July 7, 2021 14:18

alltilla force-pushed the fix_no_truncate_wrap branch from 65decf5 to 214379d Compare July 7, 2021 16:13

alltilla force-pushed the fix_no_truncate_wrap branch from 214379d to 4ea96d7 Compare July 7, 2021 16:14

alltilla changed the title ~~diskq: fix invalid read after disk_buf_size~~ [wip] diskq: fix invalid read after disk_buf_size Jul 7, 2021

alltilla marked this pull request as draft July 7, 2021 16:32

alltilla force-pushed the fix_no_truncate_wrap branch from 4ea96d7 to 428d680 Compare July 7, 2021 20:07

alltilla changed the title ~~[wip] diskq: fix invalid read after disk_buf_size~~ [wip] qdisk: store useful file size Jul 7, 2021

alltilla marked this pull request as ready for review July 8, 2021 06:03

alltilla changed the title ~~[wip] qdisk: store useful file size~~ qdisk: store useful file size Jul 8, 2021

Barkodcz reviewed Jul 8, 2021

modules/diskq/tests/test_diskq_truncate.c (outdated, resolved)

MrAnno self-requested a review July 8, 2021 18:53

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Jul 9, 2021

news: add entry for syslog-ng#3726

d0062db

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Jul 9, 2021

news: add entry for syslog-ng#3726

8b99f09

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

gaborznagy reviewed Jul 12, 2021

modules/diskq/qdisk.c (resolved)

MrAnno reviewed Jul 12, 2021

modules/diskq/qdisk.c (outdated, resolved)

modules/diskq/qdisk.c (resolved)

modules/diskq/qdisk.c (outdated, resolved)

modules/diskq/qdisk.c (outdated, resolved)

MrAnno self-requested a review July 12, 2021 23:40

alltilla and others added 14 commits July 13, 2021 07:22

diskq: move is_space_avail comment to the implementation

d922319

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

diskq: refactor header version upgrade functions

b5dc2d5

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

diskq: use old read_head wrap logic once, if we upgrade

1279237

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

diskq: persist use_old_read_head_wrap_condition

28486d3

This way we can change the disk-buffer version immediately at load, so it is easier to add a new disk-buffer version later. Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

disk-buffer: rename _truncate_file to _maybe_truncate_file

9590d4a

Follow-up for syslog-ng#3689. Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: fix debug messages

965d89c

As the truncation logic is now conditional, debug messages have to be moved inside _maybe_truncate_file(). Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: fix test function names

983126f

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: avoid using QDisk internals in tests

2eaef22

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: name test constant TEST_DISKQ_SIZE

1dbd5cf

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

news: add entry for syslog-ng#3726

4a19167

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
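The two upgrade-related commits ("diskq: use old read_head wrap logic once, if we upgrade" and "diskq: persist use_old_read_head_wrap_condition") can be read as the flow below. This is a hedged sketch reconstructed from the commit titles and message only: ModelHeader, MODEL_CURRENT_VERSION and both functions are hypothetical, and the real header layout and wrap conditions may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CURRENT_VERSION 3 /* hypothetical version number */

/* Hypothetical header; the flag is persisted alongside the heads. */
typedef struct
{
  int32_t version;
  bool use_old_read_head_wrap_condition;
  int64_t read_head;
  int64_t file_size;
  int64_t disk_buf_size;
} ModelHeader;

/* On load, bump an old header to the current version right away, but
 * remember that the data on disk may still need the old wrap rule. */
static void
model_load_header(ModelHeader *h)
{
  if (h->version < MODEL_CURRENT_VERSION)
    {
      h->use_old_read_head_wrap_condition = true;
      h->version = MODEL_CURRENT_VERSION;
    }
}

/* Advances read_head past a record; the old (file_size based) condition
 * is used for one wrap only, after which disk_buf_size is used. */
static void
model_advance_read_head(ModelHeader *h, int64_t record_len)
{
  assert(record_len > 0);

  h->read_head += record_len;
  int64_t limit = h->use_old_read_head_wrap_condition ? h->file_size
                                                      : h->disk_buf_size;
  if (h->read_head >= limit)
    {
      h->read_head = 0;
      h->use_old_read_head_wrap_condition = false;
    }
}
```

Persisting the flag is what allows changing the version immediately at load: the file is already marked as current, yet a restart before the first wrap still remembers to use the old condition once.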

alltilla force-pushed the fix_no_truncate_wrap branch from f3f985b to 4a19167 Compare July 14, 2021 18:47

alltilla changed the title ~~qdisk: store useful file size~~ diskq: fix invalid read after disk_buf_size Jul 14, 2021

alltilla marked this pull request as ready for review July 14, 2021 18:48

alltilla mentioned this pull request Jul 14, 2021

diskq: fix invalid read after disk_buf_size #3729

Closed

MrAnno approved these changes Jul 14, 2021

szemere reviewed Jul 15, 2021

modules/diskq/tests/test_diskq_truncate.c (resolved)

szemere approved these changes Jul 15, 2021

szemere merged commit 59d17f5 into syslog-ng:master Jul 15, 2021
