2️⃣ Read IO Improvements #429
-
Watching videos on the Postgres approach, they appear to be implementing async IO (AIO) for storage, citing cloud storage's highly parallel reads as a factor in that approach. This works in concert with a threading model which is unlikely to be replicable in Python due to the GIL. We would need multiprocessing, and the costs associated with that, combined with AIO.
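As a sketch of what AIO-style reads could look like in Python without threads, the snippet below fans out blob downloads on a single event loop; aiohttp and the URLs are illustrative, not the current reader:

```python
import asyncio

import aiohttp

async def fetch_blob(session, url):
    # The await points release the event loop while waiting on the
    # network, so many blob reads are in flight concurrently in one
    # process, without threads contending on the GIL.
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.read()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_blob(session, u) for u in urls))

# Illustrative blob URLs only.
blobs = asyncio.run(fetch_all([
    "https://storage.example.com/bucket/part-0000.parquet",
    "https://storage.example.com/bucket/part-0001.parquet",
]))
```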
-
Create a truer implementation of a buffer pool, and refer to the existing implementation as an external buffer (even though it is in memory). Use Plasma (?) to reserve a chunk of memory, populate the buffer on read, and keep a directory of the pages held in the buffer. When we have a scan plan, read pages from the pool first, and prefetch pages from remote storage into the pool.
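A minimal sketch of such a pool, using a plain in-process page directory with LRU eviction (note Arrow's Plasma store has since been deprecated, so a shared-memory backing is not shown); the class and method names are hypothetical:

```python
from collections import OrderedDict

class BufferPool:
    """Fixed-size pool of pages with a page directory and LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.pages = OrderedDict()  # page directory: page_id -> bytes

    def get(self, page_id):
        page = self.pages.get(page_id)
        if page is not None:
            self.pages.move_to_end(page_id)  # mark as recently used
        return page

    def put(self, page_id, page):
        # Evict least-recently-used pages until the new page fits.
        while self.pages and self.used_bytes + len(page) > self.capacity_bytes:
            _, evicted = self.pages.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.pages[page_id] = page
        self.used_bytes += len(page)

    def prefetch(self, page_ids, fetch):
        # Pull pages from remote storage into the pool ahead of a scan;
        # `fetch` is a hypothetical callable: page_id -> bytes.
        for page_id in page_ids:
            if page_id not in self.pages:
                self.put(page_id, fetch(page_id))
```

On a scan, the reader would call get() per page and fall back to a remote read plus put() on a miss, with prefetch() driven by the scan plan.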
-
GCS supports the S3 API, so I recommend the S3 reader is the focus for proving optimisations; where possible, these should then be ported to the GCS functionality. This may be the Rust S3 reader.
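For illustration, GCS's S3-compatible XML API can be driven by a standard S3 client using GCS HMAC credentials; a boto3 sketch, with placeholder bucket, key, and credentials:

```python
import boto3

# GCS's XML API is S3-compatible: point an S3 client at the GCS
# endpoint and authenticate with GCS HMAC keys (placeholder values).
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG_HMAC_ACCESS_ID",
    aws_secret_access_key="GOOG_HMAC_SECRET",
)

response = s3.get_object(Bucket="my-bucket", Key="part-0000.parquet")
data = response["Body"].read()
```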
-
Using the MinIO reader I was able to saturate my 2.5 Gbps connection (2.3 Gbps sustained), so it may be faster on a faster network. This is lab testing and will not map exactly to a real implementation: the lab approach downloads the files as fast as possible, which, on a large dataset, would blow memory allocations. An approach is needed to buffer results and to wait when there is no room in the buffer.
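One simple way to get that behaviour is a bounded queue: the downloader blocks as soon as the buffer is full. A sketch, where the fetch and process callables are hypothetical stand-ins:

```python
import queue
import threading

# A bounded queue gives backpressure for free: put() blocks while the
# buffer is full, so the downloader waits instead of exhausting memory.
buffer = queue.Queue(maxsize=8)
SENTINEL = object()  # marks end of stream

def downloader(urls, fetch):
    for url in urls:
        buffer.put(fetch(url))  # blocks whenever the buffer is full
    buffer.put(SENTINEL)

def consumer(process):
    while (blob := buffer.get()) is not SENTINEL:
        process(blob)

# Hypothetical stand-ins for the real reader and engine.
urls = [f"blob-{i}" for i in range(32)]
threading.Thread(
    target=downloader, args=(urls, lambda u: u.encode()), daemon=True
).start()
consumer(lambda blob: None)
```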
-
I wonder what writing a sidecar system to handle IO would do for performance. Running as a separate process, using multiprocessing, this new component could be optimised for fetching data and for handling caching, buffering, and decoding, leaving the query engine component to consume ready data from it.
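A minimal sketch of the idea, assuming a request/result queue pair between the engine and the sidecar; fetch_and_decode is a placeholder for the real fetch, cache, and decode logic:

```python
import multiprocessing as mp

def fetch_and_decode(path):
    # Placeholder for the real fetch + decode (e.g. Parquet) logic.
    return b"decoded:" + path.encode()

def io_sidecar(requests, results):
    # Separate process: owns fetching, caching, and decoding, so the
    # query engine only ever consumes ready-to-use data.
    cache = {}
    while (path := requests.get()) is not None:
        if path not in cache:
            cache[path] = fetch_and_decode(path)
        results.put((path, cache[path]))

if __name__ == "__main__":
    requests, results = mp.Queue(), mp.Queue()
    mp.Process(target=io_sidecar, args=(requests, results), daemon=True).start()
    requests.put("bucket/part-0000.parquet")  # hypothetical blob path
    path, data = results.get()                # blocks until decoded
    requests.put(None)                        # shut the sidecar down
```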
-
Given the massively disproportionate execution time between read and process (let's say 90% of execution time is reading - different factors affect this, so there's no hard number), we cannot read as fast as we process without significant changes. Work like parallel reading takes a lot of effort, but at most it makes the download the critical path rather than alternating read/process/read/process, so we save roughly the processing time, about 10%. Caching and the buffer pool have helped considerably, and we've seen that projection pushdown saves a lot of the read time; selection pushdown saves less, and may even be slower.
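A simplified model makes the ceiling explicit, assuming read and process can be fully overlapped:

$$
T_{\text{serial}} = T_{\text{read}} + T_{\text{process}}, \qquad T_{\text{pipelined}} = \max(T_{\text{read}}, T_{\text{process}})
$$

With $T_{\text{read}} \approx 0.9\,T$ and $T_{\text{process}} \approx 0.1\,T$, the best case removes only $T_{\text{process}}$, roughly 10% of total time, and the download becomes the critical path.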
-
Current deployments using GCS as the main store regularly see the cost of read (planning and scanning) occupy over 90% (99.9% observed) of the query time. Using Memcache reduces read time, usually by in the region of 50%, but this can still mean over 90% of the query time is read (planning and scanning).
The current implementation of planning is broadly:
The current implementation of reading is serial: read a blob, send it for processing, then wait for the next blob to be requested. Entire blobs are read regardless of the amount of data needed (for example, even when using projection pushdown on Parquet, the entire blob is still downloaded).
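Schematically (the plan, storage, and engine names are illustrative, not the actual API):

```python
def scan_serially(plan, storage, engine):
    # Schematic of the current serial loop: the next download does not
    # start until processing of the previous blob has finished.
    for blob_name in plan.blobs:
        raw = storage.read_blob(blob_name)  # entire blob is downloaded,
                                            # even with projection pushdown
        engine.process(raw)                 # reader idles while this runs
```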
Some experimentation has been done with multiprocessing; this is immature and has not shown good improvements to read times (a reduction in the region of 10% of execution time - note the cumulative read time is the same, but reads happen in parallel, so we are measuring elapsed time).