2️⃣ Read IO Improvements #429
-
Watching videos on the Postgres approach, they appear to be implementing async IO (AIO) for storage, citing cloud storage's highly parallel reads as a factor in that approach. This works in concert with a threading model which is unlikely to be replicable in Python due to the GIL. We would need multiprocessing, and the costs associated with that, combined with AIO.
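As a sketch of what AIO-style reads could look like in Python without threads, the snippet below fans out blob downloads on a single event loop; aiohttp and the URLs are illustrative, not the current reader:

```python
import asyncio

import aiohttp

async def fetch_blob(session, url):
    # The await points release the event loop while waiting on the
    # network, so many blob reads are in flight concurrently in one
    # process, without threads contending on the GIL.
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.read()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_blob(session, u) for u in urls))

# Illustrative blob URLs only.
blobs = asyncio.run(fetch_all([
    "https://storage.example.com/bucket/part-0000.parquet",
    "https://storage.example.com/bucket/part-0001.parquet",
]))
```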
-
Create a truer implementation of a buffer pool, and refer to the existing implementation as an external buffer (even though it is in memory). Use Plasma (?) to reserve a chunk of memory, populate the buffer on read, and keep a directory of the pages held in the buffer. When we have a scan plan, read pages from the pool first, and prefetch pages from remote storage into the pool.
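A minimal sketch of such a pool, using a plain in-process page directory with LRU eviction (note Arrow's Plasma store has since been deprecated, so a shared-memory backing is not shown); the class and method names are hypothetical:

```python
from collections import OrderedDict

class BufferPool:
    """Fixed-size pool of pages with a page directory and LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.pages = OrderedDict()  # page directory: page_id -> bytes

    def get(self, page_id):
        page = self.pages.get(page_id)
        if page is not None:
            self.pages.move_to_end(page_id)  # mark as recently used
        return page

    def put(self, page_id, page):
        # Evict least-recently-used pages until the new page fits.
        while self.pages and self.used_bytes + len(page) > self.capacity_bytes:
            _, evicted = self.pages.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.pages[page_id] = page
        self.used_bytes += len(page)

    def prefetch(self, page_ids, fetch):
        # Pull pages from remote storage into the pool ahead of a scan;
        # `fetch` is a hypothetical callable: page_id -> bytes.
        for page_id in page_ids:
            if page_id not in self.pages:
                self.put(page_id, fetch(page_id))
```

On a scan, the reader would call get() per page and fall back to a remote read plus put() on a miss, with prefetch() driven by the scan plan.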
-
GCS supports the S3 API, so I recommend the S3 reader is the focus for proving optimisations; where possible, these should then be ported to the GCS functionality. This may be the Rust S3 reader.
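For illustration, GCS's S3-compatible XML API can be driven by a standard S3 client using GCS HMAC credentials; a boto3 sketch, with placeholder bucket, key, and credentials:

```python
import boto3

# GCS's XML API is S3-compatible: point an S3 client at the GCS
# endpoint and authenticate with GCS HMAC keys (placeholder values).
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG_HMAC_ACCESS_ID",
    aws_secret_access_key="GOOG_HMAC_SECRET",
)

response = s3.get_object(Bucket="my-bucket", Key="part-0000.parquet")
data = response["Body"].read()
```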
-
Using the MinIO reader I was able to saturate my 2.5 Gbps connection (2.3 Gbps sustained), so it may be faster on a faster network. This is lab testing and will not map exactly to a real implementation: the lab approach downloads the files as fast as possible, which, on a large dataset, would blow memory allocations. An approach is needed to buffer results and to wait when there is no room in the buffer.
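One simple way to get that behaviour is a bounded queue: the downloader blocks as soon as the buffer is full. A sketch, where the fetch and process callables are hypothetical stand-ins:

```python
import queue
import threading

# A bounded queue gives backpressure for free: put() blocks while the
# buffer is full, so the downloader waits instead of exhausting memory.
buffer = queue.Queue(maxsize=8)
SENTINEL = object()  # marks end of stream

def downloader(urls, fetch):
    for url in urls:
        buffer.put(fetch(url))  # blocks whenever the buffer is full
    buffer.put(SENTINEL)

def consumer(process):
    while (blob := buffer.get()) is not SENTINEL:
        process(blob)

# Hypothetical stand-ins for the real reader and engine.
urls = [f"blob-{i}" for i in range(32)]
threading.Thread(
    target=downloader, args=(urls, lambda u: u.encode()), daemon=True
).start()
consumer(lambda blob: None)
```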
-
I wonder what writing a sidecar system to handle IO would do for performance. Running as a separate process, using multiprocessing, this new component could be optimised for fetching data and for handling caching, buffering, and decoding, leaving the query engine component to consume ready data from it.
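A minimal sketch of the idea, assuming a request/result queue pair between the engine and the sidecar; fetch_and_decode is a placeholder for the real fetch, cache, and decode logic:

```python
import multiprocessing as mp

def fetch_and_decode(path):
    # Placeholder for the real fetch + decode (e.g. Parquet) logic.
    return b"decoded:" + path.encode()

def io_sidecar(requests, results):
    # Separate process: owns fetching, caching, and decoding, so the
    # query engine only ever consumes ready-to-use data.
    cache = {}
    while (path := requests.get()) is not None:
        if path not in cache:
            cache[path] = fetch_and_decode(path)
        results.put((path, cache[path]))

if __name__ == "__main__":
    requests, results = mp.Queue(), mp.Queue()
    mp.Process(target=io_sidecar, args=(requests, results), daemon=True).start()
    requests.put("bucket/part-0000.parquet")  # hypothetical blob path
    path, data = results.get()                # blocks until decoded
    requests.put(None)                        # shut the sidecar down
```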
-
Given the massively disproportionate execution time between read and process (let's say 90% of execution time is reading - different factors affect this, so there's no hard number), we cannot read as fast as we process without significant changes. Work like parallel reading takes a lot of effort, but at most it makes the download the critical path rather than alternating read/process/read/process, so we save roughly the processing time, about 10%. Caching and the buffer pool have helped considerably, and we've seen that projection pushdown saves a lot of the read time; selection pushdown saves less, and may even be slower.
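A simplified model makes the ceiling explicit, assuming read and process can be fully overlapped:

$$
T_{\text{serial}} = T_{\text{read}} + T_{\text{process}}, \qquad T_{\text{pipelined}} = \max(T_{\text{read}}, T_{\text{process}})
$$

With $T_{\text{read}} \approx 0.9\,T$ and $T_{\text{process}} \approx 0.1\,T$, the best case removes only $T_{\text{process}}$, roughly 10% of total time, and the download becomes the critical path.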
-
Current deployments using GCS as the main store regularly see the cost of read (planning and scanning) occupy over 90% (99.9% observed) of the query time. Using Memcache reduces read time, usually by in the region of 50%, but this can still mean over 90% of the query time is read (planning and scanning).
The current implementation of planning is broadly:
The current implementation of reading is serial: read a blob, send it for processing, then wait for the next blob to be requested. Entire blobs are read regardless of the amount of data needed (for example, even when using projection pushdown on Parquet, the entire blob is still downloaded).
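Schematically (the plan, storage, and engine names are illustrative, not the actual API):

```python
def scan_serially(plan, storage, engine):
    # Schematic of the current serial loop: the next download does not
    # start until processing of the previous blob has finished.
    for blob_name in plan.blobs:
        raw = storage.read_blob(blob_name)  # entire blob is downloaded,
                                            # even with projection pushdown
        engine.process(raw)                 # reader idles while this runs
```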
Some experimentation has been done with multiprocessing; this is immature and has not shown good improvements to read times (a reduction in the region of 10% of execution time - note the cumulative read time is the same, but reads happen in parallel, so we are measuring elapsed time).