Virtual Storage Views

Accelerating Main-Memory Table Scans with Partial Virtual Views

Abstract: In main-memory column stores, column scans are one of the base operations performed when answering analytical queries. Typically, one or multiple columns must be filtered with respect to the given query predicate, which, by default, involves inspecting all data of the involved columns. To reduce the amount of data to scan, there exist essentially two strategies: (1) Create a coarse-granular index on the column, then use it for early pruning during each scan. While creating such an index is relatively lightweight, accessing the relevant portions of the column through the index adds overhead to every scan. (2) Create materialized views that contain semantic portions of the column and filter on these. While this enables fast scans, it requires physical copying and causes significant space overhead. To break this trade-off, we propose a view-based strategy that avoids any physical copying of column data while providing optimal scan performance. We achieve this by utilizing tools of the virtual memory subsystem provided by the OS: On the lowest level, we materialize all columns within physical main memory. On top of that, we allow the creation of arbitrarily many partial views in virtual memory that map to subsets of the physical columns having certain properties of interest. Creation, maintenance, and usage of these partial virtual views happen fully adaptively as a by-product of scan-based query processing.
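To make the core idea concrete, the following is a minimal sketch, not the authors' implementation. It uses Python's standard `mmap` module to create a second virtual mapping over one page of a column, so the page is reachable both through the full column and through a "partial view" without being copied. The temporary backing file, the four-page column, the value layout, and the helper `count_in_range` are all assumptions made for illustration. Python's `mmap` cannot place mappings at fixed addresses, so the sketch shows only a single contiguous page range; stitching non-contiguous pages into one view would require lower-level OS calls.

```python
import mmap
import struct
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY   # mapping offsets must be multiples of this
SLOT = struct.calcsize("<q")        # bytes per 64-bit integer value
VALUES_PER_PAGE = PAGE // SLOT
NUM_PAGES = 4                       # hypothetical column size, for illustration


def count_in_range(buf, lo, hi):
    """Scan a mapped buffer and count the values v with lo <= v < hi."""
    return sum(
        1
        for i in range(len(buf) // SLOT)
        if lo <= struct.unpack_from("<q", buf, i * SLOT)[0] < hi
    )


# Backing storage for the column: one temporary file, mapped as shared memory.
backing = tempfile.TemporaryFile()
backing.truncate(NUM_PAGES * PAGE)
column = mmap.mmap(backing.fileno(), NUM_PAGES * PAGE, access=mmap.ACCESS_WRITE)

# Fill page p with the values p * 1_000_000 + i, so each page holds one value range.
for p in range(NUM_PAGES):
    for i in range(VALUES_PER_PAGE):
        struct.pack_into("<q", column, p * PAGE + i * SLOT, p * 1_000_000 + i)

# Partial view: a second mapping that covers only page 2 of the same storage.
view = mmap.mmap(backing.fileno(), PAGE, access=mmap.ACCESS_WRITE, offset=2 * PAGE)

# The same predicate evaluated on the full column and on the partial view.
full_hits = count_in_range(column, 2_000_000, 3_000_000)
view_hits = count_in_range(view, 2_000_000, 3_000_000)

# A write through the column mapping is visible through the view: no copy exists.
struct.pack_into("<q", column, 2 * PAGE, -1)
seen_through_view = struct.unpack_from("<q", view, 0)[0]
```

Both scans return the same count, but the scan over `view` inspects one page instead of four, and the view occupies only virtual address space rather than additional physical memory.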

Code available: https://gitlab.rlp.net/fschuhkn/adaptive-virtual-storage-views 

Papers available: