dlt.destinations.impl.filesystem.filesystem
FilesystemLoadJob Objects
class FilesystemLoadJob(RunnableLoadJob)
make_remote_path
def make_remote_path() -> str
Returns the path on the remote filesystem to which the file is copied, without the scheme. For the local filesystem a native path is used
make_remote_url
def make_remote_url() -> str
Returns path on a remote filesystem as a full url including scheme.
FilesystemClient Objects
class FilesystemClient(FSClientBase, WithSqlClient, JobClientBase,
WithStagingDataset, WithStateSync, SupportsOpenTables,
WithTableReflection)
storage_versions
@property
def storage_versions() -> Tuple[int, int]
Returns cached storage versions, loading them once from the filesystem if not already cached
init_file_path
@property
def init_file_path() -> str
Returns the path to the init file for the current dataset
dataset_path
@property
def dataset_path() -> str
A path within a bucket to tables in a dataset. NOTE: dataset_name changes if with_staging_dataset is active
migrate_storage
def migrate_storage(from_version: int, to_version: int) -> None
Migrate storage from one version to another
get_storage_versions
def get_storage_versions() -> Tuple[int, int]
Returns initial and current storage versions.
- If the init file is empty, we assume legacy version 1 where .gz extension was not added to compressed files.
- For any other non-empty content we parse it as json and expect the version key to have a supported value.
get_storage_tables
def get_storage_tables(
table_names: Iterable[str]
) -> Iterable[Tuple[str, TTableSchemaColumns]]
Yield (table_name, column_schemas) pairs for tables that have files in storage.
For Delta and Iceberg tables, the columns present in the actual table metadata are returned. For tables using regular file formats, the column schemas come from the dlt schema instead, since their real schema cannot be reflected directly.
truncate_tables
def truncate_tables(table_names: List[str]) -> None
Truncate a set of regular tables with given table_names
get_table_dir
def get_table_dir(table_name: str, remote: bool = False) -> str
Returns a directory containing table files, ending with separator. Note that many tables can share the same table dir
get_table_prefix
def get_table_prefix(table_name: str) -> str
For table prefixes that are folders, trailing separator will be preserved
get_table_dirs
def get_table_dirs(table_names: Iterable[str],
remote: bool = False) -> List[str]
Gets directories where table data is stored.
list_table_files
def list_table_files(table_name: str) -> List[str]
Gets a list of files associated with one table
list_files_with_prefixes
def list_files_with_prefixes(table_dir: str, prefixes: List[str]) -> List[str]
Returns all files in a directory that match the given prefixes
make_remote_url
def make_remote_url(remote_path: str) -> str
Returns the URI on the remote filesystem to which the file is copied
get_stored_schema
def get_stored_schema(schema_name: str = None) -> Optional[StorageSchemaInfo]
Retrieves newest schema from destination storage
load_open_table
def load_open_table(table_format: TTableFormat, table_name: str,
**kwargs: Any) -> Any
Locates, loads and returns native table client for table table_name in delta or iceberg formats
get_open_table_catalog
def get_open_table_catalog(table_format: TTableFormat,
catalog_name: str = None) -> Any
Gets a native catalog named catalog_name for tables with format table_format
Returns: currently pyiceberg Catalog is supported
get_open_table_location
def get_open_table_location(table_format: TTableFormat,
table_name: str) -> str
All tables have a location, including those in "native" table format. The native format in the case of the filesystem is a set of parquet/csv/jsonl files, where a table may be placed in a separate folder or share a common prefix defined in the layout. Locations of native tables are normalized to include a trailing separator if the path is a "folder" (includes buckets). Note: the location is a fully formed url