Apache Spark - How to refresh a table and do it concurrently?




I'm using Spark Streaming 2.1. I'd like to periodically refresh a cached table (loaded from a Spark-provided data source such as Parquet or MySQL, or from a user-defined data source).

  1. How do I refresh the table?

    Suppose I have a table loaded by

    spark.read.format("").load().createTempView("my_table")

    and cached by

    spark.sql("CACHE TABLE my_table")

    Is the following code enough to refresh the table, so that the next time the table is loaded it is automatically cached?

    spark.sql("REFRESH TABLE my_table")

    Or do I have to do it manually with

    spark.table("my_table").unpersist()
    spark.read.format("").load().createOrReplaceTempView("my_table")
    spark.sql("CACHE TABLE my_table")

  2. Is it safe to refresh the table concurrently?

    By concurrent I mean using a ScheduledThreadPoolExecutor to do the refresh work on a thread apart from the main thread.

    What will happen if Spark is using the cached table when I call refresh on it?
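The scheduling setup described in item 2 can be sketched as follows. This is a minimal Python analogue of running the refresh on a ScheduledThreadPoolExecutor; `refresh_my_table` is a hypothetical stand-in for the real Spark calls (unpersist, reload, re-cache), stubbed out here so the sketch runs without a cluster:

```python
import threading
import time

# Hypothetical stand-in for the real refresh: in the Spark job this would
# unpersist "my_table", reload it from the source, and cache it again.
def refresh_my_table(log):
    log.append("refreshed")

def schedule_refresh(interval_s, stop_event, log):
    """Run refresh_my_table every interval_s seconds on a background
    thread, mirroring fixed-rate scheduling on a ScheduledThreadPoolExecutor
    so the refresh work stays off the main thread."""
    def loop():
        # Event.wait returns False on timeout, True once stop_event is set
        while not stop_event.wait(interval_s):
            refresh_my_table(log)
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker

log = []
stop = threading.Event()
worker = schedule_refresh(0.05, stop, log)
time.sleep(0.2)   # the main thread keeps doing its own work meanwhile
stop.set()
worker.join()
```

The stop event gives the main thread a clean way to shut the refresher down before exiting, which a ScheduledThreadPoolExecutor provides via `shutdown()`.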

Spark 2.2.0 introduced a feature for refreshing the metadata of a table when it has been updated by Hive or external tools.

You can achieve this using the API:

spark.catalog.refreshTable("my_table")

This API updates the table's metadata to keep it consistent.
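The manual unpersist/reload/cache sequence from the question can also be wrapped in a small helper. A minimal sketch: `FakeSpark` and `FakeDF` below are stub objects used only to make the call order visible without a cluster; in a real job `spark` would be the live SparkSession and `load` something like `lambda: spark.read.format("parquet").load(path)`:

```python
def refresh_cached_table(spark, name, load):
    """Drop the cached copy of `name`, reload it, and cache it again."""
    spark.table(name).unpersist()         # drop the stale cached data
    load().createOrReplaceTempView(name)  # re-register the view with fresh data
    spark.sql("CACHE TABLE " + name)      # cache the new contents

# Stubs recording the calls in order, standing in for a SparkSession.
class FakeDF:
    def __init__(self, calls):
        self.calls = calls
    def unpersist(self):
        self.calls.append("unpersist")
    def createOrReplaceTempView(self, name):
        self.calls.append("view:" + name)

class FakeSpark:
    def __init__(self):
        self.calls = []
    def table(self, name):
        return FakeDF(self.calls)
    def sql(self, query):
        self.calls.append(query)

spark = FakeSpark()
refresh_cached_table(spark, "my_table", lambda: FakeDF(spark.calls))
```

Keeping the sequence in one helper makes it easier to call from a scheduled background task instead of scattering the three steps across the job.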




