apache spark - How to refresh a cached table, and can it be done concurrently?
I'm using Spark Streaming 2.1. I'd like to periodically refresh a cached table (loaded from a Spark-provided data source such as Parquet or MySQL, or from a user-defined data source).

How do I refresh the table?
Suppose I have a table loaded by

spark.read.format("").load().createTempView("my_table")

and cached by

spark.sql("CACHE TABLE my_table")

Is the following code enough to refresh the table, so that the table is automatically cached when it is loaded next?

spark.sql("REFRESH TABLE my_table")
Or do I have to do it manually with

spark.table("my_table").unpersist()
spark.read.format("").load().createOrReplaceTempView("my_table")
spark.sql("CACHE TABLE my_table")
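Put together, the manual refresh cycle above could be sketched as a helper method. This is a sketch only: it assumes an active SparkSession named `spark`, and the Parquet format and `/data/my_table` path are hypothetical stand-ins for whatever data source you actually use.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a manual refresh: drop the stale cache,
// re-read the source, replace the view, and cache again.
def refreshMyTable(spark: SparkSession): Unit = {
  // Evict the cached data backing the current view.
  spark.table("my_table").unpersist()
  // Re-read the source and replace the temp view with fresh data.
  spark.read.format("parquet")          // hypothetical format
    .load("/data/my_table")             // hypothetical path
    .createOrReplaceTempView("my_table")
  // Cache the new contents eagerly.
  spark.sql("CACHE TABLE my_table")
}
```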
Is it safe to refresh the table concurrently? By concurrently I mean using a ScheduledThreadPoolExecutor to do the refresh work apart from the main thread. What happens if Spark is using the cached table when I call refresh on it?
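For the scheduling side, a background refresh with ScheduledThreadPoolExecutor could be sketched as follows. This assumes a SparkSession `spark` is in scope; the source format, path, and 10-minute interval are placeholder assumptions. Note that Spark does not make the unpersist/re-cache sequence atomic, so a query racing with the refresh may briefly read uncached data recomputed from the source.

```scala
import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}

// Sketch: run the refresh off the main thread on a fixed schedule.
val scheduler = new ScheduledThreadPoolExecutor(1)

val refreshTask = new Runnable {
  def run(): Unit = {
    // Same manual refresh cycle as above, now driven by the scheduler.
    spark.table("my_table").unpersist()
    spark.read.format("parquet")        // hypothetical source
      .load("/data/my_table")
      .createOrReplaceTempView("my_table")
    spark.sql("CACHE TABLE my_table")
  }
}

// First run after 10 minutes, then every 10 minutes (arbitrary interval).
scheduler.scheduleAtFixedRate(refreshTask, 10, 10, TimeUnit.MINUTES)
```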
Spark 2.2.0 introduced a feature for refreshing the metadata of a table when it has been updated by Hive or external tools.

You can achieve this using the API:

spark.catalog.refreshTable("my_table")

This API updates the table's metadata to keep it consistent.
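As a minimal usage sketch, assuming an active SparkSession `spark` and that my_table is already registered in the catalog:

```scala
// Invalidate the cached metadata (and cached data, which is lazily
// re-cached on next access) after an external update to the table.
spark.catalog.refreshTable("my_table")

// Subsequent reads go through the refreshed state.
spark.table("my_table").show()
```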