Reduction in a parallel loop in Julia
We can use

    c = @parallel (vcat) for i=1:10 (i,i+1) end

but when I try to use push!() instead of vcat() I get an error. How can I use push!() in a parallel loop?

    c = @parallel (push!) for i=1:10 (c, (i,i+1)) end
Elaborating a bit on Dan's point: to see how the @parallel macro works, look at the difference between the following two invocations:
    julia> @parallel print for i in 1:10
             (i,i+1)
           end
    (1, 2)(2, 3)nothing(3, 4)nothing(4, 5)nothing(5, 6)nothing(6, 7)nothing(7, 8)nothing(8, 9)nothing(9, 10)nothing(10, 11)

    julia> @parallel string for i in 1:10
             (i,i+1)
           end
    "(1, 2)(2, 3)(3, 4)(4, 5)(5, 6)(6, 7)(7, 8)(8, 9)(9, 10)(10, 11)"
From the top one it should be clear what's going on. Each iteration produces an output. When it comes to applying the specified function to those outputs, this is done in pairs. The first two outputs are fed to print, and the result of the print operation becomes the first item in the next pair to be processed. Since that result is nothing, print prints nothing followed by (3, 4). The result of this print call is again nothing, so the next pair to be printed is nothing and (4, 5), and so on until all elements are consumed. In terms of pseudocode, this is what's happening:
    Step 1: state = print((1,2), (2,3));  # state becomes nothing
    Step 2: state = print(state, (3,4));  # state becomes nothing again
    Step 3: state = print(state, (4,5));  # and so forth
The reason string works as expected is that the following steps are happening:
    Step 1: state = string((1,2), (2,3));
    Step 2: state = string(state, (3,4));
    Step 3: state = string(state, (4,5));
    etc.
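To make the steps above concrete, here is a minimal sketch that replays them by hand on a single process, without any parallel macro. The names outputs and state are just illustrative, and the comparison with foldl is my own addition to show that this is an ordinary left fold.

```julia
# The per-iteration outputs the loop body would produce (only 4 here, for brevity).
outputs = [(i, i + 1) for i in 1:4]

# Step 1: combine the first two outputs.
state = string(outputs[1], outputs[2])

# Steps 2, 3, ...: combine the running state with the next output.
for item in outputs[3:end]
    global state = string(state, item)
end

# The same pairwise combination expressed as a left fold.
folded = foldl(string, outputs)

println(state)
```

Swapping string for print in the same loop would leave state equal to nothing after every step, which is where the stray "nothing" in the first invocation comes from.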
In general, the function you pass to the parallel macro should be something that takes two inputs of the same type and outputs an object of that same type.
Therefore you cannot use push!, because it always takes two inputs of different types (one array and one plain element) and outputs an array. You need to use append! instead, which fits the specification.
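A hedged sketch of what that looks like in practice. I am assuming Julia 1.x here, where the macro is spelled @distributed and lives in the Distributed standard library (the @parallel spelling used elsewhere in this answer is the older one), and I am running it without adding any workers. The variable names are illustrative.

```julia
using Distributed  # standard library in Julia 1.x (assumption: no workers added)

# append! takes two arrays and returns an array, so every loop output is
# wrapped in a one-element array to match that contract.
c = @distributed (append!) for i in 1:10
    [(i, i + 1)]
end

# push! applied to two plain outputs has nothing to push onto:
# tuples are immutable and push! has no method for them.
push_failed = try
    push!((1, 2), (2, 3))
    false
catch err
    err isa MethodError
end

println(c)
```

Note that the loop body returns [(i, i + 1)] rather than (i, i + 1); that wrapping is what makes both arguments of append! arrays.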
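Also note that the order of outputs is not guaranteed. (Here it happens to be in order because I only used 1 worker.) If the order of operations matters to you, then you shouldn't use this construct. E.g., for something like addition it obviously doesn't matter, because addition is associative and commutative, so neither grouping nor order changes the result; but if I used string and the outputs were processed in a different order, I could end up with a different string than I'd expect.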
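A small single-process sketch of why this matters. It does not use the parallel macro at all; it simply feeds the same outputs to the reducer in two different orders (reversing the list is my stand-in for "the workers finished in a different order").

```julia
outputs = [(i, i + 1) for i in 1:4]

# string is sensitive to the order in which outputs arrive.
in_order  = foldl(string, outputs)
reordered = foldl(string, reverse(outputs))

# + gives the same result whatever the order.
sum_in_order  = foldl(+, 1:4)
sum_reordered = foldl(+, reverse(1:4))

println(in_order)
println(reordered)
```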
EDIT - addressing the benchmark between vcat / append! / indexed assignment
I think the most efficient way to do this is in fact via normal indexing onto a preallocated array. But between append! and vcat, append! will most certainly be faster, since vcat always makes a copy (as I understand it).
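A quick sketch of the copy-versus-mutate difference, outside of any parallel loop. The variable names are illustrative.

```julia
a = [(1, 2)]

# vcat allocates a new array and leaves `a` untouched.
b = vcat(a, [(2, 3)])
len_after_vcat = length(a)

# append! grows `a` in place and returns that same array.
c = append!(a, [(3, 4)])

println((b === a, c === a))
```

Used as a reducer over many outputs, vcat therefore reallocates the accumulated array at every step, while append! keeps extending the same one.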
Benchmarks:

    function parallelwithvcat!( a::Array{Tuple{Int64, Int64}, 1} )
      a = @parallel vcat for i = 1:10000
        (i, i+1)
      end
    end;

    function parallelwithfunction!( a::Array{Tuple{Int64, Int64}, 1} )
      a = @parallel append! for i in 1:10000
        [(i, i+1)];
      end
    end;

    function parallelwithpreallocation!( a::Array{Tuple{Int64, Int64}, 1} )
      @parallel for i in 1:10000
        a[i] = (i, i+1);
      end
    end;

    a = Array{Tuple{Int64, Int64}, 1}(10000);

    ### first runs omitted, all benchmarks here are from 2nd runs ###

    # first on a single worker:
    @time for n in 1:100; parallelwithvcat!(a); end
    #> 8.050429 seconds (24.65 M allocations: 75.341 GiB, 15.42% gc time)

    @time for n in 1:100; parallelwithfunction!(a); end
    #> 0.072325 seconds (1.01 M allocations: 141.846 MiB, 52.69% gc time)

    @time for n in 1:100; parallelwithpreallocation!(a); end
    #> 0.000387 seconds (4.21 k allocations: 234.750 KiB)

    # now with true parallelism:
    addprocs(10);

    @time for n in 1:100; parallelwithvcat!(a); end
    #> 1.177645 seconds (160.02 k allocations: 109.618 MiB, 0.75% gc time)

    @time for n in 1:100; parallelwithfunction!(a); end
    #> 0.060813 seconds (111.87 k allocations: 70.585 MiB, 3.91% gc time)

    @time for n in 1:100; parallelwithpreallocation!(a); end
    #> 0.058134 seconds (116.16 k allocations: 4.174 MiB)
If anyone can suggest a more efficient way, please do so!

Note in particular that indexed assignment is much faster than the rest, to the point that (for this example at least) most of the time in the parallel case appears to be lost on the parallelisation itself.

Disclaimer: I make no claim that the above are correct summonings of the @parallel spell. I have not delved into the inner workings of the macro in enough detail to be able to claim otherwise. In particular, I am not aware of which parts the macro causes to be processed remotely vs. locally (e.g. the assignment part). Caution is advised, YMMV, etc.