Reduction in a parallel loop in Julia

- February 24, 2014

We can use

    c = @parallel (vcat) for i=1:10
        (i,i+1)
    end

But when I try to use push!() instead of vcat() I get an error. How can I use push!() in this parallel loop?

    c = @parallel (push!) for i=1:10
        (c, (i,i+1))
    end

Elaborating a bit on Dan's point: to see how the parallel macro works, compare the following two invocations:

    julia> @parallel print for i in 1:10
              (i,i+1)
           end
    (1, 2)(2, 3)nothing(3, 4)nothing(4, 5)nothing(5, 6)nothing(6, 7)nothing(7, 8)nothing(8, 9)nothing(9, 10)nothing(10, 11)

    julia> @parallel string for i in 1:10
              (i,i+1)
           end
    "(1, 2)(2, 3)(3, 4)(4, 5)(5, 6)(6, 7)(7, 8)(8, 9)(9, 10)(10, 11)"

From the top one it should be clear what's going on. Each iteration produces an output. When it comes to applying the specified function to those outputs, this is done in pairs. The first two outputs are fed to print, and the result of that print operation becomes the first item in the next pair to be processed. Since that result is nothing, print prints nothing and then (3,4). The result of this print statement is again nothing, so the next pair to be printed is nothing and (4,5), and so on until all elements are consumed. In terms of pseudocode, this is what's happening:

step 1: state = print((1,2), (2,3)); # state becomes nothing
step 2: state = print(state, (3,4)); # state becomes nothing again
step 3: state = print(state, (4,5)); # and so forth

The reason string works as expected is that the following steps are happening:

step 1: state = string((1,2),(2,3));
step 2: state = string(state, (3,4));
step 3: state = string(state, (4,5));
etc
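The pairwise steps above amount to a left fold over the loop outputs. Here is a minimal serial sketch of that idea; `fold_by_hand` is a hypothetical helper written for illustration, not part of the macro, and no worker processes are involved:

```julia
# Outputs of the ten loop iterations, as in the examples above.
outputs = [(i, i + 1) for i in 1:10]

# Hypothetical helper: combine the first two outputs, then keep feeding
# the running result back in as the first argument of f.
function fold_by_hand(f, items)
    state = f(items[1], items[2])
    for item in items[3:end]
        state = f(state, item)
    end
    return state
end

by_hand = fold_by_hand(string, outputs)

# Base's foldl performs the same left-to-right pairwise combination.
with_foldl = foldl(string, outputs)

println(by_hand == with_foldl)
```

Calling `fold_by_hand(print, outputs)` instead prints the tuples interleaved with `nothing`, because every call to `print` returns `nothing` and that value becomes the next `state`.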

In general, the function you pass to the parallel macro should be something that takes two inputs of the same type and outputs an object of that same type.

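Therefore you cannot use push!, because it always takes two inputs of different types (one array and one plain element) and outputs an array. You need to use append! instead, which fits the specification, provided each iteration returns a one-element array such as [(i,i+1)].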
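The type argument can be checked serially with Base's `reduce`, without any worker processes. This is only a sketch of the reducer's behaviour, assuming Julia 1.x (for the `init` keyword); it does not run the parallel macro itself, which was renamed `@distributed` in Julia 1.0:

```julia
# Plain tuples as loop outputs: the first reduction step would be
# push!((1, 2), (2, 3)), and push! has no method for a tuple container.
push_failed = try
    reduce(push!, [(i, i + 1) for i in 1:5])
    false
catch err
    err isa MethodError
end

# One-element vectors as loop outputs: both arguments of append! are
# always Vector{Tuple{Int,Int}}, and so is its result.
pieces = [[(i, i + 1)] for i in 1:5]
combined = reduce(append!, pieces; init = Tuple{Int,Int}[])

println(push_failed)
println(combined)
```

The empty `init` vector is there so that `append!` grows a fresh array rather than mutating the first element of `pieces`.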

Also note that the order of the outputs is not guaranteed (here it happens to be in order because I only used one worker). If the order of operations matters, you shouldn't use this construct. For example, with addition it doesn't matter, because addition is associative and commutative; but with string, if the outputs are processed in a different order, you could end up with a different string than you'd expect.
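To illustrate why ordering matters for some reducers and not others, the sketch below simulates two workers by hand: each "worker" reduces its own chunk, and the two partial results are then combined in either order. The chunking and variable names are assumptions made for illustration; this does not show how the macro actually schedules work.

```julia
# Two hypothetical chunks of loop outputs, one per simulated worker.
chunk1 = [(i, i + 1) for i in 1:5]
chunk2 = [(i, i + 1) for i in 6:10]

# Each simulated worker reduces its own chunk with string.
part1 = foldl(string, chunk1)
part2 = foldl(string, chunk2)

# Combining the partial strings in a different order changes the result.
strings_differ = string(part1, part2) != string(part2, part1)

# Combining partial sums in a different order does not.
sum1 = sum(1:5)
sum2 = sum(6:10)
sums_agree = (sum1 + sum2) == (sum2 + sum1)

println(strings_differ)
println(sums_agree)
```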

EDIT - addressing the benchmark between vcat / append! / indexed assignment

I think the most efficient way to do this is in fact via normal indexing onto a preallocated array. Between append! and vcat, append! will be faster, since vcat always makes a copy (as I understand it).

Benchmarks:

    function parallelwithvcat!( a::Array{Tuple{Int64, Int64}, 1} )
      a = @parallel vcat for i = 1:10000
        (i, i+1)
      end
    end;

    function parallelwithfunction!( a::Array{Tuple{Int64, Int64}, 1} )
      a = @parallel append! for i in 1:10000
        [(i, i+1)];
      end
    end;

    # Note: without a reducer, @parallel returns without waiting unless it is
    # prefixed with @sync, and each worker process writes into its own copy of
    # a plain Array, so this timing may not measure the full fill.
    function parallelwithpreallocation!( a::Array{Tuple{Int64, Int64}, 1} )
      @parallel for i in 1:10000
        a[i] = (i, i+1);
      end
    end;

    a = Array{Tuple{Int64, Int64}, 1}(10000);

    ### first runs omitted, all benchmarks here are from 2nd runs ###

    # first on a single worker:

    @time for n in 1:100; parallelwithvcat!(a); end
    #>  8.050429 seconds (24.65 M allocations: 75.341 GiB, 15.42% gc time)

    @time for n in 1:100; parallelwithfunction!(a); end
    #>  0.072325 seconds (1.01 M allocations: 141.846 MiB, 52.69% gc time)

    @time for n in 1:100; parallelwithpreallocation!(a); end
    #>  0.000387 seconds (4.21 k allocations: 234.750 KiB)

    # now with true parallelism:
    addprocs(10);

    @time for n in 1:100; parallelwithvcat!(a); end
    #>  1.177645 seconds (160.02 k allocations: 109.618 MiB, 0.75% gc time)

    @time for n in 1:100; parallelwithfunction!(a); end
    #>  0.060813 seconds (111.87 k allocations: 70.585 MiB, 3.91% gc time)

    @time for n in 1:100; parallelwithpreallocation!(a); end
    #>  0.058134 seconds (116.16 k allocations: 4.174 MiB)

If someone can suggest a more efficient way, please do so!

Note in particular that indexed assignment is faster than the rest, to the point that (for this example at least) most of the time in the parallel case appears to be lost to the parallelisation itself.

Disclaimer: I make no claim that the above are correct summonings of the @parallel spell. I have not delved into the inner workings of the macro in enough detail to be able to claim otherwise. In particular, I am not aware which parts the macro causes to be processed remotely vs locally (e.g. the assignment part). Caution is advised, ymmv, etc.
