
Surprising performance degradation with multiple goroutines


Hello,
I’m wondering why my algorithm for generating data doesn’t perform better when two goroutines are started to generate the data.
I ran the test on a multi-core machine, which means the two goroutines can run in parallel. Each of these two goroutines only generates half of the total amount of data. I therefore assumed that the total running time should be about half as long compared to a single goroutine.
It would be great if someone could review the attached code sample below. There might be something wrong with my approach, but I couldn’t identify it.
Thanks!

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

func gendata(numGoRoutines, numElements int) []int {
	var wg sync.WaitGroup
	wg.Add(numGoRoutines)
	rand.Seed(time.Now().UnixNano())

	ssi := make([][]int, numGoRoutines)
	for i := 0; i < numGoRoutines; i++ {
		ssi[i] = make([]int, 0, numElements/numGoRoutines)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < numElements/numGoRoutines; j++ {
				ssi[i] = append(ssi[i], rand.Intn(10))
			}
		}(i)
	}
	wg.Wait()

	// Create the result by appending all the sub-slices into one result slice
	var res []int
	for i := 0; i < numGoRoutines; i++ {
		res = append(res, ssi[i]...)
	}
	return res
}

func main() {
	n := 100000000

	// One goroutine
	g := 1
	t1 := time.Now()
	gendata(g, n)
	t2 := time.Now()
	fmt.Println("Data generation of", n, "elements with", g, "goroutines took", t2.Sub(t1))

	// Two goroutines
	g = 2
	t1 = time.Now()
	gendata(g, n)
	t2 = time.Now()
	fmt.Println("Data generation of", n, "elements with", g, "goroutines took", t2.Sub(t1))
}

math/rand produces a predictable sequence of pseudo-random values. To ensure that successive calls to math/rand.Intn always produce the same sequence of values, the source is protected with a mutex, so adding goroutines won’t help.

Instead, use math/rand.NewSource and pass that to rand.New for each goroutine individually. You can’t share a rand.Source between goroutines, because only the global source is safe for concurrent use. That should allow multiple cores to generate values.
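
For illustration, a minimal runnable sketch of that idea (the names and the seed offset by i are my own; the offset just avoids two goroutines seeding with the same nanosecond):

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	results := make([][]int, 2)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine owns its own generator, so the hot loop
			// never touches the mutex-protected global source.
			r := rand.New(rand.NewSource(time.Now().UnixNano() + int64(i)))
			for j := 0; j < 5; j++ {
				results[i] = append(results[i], r.Intn(10))
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}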


That’s a good point. It improved the performance. Now, the runtime for the data generation by one goroutine and for the data generation by two goroutines is almost equal.

However, I expected the runtime to be roughly halved with the two-goroutine approach.
I also tested with a fixed value for the data to be appended to the slice, e.g.

ssi[i] = append(ssi[i], 1)

The result is the same. The runtime is the same for both variants. It seems there is something else which influences the runtime of the two goroutines. I still wonder why this happens…

I recommend looking into the runtime/pprof package to investigate where the time is being spent.
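
For instance, a minimal sketch of capturing a CPU profile around the call (the file name cpu.prof is arbitrary):

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	// Profile CPU usage for the lifetime of the program and
	// inspect the result afterwards with: go tool pprof cpu.prof
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// ... run gendata here ...
}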

I did more research and was able to further improve the runtime of the function.
In addition to the change regarding the random source (thanks for that hint), I changed the following two things:

  1. The exit condition of the for-loop is calculated only once, before entering the loop
  2. New elements are inserted into the slice using the appropriate slice indices instead of append
func gendata(numGoRoutines, numElements int) []int {
	var wg sync.WaitGroup
	wg.Add(numGoRoutines)

	ssi := make([][]int, numGoRoutines)
	for i := 0; i < numGoRoutines; i++ {
		ssi[i] = make([]int, numElements/numGoRoutines)
		go func(i int) {
			defer wg.Done()
			// Per-goroutine random source, so no shared mutex is involved
			s := rand.NewSource(time.Now().UnixNano())
			r := rand.New(s)
			// Compute the loop bound once, before entering the loop
			cond := numElements / numGoRoutines
			for j := 0; j < cond; j++ {
				ssi[i][j] = r.Intn(10)
			}
		}(i)
	}
	wg.Wait()

	// Create the result by appending all the sub-slices into one result slice
	var res []int
	for i := 0; i < numGoRoutines; i++ {
		res = append(res, ssi[i]...)
	}
	return res
}

Of course! You’re right; I’m frustrated at myself for not noticing this! Every time you append to the slice, you’re rewriting the new length back into ssi[i], which is adjacent in memory to ssi[i±1], likely resulting in false sharing, which can nullify any benefit of parallelism.
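
To make the mechanism concrete, here is a hypothetical variant (gendataLocal is my name for it; it assumes the same imports as the samples above) where each goroutine appends to a goroutine-local slice and publishes it once, so the shared slice headers in ssi are written only once per goroutine:

func gendataLocal(numGoRoutines, numElements int) []int {
	var wg sync.WaitGroup
	wg.Add(numGoRoutines)

	ssi := make([][]int, numGoRoutines)
	for i := 0; i < numGoRoutines; i++ {
		go func(i int) {
			defer wg.Done()
			r := rand.New(rand.NewSource(time.Now().UnixNano() + int64(i)))
			n := numElements / numGoRoutines
			// Appending to a local slice rewrites only this goroutine's
			// own header, never a cache line shared with a neighbour.
			local := make([]int, 0, n)
			for j := 0; j < n; j++ {
				local = append(local, r.Intn(10))
			}
			ssi[i] = local // a single write to the shared slice
		}(i)
	}
	wg.Wait()

	var res []int
	for i := 0; i < numGoRoutines; i++ {
		res = append(res, ssi[i]...)
	}
	return res
}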

