Basically, it's like this:
x := "It worked earlier lol"
go func(x string) {
    c.OnHTML(...) {
        print(x)
    }
    c.Visit("https://www.website.com")
}(x)
So that's the gist: when I print the variables inside the callback they are blank, but outside of the go func they are defined. Here's the full parent function:
eventCollector.OnHTML(".rgMasterTable tr", func(h *colly.HTMLElement) {
    eventName := h.ChildText("td:nth-child(3) a")
    eventURL := h.ChildAttr("td:nth-child(3) a", "href")
    state := h.ChildText("td:nth-child(2)")
    wgFR.Add(1)              // Increment WaitGroup counter for every goroutine
    semaphore <- struct{}{}  // Acquire a token
    go func(eventName, eventURL, state string) {
        defer wgFR.Done() // Signal completion when the goroutine exits
        defer func() { <-semaphore }() // Release the token
        contestCollector := eventCollector.Clone()
        var postedDateStr string
        contestCollector.OnHTML("#ctl00_ContentPlaceHolder1_FormView1_Report_2Label", func(d *colly.HTMLElement) {
            postedDateStr = d.Text
        })
        contestCollector.OnHTML(".rgMasterTable tr", func(c *colly.HTMLElement) { // troubled line
            contestName := c.ChildText("td:nth-child(1)")
            contestURL := c.ChildAttr("td:nth-child(3) a", "href")
            if contestURL == "" {
                contestURL = "FORMAT-ERROR"
            } // tmp handler for doc-style results
            postedDate, timeErr := time.Parse("Jan 2, 2006", postedDateStr)
            if timeErr != nil {
                log.Printf("Error parsing time from %s", eventURL)
            }
            contest := Contest{
                EventName:   eventName,
                ContestName: contestName,
                PostedDate:  postedDate,
                ContestURL:  contestURL,
                State:       state,
                Current:     true,
            }
            // NOTE: fResults is appended to from several goroutines; guard it with a mutex.
            fResults = append(fResults, contest)
        })
        err := contestCollector.Visit("https://www.judgingcard.com/Outcomes/" + eventURL)
        if err != nil {
            log.Printf("Couldn't find event: %s -- %s", eventURL, eventName)
        }
        contestCollector.Wait() // Wait for the inner collector to finish
    }(eventName, eventURL, state)
})
In the full version, all variables passed to the go func come back blank inside the callback (contestCollector.OnHTML). Unfortunately I'm not sure whether the issue lies in the goroutine, the fact that it is a callback, or something else.
Thanks in advance!
This seems pretty convoluted. It looks like you are using github.com/gocolly/colly. Why is that section in a goroutine exactly? I'd first try eliminating that. It seems as if the collectors are already using goroutines (I'm just guessing since they have a Wait function; I didn't look at the docs to confirm this).
This code looks like it's missing some stuff (like where are you waiting for wgFR? Where is wgFR even defined?) so off the top of my head, something could be modifying the strings in your outer functions in a way you might not be expecting. In the problematic contestCollector.OnHTML callback you are not passing those values to that function, so it could be falling prey to something like this:
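package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    myAwesomeValue := "awesome"
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        // The closure captures the variable itself, not a copy of its value.
        time.Sleep(time.Millisecond)
        fmt.Println("The value is", myAwesomeValue)
        wg.Done()
    }()
    myAwesomeValue = "not awesome" // runs before the goroutine wakes up
    wg.Wait()
}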
… which prints "The value is not awesome".
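For contrast, here is a minimal sketch of the usual way around that: hand the value to the goroutine as an argument, so it works on a copy taken when the goroutine is started. The function name snapshotValue is made up for this illustration and nothing here is colly-specific.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// snapshotValue (hypothetical name) starts a goroutine that receives the
// string as a parameter and returns what the goroutine saw.
func snapshotValue() string {
	value := "awesome"
	var seen string
	var wg sync.WaitGroup
	wg.Add(1)
	go func(v string) { // v is a copy made when the go statement runs
		defer wg.Done()
		time.Sleep(time.Millisecond)
		seen = v
	}(value)
	value = "not awesome" // changes the outer variable only, not v
	wg.Wait()             // after Wait it is safe to read seen
	return seen
}

func main() {
	fmt.Println("The value is", snapshotValue()) // prints "The value is awesome"
}
```

The same applies to a nested callback: anything it reads from an enclosing scope is shared with that scope, while parameters passed to the goroutine are private copies.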
Thanks, the Colly framework does have its own manager that doesn't require defining a new wait group. I've built this test script using what I've learned, and it works as expected.
package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

type Event struct {
    event_date   string
    state        string
    event_name   string
    event_url    string
    contest_name string
    contest_url  string
}

func main() {
    var events []Event
    ECollector := colly.NewCollector(colly.Async(true))
    ECollector.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})
    ECollector.OnHTML(".rgMasterTable tr", func(e *colly.HTMLElement) {
        // only look at the first few rows while testing
        if e.Index > 5 {
            return
        }
        date := e.ChildText("td:nth-child(1)")
        state := e.ChildText("td:nth-child(2)")
        title := e.ChildText("td:nth-child(3)")
        url := e.ChildAttr("td:nth-child(3) a", "href")
        event := Event{
            event_date: date,
            state:      state,
            event_name: title,
            event_url:  url,
        }
        events = append(events, event)
    })
    ECollector.Visit("https://www.judgingcard.com")
    ECollector.Wait()

    CCollector := ECollector.Clone()
    // Register the handler once, outside the loop: every OnHTML call adds
    // another callback, so registering per event would duplicate the output.
    CCollector.OnHTML(".rgMasterTable tr", func(c *colly.HTMLElement) {
        // skip header line
        if c.Index == 0 || c.Request.URL.String() == "https://www.judgingcard.com/Outcomes/default.aspx" {
            return
        }
        // More contests than events/build new struct
        contest_name := c.ChildText("td:nth-child(1)")
        contest_url := c.ChildAttr("td:nth-child(3) a", "href")
        fmt.Println(contest_name, contest_url)
    })
    for _, event := range events {
        CCollector.Visit("https://www.judgingcard.com" + event.event_url)
    }
    CCollector.Wait()
}
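One thing to watch once the contests go into a slice instead of being printed: with an async collector the callbacks can run on several goroutines at once, so the append needs a lock. Below is a rough standard-library sketch of that idea, with plain goroutines standing in for the colly callbacks. The Contest type, the collect function and the generated names and URLs are all made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Contest is an illustrative type holding what the test script prints.
type Contest struct {
	Name string
	URL  string
}

// collect simulates n concurrent callbacks that each append one result.
func collect(n int) []Contest {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []Contest
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c := Contest{
				Name: fmt.Sprintf("contest-%d", i),
				URL:  fmt.Sprintf("/contest/%d", i),
			}
			mu.Lock() // only one goroutine may append at a time
			results = append(results, c)
			mu.Unlock()
		}(i)
	}
	wg.Wait() // all appends are finished once Wait returns
	return results
}

func main() {
	fmt.Println(len(collect(50))) // prints 50
}
```

The order of the results is not deterministic, so sort them afterwards if the order matters.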