
Go vs Python: Performance, Concurrency, and Use Cases

Two of the most popular languages in the industry today are Go and Python. Python earned its popularity early on thanks to its very simple syntax, while Go is a newer language that became popular because of its speed and simplicity.

Both of them, like any language, have advantages and disadvantages. In this article we are going to explore the differences and see in which scenarios each language excels.

Let’s start from the beginning…

 

What was the idea behind creating these languages?

Python

It was created because, at the time, there was no language that was very easy to read and write while still being powerful and extensible. You could take, for example, C++ to solve something that needed good performance, but you'd spend considerable time learning the syntax. And if you wanted a more human-readable syntax, like the ABC language, the performance would be quite poor. Python bridged that gap.

 

Go

It was created with performance and concurrency in mind. The idea was to have a very simple language that still excelled at performance. Its designers were dealing with very large codebases written in languages like C++ and Java: those languages offered the power they needed, but the developer experience wasn't the best because, for example, compile times were very long.

 

The Difference Between Python and Go

Even though these two languages overlap in many use cases, they are better suited to different situations. We are going to walk through a small project I have come up with, implemented in both Go and Python, to show which parts of the project suit which language better.

 

The project

The idea is to go through a list of URLs and scrape them. After scraping, there's a bit of processing: convert the text to lowercase, extract words, find bigrams (pairs of consecutive words, so "go is fast" yields "go is" and "is fast") and count how often each bigram recurs.

The project involves two different kinds of challenges: concurrency and text processing. Through them we'll see how Go and Python handle each, along with the pros and cons of both languages.

 

Scraping websites concurrently

We have a list of 20 URLs from which we need to scrape the text.

Here are two implementations:

 

Python:

import asyncio
import aiohttp
from bs4 import BeautifulSoup
urls = [
    "https://www.python.org/doc/",
    "https://golang.org/doc/",
    "https://docs.djangoproject.com/en/stable/",
    "https://flask.palletsprojects.com/en/stable/",
    "https://fastapi.tiangolo.com/",
    "https://pandas.pydata.org/docs/",
    "https://numpy.org/doc/",
    "https://scikit-learn.org/stable/documentation.html",
    "https://matplotlib.org/stable/contents.html",
    "https://developer.mozilla.org/en-US/docs/Web",
    "https://news.ycombinator.com/",
    "https://www.bbc.com/news",
    "https://www.theguardian.com/international",
    "https://www.npr.org/sections/news/",
    "https://www.reuters.com/",
    "https://apnews.com/",
    "https://www.aljazeera.com/news/",
    "https://www.cnn.com/world",
    "https://www.nationalgeographic.com/",
    "https://www.nytimes.com/international/",
]
async def fetch(session, url):
    # Download one page and strip the HTML down to plain text.
    async with session.get(url) as r:
        return BeautifulSoup(await r.text(), "html.parser").get_text()

async def fetch_urls():
    # Fetch every URL concurrently on a single event loop.
    async with aiohttp.ClientSession() as session:
        html_pages = await asyncio.gather(*(fetch(session, u) for u in urls))
    return html_pages

async def main():
    html_pages = await fetch_urls()

if __name__ == "__main__":
    asyncio.run(main())

 

Go:

package main
import (
    "io"
    "net/http"
    "strings"
    "sync"
    "golang.org/x/net/html"
)
var urls = []string{
    "https://www.python.org/doc/",
    "https://golang.org/doc/",
    "https://docs.djangoproject.com/en/stable/",
    "https://flask.palletsprojects.com/en/stable/",
    "https://fastapi.tiangolo.com/",
    "https://pandas.pydata.org/docs/",
    "https://numpy.org/doc/",
    "https://scikit-learn.org/stable/documentation.html",
    "https://matplotlib.org/stable/contents.html",
    "https://developer.mozilla.org/en-US/docs/Web",
    "https://news.ycombinator.com/",
    "https://www.bbc.com/news",
    "https://www.theguardian.com/international",
    "https://www.npr.org/sections/news/",
    "https://www.reuters.com/",
    "https://apnews.com/",
    "https://www.aljazeera.com/news/",
    "https://www.cnn.com/world",
    "https://www.nationalgeographic.com/",
    "https://www.nytimes.com/international/",
}
func fetch(url string, wg *sync.WaitGroup, ch chan<- string) {
    // Download one URL, extract its plain text and send it on the channel.
    defer wg.Done()
    resp, err := http.Get(url)
    if err != nil {
        return
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    text := extractText(string(body))
    ch <- text
}

func extractText(htmlStr string) string {
    // Walk the parsed HTML tree and concatenate every text node.
    doc, _ := html.Parse(strings.NewReader(htmlStr))
    var f func(*html.Node) string
    f = func(n *html.Node) string {
        if n.Type == html.TextNode {
            return n.Data + " "
        }
        result := ""
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            result += f(c)
        }
        return result
    }
    return f(doc)
}

func fetchURLs() chan string {
    // Start one goroutine per URL; the buffered channel collects the results.
    ch := make(chan string, len(urls))
    var wg sync.WaitGroup
    for _, u := range urls {
        wg.Add(1)
        go fetch(u, &wg, ch)
    }
    wg.Wait()
    close(ch)
    return ch
}

func main() {
    ch := fetchURLs()
    _ = ch // the channel is consumed in the processing part below
}

 

The first difference you notice immediately is that the Go version has much more code than the Python one. One reason is that Go is statically typed while Python is dynamically typed. Another is that Python is an older language, so far more libraries are available to do the heavy lifting for us, HTML parsing in our example.

Performance, however, is a different story. Here's a small benchmark table showing CPU time (how quickly it executed) and peak memory usage:

Language   CPU time   Peak memory
Python     ~1090 ms   ~170 MiB
Go         ~404 ms    ~132 MiB

 

The numbers are approximate, as they can vary slightly depending on the machine they're run on and on network conditions.

Go does the fetching faster, but is more verbose to write. Python, on the other hand, is simpler to write, but you lose some performance.

 

Processing What We Scraped

On the text we got from scraping the websites, we'll do some processing:

  • convert text to lowercase
  • remove all characters except letters and spaces
  • extract words
  • find bigrams (two consecutive words)
  • count recurring bigrams

 

The Two Implementations:

Python:

import asyncio
import re
from collections import Counter
def process_text(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)  # keep only letters and whitespace
    words = text.split()
    bigrams = zip(words, words[1:])       # pairs of consecutive words
    return Counter(bigrams)

async def main():
    total_counter = Counter()
    # html_pages comes from the previous part, where the websites were scraped
    for html in html_pages:
        total_counter.update(process_text(html))

 

Go:

package main
import (
    "regexp"
    "strings"
)

// processText lowercases the text, strips everything except letters and
// whitespace, and counts how many times each bigram occurs.
func processText(text string) map[string]int {
    re := regexp.MustCompile(`[^a-z\s]`)
    text = strings.ToLower(text)
    text = re.ReplaceAllString(text, "")
    words := strings.Fields(text)
    bigrams := make(map[string]int)
    for i := 0; i < len(words)-1; i++ {
        bigram := words[i] + " " + words[i+1]
        bigrams[bigram]++
    }
    return bigrams
}
func main() {
    totalCount := make(map[string]int)
    // ch variable with scraped websites is retrieved in the previous part
    for text := range ch {
        for k, v := range processText(text) {
            totalCount[k] += v
        }
    }
}
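
One small side note on the Go version: regexp.MustCompile runs on every call to processText, i.e. once per page. Here is a minimal sketch of how the pattern could instead be compiled once at package level (a possible refinement, not how the project above was benchmarked):

// nonLetters is compiled once, at package initialisation, instead of per page.
var nonLetters = regexp.MustCompile(`[^a-z\s]`)

func processText(text string) map[string]int {
    text = nonLetters.ReplaceAllString(strings.ToLower(text), "")
    words := strings.Fields(text)
    bigrams := make(map[string]int)
    for i := 0; i < len(words)-1; i++ {
        bigrams[words[i]+" "+words[i+1]]++
    }
    return bigrams
}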

 

We can again see that the Python version is easier to write: Python is commonly used for data processing for exactly this reason, and libraries such as collections.Counter do much of the work for you.

Here are the performance results for this part of the project (pay close attention to the units of measurement):

Language   CPU time
Python     ~10 milliseconds
Go         ~27 nanoseconds

 

Memory isn’t included in the benchmarking table because it’s negligible.

The results show that, even though it's easier and quicker to write the data processing in Python, Go executes it orders of magnitude faster. One of the main reasons is that Go is a compiled language while Python is an interpreted one.

 

Go’s Benchmarking Package

The Go results in the benchmark tables above were measured with Go's built-in testing package; the Python results were measured with the memory_profiler and timeit packages.

The advantage of Go's testing package is that it comes with the toolchain, is very easy to use, and measures the important benchmarking variables out of the box. It also offers methods to control exactly which parts of the code you measure (e.g. ignore expensive setup code and only time the code that processes the data).
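
For reference, a benchmark for the processText function above could look roughly like this (a minimal sketch, not the exact harness used for the measurements; the sample text is made up):

package main

import "testing"

// sampleText is a made-up stand-in for a scraped page.
const sampleText = "go is fast go is simple python is readable python is flexible"

func BenchmarkProcessText(b *testing.B) {
    // Expensive setup (e.g. loading real pages from disk) would go here,
    // followed by b.ResetTimer() so it is excluded from the measurement.
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        processText(sampleText)
    }
}

Running it with go test -bench=. -benchmem reports the time per operation (ns/op) along with memory allocations per operation.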

 

Which language is better suited to what situation?

As we've seen from the project example, Go is quicker than Python, but it's more verbose. That's why some well-known projects (like Docker and Kubernetes) use Go for the performance-critical parts of their software.

Go is quite often used for networking, microservices and anywhere high performance is important. It offers great performance and is easier to use than, for example, C++. Another important feature of Go is concurrency: its goroutines are lightweight threads managed by the Go runtime and are very easy to use. Don't confuse them with OS threads; goroutines are scheduled by the Go runtime, not by the operating system.
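
To make the "lightweight" point concrete, here is a small self-contained sketch (not part of the project) that launches 100,000 goroutines; spawning that many OS threads would typically exhaust system resources, while the Go runtime handles this comfortably:

package main

import (
    "fmt"
    "sync"
)

func main() {
    const n = 100000
    var wg sync.WaitGroup
    results := make(chan int, n) // buffered so no goroutine blocks on send

    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(x int) {
            defer wg.Done()
            results <- x * x // stand-in for real work
        }(i)
    }

    wg.Wait()
    close(results)
    fmt.Println("goroutines finished:", len(results))
}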

Python is used for data science, AI, quick prototypes and similar applications. It has a rich community with various libraries to help you build what you want very quickly. Also, the syntax is quite simple to grasp even with little knowledge or experience in programming.

 

Should you use Go or Python for your next project?

That question comes up often for all kinds of tools and programming languages, and the answer is: it depends on the project. There's no general answer.

But in this context, a good guideline to help you decide would be:

  • if it’s related to data science or AI, use Python
    • Tesla is using PyTorch for its Autopilot system
    • J.P. Morgan Chase is using scikit-learn among other libraries for predictive analytics, fraud detection, and risk management
    • Hugging Face is using PyTorch and TensorFlow heavily
  • if you want to get something out very quickly to see if the idea is viable, use Python
  • if you want something to be very performant, use Go
    • Docker is using Go for its core (including its daemon and CLI)
    • Prometheus is built using Go
    • Twitch is also using Go (including its chat)

 

Final words

To conclude, Python and Go are good in different areas. If you're facing performance challenges, definitely try Go: it's easy to get started with compared to C or C++ and you get great results. Python, in turn, is very good for developing applications quickly. An additional bonus for Python is its gentle learning curve, so even less experienced programmers can produce results quickly.

 

Author:

Mario Šumiga is a Senior Fullstack Software Engineer at Zartis, with more than 9 years of experience in the tech world. His primary focus is frontend development, with recent years spent using React and VueJS to build UIs that ensure low-latency, seamless interaction for users. Mario also writes backend code in Python and Ruby, handling integrations and processing various data. He is always eager to try out new technologies and to use the most fitting technology for the problem at hand.
