[Python] [OT] Cheap MapReduce in Go

Riccardo Magliocchetti riccardo.magliocchetti at gmail.com
Mon 13 Jul 2015 20:35:01 CEST


On 13/07/2015 20:20, Carlo Miron wrote:
> <http://marcio.io/2015/07/cheap-mapreduce-in-go/>
>
> tl;dr
>
> Sometimes you don’t need overly complex infrastructures or systems to do a job
> well. In this case, we were running these exact same aggregations over close to
> 20 EMR instances that would take a few minutes to execute the entire MapReduce
> job over hundreds of Gigabytes of data each day.
>
> When we decided to take a look at this problem again, we rewrote this task using
> Go, and we now simply run this on a single 8-core machine and the whole daily
> execution takes about 10 minutes. We cut a lot of the costs associated with
> maintaining and running these EMR systems and we just schedule this Go app to
> run once a day over our daily dataset.
>
> You can find the entire code here:
> https://gist.github.com/mcastilho/e051898d129b44e2f502

A while back something similar came out, in which a touching pipeline was
used:
http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html

-- 
Riccardo Magliocchetti
@rmistaken

http://menodizero.it

