Spark is a framework for running computations on large amounts of data.

Why do you need something like Spark? Think, for example, about a small dataset that fits easily into memory, let's say a few GB at most. You will probably load the entire dataframe using Pandas, R or your tool of choice, and after some quick cleaning and visualization you will be almost done, with no major hassles related to computing performance, as long as you are using a decent computer (or cloud infrastructure).

Now imagine that you have to process a 1 TB (or bigger) dataset and train an ML algorithm on it. Even with a powerful computer, that is crazy: the data will not fit into the memory of a single machine.

Spark gives you the two features you need to handle these data monsters:

- Parallel computing: you use not one but many computers to speed up your calculations.
- Fault tolerance: you must be able to recover if one of your computers hangs in the middle of the process.
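To make the two features named above, parallel computing and fault tolerance, a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code and not how Spark is implemented: it only imitates the idea on one machine by splitting the data into partitions, processing each partition in a separate worker thread, and retrying a partition whose worker fails. All names (`split_into_partitions`, `run_with_retries`, `parallel_reduce`, `FlakyTask`) are hypothetical helpers invented for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into at most num_partitions roughly equal chunks."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(partition):
    """The work done on each partition (a stand-in for a real computation)."""
    return sum(x * x for x in partition)


def run_with_retries(task, partition, max_attempts=3):
    """Fault tolerance: rerun the task on the same partition if it fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(partition)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt


def parallel_reduce(data, task, num_partitions=4):
    """Parallel computing: one worker per partition, then combine the results."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(
            pool.map(lambda p: run_with_retries(task, p), partitions)
        )
    return sum(partial_results)


class FlakyTask:
    """Wraps a task so that its very first call fails (a simulated crash)."""

    def __init__(self, task):
        self.task = task
        self.failed_once = False
        self.lock = threading.Lock()

    def __call__(self, partition):
        with self.lock:
            if not self.failed_once:
                self.failed_once = True
                raise RuntimeError("simulated worker failure")
        return self.task(partition)


if __name__ == "__main__":
    data = list(range(1, 101))
    # One worker fails once, the partition is retried, the result is still correct.
    print(parallel_reduce(data, FlakyTask(sum_of_squares)))  # 338350
```

Threads on a single machine will not really speed up this kind of calculation in Python; the point is only the shape of the solution. Spark applies the same two ideas, but across many computers.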