Do I need a cloud service?
In the next few weeks I’ll be starting a new project working with several datasets of up to maybe 10 GB each. Until now I’ve only worked with files that are a couple of MB at most, and I’m concerned my MacBook won’t be up to processing the data or running the analysis. In the first phase at least, I’ll be producing descriptive statistics and basic tests rather than building models.
I have a small budget; would storing the data on a cloud server and using a virtual machine be worthwhile? If so, what’s a good option? Posit Cloud seems like the most user-friendly option, but it appears to have a limit of 500 MB per project. Google Cloud and AWS don’t have that limit, and the virtual machines I’d be able to set up look more powerful, but I don’t know whether that would be necessary, and setting it up would be much more of a chore.
Comments ( 2 )
Would it be useful to partition the data in the workflow, so perhaps 0.1-0.5 GB at a time? That way you can compute descriptive statistics one chunk at a time and then aggregate the results.
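To make the chunk-and-aggregate idea concrete, here is a minimal sketch in Python using only the standard library. It assumes the data is a CSV file with a header row and that the column of interest is numeric with no missing values; the function names (`chunk_stats`, `merge`, `describe_in_chunks`) and the chunk size are made up for illustration. Each chunk is reduced to a small summary, and summaries are merged with the standard pairwise update for mean and variance, so only one chunk is in memory at a time.

```python
import csv
import io
import math
from itertools import islice


def chunk_stats(values):
    """Summarise one chunk: count, mean, sum of squared deviations, min, max."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2, min(values), max(values)


def merge(a, b):
    """Combine two chunk summaries without revisiting the raw rows."""
    n_a, mean_a, m2_a, min_a, max_a = a
    n_b, mean_b, m2_b, min_b, max_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2, min(min_a, min_b), max(max_a, max_b)


def describe_in_chunks(handle, column, chunk_rows=100_000):
    """Stream a CSV file object and aggregate one numeric column chunk by chunk."""
    reader = csv.DictReader(handle)
    total = None
    while True:
        # islice pulls at most chunk_rows rows from the same reader each pass.
        chunk = [float(row[column]) for row in islice(reader, chunk_rows)]
        if not chunk:
            break
        part = chunk_stats(chunk)
        total = part if total is None else merge(total, part)
    if total is None:
        raise ValueError("no data rows found")
    n, mean, m2, lo, hi = total
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return {"n": n, "mean": mean, "sd": sd, "min": lo, "max": hi}


# Tiny in-memory demo; a real file would be opened with open(path, newline="").
demo = io.StringIO("x\n" + "\n".join(str(i) for i in range(1, 11)) + "\n")
print(describe_in_chunks(demo, "x", chunk_rows=3))
```

Counts, means, variances, minima and maxima merge exactly this way. Medians and other quantiles do not, so those would need a different approach (for example a database query or an approximate method).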
Fitting models to 10 GB of data will require a lot of RAM, so you might need better tools for that. And you should probably never be plotting that much data at once.
But if you’re just going to be filtering or aggregating the data first, you can store it in a local DuckDB or SQLite database and query it as needed. That way you get the data or result you want without reading it all into memory.
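As a rough illustration of the local-database approach, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `measurements`, its two columns, and the helper names are hypothetical; a real dataset would need its own schema. The demo uses an in-memory database so it runs anywhere, whereas in practice you would pass a file path such as `sqlite3.connect("project.db")` so the data lives on disk. DuckDB exposes a similar SQL interface, but it is a separate install, so it is not used here.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert (grp, value) pairs; rows may be any iterable, including a generator."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements (grp TEXT, value REAL)"
    )
    # executemany consumes the iterable lazily, so rows can be streamed
    # from a large file instead of being held in a list first.
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    conn.commit()


def summarise(conn):
    """Let the database do the aggregation and return only the small result."""
    query = """
        SELECT grp, COUNT(*), AVG(value), MIN(value), MAX(value)
        FROM measurements
        GROUP BY grp
        ORDER BY grp
    """
    return conn.execute(query).fetchall()


# Hypothetical demo data; ":memory:" keeps the example self-contained.
demo_conn = sqlite3.connect(":memory:")
load_rows(demo_conn, [("a", 1.0), ("a", 3.0), ("b", 10.0)])
for row in summarise(demo_conn):
    print(row)
demo_conn.close()
```

Only the grouped summary comes back into Python, which is the point of the suggestion: the raw rows stay in the database file and never need to fit in RAM.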