
Posts

Day 14 Subsetting Matrix , Partial Matching , Missing NA Values

In a matrix, subsetting a single row, column, or element returns a vector by default (pass drop = FALSE to keep a matrix).
Remove rows with NAs (missing values) in data.frame http://stackoverflow.com/questions/4862178/remove-rows-with-nas-missing-values-in-data-frame
Find Complete Cases https://stat.ethz.ch/R-manual/R-devel/library/stats/html/complete.cases.html
Repeating a repeated sequence http://stackoverflow.com/questions/11180125/repeating-a-repeated-sequence http://stackoverflow.com/questions/3672527/r-generate-a-repeating-sequence-based-on-vector
Error in complete.cases(x, y) : not all arguments have the same length http://stackoverflow.com/questions/4740244/chisq-test-error-message
True Matrix Multiplication https://stat.ethz.ch/R-manual/R-devel/library/base/html/matmult.html
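A small sketch of the notes above, for my own reference. The matrix, the data frame and all variable names are made up for illustration; only the functions (complete.cases(), rep(), %*%) come from the linked pages.

```r
# Hypothetical 2 x 3 matrix used only for illustration
m <- matrix(1:6, nrow = 2, ncol = 3)

# Selecting a single row drops the matrix structure and returns a vector
row_as_vector <- m[1, ]
# drop = FALSE keeps the result as a 1 x 3 matrix
row_as_matrix <- m[1, , drop = FALSE]

# Hypothetical data frame with missing values
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))
# complete.cases() flags the rows that have no NA in any column
good <- complete.cases(df)
df_clean <- df[good, ]

# rep() repeats a whole sequence (times) or each element in turn (each)
seq_times <- rep(1:3, times = 2)   # 1 2 3 1 2 3
seq_each  <- rep(1:3, each = 2)    # 1 1 2 2 3 3

# * multiplies element by element; %*% is true matrix multiplication
a <- matrix(1:4, nrow = 2)
b <- matrix(5:8, nrow = 2)
elementwise <- a * b
matprod <- a %*% b
```

The "not all arguments have the same length" error shows up when complete.cases() is given several vectors of different lengths, so passing one data frame, as here, avoids it.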

Day 13 Saving R Data , Subsetting

Saving R Data http://thomasleeper.com/Rcourse/Tutorials/savingdata.html
Difference between dput and dump? (self.Rlanguage) https://www.reddit.com/r/Rlanguage/comments/2po2i3/difference_between_dput_and_dump/
How can I time my code? | R FAQ http://stats.idre.ucla.edu/r/faq/how-can-i-time-my-code/
Subsetting a list with a single bracket [ always returns an object of the same class. Subsetting a list with [[ may or may not return the same type. There is only one exception to [[ vs $
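A rough sketch of the three topics above (dput vs dump, timing code, list subsetting). The objects scores, x and y are hypothetical, and the files go to tempfile() paths so nothing in the working directory is touched.

```r
# Hypothetical list used only for illustration
scores <- list(name = "test", values = c(1.5, 2.5, 3.5))

# dput() writes an R expression for a single object; dget() reads it back
tmp_dput <- tempfile(fileext = ".R")
dput(scores, file = tmp_dput)
scores_copy <- dget(tmp_dput)

# dump() writes several objects by name; source() recreates them
x <- 1:3
y <- "hello"
tmp_dump <- tempfile(fileext = ".R")
dump(c("x", "y"), file = tmp_dump)
rm(x, y)
source(tmp_dump, local = TRUE)

# system.time() reports user, system and elapsed time for an expression
timing <- system.time(for (i in 1:1e5) sqrt(i))

# [ on a list always returns a list; [[ returns the element itself
sub_single <- scores["values"]     # a list of length 1
sub_double <- scores[["values"]]   # a numeric vector
sub_dollar <- scores$values        # same element, selected by literal name
```

Printing timing shows the three timings; the exact numbers depend on the machine.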

Day 13 R on Coursera , Reading Datasets

Attributes in R: names, length, class, dimensions
Complex vectors: 1+0i (imaginary number)
Vector function: vector()
Coercion when mixing vectors
Explicit coercion: as.numeric(), as.logical(), as.character(), as.complex()
attributes(), dim()
Attach 2 columns or 2 rows: cbind(), rbind()
table(x)
is.na(), is.nan()
nrow(), ncol()
Reading strings that contain whitespace into R from tab delimited .txt file http://stackoverflow.com/questions/11199496/reading-strings-that-contain-whitespace-into-r-from-tab-delimited-txt-file
Difference between read.table and read.delim functions http://stackoverflow.com/questions/10599708/difference-between-read-table-and-read-delim-functions
read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
read.delim2(file, header = TRUE, sep = "\t", quote = "\"", dec = ",", fill = TRUE, comment.ch...
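A quick sketch that runs through the functions listed above. All values, the temp file and the city names in it are made up for illustration.

```r
# vector() creates an empty vector of a given type and length
empty <- vector("numeric", length = 3)   # 0 0 0

# Mixing types coerces everything to the most general type
mixed <- c(1.7, "a")          # character
logical_num <- c(TRUE, 2)     # numeric: TRUE becomes 1

# Explicit coercion; strings that are not numbers become NA (with a warning)
nums <- suppressWarnings(as.numeric(c("1", "2", "x")))

# NaN counts as NA, but NA is not NaN
v <- c(1, NA, NaN)
na_flags  <- is.na(v)    # FALSE TRUE TRUE
nan_flags <- is.nan(v)   # FALSE FALSE TRUE

# cbind() attaches columns, rbind() attaches rows
by_col <- cbind(1:3, 4:6)   # 3 x 2
by_row <- rbind(1:3, 4:6)   # 2 x 3
col_dims <- attributes(by_col)$dim

# Hypothetical tab-delimited file whose strings contain spaces
tmp <- tempfile(fileext = ".txt")
writeLines(c("city\tcount", "Abu Dhabi\t3", "Ras Al Khaimah\t5"), tmp)
cities <- read.delim(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
```

Because the separator is a tab, the spaces inside "Abu Dhabi" stay within one field instead of splitting the row.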

Day 12 R Programming on Coursera, History

Beginner trying to figure out how to import a simple csv file into R http://stackoverflow.com/questions/10417938/beginner-trying-to-figure-out-how-to-import-a-simple-csv-file-into-r
How to Set Working Directory in R http://rprogramming.net/set-working-directory-in-r/
A short list of the most useful R commands https://www.personality-project.org/r/r.commands.html
The purpose of S and R was to let people use the language easily, without going deep into programming. Once users are familiar with basic statistics, they should be able to program more efficiently and grow into programmers. https://www.r-bloggers.com/ross-ihaka-on-the-history-of-the-r-project/

Day 12 GCP , SDK , Open Data

Interacting with Cloud Storage https://cloud.google.com/storage/docs/
We can upload data to the cloud using gsutil or GCP Console SSH
Google Cloud SDK https://cloud.google.com/sdk/docs/
Open Data for UAE http://opendata.fcsa.gov.ae/
https://cloud.google.com/sdk/docs/quickstart-windows
Transfer Services
Latency and Zones - Distribute the data across different zones
4 Practical “less” Command Examples and tips for effective navigation in Linux: http://www.sanfoundry.com/4-practical-less-command-examples-and-tips-effective-navigation-in-linux/
What's a .sh file? http://stackoverflow.com/questions/13805295/whats-a-sh-file
What is a file with extension .sh? It is a Bourne shell script. They are used in many variations of UNIX-like operating systems. They have no "language" and are interpreted by your shell (interpreter of terminal commands) or if the first line is in the form #!/path/to/interpreter the...

Day 11 GCP First Project Compute Engine

So I created my first project. A credit card is required to create a Google Cloud account, and $300 is given for a 1-year trial. It has a really cool dashboard, and most of the words are jargon to me at this point; hopefully I become an expert on these. Why Google Cloud Platform? https://cloud.google.com/why-google/ CUSTOM MACHINE TYPES https://cloud.google.com/custom-machine-types/ Instead of purchasing physical hardware and continually upgrading it, a cloud solution lets you select only the configuration you need at the moment, and upgrading is a lot easier. You also get all the services along with it. Preemptible VMs https://cloud.google.com/preemptible-vms/ https://codelabs.developers.google.com/codelabs/cpb100-compute-engine/#0 To find some information about the Compute Engine instance, type the following into the command-line: cat /proc/cpuinfo

Day 10 Google Cloud Platform P2, Code Labs

Atomic Fiction Walks “The Walk” https://cloudplatform.googleblog.com/2015/10/Atomic-Fiction-walks-The-Walk.html Market Reconstruction 2.0: A Financial Services Application of Google Cloud Bigtable and Google Cloud Dataflow https://cloud.google.com/customers/fis/ https://www.fisglobal.com/Solutions/Institutional-and-Wholesale/Broker-Dealer/-/media/FISGlobal/Files/Whitepaper/A-Financial-Services-Application-of-Google-Cloud-Bigtable-and-Google-Cloud-Dataflow.pdf Google Analytics Premium + Google BigQuery for Predictive Digital Marketing https://cloud.google.com/solutions/google-analytics-bigquery CPB100 https://codelabs.developers.google.com/cpb100 https://codelabs.developers.google.com/codelabs/cpb100-free-trial/index.html?index=..%2F..%2Fcpb100#0 https://console.cloud.google.com/freetrial?pli=1&page=0

Day 10 Google Cloud Platform

The 3rd wave of cloud means using the maximum processing power available for a given task and paying only for the amount required. In the 2nd wave you had to own dedicated machines and were limited by their processing power; in the 3rd wave you pay only for the time a task needs. E.g. Spotify Engineering: Spotify's journey to cloud: why Spotify migrated its event delivery system from Kafka to Google Cloud Pub/Sub https://cloud.google.com/blog/big-data/2016/03/spotifys-journey-to-cloud-why-spotify-migrated-its-event-delivery-system-from-kafka-to-google-cloud-pubsub

Day 10 Coursera Short Course Data and Machine Learning on Google Cloud Platform

Coursera Short Course Data and Machine Learning on Google Cloud Platform https://www.coursera.org/learn/gcp-big-data-ml-fundamentals/lecture/EewWO/introduction-to-the-data-and-machine-learning-specialization MapReduce Applications and Limitations of MapReduce http://mapreduce-specifics.wikispaces.asu.edu/Applications+and+Limitations+of+MapReduce HADOOP – ADVANTAGES AND DISADVANTAGES http://www.j2eebrain.com/java-J2ee-hadoop-advantages-and-disadvantages.html Google I/O: Hello Dataflow, Goodbye MapReduce http://www.informationweek.com/cloud/software-as-a-service/google-i-o-hello-dataflow-goodbye-mapreduce/d/d-id/1278917 GOOGLE CLOUD BIG DATA AND MACHINE LEARNING BLOG https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow http://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system/ Colossus: Successor to the Google File System (GFS) ...

Day 9 MOOCs

What Meaningful Careers Exist In Data Science? https://www.forbes.com/sites/quora/2017/03/31/what-meaningful-careers-exist-in-data-science/?utm_content=buffereeb38&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer#6ebe81e3d266 What are the best data science MOOCs? https://www.quora.com/What-are-the-best-data-science-MOOCs?ref=forbes&rel_pos=1

Day 8 Data Frames

invalid factor level, NA generated http://stackoverflow.com/questions/16819956/invalid-factor-level-na-generated Convert data.frame columns from factors to characters http://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters?noredirect=1&lq=1
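A minimal sketch of what the two questions above are about. The factor f and the data frame df are hypothetical; the stringsAsFactors = TRUE argument is there only to force a factor column for the demo.

```r
# Hypothetical factor with two levels ("high" and "low")
f <- factor(c("low", "high", "low"))

# Assigning a value that is not an existing level gives NA plus the
# "invalid factor level, NA generated" warning (suppressed here)
f_bad <- f
suppressWarnings(f_bad[2] <- "medium")
became_na <- is.na(f_bad[2])

# One fix: add the new level first, then assign
f_ok <- f
levels(f_ok) <- c(levels(f_ok), "medium")
f_ok[2] <- "medium"

# Another fix: convert the factor columns of a data frame to character
df <- data.frame(size = c("low", "high"), n = 1:2, stringsAsFactors = TRUE)
is_factor <- sapply(df, is.factor)
df[is_factor] <- lapply(df[is_factor], as.character)
```

After the conversion, df$size is a plain character column, so any new string can be assigned to it.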

Day 8 Include R Source files , Google Data Studio , R Data Frames , Completed Module 1

http://stackoverflow.com/questions/6456501/how-to-include-source-r-script-in-other-scripts
Missing Values (NA)
Sometimes values in a vector are missing and you have to show them using NA, which is a special value in R for "Not Available". For example, if you don't know the age restriction for some movies, you can use NA.
In [5]: age_restric <- c(14, 12, 10, NA, 18, NA)
age_restric
is.na(age_restric)
Out[5]: [1] 14 12 10 NA 18 NA
Out[5]: [1] FALSE FALSE FALSE  TRUE FALSE  TRUE
Google Data Studio https://www.google.com/analytics/data-studio/
How to delete multiple values from a vector? http://stackoverflow.com/questions/9665984/how-to-delete-multiple-values-from-a-vector
Completed Module 1 on Coursera. It has a brief introduction to Data Science, with more focus on getting the tools ready. Can't wait for Module 2 :P
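A short sketch building on the NA example above. The vector values are taken from the output shown in the note; the values in to_remove are an arbitrary choice for illustration.

```r
# Age restrictions, with NA where the value is not known
age_restric <- c(14, 12, 10, NA, 18, NA)

# is.na() marks the missing entries
missing <- is.na(age_restric)

# Keep only the known values
known <- age_restric[!missing]

# Most summary functions return NA unless told to skip missing values
mean_with_na <- mean(age_restric)
mean_known   <- mean(age_restric, na.rm = TRUE)

# Delete multiple values from a vector with %in%
to_remove <- c(12, 18)
kept <- known[!known %in% to_remove]
```

Here mean_with_na is NA, while mean_known is the mean of the four known ages.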

Day 8 Data Scientist resources

Seven Ways to Be More Curious https://www.psychologytoday.com/blog/finding-the-next-einstein/201407/seven-ways-be-more-curious
Curiosity: The One Superpower We Don't Use Enough, And How To Use It https://www.forbes.com/sites/lawtonursrey/2014/06/20/curiosity-the-one-superpower-we-dont-use-enough-and-how-to-use-it/#7c82cd95624f
10 Reasons Why You Should Be Curious http://www.marcandangel.com/2007/08/24/10-reasons-why-you-should-be-curious/
Tools for improving structured thinking (for analysts) https://www.analyticsvidhya.com/blog/2014/02/tools-structured-thinking/ https://www.analyticsvidhya.com/blog/2013/06/art-structured-thinking-analyzing/
Critical Thinking: Where to Begin http://www.criticalthinking.org/pages/critical-thinking-where-to-begin/796
How to Use Design Thinking Methods to Improve Your Nonprofit’s Strategy and Measurement http://www.bethkanter.org/design-thinking/
Int...

Day 7 The Elements of Data Analytic Style , Supervised vs Unsupervised Learning , Data

Supervised V Unsupervised Machine Learning -- What's The Difference? https://www.forbes.com/sites/bernardmarr/2017/03/16/supervised-v-unsupervised-machine-learning-whats-the-difference/2/#1f636ebc2080
The Elements of Data Analytic Style by Jeff Leek https://leanpub.com/datastyle
Darwin Tunes https://en.wikipedia.org/wiki/DarwinTunes
The home of the U.S. Government’s open data https://www.data.gov/
This Is How Much Data The Internet Gets Through In One Minute http://www.iflscience.com/technology/this-is-how-much-data-the-internet-gets-through-in-one-minute/
Big Data: Are you ready for blast-off? http://www.bbc.com/news/business-26383058

Day 7 Hands on with Git Repo and RStudio , R 101 BigDataUniversity

Removing a remote http://stackoverflow.com/questions/9224754/how-to-remove-origin-from-git-repository Kickstarting   R  - Writing R scripts https://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_scrpt.html Source on Save https://support.rstudio.com/hc/en-us/articles/200484448-Editing-and-Executing-Code Ctrl+L  — Clear the Console https://support.rstudio.com/hc/en-us/articles/200404846-Working-in-the-Console User Defined Functions in R http://www.statmethods.net/management/userfunctions.html Issue pushing new code in Github http://stackoverflow.com/questions/20939648/issue-pushing-new-code-in-github Git refusing to merge unrelated histories http://stackoverflow.com/questions/37937984/git-refusing-to-merge-unrelated-histories

Day 6 R packages , Bioconductor Packages, Loading Packages, Rtools , Most Active Github

Took one day off, feel slightly guilty about it. Continuing on Coursera, checking R packages: there are 10326 R packages, wow, astonishing.
Issue 1: Unable to update R packages in default library on Windows 7 http://stackoverflow.com/questions/5059692/unable-to-update-r-packages-in-default-library-on-windows-7
Issue 2: slidify is not available for R version 3.3.3 http://stackoverflow.com/questions/27445536/slidify-package-not-available-in-r-3-1-2 https://github.com/ramnathv/slidify/issues/405
Issue 3: Username parameter is deprecated. Please use ramnathv/slidify
Issue 4: there is no package called 'magrittr' ERROR: lazy loading failed for package 'slidify'
Similar for stringi package
Finally
* installing *source* package 'slidify' ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch -...

Day 5 IBM Watson , DataCamp Purchase, IT Pros Attack n RedMonk rankings

IBM Watson http://www.jenunderwood.com/2017/03/28/ibm-watson-cognitive-computing/?utm_content=bufferb4ded&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer
The DataCamp Intro to R is free but later courses are paid. I will try to complete Coursera first and get some hands-on practice, and then I can buy DataCamp courses as well.
https://www.tripwire.com/state-of-security/featured/90-pros-expect-attacks-risk-vulnerability-iiot-2017/
http://redmonk.com/sogrady/2017/03/17/language-rankings-1-17/

Day 4 Intermediate with R , LinkedIn Learning, Need to pass Quiz 1, Subscriptions

Apparently we need to get 80% in the quiz. I got 3/5 correct, which is 60%, so I need to retake the test for Coursera. I have started Intermediate with R on DataCamp, as I found the learning nice and concise. Beginning with R is quite a good course for getting beginners interested in learning R; I will surely recommend it. I have seen that LinkedIn Learning also has a lot of courses, even for Data Science, so I will need to try their free 1-month offer. At the same time I have subscribed to Data Science Central and kdnuggets for the latest news. I have even subscribed to some big data and machine learning channels on Google Plus and Facebook, just to keep up with current trends.

Day 4 Beginner R completed, Nice Example for Data Science , Blockchain Technology

Is It Better to Rent or Buy? https://www.nytimes.com/interactive/2014/upshot/buy-rent-calculator.html?_r=1 Came across this blockchain technology because of http://dataconomy.com/2017/03/blockchain-solution-healthcare/?utm_content=bufferfc3c0&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer What is Blockchain Technology? A Step-by-Step Guide For Beginners https://blockgeeks.com/guides/what-is-blockchain-technology/

Day 4 Data Science Central in One Picture , Some understanding on Data Analytics vs Data Science

Data Analytics gives one true and absolute result. The scenario that came to my mind: on which day do patients visit the hospital most? Data Analytics will give us the answer Friday, as per the figures, which is wrong from a Data Science point of view. Friday is the day they visit the hospital, but that might not be the day they get sick; it might be the day people feel like visiting the hospital, for various reasons internal (within the hospital boundaries) and external (outside the hospital boundaries). E.g. a general mindset of waiting till the weekend; or Friday might be the general trend because more doctors are on duty, whereas on Thursday doctors are more relaxed and tend to cater to fewer patients. Several ideas: parking, bachelor or married or children, and how each one influences the decision to visit the hospital. Collecting these stats requires hardware and software. Suppose the hospital does not have equipment to log entry and exit of doctors, even ward-wise (Expecting triggers at door ...

Day 3 Coursera,About Data Scientist , Interview Qs

Strata + Hadoop: new technologies related to big data and, finally, data science. Trying to collect all the data possible related to becoming a Data Scientist, and will hopefully do some analysis on that. Subscribing to kdnuggets seems like a real steal!! http://www.kdnuggets.com/2017/03/strata-hadoop-san-jose-key-takeaways.html http://www.kdnuggets.com/2017/03/data-science-data-scientist-do.html http://www.kdnuggets.com/2017/02/17-data-science-interview-questions-answers.html

Day 3 Factors in R , Data Frames , Subset

Should I use a data.frame or a matrix? http://stackoverflow.com/questions/5158790/should-i-use-a-data-frame-or-a-matrix I think I'm starting to get a grasp of the advantages of having a language like R.

Day 2 Data Deluge Starting Data Science Specialization in Coursera , Venn Diagram

An old article from back in 2010. I feel outdated, as the importance of data science was clearly stated way back then. http://www.economist.com/node/15579717 Also, the course on Coursera seems very promising, plus it gives you one week of trial; you cannot get a better deal than that. Big Data http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation Open in IE for flash and podcast. http://www.nytimes.com/2009/08/06/technology/06stats.html Drew Conway Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Day 2 R Vectors and Matrices

It feels exciting going back to vectors and matrices; it's the feeling of wishing there had been this explanation back then. But the question remains: were we not smart enough to relate, or were the educators not able to give a meaning to what they taught? The latter seems more probable. Today, even when we teach a kid some programming, we try to give him the big picture: how the code behaves or affects what he sees or uses in reality. This crucial element might change the whole outlook on any subject. Okay, I might be a little carried away to expect Data Science in the early 2000s, but the very fact that we were never given a real-world problem, or at least that the problem was hardly ever given a "Business" context, just made me feel very mute. Nevertheless, let's enjoy learning R and hopefully get hands-on with the Coursera Data Science course, which is spread over 6 months; that seems a minimal time to dwell on these aspects.

Day 2 Trying to get a certification related to Data Scientist

Checked some certifications; I was looking for something slightly longer. The Cloudera CCP Data Scientist certification seems to be retired, and they currently only have CCP: Data Engineer, but I am not sure if this is something I should start with right away. Found one on Coursera which seemed good for a Data Scientist. https://www.coursera.org/specializations/jhu-data-science?utm_source=gg&utm_medium=sem&campaignid=749038049&adgroupid=43242527030&device=c&keyword=certified%20data%20scientist&matchtype=e&network=g&devicemodel=&adpostion=1t1&creativeid=176695995025&hide_mobile_promo&gclid=CjwKEAjw8OLGBRCklJalqKHzjQ0SJACP4BHrOW8gIYlKMVtkh3CiayIzAJoNA9yZdbZ4Qehof2fbzhoCqxfw_wcB#creators

Day 2 Inquiry on Courses, Intro to R and Master's in Data Science

Checked certain universities online for Data Analysis, Data Science, Machine Learning or Computational Science courses. It isn't so widespread in the UAE; I sent some mails and inquired with some recent LinkedIn contacts. Check this link for a lot of good information on the path http://www.mastersindatascience.org/careers/data-scientist/ Now, on learning R: it seems cool, especially vectors and factors. Slowly I can group similar data (e.g. numbers), classify it on a pattern (e.g. days) and do some analysis on it to extract information. I have taken this course. https://campus.datacamp.com/courses/free-introduction-to-r
What Kind of Skills Will I Need?
Technical Skills
Math (e.g. linear algebra, calculus and probability)
Statistics (e.g. hypothesis testing and summary statistics)
Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
Software engineering skills (e.g. distributed computing, algo...

Day 1 Become a data scientist in 8 steps

1st Day Journey to Data Scientist

Can't believe it's been 4 years since my last act with words. Probably the simplest excuse is being busy; in reality it was a lack of content, or rather a lack of content to be content with. Nevertheless, I have come back to use my blogging platform as a note reminder / TODO list and so on. I have decided to learn what it takes to be a Data Scientist. Something in me urged me to push myself, and the developer world seems like a big dark hole that keeps getting darker and bigger: the Angular series, Django and CMS, Node.js and so on. The words "data science" gave a little glimpse of light and hope to a bewildered developer, along with rejuvenating words as plain as Mathematics and a nostalgia for going back to big books of study. So this is my first day, technically night. The first research I have done is to find a guideline to becoming one, and another guideline embedded in me: go to LinkedIn, search for Data Scientists and see their profiles. I match people closest ...

Eat, Pray and Love

This time I have just stolen the title from the movie itself. Two years since my last post; it seems I have totally lost my touch (whatever I had). Let's start recouping and try to build something. The other day I watched the movie Eat, Pray, Love. Generally I have this sixth sense of a movie sinking deep into my thoughts and resulting in something very meaningful. But this movie didn't; well, for most of the movie it did, but at one point it did not. We'll get to that. The movie talks about people who are still discovering themselves; well, it seemed more on purpose rather than being lost. The protagonist seemed to force herself to picture a life and make that happen. The concept of Eat, Pray, Love seems the perfect candy floss story. So let's jump to the Eat part (:P my favorite). Food is one of the most sought-after things, and why not. Most of my posts have been chanting around poverty and basic essence. But today we stay a little aloof and imagine that everything in the world is cl...