Posts

Day 24 Big Data on Hadoop Udemy | Hive

We got our image started in the virtual sandbox. We needed to disable Hyper-V, as Windows 10 doesn't allow VirtualBox and Hyper-V to run at the same time. We downloaded the dataset and uploaded it to Hive; the Ambari sandbox lets us use Hive directly from the URL. We can import data in CSV format and query it with SQL as if it were relational, although it is not stored relationally. We can also visualize the selected data columns. https://www.techrepublic.com/blog/10-things/10-things-you-should-know-about-hyper-v/ https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/about/ https://stackoverflow.com/questions/11183572/whats-the-difference-between-rank-and-dense-rank-functions-in-oracle Difference between RANK, DENSE_RANK and ROW_NUMBER: ROW_NUMBER always gives a unique number to every row. When the first 4 rows tie, RANK gives all 4 the same rank and rank 5 to the 5th row. DENSE_RANK also gives the first 4 the same rank but gives 2 to the 5th, with no gaps, like numbering the groups of a GROUP BY. Scrolling Page Title http://www....
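
To make the difference between the three ranking functions concrete, here is a small sketch in plain Python. It only mimics the numbering on an already-sorted list; the function names and the sample scores are made up for illustration and are not Hive or Oracle APIs.

```python
def row_number(sorted_values):
    """Every row gets its own number, ties or not."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """Tied rows share a rank; the next rank skips ahead by the tie size."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            ranks.append(ranks[-1])      # same value as previous row: same rank
        else:
            ranks.append(i + 1)          # position in the list, so gaps appear
    return ranks


def dense_rank(sorted_values):
    """Tied rows share a rank; the next rank is always previous + 1 (no gaps)."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i == 0:
            ranks.append(1)
        elif value == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


# Hypothetical data: the first 4 rows tie, the 5th is different.
scores = [90, 90, 90, 90, 80]
print(row_number(scores))  # [1, 2, 3, 4, 5]
print(rank(scores))        # [1, 1, 1, 1, 5]
print(dense_rank(scores))  # [1, 1, 1, 1, 2]
```
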
Recent posts

Day 23 Big Data on Hadoop Udemy | Arduino

So my wife has secured a job in Luxembourg and I need to move along with her. I looked at how I could maximise my chances. Machine Learning has really good opportunities, but I need time, probably at least 6 months. 3 things always crept up in my mind: Data Science, AI/ML and Big Data. I found Big Data to be interesting and more achievable within a 3-month target, as I'm familiar with most of the technologies and just needed to upgrade myself. I had a choice between the Microsoft Program in Big Data and Udemy Big Data. I chose Udemy as it's slightly lighter on schedule and more precise on content. The pricing is also very attractive; it should have been similar for Microsoft, but $1000 for a certificate that would teach similar skills just on the Microsoft stack seemed a little steep. So I've bought the Udemy - Ultimate Hands on Big Data. Another exciting thing for me was that the trainer is an ex-employee of Amazon and IMDB, and I actually quite like their UX and the way one could sear...

Day 22 Microsoft AI Math | Arduino

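Linear Equations: y=mx+b, where m is the slope and b is the y-intercept. System of Equations: x+y=10, 2x+5y=34. One of the easiest ways to solve it is to find the intersection point: for each equation, substitute 0 for x and then 0 for y to get two points, plot both lines, and read off where they cross. Another way is elimination: scale and combine the 2 equations so that x or y cancels out, solve for the remaining variable, and substitute back to get the point of intersection.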
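
As a quick check of the system above, here is a minimal sketch that solves two equations of the form a*x + b*y = c with exact fractions. The helper name solve_2x2 is made up for this note; it uses the determinant form, which is what elimination works out to for a 2x2 system.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        # Parallel or identical lines: no single intersection point.
        raise ValueError("the two lines do not intersect in exactly one point")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# x + y = 10 and 2x + 5y = 34
x, y = solve_2x2(1, 1, 10, 2, 5, 34)
print(x, y)  # 16/3 14/3
```

The intersection is not at whole numbers here (x = 16/3, y = 14/3), which is why plotting only gives an approximate answer while elimination gives the exact one.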

Day 21 Microsoft AI Math | Arduino

So, instead of continuing with the Python programming course, I jumped on to Math for ML, as that is more essential for getting into it. I have been programming in Python for a while, so it doesn't seem to be an issue, although I need to keep up with all the packages. Since I keep hand-typing the demos in the course myself, I start getting familiarized. This is a good point for a lot of developers: instead of copy-pasting or simply uploading the IPython notebook, type out almost everything yourself. The learning and the retention are higher. Copy-pasting lets you finish faster, but you may forget things because you never ran into any issue along the way; in my case I encounter so many setup issues that it helps me retain a lot of things naturally. So in the same light and learning spirit, I want to dive into maths and understand something; quite eager. Distributive Property

Day 21 Microsoft AI Python | Arduino

So now I have begun my new course, Python, in the 10 series. https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas DataFrame Iterating Note  − Do not try to modify any object while iterating. Iterating is meant for reading and the iterator returns a copy of the original object (a view), thus the changes will not reflect on the original object. https://stackoverflow.com/questions/23330654/update-a-dataframe-in-pandas-while-iterating-row-by-row XML Etree Element Tree https://docs.python.org/3/library/xml.etree.elementtree.html One thing I realized is how .NET had made XML a piece of cake. Especially after LINQ to XML, we took most things for granted; never did an XML fail to parse. In Python I had to remove the XML namespaces and then parse to get things working def removexmlnamespaces(xst):         it = ET.iterparse(StringIO(xst))     for index, el in it:         if '}...
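
The excerpt above is cut off, so the following is not the original function. It is a sketch of one common way to strip namespaces with ElementTree, under the assumption that only element tags (not attributes) carry a namespace. The name strip_namespaces and the sample XML are invented for illustration.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def strip_namespaces(xml_text):
    """Parse xml_text and drop the '{namespace}' prefix from every tag."""
    parser = ET.iterparse(StringIO(xml_text))
    for _event, element in parser:
        # ElementTree stores namespaced tags as '{uri}localname'.
        if "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]
    # The root element is available once the iterator is exhausted.
    return parser.root


# Hypothetical document with a default namespace.
sample = '<feed xmlns="http://example.com/ns"><entry>hello</entry></feed>'
root = strip_namespaces(sample)
print(root.tag)                 # feed
print(root.find("entry").text)  # hello
```

After this step, plain paths like find("entry") work without passing a namespace map.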

Day 20 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Data Science Virtual Machine Wow, seems a neat concept. Seems I cannot change HDD to SSD once I create the VM https://buildwindows.wordpress.com/2015/10/11/azure-virtual-machine-resizing-consideration/ https://azure.microsoft.com/en-us/blog/resize-virtual-machines/ Cognitive ToolKit - Examples of Deep Learning in python https://tutorials-sayadrameez.notebooks.azure.com/j/lab I have just copy pasted below content from the App Developer Resources Cognitive Services For an overview of the Cognitive Services APIs, take a look at  Getting Started with Microsoft Cognitive Services  on Channel 9. Then you can dive into the documentation for individual Cognitive Services APIs, which often includes tutorials to help you get started (note that this is not an exhaustive list!): Linguistic Analysis Text Analytics Bing Speech Translator Language Understanding Intelligent Services (LUIS) Custom Vision Computer Vision Face Video Indexer Building Bots For in-d...

Day 19 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Finally I think Visual Studio has a fix for Anaconda and Python environments. It seems to recognise the Anaconda default environment and can create more; also, now I can see the pip and conda packages separately. It makes more sense as it is less loaded. Installing av in python https://github.com/mikeboers/PyAV/issues/199 Git Clone Video Indexer API List Videos Find Video Basic Info by Video Id Upload Video Thumbnail View IPython Magic https://ipython.readthedocs.io/en/stable/interactive/magics.html The Video Indexer API allows you to view thumbnails of key images, and you can embed video edit from video ai inline Moving to BOTS Need to deal with NodeJs as I have downloaded the Bots API source code for the NodeJs environment. gyp ERR! stack Error: Can't find Python executable "python", you can set the PYTHON env variable. Bot Framework emulator https://github.com/Microsoft/BotFramework-Emulator/releases/tag/v4.2.1 You can use the bot to embed as iframe in ...

Day 18 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Image Classifier: classify an image by whether the tag matches. I couldn't get it to work exactly. Azure Computer Vision API: a very interesting API, probably the one I like most so far; it gives features about any picture and seemed fairly accurate. Most of the cognitive APIs accept a JSON string in the body instead of JSON data Face Detection APIs Can give certain attributes, match 2 faces Jupyter was formerly known as IPython Notebook, so the two names are used almost interchangeably https://code.visualstudio.com/docs/python/jupyter-support None of the interactive windows seem to work https://stackoverflow.com/questions/41664756/installing-numpy-for-windows-10-importing-the-multiarray-numpy-extension-module Failed to start the kernel XSRF cookie does not match POST argument OpenCV Python Library https://docs.opencv.org/3.4.0/d0/de3/tutorial_py_intro.html Finally got something working https://stackoverflow.com/questions/50185654/opencv-load-video-from-url for certain packages insta...

Day 17 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Uninstall packages https://pip.pypa.io/en/stable/reference/pip_uninstall/ What is the difference between a cumulative distribution and a normal distribution in a histogram? Gaussian Filter: a 3 by 3 mask is taken in which the center value is replaced by a weighted average of the neighbouring values; the mask is moved across the whole image to reduce noise by averaging out the pixels. sigma is the standard deviation of the Gaussian and controls how quickly the weights fall off from the center; the mask size is chosen separately. Median Filter Edge Detection Algorithm Sobel Edge Detection Algorithm: similar to the filters, it works with a mask to find the gradient along the x axis and the y axis Harris Corner Algorithm: didn't work for the median-filtered image Custom Vision AI Project https://customvision.ai/projects Selecting images, tagging them and training on them https://customvision.ai/projects/1fe8cc76-7ace-4829-92e7-f25940e8f011#/predictions
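
To see how a sliding 3 by 3 mask smooths noise, here is a small pure-Python sketch. Note the assumptions: it uses a plain (unweighted) mean instead of true Gaussian weights, it leaves the border pixels untouched, and the tiny image is made up. A real project would more likely use OpenCV or SciPy.

```python
from statistics import median


def filter_3x3(image, reducer):
    """Replace each interior pixel by reducer(its 3x3 neighbourhood)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # copy, so border pixels stay as-is
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            result[y][x] = reducer(window)
    return result


def mean(window):
    return sum(window) / len(window)


# Hypothetical image: flat background of 10 with one noisy pixel of 100.
image = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
print(filter_3x3(image, mean)[1][1])    # 20.0 (noise is spread out)
print(filter_3x3(image, median)[1][1])  # 10   (noise is removed)
```

The mean filter only dilutes the outlier, while the median filter discards it, which is why median filtering is often preferred for salt-and-pepper noise.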

Day 16 - Microsoft Artificial Intelligence | Arduino Robotic Kit

For anaconda packages , use Anaconda navigator which identifies correctly the missing dependencies. https://stackoverflow.com/questions/51992375/python-package-installation-issues-pyaudio-portaudio IPython doesn't work correctly need to move to jupyter_client https://github.com/Microsoft/PTVS/issues/3220 https://github.com/Microsoft/PTVS/issues/2722 Profiling in VIS for python https://docs.microsoft.com/en-us/visualstudio/python/profiling-python-code-in-visual-studio?view=vs-2017 Installing Package in Notebook https://stackoverflow.com/questions/38368318/installing- a-pip-package-from-within-a-jupyter-notebook-not-working https://stackoverflow.com/questions/7225900/how-to-install-packages-using-pip-according-to-the-requirements-txt-file-from-a?rq=1 Speech Translator Services and Language Understanding Intelligence Services LUIS https://www.luis.ai/applications/5eb952f8-d9ff-441d-8677-f1f655141569/versions/0.1/build/intents Web Chat BOT https://docs.microsof...

Day 15 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Azure Ecosystem https://docs.microsoft.com/en-us/azure/#pivot=get-started Azure Application Insights Just add a tag to client side to https://docs.microsoft.com/en-us/azure/application-insights/app-insights-dotnetcore-quick-start Feature Hashing https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-hashing Hashes each word or phrase in the text to an integer that indexes a feature column, so that text can be represented as numeric features. It created 32k columns, hence it takes time to open the next dialog box. Cannot apply SQL Transformation; need to use the Select Columns filter for publishing the web service Sentiment Analysis Excel online https://onedrive.live.com/edit.aspx?resid=E96BE125F9F308EB%21116&nd=1&app=Excel Icon Sets- something new in excel as well Azure Cognitive Services for Text Analytics https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c7 Error in urllib.request or conn.request...

Day 14 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Accessing the index https://stackoverflow.com/questions/522563/accessing-the-index-in-for-loops Ternary Operator https://stackoverflow.com/questions/394809/does-python-have-a-ternary-conditional-operator Python: Dynamic Programming Language, Focused on Readability, Interpreted Language vs Compiled Language, Multiparadigm: Structured, Object-Oriented and Functional https://code.visualstudio.com/docs/python/tutorial-deploy-containers NLTK Download and punkt https://stackoverflow.com/questions/26570944/resource-utokenizers-punkt-english-pickle-not-found What are these pickle files C:\Users\rameezahmed.sayad\AppData\Roaming\nltk_data\tokenizers\punkt\PY3 Pickling and Unpickling, pickle files https://www.pythoncentral.io/how-to-pickle-unpickle-tutorial/ Pandas Dataframe https://pandas.pydata.org/pandas-docs/stable/api.html Document Word Frequency and Inverse Document Frequency Term Frequency or Word Frequency - is the frequency of a word in a document against tot...

Day 13 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Using Visual Studio for Python -- back to debugging mode, feels amazing. So finally, after managing to get 4 18650 batteries, 2 of them are discharged, and now I'm not sure how to recharge them. Good Resources for AI https://blog.goodaudience.com/learn-ai-for-free-5b186cde3990 Python Math Operators https://www.digitalocean.com/community/tutorials/how-to-do-math-in-python-3-with-operators Reading an input from the command line with Python https://stackoverflow.com/questions/70797/user-input-and-command-line-arguments Convert string to int in Python https://guide.freecodecamp.org/python/how-to-convert-strings-into-integers-in-python/ https://www.programiz.com/python-programming/if-elif-else

Day 12 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Learning about electricity fundamentals. So we need 7V to 12V to power the kit in battery mode. We got a 9V battery just to see if it works, but it seems like the current might not be enough. https://www.teachmemicro.com/use-l298n-motor-driver/ http://forum.arduino.cc/index.php?topic=325357.0 https://forum.arduino.cc/index.php?topic=158582.0 https://arduino.stackexchange.com/questions/30595/lipo-batteries-with-l298n-controller https://www.robotshop.com/community/tutorials?mode=list&sort=newest&page=1 So it seems final: the current provided by the 9V battery is too low, and the motors need more amps. https://learn.sparkfun.com/tutorials/how-to-power-a-project  https://www.open-electronics.org/the-power-of-arduino-this-unknown/ 18650 Battery https://www.makeuseof.com/tag/difference-32-bit-64-bit-windows/ Now back to Artificial Intelligence USING CURL to DOWNLOAD files http://www.compciv.org/recipes/cli/downloading-with-curl/ For opening Notebook https://azureml...

Day 11 Microsoft Artificial Intelligence | Arduino Robotic Kit

Running motor using USB https://forum.arduino.cc/index.php?topic=464421.0 http://forum.arduino.cc/index.php?topic=210239.0 Voila , Motor, LED , Ultrasonic and Servo in action Only 1 motor can run from USB , both the motors cannot run simultaneously from USB voltage. 2 motors can be run or the 5v pins can be run at once.

Day 10 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Excel Online https://onedrive.live.com/?id=E96BE125F9F308EB%21105&cid=E96BE125F9F308EB Try Except Python https://www.pythonforbeginners.com/error-handling/python-try-and-except https://docs.python.org/3/tutorial/errors.html Azure Notebooks https://notebooks.azure.com/sayadrameez/projects/azureml Urllib2 to urllib https://stackoverflow.com/questions/37042152/python-3-5-1-urllib-has-no-attribute-request Power Arduino using AC power Vin pin can also be used to power Arduino Servo Motor is for rotating using Arduino https://www.arduino.cc/en/Reference/Servo Arduino Shield is used to power and control motor. It has default pins which are used for sensing current and controlling the speed and direction of the motor.

Day 9 - Microsoft Artificial Intelligence | Arduino Robotic Kit

Motor Shield for Arduino: a motor shield is required for Bluetooth, GPS, GSM and Servo Motor in order to supply a stable and controlled current. If connected directly to the Arduino board, the components may start drawing a large current and damage the board. First Experiment with Arduino - LED Test. The RGB LED has 4 pins: 1 positive for the 5 V connection and 1 each for red, green and blue. Pass LOW and HIGH accordingly from the Arduino to Red, Green and Blue.

Day 8 - Microsoft AI | Arduino Robotic Kit

Arduino Robotic Kit https://photos.app.goo.gl/Ng6MmaRnxHHWUHpJ6 Just got my Arduino robotic kit, quite excited :-). Let us start exploring the various components. HC-SR04 Ultrasonic Ranging Module https://www.mouser.com/ds/2/813/HCSR04-1022824.pdf The HC-SR04 provides 2cm - 400cm non-contact measurement, and the ranging accuracy can reach 3mm. Pins: Vcc, Trigger, Echo and Ground. You only need to supply a short 10uS pulse to the trigger input to start the ranging, and then the module will send out an 8 cycle burst of ultrasound at 40 kHz and raise its echo. The width of the echo pulse is proportional to the distance to the object. https://randomnerdtutorials.com/complete-guide-for-ultrasonic-sensor-hc-sr04/ https://www.arduino.cc/en/Reference/Libraries https://www.youtube.com/watch?v=6F1B_N6LuKw
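
The relation between echo pulse width and distance can be sketched as follows. This is only the arithmetic, written in Python rather than Arduino C, and it assumes a speed of sound of about 343 m/s (roughly room temperature). The function names and the constant are my own for illustration.

```python
# Assumption: sound travels ~343 m/s in air, i.e. 0.0343 cm per microsecond.
SPEED_OF_SOUND_CM_PER_US = 0.0343


def echo_to_distance_cm(echo_pulse_us):
    """Convert the echo pulse width (microseconds) to distance (cm).

    The pulse covers the trip to the object and back, so divide by 2.
    """
    return echo_pulse_us * SPEED_OF_SOUND_CM_PER_US / 2


def in_sensor_range(distance_cm):
    """The module is specified for 2 cm to 400 cm."""
    return 2 <= distance_cm <= 400


distance = echo_to_distance_cm(1000)  # a 1000 us echo pulse
print(round(distance, 2), in_sensor_range(distance))  # 17.15 True
```
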

Day 7 - Microsoft Professional Program for Artificial Intelligence track

2 class logistic regression https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-logistic-regression used to predict only 2 outcomes For  Label column , click  Launch column selector , and choose a single column that contains outcomes the model can use for training. For classification problems, the label column must contain either  categorical  values or  discrete  values. Some examples might be a yes/no rating, a disease classification code or name, or an income group. If you pick a noncategorical column, the module will return an error during training. For regression problems, the label column must contain  numeric  data that represents the response variable. Ideally the numeric data represents a continuous scale for creating web service , all steps need to be run

Day 6 - Microsoft Professional Program for Artificial Intelligence track

Scatter Plot Ready Module in AZMLS For removing an outlier, we can use SQL transformation to update any row or values Why use matplotlib agg https://matplotlib.org/faq/howto_faq.html Setting outliers to a threshold: by using Clip Values, upper and lower thresholds can be set. https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clip-values Convert to Indicator Values to add a column IsCategoricalValue 0 or 1 based on the no of values in the selected feature Supervised Learning Nearest neighbour, Naïve Bayes, Decision Trees, Regression  UnSupervised Learning K- means Clustering Algorithm ReInforcement Learning The machine/ software agent trains itself on a continual basis based on the environment it is exposed to, and applies its enriched knowledge to solve business problems.  Markov Decision Process “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its perfo...

Day 5 - Microsoft Professional Program for Artificial Intelligence track

PCA vs Mean - If probabilistic PCA is used against a single column, it is equivalent to the mean, as it will use only that single column. MICE - Multiple Imputation by Chained Equations Normal Distribution - Bell Curve Seaborn https://seaborn.pydata.org/generated/seaborn.FacetGrid.html Indentation in Python http://www.peachpit.com/articles/article.aspx?p=1312792&seqNum=3 https://social.msdn.microsoft.com/Forums/azure/en-US/596f2988-49ff-4750-a00d-b24713543276/allow-seaborn-in-python-module?forum=MachineLearning Anaconda does not have Seaborn yet Normalizing Data - there are 5 techniques, even log, used mainly so that large values in the data set do not dominate the prediction Z SCORE https://t4tutorials.com/z-score-normalization-data-mining/ scales using the mean and standard deviation MIN MAX https://t4tutorials.com/min-max-normalization-of-data-in-data-mining/ scales to between 0 and 1
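
The two normalizations mentioned above can be sketched with only the standard library. The function names and sample values are made up, and the z-score here uses the population standard deviation, which is an assumption (some tools use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale so the result has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; min-max is undefined")
    return [(v - low) / (high - low) for v in values]


print(min_max([10, 20, 30]))              # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```
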

Day 4 - Microsoft Professional Program for Artificial Intelligence track

Decision Tree Algorithm: assigning a label to the problem instance, e.g. when classifying gender, male and female become the labels. Single Decision Tree - a single tree. Ensemble Decision Tree - multiple trees combined. Homogeneous Sets - similar data or similar structure Heterogeneous Sets - mixed or dissimilar data Finding homogeneity - Information Gain, Gini Impurity Beautiful Soup: Python package for parsing HTML. Pandas: Python package for reading HTML into a dataframe; a data analysis package. Pandas - Categorical function for assigning integers to alphanumeric data. sklearn decision tree classifier Building own ML - visually through ML Studio, or code first through Python notebooks. Consuming built-in ML models - Azure Cognitive Services and bots
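
Here is a small sketch of the two homogeneity measures named above, using their standard textbook definitions. The function names and the toy labels are made up for illustration; sklearn computes these internally when it builds a tree.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 0 for a pure set, higher for a more mixed set."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure set."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


labels = ["male", "male", "female", "female"]
print(gini(labels))     # 0.5
print(entropy(labels))  # 1.0
# A split that separates the two labels perfectly gains the full 1 bit.
print(information_gain(labels, [["male", "male"], ["female", "female"]]))  # 1.0
```
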

Day 3 - Microsoft Professional Program for Artificial Intelligence track

AI Cheat Sheet https://startupsventurecapital.com/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5 Jargon for ML: Training Data - what raw data becomes after it is processed. Columns in the training data are called Features. Regression - Supervised Learning - the line that best fits, with the least error. Classification - Supervised Learning - classify into classes. Clustering - Unsupervised Learning - create clusters. Decision Tree, Neural Network, Bayesian Algorithm, K-means Clustering. Data scientists who have the domain knowledge can choose the features more appropriately. Feature Engineering - includes cleaning, normalizing, rebalancing, grouping and filtering raw data

Day 2 - Microsoft Professional Program for Artificial Intelligence track

Back to basics: learning Binary Search Tree and algorithms for Binary Tree https://leetcode.com/submissions/detail/193276741/ Big O notation: O(1) - a constant number of computations for the operation, regardless of input size. O(n) - computation time grows in proportion to the size of the array. O(n square) - computation with nested for-each loops. O(log (n)) - divide and conquer algorithms. O(n log(n)). Insertion Sort - iterate over each element, check if it is lower than any of the elements to its left, insert it in between and move the greater elements along; O(n square), best O(n). Bubble Sort - keep comparing adjacent elements and swap; O(n square), best O(n) if there is no swap in the first iteration. Selection Sort - assume the first is the smallest and find the actual smallest for the first position, then move on to each next position and check if anything is smaller; O(n square), best O(n square). Heap Sort - elements are stored in a binary tree. root node will be greater than child node.swap root and child , finally swap the last and first node...
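
To make the best-case remarks concrete, here is a sketch of insertion sort and bubble sort in Python. The pass counter in bubble_sort is added only to show the early exit on sorted input; the function names are my own.

```python
def insertion_sort(items):
    """Insert each element into the already-sorted part to its left."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift greater elements one step right to open a slot.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def bubble_sort(items):
    """Swap adjacent out-of-order pairs until a full pass makes no swap.

    Returns the sorted list and the number of passes made.
    """
    result = list(items)
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(result) - 1):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
    return result, passes


print(insertion_sort([5, 2, 4, 1]))  # [1, 2, 4, 5]
print(bubble_sort([1, 2, 3]))        # ([1, 2, 3], 1) -> one pass, the O(n) best case
print(bubble_sort([3, 1, 2]))        # ([1, 2, 3], 2)
```
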

Day 1 - Microsoft Professional Program for Artificial Intelligence track

Based on my friend's request, I have begun to write more on it. So it's been a long time since I left Data Science in limbo, but honestly I did struggle a lot to cope with it. Plus, because of timeframes, I scrambled through the initial Data Science models, but I couldn't gather a lot except R programming and the concepts of Data Science, ML and AI. Data Science seems a tough hill for me; by tough hill I mean way more time and effort. I got busy with work and then somehow it's been pending. Surprisingly, I didn't come across the Microsoft Certification Program, which should be the de facto choice for a .NET developer who has spent more than 8 years professionally and 19 years with Microsoft and its ecosystem. So I began looking at the Microsoft Artificial Intelligence course and it seems very exciting, although I remember from my stint with GCP (Google Cloud Platform) that it was a big thing doing some things I did very naturally with SQL or Excel. https://www....

Day 1 - Azure Learning - Docker

Docker Containers - switching https://docs.docker.com/docker-for-windows/#diagnose--feedback "docker pull" fails in windows 10 https://github.com/docker/for-win/issues/1100 [Windows] Cannot connect to plugin server -- wsarecv: An existing connection was forcibly closed by the remote host.  https://github.com/docker/machine/issues/2318 Disable Docker Support https://stackoverflow.com/questions/45843842/how-to-temporarily-disable-docker-support-from-asp-net-core-2-0-project-in-visua?rq=1 Remove Docker from Visual Studio project https://stackoverflow.com/questions/41642363/how-to-remove-docker-support-from-an-asp-net-core-project Turn Off Docker Support https://stackoverflow.com/questions/50182952/how-to-turn-docker-support-on-off-for-net-core-applications

Common Errors - NPM , Nodemon

Can't seem to get Node js and NPM up and running: Reference Error on every command https://stackoverflow.com/questions/37513487/cant-seem-to-get-node-js-and-npm-up-and-running-reference-error-on-every-comma 'npm' is not recognized as internal or external command, operable program or batch file https://stackoverflow.com/questions/20992723/npm-is-not-recognized-as-internal-or-external-command-operable-program-or-bat https://github.com/remy/nodemon#nodemon Node / Express: EADDRINUSE, Address already in use - Kill server https://stackoverflow.com/questions/4075287/node-express-eaddrinuse-address-already-in-use-kill-server

Visual Studio Node JS - Initial Setup

Let's keep it precise and in small steps.
1. You need Visual Studio 2017 Community or Professional.
2. Select the NodeJs development package.
3. You need the NodeJs runtime installed from https://nodejs.org/en/download/
4. Start a new project from VS 2017, TypeScript->NodeJs.
5. You might get an error: http or project not found.
6. Check the references of the TypeScript project.
7. See if there is a warning icon indicating that the node reference is missing.
8. If so, right click and click Install from npm.
9. If you receive an error message, just follow the instructions.
10. Specify the node.exe path in the project properties: C:\node-v8.9.4-win-x64\node.exe
11. Build and voila, let's start.

Day 25 R Programming

https://stackoverflow.com/questions/18222286/dynamically-select-data-frame-columns-using-and-a-vector-of-column-names https://stackoverflow.com/questions/12614953/how-to-create-a-numeric-vector-of-zero-length-in-r https://stackoverflow.com/questions/7355187/error-in-if-while-condition-missing-value-where-true-false-needed http://www.dummies.com/programming/r/how-to-add-observations-to-a-data-frame-in-r/ https://stackoverflow.com/questions/11561856/add-new-row-to-dataframe-at-specific-row-index-not-appended https://stackoverflow.com/questions/22235809/append-value-to-empty-vector-in-r

Day 24 (quite a break) Back to Git Issues

So after 3 weeks of my sister-in-law's marital rituals and back at the office loaded with tons of work, I finally get some time to get back. And only to realise that it's been more than a month now and I'm already beginning to forget. First day and git issues start. Apparently, before leaving I had pulled the working code and downloaded RStudio and R so that I could probably learn something on my vacation (high hopes); didn't touch a thing. Now that I come back, I realize that there were certain changes I had made on this machine and had even committed from RStudio, but they don't show up in my GitHub repo or on the other machine when I pull. Something was wrong. Figured out it was because of the origins: I didn't add .git to the end of the Repo while adding remote origin and that indeed caused some issues , surprisingly RStudio didn't error out and even when I check it's history it shows me , So finally recreated the origin , pull and pushed the code thankfully so ...

Day 23 Git Crashes

** Please tell me who you are. Run   git config --global user.email "you@example.com"   git config --global user.name "Your Name" to set your account's default identity. Omit --global to set the identity only in this repository. fatal: empty ident name (for <(NULL)>) not allowed Git - fatal: Unable to create '/path/my_project/.git/index.lock': File exists. http://stackoverflow.com/questions/7860751/git-fatal-unable-to-create-path-my-project-git-index-lock-file-exists

Day 23 R Objective Functions. Plotting , Date Time

Objective functions could be imagined as essence of a constructor in a function . this gives the ability to reuse or declare the parameters to the function and keep it instantiated instead of passing the parameters each time. Learned a little bit of plotting and the different types of plot , probably the easiest of the visualizations. Mode, Class and Type of R objects https://stats.stackexchange.com/questions/3212/mode-class-and-type-of-r-objects https://cran.r-project.org/doc/manuals/r-patched/R-intro.html#Object-orientation t2, like all POSIXlt objects, is just a list of values that make up the date and | time. Use str(unclass(t2)) to have a more compact view. >  > str(unclass(t2)) List of 11  $ sec   : num 18.5  $ min   : int 32  $ hour  : int 16  $ mday  : int 23  $ mon   : int 3  $ year  : int 117  $ wday  : int 0  $ yday  : int 112  $ isdst :...

Day 22 R Coursera

Day 21 R Continuing in Coursera

So I'm just done with GCP on Coursera , a brief introduction to the set of tools provided by the Google Cloud Platform and practical hands on lab on certain things to give the realization how easy it is to get things started up. Partial matching of function argument http://stackoverflow.com/questions/14153904/partial-matching-of-function-argument The three-dots construct in R http://www.burns-stat.com/the-three-dots-construct-in-r/

Day 20 Google APIs, Google Application Default Credentials

Searching for objects by attribute value: it has to be Datastore. Remember that with BigTable, you can only search by key.  High-throughput writes of wide-column data. Well, that is BigTable , right, because it's supporting high-throughput writes.  Warehousing structured data. So what's the data warehouse technology on Google Cloud? That's, which one, BigQuery .  To create and test new machine learning methods. Well, if you're writing new machine learning methods, then TensorFlow .  Develop Big Data algorithms interactively in Python. Well, interactive development in Python is done best with Datalab .  No-ops, custom machine learning applications at scale. No-ops ML at scale, then that's a role for Cloud ML.  Automatically reject inappropriate image content. Rejecting image content where it is inappropriate. Well, that could be the Vision API. So you could use a Vision API to basically see if this is safe content or not safe...

Day 19 TensorFlow

http://www.kdnuggets.com/2015/11/google-tensorflow-deep-learning-disappoints.html http://www.businessinsider.com/what-is-google-tensorflow-2015-11 http://playground.tensorflow.org/#activation=linear&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.98991&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false Feature Engineering https://datasciencedojo.com/data-wrangling-in-r/ https://en.wikipedia.org/wiki/Feature_engineering For Machine Learning in TensorFlow, for training the model , the data has to be numeric ( also not categorical variable - not holdeing weightage for that value but just a representation or classification like day of the...

Day 18 GCP DataLab, Big Query from Client Side , Pandas- Python

'sudo su -' vs 'sudo -i' vs 'sudo /bin/bash' - when does it matter which is used, or does it matter at all? https://askubuntu.com/questions/376199/sudo-su-vs-sudo-i-vs-sudo-bin-bash-when-does-it-matter-which-is-used docker ps  will show only running containers by default. To see all containers:  docker ps -a https://docs.docker.com/v1.11/engine/reference/commandline/ps/ https://8081-dot-2337103-dot-devshell.appspot.com/tree/datalab root1234 - paraphrase DataLab gives the ability to share a notebook with other people , at the same time use the cloud for computing n storage. https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-usage-python https://github.com/google/google-api-javascript-client http://stackoverflow.com/questions/12479895/obtaining-bigquery-data-from-javascript-code Python Data Analysis Library http://pandas.pydata.org/

Day 17 Periodic Data Science

Becoming a Data Scientist:  Profiling Cisco’s Data Science  Certification Program http://blog.kaggle.com/2017/03/02/becoming-a-data-scientist-profiling-ciscos-data-science-certification-program/?utm_source=Mailing+list&utm_campaign=8ed002c926-Kaggle_Newsletter_04-11-2017&utm_medium=email&utm_term=0_f42f9df1e1-8ed002c926-402242277 Wow, found this on R-Bloggers, quite awesome. I have certain blocks from each legend but still a long way to go.

Day 16 DataProc , Google Cloud Solutions

The Beginner’s Guide to Nano, the Linux Command-Line Text Editor https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/ So far, from the Google Cloud Platform course, I gathered that the main advantage of GCP holds regardless of the technology, whether SQL DB or NoSQL DB, Spark or Hadoop: GCP lets you run your pieces of programs for just a specific amount of time, unlike other scenarios where one would end up buying a lot of hardware or software in order to perform these heavy processing tasks. Microservices Architecture on Google App Engine https://cloud.google.com/appengine/docs/standard/python/microservices-on-app-engine

Day 15 GCP CloudSql Lab, Hadoop

bash ./find_my_ip.sh cd training-data-analyst/CPB100/lab3a Note : If you lose your Cloud Shell VM due to inactivity, you will have to reauthorize your new Cloud Shell VM with Cloud SQL. For your convenience, lab3a includes a script called  authorize_cloudshell.sh  that you can run. https://cloud.google.com/certification/data-engineer https://cloud.google.com/sql/docs/mysql/connect-compute-engine https://cloud.google.com/solutions/mysql-remote-access You have to make sure the console IP is specified in the Authorized Networks PySpark Cheat Sheet http://www.datasciencecentral.com/profiles/blogs/pyspark-cheat-sheet-spark-in-python https://www.datacamp.com/community/blog/pyspark-cheat-sheet-python#gs.0vl89jQ DataProc is Google-managed Hadoop, Spark, Pig and Hive The Hadoop Ecosystem Table https://hadoopecosystemtable.github.io/ HBase, Sqoop, Flume and More: Apache Hadoop Defined http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined h...

Day 15 GCP Recommendations , Cloud SQL PySpark DataProc

Collaborative Filtering - RDD-based API https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html PySpark https://spark.apache.org/docs/0.9.0/python-programming-guide.html Managed Hadoop & Spark https://cloud.google.com/dataproc/ Fully-Managed PostgreSQL  BETA  & MySQL https://cloud.google.com/sql/ Cloud sql can run 1 petabit per second

Day 14 Swirl

install.packages("swirl") https://github.com/swirldev/swirl_courses#swirl-courses https://en.wikipedia.org/wiki/YAML http://yaml.org/ | You can exit swirl and return to the R prompt (>) at any time by pressing the Esc key. If you are | already at the prompt, type bye() to exit and save your progress. When you exit properly, you'll see a | short message letting you know you've done so. | When you are at the R prompt (>): | -- Typing skip() allows you to skip the current question. | -- Typing play() lets you experiment with R on your own; swirl will ignore what you do... | -- UNTIL you type nxt() which will regain swirl's attention. | -- Typing bye() causes swirl to exit. Your progress will be saved. | -- Typing main() returns you to swirl's main menu. | -- Typing info() displays these options again. | Let's get started! For the square root, use the sqrt() function, and to take the absolute value, use the abs() function Vectors of unequal length Arithmetic Op...