
Aspiring Data Scientist? You'll Need Some Math!

Posted by Dave Langer on Apr 18, 2017 1:19:18 PM

[av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' scroll_down='' id='' color='main_color' custom_bg='' src='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' video_mobile_disabled='' overlay_enable='' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']
[av_heading heading='Aspiring Data Scientist? You’ll Need Some Math!' tag='h1' style='blockquote modern-quote modern-centered' size='' subheading_active='' subheading_size='15' padding='10' color='' custom_font=''][/av_heading]
[/av_section]

[av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' scroll_down='' id='' color='main_color' custom_bg='' src='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' video_mobile_disabled='' overlay_enable='' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']

[av_two_third first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_heading tag='h3' padding='10' heading='Why So Much Math?' color='' style='' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]

[av_textblock size='' font_color='' color='']
At the end of each of our bootcamps, we ask our students for feedback on their experience. In particular, we ask for honest assessments and opinions on how we can improve. We take this very seriously at Data Science Dojo, and I can list a number of changes we've made as a direct result of student feedback. Given that our students come from a broad spectrum of backgrounds, it is not surprising that we invariably receive feedback that distills down to, "Why so much math?"
[/av_textblock]

[av_heading tag='h3' padding='10' heading='Math and Programming - the Tools of the Data Scientist' color='' style='' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]

[av_textblock size='' font_color='' color='']
It is my firm conviction that you do not need a PhD in Statistics/Computer Science/Machine Learning/Whatever to become a Data Scientist. It is equally my conviction that Data Science ultimately boils down to two things: Mathematics and Programming. Accordingly, our bootcamp curriculum is designed to provide the required foundation in both mathematical concepts/theory and programming for Data Science.

As you might imagine, our students very rarely give feedback along the lines of, "Why so much programming?" Some comment that there was more programming than they expected, but the need for a Data Scientist to have coding skills is rarely questioned. Not so for mathematics. This is unfortunate, as I would strenuously argue that without some mathematical knowledge a Data Scientist will not be able to build effective models.
[/av_textblock]

[/av_two_third][av_one_third min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/AspiringDataScientistMath.png' attachment='32452' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]

[av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']


[/av_one_third][av_one_full first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_heading tag='h3' padding='10' heading='A Hypothetical Example' color='' style='' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]

[av_textblock size='' font_color='' color='']
Here's a hypothetical example to illustrate my point. An aspiring Data Scientist researches a particular problem and finds a blog post, a paper, and/or a forum post recommending a regression model trained with Stochastic Gradient Descent for that problem space. The following screenshot is an excerpt from the documentation of Python's excellent scikit-learn library.

NOTE: Rest assured that similar R examples exist as well (e.g., the awesome glmnet package). I use scikit-learn here only because its HTML documentation is more visually attractive ;-).

SGDRegressor API

The green boxes above highlight some of the mathematical knowledge required to use this algorithm to build the most effective model. For example:

  1. The Stochastic Gradient Descent algorithm: what is it, and how does it work?
  2. Regularization: what is it, and how does it work?
  3. The differences between L1 and L2 regularization: why might a Data Scientist want one vs. the other, or a blend of both?
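To make the three items above a little more concrete, here is a minimal NumPy sketch of linear regression trained with plain stochastic gradient descent plus an elastic-net penalty. It is not scikit-learn's implementation: the helper name `sgd_elastic_net`, the constant learning rate, and the synthetic data are illustrative assumptions, while the default values for `alpha`, `l1_ratio` and `eta0` are borrowed from `SGDRegressor`'s documented defaults. The penalty is written as `alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)`, intended to match the elastic-net form in scikit-learn's SGD documentation, so `l1_ratio=0` is pure L2 and `l1_ratio=1` is pure L1.

```python
import numpy as np


def sgd_elastic_net(X, y, alpha=1e-4, l1_ratio=0.15, eta0=0.01, n_epochs=50, seed=0):
    """Hypothetical helper: linear regression trained with plain SGD.

    Objective (per sample i), mirroring the elastic-net form in scikit-learn's docs:
        0.5 * (y_i - (x_i @ w + b))**2
        + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2**2)
    l1_ratio=0 is pure L2 (ridge-style), l1_ratio=1 is pure L1 (lasso-style).
    Assumption: a constant learning rate eta0, unlike scikit-learn's default schedule.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        # "Stochastic": visit the samples in a fresh random order each epoch
        # and update the parameters after every single sample.
        for i in rng.permutation(n_samples):
            residual = (X[i] @ w + b) - y[i]          # prediction error for sample i
            grad_w = residual * X[i]                  # gradient of the squared-error term
            # Penalty gradient: the L2 part is (1 - l1_ratio) * w; the L1 part uses the
            # subgradient sign(w), since |w| is not differentiable at zero.
            grad_w += alpha * ((1 - l1_ratio) * w + l1_ratio * np.sign(w))
            w -= eta0 * grad_w
            b -= eta0 * residual                      # the intercept is not penalized
    return w, b


# Synthetic data (hypothetical): two of the five features are irrelevant.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + 0.5 + rng.normal(scale=0.1, size=500)

w, b = sgd_elastic_net(X, y)
print("weights:", np.round(w, 2), "intercept:", round(b, 2))
```

With the small default `alpha`, the fitted weights should land close to `true_w`. Raising `alpha` and moving `l1_ratio` toward 1 pulls the irrelevant weights toward zero; because this sketch uses a plain subgradient for the L1 term, they will typically hover near zero rather than land exactly on it.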

I believe this relatively simple example illustrates my point about math and programming. Specifically, it shows that without the required math knowledge, a Data Scientist has little hope of writing code that trains the most effective model in any reasonable way.
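As a rough illustration of how those math questions turn into concrete coding decisions, the sketch below fits scikit-learn's `SGDRegressor` three times on the same data, changing only `penalty` (and `l1_ratio` for the elastic net). The synthetic data and the choice `alpha=0.1` are hypothetical, picked to make the effect visible; the features are drawn already standardized because SGD is sensitive to feature scaling. One would typically expect the L1 fit to set the coefficients of the irrelevant features to exactly zero, while the L2 fit merely shrinks them. Understanding why is precisely the kind of knowledge the highlighted items call for.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical synthetic data: 10 standardized features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=1000)

# Same algorithm (SGD) and loss each time; only the regularization term changes.
models = {
    "l2": SGDRegressor(penalty="l2", alpha=0.1, random_state=0),
    "l1": SGDRegressor(penalty="l1", alpha=0.1, random_state=0),
    "elasticnet": SGDRegressor(penalty="elasticnet", alpha=0.1, l1_ratio=0.5, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))  # coefficients driven exactly to zero
    print(f"{name:>10}: coef={np.round(model.coef_, 2)}  exact zeros={n_zero}")
```

If the runs behave as expected, the L1 model reports exact zeros for the seven irrelevant features, the L2 model reports small nonzero values instead, and the elastic net blends the two behaviors according to `l1_ratio`.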

For these reasons, our students learn every highlighted item above as part of our curriculum's coverage of regression. We also teach them the mathematics and theory behind other important topics like decision trees, boosting, and recommender systems. It is also why I advise the aspiring Data Scientists I mentor that, eventually, they will need to dust off their math textbooks.

Until next time, happy data sleuthing!
[/av_textblock]

[/av_one_full][av_hr class='invisible' height='30' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808' font='entypo-fontello']

[av_one_full first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_comments_list]

[/av_one_full][/av_section][av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' id='' color='main_color' custom_bg='' src='' attachment='' attachment_size='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']
[av_heading tag='h2' padding='10' heading='You Might Also Like...' color='' style='blockquote modern-quote modern-centered' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]

[av_hr class='invisible' height='30' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808' font='entypo-fontello']

[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/2016/03/Boosted-Decision-Tree-180x180.jpg' attachment='25396' attachment_size='square' align='center' styling='' hover='' link='post,25390' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Build a Predictive Model in Azure Machine Learning Studio
[/av_image]

[av_textblock size='' font_color='' color='']

Build a Predictive Model in Azure Machine Learning Studio

[/av_textblock]

[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/2014/09/Microsoft-Azure-180x180.png' attachment='11163' attachment_size='square' align='center' styling='' hover='' link='post,25382' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Create Custom R Models in Azure Machine Learning
[/av_image]

[av_textblock size='' font_color='' color='']

Create Custom R Models in Azure Machine Learning

[/av_textblock]

[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/2014/05/type1and2error-180x180.gif' attachment='4838' attachment_size='square' align='center' styling='' hover='' link='post,4763' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Type I and Type II Error, Smoke Detector and the Boy Who Cried Wolf
[/av_image]

[av_textblock size='' font_color='' color='']

Type I and Type II Error, Smoke Detector and the Boy Who Cried Wolf

[/av_textblock]

[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/2016/09/Steves-House-01-180x180.png' attachment='27952' attachment_size='square' align='center' styling='' hover='' link='post,27947' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Predicting the Value of Your House
[/av_image]

[av_textblock size='' font_color='' color='']

Predicting the Value of Your House

[/av_textblock]

[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']

[av_image src='https://datasciencedojo.com/wp-content/uploads/R-for-Excel-Users-04-180x180.png' attachment='32269' attachment_size='square' align='center' styling='' hover='' link='post,32184' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
R Programming for Excel Users
[/av_image]

[av_textblock size='' font_color='' color='']

R Programming for Excel Users

[/av_textblock]

[/av_one_fifth]
[/av_section]

Topics: Data Science & Engineering, Predictive Modeling, bootcamp, data scientist