[av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' scroll_down='' id='' color='main_color' custom_bg='' src='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' video_mobile_disabled='' overlay_enable='' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']
[av_heading heading='Text Analytics with R Programming' tag='h1' style='blockquote modern-quote modern-centered' size='' subheading_active='' subheading_size='15' padding='10' color='' custom_font=''][/av_heading]
[/av_section]
[av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' scroll_down='' id='' color='main_color' custom_bg='' src='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' video_mobile_disabled='' overlay_enable='' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']
[av_two_third first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
It is my firm conviction that Text Analytics is a must-have skill of any practicing Data Scientist. From analyzing customer feedback in NSAT surveys, to scraping Microsoft’s internal job postings for analyzing popular technical skills, to segmenting customers via textual features, I have universally found that Text Analytics is a wildly useful skill.
Not surprisingly, I am often asked by students of our Bootcamp, folks that I mentor on Data Science, and my LinkedIn contacts about the subject of Text Analytics. The good news is that there are many great resources for the R programmer to learn Text Analytics. What follows is a practical curriculum where the only required knowledge is basic R programming skills. I have read all of the books referenced below and can attest that studying the curriculum will have you mastering Text Analytics in no time!
[/av_textblock]
[/av_two_third][av_one_third min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/Text-Analytics-R-Prog_v2-01-1-495x400.png' attachment='34064' attachment_size='portfolio' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_social_share title='Share this entry' style='' buttons='' share_facebook='' share_twitter='' share_pinterest='' share_gplus='' share_reddit='' share_linkedin='' share_tumblr='' share_vk='' share_mail=''][/av_social_share]
[/av_one_third][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/Text-Analytics-with-R-100.jpg' attachment='32279' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[/av_one_fifth][av_three_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
Text Analytics with R for Students of Literature is quite simply the best, most straightforward introduction to working with text that I have found. Professor Jockers illustrates many of the fundamentals using out of the box R programming. This book provides a great foundation for anyone looking to get started in Text Analytics with R.
[/av_textblock]
[/av_three_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[/av_one_fifth][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/Taming-Text-100.jpg' attachment='32281' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[/av_one_fifth][av_three_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
Taming Text is the next stop on the Text Analytics journey. While this book is primarily written for Java programmers, there is a lot of theory that is immensely useful for R programmers learning to work with text. Additionally, the book covers the OpenNLP Java library which is available to R programmers via the excellent openNLP package.
[/av_textblock]
[/av_three_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[/av_one_fifth][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/RLogo.png' attachment='31943' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[/av_one_fifth][av_three_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
The CRAN NLP Task View illustrates the wide-ranging Text Analytics support for the R programmer. Unfortunately, it also illustrates that the landscape is fractured as well. However, a couple of packages are worthy of study. The tm package is often the go-to Text Analytics package for R programmers. However, the new quanteda package shows a lot of promise. Lastly, the excellent openNLP package deserves a second callout.
[/av_textblock]
[/av_three_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[/av_one_fifth][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_fifth first min_height='av-equal-height-column' vertical_alignment='av-align-middle' space='' margin='0px' margin_sync='true' padding='0px' padding_sync='true' border='' border_color='' radius='0px' radius_sync='true' background_color='' src='' attachment='' attachment_size='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/Information-Retrieval-100.jpg' attachment='32278' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[/av_one_fifth][av_three_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
Introduction to Information Retrieval, while focused primarily on the problem of search, nevertheless contains a wealth of theory and understanding (e.g., the Vector Space Model) to take the R programmer to the next level. The text is language agnostic, is quite excellent, and free!
[/av_textblock]
[/av_three_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[/av_one_fifth][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/Natural-Language-Processing-Pything-100.gif' attachment='32280' attachment_size='full' align='center' styling='' hover='' link='' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation'][/av_image]
[/av_one_fifth][av_three_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
While the Natural Language Toolkit (NLTK) is Python-based, the book on the subject is a wealth of goodness to the R programmer. I put this resource last in the list as learning the above conceptual material and R packages provides the necessary background to translate some of the concepts (e.g., chunking) into the R context. Awesome stuff and free to boot!
[/av_textblock]
[/av_three_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[/av_one_fifth][av_hr class='default' height='50' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808']
[av_one_full first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_textblock size='' font_color='' color='']
There you have it, a practical curriculum for the R programmer to ramp into Text Analytics. Don’t hesitate to reach out if you have any questions or comments – I monitor my blog almost continually.
Until next time, happy data sleuthing!
[/av_textblock]
[/av_one_full][av_heading tag='h2' padding='10' heading='Video Tutorials' color='' style='blockquote modern-quote modern-centered' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]
[av_hr class='invisible' height='30' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808' font='entypo-fontello']
[av_one_half first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_codeblock wrapper_element='' wrapper_element_attributes='']
[/av_codeblock]
[/av_one_half][av_one_half min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_codeblock wrapper_element='' wrapper_element_attributes='']
[/av_codeblock]
[/av_one_half][av_hr class='invisible' height='30' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808' font='entypo-fontello']
[av_one_full first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_comments_list]
[/av_one_full][/av_section][av_section min_height='' min_height_px='500px' padding='default' shadow='no-shadow' bottom_border='no-border-styling' id='' color='main_color' custom_bg='' src='' attachment='' attachment_size='' attach='scroll' position='top left' repeat='no-repeat' video='' video_ratio='16:9' overlay_opacity='0.5' overlay_color='' overlay_pattern='' overlay_custom_pattern='']
[av_heading tag='h2' padding='10' heading='You Might Also Like...' color='' style='blockquote modern-quote modern-centered' custom_font='' size='' subheading_active='' subheading_size='15' custom_class=''][/av_heading]
[av_hr class='invisible' height='30' shadow='no-shadow' position='center' custom_border='av-border-thin' custom_width='50px' custom_border_color='' custom_margin_top='30px' custom_margin_bottom='30px' icon_select='yes' custom_icon_color='' icon='ue808' font='entypo-fontello']
[av_one_fifth first min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/2016/03/Boosted-Decision-Tree-180x180.jpg' attachment='25396' attachment_size='square' align='center' styling='' hover='' link='post,25390' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Build a Predictive Model in Azure Machine Learning Studio
[/av_image]
[av_textblock size='' font_color='' color='']
Build a Predictive Model in Azure Machine Learning Studio
[/av_textblock]
[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/2014/09/Microsoft-Azure-180x180.png' attachment='11163' attachment_size='square' align='center' styling='' hover='' link='post,25382' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Create Custom R Models in Azure Machine Learning
[/av_image]
[av_textblock size='' font_color='' color='']
Create Custom R Models in Azure Machine Learning
[/av_textblock]
[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/2014/05/type1and2error-180x180.gif' attachment='4838' attachment_size='square' align='center' styling='' hover='' link='post,4763' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Type I and Type II Error, Smoke Detector and the Boy Who Cried Wolf
[/av_image]
[av_textblock size='' font_color='' color='']
Type I and Type II Error, Smoke Detector and the Boy Who Cried Wolf
[/av_textblock]
[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/2016/09/Steves-House-01-180x180.png' attachment='27952' attachment_size='square' align='center' styling='' hover='' link='post,27947' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Predicting the Value of Your House
[/av_image]
[av_textblock size='' font_color='' color='']
Predicting the Value of Your House
[/av_textblock]
[/av_one_fifth][av_one_fifth min_height='' vertical_alignment='' space='' custom_margin='' margin='0px' padding='0px' border='' border_color='' radius='0px' background_color='' src='' background_position='top left' background_repeat='no-repeat' animation='']
[av_image src='https://datasciencedojo.com/wp-content/uploads/R-for-Excel-Users-04-180x180.png' attachment='32269' attachment_size='square' align='center' styling='' hover='' link='post,32184' target='' caption='' font_size='' appearance='' overlay_opacity='0.4' overlay_color='#000000' overlay_text_color='#ffffff' animation='no-animation']
Text Analytics with R Programming
[/av_image]
[av_textblock size='' font_color='' color='']
[/av_textblock]
[/av_one_fifth]
[/av_section]