Text Mining Use Case

David Haertzen – October 2019

Text mining methods are techniques that can turn unstructured data like emails, tweets and recordings into actionable insights. The knowledge gained can be used both to identify opportunities and serve customers and to manage risks such as cybercrime. Examples of text mining use cases that capitalize on opportunities include:

  • Customer Experience: Obtain knowledge about customers through diverse sources such as emails, surveys and calls to provide automated responses and to identify opportunities and issues.
  • Contextual Advertising: Target advertising to specific customers based on analysis of text.
  • Business Intelligence: Answer specific business questions by scanning and analyzing thousands of documents.
  • Knowledge Management: Gain value from huge amounts of information in areas like product research and clinical patient data.
  • Content Enrichment: Add value to content by organizing, summarizing and tagging.
  • Social Media Analysis: Scan large data volumes to gather opinions, sentiments and intentions relating to organization reputation, brands and offerings.

Examples of text mining use cases that address risks and losses include:

  • Cybercrime Detection: Detect malicious threats such as ransomware and identity theft, using machine learning to identify likely malware. Machine learning identifies trends and improves its predictions through experience.
  • Fraud Detection: Identify potential fraudulent activity such as insurance claim fraud through analysis of unstructured data.
  • Risk Management: Scan thousands of documents to find patterns that identify risks to be addressed.
  • Spam Filtering: Reduce the volume of spam through better filtering tuned through machine learning.

How can we take advantage of these use cases? One way is to use the Term Frequency – Inverse Document Frequency (TF-IDF) method to quantify the strength of the words that make up documents, based on the relative frequency of those words. The flow of this process is illustrated in the following diagram.

There are five major steps to this process:

  1. Gather Text: Read in the body of text (corpus) from sources such as emails, reports, tweets, comments and notes, which may be stored as separate files or as fields in a database.
  2. Preprocess Text: Produce a streamlined version of the text by removing punctuation, shifting to lower case, removing stop words and location words, and resolving words to their stems (stemming). Then use tokenization methods such as “bag of words” to render words into streams of numbers.
  3. Apply TF-IDF Algorithm: Calculate the strength of words using the TF-IDF calculation. Term Frequency (TF) for each word in a document = count of that word in the document divided by the total word count of the document. Inverse Document Frequency (IDF) = ln(total number of documents / number of documents containing the word). Finally, TF-IDF = TF * IDF.
  4. Output Structured Data File: Generate one flat file record for each input document. Each record will contain a document identifier plus a field for each word of interest. See the example structured flat file below.
  5. Apply Data Science Algorithms: The generated flat file is in a format where data can be better understood or outcomes predicted using data science algorithms such as: regression, decision tree, clustering or neural network.
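Steps 1 through 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the stop-word list, the sample documents and the words of interest are hypothetical, stemming and location-word removal are omitted for brevity, and the flat file is written as CSV to an in-memory buffer. The `tf_idf` function follows the step 3 formulas directly (natural log, no smoothing).

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real projects use a fuller list.
STOP_WORDS = {"a", "an", "and", "for", "i", "is", "my", "of", "the", "to", "was", "you"}


def preprocess(text):
    """Step 2: lower-case, strip punctuation, tokenize and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs_tokens):
    """Step 3: TF = word count / total words; IDF = ln(N / docs containing word)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def write_flat_file(doc_ids, scores, vocabulary, out):
    """Step 4: one record per document: an identifier plus one column per word of interest."""
    writer = csv.writer(out)
    writer.writerow(["doc_id"] + vocabulary)
    for doc_id, doc_scores in zip(doc_ids, scores):
        writer.writerow([doc_id] + [round(doc_scores.get(w, 0.0), 4) for w in vocabulary])


# Step 1: a hypothetical corpus keyed by document identifier.
docs = {
    "email1": "My claim was denied and I want a refund.",
    "email2": "Refund processed for the claim.",
    "email3": "Great service, thank you!",
}
doc_ids = list(docs)
tokens = [preprocess(docs[d]) for d in doc_ids]
scores = tf_idf(tokens)

buf = io.StringIO()
write_flat_file(doc_ids, scores, ["claim", "refund", "service"], buf)
print(buf.getvalue())
```

Note that a word appearing in every document gets an IDF of ln(1) = 0, so it contributes nothing to any record; this is how TF-IDF down-weights words that do not distinguish one document from another.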

In conclusion, text mining methods are available that can be used to capitalize on opportunities, reduce losses and manage risks. The TF-IDF method is one of many approaches to successful text mining and is a good example of the overall approach. Typically, multiple documents are scanned, preprocessed and then analyzed using an algorithm like TF-IDF, Keyword Association Network (KAN) or Support Vector Machines (SVM). Libraries of algorithms such as Python's scikit-learn support text processing via machine learning. I encourage you to learn more about text processing and its applications.

Recruiting the BIA Sponsor – Management for Profitable Analytics – Part 2

In this article you will better understand the role of the Business Intelligence and Analytics (BIA) sponsor, and you will be ready with a five-step approach to recruiting a person to fill that role. The role of the sponsor is critical to the success of the BIA project or initiative. This individual is a senior manager who takes overall responsibility for the effort.

Seek out a BIA sponsor who has a large stake in the project outcome as well as authority over the resources needed for the project. Look for someone with enough authority throughout the organization to manage competing priorities. It will help if there is organization-wide recognition that business intelligence and analytics are a high priority and worthy of sponsorship.

The BIA sponsor fills a number of roles including:

  • Definer of the BIA vision
  • Owner of the business case
  • Harvester of benefits
  • Overseer of the project and chair of the BIA steering committee
  • Ambassador of the BIA effort to upper management.

BIA champions complement the work of the project sponsor.   Look for people who will promote data warehousing efforts across the organization.   They make sure that the project is aligned with enterprise goals and help sell the project to the rest of the organization.

The scope of a BIA effort is also highly dependent on the level of authority of the project sponsor. If the sponsor is the CEO or CFO, then the scope can be enterprise-wide. If the sponsor is a business unit head, then the scope is likely the business unit. If the sponsor is a department head, then the scope is likely limited to a single department.

Conversely, an Enterprise BIA or Data Warehouse project requires a higher-level executive sponsor, with more authority and resources than a Departmental Data Mart project requires.

Five Steps to Recruiting the BIA Project Sponsor

Recruiting the BIA project sponsor is too important to leave to chance. Follow this step-by-step approach to recruiting your BIA project sponsor and champions:

1. Define the Requirements

      Start by identifying the characteristics that are needed in the BIA sponsor. Consider the scope of the project. What do you need the sponsor to do? Who needs to be influenced? What needs to be signed off? What personality type is needed? Do you need to motivate an enterprise-wide team or a departmental unit? Put the requirements and their weighted priorities in a spreadsheet.

2. Identify the Best Candidates

      Build a list of candidates for the role and narrow it down to a short list based on the identified requirements. This means scoring each candidate against the prioritized requirements. Remove from the list any candidate who scores poorly on the most critical requirements.
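The spreadsheet scoring in steps 1 and 2 amounts to a weighted sum with a knockout rule, which can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the requirement names, the weights, the 1 to 5 rating scale, the cut-off of 3 on critical requirements and the candidate titles.

```python
# Hypothetical requirements with weighted priorities (5 = critical, 1 = nice to have).
WEIGHTS = {
    "signoff_authority": 5,
    "stake_in_outcome": 5,
    "influence": 3,
    "personality_fit": 2,
}
CRITICAL = ("signoff_authority", "stake_in_outcome")
MIN_CRITICAL_RATING = 3  # hypothetical cut-off on a 1-5 rating scale


def weighted_score(ratings):
    """Sum of rating x weight over all requirements."""
    return sum(weight * ratings[req] for req, weight in WEIGHTS.items())


def short_list(candidates):
    """Drop candidates rated below the cut-off on any critical requirement,
    then rank the rest by weighted score, highest first."""
    kept = {
        name: ratings
        for name, ratings in candidates.items()
        if all(ratings[req] >= MIN_CRITICAL_RATING for req in CRITICAL)
    }
    return sorted(kept, key=lambda name: weighted_score(kept[name]), reverse=True)


# Hypothetical candidates rated 1-5 on each requirement.
candidates = {
    "CFO": {"signoff_authority": 5, "stake_in_outcome": 4, "influence": 5, "personality_fit": 3},
    "VP Marketing": {"signoff_authority": 4, "stake_in_outcome": 5, "influence": 4, "personality_fit": 5},
    "IT Manager": {"signoff_authority": 2, "stake_in_outcome": 4, "influence": 3, "personality_fit": 4},
}

for name in short_list(candidates):
    print(name, weighted_score(candidates[name]))
# VP Marketing 67
# CFO 66
```

The IT Manager is removed despite reasonable ratings elsewhere, because a low rating on a critical requirement should not be offset by strengths on less important ones.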

3. Analyze Candidate Capacity

      Analyze the short-listed candidates to make sure that they have the time available to support the BIA effort. It is critical that the selected person can devote enough time to your project and not be a sponsor in name only.

4. Plan your Marketing Approach

      Selling the BIA effort to the BIA sponsor and other critical stakeholders requires building a short “elevator pitch” that summarizes the benefits of the BIA program in 10 to 20 seconds. You will use the elevator pitch to gain initial interest so that you can provide greater detail and “close the sale”. Components of the elevator pitch may include:
      • Problem identification
      • Proposed solution
      • Value proposition
      • Competition reference – what competitors are doing
      • Team identification
      • Resource needs

The BIA sponsor is more likely to respond to a value proposition that addresses organizational pain points and strategies than to technical features. Translate each feature into business benefit terms and then package those benefits into the “elevator speech”.

Technical Feature            Business Results
Integrated customer data     Effective use of marketing dollars; improved customer experience
Dashboards                   Visibility of enterprise performance
Data warehouse cubes         Fast results
Advanced hardware platform   Capacity to enable business growth
Cloud BI                     Faster time to market and lower cost of ownership

5. Present the Pitch

          The elevator speech should be presented to the candidate sponsor in person. First, rehearse: be ready for a smooth presentation followed by questions from the prospective sponsor. Second, if you do not have a direct connection with the sponsor, find a mutual contact who can introduce you. This is better than trying to corner the sponsor by the water cooler or, literally, by the elevator. Ask the sponsor to meet for an executive briefing rather than immediately asking him or her to be the sponsor.
          The executive briefing will enable you to better understand the candidate sponsor and build toward the close when the candidate sponsor is asked to commit to the BIA program.

Partnering with the BIA Sponsor

You have gained a better understanding of the BIA sponsor role, and you can use the five-step approach to recruiting a person to fill that role. Recruiting the BIA sponsor is just the beginning. Now it is time to work with the BIA sponsor to launch and implement the BIA effort. Be sure to keep him or her in the loop:

  • Provide regular status reports
  • Request help in dealing with roadblocks, especially those requiring cooperation across the organization
  • Align BIA effort with organization goals

Management for Profitable Analytics – Part 1

In this blog tutorial series, you will learn about the management of successful business intelligence and analytics projects. Topics include:

  • Defining Scope and Objectives
  • Finding the Right Sponsor
  • Producing the Project Roadmap and Plans
  • Organizing the Team
  • Executing the Plan
  • Finishing the Project
  • Avoiding Major Data Warehouse Mistakes.

Defining Scope and Objectives

Scope specifies the boundaries of the project. It tells what is in and what is out. The scope definition started in the business case will be expanded, if needed, when the project is underway. This effort includes:

  • Overview of the project (Mission, Scope, Goals, Objectives, Benefits)
  • Scope plan
  • Scope definition
  • Alternative development.

Defining the correct scope and setting realistic objectives are critical to any project’s success, and a data warehouse project is no exception. Scope defines project boundaries including:

  • Business requirements addressed
  • Anticipated/planned users
  • Subject Areas such as inventory transactions or customer service interactions
  • Project success criteria, including quantified planned benefits.

Defining an overly large project scope and letting scope grow in an uncontrolled fashion (scope creep) are certain to cause project failure. Remember, you cannot please everyone:

I cannot give you a formula for success,

but I can give you a formula for failure: try to please everybody.

Herbert Bayard Swope

Enterprise vs. Departmental Focus

The choice of Enterprise Data Warehouse vs. Departmental Data Mart is critical to the success of data warehousing projects. This choice is a major component of project scope. Examples of factors that arise with each focus, based on my experience, are shown in Table 1.

Table 1: Enterprise vs. Departmental Focus

Factor                 Enterprise Focus                     Department / Functional Focus
Organizational Scope   Enterprise-wide                      Business unit or business process focused
Time to Build          Multi-year phased effort             Single-year effort
Sponsorship Required   Executive sponsor                    Management sponsor
Complexity             High                                 Medium
Typical Cost           Often a multimillion-dollar effort   Often less than $1 million

The project may require both an Enterprise Data Warehouse and one or more Data Marts. A future Technical Architecture blog article will explain more about this choice.