Dave's Reflections

Successful Machine Learning: Part 2 (What is Being Learned?)

June 10th, 2021

For background on this post, please see my last entry, Part 1: Questions and Baselining.

What separates today’s machine learning from human learning? One word: concepts.

“How so?” you might ask. To see what I mean, let’s start by looking at standard machine learning inputs and outputs. I’ll focus on supervised learning.

Supervised Machine Learning

Supervised machine learning is an approach where we start with a set of records. In each record, one field contains the correct answer, known as the target attribute. The other fields in the record contain related information, formally known as descriptive attributes. For example, we might have a set of measurements for flower petals and the flower’s name for each set of measures. We want the computer to learn how to identify different types of flowers. For those of you with machine learning experience, you’ll recognize the Iris data set as the inspiration for my example.

Supervised machine learning is similar to how we might teach children some set of math facts. We give them many examples of addition problems and answers. Over time we would like them to understand the mechanics of addition and solve novel problems. We have a similar goal with supervised learning. We want to give the computer lots of examples with the correct answers and have it figure out how to answer new problems.

Decision Trees

Table 1: Flower Data
Petal Length	Petal Width	Flower (Answer)
2	1.7	Rose
2.5	2.1	Rose
3.2	0.5	Daisy
3.6	0.6	Daisy

Figure 1: Example Decision Tree

We’ll begin looking at what the machine is learning using a basic supervised approach, decision trees. In this case, the computer looks at the correct answer, the target attribute. It uses the descriptive attributes in the record to create a decision that would use that record’s data to arrive at the correct result. In Table 1, there are four records. Each record holds two petal measurements and the name of the flower, either a rose or a daisy. The resulting decision tree might look like Figure 1.
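To make this concrete, here is a minimal Python sketch of how a one-question decision tree (a "stump") could be learned from Table 1. It is an illustration under my own assumptions, not the algorithm behind Figure 1: the names RECORDS, FEATURES, learn_stump and predict are hypothetical, and the split it finds may differ from the one in the figure. Real decision tree learners use more refined criteria, such as information gain, and grow more than one level.

```python
# Table 1 as (petal_length, petal_width, flower) records.
RECORDS = [
    (2.0, 1.7, "Rose"),
    (2.5, 2.1, "Rose"),
    (3.2, 0.5, "Daisy"),
    (3.6, 0.6, "Daisy"),
]
FEATURES = ["petal_length", "petal_width"]  # hypothetical attribute names


def majority(labels):
    """Return the most common label (ties broken alphabetically)."""
    return max(sorted(set(labels)), key=labels.count)


def learn_stump(records):
    """Find the single 'feature <= threshold' question with the fewest errors."""
    best = None
    for f, name in enumerate(FEATURES):
        values = sorted({r[f] for r in records})
        # Candidate thresholds sit midway between neighbouring observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r[-1] for r in records if r[f] <= threshold]
            right = [r[-1] for r in records if r[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, name, threshold, left_label, right_label)
    return best[1:]


def predict(stump, record):
    """Answer the learned question for a new set of measurements."""
    name, threshold, left_label, right_label = stump
    value = record[FEATURES.index(name)]
    return left_label if value <= threshold else right_label


stump = learn_stump(RECORDS)
print(stump)
print(predict(stump, (2.2, 1.9)))
```

Notice that "Rose" and "Daisy" are only strings being compared for equality. Nothing in the code knows what a rose is, which is exactly the limitation discussed next.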

This is a simple example, but the interesting point is that the computer is limited to making a decision using the data in the record. As discussed in my previous post, the text “Rose” doesn’t mean anything to the system. We could add additional data to the record, such as details about petal color and whether the stem has thorns. But the machine learning process won’t have that information available without explicitly adding it to the data. Since the computer doesn’t know what the text “Rose” means, it can’t incorporate other knowledge about roses into its decision tree.

This is a considerable hurdle in machine learning. As people learn new information, they build a knowledge base and apply it to new learning. That isn’t how these discrete learning processes work. And that limitation is imposed chiefly because the computer isn’t using concepts.

Tags: AI, artificial intelligence, business intelligence, data mining, machine learning, semantics
Posted in Artificial Intelligence (AI), Data Analytics, Data Mining, infuzIT, Intelligent Business Systems, Machine Learning, Semantic Technology

Successful Machine Learning: Part 1 (Questions and Baselining)

May 17th, 2021

In this series of posts, I’m delving into the limitations of machine learning and AI, hamstrung by current techniques, while considering technologies and practices to transform business intelligence efforts beyond the status quo.

Question of Intelligence

What is intelligence? What underlies intelligence? What aspects of intelligence do we want machine learning to demonstrate? What is artificial intelligence as opposed to intelligence? What capabilities does a computer need to achieve intelligence? Can programs be written to derive intelligence within a modern computer?

Questions delving into intelligent systems go on and on. I’m going to spend a few blog entries exploring machine learning and our quest to create and benefit from intelligent computer systems. Through this discussion, I’ll explore these questions.

Framing the Discussion

Note that my focus is business automation: what are organizations seeking to gain from machine learning and intelligent systems? I am purposefully avoiding a philosophical discussion of intelligence. To that end, a primary assumption is that we are interested in applying human-style intelligence to advance business or operational success. Put another way, animals and plants demonstrate intelligence of differing types; however, mimicking these is not an organization’s goal when employing machine learning.

Key Terms

To begin, I need working definitions for learning and intelligence. These will serve as touchstones for exploring computer-based learning and intelligence. Merriam-Webster’s dictionary provides helpful initial entries for each. The definition for “learn” is “to gain knowledge or understanding of or skill in by study, instruction, or experience.” “Intelligence” is defined in two parts: “the ability to learn or understand or to deal with new or trying situations” and “the ability to apply knowledge to manipulate one’s environment or to think abstractly as measured by objective criteria (such as tests).”

The terms knowledge and understanding appear in both definitions and are vital to successful machine learning applications. Knowledge and understanding are based on information.

Thoughts on Blockchain’s Relationship to Data Security

June 13th, 2018

After reading an article in the Wall Street Journal, “Blockchain Could Be the Security Answer. Maybe.” (May 30, 2018) I was concerned that information in the article could mislead readers regarding the place of blockchain in a cybersecurity discussion. Further, ruminations regarding blockchain’s ability to protect information spout from various media sources with insufficient detail regarding exactly how the information is protected.

This post isn’t meant to explain blockchain; there are many resources for that. Instead, I focus on a few points made in the article specific to data security. In general, I find there are many misunderstandings about blockchain’s place in a data security context, and the article simply highlights a few. I’ll frame my discussion using a common cybersecurity framework, the CIA triad.

When considering data security we often separate information protection into three categories: 1) Confidentiality – data should only be visible to those with a legitimate reason to access it; 2) Integrity – data should be accurate and no unauthorized changes should be made to it; and 3) Availability – the data should be accessible when it is needed. These three categories of protection, Confidentiality, Integrity, and Availability, form the CIA triad. To secure information, computers and programs must effectively provide all three.

Blockchain Protects Data Integrity

Blockchain was created to focus on the integrity of data. That is, the premise for blockchain is that a group wants to share information and assure that no one changes the data without consensus. The data is visible to anyone with access to the blockchain. Public and private keys in blockchain are only used to authenticate data changes – managing the integrity of the data.
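The integrity idea can be illustrated with a small hash chain in Python. This is a simplified sketch under my own assumptions, not a real blockchain: the function names and the sample transactions are hypothetical, and it leaves out consensus and the key-based signatures that authenticate changes. It only shows that each block's hash covers the previous block's hash, so an edit anywhere is detectable, while the data itself remains readable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, previous_hash):
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a block that is linked to the current end of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,  # stored in the clear: visible to anyone holding the chain
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    })


def is_valid(chain):
    """Recompute every hash and link; any edited block fails the check."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        expected = block_hash(block["index"], block["data"], previous_hash)
        if block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "Alice pays Bob 5")
append_block(ledger, "Bob pays Carol 2")
print(is_valid(ledger))  # True: the chain is untouched

ledger[0]["data"] = "Alice pays Bob 500"  # an unauthorized change
print(is_valid(ledger))  # False: the stored hash no longer matches
```

Nothing here hides the data. The hashes detect changes; they do not provide confidentiality.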

A byproduct of a typical blockchain deployment is enhanced availability. If there are multiple organizations each with a complete copy of the blockchain, then the information is redundantly stored across multiple systems and accessible through multiple networks. Although not the focus of blockchain, and not a guaranteed security feature, especially if a single organization is using the technology privately, blockchain’s support for a distributed implementation can be used to enhance availability.

Confidentiality Is Another Issue

As it relates to confidentiality, keeping private data private, the article implies that the keys used with blockchain encrypt the data, and hence aid in confidentiality. For instance, the article mentions, “With blockchain, the patient’s entire medical record is stored in a ledger and encrypted with the patient’s private key.” There are three significant errors in this statement.

Tags: blockchain, data, data security, disk encryption
Posted in Data Security, Security

MongoDB and Java – Powerful Complementary Platforms

May 31st, 2016

I have found that including MongoDB in the design of Java applications allows me a valuable level of flexibility in meeting client objectives. I have created an initial open source project on GitHub, JavaMongo, with the goal of providing working examples of Java and MongoDB integration. A secondary goal is to include development best practices, such as using testing frameworks and good coding style.

This posting is intended to give a little background on why I find Java and MongoDB to be useful tools in my software development arsenal and then to introduce the JavaMongo project. Future postings will include some videos walking developers through the examples as well as the frameworks being used (like JUnit, Cobertura and Checkstyle).

Background

Java is a ubiquitous platform for creating business applications. It has proven itself across a wide range of use cases from small point-based solutions to large generalized solution stacks. The variety of libraries, frameworks and tools for designing, building, testing and managing Java applications provides significant benefits to companies building solutions using Java. However, an application without ready access to data isn’t particularly useful. As enterprise-scale database options have broadened to include NoSQL, those individuals creating Java-based solutions must be sure to take advantage of new data options in order to benefit from the strengths of such components.

MongoDB is a great NoSQL platform that can be used to provide additional capabilities to your applications. MongoDB is a document store that has proven its reliability, scalability and integrate-ability across numerous small and large-scale applications. Its value and focus complement the way we use relational databases for online transaction-oriented processing (OLTP) and offer advantages over the way we use relational databases for data marts and warehouses.

A point of clarification before proceeding: I’m not here to say that MongoDB is better than some other data product, or, more generally, that document stores are better than relational databases. I find such arguments meaningless without a specific use case or project goal. These technologies are different and have individual strengths and weaknesses in the face of a specific set of project objectives.

I have found that MongoDB plugs in well when I need a place to federate data (structured, semi-structured and unstructured). Given a common platform, it simplifies the work required to build and alter connections between attributes. If you’ve looked at other information about my background you’ll see that I find the use of semantic technology to be incredibly valuable for data federation and classification. MongoDB as a flexible repository plays well with semantics. At the end of this post I’ll give you a small example of that.

JavaMongo Project

The JavaMongo project is intended to provide Java developers with working examples of Java and MongoDB integrations. Over time I expect a variety of common situations to be demonstrated, with associated documentation explaining the use case and the resulting implementation.

To have some interesting data to work with, I’m using data sets that my company releases to the public domain. To work with the JavaMongo examples you’ll need to import that data into your MongoDB instance. For more information about downloading and importing the sample data, see the discussion on MongoDB Collection of Honeypot Data on my NoSQL topic page.

The initial JavaMongo project contains a basic README file with information on running the example code. Instead of rehashing that information in this post, I’d like to walk through the basic operations being demonstrated in the example code. The main class we’ll explore is BasicStatistics (us.daveread.education.mongo.honeypot.BasicStatistics).

As you know, a Java program starts execution with the main() method. The first step that the BasicStatistics main() method takes is to create an instance of the BasicStatistics class.

BasicStatistics Constructor

The constructor code goes through the entire process of connecting to a MongoDB database, accessing a collection and running a query on data in the collection.

First, an instance of MongoClientOptions is created. This class allows us to configure certain client side options related to the connection. I’ll get into more detail with this in future examples. In this case, the program is simply setting the connection timeout to 2000 milliseconds (2 seconds) so that if the instance is not available the program won’t hang for a long time. You wouldn’t make the timeout this short in a production environment but it helps for debugging our local environment by failing fast if something is wrong.

Tags: Allegrograph, data, Java, lightweight data federation, MongoDB, programming, semantics
Posted in infuzIT, Java, NoSQL, Semantic Technology, Software Development, Tools and Applications

Accountable Care Organizations, Data Federation and CMS’ Updated Final Rule for the Medicare Shared Savings Program

June 8th, 2015

CMS has published a final rule (http://federalregister.gov/a/2015-14005) focused on changes to the Medicare Shared Savings Program (MSSP) which impacts Accountable Care Organizations (ACO) significantly. There are a variety of interesting changes being made to the program. For this discussion I’m looking at CMS’ continual drive toward data use and integration as a basis for improving quality of care, gaining efficiency and cutting costs in health care. One way this drive is manifested in the new rule regards an ACO’s plans as related to “enabling technologies,” which is an umbrella term for leveraging electronic data.

As background, Subpart B (425.100 to 425.114) of the MSSP describes ACO eligibility requirements. Two of the changes in this section clearly underscore the importance of electronic data and data integration to the fundamental operation of an ACO. Specifically, looking at page 127, the following updates are being made to section 425.112(b)(4) (emphasis mine):

Therefore, we proposed to add a new requirement to the eligibility requirements under § 425.112(b)(4)(ii)(C) which would require an ACO to describe in its application how it will encourage and promote the use of enabling technologies for improving care coordination for beneficiaries. Such enabling technologies and services may include electronic health records and other health IT tools (such as population health management and data aggregation and analytic tools), telehealth services (including remote patient monitoring), health information exchange services, or other electronic tools to engage patients in their care.

It goes on to add:

Finally, we proposed to add a provision under § 425.112(b)(4)(ii)(E) to require that an ACO define and submit major milestones or performance targets it will use in each performance year to assess the progress of its ACO participants in implementing the elements required under § 425.112(b)(4). For instance, providers would be required to submit milestones and targets such as: projected dates for implementation of an electronic quality reporting infrastructure for participants;

It is clear from the first change that an ACO must have a documented plan in place for continually expanding its use of electronic data and providing data visibility and integration between itself and its beneficiaries and providers. This is a tall order. The number of different systems and data formats along with myriad reporting and analytic platforms makes a traditional integration approach tedious at best and a significant business risk at worst.

The second change, keeping CMS apprised of the progress of data-centric projects, is clearly intended to keep the attention on these data publishing and integration projects. It won’t be enough to have a well-articulated plan; the ACO must be able to demonstrate progress on a regular basis.

Tags: Centers for Medicare and Medicaid Services (CMS), data, healthcare, Information Systems, lightweight data federation, medicare, semantics
Posted in Agile Data Integration, Data Analytics, Healthcare Plan, Information Systems, Medicare, NoSQL, Semantic Technology

Impetus for Our Semantics and NoSQL Workshop at the 2015 SmartData Conference

May 15th, 2015

I’m looking forward to being one of the presenters for infuzIT’s hands-on data integration and analysis workshop at this year’s SmartData Conference in San Jose. Giving people the opportunity to see the amazing power of semantics combined with NoSQL to quickly integrate and analyze data makes my day.

My background includes significant work with data, both as an application developer and data warehouse architect. The acceleration of data-centric hardware and software capabilities over the past 10 years now supports a very different paradigm for exploring, reporting and analyzing data. Processes and procedures for creating a data warehouse or mart, the accepted rules of the road for creating integrated data repositories, are no longer clear cut. The data federation debate is no longer Inmon or Kimball.

A significant shift in data integration revolves around the required lifespan of the integrated data. This lifespan has two key aspects whose evolution now allows us to rethink our approach to data federation. This permits us to be much more agile when bringing heterogeneous data sources together. The two aspects are reflected in these design questions: 1) what data, if any, will be rehosted; and 2) what relationships will be supported within the integrated data?

Rehosting Data

In a traditional data warehouse the data must be rehosted. The new repository is the target where transformed data (cleaned-up, standardized) exists. The queries that will be retrieving data from multiple sources are really pulling data from a single source that has been populated from multiple sources. It represents a heavyweight process, driven by Extract-Transform-Load (ETL) scripts and requiring space to host redundant information.

Relationships Between Data Elements

The target warehouse schema determines what relationships are defined between the data elements being combined. Getting this “right” requires careful planning and coordination between the various groups that will use the warehouse. Given the significant effort, represented as cost, organizations tend to design data warehouses to support broad constituencies as a way to amortize the investment across departments and projects.

Paradigm Shift

Semantics and NoSQL allow us to reduce the effort of integrating data by orders of magnitude. They support a completely different mindset for bringing data together. Instead of carefully designing a model that works well in the general sense (reducing the value in specific cases) we have environments that allow us to experiment, adjust and focus on each case.

Below are several drivers which allow us to approach data federation differently using semantics and NoSQL.

Tags: data, data integration, lightweight data federation, NoSQL, semantics, workshop
Posted in Agile Data Integration, Architecture, Data, Data Analytics, infuzIT, NoSQL, Semantic Technology

Medicaid Managed Care Congress Conversations Highlight the Value of Data Federation

May 22nd, 2014

This week I had the opportunity to attend the Medicaid Managed Care Congress (MMCC) in Baltimore, MD and the privilege of speaking with a variety of leaders from provider, payer, and services organizations. With me from Blue Slate Solutions were Scott Van Buren and Chris Garber. A common theme we heard as we spoke with the attendees was the challenge of bringing data together from multiple sources and making sense of that information.

Medicaid is potentially the most complex government program that exists in the United States. There are federal and state aspects as well as portions that are handled at a local level. Some funding and services are defined as required while others are optional. The financial models’ formulas involve many variables. In short, there are numerous challenges in Medicaid, including the dual eligible changes that seek to address the services disconnects that often exist when a person is eligible for both Medicare and Medicaid.

Combining data from providers, payers, patients, government entities and the community is necessary in order to optimize the quality of care that is provided to each patient. The definition of provider continues to expand, covering not just the medical needs of a person but incorporating the various social services, so important to the holistic care of an individual, under the umbrella of “provider.”

As we listened to people and talked about their data challenges we were also able to walk them through the Data Unleashed™ approach. The iterative learn-as-you-go process resonated across the board, whether people represented patient advocacy groups, provider organizations or healthcare plans. The capability to start small, obtain value quickly and adapt rapidly to changing environments fits the Medicaid complexities well.

If you would like to learn more about our agile and lightweight approach to accessing data from across your enterprise in order to quickly begin creating meaningful reporting and analytics, please check out dataunleashed.com for descriptions, videos and case studies. We’d also appreciate the opportunity to host a webinar with your team where we can explore Data Unleashed™ in more depth and discuss your specific data challenges.

Tags: analytics, data, enterprise systems, Information Systems, lightweight data federation, ontology, Public Data, reporting, semantics, system integration
Posted in Data, Data Analytics, Data Unleashed, Medicaid, Semantic Technology, Tools and Applications

Data Unleashed™ Headed to the 2014 Medicaid Managed Care Congress

May 15th, 2014

For those of you spending time in Baltimore next week (May 19-21, 2014) to attend the Medicaid Managed Care Congress, please stop by Blue Slate’s booth. Our MINI road trip begins Sunday as we head for Camden Yards and the beautiful Inner Harbor area. Our goal in attending? Having the opportunity to speak with you about your data challenges as well as your Medicaid journey.

We will be demonstrating what we mean by lightweight data federation and agile analytics as the drivers behind creating the Data Unleashed™ service platform. Given our extensive healthcare focus, we have deep experience working with companies on Medicaid initiatives, such as those involving dual eligibles, for instance the FIDA program in New York State.

Beyond data integration and analytics, we provide expertise for plans to: implement business process and business rule management solutions; prepare for site reviews and audits; and unify data from a variety of internal and cloud-based systems. More broadly beyond Medicaid, we work extensively in the Medicare and commercial healthcare space, leading transformative change for businesses such as Medicare Administrative Contractors (MACs) and Blues plans.

We look forward to having a chance to learn more about your operational challenges and share with you our organization’s background and focus areas. Let’s get together and explore opportunities to advance your organization’s strategic goals around improving quality of care and reducing costs.

Tags: conference, data, healthcare, ontology, semantics
Posted in Cognitive Corporation, Data, Data Analytics, Data Unleashed, Healthcare Plan, Medicaid, Medicare, Semantic Technology

Why Isn’t Everybody Doing It?

April 28th, 2014

That is a very dangerous question for a leader to ask when evaluating options. Yet it is one I hear far too often in the healthcare realm. It encapsulates a rejection of innovation, evolution and learning all in one terse, often rhetorical, question.

A common context for this question, often prefixed by, “If this is so great…,” is when discussing semantics and semantic technology. Although these concepts are not new to some industries, such as media, they are foreign in many healthcare organizations. Yet we know that healthcare payers and providers alike struggle with massive data integration and data analytics challenges just like media conglomerates.

The needs to: combine siloed information; drive an analytics mindset throughout an organization; and support the flexibility of a constantly changing IT environment are common in large healthcare organizations. Repeated attempts by organizations to meet these needs betray a lack of consensus around how to best achieve a valuable result.

Further, the implication that the way most organizations solve a problem is optimal ignores the fact that best practices must change over time. The best way to solve a problem last year may not be the same this year. The healthcare industry is changing, the physical world of servers, networks, disk drives and memory is changing, and the expectations of members are changing. What was infeasible years ago becomes commonplace. Relational databases were all but unworkable in the 1970s due to a lack of experienced DBAs, slow disk drives, slow processors and limited memory.

In the same way, semantic formalization and graph databases were too new and limited to deal with large data sets until people gained expertise with ontologies while system hardware benefitted from another generation of Moore’s law. In the face of ongoing innovation, the question leaders should ask when approaching a challenge is, “What advancements have been made since the last time we looked at this problem?”

Leadership requires leading, not following. Leaders mentor their organizations through change in order to reach new levels of success. Leadership is based on learning, open-mindedness, creativity and risk-taking. The question, “Why isn’t everybody doing it?” is the antithesis of leadership and has no place there. In fact, if everybody is doing something, a leader would be better off asking, “How do we get ahead of what everybody is doing?”

Leaders must be at the forefront of pushing for better, faster, cheaper. Questioning the status quo, looking for new opportunities and seeking to leapfrog the competition are key foci for leadership.

As a leader, the next time you find yourself limiting your willingness to explore an option because everybody isn’t doing it, keep in mind that calculators, computers, automobiles, elevators, white boards, LED light bulbs, Google maps, telephones, the Internet, 3-D printing, open heart surgery, and many more concepts that are accepted or gaining traction, had a day when only one person or organization was “doing it.” Challenge yourself and your organization to find new options, new best practices and new paradigms for advancing your strategy and goals.

Tags: cognitive corporation, creativity, data, enterprise systems, Information Systems, semantics, system integration
Posted in Architecture, Cognitive Corporation, Data, Data Unleashed, Information Systems, Leadership, Semantic Technology

How Does Semantic Technology Enable Agile Data Analytics?

April 25th, 2014

I’m glad you asked. Scott Van Buren and I will be presenting a Dataversity webinar entitled, Using Semantic Technology to Drive Agile Analytics, on exactly that topic. Scheduled for May 14, 2014 (and available for replay afterwards), this webinar will highlight key semantic technology capabilities and how those provide an environment for data agility.

We will focus most of the webinar on a case study that demonstrates the agility of semantic technology being used to conduct data analysis within a healthcare payer organization. Healthcare expertise is not required in order to understand the case study.

As we look into several iterations of data federation and analysis, we will see the effectiveness of bringing the right subset of data together at the right time for a particular data-centric use. This concept translates well to businesses that have multiple sets of data or applications, including data from third parties, and seek to combine relevant subsets of that information for reporting or analytics. Further, we will see how this augments data warehousing projects, where the lightweight and agile data federation approach informs the warehouse design.

Please plan to join us virtually on May 14 as we describe semantic technology, lightweight data federation and agile data analytics. There will also be time for you to pose questions and delve into areas of interest that we do not cover in our presentation.

The webinar registration page is: http://content.dataversity.net/051414BlueslateWebinar_DVRegistrationPage.html

We look forward to having the opportunity to share our data agility thoughts and experiences with you.

Tags: agile analytics, data, lightweight data federation, ontology, semantics, teaching, webinar
Posted in Architecture, Data, Data Analytics, Data Unleashed, Information Systems, Semantic Technology, Tools and Applications

David S. Read