Response to the BMJ article: B2C skin cancer apps not up to standard.

In February 2020, skin cancer researchers released a paper calling into question the quality of evidence behind the current crop of consumer-facing skin cancer apps. Given that we're focused on integrating our AI into health providers like the NHS and Vitality Health, we weren't part of this paper. Its conclusion was that the studies were poorly designed and didn't validate the claims the companies make about their products.

The paper was published in the BMJ and is a stark reminder that machine learning applied to healthcare requires high quality clinical evidence. That lesson was shared with me by one of the authors of the paper way back in 2013. It has been a key pillar of our strategy at Skin Analytics ever since.

Critically, the researchers suggest that the evidence provided shows a significantly lower ability to identify cancers than the consumer-facing apps claim in their marketing. Further, the paper states that the over-referral rates shown would have a significant negative financial impact on any health system that operated the service.

The researchers also pointed out that the current regulatory landscape is insufficient to cover the risks associated with the deployment of these services, though they note that the new Medical Device Regulation (MDR), coming into effect in May 2020, will resolve this to some degree.

Unfortunately, the implementation of the MDR has been hampered by a significant lack of resources among the bodies tasked with evaluating companies seeking clearance. The result was a Corrigendum announced by the EU, which would see devices like these, ours at Skin Analytics included, remain Class I devices until 2024.

I feel strongly that regulation needs to be tougher for all services such as ours; I have lobbied for exactly that and welcomed the MDR. The delay is unavoidable and out of the control of companies like ours, which makes the clinical evidence requirements all the more important.

I wanted to pick up the three main points made by the researchers, some of whom I am indebted to for educating me about this field over the last eight years, and show how, at Skin Analytics, we designed our own clinical studies with these limitations in mind.

Studies were small and of poor methodological quality 

Innovators face a challenge in building clinical evidence, and it can seem that securing a clinician willing to run a study covers this requirement. In reality, that is only the first step. Study design is a skill that researchers cultivate over many years, and a poor study design will fail to answer the question the company is asking.

Furthermore, biostatisticians can be overlooked but must be intimately involved to ensure the study is powered to statistically validate the outcomes.

At Skin Analytics we were fortunate enough to work with a great group of research dermatologists across seven NHS hospitals as well as global thought leaders and a cancer biostatistician to design and power our prospective study. 

Despite this, no single study can answer all the research questions for a new technology like AI for skin cancer, so even our study had some limitations. The key, however, is being aware of them and not over-claiming. As a result, we have four more clinical studies in the pipeline with institutions like Chelsea & Westminster, The Royal Free Hospital and Imperial College to continue to prove our technology.

Studies used selective recruitment

This refers to the way patients were recruited to the studies. Patients are either recruited prospectively from a real-world setting, which represents the population being tested, or selected using selection criteria and retrospective data, which introduces bias, both known and unknown.

At Skin Analytics, we focused on a prospective recruitment strategy, which is harder and more expensive, but gives repeatable, real-world results.

Selective recruitment also refers to testing on the population the AI will actually be used on. Specifically, the concern is that patients were recruited in specialist settings, while the product is used on the general population. The difference in disease prevalence between these settings dramatically affects the performance of the AI and the over-referral rates you should expect.
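To make that concrete with purely hypothetical numbers (these are not figures from our study or from the BMJ paper), here is a small sketch of the arithmetic: hold sensitivity and specificity fixed and the positive predictive value collapses as prevalence falls, which is exactly what drives over-referral when a tool validated in a clinic is dropped into primary care.

    # Illustrative only: how a fixed sensitivity/specificity plays out at
    # different disease prevalences. All numbers here are hypothetical.

    def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
        """Positive predictive value via Bayes' theorem."""
        true_positives = sensitivity * prevalence
        false_positives = (1 - specificity) * (1 - prevalence)
        return true_positives / (true_positives + false_positives)

    sens, spec = 0.90, 0.80  # the same hypothetical test in both settings

    # Specialist clinic, where perhaps 30% of lesions assessed are cancers:
    print(f"Specialist setting PPV: {ppv(sens, spec, 0.30):.2f}")  # ~0.66

    # Primary care, where perhaps 2% of lesions assessed are cancers:
    print(f"Primary care PPV:       {ppv(sens, spec, 0.02):.2f}")  # ~0.08

In this made-up example, roughly two in three positive results are real cancers in the specialist clinic, but fewer than one in ten in primary care: the same AI, a very different over-referral burden.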

For practical reasons, and to adequately power a study, it is necessary to run evaluation studies in secondary care. For our study, we needed 65 melanomas to ensure a statistically significant result, and collecting that number in GP practices within a reasonable period, say less than three years, would be a significant challenge.
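For readers curious what "powering" a study actually involves, the sketch below shows one common way a required case count can be estimated: how many disease-positive cases you need to pin down sensitivity within a chosen confidence interval. The assumed sensitivity and margin are illustrative only; they are not the parameters or the result of the power calculation our biostatistician performed.

    # Illustrative sketch of a sample-size calculation for estimating a
    # test's sensitivity to a chosen precision (normal approximation).
    # The inputs are assumptions for illustration, not the parameters or
    # result of the power calculation used for the Skin Analytics study.
    from math import ceil
    from statistics import NormalDist

    def cases_needed(expected_sensitivity: float, margin: float,
                     confidence: float = 0.95) -> int:
        """Disease-positive cases needed so the two-sided confidence
        interval around the observed sensitivity is +/- margin."""
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        p = expected_sensitivity
        return ceil(z ** 2 * p * (1 - p) / margin ** 2)

    # e.g. expecting ~90% sensitivity and wanting a 95% CI of +/- 7.5%:
    print(cases_needed(0.90, 0.075))  # -> 62 cases

A real power calculation is more involved than this, but the principle is the same: the rarer or more variable the outcome, the more cases you need, and those cases are far easier to find in secondary care.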

Careful thought then needs to be given to how you can apply results from secondary care to the population presenting at a GP practice. With this concept in mind, Professor Giuseppe Argenziano helped design a feature of our study that really gave our system the chance to fail, and I am extremely proud that it did not.

We’ve been working with primary care providers for the last four years and now have a really good idea of what the data looks like at that level. One of the studies mentioned above is designed to clinically validate what we’ve seen from that work.

Studies had high rates of unevaluable images

There are a number of reasons this may happen, and the detail matters. One example from the studies evaluated was lesions being excluded from the results because the algorithm assigned an equal probability to two outcomes. That is a problem with the technology itself; it should be counted in the evaluation, and not doing so is concerning.

We would never exclude data where the fault clearly lies with the test being evaluated. However, in our prospective study, we did have our own challenges around image acquisition, especially for the digital SLR we used. In our study, nurses were using multiple capture devices connected to a system that stored the images against the data collected for the patient.

The automated data collection component introduced a lag between pressing the image capture button and the image actually being captured, which resulted in a number of poor quality images.

We decided to count these out of the evaluation for two reasons:

  • Firstly, the data capture wasn’t representative of how we capture data for our system. 
  • Secondly, the first step in our system is a check that the image is of sufficient quality; if an image fails that check, the user is required to retake it. We only included data in our assessment that passed that test (an illustrative sketch of what such a check could look like follows below).
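Purely as an illustration of what an automated quality gate can look like, here is a minimal sketch using a simple blur check (variance of the Laplacian) with a made-up threshold. It is not a description of the actual checks in our product; any real gate would combine several checks tuned on real capture data.

    # Hypothetical illustration of an image quality gate: reject blurry
    # images so the user is prompted to retake them. This is NOT the
    # actual check in the Skin Analytics product; the metric and the
    # threshold are assumptions made for the sake of the example.
    import cv2

    BLUR_THRESHOLD = 100.0  # assumed cut-off; a real one would be tuned on capture data

    def passes_quality_check(image_path: str) -> bool:
        """Return True if the image looks sharp enough to analyse."""
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if image is None:
            return False  # an unreadable file counts as a failed capture
        # Variance of the Laplacian: low values indicate a blurry image.
        sharpness = cv2.Laplacian(image, cv2.CV_64F).var()
        return sharpness >= BLUR_THRESHOLD

Only images passing a gate like this would enter the evaluation; failures would prompt the nurse to retake the photo, just as they would for a real patient.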

Summary

To summarise, there are many challenges for innovators looking to build, test and clinically validate a new technology. One of these is that building high quality clinical evidence is neither simple nor a single-study effort.

While I have great sympathy for innovators, the evidence must match the claims being made. The researchers in this BMJ paper have clearly shown that the evidence produced for consumer-facing skin cancer services is not sufficient. 

At Skin Analytics, our strategy involves the use of dermatoscopes and the integration of our AI into skin cancer pathways provided by health providers like the NHS and Vitality Health.

We have invested in the first powered prospective clinical study to evaluate our AI and designed it in a way that introduced real challenges to our system. I am proud of the clinical validation we have, which shows how we can support GPs to better diagnose skin cancer while dramatically reducing the referral rate into hospitals.

We’re not stopping there though, so watch this space for more clinical validation.