Wednesday, 19 December 2012

List of media coverage from recent paper: Weekday Affects Attendance Rate for Medical Appointments...

Ellis DA, Jenkins R (2012) Weekday Affects Attendance Rate for Medical Appointments: Large-Scale Data Analysis and Implications. PLOS ONE 7(12): e51365. doi:10.1371/journal.pone.0051365

The paper is open access; an infographic summarising the main findings of the study can also be found here.

Media Coverage (last update 18/02/2013)

BBC Radio 4. The Today Programme (Flagship early morning news and current affairs)

BBC News Channel. News headlines (National television news and current affairs; rotation)

BBC Scotland. Reporting Scotland. (National television news and current affairs)

BBC Radio Scotland. Good Morning Scotland. (Scottish breakfast radio news)

BBC Radio Ulster. Talkback. (News and on-air reaction to breaking stories)

Radio Clyde. News headlines (National radio news and current affairs; rotation)

Publication: BBC News 
Title: NHS appointments at the start of the week 'more likely to be missed'

Publication: The Herald

Title: Medical 'no shows' cost NHS £600m

Publication: The Evening Times

Title: 'Monday blues' hit Glasgow appointments

Publication: The British Psychological Society
Title: Hospital appointments and the Monday blues

Publication: The Daily Mail 
Title: Why the Monday blues means you should make your appointments for the end of the week

Publication: The Scottish Daily Mail 
Title: The doctor will see you... later in the week 
(print edition only)

Publication: University of Glasgow
Title: Monday blues explain why patients miss hospital and GP appointments

Publication: Teesside University

Title: Tackling the cost of missed GP appointments


Title: More GP appointments missed at start of week

Publication: Zesty

Title: Thousands of NHS appointments 'more likely to be missed' on Mondays

Publication: Simply Health
Title: Monday blues and missed medical appointments linked

Publication: Nursing Times

Title: Monday blues may increase likelihood of DNA

Publication: Commissioning.GP

Title: More DNAs for GP appointments at the beginning of the week

Publication: The British Medical Journal
Title: Appointments later in the week are less likely to be missed, finds study

Publication: MJOG
Title: DNAs increase with "Monday Blues"

Publication: AVIVA
Title: Monday blues 'responsible for missed appointments'

Publication: Healthcare Today
Title: Appointments at the beginning of week more likely to be missed

Publication: BioPortfolio
Title: Appointments at the beginning of week more likely to be missed

Publication: Rights & Wrongs
Title: Medical 'Monday Blues' for missed appointments

Publication: Human Health & Science
Title: 'Monday Blues' hit appointments

Publication: Men's Health
Title: The best time to schedule a doctor's appointment

Publication: Spire Healthcare
Title: Monday blues 'equals no-show' for early week GP appointments

Publication: Glasgow City of Science
Title: Monday blues explain why patients miss hospital and GP appointments

Publication: Child and Maternal Health Observatory
Title: Weekday affects attendance rate for medical appointments

Publication: OnMedica
Title: More appointments missed on Mondays

Publication: My Science
Title: Monday blues explain why patients miss hospital and GP appointments

Publication: Prevention
Title: The best time for a doctor's appointment

Publication: Health Canal
Title: Monday blues explain why patients miss hospital and GP appointments

Publication: Brighton/Bristol/Edinburgh/Glasgow/Liverpool/London Wired
Title: NHS appointments at the start of the week 'more likely to be missed'

Publication: Pathway Software
Title: More DNA's for NHS appointments at the start of the week

Publication: Nursing Personnel
Title: Monday Blues

Publication: Medical Xpress
Title: Monday blues explain why patients miss hospital and GP appointments

Publication: primenetwork
Title: I don't like Mondays (or January)

Publication: NBC News
Title: Worst day for a doctor's appointment is...

Publication: Progress
Title: Patients miss Monday appointments

Publication: The Daily Express
Title: Best day to see the dentist
(print edition only)

Wednesday, 5 December 2012

Why are we still paying for statistical software?

'What programme should I use to analyse this data?'

About ten years ago there was little choice and expensive software would have arrived in a box containing a CD-ROM!

I still have SPSS and MATLAB in my applications folder. They no longer arrive on CDs but are installed from a central university server. Like CDs, however, these programmes are on the verge of becoming redundant.

Given the choice of free tools available today, how are commercial alternatives going to survive? IBM acquired SPSS a few years back for $1.2 billion, which I am not convinced was a particularly smart move. 

Psychologists typically want to test predictions, visualise data and produce models. That said, additional functionality can often be required quickly and unexpectedly as a research project or idea develops. An open-source community allows for a flexibility that paid alternatives do not offer (yet).

The basic SPSS package has barely changed in the last decade, which is a long time for a piece of software. Microsoft took a similar approach with Windows XP, which stalled development and led to several high-profile disasters. Did anyone ever have anything good to say about Windows ME or Windows Vista?

Extra functionality beyond the basic SPSS package will involve an additional financial outlay. Alternatively, you could just go and get a free library for R that does the same thing and improves on a monthly rather than annual basis.

IBM and MathWorks are responding to these new developments by simply ignoring them. The record industry took a similar approach to the MP3 file. By the time they acknowledged its existence the whole distribution of music had changed beyond recognition.

Apple took a different approach with software development. 

Developer kits cost hundreds, if not thousands, of pounds 10 or 15 years ago. Now Apple makes the iPhone development kit available for free and profits from the creativity that this propagates. They have successfully adapted their business model to fit a change in consumer behaviour.

I am not suggesting that commercial statistical programmes should be given away for free, but history suggests that no change at all is likely to result in long-term obscurity. Given their resources, IBM should have opened SPSS up to user development years ago and taken a similar approach, where people could pay a small fee for well-developed home-brew modules.

Some clever person will develop a graphical user interface (GUI) for R that gives it the same point-and-click functionality as SPSS. It will cost the end user zero pounds. RStudio, for example, already provides an interface very similar to MATLAB's.

What then for SPSS?

Wednesday, 14 November 2012

At the pub with a time series

A sequence of measurements of some quantity over time is termed a time series. For example, a plot of the number of births per month in New York City from 1946-1960 shows a peak each summer (seasonality). There is also long-term growth in the number of births each year (a trend).

The pilot data below comes from an accelerometer showing the amount of movement produced by an individual over time. This sensor is worn around the neck and produces a data point every second.

To keep things interesting, I should mention that the individual wearing this sensor was at the pub for around 2 hours and while there consumed 2 units of alcohol. The rapid spike at the start shows them walking down the stairs to the bar (conveniently located below the School of Psychology).

Due to the large number of data points and variation within the data, it is almost impossible to get an idea of what is going on as the evening progresses (if anything!). Although we might predict that visits to the bar would show a spike in movement, we might also expect a general increase in movement as more alcohol is consumed.

Data Smoothing*

The presence of noise is a common feature in most time series - that is, random (or apparently random) changes in the quantity of interest. This means that removing noise, or at least reducing its influence, is of particular importance. In other words, we want to smooth the signal.

The simplest smoothing algorithm possible is often referred to as the running, moving, or floating average.

The idea is relatively straightforward: for any odd number of consecutive points, replace the centre value with the average of all the points in the window. It is possible to adjust the number of points the function averages over - the output below shows the original data and a running average using 50, 100 or 200 points.
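A minimal version of this in base R (the three-point window here is just for illustration; the plots below use 50, 100 and 200 points):

```r
# Centred running (moving) average: each point is replaced by the
# mean of the k points in a window centred on it (k must be odd).
running_avg <- function(x, k) {
  stats::filter(x, rep(1 / k, k), sides = 2)
}

x <- c(1, 2, 9, 4, 5, 6, 7)
running_avg(x, 3)  # NA at each end, where the window is incomplete
```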

However, this can have a rather serious drawback.

When a signal has some sudden jumps and occasional large spikes, the moving average is abruptly distorted. One way to avoid this problem is to instead use a weighted moving average, which places less weight on the points at the edge of the smoothing window. Using a weighted average, any new point that enters the smoothing window is only gradually added to the average and gradually removed over time.
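One way to build such a weighted average is with a symmetric triangular window (a sketch; the exact weighting scheme is a free choice, and the TTR library mentioned in the footnote offers ready-made alternatives):

```r
# Weighted moving average: points near the centre of the window
# get the most weight, points at its edges the least.
weighted_ma <- function(x, k) {
  half <- (k - 1) / 2
  w <- (half + 1) - abs(-half:half)  # e.g. k = 5 gives weights 1 2 3 2 1
  stats::filter(x, w / sum(w), sides = 2)
}

x <- c(1, 2, 9, 4, 5, 6, 7)
weighted_ma(x, 3)  # points enter and leave the window gradually
```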

That said, in the case of my pilot data, it doesn't make a huge difference to the end result. Probably because the resolution of the original data is very high.

So what can we conclude about this individual at the pub? Well, it looks like there is a general trend suggesting that movement increases as they make additional trips to the bar. The spikes in the WMA200 perfectly illustrate those visits, which start to look like an additional seasonal component.

But is it possible to predict what their pattern of movement might look like after a third drink?

Maybe, but any future forecast could easily place them back at the pub.

*note: the appropriate R code and information on the TTR moving averages library can be found here

Sunday, 28 October 2012

Q: Google's biggest problem today? A: Non-existent customer service

Despite the recent unveiling of the new iPad mini, I would maintain that Google's Nexus 7 tablet represents superior value for money. Granted, it weighs a little more and lacks a rear camera, but it's over £100 cheaper and that includes a quad-core processor!

But therein lies a fatal flaw. While the likes of Apple and Amazon have customer service down to a fine art, Google appears to be living on another planet. Imagine an Amazon order that arrived broken, faulty or late. In almost every case, they would immediately despatch a replacement or issue a refund. Problem solved.

My recent experience with Google has taken over a month to reach a similar conclusion.

Part 1: The Order*

I placed an order for a Google Nexus 7 Tablet on the 18th of September. Money was taken from my account and my tablet dispatched with an accompanying TNT tracking number. Unfortunately, this tracking number was invalid. After several days I phoned Google customer service to make some enquiries.

The customer service agent informed me that the tracking number provided was incorrect. I knew this already. The customer service advisor then 'advised' me to phone TNT to retrieve the correct tracking number. I suggested that it might make more sense for them to contact TNT on my behalf and call me back. The advisor didn't like this idea. After some negotiation, he put me on hold. Five minutes later, he terminated the call.

Lesson 1 in customer service for Google: when you make a mistake as a retailer, do not expect the customer to correct that mistake.

So anyway, I phoned TNT and finally got my tracking number. It turned out they had attempted delivery and would not be able to deliver again until the following Monday. This was unfortunate as the tablet was meant to be a gift and so I asked TNT to return the tablet to Google. I then swiftly emailed Google who confirmed that I would receive a full refund after the item had been returned. This would take between 1 and 14 days.

TNT's tracking confirmed that Google received the item on the 24th of September, but 14 days elapsed and still no refund appeared. Google then stopped replying to emails.

Additional lessons in customer service: 
-Take a leaf out of Amazon's book. When an item is returned unopened, a refund should be processed within 48 hours.
-A retailer must fulfil promises made to customers.
-Most large online retailers reply to customer emails within 4 working hours.

Finally, I again phoned customer service, who informed me that someone would look into this asap. This annoyed me further, as I wondered how long they would have kept my money if I hadn't got in touch. Clearly, no one had done anything until I picked up the phone. However, to their credit, I received an email on the 16th of October (the next day) to confirm my refund had been processed.

This refund took until the 27th of October to appear in my account.

A final lesson in customer service: It should never take an entire month to process a refund.

Google has a big problem here. They appear almost uninterested in serving customers. At every stage their service is slow, inefficient and, on one occasion, just plain rude!

Isn't it ironic that I am perfectly happy with (and very grateful for) the free services that Google provides: calendar, email, search, web analytics... the list goes on. But after spending some money with a company that I thought I could trust, it all went very wrong. How annoying.

*The observant amongst you will notice that there is no 'Part 2'. This is because I have yet to risk a second purchase!

Sunday, 14 October 2012

Kernel Density Plots: Has the histogram had its day?

Simple statistical concepts include the mean, median, standard deviation, and percentiles. These are useful for summarising data, but only under certain circumstances. When basic assumptions are not met, any conclusions based on simple summary statistics are likely to be inaccurate. Worse, the numbers can often look perfectly reasonable while giving no hint that anything is wrong.

Let's consider a sample of 64 reaction time observations (in milliseconds):

Mean = 387ms
Median = 340ms

These look fine until you view the distribution, which is not unimodal.
Despite being a staple in data visualisation, histograms can often be a poor method for determining the shape of a distribution because they are strongly affected by the number of bins used. For example, visualising the same data with only four bins can make the same observations appear normally distributed. Similarly, a box-plot can also hide data irregularities and, like histograms, they do not handle outliers gracefully. 
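The bin effect is easy to demonstrate in R. The bimodal sample below is simulated for illustration (it is not the paper's actual data), but it matches the set-up above: 64 observations with a mean around 390 ms.

```r
set.seed(1)
# Simulated bimodal 'reaction times': two underlying response modes.
rt <- c(rnorm(40, mean = 300, sd = 30), rnorm(24, mean = 550, sd = 40))

hist(rt, breaks = 20)  # clearly bimodal
hist(rt, breaks = 4)   # the same data can look roughly normal
```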

Histograms are, however, easy to understand and calculate. But the fact that something is easy and popular doesn't make it right.

So is there an alternative?

Well, yes, there are several, but I think kernel density plots (KDPs) are a more effective way to illustrate the distribution of a variable. This is now surprisingly easy to do. To form a KDP*, a kernel - that is, a smooth, strongly peaked function - is placed at the position of each data point. The contributions from all kernels are added to obtain a smooth curve, which can be evaluated at any point along the x-axis.
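The sum-of-kernels idea can be sketched directly in R. (The Gaussian kernel, the toy reaction times and the bandwidth of 30 ms are all assumptions for illustration; in practice R's built-in density() does this for you.)

```r
# Kernel density estimate by hand: centre a Gaussian kernel on each
# data point, then average the contributions at every grid position.
kde <- function(grid, data, h) {
  sapply(grid, function(g) mean(dnorm((g - data) / h)) / h)
}

rt <- c(250, 270, 300, 310, 340, 520, 550, 580)  # toy reaction times (ms)
grid <- seq(150, 700, length.out = 200)
dens <- kde(grid, rt, h = 30)
# plot(grid, dens, type = "l")  # a smooth, clearly bimodal curve
```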

Taking the same data set, a kernel density plot is interpreted in a similar manner to a histogram, but avoids the problems outlined earlier.

KDPs require the power of a relatively modern computer; they cannot be done 'by hand'. Computing them is only practical thanks to the accessibility of modern computing, which in turn provides a new way to think about data. They are best reserved for larger data sets, however, as the smoothing can produce misleading artefacts with small samples. There is also a danger in changing things just for the sake of it!

'The purpose of computing is insight, not pictures.'
L. N. Trefethen

Update 15/10/2012
A colleague at work pointed out today that kernel functions come in a variety of flavours.

Update 13/10/2013**
Bandwidth may need to be adjusted in some instances. However, R is pretty good at selecting a default that will accommodate most data sets.

*To produce a kernel density estimate for a given variable 'x' and plot it in R, use the following code:

d <- density(x) # returns the density data
plot(d, main='Kernel Density Plot') # plots the graph

**to specify bandwidth, pass bw to density() rather than plot():

d <- density(x, bw=#) # returns the density data at the chosen bandwidth
plot(d, main='Kernel Density Plot') # plots the graph

Saturday, 8 September 2012

The Hexaco Personality Inventory - SPSS Script

I am currently using the 60-item version of the Hexaco-PI-R personality inventory and decided to write a short script for SPSS to help speed up the coding process. I have posted it below because I couldn't find anyone else who had posted one online.

All items should be labelled as separate numeric variables: hexaco1, hexaco2, hexaco3, etc.

The script computes and prints the results for all reverse-scored items and then calculates factor and facet scores. It will also produce Cronbach's alpha coefficients for each factor.

The original scoring key for the HEXACO-PI-R can be found here.


*Part 1 - reverse scoring of specific items.


COMPUTE rhexaco30 = 6 - hexaco30.
COMPUTE rhexaco12 = 6 - hexaco12.
COMPUTE rhexaco60 = 6 - hexaco60.
COMPUTE rhexaco42 = 6 - hexaco42.
COMPUTE rhexaco24 = 6 - hexaco24.
COMPUTE rhexaco48 = 6 - hexaco48.


COMPUTE rhexaco53 = 6 - hexaco53.
COMPUTE rhexaco35 = 6 - hexaco35.
COMPUTE rhexaco41 = 6 - hexaco41.
COMPUTE rhexaco59 = 6 - hexaco59.


COMPUTE rhexaco28 = 6 - hexaco28.
COMPUTE rhexaco52 = 6 - hexaco52.
COMPUTE rhexaco10 = 6 - hexaco10.
COMPUTE rhexaco46 = 6 - hexaco46.


COMPUTE rhexaco9 = 6 - hexaco9.
COMPUTE rhexaco15 = 6 - hexaco15.
COMPUTE rhexaco57 = 6 - hexaco57.
COMPUTE rhexaco21 = 6 - hexaco21.


COMPUTE rhexaco26 = 6 - hexaco26.
COMPUTE rhexaco32 = 6 - hexaco32.
COMPUTE rhexaco14 = 6 - hexaco14.
COMPUTE rhexaco20 = 6 - hexaco20.
COMPUTE rhexaco44 = 6 - hexaco44.
COMPUTE rhexaco56 = 6 - hexaco56.


COMPUTE rhexaco1 = 6 - hexaco1.
COMPUTE rhexaco31 = 6 - hexaco31.
COMPUTE rhexaco49 = 6 - hexaco49.
COMPUTE rhexaco19 = 6 - hexaco19.
COMPUTE rhexaco55 = 6 - hexaco55.


*Part 2 - calculating factor and facet scores.

COMPUTE Honesty_Humility = (hexaco6+hexaco54+hexaco36+hexaco18+rhexaco30+rhexaco12+rhexaco60+rhexaco42+rhexaco24+rhexaco48)/10.
COMPUTE Sincerity = (hexaco6+hexaco54+rhexaco30)/3.
COMPUTE Fairness = (hexaco36+rhexaco12+rhexaco60)/3.
COMPUTE Greed_Avoidance = (hexaco18+rhexaco42)/2.
COMPUTE Modesty = (rhexaco24+rhexaco48)/2.

COMPUTE Emotionality = (hexaco5+hexaco29+hexaco11+hexaco17+hexaco23+hexaco47+rhexaco53+rhexaco35+rhexaco41+rhexaco59)/10.
COMPUTE Fearfulness = (hexaco5+hexaco29+rhexaco53)/3.
COMPUTE Anxiety = (hexaco11+rhexaco35)/2.
COMPUTE Dependence = (hexaco17+rhexaco41)/2.
COMPUTE Sentimentality = (hexaco23+hexaco47+rhexaco59)/3.

COMPUTE Extraversion = (hexaco4+hexaco34+hexaco58+hexaco16+hexaco40+hexaco22+rhexaco28+rhexaco52+rhexaco10+rhexaco46)/10.
COMPUTE Social_Self_esteem = (hexaco4+rhexaco28+rhexaco52)/3.
COMPUTE Social_Boldness = (hexaco34+hexaco58+rhexaco10)/3.
COMPUTE Sociability = (hexaco16+hexaco40)/2.
COMPUTE Liveliness = (hexaco22+rhexaco46)/2.

COMPUTE Agreeableness = (hexaco3+hexaco27+hexaco33+hexaco51+hexaco39+hexaco45+rhexaco9+rhexaco15+rhexaco57+rhexaco21)/10.
COMPUTE Forgiveness = (hexaco3+hexaco27)/2.
COMPUTE Gentleness = (hexaco33+hexaco51+rhexaco9)/3.
COMPUTE Flexibility = (hexaco39+rhexaco15+rhexaco57)/3.
COMPUTE Patience = (hexaco45+rhexaco21)/2.

COMPUTE Conscientiousness = (hexaco2+hexaco8+hexaco38+hexaco50+rhexaco26+rhexaco32+rhexaco14+rhexaco20+rhexaco44+rhexaco56)/10.
COMPUTE Organization = (hexaco2+rhexaco26)/2.
COMPUTE Diligence = (hexaco8+rhexaco32)/2.
COMPUTE Perfectionism = (hexaco38+hexaco50+rhexaco14)/3.
COMPUTE Prudence = (rhexaco20+rhexaco44+rhexaco56)/3.

COMPUTE Openness_to_Experience = (hexaco25+hexaco7+hexaco13+hexaco37+hexaco43+rhexaco1+rhexaco31+rhexaco49+rhexaco19+rhexaco55)/10.
COMPUTE Aestheic_Appreciation = (hexaco25+rhexaco1)/2.
COMPUTE Inquisitiveness = (hexaco7+rhexaco31)/2.
COMPUTE Creativity = (hexaco13+hexaco37+rhexaco49)/3.
COMPUTE Unconventionality = (hexaco43+rhexaco19+rhexaco55)/3.


*Part 3 - calculating reliability scores (Cronbach's alpha) for each factor.

*Honesty-Humility.
RELIABILITY
  /VARIABLES=hexaco6 hexaco54 hexaco36 hexaco18 rhexaco30 rhexaco12 rhexaco60 rhexaco42 rhexaco24 rhexaco48
  /SCALE('Honesty_Humility') ALL
  /MODEL=ALPHA.

*Emotionality.
RELIABILITY
  /VARIABLES=hexaco5 hexaco29 hexaco11 hexaco17 hexaco23 hexaco47 rhexaco53 rhexaco35 rhexaco41 rhexaco59
  /SCALE('Emotionality') ALL
  /MODEL=ALPHA.

*Extraversion.
RELIABILITY
  /VARIABLES=hexaco4 hexaco34 hexaco58 hexaco16 hexaco40 hexaco22 rhexaco28 rhexaco52 rhexaco10 rhexaco46
  /SCALE('Extraversion') ALL
  /MODEL=ALPHA.

*Agreeableness.
RELIABILITY
  /VARIABLES=hexaco3 hexaco27 hexaco33 hexaco51 hexaco39 hexaco45 rhexaco9 rhexaco15 rhexaco57 rhexaco21
  /SCALE('Agreeableness') ALL
  /MODEL=ALPHA.

*Conscientiousness.
RELIABILITY
  /VARIABLES=hexaco2 hexaco8 hexaco38 hexaco50 rhexaco26 rhexaco32 rhexaco14 rhexaco20 rhexaco44 rhexaco56
  /SCALE('Conscientiousness') ALL
  /MODEL=ALPHA.

*Openness to experience.
RELIABILITY
  /VARIABLES=hexaco25 hexaco7 hexaco13 hexaco37 hexaco43 rhexaco1 rhexaco31 rhexaco49 rhexaco19 rhexaco55
  /SCALE('Openness_to_Experience') ALL
  /MODEL=ALPHA.

Sunday, 2 September 2012

Network analysis: Where are you in my social network?

Michael Salter-Townshend talks extensively about the merits of understanding your own and other online communities in this month's Royal Statistical Society magazine. After following his advice, I have discovered that it is surprisingly easy to download your own Facebook data and see which of your friends form connected groups.

Several apps allow you to download 'raw' Facebook data in a format that suits almost any statistical package. I used NameGenWeb. The resulting file can then be imported into a variety of packages; I chose Gephi for this example.

My unprocessed Facebook network looks like this...
Each dot (or node) is a friend and the lines show friendship connections between each individual. 

To make things manageable, I ran a cluster analysis to look for groups of people who are more connected to each other. This quickly produced three distinct groups. The larger circles represent clusters of 3 or more people who share many connections.
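The same kind of community detection can be sketched in R with the igraph package (the names and edges below are invented for illustration; Gephi's modularity tool uses a closely related algorithm):

```r
library(igraph)

# Toy friendship network: two tight triangles joined by one bridge.
edges <- matrix(c("Ann","Bob",  "Bob","Cat",  "Ann","Cat",
                  "Dan","Eve",  "Eve","Fay",  "Dan","Fay",
                  "Cat","Dan"),
                ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edges, directed = FALSE)

comms <- cluster_louvain(g)  # modularity-based community detection
membership(comms)            # which cluster each friend falls into
```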

The dark purple, yellow and pink nodes are difficult to categorize because they don't fit well into any of the defined groups. Those who sit in the middle could be described as being at the epicentre of my [Facebook] existence because they have strong links with all three clusters. Each individual can be identified from this model, but I opted to remove the name tags for clarity.

Of course, this network is virtual and constantly changing, as it relies on the behavioural patterns of nearly 350 individuals. For many people, the resulting network may not reflect their real-life social interaction. For example, I almost never see people in the school cluster, with the exception of one large orange circle. This represents a minority of 3 close friends from school with whom I continue to socialise on a regular basis.

There are many applications for this type of analysis, particularly when it comes to comparing real life and online social interaction. Other research has started to suggest that Facebook and Twitter status updates may also help predict personality. How these networks change and evolve over time (assuming Facebook is still around in 20 years) would presumably give a valuable insight into how friendship groups change as we age.

As you can probably imagine, this vast amount of information has already become a valuable resource for future employers and recruitment agencies!

Monday, 20 August 2012

Do expensive HDMI cables matter?

Having been on the hunt for a new CD player (yes, some people still listen to CDs!), I did the usual browse of hi-fi magazines and websites to help guide me towards what might be an improvement over my 18-year-old Rotel.

That said, it is becoming increasingly difficult to trust any review when some magazines describe a £300 HDMI cable as sounding 'controlled and composed'.

This is a cable that carries a digital signal - digital meaning 1s and 0s. By that logic, a more expensive ethernet cable linking your computer to a network should also result in a more 'controlled and composed' internet experience. It won't.

In the digital domain, the correct information is either received or it isn't. I have been unable to find any scientific evidence suggesting that a difference in picture quality can be detected between an HDMI cable costing £20 or £200. What I have found is a lot of anecdotal evidence from people who have invested in these cables. See cognitive dissonance and/or the placebo effect.

Surely, if a company wants to sell a cable for £200, they would do well to demonstrate scientifically that it delivers something? If hair conditioner costing a couple of pounds can manage that, surely an expensive HDMI cable can do the same.

Of course, the companies that manufacture these cables can't, because it [*probably] won't show the desired result. Instead they rely on claims and awards from publications that repeatedly tell readers how expensive HDMI cables can deliver improvements that:

'are immediately apparent, especially with sound quality. Dynamically, it's amongst the best HDMI cables around'.

The same can be said about the way hi-fi journalists talk about equipment racks that 'serve up a muscular sound'. My hi-fi manages a similar sound...while sitting snugly on the floor.

On the other hand, I am aware that there is a solid argument when it comes to investing in a decent cable that carries an analogue signal, which is more vulnerable to interference. Speaker cable is a good example.

The real problem I have with these publications is that it is now almost impossible to trust any review. If they seriously believe that a digital cable costing hundreds of pounds can improve an image, then presumably their judgement of everything else is also impaired.

One obvious way to prove me wrong would be for a publication to run an experiment where cables are compared in a double blind study. But a hi-fi magazine will never do that for obvious reasons.

I might however.

*This is an untested hypothesis.