Groups patient records from multiple isolates under a single integer patient ID by matching patient identifiers.
Grouping is based on five stages:
1. NHS number & Date of Birth
2. Hospital number & Date of Birth
3. NHS number & Hospital number
4. Date of Birth & Surname, if the NHS number is unknown
5. Sex & Date of Birth & Fuzzy Name
Where an identifier is missing or invalid on a record, it is copied over from the other records in the same group.
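The staged matching described above can be sketched in base R. This is a minimal illustration, not the package's implementation: merge_ids_by_key() and the toy records table are hypothetical, only the first two stages are shown, and a single pass like this does not resolve every chain of matches across stages.

```r
# Hypothetical helper (not part of the package): rows that share a
# non-missing key take the smallest id currently assigned to that key.
merge_ids_by_key <- function(id, key) {
  ok <- !is.na(key)
  id[ok] <- ave(id[ok], key[ok], FUN = min)
  id
}

# Toy line list, made up for illustration
records <- data.frame(
  nhs  = c("9434765919", "9434765919", NA, NA),
  hosp = c("13", "13", "13", "96"),
  dob  = as.Date(c("1988-10-06", "1988-10-06", "1988-10-06", "2020-01-28")),
  stringsAsFactors = FALSE
)

# Start with every record as its own patient
id <- seq_len(nrow(records))

# Stage 1: NHS number & Date of Birth
key1 <- ifelse(is.na(records$nhs), NA, paste(records$nhs, records$dob))
id <- merge_ids_by_key(id, key1)

# Stage 2: Hospital number & Date of Birth
key2 <- ifelse(is.na(records$hosp), NA, paste(records$hosp, records$dob))
id <- merge_ids_by_key(id, key2)

id  # 1 1 1 4
```

Rows 1 and 2 are linked by NHS number in the first stage; row 3 has no NHS number and is only linked in the second stage through its hospital number.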
uk_patient_id(
  x,
  nhs_number,
  hospital_number,
  date_of_birth,
  sex_mfu,
  forename = "NONAME",
  surname = "NONAME",
  .sortOrder,
  .keepValidNHS = FALSE,
  .forceCopy = FALSE,
  .experimental = FALSE
)
x
a data.frame or data.table containing the cleaned line list
nhs_number
the name, as a character string, of the column containing the patient NHS numbers
hospital_number
the name, as a character string, of the column containing the patient hospital numbers
date_of_birth
the name, as a character string, of the column containing the patient date of birth, stored as a date variable
sex_mfu
the name, as a character string, of the column containing the patient sex; NOTE: only works if sex is coded as character versions of Male/Female/Unknown; additional options are not currently supported (planned for a future update)
forename
the name, as a character string, of the column containing the patient forename; leave as NONAME if unavailable
surname
the name, as a character string, of the column containing the patient surname; leave as NONAME if unavailable
.sortOrder
optional; the name, as a character string, of a column used to set the sort order for the id generation
.keepValidNHS
optional, default FALSE; set TRUE to retain the column storing the NHS checksum result as a BOOLEAN
.forceCopy
optional, default FALSE; TRUE will force data.table to take a copy instead of editing the data by reference
.experimental
optional, default FALSE; TRUE will enable the experimental features for recoding NA values based on the mode
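As a rough illustration of what recoding NA values based on the mode could look like, the sketch below fills missing values with the most frequent non-missing value within each patient group. It is an assumption about the general idea only: mode_value() and fill_na_with_mode() are hypothetical helpers and do not reflect how the experimental option is implemented.

```r
# Hypothetical helper: most frequent non-missing value, or NA if there is none.
# Ties go to the first value in sorted order.
mode_value <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_character_)
  names(which.max(table(v)))
}

# Hypothetical helper: replace NA in `value` with the mode of its `id` group.
fill_na_with_mode <- function(value, id) {
  modes <- tapply(value, id, mode_value)
  out <- value
  miss <- is.na(out)
  out[miss] <- modes[as.character(id[miss])]
  out
}

sex <- c("F", "F", NA, "M", NA)
id  <- c(1, 1, 1, 2, 2)

filled <- fill_na_with_mode(sex, id)
filled  # "F" "F" "F" "M" "M"
```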
A dataframe with one new variable, plus a second if requested:
id
a unique patient id
valid_nhs
if retained using argument .keepValidNHS = TRUE, a BOOLEAN containing the result of the NHS checksum validation
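For context, NHS numbers carry a Modulus 11 check digit. The sketch below shows that standard check in base R; nhs_checksum_ok() is a hypothetical helper written for illustration and is not the function the package uses internally. Treating missing or non-numeric values as invalid is an assumption of this sketch.

```r
# Hypothetical helper illustrating the standard Modulus 11 check
nhs_checksum_ok <- function(nhs) {
  nhs <- as.character(nhs)
  vapply(nhs, function(n) {
    # Assumption: anything that is not exactly 10 digits is invalid
    if (is.na(n) || !grepl("^[0-9]{10}$", n)) return(FALSE)
    d <- as.integer(strsplit(n, "")[[1]])
    # Weight the first nine digits by 10 down to 2, then take 11 - (sum mod 11)
    check <- 11 - (sum(d[1:9] * 10:2) %% 11)
    if (check == 11) check <- 0
    # A check value of 10 can never be valid
    check != 10 && check == d[10]
  }, logical(1), USE.NAMES = FALSE)
}

nhs_checksum_ok(c("9434765919", "9434765918", NA, "UNKNOWN"))
# TRUE FALSE FALSE FALSE
```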
# Example line list: 13 isolates with inconsistent identifiers
id_test <- data.frame(
  nhs_n = c(
    9434765919, 9434765919, 9434765919, NA, NA,
    3367170666, 5185293519, 5185293519, 5185293519, 8082318562, NA, NA, NA
  ),
  hosp_n = c(
    '13', '13', '13', 'UNKNOWN', '13', '13', '13', '31', '31', '96', '96', NA, '96'
  ),
  sex = c(rep('F', 6), rep('Male', 4), 'U', 'U', 'M'),
  dateofbirth = as.Date(
    c(
      '1988-10-06', '1988-10-06', '1900-01-01', '1988-10-06', '1988-10-06',
      '1988-10-06', '1988-10-06', '1988-10-06', '1988-10-06', '2020-01-28',
      '2020-01-28', '2020-01-28', '2020-01-28'
    )
  ),
  firstname = c(
    'Danger', 'Danger', 'Denger', 'Danger', 'Danger', 'DANGER', 'Danger',
    'Danger', 'Danger', 'Crazy', 'Crazy', 'Krazy', 'C'
  ),
  lastname = c(
    'Mouse', 'Mause', 'Mouse', 'Moose', 'Moose', 'Mouse', 'MOUSe',
    'Mouse', 'Mouse', 'Frog', 'FROG', 'Frug', 'Frog'
  ),
  # test dates are sampled at random from the last 21 days, so they differ on each run
  testdate = sample(seq.Date(Sys.Date() - 21, Sys.Date(), "day"), 13, replace = TRUE)
)

# Columns are passed by name; the trailing [] prints the result
uk_patient_id(
  x = id_test,
  nhs_number = 'nhs_n',
  hospital_number = 'hosp_n',
  forename = 'firstname',
  surname = 'lastname',
  sex_mfu = 'sex',
  date_of_birth = 'dateofbirth',
  .sortOrder = 'testdate'
)[]
#>     id      nhs_n  hosp_n  sex dateofbirth firstname lastname   testdate
#>  1:  1         NA      13    F  1988-10-06    DANGER    MOOSE 2022-06-22
#>  2:  1 5185293519      13    M  1988-10-06    DANGER    MOUSE 2022-06-24
#>  3:  1 9434765919      13    F  1988-10-06    DANGER    MAUSE 2022-06-28
#>  4:  1         NA UNKNOWN    F  1988-10-06    DANGER    MOOSE 2022-06-30
#>  5:  1 3367170666      13    F  1988-10-06    DANGER    MOUSE 2022-06-30
#>  6:  1 9434765919      13    F  1900-01-01    DENGER    MOUSE 2022-07-03
#>  7:  1 9434765919      13    F  1988-10-06    DANGER    MOUSE 2022-07-11
#>  8:  2         NA    <NA> <NA>  2020-01-28     KRAZY     FRUG 2022-06-22
#>  9:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-02
#> 10:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-08
#> 11:  9         NA      96 <NA>  2020-01-28     CRAZY     FROG 2022-07-05
#> 12:  9         NA      96    M  2020-01-28         C     FROG 2022-07-05
#> 13:  9 8082318562      96    M  2020-01-28     CRAZY     FROG 2022-07-10