I had a list of species names written up by a colleague, but the colleague had abbreviated subsequent adjacent instances of a genus in the list to the first letter of the genus with a dot after it, which is common in written prose, but is pretty daft in a dataset.
Instead of going through and manually writing in all the genus names, I wrote a function in R to do it for me:
fill.genus <- function(x, abbrev = "."){
rel_enc <- rle(as.character(x))
empty <- which(grepl("\\.", rel_enc$value))
rel_enc$values[empty] <- rel_enc$value[empty-1]
inverse.rle(rel_enc)
}
So if the dataset looks like this:
genus <- c("Tapiphyllum",
"Terminalia",
"T.",
"Tortuga",
"T.",
"Vangueriopsis",
"V.",
"V.",
"Xeroderris",
"Xylopia",
The output of fill.genus(genus)
would look like:
c("Tapiphyllum",
"Terminalia",
"Terminalia",
"Tortuga",
"Tortuga",
"Vangueriopsis",
"Vangueriopsis",
"Vangueriopsis",
"Xeroderris",
"Xylopia",