mixedSort {jamba} | R Documentation
Sort alphanumeric values, keeping numeric values in proper order
Description
Sort alphanumeric values, keeping numeric values in proper order.
Usage
mixedSort(
x,
blanksFirst = TRUE,
na.last = NAlast,
keepNegative = FALSE,
keepInfinite = FALSE,
keepDecimal = FALSE,
ignore.case = TRUE,
useCaseTiebreak = TRUE,
honorFactor = FALSE,
sortByName = FALSE,
verbose = FALSE,
NAlast = TRUE,
...
)
Arguments
x |
|
blanksFirst |
|
na.last |
|
keepNegative |
|
keepInfinite |
|
keepDecimal |
|
ignore.case |
|
useCaseTiebreak |
|
honorFactor |
|
sortByName |
|
verbose |
|
NAlast |
|
... |
additional parameters are sent to |
Details
This function is a refactor of gtools mixedsort(), a clever bit of R coding from the gtools package. It was extended to make it slightly faster, and to handle special cases slightly differently.
It was driven by the need to sort gene symbols, miRNA symbols, and chromosome names, all with proper numeric order, for example:
- test set: miR-12,miR-1,miR-122,miR-1b,miR-1a
- gtools::mixedsort: miR-122,miR-12,miR-1,miR-1a,miR-1b
- mixedSort: miR-1,miR-1a,miR-1b,miR-12,miR-122
By default the function does not recognize negative numbers as negative; instead it treats '-' as a delimiter, unless keepNegative=TRUE.
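The difference between treating '-' as a delimiter and as a sign can be illustrated with a minimal base R sketch. It does not call mixedSort(); the variable names asDelimiter and asNegative are made up for this illustration, and the regular expressions are an assumption about how the two readings could be expressed, not the package's own code.

```r
x <- c("miR-12", "miR-1", "miR-122")

# '-' as a delimiter: capture the digits only, so values are 12, 1, 122
asDelimiter <- as.numeric(regmatches(x, regexpr("[0-9]+", x)))

# '-' as a sign: capture an optional leading minus, so values are -12, -1, -122
asNegative <- as.numeric(regmatches(x, regexpr("-?[0-9]+", x)))

x[order(asDelimiter)]  # "miR-1" "miR-12" "miR-122"
x[order(asNegative)]   # "miR-122" "miR-12" "miR-1"
```

Reading the hyphen as a sign reverses the order, which is why the delimiter reading suits identifiers such as miRNA names.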
With keepDecimal=TRUE, this function also attempts to maintain '.' as part of a decimal number, which can be problematic when sorting IP addresses, for example.
This function is really just a wrapper function for mixedOrder(), which does the work of defining the appropriate order.
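The wrapper relationship can be pictured with a small base R sketch. sketchSort() and its orderFun argument are hypothetical names used only here; the sketch assumes that a sort wrapper simply subsets its input by the result of an ordering function, and it makes no claim about the actual body of mixedSort().

```r
# Hypothetical stand-in for a sort wrapper: compute an order, then subset by it.
sketchSort <- function(x, orderFun = order, ...) {
  x[orderFun(x, ...)]
}

sketchSort(c(3, 1, 2))                     # 1 2 3
sketchSort(c(3, 1, 2), decreasing = TRUE)  # 3 2 1
```

Keeping the ordering logic in one function means the same rules can be reused wherever an order vector, rather than a sorted vector, is needed.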
The sort logic is roughly as follows:
- Split each term into alternating chunks containing character or numeric substrings, split across columns in a matrix.
- Apply appropriate ignore.case logic to the character substrings, effectively applying toupper() on substrings.
- Define rank order of character substrings in each matrix column, maintaining ties to be resolved in subsequent columns.
- Convert character to numeric ranks via factor intermediate, defined higher than the highest numeric substring value.
- When ignore.case=TRUE and useCaseTiebreak=TRUE, an additional tiebreaker column is defined using the character substring values without applying toupper().
- A final tiebreaker column is the input string itself, with toupper() applied when ignore.case=TRUE.
- Apply order across all substring columns.
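The steps above can be sketched in base R. This is a simplified illustration under stated assumptions, not the jamba implementation: sketchMixedOrder() and rankColumn() are hypothetical names, only unsigned integer chunks are recognized, NA values are not handled, blank chunks always sort first, and the per-column case tiebreak is folded into a single final tiebreak on the original string.

```r
# Hypothetical helper: a simplified chunk-and-rank ordering in base R.
sketchMixedOrder <- function(x, ignore.case = TRUE) {
  x <- as.character(x)
  if (length(x) == 0) return(integer(0))
  key <- if (ignore.case) toupper(x) else x

  # Split each string into alternating digit and non-digit chunks
  chunks <- regmatches(key, gregexpr("[0-9]+|[^0-9]+", key))
  n <- max(lengths(chunks), 1L)

  # One row per string, one column per chunk; "" pads short rows
  m <- do.call(rbind, lapply(chunks, function(i) c(i, rep("", n - length(i)))))

  rankColumn <- function(v) {
    out <- rep(-1, length(v))                 # blank chunks sort first
    isNum <- grepl("^[0-9]+$", v)
    isChr <- nzchar(v) & !isNum
    out[isNum] <- as.numeric(v[isNum])
    top <- max(c(0, out[isNum]))              # highest numeric value in the column
    lev <- sort(unique(v[isChr]), method = "radix")
    out[isChr] <- top + match(v[isChr], lev)  # character ranks sit above numbers
    out
  }
  keys <- lapply(seq_len(ncol(m)), function(j) rankColumn(m[, j]))

  # Final tiebreakers: the case-folded string, then the original string
  do.call(order, c(keys, list(key, x, method = "radix")))
}

x <- c("miR-12", "miR-1", "miR-122", "miR-1b", "miR-1a", "miR-2")
x[sketchMixedOrder(x)]
# "miR-1" "miR-1a" "miR-1b" "miR-2" "miR-12" "miR-122"
```

Ranking each column separately and passing all columns to order() lets ties in an early chunk be resolved by later chunks, which is the core of the approach described above.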
Therefore, some expected behaviors:
- When ignore.case=TRUE and useCaseTiebreak=TRUE (default for both) the input data is ordered without regard to case, then the tiebreaker applies case-specific sort criteria to the final product. This logic is very close to default sort() except for the handling of internal numeric values inside each string.
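The case tiebreak idea can be shown with base R order() alone. This sketch is not number-aware and does not call mixedSort(); the names caseBlind and withTiebreak are made up here, and method = "radix" is an assumption chosen so that characters are compared in the C locale and the result does not depend on the session locale.

```r
x <- c("Cnot9", "Cnot8", "CNOT9", "Cnot10")

# Primary key ignores case; ties keep their input order
caseBlind <- order(toupper(x), method = "radix")

# Same primary key, with the original string as a case-sensitive tiebreaker
withTiebreak <- order(toupper(x), x, method = "radix")

x[caseBlind]     # "Cnot10" "Cnot8" "Cnot9" "CNOT9"
x[withTiebreak]  # "Cnot10" "Cnot8" "CNOT9" "Cnot9"
```

Without the tiebreaker, entries that differ only by case stay in input order; with it, their relative order is deterministic. Note that "Cnot10" sorts before "Cnot8" here because plain order() compares digits as characters, which is the numeric handling that a mixed sort changes.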
Value
vector of values from argument x, ordered by mixedOrder(). The output class should match class(x).
See Also
Other jam sort functions: mixedOrder(), mixedSortDF(), mixedSorts(), mmixedOrder()
Examples
x <- c("miR-12", "miR-1", "miR-122", "miR-1b", "miR-1a", "miR-2")
# base sort() compares digits as characters
sort(x)
# mixedSort() keeps embedded numbers in numeric order
mixedSort(x)
# test honorFactor
mixedSort(factor(c("Cnot9", "Cnot8", "Cnot10")))
mixedSort(factor(c("Cnot9", "Cnot8", "Cnot10")), honorFactor=TRUE)
# test ignore.case
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")))
mixedSort(factor(c("CNOT9", "Cnot8", "Cnot9", "Cnot10")))
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), ignore.case=FALSE)
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), ignore.case=TRUE)
# test useCaseTiebreak
mixedSort(factor(c("Cnot9", "Cnot8", "CNOT9", "Cnot10")), useCaseTiebreak=TRUE)
mixedSort(factor(c("CNOT9", "Cnot8", "Cnot9", "Cnot10")), useCaseTiebreak=FALSE)