{
    "content": [
        {
            "type": "text",
            "text": "# sort (info)\n\n## Section Outline\n\n- **File: coreutils.info,  Node: sort invocation,  Next: shuf invocation,  Up: Operating on sorted files** (542 lines)\n- **First, it is ineffective if 'LC_ALL' is also set.  Second, it has** (4 lines)\n\n## Full Content\n\n### File: coreutils.info,  Node: sort invocation,  Next: shuf invocation,  Up: Operating on sorted files\n\n7.1 'sort': Sort text files\n===========================\n\n'sort' sorts, merges, or compares all the lines from the given files, or\nstandard input if none are given or for a FILE of '-'.  By default,\n'sort' writes the results to standard output.  Synopsis:\n\nsort [OPTION]... [FILE]...\n\nMany options affect how 'sort' compares lines; if the results are\nunexpected, try the '--debug' option to see what happened.  A pair of\nlines is compared as follows: 'sort' compares each pair of fields (see\n'--key'), in the order specified on the command line, according to the\nassociated ordering options, until a difference is found or no fields\nare left.  If no key fields are specified, 'sort' uses a default key of\nthe entire line.  Finally, as a last resort when all keys compare equal,\n'sort' compares entire lines as if no ordering options other than\n'--reverse' ('-r') were specified.  The '--stable' ('-s') option\ndisables this \"last-resort comparison\" so that lines in which all fields\ncompare equal are left in their original relative order.  The '--unique'\n('-u') option also disables the last-resort comparison.\n\nUnless otherwise specified, all comparisons use the character\ncollating sequence specified by the 'LC_COLLATE' locale.(1)  A line's\ntrailing newline is not part of the line for comparison purposes.  If\nthe final byte of an input file is not a newline, GNU 'sort' silently\nsupplies one.  
GNU 'sort' (as specified for all GNU utilities) has no\nlimit on input line length or restrictions on bytes allowed within\nlines.\n\n'sort' has three modes of operation: sort (the default), merge, and\ncheck for sortedness.  The following options change the operation mode:\n\n'-c'\n'--check'\n'--check=diagnose-first'\nCheck whether the given file is already sorted: if it is not all\nsorted, print a diagnostic containing the first out-of-order line\nand exit with a status of 1.  Otherwise, exit successfully.  At\nmost one input file can be given.\n\n'-C'\n'--check=quiet'\n'--check=silent'\nExit successfully if the given file is already sorted, and exit\nwith status 1 otherwise.  At most one input file can be given.\nThis is like '-c', except it does not print a diagnostic.\n\n'-m'\n'--merge'\nMerge the given files by sorting them as a group.  Each input file\nmust always be individually sorted.  It always works to sort\ninstead of merge; merging is provided because it is faster, in the\ncase where it works.\n\nExit status:\n\n0 if no error occurred\n1 if invoked with '-c' or '-C' and the input is not sorted\n2 if an error occurred\n\nIf the environment variable 'TMPDIR' is set, 'sort' uses its value as\nthe directory for temporary files instead of '/tmp'.  The\n'--temporary-directory' ('-T') option in turn overrides the environment\nvariable.\n\nThe following options affect the ordering of output lines.  They may\nbe specified globally or as part of a specific key field.  If no key\nfields are specified, global options apply to comparison of entire\nlines; otherwise the global options are inherited by key fields that do\nnot specify any special options of their own.  In pre-POSIX versions of\n'sort', global options affect only later key fields, so portable shell\nscripts should specify global options first.\n\n'-b'\n'--ignore-leading-blanks'\nIgnore leading blanks when finding sort keys in each line.  
By\ndefault a blank is a space or a tab, but the 'LC_CTYPE' locale can\nchange this.  Note blanks may be ignored by your locale's collating\nrules, but without this option they will be significant for\ncharacter positions specified in keys with the '-k' option.\n\n'-d'\n'--dictionary-order'\nSort in \"phone directory\" order: ignore all characters except\nletters, digits and blanks when sorting.  By default letters and\ndigits are those of ASCII and a blank is a space or a tab, but the\n'LC_CTYPE' locale can change this.\n\n'-f'\n'--ignore-case'\nFold lowercase characters into the equivalent uppercase characters\nwhen comparing so that, for example, 'b' and 'B' sort as equal.\nThe 'LC_CTYPE' locale determines character types.  When used with\n'--unique' those lower case equivalent lines are thrown away.\n(There is currently no way to throw away the upper case equivalent\ninstead.  (Any '--reverse' given would only affect the final\nresult, after the throwing away.))\n\n'-g'\n'--general-numeric-sort'\n'--sort=general-numeric'\nSort numerically, converting a prefix of each line to a long\ndouble-precision floating point number.  *Note Floating point::.\nDo not report overflow, underflow, or conversion errors.  Use the\nfollowing collating sequence:\n\n* Lines that do not start with numbers (all considered to be\nequal).\n* NaNs (\"Not a Number\" values, in IEEE floating point\narithmetic) in a consistent but machine-dependent order.\n* Minus infinity.\n* Finite numbers in ascending numeric order (with -0 and +0\nequal).\n* Plus infinity.\n\nUse this option only if there is no alternative; it is much slower\nthan '--numeric-sort' ('-n') and it can lose information when\nconverting to floating point.\n\n'-h'\n'--human-numeric-sort'\n'--sort=human-numeric'\nSort numerically, first by numeric sign (negative, zero, or\npositive); then by SI suffix (either empty, or 'k' or 'K', or one\nof 'MGTPEZY', in that order; *note Block size::); and finally by\nnumeric value.  
For example, '1023M' sorts before '1G' because 'M'\n(mega) precedes 'G' (giga) as an SI suffix.  This option sorts\nvalues that are consistently scaled to the nearest suffix,\nregardless of whether suffixes denote powers of 1000 or 1024, and\nit therefore sorts the output of any single invocation of the 'df',\n'du', or 'ls' commands that are invoked with their\n'--human-readable' or '--si' options.  The syntax for numbers is\nthe same as for the '--numeric-sort' option; the SI suffix must\nimmediately follow the number.  Note also the 'numfmt' command,\nwhich can be used to reformat numbers to human format after the\nsort, thus often allowing sort to operate on more accurate numbers.\n\n'-i'\n'--ignore-nonprinting'\nIgnore nonprinting characters.  The 'LC_CTYPE' locale determines\ncharacter types.  This option has no effect if the stronger\n'--dictionary-order' ('-d') option is also given.\n\n'-M'\n'--month-sort'\n'--sort=month'\nAn initial string, consisting of any amount of blanks, followed by\na month name abbreviation, is folded to UPPER case and compared in\nthe order 'JAN' < 'FEB' < ... < 'DEC'.  Invalid names compare low\nto valid names.  The 'LC_TIME' locale category determines the month\nspellings.  By default a blank is a space or a tab, but the\n'LC_CTYPE' locale can change this.\n\n'-n'\n'--numeric-sort'\n'--sort=numeric'\nSort numerically.  The number begins each line and consists of\noptional blanks, an optional '-' sign, and zero or more digits\npossibly separated by thousands separators, optionally followed by\na decimal-point character and zero or more digits.  An empty number\nis treated as '0'.  The 'LC_NUMERIC' locale specifies the\ndecimal-point character and thousands separator.  By default a\nblank is a space or a tab, but the 'LC_CTYPE' locale can change\nthis.\n\nComparison is exact; there is no rounding error.\n\nNeither a leading '+' nor exponential notation is recognized.  
To\ncompare such strings numerically, use the '--general-numeric-sort'\n('-g') option.\n\n'-V'\n'--version-sort'\nSort by version name and number.  It behaves like a standard sort,\nexcept that each sequence of decimal digits is treated numerically\nas an index/version number.  (*Note Version sort ordering::.)\n\n'-r'\n'--reverse'\nReverse the result of comparison, so that lines with greater key\nvalues appear earlier in the output instead of later.\n\n'-R'\n'--random-sort'\n'--sort=random'\nSort by hashing the input keys and then sorting the hash values.\nChoose the hash function at random, ensuring that it is free of\ncollisions so that differing keys have differing hash values.  This\nis like a random permutation of the inputs (*note shuf\ninvocation::), except that keys with the same value sort together.\n\nIf multiple random sort fields are specified, the same random hash\nfunction is used for all fields.  To use different random hash\nfunctions for different fields, you can invoke 'sort' more than\nonce.\n\nThe choice of hash function is affected by the '--random-source'\noption.\n\nOther options are:\n\n'--compress-program=PROG'\nCompress any temporary files with the program PROG.\n\nWith no arguments, PROG must compress standard input to standard\noutput, and when given the '-d' option it must decompress standard\ninput to standard output.\n\nTerminate with an error if PROG exits with nonzero status.\n\nWhite space and the backslash character should not appear in PROG;\nthey are reserved for future use.\n\n'--files0-from=FILE'\nDisallow processing files named on the command line, and instead\nprocess those named in file FILE; each name being terminated by a\nzero byte (ASCII NUL). This is useful when the list of file names\nis so long that it may exceed a command line length limitation.  
In\nsuch cases, running 'sort' via 'xargs' is undesirable because it\nsplits the list into pieces and makes 'sort' print sorted output\nfor each sublist rather than for the entire list.  One way to\nproduce a list of ASCII NUL terminated file names is with GNU\n'find', using its '-print0' predicate.  If FILE is '-' then the\nASCII NUL terminated file names are read from standard input.\n\n'-k POS1[,POS2]'\n'--key=POS1[,POS2]'\nSpecify a sort field that consists of the part of the line between\nPOS1 and POS2 (or the end of the line, if POS2 is omitted),\ninclusive.\n\nIn its simplest form POS specifies a field number (starting with\n1), with fields being separated by runs of blank characters, and by\ndefault those blanks being included in the comparison at the start\nof each field.  To adjust the handling of blank characters see the\n'-b' and '-t' options.\n\nMore generally, each POS has the form 'F[.C][OPTS]', where F is the\nnumber of the field to use, and C is the number of the first\ncharacter from the beginning of the field.  Fields and character\npositions are numbered starting with 1; a character position of\nzero in POS2 indicates the field's last character.  If '.C' is\nomitted from POS1, it defaults to 1 (the beginning of the field);\nif omitted from POS2, it defaults to 0 (the end of the field).\nOPTS are ordering options, allowing individual keys to be sorted\naccording to different rules; see below for details.  Keys can span\nmultiple fields.\n\nExample: To sort on the second field, use '--key=2,2' ('-k 2,2').\nSee below for more notes on keys and more examples.  See also the\n'--debug' option to help determine the part of the line being used\nin the sort.\n\n'--debug'\nHighlight the portion of each line used for sorting.  
Also issue\nwarnings about questionable usage to stderr.\n\n'--batch-size=NMERGE'\nMerge at most NMERGE inputs at once.\n\nWhen 'sort' has to merge more than NMERGE inputs, it merges them in\ngroups of NMERGE, saving the result in a temporary file, which is\nthen used as an input in a subsequent merge.\n\nA large value of NMERGE may improve merge performance and decrease\ntemporary storage utilization at the expense of increased memory\nusage and I/O.  Conversely a small value of NMERGE may reduce\nmemory requirements and I/O at the expense of temporary storage\nconsumption and merge performance.\n\nThe value of NMERGE must be at least 2.  The default value is\ncurrently 16, but this is implementation-dependent and may change\nin the future.\n\nThe value of NMERGE may be bounded by a resource limit for open\nfile descriptors.  The commands 'ulimit -n' or 'getconf OPEN_MAX'\nmay display limits for your systems; these limits may be modified\nfurther if your program already has some files open, or if the\noperating system has other limits on the number of open files.  If\nthe value of NMERGE exceeds the resource limit, 'sort' silently\nuses a smaller value.\n\n'-o OUTPUT-FILE'\n'--output=OUTPUT-FILE'\nWrite output to OUTPUT-FILE instead of standard output.  Normally,\n'sort' reads all input before opening OUTPUT-FILE, so you can sort\na file in place by using commands like 'sort -o F F' and 'cat F |\nsort -o F'.  However, it is often safer to output to an\notherwise-unused file, as data may be lost if the system crashes or\n'sort' encounters an I/O or other serious error while a file is\nbeing sorted in place.  Also, 'sort' with '--merge' ('-m') can open\nthe output file before reading all input, so a command like 'cat F\n| sort -m -o F - G' is not safe as 'sort' might start writing 'F'\nbefore 'cat' is done reading it.\n\nOn newer systems, '-o' cannot appear after an input file if\n'POSIXLY_CORRECT' is set, e.g., 'sort F -o F'.  
Portable scripts\nshould specify '-o OUTPUT-FILE' before any input files.\n\n'--random-source=FILE'\nUse FILE as a source of random data used to determine which random\nhash function to use with the '-R' option.  *Note Random sources::.\n\n'-s'\n'--stable'\n\nMake 'sort' stable by disabling its last-resort comparison.  This\noption has no effect if no fields or global ordering options other\nthan '--reverse' ('-r') are specified.\n\n'-S SIZE'\n'--buffer-size=SIZE'\nUse a main-memory sort buffer of the given SIZE.  By default, SIZE\nis in units of 1024 bytes.  Appending '%' causes SIZE to be\ninterpreted as a percentage of physical memory.  Appending 'K'\nmultiplies SIZE by 1024 (the default), 'M' by 1,048,576, 'G' by\n1,073,741,824, and so on for 'T', 'P', 'E', 'Z', and 'Y'.\nAppending 'b' causes SIZE to be interpreted as a byte count, with\nno multiplication.\n\nThis option can improve the performance of 'sort' by causing it to\nstart with a larger or smaller sort buffer than the default.\nHowever, this option affects only the initial buffer size.  The\nbuffer grows beyond SIZE if 'sort' encounters input lines larger\nthan SIZE.\n\n'-t SEPARATOR'\n'--field-separator=SEPARATOR'\nUse character SEPARATOR as the field separator when finding the\nsort keys in each line.  By default, fields are separated by the\nempty string between a non-blank character and a blank character.\nBy default a blank is a space or a tab, but the 'LC_CTYPE' locale\ncan change this.\n\nThat is, given the input line ' foo bar', 'sort' breaks it into\nfields ' foo' and ' bar'.  The field separator is not considered to\nbe part of either the field preceding or the field following, so\nwith 'sort -t \" \"' the same input line has three fields: an empty\nfield, 'foo', and 'bar'.  
However, fields that extend to the end of\nthe line, as '-k 2', or fields consisting of a range, as '-k 2,3',\nretain the field separators present between the endpoints of the\nrange.\n\nTo specify ASCII NUL as the field separator, use the two-character\nstring '\\0', e.g., 'sort -t '\\0''.\n\n'-T TEMPDIR'\n'--temporary-directory=TEMPDIR'\nUse directory TEMPDIR to store temporary files, overriding the\n'TMPDIR' environment variable.  If this option is given more than\nonce, temporary files are stored in all the directories given.  If\nyou have a large sort or merge that is I/O-bound, you can often\nimprove performance by using this option to specify directories on\ndifferent disks and controllers.\n\n'--parallel=N'\nSet the number of sorts run in parallel to N.  By default, N is set\nto the number of available processors, but limited to 8, as there\nare diminishing performance gains after that.  Note also that using\nN threads increases the memory usage by a factor of log N.  Also\nsee *note nproc invocation::.\n\n'-u'\n'--unique'\n\nNormally, output only the first of a sequence of lines that compare\nequal.  For the '--check' ('-c' or '-C') option, check that no pair\nof consecutive lines compares equal.\n\nThis option also disables the default last-resort comparison.\n\nThe commands 'sort -u' and 'sort | uniq' are equivalent, but this\nequivalence does not extend to arbitrary 'sort' options.  For\nexample, 'sort -n -u' inspects only the value of the initial\nnumeric string when checking for uniqueness, whereas 'sort -n |\nuniq' inspects the entire line.  *Note uniq invocation::.\n\n'-z'\n'--zero-terminated'\nDelimit items with a zero byte rather than a newline (ASCII LF).\nI.e., treat input as items separated by ASCII NUL and terminate\noutput items with ASCII NUL. 
This option can be useful in\nconjunction with 'perl -0' or 'find -print0' and 'xargs -0' which\ndo the same in order to reliably handle arbitrary file names (even\nthose containing blanks or other special characters).\n\nHistorical (BSD and System V) implementations of 'sort' have differed\nin their interpretation of some options, particularly '-b', '-f', and\n'-n'.  GNU sort follows the POSIX behavior, which is usually (but not\nalways!)  like the System V behavior.  According to POSIX, '-n' no\nlonger implies '-b'.  For consistency, '-M' has been changed in the same\nway.  This may affect the meaning of character positions in field\nspecifications in obscure cases.  The only fix is to add an explicit\n'-b'.\n\nA position in a sort field specified with '-k' may have any of the\noption letters 'MbdfghinRrV' appended to it, in which case no global\nordering options are inherited by that particular field.  The '-b'\noption may be independently attached to either or both of the start and\nend positions of a field specification, and if it is inherited from the\nglobal options it will be attached to both.  If input lines can contain\nleading or adjacent blanks and '-t' is not used, then '-k' is typically\ncombined with '-b' or an option that implicitly ignores leading blanks\n('Mghn') as otherwise the varying numbers of leading blanks in fields\ncan cause confusing results.\n\nIf the start position in a sort field specifier falls after the end\nof the line or after the end field, the field is empty.  
If the '-b'\noption was specified, the '.C' part of a field specification is counted\nfrom the first nonblank character of the field.\n\nOn systems not conforming to POSIX 1003.1-2001, 'sort' supports a\ntraditional origin-zero syntax '+POS1 [-POS2]' for specifying sort keys.\nThe traditional command 'sort +A.X -B.Y' is equivalent to 'sort -k\nA+1.X+1,B' if Y is '0' or absent, otherwise it is equivalent to 'sort -k\nA+1.X+1,B+1.Y'.\n\nThis traditional behavior can be controlled with the\n'_POSIX2_VERSION' environment variable (*note Standards conformance::);\nit can also be enabled when 'POSIXLY_CORRECT' is not set by using the\ntraditional syntax with '-POS2' present.\n\nScripts intended for use on standard hosts should avoid traditional\nsyntax and should use '-k' instead.  For example, avoid 'sort +2', since\nit might be interpreted as either 'sort ./+2' or 'sort -k 3'.  If your\nscript must also run on hosts that support only the traditional syntax,\nit can use a test like 'if sort -k 1 </dev/null >/dev/null 2>&1; then\n...' to decide which syntax to use.\n\nHere are some examples to illustrate various combinations of options.\n\n* Sort in descending (reverse) numeric order.\n\nsort -n -r\n\n* Run no more than 4 sorts concurrently, using a buffer size of 10M.\n\nsort --parallel=4 -S 10M\n\n* Sort alphabetically, omitting the first and second fields and the\nblanks at the start of the third field.  This uses a single key\ncomposed of the characters beginning at the start of the first\nnonblank character in field three and extending to the end of each\nline.\n\nsort -k 3b\n\n* Sort numerically on the second field and resolve ties by sorting\nalphabetically on the third and fourth characters of field five.\nUse ':' as the field delimiter.\n\nsort -t : -k 2,2n -k 5.3,5.4\n\nNote that if you had written '-k 2n' instead of '-k 2,2n' 'sort'\nwould have used all characters beginning in the second field and\nextending to the end of the line as the primary numeric key.  
For\nthe large majority of applications, treating keys spanning more\nthan one field as numeric will not do what you expect.\n\nAlso note that the 'n' modifier was applied to the field-end\nspecifier for the first key.  It would have been equivalent to\nspecify '-k 2n,2' or '-k 2n,2n'.  All modifiers except 'b' apply to\nthe associated field, regardless of whether the modifier\ncharacter is attached to the field-start and/or the field-end part\nof the key specifier.\n\n* Sort the password file on the fifth field and ignore any leading\nblanks.  Sort lines with equal values in field five on the numeric\nuser ID in field three.  Fields are separated by ':'.\n\nsort -t : -k 5b,5 -k 3,3n /etc/passwd\nsort -t : -n -k 5b,5 -k 3,3 /etc/passwd\nsort -t : -b -k 5,5 -k 3,3n /etc/passwd\n\nThese three commands have equivalent effect.  The first specifies\nthat the first key's start position ignores leading blanks and the\nsecond key is sorted numerically.  The other two commands rely on\nglobal options being inherited by sort keys that lack modifiers.\nThe inheritance works in this case because '-k 5b,5b' and '-k 5b,5'\nare equivalent, as the location of a field-end lacking a '.C'\ncharacter position is not affected by whether initial blanks are\nskipped.\n\n* Sort a set of log files, primarily by IPv4 address and secondarily\nby timestamp.  If two lines' primary and secondary keys are\nidentical, output the lines in the same order that they were input.\nThe log files contain lines that look like this:\n\n4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1\n211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2\n\nFields are separated by exactly one space.  Sort IPv4 addresses\nlexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201\nbecause 61 is less than 129.\n\nsort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |\nsort -s -t '.' 
-k 1,1n -k 2,2n -k 3,3n -k 4,4n\n\nThis example cannot be done with a single 'sort' invocation, since\nIPv4 address components are separated by '.' while dates come just\nafter a space.  So it is broken down into two invocations of\n'sort': the first sorts by timestamp and the second by IPv4\naddress.  The timestamp is sorted by year, then month, then day,\nand finally by hour-minute-second field, using '-k' to isolate each\nfield.  Except for hour-minute-second there's no need to specify\nthe end of each key field, since the 'n' and 'M' modifiers sort\nbased on leading prefixes that cannot cross field boundaries.  The\nIPv4 addresses are sorted lexicographically.  The second sort uses\n'-s' so that ties in the primary key are broken by the secondary\nkey; the first sort uses '-s' so that the combination of the two\nsorts is stable.\n\n* Generate a tags file in case-insensitive sorted order.\n\nfind src -type f -print0 | sort -z -f | xargs -0 etags --append\n\nThe use of '-print0', '-z', and '-0' in this case means that file\nnames that contain blanks or other special characters are not\nbroken up by the sort operation.\n\n* Use the common DSU, Decorate Sort Undecorate idiom to sort lines\naccording to their length.\n\nawk '{print length, $0}' /etc/passwd | sort -n | cut -f2- -d' '\n\nIn general this technique can be used to sort data that the 'sort'\ncommand does not support, or is inefficient at, sorting directly.\n\n* Shuffle a list of directories, but preserve the order of files\nwithin each directory.  For instance, one could use this to\ngenerate a music playlist in which albums are shuffled but the\nsongs of each album are played in order.\n\nls */* | sort -t / -k 1,1R -k 2,2\n\n---------- Footnotes ----------\n\n(1) If you use a non-POSIX locale (e.g., by setting 'LC_ALL' to\n'en_US'), then 'sort' may produce output that is sorted differently than\nyou're accustomed to.  In that case, set the 'LC_ALL' environment\nvariable to 'C'.  
Note that setting only 'LC_COLLATE' has two problems.\n\n### First, it is ineffective if 'LC_ALL' is also set.  Second, it has\n\nundefined behavior if 'LC_CTYPE' (or 'LANG', if 'LC_CTYPE' is unset) is\nset to an incompatible value.  For example, you get undefined behavior\nif 'LC_CTYPE' is 'ja_JP.PCK' but 'LC_COLLATE' is 'en_US.UTF-8'.\n\n"
        }
    ],
    "structuredContent": {
        "command": "sort",
        "section": "",
        "mode": "info",
        "summary": null,
        "synopsis": null,
        "flags": [],
        "examples": [],
        "see_also": [],
        "section_outline": [
            {
                "name": "File: coreutils.info,  Node: sort invocation,  Next: shuf invocation,  Up: Operating on sorted files",
                "lines": 542,
                "subsections": []
            },
            {
                "name": "First, it is ineffective if 'LC_ALL' is also set.  Second, it has",
                "lines": 4,
                "subsections": []
            }
        ]
    }
}