The Linux cut command allows you to extract portions of text from files or data streams. It is especially useful for working with delimited data, such as CSV files. Here's what you need to know.
The cut command
The cut command is a veteran of the Unix world, having debuted in 1982 as part of AT&T System III UNIX. Its purpose in life is to cut sections of text out of files or streams, according to criteria you set. Its syntax is as simple as its purpose, and it is this simplicity that makes it so useful.
In traditional UNIX style, by combining cut with other utilities such as grep you can create elegant and powerful solutions to challenging problems. Although there are different versions of cut, we are going to discuss the standard GNU/Linux version. Note that other versions, particularly the cut found in BSD variants, do not include all of the options described here.
You can check which version is installed on your computer by issuing this command:
cut --version
If you see “GNU coreutils” in the output, you are on the version we describe in this article. All versions of cut have some of this functionality, but the Linux version has had enhancements added to it.
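The version check can also be scripted. The following is a minimal sketch, assuming that a non-GNU cut either rejects --version or prints something without the text “GNU coreutils”; the variable name flavor is only illustrative.

```shell
# Classify the installed cut. "flavor" is an illustrative name, not a convention.
# A cut that does not understand --version fails, so we fall through to "other".
if cut --version 2>/dev/null | grep -q 'GNU coreutils'; then
    flavor='gnu'
else
    flavor='other'
fi

echo "cut flavor: $flavor"
```

A script could use a test like this to decide whether GNU-only options such as --complement and --output-delimiter are safe to use.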
First steps with cut
Whether we're piping information into cut or using cut to read a file, the commands we use are the same. Anything you can do to a stream of input with cut can be done on a line of text from a file, and vice versa. We can tell cut to work with bytes, characters, or delimited fields.
To select a single byte, we use the -b (byte) option and tell cut which byte or bytes we want. In this case, it is byte five. We are sending the string “how-to geek” to the cut command with a pipe, “|”, from echo.
echo 'how-to geek' | cut -b 5
The fifth byte in that string is “t”, so cut responds by printing “t” in the terminal window.
To specify a range we use a hyphen. To extract bytes 5 through 11 inclusive, we would issue this command:
echo 'how-to geek' | cut -b 5-11
You can provide multiple single bytes or ranges by separating them with commas. To extract byte 5 and byte 11, use this command:
echo 'how-to geek' | cut -b 5,11
To get the first letter of each word we can use this command:
echo 'how-to geek' | cut -b 1,5,8
If you use the hyphen without a first number, cut returns everything from position 1 up to the number. If you use the hyphen without a second number, cut returns everything from the first number to the end of the stream or line.
echo 'how-to geek' | cut -b -6
echo 'how-to geek' | cut -b 8-
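The byte list forms described above can be collected into one short sketch. It assumes the sample string is “how-to geek” (the variable names are only illustrative), and it also shows a detail worth knowing: cut emits the selected bytes in their original order, however the list is written.

```shell
# Illustrative sample string; the variable names are assumptions of this sketch.
sample='how-to geek'

single=$(printf '%s\n' "$sample" | cut -b 5)         # one byte
range=$(printf '%s\n' "$sample" | cut -b 5-11)       # inclusive range
initials=$(printf '%s\n' "$sample" | cut -b 1,5,8)   # list of single bytes
head_part=$(printf '%s\n' "$sample" | cut -b -6)     # byte 1 up to byte 6
tail_part=$(printf '%s\n' "$sample" | cut -b 8-)     # byte 8 to end of line
reordered=$(printf '%s\n' "$sample" | cut -b 8,1,5)  # same bytes, listed out of order

printf '%s\n' "$single" "$range" "$initials" "$head_part" "$tail_part" "$reordered"
```

The last command prints the same three bytes as the 1,5,8 form, so cut on its own cannot be used to rearrange text.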
Using cut with characters
Using cut with characters is more or less the same as using it with bytes. In both cases, special care must be taken with complex characters. By using the -c (character) option, we tell cut to work in terms of characters, not bytes.
echo 'how-to geek' | cut -c 1,5,8
echo 'how-to geek' | cut -c 8-11
These work exactly as you would expect. But look at this example. It's a six-letter word, so asking cut to return characters one through six should return the entire word. But it doesn't. It comes up one character short. To see the complete word we have to ask for characters one through seven.
echo 'piñata' | cut -c 1-6
echo 'piñata' | cut -c 1-7
The problem is that the “ñ” character is actually made up of two bytes. We can see this very easily. We have a short text file, unicode.txt, containing that word as its only line of text.
We will examine that file with the hexdump utility. Using the -C (canonical) option gives us a table of hexadecimal digits with the ASCII equivalent on the right. In the ASCII column, the “ñ” is not shown; instead there are two dots that represent two non-printable characters. These are the bytes c3 and b1 in the hexadecimal table.
hexdump -C unicode.txt
These two bytes are used by the displaying program, in this case the Bash shell, to identify the “ñ”. Many Unicode characters use three or more bytes to represent a single character.
If we ask for character 3 or character 4 on its own, we are shown the symbol for a non-printable character. If we ask for positions 3 and 4 together, the shell interprets the two bytes as “ñ”.
echo 'piñata' | cut -c 3
echo 'piñata' | cut -c 4
echo 'piñata' | cut -c 3-4
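The two-byte encoding can be confirmed without a separate file. This is a minimal sketch, assuming UTF-8 encoding; the word is built from octal escapes so the result does not depend on how the script itself is saved, and the variable names are only illustrative.

```shell
# "piñata" spelled out with the UTF-8 bytes of "ñ" (octal 303 261 = hex c3 b1).
word=$(printf 'pi\303\261ata')

# Count bytes, not characters: six letters, but seven bytes.
byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Pull out bytes 3 and 4 and show them in hexadecimal (0a is the trailing newline).
enye_hex=$(printf '%s\n' "$word" | cut -b 3-4 | od -An -tx1 | tr -d ' \n')

echo "$byte_count"
echo "$enye_hex"
```

Counting with wc -c before choosing positions is a quick way to spot a line that contains multi-byte characters.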
Using cut with delimited data
We can ask cut to split lines of text using a specified delimiter. By default, cut uses a tab character, but it's easy to tell it to use whatever we want. The fields in the “/etc/passwd” file are separated by a colon, “:”, so we'll use that as our delimiter and extract some text.
The portions of text between the delimiters are called fields. They are referenced by number just like bytes or characters, but with the -f (fields) option. You can leave a space between the “f” and the digit, or not.
The first command uses the -d (delimiter) option to tell cut to use “:” as the delimiter. It extracts the first field of each line in the “/etc/passwd” file. That will be a long list, so we pipe it through head with the -n (number) option to display only the first five results. The second command does the same thing but uses tail to show us the last five results.
cut -d':' -f1 /etc/passwd | head -n 5
cut -d':' -f1 /etc/passwd | tail -n 5
To extract a selection of fields, list them as a comma-separated list. This command will extract fields one through three, five and six.
cut -d':' -f1-3,5,6 /etc/passwd | tail -n 5
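Because the contents of “/etc/passwd” differ from machine to machine, the same field selections can be tried on fixed input. The records below are made up for this sketch and only imitate the passwd layout; the variable names are illustrative.

```shell
# Hypothetical passwd-style records, invented for this sketch.
records='root:x:0:0:root:/root:/bin/bash
dave:x:1000:1000:Example User,,,:/home/dave:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Field one of every record.
names=$(printf '%s\n' "$records" | cut -d':' -f1)

# Fields one through three, five and six, from the last record only.
selected=$(printf '%s\n' "$records" | cut -d':' -f1-3,5,6 | tail -n 1)

printf '%s\n' "$names"
printf '%s\n' "$selected"
```

Note that the selected fields are joined again with the same “:” delimiter that was used to split them.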
By including grep in the command, we can look for lines that include “/bin/bash”. This means that we can list only those entries that have Bash as the default shell. These will usually be the “normal” user accounts. We'll ask for fields one through six because the seventh field is the default shell field, and we already know what it contains because we searched for it.
grep "/bin/bash" /etc/passwd | cut -d':' -f1-6
Another way to include all fields except one is to use the --complement option. This inverts the field selection and displays everything that hasn't been requested. Let's repeat the last command but only ask for field seven. Then we'll run that command again with the --complement option.
grep "/bin/bash" /etc/passwd | cut -d':' -f7
grep "/bin/bash" /etc/passwd | cut -d':' -f7 --complement
The first command finds a list of entries, but field seven gives us nothing to distinguish between them, so we don't know who the entries refer to. In the second command, adding the --complement option gives us everything except field seven.
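The effect of --complement is easiest to see on a single fixed record. This sketch assumes GNU cut, since --complement is one of the options other versions may lack, and it uses a made-up passwd-style line with illustrative variable names.

```shell
# Hypothetical passwd-style record, invented for this sketch.
record='dave:x:1000:1000:Example User,,,:/home/dave:/bin/bash'

# Field seven only.
only_shell=$(printf '%s\n' "$record" | cut -d':' -f7)

# Everything except field seven (GNU cut only).
all_but_shell=$(printf '%s\n' "$record" | cut -d':' -f7 --complement)

# The explicit range gives the same result for a seven-field record.
first_six=$(printf '%s\n' "$record" | cut -d':' -f1-6)

printf '%s\n' "$only_shell" "$all_but_shell" "$first_six"
```

The complement form has the advantage that it keeps working if a record has more fields than expected.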
Piping cut into cut
Sticking with the “/etc/passwd” file, let's extract field five. This is the real name of the user account owner.
grep "/bin/bash" /etc/passwd | cut -d':' -f5
The fifth field has subfields separated by commas. They are rarely populated, so they show up as a run of commas.
We can remove the commas by piping the output of the above command into another invocation of cut. The second instance of cut uses the comma “,” as the delimiter. The -s (only delimited) option tells cut to suppress results that do not contain the delimiter at all.
grep "/bin/bash" /etc/passwd | cut -d':' -s -f5 | cut -d',' -s -f1
Since the root entry has no comma subfields in the fifth field, it is suppressed and we get the results we are looking for: a list of the names of the “real” users configured on this computer.
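The difference that -s makes can be shown on a few fixed lines. In this sketch the input lines are made up, and the variable names are illustrative. Without -s, a line that contains no delimiter is passed through whole; with -s, it is dropped.

```shell
# Hypothetical name fields: the first line has no comma at all.
names='root
Example User,,,
Another User,Room 1,,'

# Without -s the delimiter-free line "root" is printed unchanged.
without_s=$(printf '%s\n' "$names" | cut -d',' -f1)

# With -s that line is suppressed.
with_s=$(printf '%s\n' "$names" | cut -d',' -s -f1)

count_without=$(printf '%s\n' "$without_s" | wc -l | tr -d ' ')
count_with=$(printf '%s\n' "$with_s" | wc -l | tr -d ' ')

printf '%s\n' "$without_s"
printf '%s\n' "$with_s"
```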
The output delimiter
We have a small file with some values separated by commas. The fields in this dummy data are:
- ID: A database identification number.
- First: The first name of the subject.
- Last name: The surname of the subject.
- e-mail: Their email address.
- IP address: Their IP address.
- Brand: The make of the motor vehicle they drive.
- Model: The model of the motor vehicle they drive.
- Year: The year their motor vehicle was manufactured.
If we tell cut to use the comma as the delimiter, we can extract fields just like we did before. Sometimes you will need to extract data from a file without the field delimiter appearing in the results. Using the --output-delimiter option we can tell cut which character, or in fact which sequence of characters, to use instead of the actual delimiter.
cut -d ',' -f 2,3 small.csv
cut -d ',' -f 2,3 small.csv --output-delimiter=" "
The second command tells cut to replace the commas with spaces.
We can take this further and use this feature to convert the output to a vertical list. This command uses a newline character as the output delimiter. Note the “$” we need to include so that the “\n” is treated as a newline character and not as a literal two-character sequence.
We will use grep to filter out the entry for Morgana Renwick, and we will ask cut to print all fields from field two to the end of the record, using a newline character as the output delimiter.
grep 'renwick' small.csv | cut -d ',' -f2- --output-delimiter=$'\n'
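Both output-delimiter variations can be tried on a single record. This sketch assumes GNU cut, uses a made-up CSV row that follows the field layout listed earlier, and holds the newline in a variable (an illustrative name, nl) so that it also works in shells without the $'\n' syntax.

```shell
# Made-up CSV record for this sketch, following the eight-field layout above.
row='1,Jane,Doe,jane.doe@example.com,192.0.2.10,ExampleMake,ExampleModel,2001'

# Replace the comma between fields two and three with a space.
spaced=$(printf '%s\n' "$row" | cut -d',' -f2,3 --output-delimiter=' ')

# A literal newline in a variable; in Bash, $'\n' is equivalent.
nl='
'

# Fields two to the end, one per line.
vertical=$(printf '%s\n' "$row" | cut -d',' -f2- --output-delimiter="$nl")
line_count=$(printf '%s\n' "$vertical" | wc -l | tr -d ' ')

printf '%s\n' "$spaced"
printf '%s\n' "$vertical"
```

Seven lines are printed for the vertical list, one for each field from two through eight.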
An Oldie but Goldie
At the time of writing, the little cut command is approaching its 40th birthday, and we're still using it and writing about it today. I suppose cutting up text today is the same as it was 40 years ago. That is, a lot easier when you have the right tool to hand.