
The Linux cut command allows you to extract portions of text from files or data streams. It is especially useful for working with delimited data, such as CSV files. Here’s what you need to know.

The cut command

The cut command is a veteran of the Unix world, making its debut in 1982 as part of AT&T System III UNIX. Its purpose in life is to cut sections of text from files or streams, according to criteria you set. Its syntax is as simple as its purpose, and it is this simplicity that makes it so useful.

In traditional UNIX style, when you combine cut with other utilities such as grep you can create elegant and powerful solutions to challenging problems. Although there are different versions of cut, we are going to discuss the standard GNU/Linux version. Note that other versions, particularly the cut found in BSD variants, do not include all of the options described here.

You can check which version is installed on your computer by issuing this command:

cut --version

If you see “GNU coreutils” in the output, you are on the version we are going to describe in this article. All versions of cut have some of this functionality, but the Linux version has had enhancements added to it.
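A script can make the same check before relying on GNU-only options. This is a minimal sketch: the helper name is_gnu_cut is hypothetical, and it assumes that a non-GNU cut either rejects --version or prints something that does not mention GNU coreutils.

```shell
# Hypothetical helper: succeeds only if cut reports itself as GNU coreutils.
is_gnu_cut() {
    cut --version 2>/dev/null | grep -q 'GNU coreutils'
}

if is_gnu_cut; then
    echo "GNU cut detected"
else
    echo "Non-GNU cut: some options described here may be missing"
fi
```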

First steps with cut

Whether we’re piping information into cut or using cut to read a file, the commands we use are the same. Anything you can do to a stream of input with cut can be done on a line of text from a file, and vice versa. We can tell cut to work with bytes, characters, or delimited fields.

To select a single byte, we use the -b (byte) option and tell cut which byte or bytes we want. In this case, it is byte five. We are sending the string “how to geek” to the cut command with a pipe, “|”, from echo.

echo 'how to geek' | cut -b 5
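The equivalence between file input and piped input can be checked with a throwaway file. This is a minimal sketch; the temporary file and the variable names are illustrative and are not part of the original examples.

```shell
# Write a one-line sample file to a temporary location.
tmpfile=$(mktemp)
echo 'how to geek' > "$tmpfile"

from_file=$(cut -b 1-3 "$tmpfile")          # cut opens the file itself
from_pipe=$(cat "$tmpfile" | cut -b 1-3)    # cut reads standard input

echo "$from_file"   # how
echo "$from_pipe"   # how

rm -f "$tmpfile"
```

Both forms print the same three bytes, so the choice between them is a matter of where the data already lives.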

Extracting a single byte with cut

The fifth byte in that string is “t”, so cut responds by printing “t” in the terminal window.

To specify a range we use a hyphen. To extract bytes 5 through 11 inclusive, we would issue this command:

echo 'how to geek' | cut -b 5-11

Extracting a range of bytes with cut

You can provide multiple single bytes or ranges by separating them with commas. To extract byte 5 and byte 11, use this command:

echo 'how to geek' | cut -b 5,11

Extracting two bytes with cut

To get the first letter of each word we can use this command:

echo 'how to geek' | cut -b 1,5,8

Extracting three bytes with cut

If you use the hyphen without a first number, cut returns everything from position 1 up to the number. If you use the hyphen without a second number, cut returns everything from the first number to the end of the string or line.

echo 'how to geek' | cut -b -6
echo 'how to geek' | cut -b 8-

Extracting byte ranges with cut
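For reference, the byte selections covered so far can be gathered into one short script, with the output each one produces for the sample string shown as a comment. The variable name s is only a convenience. The last line illustrates one further point: cut emits the selected bytes in their original order, whatever order the list is written in.

```shell
s='how to geek'

echo "$s" | cut -b 5        # t
echo "$s" | cut -b 5-11     # to geek
echo "$s" | cut -b 5,11     # tk
echo "$s" | cut -b 1,5,8    # htg
echo "$s" | cut -b -6       # how to
echo "$s" | cut -b 8-       # geek
echo "$s" | cut -b 8,5,1    # htg (list order does not reorder the output)
```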

Using cut with characters

Using cut with characters is more or less the same as using it with bytes. In both cases, special care must be taken with complex characters. By using the -c (character) option, we tell cut to work in terms of characters, not bytes.

echo 'how to geek' | cut -c 1,5,8
echo 'how to geek' | cut -c 8-11

Character extraction and character ranges with cut

These work exactly as you would expect. But look at this example. It’s a six-letter word, so asking cut to return characters one through six should return the entire word. But it doesn’t. It is one character short. To see the complete word we have to ask for characters one through seven.

echo 'piñata' | cut -c 1-6
echo 'piñata' | cut -c 1-7

Special characters can take up more than one character

The problem is that the “ñ” character is actually made up of two bytes. We can see this very easily. We have a short text file containing this line of text:

cat unicode.txt

The content of the short text file

We will examine that file with the hexdump utility. Using the -C (canonical) option gives us a table of hexadecimal digits with the ASCII equivalent on the right. In the ASCII column, the “ñ” is not shown. Instead there are dots representing two non-printable characters. These are the highlighted bytes in the hexadecimal table.

hexdump -C unicode.txt

Hexdump of test text file

These two bytes are used by the displaying program, in this case the Bash shell, to identify the “ñ”. Many Unicode characters use three or more bytes to represent a single character.

If we request character 3 or character 4, we are shown the symbol of a non-printable character. If we ask for bytes 3 and 4, the shell interprets them as “ñ”.

echo 'piñata' | cut -c 3
echo 'piñata' | cut -c 4
echo 'piñata' | cut -c 3-4

Using cut to extract the characters that make up a special character
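The byte count can also be confirmed without a hex viewer by building the word from explicit byte values. This sketch assumes UTF-8 encoding, in which “ñ” is the two bytes 0xC3 0xB1 (octal 303 261). It uses only -b, so the result does not depend on how a particular build of cut treats -c. The variable name word is illustrative.

```shell
# "piñata" with the two UTF-8 bytes of ñ written as octal escapes.
word=$(printf 'pi\303\261ata')

printf '%s' "$word" | wc -c          # 7 (bytes, although there are 6 letters)
printf '%s\n' "$word" | cut -b 3-4   # both bytes of ñ together
printf '%s\n' "$word" | cut -b 1-7   # the whole word
```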

Using cut with delimited data

We can ask cut to split lines of text using a specified delimiter. By default, cut uses a tab character, but it’s easy to tell it to use whatever we want. The fields in the “/etc/passwd” file are separated by a colon “:”, so we’ll use that as our delimiter and extract some text.

The portions of text between the delimiters are called fields. They are referenced just like bytes or characters, but they are preceded by the -f (fields) option. You can leave a space between the “f” and the digit, or not.

The first command uses the -d (delimiter) option to tell cut to use “:” as the delimiter. It will extract the first field of each line in the “/etc/passwd” file. That will be a long list, so we are using head with the -n (number) option to display only the first five results. The second command does the same thing but uses tail to show us the last five results.

cut -d':' -f1 /etc/passwd | head -n 5
cut -d':' -f1 /etc/passwd | tail -n 5
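A quick way to see the default tab delimiter and the optional space after -f is to feed cut a made-up line. The sample words here are placeholders, not data from any real file.

```shell
# Three tab-separated fields; printf turns \t into a real tab.
printf 'alpha\tbeta\tgamma\n' | cut -f 2        # beta (tab is the default delimiter)
printf 'alpha\tbeta\tgamma\n' | cut -f2         # beta (no space after -f)
printf 'alpha:beta:gamma\n' | cut -d ':' -f 3   # gamma (explicit delimiter)
```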

Extract a range of fields from the /etc/passwd file

To extract a selection of fields, list them as a comma-separated list. This command will extract fields one through three, five and six.

cut -d':' -f1-3,5,6 /etc/passwd | tail -n 5

Extract a range of fields from the /etc/passwd file
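Because the contents of “/etc/passwd” differ from machine to machine, the same selections can be tried on fixed sample data. The three entries below are hypothetical and only follow the seven-field layout of the real file. The function name sample_passwd is likewise an illustration.

```shell
# Hypothetical entries in /etc/passwd format (seven colon-separated fields).
sample_passwd() {
    cat <<'EOF'
root:x:0:0:root:/root:/bin/bash
ada:x:1000:1000:Ada Example,,,:/home/ada:/bin/bash
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
EOF
}

sample_passwd | cut -d ':' -f 1          # root, ada, mail (one per line)
sample_passwd | cut -d ':' -f 1-3,5,6    # second line: ada:x:1000:Ada Example,,,:/home/ada
```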

By including grep in the command, we can look for lines that include “/bin/bash”. This means that we can list only those entries that have Bash as the default shell. These will usually be the “normal” user accounts. We’ll ask for fields one through six because the seventh field is the default shell field, and we already know what that is because we’re searching for it.

grep "/bin/bash" /etc/passwd | cut -d':' -f1-6

Extract fields one through six from the /etc/passwd file

Another way to include all fields except one is to use the --complement option. This inverts the field selection and displays everything that has not been requested. Let’s repeat the last command but only ask for field seven. Then we’ll run that command again with the --complement option.

grep "/bin/bash" /etc/passwd | cut -d':' -f7
grep "/bin/bash" /etc/passwd | cut -d':' -f7 --complement

Using the --complement option to reverse a field selection

The first command finds a list of entries, but field seven gives us nothing to distinguish between them, so we don’t know who the entries refer to. In the second command, by adding the --complement option we get everything except field seven.

Piping cut Into cut

Sticking with the “/etc/passwd” file, let’s extract field five. This is the real name of the user who owns the account.

grep "/bin/bash" /etc/passwd | cut -d':' -f5
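The effect of --complement is easy to see on a single line. The entry below is hypothetical, and the option is a GNU extension, so this sketch assumes GNU cut.

```shell
# One hypothetical /etc/passwd-style entry.
line='ada:x:1000:1000:Ada Example,,,:/home/ada:/bin/bash'

echo "$line" | cut -d ':' -f 7                # /bin/bash
echo "$line" | cut -d ':' -f 7 --complement   # ada:x:1000:1000:Ada Example,,,:/home/ada
echo "$line" | cut -d ':' -f 1-6              # same output as the --complement form
```

With only seven fields, -f 1-6 is just as short. The option earns its keep when the field to drop sits in the middle of a long record.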

The fifth field of the /etc/passwd file can have subfields separated by commas

The fifth field has subfields separated by commas. They are rarely populated, so they appear as a line of commas.

We can remove the commas by piping the output of the above command into another invocation of cut. The second instance of cut uses the comma “,” as its delimiter. The -s (only delimited) option tells cut to suppress results that do not contain the delimiter at all.

grep "/bin/bash" /etc/passwd | cut -d':' -s -f5 | cut -d',' -s -f1

Pipe cut in cut to deal with two types of delimiter

Since the root entry has no comma subfields in the fifth field, it is suppressed and we get the results we are looking for: a list of the names of the “real” users configured on this computer.
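The role of -s in the second cut can be seen by running the pipeline with and without it on fixed data. The entries and the function name sample_users are hypothetical.

```shell
# Hypothetical entries; only the non-root accounts have comma subfields in field five.
sample_users() {
    cat <<'EOF'
root:x:0:0:root:/root:/bin/bash
ada:x:1000:1000:Ada Example,,,:/home/ada:/bin/bash
bob:x:1001:1001:Bob Sample,,,:/home/bob:/bin/bash
EOF
}

# With -s, the "root" line has no comma and is dropped.
sample_users | cut -d ':' -s -f 5 | cut -d ',' -s -f 1
# Ada Example
# Bob Sample

# Without -s, a line lacking the delimiter is passed through unchanged.
sample_users | cut -d ':' -f 5 | cut -d ',' -f 1
# root
# Ada Example
# Bob Sample
```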

The output delimiter

We have a small file with some values separated by commas. The fields in this dummy data are:

  • ID: A database identification number.
  • First: The first name of the subject.
  • Last: The surname of the subject.
  • Email: Their email address.
  • IP Address: Their IP address.
  • Brand: The make of the motor vehicle they drive.
  • Model: The model of motor vehicle they drive.
  • Year: The year their motor vehicle was manufactured.

cat small.csv

A dummy CSV data text file

If we tell cut to use the comma as the delimiter, we can extract fields just like we did before. Sometimes you will need to extract data from a file, but you don’t want the field delimiter to be included in the results. Using the --output-delimiter option we can tell cut which character, or in fact sequence of characters, to use instead of the actual delimiter.

cut -d ',' -f 2,3 small.csv
cut -d ',' -f 2,3 small.csv --output-delimiter=" "

Using --output-delimiter to change the delimiter on output

The second command tells cut to replace the commas with spaces.
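Here is the same idea on inline data, including a multi-character output delimiter. The two records are made up for illustration and only follow the eight-field layout described above. They are not the contents of the article’s file. The --output-delimiter option is a GNU extension.

```shell
# Hypothetical records: ID, first, last, email, IP, make, model, year.
sample_csv() {
    cat <<'EOF'
1,Ada,Example,ada@example.com,192.0.2.10,Ford,Focus,2015
2,Bob,Sample,bob@example.com,192.0.2.11,Toyota,Corolla,2018
EOF
}

sample_csv | cut -d ',' -f 2,3                            # Ada,Example / Bob,Sample
sample_csv | cut -d ',' -f 2,3 --output-delimiter=' '     # Ada Example / Bob Sample
sample_csv | cut -d ',' -f 2,3 --output-delimiter=' - '   # Ada - Example / Bob - Sample
```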

We can take this further and use this function to convert the output to a vertical list. This command uses a new line character as the output delimiter. Note the “$” we need to include so that the newline character is acted upon and not interpreted as a literal two-character sequence.

We will use grep to pick out the entry for Morgana Renwick, and we will ask cut to print all fields from field two to the end of the record, using a newline character as the output delimiter.

grep 'renwick' small.csv | cut -d ',' -f2- --output-delimiter=$'\n'

Convert a record to a list using a newline character as the output delimiter
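The $'\n' form is a feature of Bash and similar shells. In a shell that lacks it, one alternative is to store a literal newline in a variable and pass that instead. This is a sketch under that assumption; the record is hypothetical and the variable names are illustrative.

```shell
# Hypothetical record: ID, first, last, email, IP, make, model, year.
record='2,Bob,Sample,bob@example.com,192.0.2.11,Toyota,Corolla,2018'

# A literal newline held in a variable; Bash's $'\n' expands to the same thing.
newline='
'

# Fields two onward, one per line.
echo "$record" | cut -d ',' -f 2- --output-delimiter="$newline"
```

Fields two through eight are printed one per line, starting with the first name and ending with the year.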

An Oldie but Goldie

At the time of writing, the humble cut command is approaching its 40th birthday, and we’re still using it and writing about it today. I suppose cutting up text today is the same as it was 40 years ago. That is, a lot easier when you have the right tool at hand.
