Split files with the split command

Recently we had some problems in production. After the problems were solved, we still had a backlog of items that we wanted to reprocess.

We were lucky that, during development, the API was designed in such a way that we could run the process with an input file. Each line in the input file contains the input parameters for processing one item.

Now that I had to relaunch everything, we had a lot to do. We did not want to risk the huge input file causing problems, so I needed to split it into smaller files. How could I do this without manually copying the content to different files? I could write a new Python program to do it for me, but I thought to myself: “Someone in the world must have had the same problem.” After searching, I found the split command on Unix. The manual of split says:

split - split a file into pieces

The same command is also available on Windows. If you have installed Git, you have a Bash shell available, and the split command can be used in that shell.

By default, the split command creates files named xaa, xab, xac, etc. when you run the command:

$ split myfile

I always use the verbose option, because the split command does not print anything by default. That way, I can see which files are created.

$ split --verbose myfile
creating file 'xaa'
creating file 'xab'
creating file 'xac'
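
When no size option is given, split puts 1000 lines in each output file. Here is a minimal sketch of that default, assuming a POSIX shell with seq and mktemp available. The 2500-line sample file and the scratch directory are made up for illustration:

```shell
# Work in a scratch directory so no existing files get overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a sample input file with 2500 lines (illustrative data only).
seq 1 2500 > myfile

# No options: split falls back to 1000 lines per output file.
split myfile

# Count the lines per piece: xaa and xab hold 1000 lines, xac the remaining 500.
wc -l xaa xab xac

# Keep the counts in variables so they can be checked afterwards.
pieces=$(ls x* | wc -l)
last_lines=$(wc -l < xac)
```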

The names xaa, etc. are a little strange, so it is good practice to provide a prefix. The x is then replaced with the prefix you pass on the command line. For example, this creates files named myfile.aa, myfile.ab, etc.:

$ split --verbose myfile myfile.
creating file 'myfile.aa'
creating file 'myfile.ab'
creating file 'myfile.ac'

Note the dot at the end of the prefix. Without it, the files are named myfileaa, myfileab, etc.

Split into chunks of the same size

You can give the chunks a fixed size with the -b option. For example, to create files of 100 megabytes each:

$ split -b100M myfile

File sizes can be specified in kilobytes, megabytes, gigabytes … up to yottabytes! Just use the appropriate letter from K, M, G, T, P, E, Z and Y.
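
With -b, every chunk gets exactly the requested number of bytes except the last one, which holds whatever is left over. A small sketch of this, assuming GNU split (where K means 1024 bytes) and a made-up 2500-byte sample file:

```shell
# Work in a scratch directory so no existing files get overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 2500-byte sample file (illustrative data only).
head -c 2500 /dev/zero > myfile

# Split into chunks of 1K (1024 bytes) using the prefix "myfile.".
split -b 1K myfile myfile.

# Show the size of each chunk in bytes: 1024, 1024 and the 452-byte remainder.
wc -c myfile.*

# Keep the sizes in variables so they can be checked afterwards.
first_size=$(wc -c < myfile.aa)
last_size=$(wc -c < myfile.ac)
```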

If you want your file to be split based on the number of lines in each chunk, you can use the -l (lines) option. In this example, each file will have 200 lines.

$ split --verbose -l200 myfile myfile.
creating file 'myfile.aa'
creating file 'myfile.ab'
creating file 'myfile.ac'
creating file 'myfile.ad'
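
A quick way to convince yourself that nothing got lost is to glue the chunks back together and compare the result with the original. The sketch below assumes a POSIX shell. The 1000-line sample file is made up for illustration:

```shell
# Work in a scratch directory so no existing files get overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a sample input file with 1000 lines (illustrative data only).
seq 1 1000 > myfile

# Split into chunks of 200 lines each: myfile.aa up to myfile.ae.
split -l 200 myfile myfile.
wc -l myfile.*

# The shell expands myfile.* in alphabetical order (aa, ab, ...), which is
# also the order in which split wrote the chunks.
if cat myfile.* | cmp -s - myfile; then
    result="identical"
else
    result="different"
fi
echo "reassembled chunks are $result to the original"

# Keep the chunk count in a variable so it can be checked afterwards.
chunks=$(ls myfile.* | wc -l)
```

Note that the pattern myfile.* does not match the input file myfile itself, because the input file name contains no dot.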

Beware

The split command overwrites existing files without any error or warning. So before you press Enter, check that the command line is correct and that it is safe to overwrite the files it creates. On the plus side, the input file is never deleted.
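
One possible way to guard against accidental overwrites is to check for existing output files before calling split. The function below is a sketch only: safe_split is a hypothetical helper written for this example, not an option or feature of split itself, and the sample file is made up.

```shell
# Hypothetical helper (not part of split): refuse to run when files
# starting with the given prefix already exist.
safe_split() {
    input=$1
    prefix=$2
    lines=$3
    for existing in "$prefix"*; do
        # If the pattern matches nothing, it stays literal and -e is false.
        if [ -e "$existing" ]; then
            echo "refusing to split: $existing already exists" >&2
            return 1
        fi
    done
    split -l "$lines" "$input" "$prefix"
}

# Try it out in a scratch directory with a 10-line sample file.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > myfile

first_run="refused"
second_run="refused"
safe_split myfile myfile. 5 && first_run="done"
safe_split myfile myfile. 5 && second_run="done"
echo "first run: $first_run, second run: $second_run"
```

The check is deliberately conservative: any file whose name starts with the prefix blocks the split, so a prefix without the trailing dot would also match the input file itself.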
