How to write super fast file streaming code in C#?
From: http://stackoverflow.com/questions/955911/how-to-write-super-fast-file-streaming-code-in-c?answertab=votes#tab-top
I have to split a huge file into many smaller files. Each destination file is defined by an offset and a length in bytes. I'm using the following code:
private void copy(string srcFile, string dstFile, int offset, int length)
{
    BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
    reader.BaseStream.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = reader.ReadBytes(length);
    BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
    writer.Write(buffer);
}
Considering that I have to call this function about 100,000 times, it is remarkably slow.
Is there a way to make the Writer connected directly to the Reader? (i.e. without actually loading the contents into the Buffer in memory)
asked Jun 5 '09 at 13:41 – ala
Calling File.OpenRead and File.OpenWrite 100,000 times will be slow, all right... – siz
Jun 5 '09 at 14:13
Are you splitting the file perfectly, i.e. could you rebuild the large file by just joining all the small files together? If so there are savings to be had there. If not, do the ranges of the small files overlap? Are they sorted in order of offset? – jamie
Jan 29 '11 at 5:39
9 Answers
Accepted answer (24 upvotes)
I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:
public static void CopySection(Stream input, string targetFile, int length)
{
    byte[] buffer = new byte[8192];
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:
public static void CopySection(Stream input, string targetFile,
                               int length, byte[] buffer)
{
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
Note that this also closes the output stream (due to the using statement) which your original code didn't.
The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.
I think it'll be significantly faster, but obviously you'll need to try it to see...
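As a sketch of how the calling side might look with a single shared input stream and a single reused buffer (the file names and section lengths below are made-up placeholders, not from the original question):

```csharp
using System;
using System.IO;

class Splitter
{
    // Same CopySection as above, repeated so this sketch is self-contained.
    public static void CopySection(Stream input, string targetFile,
                                   int length, byte[] buffer)
    {
        using (Stream output = File.OpenWrite(targetFile))
        {
            int bytesRead = 1;
            while (length > 0 && bytesRead > 0)
            {
                bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
                output.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }

    static void Main()
    {
        // Create a sample 300-byte input file just for the demo.
        byte[] data = new byte[300];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)i;
        File.WriteAllBytes("huge.bin", data);

        // Hypothetical contiguous section lengths; in the real program
        // these come from wherever the offsets/lengths are defined.
        int[] sectionLengths = { 100, 150, 50 };

        byte[] buffer = new byte[8192];          // allocated once, reused
        using (Stream input = File.OpenRead("huge.bin"))
        {
            for (int i = 0; i < sectionLengths.Length; i++)
            {
                // For contiguous chunks the stream position simply advances
                // past each section, so no Seek is needed.
                CopySection(input, "part" + i + ".bin", sectionLengths[i], buffer);
            }
        }

        Console.WriteLine(new FileInfo("part1.bin").Length);  // 150
    }
}
```

The key point is that the FileStream and the buffer are both created exactly once, outside the per-section loop.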
This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
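A minimal sketch of that BufferedStream wrapping, assuming the same CopySection shown above would then read through `input` (the file name and 64 KB buffer size are illustrative choices, not from the answer):

```csharp
using System;
using System.IO;

class BufferedSplit
{
    static void Main()
    {
        // Sample input file just for the demo.
        File.WriteAllBytes("huge.bin", new byte[1000]);

        // Wrap the raw FileStream once; small reads made while writing many
        // tiny output files then hit the in-memory buffer instead of
        // issuing many tiny disk reads.
        using (Stream raw = File.OpenRead("huge.bin"))
        using (Stream input = new BufferedStream(raw, 1 << 16))  // 64 KB
        {
            // ... pass `input` to CopySection exactly as before ...
            byte[] first = new byte[10];
            int n = input.Read(first, 0, first.Length);
            Console.WriteLine(n);  // 10
        }
    }
}
```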
answered Jun 5 '09 at 13:49 – Jon Skeet
I know this is a two year old post, just wondered... is this still the fastest way? (i.e. nothing new in .NET to be aware of?). Also, would it be faster to perform the Math.Min prior to entering the loop? Or better yet, to remove the length parameter as it can be calculated by means of the buffer? Sorry to be picky and necro this! Thanks in advance. – Smudge202 Apr 19 at 9:13
@Smudge202: Given that this is performing IO, the call to Math.Min is certainly not going to be relevant in terms of performance. The point of having both the length parameter and the buffer length is to allow you to reuse a possibly-oversized buffer. – Jon Skeet
Apr 19 at 9:20
Gotcha, and thanks for getting back to me. I'd hate to start a new question when there is likely a good enough answer right here, but would you say, that if you wanted to read the first x bytes of a large number of files (for the purpose of grabbing the XMP metadata from a large number of files), the above approach (with some tweaking) would still be recommended? – Smudge202
Apr 19 at 9:33
@Smudge202: Well the code above is for copying. If you only want to read the first x bytes, I'd still loop round, but just read into a right-sized buffer, incrementing the index at which the read will write into the buffer appropriately on each iteration. – Jon Skeet
Apr 19 at 9:34
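The loop Jon describes - reading into a right-sized buffer while incrementing the write index on each iteration - might be sketched like this (the helper name, file name, and sizes are hypothetical):

```csharp
using System;
using System.IO;

class HeaderReader
{
    // Read up to maxBytes from the start of a file. Stream.Read may return
    // fewer bytes than requested, so we loop, advancing the buffer index,
    // until the buffer is full or the file ends.
    public static byte[] ReadHeader(string path, int maxBytes)
    {
        byte[] buffer = new byte[maxBytes];
        int index = 0;
        using (Stream input = File.OpenRead(path))
        {
            while (index < buffer.Length)
            {
                int bytesRead = input.Read(buffer, index, buffer.Length - index);
                if (bytesRead == 0) break;   // end of file before maxBytes
                index += bytesRead;
            }
        }
        // Shrink the buffer if the file was shorter than maxBytes.
        if (index < buffer.Length) Array.Resize(ref buffer, index);
        return buffer;
    }

    static void Main()
    {
        File.WriteAllBytes("sample.bin", new byte[] { 1, 2, 3, 4, 5 });
        Console.WriteLine(ReadHeader("sample.bin", 3).Length);   // 3
        Console.WriteLine(ReadHeader("sample.bin", 10).Length);  // 5
    }
}
```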
Yup, I'm less interested in the writing part, I just wanted to confirm that the fastest way to read one file is also the fastest way to read many files. I imagined being able to P/Invoke for file pointers/offsets and from there being able to scan across multiple files with the same/less streams/buffers, which in my imaginary world of make believe, would possibly be even faster for what I want to achieve (though not applicable to the OP). If I'm not barking mad, probably best I start a new question. If I am, could you let me know so I don't waste even more peoples' time? :-) – Smudge202
Apr 19 at 9:44
@Smudge202: Do you actually have a performance problem right now? Have you written the simplest code that works and found that it's too slow? Bear in mind that a lot can depend on context - reading in parallel may help if you're using solid state, but not on a normal hard disk, for example. – Jon Skeet
Apr 19 at 10:25
SO is advising I take this to a chat, I think I'll actually start a new question. Thank you thus far! – Smudge202
Apr 19 at 10:34
You may share or copy this text freely, but please keep this article's URL as a reference.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.