FS#180 : lseek64(fileno(fd), ...) is not the same as fseek64(fd, ... )

The Persistence of Vision Raytracer (POV-Ray).

This is the legacy Bug Tracking System for the POV-Ray project. Bugs listed here are being migrated to our github issue tracker . Please refer to that for new reports or updates to existing ones on this system.

Attached to Project: POV-Ray
Opened by Jon Boede - 2010-12-15
Last edited by Thorsten Fröhlich - 2011-08-23

FS#180 - lseek64(fileno(fd), ...) is not the same as fseek64(fd, ... )

FileBackedPixelContainer uses a FILE* to manage the backing store and under Linux/Unix defines fseek64 to be a kind of incantation of lseek64.

Since fseek64 and fseek64 have return values that are logical opposites this causes an exception to be thrown.

Moreover since the buffer size on a FILE is quite a bit smaller than a line of pixels (in terms of bytes), mostly what’s being done in this function is a lot of sloshing of bytes that don’t do much... which is to say that the intended caching mechanism is getting a very low hit rate now that povray is working in 32×32 pixel per-thread chunks.

The cache miss problem and the volume of bytes read/written grows exponentially as the size of the image grows.

Attached is a replacement function that seems to be working well and should produce a performance increase even under Windows. I originally went with simply fixing the fseek/lseek problem but saw a 10% decrease in render time on 2048×1024-sized images when I made the cache target smaller.

fbpc.txt (6.9 KiB)

Closed by Thorsten Fröhlich
Tuesday, 23 August 2011, 05:30 GMT
Reason for closing: Fixed

Jon Boede commented on Wednesday, 15 December 2010, 03:02 GMT

I had originally tried setting the blocksize to be the max number of pixels that would fit in 512 or 1024 bytes figuring that's how much is going to be pulled in a block at the controller level anyway... but simply setting it to 32 pixels seemed to give the best performance in terms of shortest time to render. YAMMV :-)

Chris Cason commented on Wednesday, 15 December 2010, 06:28 GMT

The updated linux beta 40 has a fix for the run-time error (it was fixed a few hours after release - I am guessing you got the source before that time).

32 pixels is the default render block width, so this is probably why you are seeing decent performance using that size, assuming you are rendering with defaults. A larger width wouldn't have helped in that case since once the first 32 pixels are written, the line number would change and thus the cache would be flushed.

Optimizing this code is a good idea and I'm pleased someone is interested in it; in fact I'd like to encourage you to take it a step further, if you have time (we are on a deadline for release so if we are to incorporate this code it'd need to be very soon).

The rendered pixels are sent to the image code by a single thread that receives them in blocks from the messaging system (the implication of this is you don't have to worry about mixed thread access or sub-block coherency being blown away). At the upper level, the image writing code uses a generic pov_base::Image to write the pixels of each block. An entire block is always written at once, at least for the render output (this may not apply where an Image is used elsewhere in the code, however the file-backed images are only used for this purpose). You can see where this writing is done in the current codebase by looking at ImageMessageHandler::DrawPixelBlockSet() in source/frontend/imagemessagehandler.cpp at around line 278 (note it needs to handle cases where the pixel size is not 1).

I propose extending pov_base::Image with at least one new method, which gives the image container a hint as to the block size to be used (memory-based containers will just ignore this); once a hint is set it cannot be changed once the first block is written - if not set it should default to 32x32.

An additional possibility is to also add one which accepts a block of pixels, x/y location, and pixel size, in order to optimize writing when the source data is already stored contiguously in memory and is going straight to disk.

Option (1) gives the file-backed pixel container the opportunity to cache an entire block of pixels (vertically as well as horizontally) at once if it wishes (possibly even caching multiple disparate blocks), rather than the current approach of caching a partial line at a time, and also allows it to store the blocks on disk in an optimal format. Specifically, there is no requirement for the pixels in different rows (but the same render block) to be placed on the disk at different seek offsets (that is, to emulate the layout of scanlines in memory). Instead, the blocks could be written straight to disk as-is, at the offset block_number * the calculated block size, and of course read back the same way.

If option (2) is implemented, and the hint is not set in advance, the size of the first block to be written would determine the native size. The code would need to handle the case where the block being written overlaps multiple blocks of the native size (though this would be rare) and where the block is less than the native size (this can happen e.g. at the edges of an image).

On read the optimal behavior would depend on the output file format - some files (e.g. jpeg) will access the pixel data in blocks, whilst others (e.g. bmp) will always access them sequentially. Being able to read arbitrary-sized blocks via a single method call might be handy but isn't a requirement. There could be an argument for having a hint method to tell the image the expected type of access on read so it can e.g. cache blocks if the reader is for jpeg.

If you feel inclined to implement the code to do this within FileBackedPixelContainer please let me know.

Jon Boede commented on Wednesday, 15 December 2010, 17:35 GMT

Inclined, yes. But I wouldn't be able to meet the release candidate deadline with all the other stuff I have going on ~ I say this knowing that my c++ skills are a special kind of rusty ~ and I don't make commitments that I can't follow through on.

Knowing that these come in a block at a time greatly simplifies the problem.

The attached fbpc.txt goes a long way to reducing the number of GB that have to slosh through kernel ring transitions, but obviously things could be much better. I'll keep an eye on the code and I'm just OCD enough to eventually do something about it... I just can't say when. :-(

Loading...

Keyboard shortcuts

Available keyboard shortcuts

SHIFT+ALT+l Login Dialog / Logout
SHIFT+ALT+a Add new task
SHIFT+ALT+m My searches
SHIFT+ALT+t focus taskid search

Tasklist

o open selected task
j move cursor down
k move cursor up

Task Details

n Next task
p Previous task
SHIFT+ALT+e ENTER Edit this task
SHIFT+ALT+y Close Task

Task Editing

SHIFT+ALT+s save task