Coffeehouse Thread

21 posts


What manages the 3D environment

  • ScanIAm

    I'll start off by saying that my only experience in this is as a consumer of games, and I'm lazy, so I'm hoping someone knows how this works.

    If I wanted to place a player inside of an environment, which part of the PC is managing that environment?  Is the CPU/Game engine managing the position and volume of the player and other environmental items or is the GPU doing this? 

    If I wanted to put a box 3ft in front of the point of view, how much work is done by the CPU vs. the GPU? 

    Does the CPU need to manage collisions between 2 objects or does it just feed the position and dimensions of the 2 objects to the GPU and allow the GPU to figure it out?

    My point in all of this is that it would seem that the latter case would be the ultimate goal of GPU developers.  The CPU would only have to feed in the dimensions of the objects and then tell the GPU to move them around given the constraints of the resulting volume of those objects.

    Hopefully this makes sense. 

  • Bass

    My impression is that the GPU handles the low-level graphics details (e.g. how do I render this mesh?), while the CPU handles pretty much everything else (including manipulating the scene graph).
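
    Roughly, the per-frame split looks something like this. It's only a sketch: the types and function names below are made up for illustration, not any real engine or API.

    typedef struct { float pos[3]; float vel[3]; } Entity;
    typedef struct { Entity entities[64]; int count; } Scene;

    /* CPU side: game logic, physics, scene management */
    static void update_scene(Scene *s, float dt)
    {
        for (int i = 0; i < s->count; i++)
            for (int a = 0; a < 3; a++)
                s->entities[i].pos[a] += s->entities[i].vel[a] * dt;
    }

    /* "GPU side": in a real program this would be a draw call through
       DirectX/OpenGL; here it just stands in for submitting the mesh */
    static void draw_entity(const Entity *e)
    {
        (void)e; /* submit mesh + transform to the graphics API here */
    }

    static void game_frame(Scene *s, float dt)
    {
        update_scene(s, dt);                 /* CPU: decides where things are */
        for (int i = 0; i < s->count; i++)   /* GPU: renders them             */
            draw_entity(&s->entities[i]);
    }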

  • figuerres

    Very much what Bass said is right.

    The GPUs do a lot of the rendering, but the CPU / app code handles things like where things are and what's hitting what.

    Some GPUs have physics engines that help with things like how objects should react to getting struck, or to gravity.

    But, for example, the app logic has to "move the camera POV" to make your head move in a shooter where your view is from the eyes of the character.

    Many game engines have a model of the room that also tells them what sounds to play based on where you and some other object are located. Say a dog is outside and you are in a room with an open door: as you get near the door, you hear more and more of the dog barking. Farther away, less or none.

    The same model also tells them what mesh data to send to the GPU to render, based on visibility in the room.
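
    The dog-barking example above is basically just distance attenuation, something like this rough sketch (made-up names, and it ignores the open-door / occlusion part):

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    /* Full volume right next to the source, fading to silence beyond max_dist. */
    float sound_volume(Vec3 listener, Vec3 source, float max_dist)
    {
        float dx = source.x - listener.x;
        float dy = source.y - listener.y;
        float dz = source.z - listener.z;
        float dist = sqrtf(dx*dx + dy*dy + dz*dz);

        if (dist >= max_dist)
            return 0.0f;                    /* too far away: none */
        return 1.0f - dist / max_dist;      /* closer = louder */
    }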

  • Proton2

    You can do 3D without using the GPU at all. See "Ray Tracing".
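
    A software ray tracer really is just CPU arithmetic. For example, testing whether a ray hits a sphere takes only a few lines (a sketch in C, no GPU involved):

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    /* Does the ray (origin o, unit direction d) hit a sphere of radius r
       centred at c?  If so, return 1 and put the hit distance in *t. */
    int ray_hits_sphere(Vec3 o, Vec3 d, Vec3 c, float r, float *t)
    {
        Vec3 oc = { o.x - c.x, o.y - c.y, o.z - c.z };
        float b = dot(oc, d);
        float k = dot(oc, oc) - r * r;
        float disc = b * b - k;             /* discriminant of the quadratic */
        if (disc < 0.0f)
            return 0;                       /* ray misses the sphere */
        *t = -b - sqrtf(disc);              /* nearest intersection */
        return *t > 0.0f;
    }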

  • ZippyV

    Nvidia did make an interesting DirectX 10 demo where pretty much everything is done on the GPU. (The bugs you see flying around are also calculated by the GPU so they don't hit anything)

  • Bass

    ZippyV wrote:

    Nvidia did make an interesting DirectX 10 demo where pretty much everything is done on the GPU. (The bugs you see flying around are also calculated by the GPU so they don't hit anything)


    That's pretty pointless, don't you think? As the NVidia rep says, the CPU is sitting around "virtually idle". Maybe you'd want to offload some of that processing to the CPU to improve performance. Tongue Out

  • figuerres

    Bass wrote:

    *snip*


    That's pretty pointless, don't you think? As the NVidia rep says, the CPU is sitting around "virtually idle". Maybe you'd want to offload some of that processing to the CPU to improve performance. Tongue Out

    well "it depends" on a whole lot of things... i can see cases where it may be very usefull...

    in the demo the cpu may have been un-used but that was just a demo, in a real game or app....

    it would probably allow you to do more stuff w/o having to up the cpu required etc...

  • ScanIAm

    @ZippyV: Very cool demo.  That's exactly what I was thinking of and more given the wetness properties.  It seems to me that the GPU (and physics engine) would be able to cache up the various items and their properties while the CPU's job would be to instruct the GPU how they should try to interact. 

    I'm sure there's more going on and I don't mean to oversimplify the process, but it's almost like taking 2D sprites into 3D.

  • evildictaitor

    Your GPU is basically a big CPU which is optimised for really efficient SIMD instructions, with its own instruction set and access to DMA, so in theory you could probably design an entire OS to run from a GPU.

    In practice, however, GPUs are all accessed via DirectX or equivalent APIs, and they are asked to do a couple of important things:

    * Given a Mesh, turn it into lines on a screen
    * Adding textures to meshes
    * Doing dynamic lighting / luminance
    * Doing bump-mapping to turn low resolution meshes into seemingly high resolution meshes without the added complexity that brings
    * Doing dynamic shadows
    * Doing pixel and subpixel rendering to add things like lightning, fog, water etc. 

    Normally all of the physics, AI, sound and resource management is done by the CPU, because these are usually non-streamable operations, and GPUs suck at non-streamable operations.

  • CreamFilling512

    Normally a vertex program running on the GPU computes the location of geometry in world-space by applying a world matrix (among other things) as input from the CPU.  You could compute everything in the vertex shader and just have the CPU provide the current time as input, and the vertex shader could compute location based on time or whatever.  But I think it's pretty standard practice that the CPU computes the location of high-level entities in the game, and produces a matrix for the GPU to use to transform the geometry.
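
    In other words, per vertex the GPU is doing roughly the following with the matrix the CPU hands it. This is just a CPU-side C sketch of the shader's arithmetic (row-major layout assumed; the names are made up):

    typedef struct { float m[4][4]; } Mat4;          /* e.g. world * view * projection */
    typedef struct { float x, y, z, w; } Vec4;

    /* Multiply each incoming vertex position by the matrix supplied by the CPU. */
    Vec4 transform_vertex(const Mat4 *wvp, Vec4 p)
    {
        Vec4 out;
        out.x = wvp->m[0][0]*p.x + wvp->m[0][1]*p.y + wvp->m[0][2]*p.z + wvp->m[0][3]*p.w;
        out.y = wvp->m[1][0]*p.x + wvp->m[1][1]*p.y + wvp->m[1][2]*p.z + wvp->m[1][3]*p.w;
        out.z = wvp->m[2][0]*p.x + wvp->m[2][1]*p.y + wvp->m[2][2]*p.z + wvp->m[2][3]*p.w;
        out.w = wvp->m[3][0]*p.x + wvp->m[3][1]*p.y + wvp->m[3][2]*p.z + wvp->m[3][3]*p.w;
        return out;
    }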

  • magicalclick

    @ZippyV:

    Amazing Video.

    @ScanIAm:

    That depends. Like in the video, simulation-based object placement follows a specialized type of instructions. Anything that is specialized can be accelerated with specialized hardware, such as a GPU, PhysX card, GPGPU, encoder/decoder, etc.

    If you are doing AI, it is more obvious to move the character using the CPU (assuming there is no AI-accelerated hardware on the market).

    It doesn't matter which one you use. The only difference is that the CPU is generic and can run everything, but slower. Specialized hardware will simply do things faster. You can do music, physics, graphics, AI, motion detection, etc. in specialized hardware, and it will always be faster at its specialized tasks.

    Anyway, you shouldn't bother with those.

    You don't tell the CPU how to play music, you tell DX to play music. You don't tell the CPU to draw 2D/3D graphics, you tell DX to draw graphics. Eventually you won't tell the CPU to calculate physics, you'll tell DX to do it. Which hardware DX is using shouldn't matter, as your code is still the same. You only set flags in the config to enable/disable hardware acceleration.

     

     

     

  • evildictaitor

    @magicalclick: DirectX is just an API. If you don't have a GPU that supports your app, you can always use "software rendering", which is the CPU performing the calculations itself.

    A case in point: most laptops don't have a powerful sound card, so all of the music and sounds pushed to DirectMusic on most laptops are processed by the CPU and then pushed to the system speakers, because there isn't any dedicated sound card with an onboard chip.

  • ScanIAm

    @magicalclick: That's basically what I am saying.

    The GPU is given the size, shape, position, and physical properties of the object (like density, etc.) as well as the physics rules that the environment should follow.

    Then, the CPU (or game app) will request that object 1 move forward x units.  The GPU knows if that is possible and if so, it will draw the object in the new location.  If not, it will inform the game app that the object has been stopped or blocked or whatever.

    I'm assuming (wrongly?) that the GPU would be the most efficient place to decide if an object has moved into the volume of another object.

  • Minh

    @ScanIAm, I would think the GPU is a bad place to do physics calculations. Since video RAM is much more expensive than conventional RAM, you'd want to keep as much out of VRAM as possible.

    And with the trend of CPUs getting more & more cores, all that can be done on the CPU side w/out performance impact.

    I know that NVidia has built some physics-specific instructions into their hardware (and back in the day there was a physics card), but w/ games being so cross-platform these days, I don't know if engine devs want to fork their code for just 1 manufacturer.

  • evildictaitor

    ScanIAm wrote:

    @magicalclick: That's basically what I am saying.

    The GPU is given the size, shape, position, and physical properties of the object (like density, etc.) as well as the physics rules that the environment should follow.

    Then, the CPU (or game app) will request that object 1 move forward x units.  The GPU knows if that is possible and if so, it will draw the object in the new location.  If not, it will inform the game app that the object has been stopped or blocked or whatever.

    I'm assuming (wrongly?) that the GPU would be the most efficient place to decide if an object has moved into the volume of another object.

    If you're asking what could be done, then yes, the GPU could do physics. If you're asking what is done, then no, the CPU does the physics, densities, animation and the rest of the update cycle etc.

    The GPU basically gets given a mesh, textures and shaders and turns them into a ginormous bitmap really quickly.

    Deciding if something has moved within the volume of another is easy to do on the CPU. Why offload it to the GPU? Bear in mind that moving things to the GPU and waiting for a response would take ages, so anything that takes part in a decision is probably done on the CPU.
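
    For example, an axis-aligned bounding-box overlap test is just a handful of comparisons (a sketch; a real engine layers broad-phase and narrow-phase tests on top of something like this):

    typedef struct { float min[3], max[3]; } AABB;

    /* 1 if the two axis-aligned boxes overlap on every axis - the kind of
       cheap test the CPU can do without ever talking to the GPU. */
    int aabb_overlap(AABB a, AABB b)
    {
        for (int i = 0; i < 3; i++)
            if (a.max[i] < b.min[i] || b.max[i] < a.min[i])
                return 0;   /* separated along this axis: no collision */
        return 1;
    }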

  • ScanIAm

    I guess the "why do it" question has more to do with my misunderstanding of what the GPU does.  I was envisioning more of an environment processing unit instead that would handle these interactions.  I guess it would function as the hardware equivalent of a game engine. 

    The point to that would be that it would be optimized for doing the calculations involved and would provide a layer of abstraction for developers.  Almost like a very small scale granular version of the universe Smiley

  • evildictaitor

    The GPU isn't a game engine. Nor is it an abstraction. It's a massively parallel processor where every instruction (like mov, xor, add etc) works on massive vectors efficiently.

    As an example, suppose I want to blend two images together by inspecting the alpha component of the two images. In pseudo code I'm doing (bearing in mind that a Color is a byte[4] where byte[3] is the alpha component):

    for(int x,y = 0,0 up to src1.dims)
    {
      byte baseImage[4] = src1[x, y];
      byte topImage[4] = src2[x, y];
      for(int c = 0; c < 3; c++)
        dst[x, y][c] = (byte)((baseImage[c] * (256 - topImage[3]) + (topImage[c] * topImage[3])) >> 8);
      dst[x, y][3] = 0xFF; // resulting image has full opacity.
    }
    

    On the GPU I can do this way more efficiently by doing:

    for(int x,y ..)
    {
      // let's load up baseImage and topImage:
      register uint128_t gpureg1 = load_four_colors(src1[x, y]);
      register uint128_t gpureg2 = load_four_colors(src2[x, y]);
      // let's grab the alpha channel of the top image:
      register uint128_t alphachannel_1 = and_128(gpureg2, 0x000000FF000000FF000000FF000000FF);
      // and compute the 256 - topImage[3] for each of those four colors
      register uint128_t alphachannel_2 = subword_128(0x00000100000001000000010000000100, alphachannel_1);
      // copy the lowest byte of each dword into the other three bytes of the dword, so that the alpha channel is in the right place for a multiplication with the R, G and B channels of the image:
      alphachannel_1 = extend1_to_4_128(alphachannel_1);
      alphachannel_2 = extend1_to_4_128(alphachannel_2);

      // baseresult will then hold four colors worth of (baseColor * (256 - topAlpha))
      register uint128_t baseresult = bytewise_mult(gpureg1, alphachannel_2);

      // topresult will hold four colors worth of (topColor * topAlpha)
      register uint128_t topresult = bytewise_mult(gpureg2, alphachannel_1);
      // add them (glossing over the >> 8 scaling the CPU version does):
      register uint128_t result = bytewise_add(baseresult, topresult);
      // the result is opaque, so set the alpha channel to 0xFF:
      result = or_128(0x000000FF000000FF000000FF000000FF, result);
      // and write out the result.
      store_four_colors(dst[x, y], result);
    }

    Note: In the GPU code, each "function call" can be done with a single processor instruction, so the loop body takes no branches.

     

    Now that's not necessarily any more pleasant to read or write than the loop, but it happens way faster. In the original version, for every four colors I need to go round the outer loop four times and the inner loop three times each. I also perform one load and store for every dword involved in the whole process. In the GPU version I can do four colors per loop, and the only loads and stores I perform are whole 128-bit registers. Moreover, I can be doing this on the GPU and letting my CPU get on with something else, like going to the next object or sprite and beginning the process of sending it to the GPU's pipeline, or pushing data to or from the network card.

  • CreamFilling512

    I'm surprised you need to write all that code to do an alpha blend on the GPU.  I'd have thought it would have dedicated silicon just for blending on there.
