Baking is done when the textures are processed and compressed into the virtual texture file. The inputs are the normal, diffuse, spec, and gloss textures; the output is a VTEX file that the runtime uses.
Now that I'm looking at my code I think it's a bit different from what you describe. The Toksvig factor only affects the mipmaps that are generated, because it uses the denormalization from mipping. The LOD trick affects all levels and is applied as a filter kernel. Now that I'm recalling this more (I wrote the code maybe a year ago), the Toksvig factor is for minification accuracy, such as a bumpy surface looking rough at a distance, but doesn't do much to help aliasing. The LOD trick does the heavy lifting to reduce aliasing. I have it set a bit heavy-handed to squash the sparklies because we had a ton. My LOD trick is doing a very similar function to your Toksvig filter. I'll have to try this and compare.
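For reference, the denormalization-based adjustment I mean is along these lines (a minimal sketch using the standard Toksvig formulation; the exact form in my bake code may differ):

```python
def toksvig_power(avg_normal_len: float, spec_power: float) -> float:
    """Toksvig-adjusted specular power from a mipped normal's length.

    avg_normal_len: length of the averaged (denormalized) normal in a
    mip texel, in (0, 1]. A length of 1.0 means the source normals all
    agreed; shorter means the texel covers a bumpy region.
    """
    # Standard Toksvig factor: shortens the lobe as normal variance grows.
    ft = avg_normal_len / (avg_normal_len + spec_power * (1.0 - avg_normal_len))
    return ft * spec_power

# A flat region keeps its power; a bumpy region is roughened at a distance.
print(toksvig_power(1.0, 256.0))  # 256.0: fully aligned normals, no change
print(toksvig_power(0.9, 256.0))  # ~8.7: effective power drops sharply
```

Note this only kicks in where mipping has actually shortened the averaged normal, which is why it helps minification accuracy but not the base-level sparklies.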
To further explain this LOD trick if it isn't clear yet: I'm precalculating what the hardware texLOD would be when the envmap is looked up by a reflection vector from the normal map. I calculate it as if the view vector is the vertex normal and the normal map is at pixel resolution. Then, since I use the gloss to choose the mip level of the envmap, I modify the gloss value to use a higher mip if it needs one. In other words, min( gloss, glossFromTexLOD ). Doing this means I don't have to rely on the shitty 2x2 pixel hardware LOD for envmaps, and I can also use it for blinn highlights, but I have to sacrifice view dependence.
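As a rough illustration of the precompute, something like the following (this is a hypothetical sketch, not my actual bake code: the angular-footprint estimate, the envmap size, and the mip count are all assumptions, and it uses the gloss-to-mip mapping mip = (1 - gloss) * 5):

```python
import math

def gloss_from_tex_lod(n0, n1, envmap_size=256, mip_count=6):
    """Estimate the envmap LOD implied by adjacent normal-map texels,
    then convert it to a gloss value by inverting mip = (1 - gloss) * 5.

    n0, n1: unit normals of adjacent texels in the normal map.
    """
    # Angle the reflection vector sweeps between the two texels;
    # the reflection rotates twice as fast as the normal does.
    d = max(-1.0, min(1.0, sum(a * b for a, b in zip(n0, n1))))
    dtheta = 2.0 * math.acos(d)
    # Crude estimate of envmap texels covered by that angular step.
    texels = dtheta / (2.0 * math.pi) * envmap_size
    lod = max(0.0, math.log2(max(texels, 1e-6)))
    lod = min(lod, mip_count - 1)
    return 1.0 - lod / 5.0

def clamped_gloss(gloss, n0, n1):
    # The clamp from the text: only ever push gloss toward rougher.
    return min(gloss, gloss_from_tex_lod(n0, n1))

flat = (0.0, 0.0, 1.0)
tilted = (0.5, 0.0, math.sqrt(3.0) / 2.0)  # 30 degrees off
print(clamped_gloss(0.8, flat, flat))    # flat normals: gloss untouched
print(clamped_gloss(0.8, flat, tilted))  # fast-changing normals: gloss clamped down
```

The key point is that the clamp is baked per texel, so it works even where the hardware's 2x2-quad derivative estimate would be garbage.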
I use blinn power = exp2( gloss * c0 + c1 ) and envmap mip = ( 1 - gloss ) * 5. The two are eyeballed to match as closely as I can. Yes, so long as you match the amount a light source blurs in each, the same data works well for both. The problems I have left are high geometric curvature causing aliasing the LOD trick can't catch, since I don't have object-space normals (think metal railings), and not being able to set LOD on a per-pixel basis, which I've mentioned before.
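To make the two mappings concrete (the constants c0 and c1 below are made up for illustration since the post doesn't give them; only the functional forms are from the text):

```python
# Assumed constants: c0 = 10, c1 = 1 gives a blinn power range of
# exp2(1) = 2 up to exp2(11) = 2048. The real values are eyeballed
# to match the envmap mip blur and aren't stated in the post.
C0, C1 = 10.0, 1.0

def blinn_power(gloss: float) -> float:
    return 2.0 ** (gloss * C0 + C1)

def envmap_mip(gloss: float) -> float:
    return (1.0 - gloss) * 5.0

for g in (0.0, 0.5, 1.0):
    print(f"gloss={g:.1f}  blinn power={blinn_power(g):7.1f}  envmap mip={envmap_mip(g):.1f}")
```

Because gloss is the single shared parameter, the min() clamp from the LOD trick automatically roughens both the analytic highlight and the envmap lookup together.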