Wednesday, December 31, 2008

SSAO before the New Year

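I guess many of you are squeezing in one last post before the year ends, so I'll join the fun ^.^
It has been three months since my last SSAO post. I have since changed the algorithm, and I had hoped to improve it further and package it as a demo before posting it here, but I'm currently in the middle of building an Effect framework for OpenGL, so for now a few screenshots will have to do.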

The algorithm is similar to the one used in StarCraft 2, but my knowledge of image processing is still shallow, and the bilateral (edge-preserving) blur part still has problems.
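For readers new to the term: a bilateral blur weights each neighbour by both its screen distance and how similar its depth is, so the AO term does not bleed across object edges. Below is a minimal CPU sketch of one horizontal pass, assuming a linear depth buffer; `bilateralBlurRow` and its sigma parameters are illustrative names, not the shader behind the screenshots.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One horizontal pass of a depth-aware (bilateral) blur over a row of an AO buffer.
// Each neighbour's weight is a spatial Gaussian (screen distance) times a range
// Gaussian (depth difference), so samples across a depth edge get almost no weight.
std::vector<float> bilateralBlurRow(const std::vector<float>& ao,
                                    const std::vector<float>& depth,
                                    int radius, float sigmaSpatial, float sigmaDepth)
{
    assert(ao.size() == depth.size());
    const int n = static_cast<int>(ao.size());
    std::vector<float> out(ao.size());
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f, weightSum = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            const int j = i + k;
            if (j < 0 || j >= n)
                continue; // skip samples outside the row
            const float dz = depth[j] - depth[i];
            const float w = std::exp(-float(k * k) / (2.0f * sigmaSpatial * sigmaSpatial))
                          * std::exp(-(dz * dz) / (2.0f * sigmaDepth * sigmaDepth));
            sum += w * ao[j];
            weightSum += w;
        }
        out[i] = sum / weightSum; // the centre sample (k == 0) has weight 1, so never 0
    }
    return out;
}
```

A plain box blur would average 0 and 1 across the edge in the example below; the depth weight keeps the two sides apart.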

Finally, Happy New Year, everyone!


Without SSAO: 900 fps


SSAO at 1/4 screen resolution, with dither: 252 fps


SSAO at 1/4 screen resolution, with dither + blur: 233 fps
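Frame rates are easier to compare as frame times, because fps differences are not additive. A small helper (hypothetical names) turns the figures above into per-frame cost:

```cpp
#include <cassert>

// Convert frames per second into milliseconds per frame.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame added by a feature that lowers the frame rate
// from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

With these numbers, the quarter-resolution SSAO costs roughly 2.86 ms per frame (900 to 252 fps), and the blur adds about 0.32 ms on top (252 to 233 fps).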

Saturday, December 20, 2008

Is a graphical shader system really that good?

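Whenever we see a game engine with a visual (graphical) shader system, it always seems especially powerful.
It is generally believed to make the production pipeline smoother, but does that hold everywhere? Let's hear the opposing voices. Summarizing everyone's main points: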
  • Turning a programming language into a visual one is not a panacea.

  • It works fine for offline rendering, but it causes problems for real-time rendering, where efficiency is paramount.

  • Shaders generated by visual tools are generally less efficient than hand-written ones.

  • It makes it far too easy for artists to create thousands of unique, unrelated shaders, turning later cleanup into a nightmare.

  • It weakens communication between artists and programmers.

Wednesday, November 26, 2008

Shuttling between UTF-8 and UTF-16

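I happened to find a very concise pair of UTF-8/UTF-16 conversion functions in the 7zip LZMA SDK; they even skip the usual intermediate conversion through a separate Unicode buffer. Unfortunately, the SDK's decompression side cannot meet the requirements of a game loading system, because the 7zip archive format cannot extract an individual file from an archive with minimal resources.
The source code below is taken from the LZMA SDK, with extra error checking and comments of my own. Enjoy!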


#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

typedef unsigned char byte_t;

// Reference: http://en.wikipedia.org/wiki/Utf8
static const byte_t cUtf8Limits[] = {
0xC0, // Start of a 2-byte sequence
0xE0, // Start of a 3-byte sequence
0xF0, // Start of a 4-byte sequence
0xF8, // Start of a 5-byte sequence
0xFC, // Start of a 6-byte sequence
0xFE // Invalid: not defined by original UTF-8 specification
};

/*! Converting a string is usually a two-step process: first call utf8ToUtf16() with
dest set to null, which gives you destLen (not including the null terminator);
then allocate the destination with that amount of memory and call utf8ToUtf16() again
to perform the actual conversion. You can skip the first call only if you already
know the exact output length, because a non-null dest requires destLen to match it.

\note Here we assume sizeof(wchar_t) == 2; with a 4-byte wchar_t each element
still holds one UTF-16 code unit (surrogate pairs are not combined)
\ref Modified from the 7zip LZMA SDK
*/
bool utf8ToUtf16(wchar_t* dest, size_t& destLen, const char* src, size_t maxSrcLen)
{
size_t destPos = 0, srcPos = 0;

while(true)
{
byte_t c; // Note that byte_t should be unsigned
size_t numAdds;

if(srcPos == maxSrcLen || src[srcPos] == '\0') {
if(dest && destLen != destPos) {
assert(false && "The provided destLen should equal what we calculated here");
return false;
}

destLen = destPos;
return true;
}

c = src[srcPos++];

if(c < 0x80) { // 0-127, US-ASCII (single byte)
if(dest)
dest[destPos] = (wchar_t)c;
++destPos;
continue;
}

if(c < 0xC0) // 0x80-0xBF are continuation bytes and cannot start a sequence
break;

for(numAdds = 1; numAdds < 5; ++numAdds)
if(c < cUtf8Limits[numAdds])
break;
uint32_t value = c - cUtf8Limits[numAdds - 1];

do {
byte_t c2;
if(srcPos == maxSrcLen || src[srcPos] == '\0')
break;
c2 = src[srcPos++];
if(c2 < 0x80 || c2 >= 0xC0)
break;
value <<= 6;
value |= (c2 - 0x80);
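} while(--numAdds != 0);

// A non-zero numAdds means the loop above bailed out early:
// the sequence is truncated or has an invalid continuation byte
if(numAdds != 0)
break;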

if(value < 0x10000) {
if(dest)
dest[destPos] = (wchar_t)value;
++destPos;
}
else {
value -= 0x10000;
if(value >= 0x100000)
break;
if(dest) {
dest[destPos + 0] = (wchar_t)(0xD800 + (value >> 10));
dest[destPos + 1] = (wchar_t)(0xDC00 + (value & 0x3FF));
}
destPos += 2;
}
}

destLen = destPos;
return false;
}

bool utf8ToWStr(const char* utf8Str, size_t maxCount, std::wstring& wideStr)
{
size_t destLen = 0;

// Get the length of the wide string
if(!utf8ToUtf16(nullptr, destLen, utf8Str, maxCount))
return false;

wideStr.resize(destLen);
if(wideStr.size() != destLen)
return false;

// &wideStr[0] is writable, contiguous storage (guaranteed since C++11)
return utf8ToUtf16(&wideStr[0], destLen, utf8Str, maxCount);
}

bool utf8ToWStr(const std::string& utf8Str, std::wstring& wideStr)
{
return utf8ToWStr(utf8Str.c_str(), utf8Str.size(), wideStr);
}

//! See the documentation for utf8ToUtf16()
bool utf16ToUtf8(char* dest, size_t& destLen, const wchar_t* src, size_t maxSrcLen)
{
size_t destPos = 0, srcPos = 0;

while(true)
{
uint32_t value;
size_t numAdds;

if(srcPos == maxSrcLen || src[srcPos] == L'\0') {
if(dest && destLen != destPos) {
assert(false && "The provided destLen should equal what we calculated here");
return false;
}
destLen = destPos;
return true;
}

value = src[srcPos++];

if(value < 0x80) { // 0-127, US-ASCII (single byte)
if(dest)
dest[destPos] = char(value);
++destPos;
continue;
}

if(value >= 0xD800 && value < 0xE000) {
if(value >= 0xDC00 || srcPos == maxSrcLen)
break;
uint32_t c2 = src[srcPos++];
if(c2 < 0xDC00 || c2 >= 0xE000)
break;
// A surrogate pair encodes (code point - 0x10000), so add it back
value = (((value - 0xD800) << 10) | (c2 - 0xDC00)) + 0x10000;
}

for(numAdds = 1; numAdds < 5; ++numAdds)
if(value < (uint32_t(1) << (numAdds * 5 + 6)))
break;

if(dest)
dest[destPos] = char(cUtf8Limits[numAdds - 1] + (value >> (6 * numAdds)));
++destPos;

do {
--numAdds;
if(dest)
dest[destPos] = char(0x80 + ((value >> (6 * numAdds)) & 0x3F));
++destPos;
} while(numAdds != 0);
}

destLen = destPos;
return false;
}

bool wStrToUtf8(const wchar_t* wideStr, size_t maxCount, std::string& utf8Str)
{
size_t destLen = 0;

// Get the length of the utf-8 string
if(!utf16ToUtf8(nullptr, destLen, wideStr, maxCount))
return false;

utf8Str.resize(destLen);
if(utf8Str.size() != destLen)
return false;

// &utf8Str[0] is writable, contiguous storage (guaranteed since C++11)
return utf16ToUtf8(&utf8Str[0], destLen, wideStr, maxCount);
}

bool wStrToUtf8(const std::wstring& wideStr, std::string& utf8Str)
{
return wStrToUtf8(wideStr.c_str(), wideStr.size(), utf8Str);
}
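The surrogate-pair arithmetic and the lead-byte scheme used above can be checked in isolation. A small standalone sketch (the function names are illustrative, not from the LZMA SDK):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Split a supplementary-plane code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
std::pair<uint16_t, uint16_t> toSurrogates(uint32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const uint32_t v = cp - 0x10000;                        // 20 significant bits remain
    return std::make_pair(uint16_t(0xD800 + (v >> 10)),     // high surrogate: top 10 bits
                          uint16_t(0xDC00 + (v & 0x3FF)));  // low surrogate: bottom 10 bits
}

// Recombine a surrogate pair. The final + 0x10000 undoes the subtraction above;
// forgetting it maps every supplementary character 0x10000 too low.
uint32_t fromSurrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi < 0xDC00 && lo >= 0xDC00 && lo < 0xE000);
    return (((uint32_t(hi) - 0xD800) << 10) | (uint32_t(lo) - 0xDC00)) + 0x10000;
}

// Encode one code point (at most 0x10FFFF) as UTF-8 with the same lead-byte scheme
// as cUtf8Limits: the number of continuation bytes selects the lead-byte prefix.
std::string encodeUtf8(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    static const unsigned char leadPrefix[] = { 0x00, 0xC0, 0xE0, 0xF0 };
    const int numAdds = cp < 0x80 ? 0 : cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;
    std::string out(1, char(leadPrefix[numAdds] | (cp >> (6 * numAdds))));
    for (int i = numAdds - 1; i >= 0; --i)
        out += char(0x80 | ((cp >> (6 * i)) & 0x3F)); // 6 payload bits per continuation byte
    return out;
}
```

For example, U+1F600 becomes the pair D83D DE00 in UTF-16 and the bytes F0 9F 98 80 in UTF-8.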

Thursday, November 20, 2008

The myth of programming tricks

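Microsoft is about to release the next-generation Visual Studio 2010, and its C++0x support is what I'm looking forward to most.
Even though C++0x compilers are not yet mature or widespread, engineers are already playing with the new syntax and coming up with dazzling tricks.
I actually love playing syntactic tricks too, but I also know the damage they can cause.
The following is quoted from a reply to one of those trick posts, and it says what I have in mind: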
Interesting acrobatics, but I am a KISS fan.

I prefer not to mandate a C++ black belt (with several Dans on occassion) on coworkers who try to understand and modify my code, so thanks but I'll pass.

Is there anything in the above code that cannot be done in plain C in a way that 90% of the dev population can understand and 80% can modify/extend without a mistake?

Why do architects feel so compelled to save the world by providing infrastructure and plumbing for everything conceivable under the sun?

What about memoization? If I am in such a corner case where caching the results of a function call will *actually* improve performance, what makes you think I would opt for an obscure and totally incomprehensible generic template that I cannot understand or debug, rather than a custom-tailored, totally non-reusable, top-performing, totally understandable and debugable solution?

Don't get me wrong, I am not an anti-STL, do-it-yourself (CMyHashTable, CMyDynamicArray, CMyOS) gangho. I am just a KISS fan (including the rock band). If something can be done in a way that is simpler, easier to understand, debug and extend, then I prefer the simpler way.

I just get so frustrated when people do all this acrobatic stuff in production code just because (a) they can do it (b) it's cool to do it, without thinking back a lil'bit or actually having mastered the 'tools' they are using.

A similar example is 'patternitis'. I have seen countless C++ freshmen reading the GangOf4 Design Patterns book and then creating a total mess in everything, like deciding to implement the Visitor pattern on a problem that required Composite and ended up coding a third pattern alltogether from the same book, still naming the classes CVisitorXYZ (probably they opened the book on the wrong page at some point).

I have met exactly 1 guy (I called him the "Professor") who knew C++ well enough and had the knowledge to apply the patterns where they ought to be applied. His code was a masterpiece, it worked like a breeze, but when he left, none else in the house could figure things out.

So what's the point with these Lambda stuff really? Increase the expression of the language? Are we doing poetry or software? Why should we turn simple code that everyone understands into more and more elegant and concise code that only few can understand and make it work?

I have been coding in C (drivers) and C++ for 15 years and not once was I trapped because I was missing lambda expressions or similar syntactic gizmos.

So what's the point really? Please enlighten me. I don't say that *I* am right and *YOU* are wrong. I am saying that I don't see, I don't understand the positive value that these things bring in that far outweighs the problems they cause by complicating the language.
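Of course, both the trendy/artistic camp and the pragmatic camp have good reasons to exist; without either one, the programming world would be either a mess or at a standstill.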
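The quote's memoization point is easy to make concrete. Here is the kind of custom-tailored cache it argues for, as a hypothetical example: `FibCache` is an illustrative name, and Fibonacci merely stands in for any expensive pure function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// A deliberately plain, single-purpose memoization cache: no generic template
// machinery, just a map next to the one function that needs it.
class FibCache {
public:
    uint64_t fib(unsigned n)
    {
        if (n < 2)
            return n;
        auto it = mCache.find(n);
        if (it != mCache.end())
            return it->second;               // cache hit
        const uint64_t result = fib(n - 1) + fib(n - 2);
        mCache[n] = result;                  // remember for next time
        return result;
    }

    std::size_t cachedCount() const { return mCache.size(); }

private:
    std::unordered_map<unsigned, uint64_t> mCache;
};
```

Anyone can read, debug and step through it, which is exactly the trade-off the reply is making.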

Thursday, November 6, 2008

Bonjour means no harm


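I happened to notice a new entry in the Windows Services list, with the process mDNSResponder.exe. I had a bad feeling when I found it and thought it was a trojan.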

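It turns out it is not a virus or malware, but a service called Bonjour, an Apple product. It usually appears after installing Adobe CS3 and is used to automatically discover printers and other devices on the local network. It is generally of little use, and uninstalling it does not affect other software. Here is the uninstall procedure published on Adobe's website: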
  1. Run C:\Program Files\Bonjour\mDNSResponder.exe -remove
  2. Rename C:\Program Files\Bonjour\mdnsNSP.dll to mdnsNSP.old
  3. Restart the computer
  4. Delete the C:\Program Files\Bonjour directory
[Note] Bonjour means "hello" in French.

Friday, October 31, 2008

Shadow mapping

With projective texturing done, basic shadow mapping is a piece of cake.




// Pixel shader

varying vec3 normal, lightDir, halfVector;
varying vec2 colorCoord;
varying vec4 shadowCoord;
uniform sampler2D colorTex;
uniform sampler2DShadow shadowTex;

// Light intensity inside shadow
const float shadowIntensity = 0.5;

// Should be supplied as uniform
const float shadowMapPixelScale = 1.0 / float(2048);
const int pcfSize = 1; // The pcf filtering size, 0 -> 1x1 (no filtering), 1 -> 3x3 etc

void main(void)
{
vec4 diffuse = gl_FrontLightProduct[0].diffuse;
vec4 specular = gl_FrontLightProduct[0].specular;
vec4 ambient = gl_FrontLightProduct[0].ambient;
vec3 n = normalize(normal);
float NdotL = max(dot(n, lightDir), 0.0);

diffuse *= NdotL;
vec3 halfV = normalize(halfVector);
float NdotHV = max(dot(n, halfV), 0.0);
specular *= pow(NdotHV, gl_FrontMaterial.shininess);

// Get the shadow value, let the hardware perform perspective divide,
// depth comparison and 2x2 pcf if supported.
// float shadowValue = shadow2DProj(shadowTex, shadowCoord).r;

// Perform PCF filtering
float shadowValue = 0.0;
for(int i=-pcfSize; i<=pcfSize; ++i) for(int j=-pcfSize; j<=pcfSize; ++j)
{
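// Scale by w: shadow2DProj divides by q (w), so this keeps the offset at whole texels
vec4 offset = vec4(float(i) * shadowMapPixelScale, float(j) * shadowMapPixelScale, 0.0, 0.0) * shadowCoord.w;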
shadowValue += shadow2DProj(shadowTex, shadowCoord + offset).r;
}
shadowValue /= float((2 * pcfSize + 1) * (2 * pcfSize + 1));

float shadowSpecularFactor = shadowValue == 0.0 ? 0.0 : 1.0;
float shadowDiffuseFactor = min(1.0, shadowIntensity + shadowValue);

gl_FragData[0] = shadowSpecularFactor * specular +
(ambient + shadowDiffuseFactor * diffuse) * vec4(texture2D(colorTex, colorCoord).xyz, 1);
}
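To make the PCF loop above easier to follow, here is a CPU-side sketch of the same averaging. It is an illustration only: it works in whole texels and leaves out the projective divide. Each of the (2 * pcfSize + 1)² samples makes a binary lit/unlit depth comparison, and averaging those results is what softens the shadow edge.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference for percentage-closer filtering (PCF).
// shadowMap: light-space depths, row-major, size x size texels.
// (u, v): the fragment's position in the shadow map, in texels; depth: its light-space depth.
// Returns the fraction of the (2*pcfSize+1)^2 samples that are lit (1 = fully lit).
float pcfShadow(const std::vector<float>& shadowMap, int size,
                int u, int v, float depth, int pcfSize)
{
    assert(static_cast<int>(shadowMap.size()) == size * size);
    int lit = 0, total = 0;
    for (int j = -pcfSize; j <= pcfSize; ++j) {
        for (int i = -pcfSize; i <= pcfSize; ++i) {
            // Clamp at the border, similar to GL_CLAMP_TO_EDGE
            const int x = std::clamp(u + i, 0, size - 1);
            const int y = std::clamp(v + j, 0, size - 1);
            // The per-sample test a sampler2DShadow performs with GL_LEQUAL
            if (depth <= shadowMap[y * size + x])
                ++lit;
            ++total;
        }
    }
    return float(lit) / float(total);
}
```

On a shadow boundary a 3x3 kernel returns fractions such as 1/3 instead of a hard 0 or 1.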


Of course, there are many other shadow techniques to try; the ones I'm most interested in are:


Related posts

Projective texturing

To build shadow mapping, let's first do projective texturing.




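The key to projective texturing is the texture projection matrix, which transforms an object's world coordinates into texture coordinates in the projector's space. Here is an implementation using the OpenGL fixed-function pipeline: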


// The eye planes below are interpreted relative to the model-view matrix that is
// current when glTexGenfv is called, so load the camera's view matrix first
setupViewMatrix();

// Use another texture unit to avoid conflicting with the model's color texture
glActiveTexture(GL_TEXTURE1);

// You can choose between GL_MODULATE and GL_ADD
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);

// Matrix44 is just a simple 4x4 matrix class; remember that OpenGL uses column-major layout
// The bias matrix is to map from clip space [-1, 1] to texture space [0, 1]
Matrix44 biasMatrix = Matrix44(
0.5f, 0, 0, 0.5f,
0, 0.5f, 0, 0.5f,
0, 0, 0.5f, 0.5f,
0, 0, 0, 1.0f);

Matrix44 projectorProjection, projectorView;

// Setup projectorProjection and projectorView according to
// how you want the texture to be projected on the scene
// ...

Matrix44 textureMatrix = biasMatrix * projectorProjection * projectorView;

// Perform a transpose so that we get the rows of the matrix rather than the columns
textureMatrix = textureMatrix.transpose();

// OpenGL automatically post-multiplies the eye plane equations we provide
// by the inverse of the CURRENT model-view matrix.
// Therefore it is important to specify these planes before applying
// any model-to-world transform
glTexGenfv(GL_S, GL_EYE_PLANE, textureMatrix[0]); // Row 0
glTexGenfv(GL_T, GL_EYE_PLANE, textureMatrix[1]); // Row 1
glTexGenfv(GL_R, GL_EYE_PLANE, textureMatrix[2]); // Row 2
glTexGenfv(GL_Q, GL_EYE_PLANE, textureMatrix[3]); // Row 3

glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_R, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_Q, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);

// Enable automatic texture coordinate generation
// Note that the R and Q component may not be used in simple projective texture
// but they are needed for shadow mapping
glEnable(GL_TEXTURE_GEN_S);
glEnable(GL_TEXTURE_GEN_T);
glEnable(GL_TEXTURE_GEN_R);
glEnable(GL_TEXTURE_GEN_Q);

// Bind the projector's texture
glBindTexture(GL_TEXTURE_2D, textureHandle);
// Fixed-function texturing must also be enabled on this texture unit
glEnable(GL_TEXTURE_2D);

// You may move the clamp setting to where you initialize the texture
// rather than setting up every frame
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_R, GL_CLAMP);

// Set the active texture back to the model's color texture
glActiveTexture(GL_TEXTURE0);

// For each model:
// Apply any world transform for your model
// Draw the model
// End
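The bias matrix works even though it is applied before the perspective divide, and a quick numeric check shows why. This is a minimal sketch that assumes a row-major `std::array` matrix of its own rather than the post's `Matrix44` class:

```cpp
#include <array>
#include <cassert>

typedef std::array<std::array<float, 4>, 4> Mat4; // row-major here: m[row][col]
typedef std::array<float, 4> Vec4;

Vec4 mul(const Mat4& m, const Vec4& v)
{
    Vec4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

// The bias matrix from the code above, written row by row.
const Mat4 kBias = {{
    {{0.5f, 0.0f, 0.0f, 0.5f}},
    {{0.0f, 0.5f, 0.0f, 0.5f}},
    {{0.0f, 0.0f, 0.5f, 0.5f}},
    {{0.0f, 0.0f, 0.0f, 1.0f}}}};

// Bias a projector clip-space position, then divide by q as texture2DProj
// (or the GL_Q texgen coordinate) does. Because the bias adds 0.5 * w rather
// than 0.5, the result after the divide is 0.5 * (x / w) + 0.5: [-1, 1] -> [0, 1].
std::array<float, 3> clipToTexture(const Vec4& clip)
{
    const Vec4 t = mul(kBias, clip);
    return {{t[0] / t[3], t[1] / t[3], t[2] / t[3]}};
}
```

For example, the clip-space point (-2, 2, 0, 2), whose NDC is (-1, 1, 0), lands at texture coordinates (0, 1, 0.5).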


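When I tried to port the code above to GLSL, I ran into a problem that most projective texturing tutorials don't go into: how to get an object's world coordinates in GLSL. The problem doesn't arise in the fixed pipeline because the texture projection matrix has already been set up before the model-to-world matrix is applied. Although GLSL (in fact, OpenGL as a whole) has no separate model-to-world matrix to query, we can recover it by pre-multiplying gl_ModelViewMatrix by the inverse of the camera's view matrix.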
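In matrix terms, write $V_c$ for the camera's view matrix, $M$ for the model-to-world matrix, $P_p$ and $V_p$ for the projector's projection and view matrices, and $B$ for the bias matrix (symbols introduced here for illustration). The camera-view factors then cancel:

```latex
\begin{aligned}
\texttt{gl\_ModelViewMatrix} &= V_c\,M \\
\mathbf{t} &= \underbrace{B\,P_p\,V_p\,V_c^{-1}}_{\texttt{gl\_TextureMatrix[1]}}\,\bigl(V_c\,M\bigr)\,\mathbf{v}
            = B\,P_p\,V_p\,M\,\mathbf{v}
\end{aligned}
```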


glActiveTexture(GL_TEXTURE1);

Matrix44 biasMatrix = Matrix44(
0.5f, 0, 0, 0.5f,
0, 0.5f, 0, 0.5f,
0, 0, 0.5f, 0.5f,
0, 0, 0, 1.0f);

// We need the camera's view matrix inverse in order to obtain the model-world
// transform in glsl
Matrix44 projectorProjection, projectorView, cameraView;

// Setup projectorProjection, projectorView and cameraView
// ...

Matrix44 textureMatrix =
biasMatrix * projectorProjection * projectorView * cameraView.inverse();

// Set up the texture matrix
glMatrixMode(GL_TEXTURE);
glLoadMatrixf(textureMatrix.getPtr());
glMatrixMode(GL_MODELVIEW);

glBindTexture(GL_TEXTURE_2D, textureHandle);

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_R, GL_CLAMP);

glActiveTexture(GL_TEXTURE0);

// For each model:
// Apply any world transform for your model
// Draw the model
// End


// Vertex shader:
varying vec3 normal, lightDir, halfVector;
varying vec2 colorCoord;
varying vec4 projectiveCoord;

void main(void)
{
gl_Position = ftransform();
normal = gl_NormalMatrix * gl_Normal;
lightDir = normalize(gl_LightSource[0].position.xyz);
halfVector = normalize(gl_LightSource[0].halfVector.xyz);

colorCoord = gl_MultiTexCoord0.xy;

// gl_TextureMatrix[1] holds bias * projection * projectorView * inverse(cameraView);
// multiplied with gl_ModelViewMatrix the camera view cancels, leaving the model matrix
projectiveCoord = gl_TextureMatrix[1] * gl_ModelViewMatrix * gl_Vertex;
}

// Pixel shader:
varying vec3 normal, lightDir, halfVector;
varying vec2 colorCoord;
varying vec4 projectiveCoord;
uniform sampler2D colorTex;
uniform sampler2D projectiveTex;

void main(void)
{
vec4 diffuse = gl_FrontLightProduct[0].diffuse;
vec4 specular = gl_FrontLightProduct[0].specular;
vec4 ambient = gl_FrontLightProduct[0].ambient;
vec3 n = normalize(normal);
float NdotL = max(dot(n, lightDir), 0.0);

diffuse *= NdotL;
vec3 halfV = normalize(halfVector);
float NdotHV = max(dot(n, halfV), 0.0);
specular *= pow(NdotHV, gl_FrontMaterial.shininess);

gl_FragData[0] = specular + (ambient + diffuse) * vec4(texture2D(colorTex, colorCoord).xyz, 1);

// Apply the projective texture
gl_FragData[0] += texture2DProj(projectiveTex, projectiveCoord);
}

Monday, October 13, 2008

A treasure trove of free models

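Today I stumbled upon a blog that very generously shares a large number of high-quality (compared with other free ones) 3D models with the world.
The site has models of many kinds, though cars make up the majority. The download process is a bit tedious, but they are free after all, so we should say thank you.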

http://i344.photobucket.com/albums/p338/free3dart/nsx_HP_small.jpg

http://i344.photobucket.com/albums/p338/free3dart/f18.jpg

Monday, September 22, 2008

SSAO progress

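Yeah! After making use of the information in the normal buffer, SSAO (screen-space ambient occlusion) looks much more convincing.
I'm starting to feel the charm of graphics algorithms; too bad there's no one to share the joy with :(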




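Let me try to explain how it works in simple terms.
Every pixel on the screen is compared with N of its surrounding pixels, and two factors are considered in each comparison:
  1. The positions of the two pixels in 3D space: a shallower pixel occludes a deeper one, and the amount of occlusion depends on the distance.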

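  2. The dot product of the two pixels' normals: pixels facing each other receive less incoming light than pixels facing the same direction.
How to sample the N surrounding pixels is the biggest headache of the whole algorithm. More samples obviously give better results, but practical experience says N can't go much beyond 32. With the sample count limited, yet wanting a fairly wide sampling radius (so that more distant pixels can still influence each other), one can use some random sampling pattern; for now, though, I only use a cross-shaped pattern, which is acceptable as long as the sampling radius isn't too large.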
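The two factors can be written as one small function. This CPU sketch mirrors `compareDepths()` and `calAO()` in the shader below, with one assumption of its own: `aoRange` is expressed directly in the same units as the depth values, whereas the shader divides a world-space `aorange` by the camera range.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One SSAO sample. depth, sampleDepth: linear depths (larger = farther).
// Normals are unit vectors. aoRange: depth gap over which occlusion fades out.
float aoSample(float depth, const Vec3& normal, float sampleDepth, const Vec3& sampleNormal,
               float aoRange, float aoMultiplier, float aoCap = 1.0f)
{
    // Factor 1: only a nearer sample occludes (depthDiff > 0), and its
    // contribution fades to zero as the gap approaches aoRange.
    const float depthDiff = depth - sampleDepth;
    const float falloff = std::clamp(1.0f - depthDiff / aoRange, 0.0f, 1.0f);
    const float occlusion = std::min(aoCap, std::max(0.0f, depthDiff) * aoMultiplier) * falloff;

    // Factor 2: facing surfaces (dot near -1) occlude strongly,
    // surfaces facing the same way (dot near 1) hardly at all.
    const float angleFactor = 1.0f - dot(normal, sampleNormal);
    return angleFactor * occlusion;
}
```

Summing this over the sample pattern and subtracting from 1 gives the ambient term, as the shader's `main()` does.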


uniform sampler2DRect texColor; // Color texture
uniform sampler2DRect texDepth; // Depth texture
uniform sampler2DRect texNormal;// Normal texture
uniform vec2 camerarange = vec2(1.0, 500);

varying vec2 texCoord;
const float aoCap = 1.0;
float aoMultiplier = 1000.0;

float pw = 1.0; // Rectangle textures use texel coordinates; use (1.0 / screensize.x) for GL_TEXTURE_2D
float ph = 1.0;

float readDepth(in vec2 coord)
{
float nearZ = camerarange.x;
float farZ = camerarange.y;
float posZ = texture2DRect(texDepth, coord).x;

return (2.0 * nearZ) / (nearZ + farZ - posZ * (farZ - nearZ));
}

vec3 readNormal(in vec2 coord)
{
return normalize(2.0 * texture2DRect(texNormal, coord).xyz - 1.0); // Unpack [0, 1] to [-1, 1]
}

float compareDepths(in float depth1, in float depth2)
{
float depthDiff = depth1 - depth2;
const float aorange = 10.0; // Units in space the AO effect extends to (this gets divided by the camera far range)
float diff = clamp(1.0 - depthDiff * (camerarange.y - camerarange.x) / aorange, 0.0, 1.0);
return min(aoCap, max(0.0, depthDiff) * aoMultiplier) * diff;
}

float calAO(float depth, vec3 normal, float dw, float dh)
{
vec2 coord = vec2(texCoord.x + dw, texCoord.y + dh);
float angleFactor = 1.0 - dot(normal, readNormal(coord));

if(length(normal) == 0.0)
angleFactor = 0.0;

return angleFactor * compareDepths(depth, readDepth(coord));
}

void main(void)
{
float depth = readDepth(texCoord);
float ao = 0.0;

vec3 normal = readNormal(texCoord);

for(int i=0; i<8; ++i) {
ao += calAO(depth, normal, pw, ph);
ao += calAO(depth, normal, pw, -ph);
ao += calAO(depth, normal, -pw, ph);
ao += calAO(depth, normal, -pw, -ph);

pw *= 1.4;
ph *= 1.4;
aoMultiplier /= 1.5;
}

ao *= 2.0;

gl_FragColor = vec4(1.0 - ao) * texture2DRect(texColor, texCoord);
}



Related posts

Sunday, September 14, 2008

A look at the StarCraft 2 engine technology



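Following Crytek's "Finding Next Gen: CryEngine 2", Blizzard, which has never been one to mingle with academia, isn't to be outdone either: it presented a paper at SIGGRAPH 08 that goes into considerable depth.
I hope StarCraft 2 comes out soon.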

Saturday, September 13, 2008

A first taste of shader programming



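After finishing the basic Shader class, I got Multiple Render Targets (MRT) and Screen Space Ambient Occlusion (SSAO) working in one go. SSAO has been the darling of game graphics over the past year. It uses the depth buffer to work out the 3D relationship between the pixel under consideration and its surrounding pixels; add a normal buffer, and you can tell whether the pixel is "occluded" by other pixels.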

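For now my implementation only uses the depth buffer, so it isn't real SSAO, at most an edge enhancer; but the result still looks good and brings out the sense of depth between objects. Besides continuing to improve it, I'll keep the current implementation as a fallback for low-end graphics cards.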

Vertex shader code:

// Screen space ambient occlusion
// Reference:
// http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=236698&fpart=1
// http://www.4gamer.net/games/047/G004713/20080223007/screenshot.html?num=002
// http://rgba.scenesp.org/iq/computer/articles/ssao/ssao.htm
// http://meshula.net/wordpress/?p=145

varying vec2 texCoord;

void main(void)
{
gl_Position = ftransform();
texCoord = gl_MultiTexCoord0.xy;
gl_FrontColor = gl_Color;
}


Pixel shader code:


uniform sampler2D texColor; // Color texture
uniform sampler2D texDepth; // Depth texture

uniform vec2 camerarange = vec2(1.0, 500);
uniform vec2 screensize;

varying vec2 texCoord;

float readDepth(in vec2 coord)
{
return (2.0 * camerarange.x) /
(camerarange.y + camerarange.x - texture2D(texDepth, coord).x * (camerarange.y - camerarange.x));
}

void main(void)
{
float depth = readDepth(texCoord);
float d;

float pw = 1.0 / screensize.x;
float ph = 1.0 / screensize.y;

float aoCap = 1.0;

float ao = 0.0;

float aoMultiplier = 1000.0;

float depthTolerance = 0.0001;

for(int i=0; i<4; ++i)
{
d = readDepth(vec2(texCoord.x + pw, texCoord.y + ph));
ao += min(aoCap, max(0.0, depth - d - depthTolerance) * aoMultiplier);

d = readDepth(vec2(texCoord.x - pw, texCoord.y + ph));
ao += min(aoCap, max(0.0, depth - d - depthTolerance) * aoMultiplier);

d=readDepth(vec2(texCoord.x + pw, texCoord.y - ph));
ao += min(aoCap, max(0.0, depth - d - depthTolerance) * aoMultiplier);

d = readDepth(vec2(texCoord.x - pw, texCoord.y - ph));
ao += min(aoCap, max(0.0, depth - d - depthTolerance) * aoMultiplier);

pw *= 2.0;
ph *= 2.0;
aoMultiplier /= 2.0;
}

ao /= 16.0;

gl_FragColor = vec4(1.0 - ao) * texture2D(texColor, texCoord);
}


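Finally, there is still an unsolved problem, related to MRT: some graphics cards don't support non-power-of-two texture buffers, so GL_TEXTURE_RECTANGLE_ARB has to be used; unfortunately, with that texture format the pixel shader mysteriously computes the wrong amount of occlusion. Looks like I still have a lot to learn about debugging shaders.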
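One likely culprit (a guess, consistent with the `pw` comment in the later SSAO shader): rectangle textures are addressed in texels rather than in [0, 1], so offsets computed as `1.0 / screensize` move only a tiny fraction of a texel. A small sketch of the difference, with hypothetical helper names:

```cpp
#include <cassert>

enum class TexTarget { Texture2D, Rectangle };

// Offset that moves exactly one texel horizontally for the given texture target.
// GL_TEXTURE_2D samplers take normalized coordinates in [0, 1];
// GL_TEXTURE_RECTANGLE_ARB samplers take texel coordinates in [0, width].
float oneTexelStep(TexTarget target, int width)
{
    assert(width > 0);
    return target == TexTarget::Texture2D ? 1.0f / float(width) : 1.0f;
}

// Convert a normalized coordinate into the texel coordinate a rectangle sampler expects.
float toRectCoord(float normalized, int width)
{
    return normalized * float(width);
}
```

With a rectangle target, both the sample offsets and the texture coordinates passed in from the vertex shader need this texel-space form.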

Also, render-to-texture buffers don't support fullscreen anti-aliasing (FSAA). How can this be solved?

3DS transformation matrix

After half a day's work, I finally managed to load the transformation matrix (chunk 0x4160), all thanks to the lib3ds source code.

// Chunk 0x4160 comes before 0x4120.
// While loading 0x4160 we find out whether the
// clockwise/anti-clockwise triangle winding must be inverted,
// and we apply that information while loading 0x4120
bool invertTriangleWinding = false;

// ...

// Loading the local coordinates chunk
// It's based on lib3ds: http://www.lib3ds.org
case 0x4160:
{
// We are using row major matrix
Mat44f matrix = Mat44f::cIdentity;
for(size_t i=0; i<4; ++i)
mStream->read(matrix.row[i], sizeof(float) * 3);

// Flip X coordinate of vertices if mesh matrix has negative determinant
invertTriangleWinding = (matrix.determinant() < 0);
if(invertTriangleWinding) {
Mat44f inv = matrix.inverse();

matrix.m00 = -matrix.m00;
matrix.m01 = -matrix.m01;
matrix.m02 = -matrix.m02;
matrix.m03 = -matrix.m03;

matrix = (inv * matrix).transpose();

size_t vertexCount = getVertexCount();
Vec3f* vertex = getVertexPointer();

for(size_t i=0; i<vertexCount; ++i) {
// Transform tmp using matrix, where matrix[3] holds the translation
#ifdef FLIP_YZ_AXIS
Vec4f tmp(vertex[i].x, -vertex[i].z, vertex[i].y, 0);
tmp = (matrix * tmp) + matrix[3];
vertex[i] = Vec3f(tmp.x, tmp.z, -tmp.y);
#else
Vec4f tmp(vertex[i].x, vertex[i].y, vertex[i].z, 0);
tmp = (matrix * tmp) + matrix[3];
vertex[i] = Vec3f(tmp.x, tmp.y, tmp.z);
#endif
}
}
} break;

// ...

// Loading the face description (index values) chunk
case 0x4120:
{
uint16_t faceCount = 0;
uint16_t* indexArray = nullptr;

// ...

if(invertTriangleWinding) {
for(size_t i=0; i<faceCount*3; i+=3)
std::swap(indexArray[i], indexArray[i+2]);
}
} break;
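A negative determinant means the winding must be flipped because a transform whose 3x3 part has a negative determinant mirrors the geometry, and a mirror turns counter-clockwise triangles into clockwise ones. A small self-contained sketch of both pieces (`det3` and `flipWinding` are illustrative names; the loader above uses its own `Mat44f::determinant()`):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::array<std::array<float, 3>, 3> Mat3; // m[row][col]

// Determinant of the 3x3 rotation/scale part of a transform. A negative value
// means the transform mirrors geometry, which reverses the triangle winding.
float det3(const Mat3& m)
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Restore the intended winding by swapping the first and last index of every
// triangle, as the 0x4120 handler above does.
void flipWinding(std::vector<uint16_t>& indices)
{
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
        std::swap(indices[i], indices[i + 2]);
}
```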

Sunday, September 7, 2008

The 3DS file format


Model downloaded from Mesh Factory

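I finished the 3DS file loader several weeks ago, but only today found the time to post about it.
3DS is actually a very old format: its texture reference paths only support 8.3 file names, a product of the DOS era. But because the format is widespread and simple in structure, I still decided to make it the first model loader in my engine, even though COLLADA is where things are heading. That way I can move on to studying other areas sooner and add other loaders once the other modules have prototypes.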

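My implementation is very basic and only supports vertices, indices, standard materials and textures. All normals have to be computed by the loader itself; although I've added support for smoothing groups, some areas still don't look smooth. I believe the best approach is to read normal vectors directly from the DCC tool, but unfortunately 3DS doesn't store that information. Also, quite a few objects end up in the wrong position: reading the vertex positions alone doesn't seem to be enough, even though I had assumed the 3DS format needed no transformation matrix.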
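For reference, here is one common way to compute normals from smoothing groups; it is not necessarily what this loader does. A face's corner normal is the sum of the normals of all faces that share that vertex and at least one smoothing-group bit, area-weighted through the unnormalized cross product. A minimal, self-contained sketch with illustrative names:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct V3 { float x, y, z; };

static V3 sub(V3 a, V3 b) { return V3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static V3 cross(V3 a, V3 b) { return V3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
static V3 normalized(V3 v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? V3{v.x / len, v.y / len, v.z / len} : v;
}

// Per-corner normals from 3DS-style smoothing groups.
// pos: vertex positions; idx: 3 indices per face; groups: one 32-bit mask per face.
// A mask of 0 keeps the face flat-shaded.
std::vector<V3> cornerNormals(const std::vector<V3>& pos,
                              const std::vector<uint16_t>& idx,
                              const std::vector<uint32_t>& groups)
{
    const std::size_t faceCount = groups.size();
    assert(idx.size() == faceCount * 3);
    std::vector<V3> faceN(faceCount);
    std::vector<std::vector<std::size_t>> facesOfVertex(pos.size());
    for (std::size_t f = 0; f < faceCount; ++f) {
        const V3 a = pos[idx[3 * f]], b = pos[idx[3 * f + 1]], c = pos[idx[3 * f + 2]];
        faceN[f] = cross(sub(b, a), sub(c, a)); // length is twice the face area
        for (int k = 0; k < 3; ++k)
            facesOfVertex[idx[3 * f + k]].push_back(f);
    }
    std::vector<V3> out(idx.size());
    for (std::size_t f = 0; f < faceCount; ++f) {
        for (int k = 0; k < 3; ++k) {
            V3 n = faceN[f];
            if (groups[f] != 0) {
                for (std::size_t g : facesOfVertex[idx[3 * f + k]]) {
                    if (g != f && (groups[g] & groups[f]) != 0) { // shares a smoothing group
                        n.x += faceN[g].x; n.y += faceN[g].y; n.z += faceN[g].z;
                    }
                }
            }
            out[3 * f + k] = normalized(n);
        }
    }
    return out;
}
```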

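Of course, I also integrated the loader with the resource management and progressive loading modules I designed earlier; I'll cover the details in a later post.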

Wednesday, September 3, 2008

Google's browser: Chrome


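As the big boss of the World Wide Web, Google launching its own browser was only a matter of time. The newcomer is called Chrome, and it's still in beta. Its tagline is "one box for everything", which matches Google's long-standing philosophy of web-based platforms; and a good browser will be the catalyst for that revolution.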

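After trying it out, my impression is good: the interface is simple and responsive. What I like most is that the menus, buttons and other UI elements are squeezed into less than 80 pixels of vertical space; even the title bar is gone.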

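As for the internal structure and design philosophy, Google presented them as a comic. From it we learn that, to avoid memory leaks and security problems, Google chose a one-process-per-tab design. It feels a bit like going backwards, but it's a simple and effective solution; besides, Chrome opens new tabs remarkably fast, with no noticeable slowdown from the cost of creating a process.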

For now Chrome still lacks the wealth of useful add-ons that Firefox has, but I believe a new round of the browser wars has begun.

Friday, August 8, 2008

A new discovery among the C functions

In the past, when I needed to read keyboard input in a program's main loop to decide whether to quit, I would call std::cin or getchar() from another thread, because both of them block.

It turns out I had always overlooked kbhit()! With it, the problem above simplifies to:

#include <conio.h> // Windows: kbhit() and getch()

int main() {
while(true) {
// Poll the console without blocking; returns non-zero if there is
// a keystroke waiting in the buffer.
if(kbhit()) {
if(getch() == 'q') // getch() returns the key at once; getchar() would wait for Enter
break;
}

// Do something useful
// ...
}
return 0;
}

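Unfortunately kbhit() is not part of standard C, so on some platforms we have to implement it ourselves. The following code is taken from an example found online: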

#include <stddef.h>     /* NULL */
#include <sys/select.h> /* select(), fd_set, struct timeval */

int kbhit(void)
{
struct timeval tv;
fd_set read_fd;

/* Do not wait at all, not even a microsecond */
tv.tv_sec=0;
tv.tv_usec=0;

/* Must be done first to initialize read_fd */
FD_ZERO(&read_fd);

/* Makes select() ask if input is ready: 0 is the file descriptor for stdin */
FD_SET(0, &read_fd);

/* The first parameter is the number of the largest file descriptor to check + 1. */
if(select(1, &read_fd, NULL/*No writes*/, NULL/*No exceptions*/, &tv) == -1)
return 0; /* An error occurred */

/* read_fd now holds a bit map of files that are
* readable. We test the entry for the standard
* input (file 0). */
if(FD_ISSET(0, &read_fd))
/* Character pending on stdin */
return 1;

/* no characters were pending */
return 0;
}
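One caveat on Linux/Unix: while the terminal is in its default canonical (line-buffered) mode, select() only reports stdin as readable after Enter is pressed, so this kbhit() does not fire on a single keystroke. A common remedy is to switch the terminal to non-canonical mode with termios for the duration of the loop. A minimal RAII sketch (`RawModeGuard` is a hypothetical name):

```cpp
#include <cassert>
#include <termios.h>
#include <unistd.h>

// Switch the terminal to non-canonical, no-echo mode so a select()-based kbhit()
// sees each key as soon as it is pressed; the original settings are restored when
// the guard goes out of scope. Does nothing if stdin is not a terminal.
class RawModeGuard {
public:
    RawModeGuard()
    {
        if (isatty(STDIN_FILENO) && tcgetattr(STDIN_FILENO, &mSaved) == 0) {
            termios raw = mSaved;
            raw.c_lflag &= ~(ICANON | ECHO); // byte-at-a-time input, no echo
            raw.c_cc[VMIN] = 1;              // read() returns after 1 byte...
            raw.c_cc[VTIME] = 0;             // ...with no inter-byte timer
            mActive = (tcsetattr(STDIN_FILENO, TCSANOW, &raw) == 0);
        }
    }
    ~RawModeGuard()
    {
        if (mActive)
            tcsetattr(STDIN_FILENO, TCSANOW, &mSaved);
    }
    bool active() const { return mActive; }

    RawModeGuard(const RawModeGuard&) = delete;
    RawModeGuard& operator=(const RawModeGuard&) = delete;

private:
    termios mSaved = {};
    bool mActive = false;
};
```

Create a RawModeGuard before entering the main loop; kbhit() followed by getchar() then reacts to single keystrokes.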

Tuesday, July 22, 2008

Sharing arithmetic operators

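Whenever we implement math-related classes (Vector, Matrix, Point, Size, etc.), the arithmetic operators (addition, subtraction, multiplication, division) show up again and again. Let's try to factor the common parts out into a base class: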

template<int N>
class Tuple {
public:
Tuple operator+(const Tuple& rhs) const
{
Tuple result;
for(int i=0; i<N; ++i)
result.data[i] = data[i] + rhs.data[i];
return result;
}

float data[N];
};

class Vec3 : public Tuple<3> {
public:
Vec3(float x, float y, float z) {
data[0] = x; data[1] = y; data[2] = z;
}
};

int main() {
Vec3 v1(1, 2, 3);
Vec3 v2(4, 5, 6);
// Compilation error: v1 + v2 is returning Tuple but not Vec3
Vec3 v3 = v1 + v2;
return 0;
}

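Don't worry about the loop hurting performance; the compiler knows how to optimize it (I verified this on VC2008).
But since the operator returns the wrong type, we try the following: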

template<int N, class R>
class Tuple {
public:
R operator+(const Tuple& rhs) const
{
R result;
for(int i=0; i<N; ++i)
result.data[i] = data[i] + rhs.data[i];
return result;
}

float data[N];
};

class Vec3 : public Tuple<3, Vec3> {
public:
Vec3() {} // Needed: operator+ default-constructs an R
Vec3(float x, float y, float z) {
data[0] = x; data[1] = y; data[2] = z;
}
};

int main() {
Vec3 v1(1, 2, 3);
Vec3 v2(4, 5, 6);
Vec3 v3 = v1 + v2;
return 0;
}

Very nice. But it can be even better:

template<int N, class R, class U>
class Tuple : public U {
public:
R operator+(const Tuple& rhs) const
{
R result;
for(int i=0; i<N; ++i)
result.data[i] = this->data[i] + rhs.data[i]; // 'this->' is required: data lives in the dependent base U
return result;
}
};

struct Vec3Union {
union {
// Anonymous structs are a compiler extension (supported by MSVC, GCC and Clang)
struct { float x, y, z; };
float data[3];
};
};

class Vec3 : public Tuple<3, Vec3, Vec3Union> {
public:
Vec3() {}
Vec3(float x_, float y_, float z_) {
x = x_; y = y_; z = z_;
}
};

struct SizeUnion {
union {
struct { float width, height; };
float data[2];
};
};

class Size : public Tuple<2, Size, SizeUnion> {
public:
Size() {}
Size(float w, float h) {
width = w; height = h;
}
};

int main() {
Vec3 v1(1, 2, 3);
Vec3 v2(4, 5, 6);
Vec3 v3 = v1 + v2;

Size s1(1, 2);
Size s2(2, 3);
Size s3 = s1 + s2;

return 0;
}

This lets you access Vec3's x, y and z components in a convenient form.
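The same CRTP base extends naturally to the other operators. Below is a sketch with subtraction, scalar multiplication and equality added; `Vec2` and `Vec2Storage` are illustrative names, and the storage is a plain array so the example stays within standard C++. Note the `this->data`: because `data` lives in the dependent base `U`, standard two-phase lookup requires the `this->` qualifier, which VC2008 does not enforce.

```cpp
#include <cassert>

// Each operator is written once in the base; R is the derived type to return and
// U supplies the storage (it must have a public 'float data[N]').
template<int N, class R, class U>
class Tuple : public U {
public:
    R operator+(const Tuple& rhs) const
    {
        R result;
        for (int i = 0; i < N; ++i)
            result.data[i] = this->data[i] + rhs.data[i];
        return result;
    }
    R operator-(const Tuple& rhs) const
    {
        R result;
        for (int i = 0; i < N; ++i)
            result.data[i] = this->data[i] - rhs.data[i];
        return result;
    }
    R operator*(float s) const
    {
        R result;
        for (int i = 0; i < N; ++i)
            result.data[i] = this->data[i] * s;
        return result;
    }
    bool operator==(const Tuple& rhs) const
    {
        for (int i = 0; i < N; ++i)
            if (this->data[i] != rhs.data[i])
                return false;
        return true;
    }
};

struct Vec2Storage { float data[2]; };

class Vec2 : public Tuple<2, Vec2, Vec2Storage> {
public:
    Vec2() {}
    Vec2(float x_, float y_) { data[0] = x_; data[1] = y_; }
    float x() const { return data[0]; }
    float y() const { return data[1]; }
};
```

Adding a new vector type only requires a storage struct and constructors; every operator comes from the base.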

Although this tip may not be all that useful, I hope it deepens your understanding of C++.

Monday, July 14, 2008

Nietzsche's magic Functor

A puzzle Nietzsche gave me... I might as well post my answer here.


#include <cassert>
#include <cstddef>
#include <iostream>

typedef int (*Functor)();

// Recursive type template that generate a
// compile-time list of Functor using inheritance
template<size_t N> struct Unit : public Unit<N-1> {
Unit() : mFunctor(&Unit<N>::fun) {}
static int fun() { return N; }
Functor mFunctor;
};

// Explicit specialization to end the recursion
template<> struct Unit<0> {
Unit() : mFunctor(&Unit<0>::fun) {}
static int fun() { return 0; }
Functor mFunctor;
};

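// Explicit (full) specialization: being a non-template class, it is instantiated right here,
// pulling in Unit<255>..Unit<0> now, so no chain of nested instantiations gets deeper than ~256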
template<> struct Unit<256> : public Unit<255> {
Unit() : mFunctor(&Unit::fun) {}
static int fun() { return 256; }
Functor mFunctor;
};

// Explicit specialization with the same purpose: instantiates Unit<511>..Unit<257> here
template<> struct Unit<512> : public Unit<511> {
Unit() : mFunctor(&Unit::fun) {}
static int fun() { return 512; }
Functor mFunctor;
};

// A compile-time maximum count of Functor
static const size_t cMaxN = 768;

typedef Unit<cMaxN> List;
static const List cList;

// A function that return a Functor that return i.
// In other words, it selects from a list of compile-time functors
// base on the run-time parameter i.
Functor getFunctor(int i) {
{ // We assume the Unit<> subobjects are packed tightly together in cList
typedef const char* byte_ptr;
byte_ptr functorAddress1 = byte_ptr(&(static_cast<const Unit<1>*>(&cList)->mFunctor));
byte_ptr functorAddress2 = byte_ptr(&(static_cast<const Unit<2>*>(&cList)->mFunctor));
(void)functorAddress1; (void)functorAddress2;
assert(size_t(functorAddress2 - functorAddress1) == sizeof(Functor));
}
assert(i >= 0 && size_t(i) <= cMaxN);
return *((Functor*)(&cList) + i);
}

int main() {
for(size_t i=0; i<=cMaxN; ++i) {
Functor f = getFunctor(i);
std::cout << f() << std::endl;
assert(f() == int(i));
}

return 0;
}
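For comparison, here is a KISS-style way to get the same mapping without relying on how base-class subobjects are laid out in memory: a compile-time table of function pointers. It uses C++14's `std::index_sequence`, a different technique from the inheritance trick above and one that 2008 compilers could not handle; the names mirror the original, but this is only an illustrative sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

typedef int (*Functor)();

// fun<N>() returns N, like Unit<N>::fun in the original.
template<std::size_t N>
int fun() { return static_cast<int>(N); }

// Build the table {&fun<0>, &fun<1>, ...} at compile time.
template<std::size_t... Is>
constexpr std::array<Functor, sizeof...(Is)> makeTable(std::index_sequence<Is...>)
{
    return {{ &fun<Is>... }};
}

constexpr std::size_t cCount = 769; // indices 0..768, like cMaxN = 768 above
constexpr std::array<Functor, cCount> cTable = makeTable(std::make_index_sequence<cCount>{});

// Select a compile-time function by a run-time index; out of range gives nullptr.
Functor getFunctor(std::size_t i)
{
    return i < cCount ? cTable[i] : nullptr;
}
```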


Win少, is my answer correct?

What I think is the best PDF reader



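Before we knew it, Adobe Acrobat Reader reached its ninth version.
Every time it updates, have you ever wondered whether there might be a better option?
I have a good recommendation: Foxit Reader.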

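Foxit's rendering quality and speed are on par with Acrobat's, but it wins hands down on memory usage and startup time. It doesn't even need installing; it's small and portable (a single 5MB executable).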

The following test results are based on an article found online:


                         Adobe Reader 9   Foxit Reader 2.2
Startup time (seconds)   21               7
Memory usage             74MB             25MB

Even though Foxit lacks some features (that most people rarely use), its many advantages made me immediately remove the 100-plus MB Acrobat Reader from my system.

Wednesday, July 2, 2008

Programming language performance showdown

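One day, while looking for a scripting language, I stumbled upon a website called The Computer Language Benchmarks Game.
Its benchmark data covers as many as 76 language/compiler combinations and 19 test programs (most with source code provided).
However, even if the data there is accurate, its value as a reference is limited. The problem is that the test programs are all very small and each does only one thing. A complex application (such as a 3D game engine) has to process large amounts of data of many different kinds, so memory access easily becomes the bottleneck.
A good example is Java/.NET. Some figures show Java/.NET being faster than C/C++, but when you step out of the test programs and back into reality, Java/.NET always feels half a beat slower.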
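A tiny illustration of why memory access can dominate: the two loops below do exactly the same arithmetic on the same data, but one walks memory sequentially while the other jumps across rows. On typical hardware the strided walk is noticeably slower for large matrices, though the ratio depends entirely on the machine, so no figures are claimed here. A sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major, either in memory order or column by
// column. The arithmetic is identical; only the memory access pattern differs.
double sumMatrix(const std::vector<double>& m, std::size_t rows, std::size_t cols, bool sequential)
{
    assert(m.size() == rows * cols);
    double sum = 0.0;
    if (sequential) {
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                sum += m[r * cols + c]; // consecutive addresses: cache friendly
    } else {
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t r = 0; r < rows; ++r)
                sum += m[r * cols + c]; // stride of 'cols' elements: cache hostile
    }
    return sum;
}

// Time one walk in milliseconds; results depend entirely on the machine.
double timeWalkMs(const std::vector<double>& m, std::size_t rows, std::size_t cols,
                  bool sequential, double& sumOut)
{
    const auto start = std::chrono::steady_clock::now();
    sumOut = sumMatrix(m, rows, cols, sequential);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```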

In fact, the site's very first paragraph says it plainly:
Benchmarking programming languages?
How can we benchmark a programming language?
We can't - we benchmark programming language implementations.

How can we benchmark language implementations?
We can't - we measure particular programs!

Wednesday, June 18, 2008

Resource loading

I recently finished designing and implementing a progressive resource loading system.
At this stage it can load JPEG and PNG images on multiple threads.



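To support multithreading, the first step is to separate the data being loaded from the data in use.
The load() function decodes data from an istream into a private buffer inside the loader; meanwhile the LoadingState is Loading.
Once enough data has been decoded to be displayed, load() returns LoadingState = PartialLoaded, and you can then call commit() to copy the loader's buffer into the resource.
Note that load(), commit() and getLoadingState() may well run on different threads, so the loader's buffer and its LoadingState member must be protected by a mutex.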



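One thing worth discussing is the load() function itself.
The current design requires the caller to keep calling load() until it returns Loaded or Aborted. I once considered using events to notify the caller of the current LoadingState, so that load() would only need to be called once. Then, when I tried to implement cancel(), a problem appeared: I had to create yet another event to control when the loop inside load() should exit.
Not good: complexity creeps in... plus the headache of multithreaded synchronization! Remembering the lesson Pull XML taught me, I realized that simply handing the loop inside load() over to the caller makes everything much simpler.
Thinking it through further: if the caller wants to start loading another image as soon as the first progressive pass of the current one has loaded, the event-based approach simply doesn't work (or becomes extremely complicated). So the new design is not only simpler but also very flexible. Once again I learned what "less is more" means: There is No CODE that is more flexible than NO Code!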
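Below is a minimal sketch of this pull-style design. It assumes a byte buffer stands in for a decoded image and that each load() call "decodes" one fixed-size chunk; `ProgressiveLoader` and the simulated decoding are illustrative, not the engine's actual classes. The caller owns the loop, and cancel() only has to flip the state under the mutex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

enum class LoadingState { Loading, PartialLoaded, Loaded, Aborted };

// load() copies the next chunk of the source into a private buffer (standing in for
// decoding), commit() copies everything decoded so far into the caller's resource, and
// cancel() flips the state. One mutex guards all shared state, because these calls may
// come from different threads.
class ProgressiveLoader {
public:
    ProgressiveLoader(std::vector<unsigned char> source, std::size_t chunkSize)
        : mSource(std::move(source)), mChunkSize(chunkSize)
    {
        assert(chunkSize > 0);
    }

    LoadingState load()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mState == LoadingState::Loaded || mState == LoadingState::Aborted)
            return mState; // nothing left to do
        const std::size_t done = mBuffer.size();
        const std::size_t end = std::min(mSource.size(), done + mChunkSize);
        mBuffer.insert(mBuffer.end(), mSource.begin() + done, mSource.begin() + end);
        mState = (end == mSource.size()) ? LoadingState::Loaded : LoadingState::PartialLoaded;
        return mState;
    }

    void cancel()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mState != LoadingState::Loaded)
            mState = LoadingState::Aborted;
    }

    void commit(std::vector<unsigned char>& resource) const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        resource = mBuffer; // copy out the decoded part
    }

    LoadingState getLoadingState() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mState;
    }

private:
    mutable std::mutex mMutex;
    std::vector<unsigned char> mSource;
    std::size_t mChunkSize;
    std::vector<unsigned char> mBuffer;
    LoadingState mState = LoadingState::Loading;
};
```

A typical caller loop is `while (loader.load() == LoadingState::PartialLoaded) loader.commit(texels);`, and the caller is free to break out after the first pass and start on another image.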

Wednesday, June 11, 2008

C++ friend class

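Sometimes we use the C++ friend keyword to make another class a friend.
Unfortunately, a friend declaration applies to exactly one class; classes derived from the friend don't inherit the access: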

typedef unsigned int uint;

class Texture {
public:
friend class IResourceLoader;
uint width() const { return mWidth; }
uint height() const { return mHeight; }
private:
uint mWidth;
uint mHeight;
};

class IResourceLoader {};

class JpegLoader : public IResourceLoader {
void load() {
// ...
mTexture.mWidth = 128; // Compiler error!! Friendship is not inherited
}
Texture mTexture;
};

The Texture class above has members mWidth and mHeight, which should only be modifiable by implementations of IResourceLoader. Fortunately, an inner class can help us:

// Texture.h
typedef unsigned int uint;

class Texture {
public:
friend class IResourceLoader;
uint width() const { return mWidth; }
uint height() const { return mHeight; }

// We declare a templated inner class here (but do not define it yet)
template<class T> class PrivateAccessor;

private:
uint mWidth;
uint mHeight;
};

class IResourceLoader {};

// JpegLoader.h
class JpegLoader : public IResourceLoader {
public:
void load();
private:
Texture mTexture;
};

// JpegLoader.cpp
// Specialize Texture::PrivateAccessor here and access whatever you need;
// as a member of Texture it can see Texture's private members
template<>
class Texture::PrivateAccessor<JpegLoader> {
public:
static uint& width(Texture& texture) {
return texture.mWidth;
}
static uint& height(Texture& texture) {
return texture.mHeight;
}
};
typedef Texture::PrivateAccessor<JpegLoader> Accessor;

void JpegLoader::load() {
// ...
Accessor::width(mTexture) = 128; // Ok! no problem
Accessor::height(mTexture) = 128;
}

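You may think this is equivalent to making every member public. In fact, the spirit of this approach is to force programmers to be aware of what they're doing, rather than changing a variable by mistake.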
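A common alternative to the inner-class trick (a different technique, not the one above) is to befriend only the base class and have it expose protected static helpers. Friendship is not inherited, but the protected members of IResourceLoader are available to every derived loader, so only code deriving from IResourceLoader can change a Texture's size. A sketch in which `setSize` is a hypothetical helper:

```cpp
#include <cassert>

class Texture {
public:
    friend class IResourceLoader; // the only class allowed to touch the privates
    unsigned width() const { return mWidth; }
    unsigned height() const { return mHeight; }
private:
    unsigned mWidth = 0;
    unsigned mHeight = 0;
};

class IResourceLoader {
public:
    virtual ~IResourceLoader() {}
protected:
    // Friendship is not inherited, but every derived loader may call this.
    static void setSize(Texture& texture, unsigned w, unsigned h)
    {
        texture.mWidth = w;
        texture.mHeight = h;
    }
};

class JpegLoader : public IResourceLoader {
public:
    void load() { setSize(mTexture, 128, 64); } // hypothetical decoded size
    const Texture& texture() const { return mTexture; }
private:
    Texture mTexture;
};
```

The trade-off is that every derived loader gets the same access, whereas the PrivateAccessor approach can grant it per loader.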

Syntax highlighting

The first thing to set up for posting source code is, of course, syntax highlighting.
SyntaxHighlighter achieves this purely with client-side JavaScript.

Some other articles (1, 2) give better instructions.

Friday, June 6, 2008

Learning, and learning again

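After three years of game engine development, I've learned a lot, but there is still more to learn, endlessly more.
C++ alone, this thing I both love and hate, has taken me ten years to learn (Teach Yourself Programming in Ten Years).
That's actually a good thing: I simply love the process of learning.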

I've made plenty of flashy but impractical things (such as Exmat).
Now I'm focusing on learning how to design simple, clear software.

This blog is where I'll record that process. Your advice is always welcome.