Caffe: Softmax with Loss

Let us assume a 2x2 image, 3 channels (one per possible class label), and a batch of 2.

Then the predicted scores (logits) are given by:

Assume the 2x2 image is flattened into a single row of 4 pixels.
For N = 1 (of batch 2):
[       9       2       5       8       ]       Each row is a class
[       7       8       4       6       ]       Each column holds the scores of all classes for one pixel
[       8       4       6       7       ]

For N = 2 (of batch 2):
[       7       3       0       3       ]
[       4       6       1       1       ]
[       2       4       4       2       ]

If we convert these to softmax scores (this is exactly what Caffe's Softmax layer computes),
say a column is:
[   x
    y
    z   ]
then its softmax scores (normalized over the classes/channels, independently for each pixel) are:
[   e^x / ( e^x + e^y + e^z )
    e^y / ( e^x + e^y + e^z )
    e^z / ( e^x + e^y + e^z )   ]
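
As a quick check, here is that formula applied with numpy to the first column of N = 1 (a minimal sketch, nothing Caffe-specific):

import numpy as np

col = np.array([9., 7., 8.])            # scores of the 3 classes for the first pixel of N = 1
print(np.exp(col) / np.exp(col).sum())  # ~ [0.66524 0.09003 0.24473]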

So the softmax output will be:

For N = 1 (of batch 2):
[   0.6652409   0.00242826  0.24472848  0.6652409   ]
[   0.09003057  0.9796292   0.09003057  0.09003057  ]
[   0.24472846  0.01794253  0.66524094  0.24472846  ]

For N = 2 (of batch 2):
[   0.94649917  0.04201007  0.01714783  0.6652409   ]
[   0.04712342  0.8437947   0.04661262  0.09003057  ]
[   0.00637746  0.1141952   0.93623954  0.24472846  ]
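
The same computation, vectorized over the whole N x C x H x W blob with plain numpy (a sketch of what the Softmax layer does; it normalizes over the channel axis in exactly this way):

import numpy as np

ip = np.array([[[[9.,2.],[5.,8.]],[[7.,8.],[4.,6.]],[[8.,4.],[6.,7.]]],
               [[[7.,3.],[0.,3.]],[[4.,6.],[1.,1.]],[[2.,4.],[4.,2.]]]])  # 2 x 3 x 2 x 2

e = np.exp(ip - ip.max(axis=1, keepdims=True))  # subtract the per-pixel max for numerical stability
sm = e / e.sum(axis=1, keepdims=True)           # normalize over the channel (class) axis
print(sm.reshape(2, 3, 4))                      # each N as a 3x4 block, matching the matrices above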

Cross Entropy:

Let's say the ground-truth labels are:

For N = 1:      [       2       1       2       0       ]

For N = 2:      [       0       1       2       2       ]

Cross Entropy Loss = - sum( gt * log_e(pred) ) / num_pixels

Since gt is one-hot, only the predicted probability of the true class at each pixel survives:

Loss1 =  - [ loge(0.24472846) + loge(0.9796292) + loge(0.66524094) + loge(0.6652409) ] / 4  = 2.2433988 / 4 = 0.5608497

Loss2 =  - [ loge(0.94649917) + loge(0.8437947) + loge(0.93623954) + loge(0.24472846) ] / 4 = 1.6983210 / 4 = 0.4245803

Average loss over batch = 0.9854300 / 2 = 0.4927150
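
These numbers are easy to verify with numpy (a sketch; the probabilities are read straight off the softmax matrices above):

import numpy as np

p1 = np.array([0.24472846, 0.9796292, 0.66524094, 0.6652409])   # true-class probs for N = 1, labels [2 1 2 0]
p2 = np.array([0.94649917, 0.8437947, 0.93623954, 0.24472846])  # true-class probs for N = 2, labels [0 1 2 2]

loss1 = -np.log(p1).mean()                 # ~ 0.5608497
loss2 = -np.log(p2).mean()                 # ~ 0.4245803
print(loss1, loss2, (loss1 + loss2) / 2)   # batch average ~ 0.4927150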

So for Caffe:
Pred = N x C x H x W (C = number of classes; one score per class per pixel, like an expanded one-hot)
Label = N x 1 x H x W (integer class indices 0, 1, ..., C-1)

The output is the log loss averaged over all pixels and over the batch.
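
Putting the shape convention together, here is a minimal numpy sketch of what SoftmaxWithLoss computes internally, assuming sm is the 2x3x2x2 softmax array from the vectorized sketch above (np.take_along_axis is just one way to express the per-pixel lookup):

import numpy as np

lab = np.array([[[[2,1],[2,0]]],[[[0,1],[2,2]]]])  # labels, N x 1 x H x W, values in 0..C-1
picked = np.take_along_axis(sm, lab, axis=1)       # sm[n, lab[n,0,h,w], h, w] -> N x 1 x H x W
print(-np.log(picked).mean())                      # ~ 0.4927150, the batch-averaged loss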



import numpy as np
import caffe

# Logits, N x C x H x W = 2 x 3 x 2 x 2 (the two score matrices above)
ip = np.array([[[[9,2],[5,8]],[[7,8],[4,6]],[[8,4],[6,7]]],
               [[[7,3],[0,3]],[[4,6],[1,1]],[[2,4],[4,2]]]], dtype=np.float32)
# Labels, N x 1 x H x W; tl is the per-pixel argmax of ip, fl is the worked example above
tl = np.array([[[[0,1],[2,0]]],[[[0,1],[2,0]]]], dtype=np.float32)
fl = np.array([[[[2,1],[2,0]]],[[[0,1],[2,2]]]], dtype=np.float32)

caffe.set_mode_cpu()
net = caffe.Net('softmax.prototxt', caffe.TEST)  # the prototxt listed below

net.blobs['pred'].data[...] = ip   # or ip[1][None, :] to feed a single sample (see note at the end)
net.blobs['label'].data[...] = fl  # or fl[1][None, :]

out = net.forward()

sm = net.blobs['softmax'].data
print(sm.shape)      # (2, 3, 2, 2)
print(out['loss2'])  # ~ 0.4927150, the batch-averaged loss computed above


Prototxt (save as softmax.prototxt):

layer {
  name: "pred"
  type: "Input"
  top: "pred"
  input_param {
    shape {
      dim: 2
      dim: 3
      dim: 2
      dim: 2
    }
  }
}

layer {
  name: "label"
  type: "Input"
  top: "label"
  input_param {
    shape {
      dim: 2
      dim: 1
      dim: 2
      dim: 2
    }
  }
}

layer {
  name: "softmax"
  type: "Softmax"
  bottom: "pred"
  top: "softmax"
}


layer {
  name: "loss2"
  type: "SoftmaxWithLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss2"
  loss_weight: 1
}



# Conceptually, SoftmaxWithLoss = Softmax + MultinomialLogisticLoss, so loss1
# would equal loss2. However, Caffe's MultinomialLogisticLoss expects one label
# per image (N x 1 x 1 x 1) and does not accept the per-pixel N x 1 x H x W
# labels used here, which is why this layer is left commented out.
# layer {
#   name: "loss1"
#   type: "MultinomialLogisticLoss"
#   bottom: "softmax"
#   bottom: "label"
#   top: "loss1"
#   loss_weight: 1
# }

If the blob's batch size is 2 but you assign only one input sample (e.g. ip[1][None, :]), numpy broadcasting copies it across the batch dimension, so the same result appears twice in the output.
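
A quick numpy illustration of that duplication (assignment into a fixed-size blob broadcasts the missing batch dimension):

import numpy as np

buf = np.zeros((2, 3, 2, 2))                # a blob allocated for batch size 2
one = np.arange(12.).reshape(1, 3, 2, 2)    # a single sample
buf[...] = one                              # broadcast across the batch dimension
print((buf[0] == buf[1]).all())             # True: the sample appears twice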