Saturday, 11 June 2016

Caffe HDF5 not learning

I'm fine-tuning the GoogleNet network with Caffe on my own dataset. If I use IMAGE_DATA layers as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use HDF5 layers, no learning takes place.

I am using the exact same input images, and the labels match as well. I have also checked that the data in the .h5 files can be loaded correctly. It can, and Caffe finds both the number of examples I feed it and the correct number of classes (2).
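A check of this kind can go beyond "the file loads". The sketch below is a minimal example, assuming the datasets are named "data" and "label" (matching the layer's top names) and using a hypothetical helper name, summarize_h5. It reports the shape, dtype, value range and label counts, so a wrong scale or a skewed label distribution shows up at a glance.

```python
import h5py
import numpy as np


def summarize_h5(path, data_key="data", label_key="label"):
    """Return basic statistics for the two datasets the HDF5 layer reads."""
    with h5py.File(path, "r") as f:
        data = f[data_key][...]
        labels = f[label_key][...]

    # Count how many examples carry each label value.
    values, counts = np.unique(labels, return_counts=True)
    return {
        "data_shape": data.shape,          # expected: (N, 3, 224, 224)
        "data_dtype": str(data.dtype),     # expected: float32
        "data_min": float(data.min()),
        "data_max": float(data.max()),
        "data_mean": float(data.mean()),   # near 0 after mean subtraction
        "has_nan": bool(np.isnan(data).any()),
        "label_counts": {int(v): int(c) for v, c in zip(values, counts)},
    }
```

Comparing these numbers between the training and validation files is a cheap way to spot preprocessing that differs between the two.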

This leads me to think that the issue lies in the transformations I am performing manually (since HDF5 layers do not perform any built-in transformations). The code for these is below. I do the following:

  • Convert image from RGB to BGR
  • Resize it to 256x256 so I can subtract the mean file from ImageNet (included in the Caffe library)
  • Since the original GoogleNet prototxt does not divide by 255, I also do not (see here)
  • I resize the image down to 224x224, which is the crop size required by GoogleNet
  • I transpose the image as needed to satisfy CxHxW, as required by Caffe
  • At the moment I am not performing data augmentation, which could be turned on by setting oversample=True.
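For comparison, when crop_size is set on a built-in data layer, Caffe does not resize the 256x256 image down to 224x224. It cuts out a 224x224 window, at a random position during training and centered at test time. A minimal NumPy sketch of both crops on a CxHxW array follows. The helper names center_crop_chw and random_crop_chw are hypothetical and are not part of Caffe.

```python
import numpy as np


def center_crop_chw(img, crop):
    """Return the central crop x crop window of a CxHxW array."""
    _, h, w = img.shape
    top = (h - crop) // 2
    left = (w - crop) // 2
    return img[:, top:top + crop, left:left + crop]


def random_crop_chw(img, crop, rng):
    """Return a crop x crop window at a random position of a CxHxW array."""
    _, h, w = img.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    return img[:, top:top + crop, left:left + crop]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.zeros((3, 256, 256), dtype="float32")
    print(center_crop_chw(img, 224).shape)
    print(random_crop_chw(img, 224, rng).shape)
```

Cropping keeps the pixel scale of the 256x256 image, whereas resizing shrinks the whole image, so the two inputs are not pixel-for-pixel equivalent.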

Can anyone see anything wrong with this approach? Is data augmentation so critical that no learning would take place without it?

The HDF5 conversion code


import numpy as np
import caffe
from scipy import ndimage

IMG_UNCROPPED = 256  # side length of the ImageNet mean file
IMG_RESHAPE = 224    # crop size required by GoogleNet

def resize_convert(img_names, path=None, oversample=False):
    """
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the Imagenet mean. If oversample is True, 
    perform data augmentation.

    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced 
        from each image. If False, then a single 224x224 is produced.
    """
    path = path if path is not None else ''
    if not oversample:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')

    #load the imagenet mean
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')

    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path+img_name, mode='RGB') # Read as HxWxC

        #Convert RGB to BGR, then resize to 256x256 so that the
        #256x256 ImageNet mean can be subtracted
        img = img[...,::-1] #Convert RGB to BGR
        #call name missing in the source; caffe.io.resize_image is assumed
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1))  #HxWxC => CxHxW
        #The mean is stored in Caffe order (3xHxW);
        #assume it is also in BGR channel order
        img = img - mean_val

        #set to 0-1 range => I don't think googleNet requires this
        #I tried both and it didn't make a difference
        #img = img/255

        #resize images down since GoogleNet accepts 224x224 crops
        if oversample == False:
            img = np.transpose(img, (1,2,0))  # CxHxW => HxWxC 
            #call name missing in the source; caffe.io.resize_image is assumed
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2,0,1)) #convert to CxHxW for Caffe 
        all_imgs[i, :, :, :] = img

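    #oversampling requires NxHxWxC order
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1)) #NxCxHxW => NxHxWxC
        #call name missing in the source; caffe.io.oversample is assumed
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2)) #back to NxCxHxW for Caffe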

    return all_imgs
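The arrays returned above still have to be written to disk in the form the HDF5 layer reads: one or more .h5 files whose dataset names match the layer's top names ("data" and "label"), plus a text file listing the .h5 paths, which is what hdf5_data_param.source points to. The sketch below is one way to do this with h5py. The function name write_hdf5_split, the file naming scheme and the chunk size are illustrative assumptions, and labels are stored as float32 as a conservative choice.

```python
import os

import h5py
import numpy as np


def write_hdf5_split(data, labels, out_dir, prefix, list_path, chunk=1000):
    """Write data/labels into .h5 files of at most `chunk` examples each,
    then write a list file with one .h5 path per line."""
    data = np.asarray(data, dtype="float32")
    labels = np.asarray(labels, dtype="float32")
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")

    paths = []
    for k, start in enumerate(range(0, len(data), chunk)):
        path = os.path.join(out_dir, "%s_%03d.h5" % (prefix, k))
        with h5py.File(path, "w") as f:
            # Dataset names must match the layer's top names.
            f.create_dataset("data", data=data[start:start + chunk])
            f.create_dataset("label", data=labels[start:start + chunk])
        paths.append(path)

    # This list file is what the prototxt `source` field refers to.
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

A call such as write_hdf5_split(all_imgs, labels, out_dir, "train", "/path/to/train_list.txt") would produce the list file named in the training prototxt; out_dir and labels are placeholders here.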

Relevant differences between IMAGE_DATA and HDF5 prototxt files

name: "GoogleNet"
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/train_list.txt"
    batch_size: 32
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/valid_list.txt"
  }
  include: { phase: TEST }
}


When I say no learning is taking place, I mean that my training loss does not go down consistently with HDF5 data, as it does with IMAGE_DATA. In the images below, the first plot shows the change in training loss for the IMAGE_DATA network, and the second shows it for the HDF5 network.

One possibility I am considering is that the network is overfitting to each of the .h5 files that I am feeding it. At the moment I am using data augmentation, and the augmented examples are stored in a single .h5 file along with other examples. Because all of the augmented versions of a single input image end up in the same .h5 file, I think this could cause the network to overfit to that specific .h5 file. However, I am not sure whether this is what the second plot suggests.
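One way to test this hypothesis is to shuffle all examples, augmented copies included, before splitting them across .h5 files, so that the crops of one source image no longer sit next to each other. A minimal sketch follows, assuming the data and labels are NumPy arrays of equal length. The helper name shuffle_in_unison is hypothetical.

```python
import numpy as np


def shuffle_in_unison(data, labels, seed=0):
    """Shuffle data and labels with the same permutation so pairs stay aligned."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(data))
    return data[perm], labels[perm]
```

If the loss curve looks the same after shuffling, the ordering inside the .h5 files is probably not the cause.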

[Figure: Training loss with IMAGE_DATA input]
[Figure: Training loss with HDF5 input]

